Caches all the way down: Infrastructure for Data Science
Oxford e-Research Centre, 7 Keble Road, Oxford
We are pleased to welcome Professor David Abramson, a Visiting Professor at the Centre. David has been involved in computer architecture and high-performance computing research since 1979 and is currently Director of the Research Computing Centre at the University of Queensland. He will present a seminar entitled "Caches all the way down: Infrastructure for Data Science".
The rise of big data science has created new demands for modern computer systems. While floating point performance has driven computer architecture and system design for the past few decades, there is renewed interest in the speed at which data can be ingested and processed. Early exemplars such as Gordon, the NSF-funded system at the San Diego Supercomputer Center, shifted the focus from pure floating point performance to memory and I/O rates. The University of Queensland has continued this trend with the design of FlashLite, a parallel cluster equipped with large amounts of main memory, flash storage, and a distributed shared memory system (ScaleMP’s vSMP). This allows applications to place data “close” to the processor, enhancing processing speeds. Further, the University has built a geographically distributed, multi-tier hierarchical data fabric called MeDiCI, which provides an abstraction of very large data stores across the metropolitan area. MeDiCI leverages industry solutions such as IBM’s Spectrum Scale and SGI’s DMF platforms.
Caching underpins both FlashLite and MeDiCI. In this talk, Professor Abramson will describe the design decisions behind both systems and present some early application studies that benefit from the approach.