Quantitative and Comparative Analysis of Manycore HPC Nodes
7 Keble Road, OX1 3QG
One of the main challenges the HPC community has taken on is building a computing system capable of at least a billion billion (exa, 10^18) floating-point calculations per second (1 exaFLOPS).
Contemporary cores in HPC systems can perform around 16 floating-point operations per cycle thanks to fused multiply-add instructions. A contemporary single socket with 12 cores operating at 2.6 GHz would therefore have a peak performance of roughly 500 GFLOPS. To build an HPC system with a performance of 1 EFLOPS we would need 2 million such sockets, or 24 million cores. DARPA initially suggested that future exascale HPC systems should ideally consume around 20 MW. If we assume that 60% of that power budget is spent on cooling, storage, interconnects and memory, this leaves a power budget of 8 MW for all the CPU cores, which equates to 0.33 W per core for a future exascale machine.
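The back-of-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: every input is a figure quoted in the paragraph, while the variable names are illustrative and not taken from the talk.

```python
# Back-of-envelope exascale power budget, using only the figures quoted above.
# Variable names are illustrative assumptions, not part of the talk.
FLOPS_PER_CYCLE = 16        # per core, thanks to fused multiply-add
CORES_PER_SOCKET = 12
CLOCK_HZ = 2.6e9            # 2.6 GHz
TARGET_FLOPS = 1e18         # 1 EFLOPS
SYSTEM_POWER_W = 20e6       # 20 MW target for the whole system
NON_CPU_FRACTION = 0.60     # cooling, storage, interconnects and memory

# Peak performance of one socket, in floating-point operations per second.
socket_flops = FLOPS_PER_CYCLE * CORES_PER_SOCKET * CLOCK_HZ

# Number of sockets and cores needed to reach the exascale target.
sockets = TARGET_FLOPS / socket_flops
cores = sockets * CORES_PER_SOCKET

# Power left for the CPU cores, and the resulting budget per core.
cpu_power_w = SYSTEM_POWER_W * (1 - NON_CPU_FRACTION)
watts_per_core = cpu_power_w / cores

print(f"socket peak: {socket_flops / 1e9:.1f} GFLOPS")
print(f"sockets:     {sockets / 1e6:.2f} million")
print(f"cores:       {cores / 1e6:.2f} million")
print(f"CPU budget:  {cpu_power_w / 1e6:.1f} MW")
print(f"per core:    {watts_per_core:.2f} W")
```

The unrounded socket figure is 499.2 GFLOPS rather than 500 GFLOPS, so the socket and core counts come out marginally above 2 million and 24 million, but the per-core budget still rounds to 0.33 W.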
This talk will introduce AsianCat, a system built around a dual-socket, 48-core Cavium ThunderX System-on-Chip (SoC) that implements the ARMv8 architecture. In addition to being built from low-energy cores, AsianCat integrates memory, storage and interconnects in the same node in order to minimise glue logic and save further power. The talk will describe an in-depth study of AsianCat's power dissipation under load, carried out with a novel, lightweight, non-intrusive power measurement method, and show how the results were used to categorise the workloads into groups and to show qualitatively whether there is any correlation between power consumption and core utilisation. The talk will conclude with an evaluation of the memory subsystem efficacy of two microarchitectural implementations, the ARMv8 ThunderX processor and the x86 Broadwell processor, and will show which types of workload suit which microarchitecture.
About the speaker
Milos Puzovic is a Research Scientist at the Hartree Centre. His main field of research is the optimisation of software for performance and power consumption through hardware, operating system, compiler and run-time system co-design. In addition to his work on software optimisation, he works in the area of full-system architectural simulation in order to study microarchitectural design space trade-offs in depth. Milos completed his PhD at the University of Cambridge on the subject of hardware and software co-design for dynamic multicore scheduling.