The Oak Ridge Leadership Computing Facility (OLCF) has been the leading driver of the advance of high-performance computing from the petascale into the exascale era. In 2009, Jaguar achieved 2.3 PetaFlop performance. Last month OLCF's newest machine, Frontier, entered the exaflop era by achieving a full-machine measured peak performance of 1.1 ExaFlop. The fundamental architectural change that has enabled this advance is the increasing incorporation of accelerator hardware in the form of ever more powerful GPUs. In addition to enabling high performance, accelerators are necessary to meet the power consumption requirements of these systems. For example, Jaguar required 7 MW of power whereas Frontier runs at 23 MW, which means that a roughly 480-fold improvement in performance was attained with only about a factor of 3.3 increase in power.
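As a quick sanity check on the power-efficiency argument, the ratios can be recomputed from the four figures quoted above (2.3 PetaFlop at 7 MW for Jaguar, 1.1 ExaFlop at 23 MW for Frontier). The sketch below is only illustrative arithmetic: the constant and function names are made up for this example, and it assumes the quoted performance and power figures are directly comparable.

```python
# Figures quoted in the text; names are illustrative, not from any OLCF tool.
JAGUAR_FLOPS = 2.3e15       # 2.3 PetaFlop (2009)
JAGUAR_POWER_W = 7e6        # 7 MW
FRONTIER_FLOPS = 1.1e18     # 1.1 ExaFlop
FRONTIER_POWER_W = 23e6     # 23 MW


def ratios():
    """Return (performance gain, power increase, FLOP/s-per-watt gain)."""
    perf_gain = FRONTIER_FLOPS / JAGUAR_FLOPS
    power_gain = FRONTIER_POWER_W / JAGUAR_POWER_W
    # Efficiency gain is the performance gain divided by the extra power spent.
    efficiency_gain = perf_gain / power_gain
    return perf_gain, power_gain, efficiency_gain


if __name__ == "__main__":
    perf, power, eff = ratios()
    print(f"performance gain: {perf:.0f}x")
    print(f"power increase:   {power:.1f}x")
    print(f"efficiency gain:  {eff:.0f}x (FLOP/s per watt)")
```

With these inputs the performance ratio comes out near 478 and the power ratio near 3.3, so the quoted figures imply on the order of 145 times more floating-point throughput per watt.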
The introduction of GPUs, while providing impressive performance, has created challenges for their widespread adoption by existing computational physics tools and codes. To address this challenge, the US Department of Energy (DOE) initiated the Exascale Computing Project (ECP) in 2016. The objective of the ECP is to prepare DOE computational projects, and the broader computational science community in general, for deployment on exascale computers in 2022-2023. The ECP has prepared 21 open science applications to run efficiently on Frontier, along with a complete ecosystem of supporting software technologies including solver libraries, visualization, compilers, performance tools, and more.
The combination of computational technologies enabled through the ECP and the systems provided by the OLCF offers an exciting opportunity for high energy physics modeling and simulation. Two current projects, Celeritas, led by Oak Ridge National Laboratory, and AdePT, led by CERN, are investigating methodologies to enable GPU-accelerated Monte Carlo particle transport for detector simulation at the LHC. Celeritas is also closely aligned with the ECP in attempting to fully scale Monte Carlo calculations on machines at the OLCF.
In this presentation we will give a brief overview of the development of exascale computing at the OLCF and of the organization and structure of the ECP. We will discuss some of the many challenges encountered in preparing a diverse set of computational science applications for use on exascale architectures, focusing on the latest results on the AMD-based Frontier system. These include performance portability, programming model maturity, algorithmic developments, inter-node network performance, and debugging and profiling on heterogeneous architectures. Looking ahead, we will discuss collaborative pathways by which computing resources at the OLCF, particularly Frontier, can be brought to bear on the challenging computing problems that will be encountered as the LHC undergoes the high-luminosity upgrade in 2027.
Witold Pokorski, EP Department