Strong scaling RHMC on NVIDIA GPUs

28 Jul 2021, 13:15
15m
Oral presentation Software development and Machines

Speaker

Mathias Wagner (NVIDIA)

Description

The ability to strong scale is crucial for Lattice QCD simulations, so Lattice QCD has constantly craved higher network and memory bandwidths. Although bandwidth is never enough, well-balanced systems with favorable GPU-to-network ratios are available, e.g. the Juelich Booster. However, API overheads and the necessary synchronizations between GPU and CPU have not kept up with generational improvements of GPUs and networks and have become prohibitively expensive. This limits the ability to strong scale with MPI communication. A shift towards fine-grained GPU-centric communication provides a way out, as it completely removes these bottlenecks by moving the communication into the GPU kernels. Since version 1.1, QUDA has implemented GPU-centric communication for NVIDIA GPUs using NVSHMEM. We will show low-level Dslash results as well as full RHMC scaling results on modern GPU systems such as Selene and the Juelich Booster, and discuss further expansions of this approach to even more latency-limited algorithms such as Multigrid.
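The strong-scaling argument can be made concrete with a toy cost model. The sketch below is not QUDA code and uses no measured data: the function names, the lattice size, and all cost constants are hypothetical, chosen only to show how a fixed per-exchange overhead (API calls, GPU-CPU synchronization) comes to dominate once the local volume per GPU shrinks. It assumes a 4D lattice split along the time direction only and no overlap of compute and communication.

```python
def dslash_time(n_gpus, overhead_s, dims=(16, 16, 16, 64),
                site_cost_s=2e-9, halo_cost_s=2e-9):
    """Toy time (seconds) for one stencil application per GPU.

    All constants are illustrative assumptions, not measurements.
    The lattice is partitioned along the last (time) dimension only.
    """
    lx, ly, lz, lt = dims
    if lt % n_gpus != 0:
        raise ValueError("n_gpus must divide the time extent")
    face = lx * ly * lz                    # sites in one time slice
    local_volume = face * (lt // n_gpus)   # sites owned by each GPU
    compute = local_volume * site_cost_s
    if n_gpus == 1:
        return compute                     # no halo exchange needed
    transfer = 2 * face * halo_cost_s      # forward and backward faces
    return compute + transfer + overhead_s # fixed overhead per exchange


def parallel_efficiency(n_gpus, overhead_s, **kwargs):
    """Strong-scaling efficiency T(1) / (N * T(N)) in the toy model."""
    t1 = dslash_time(1, overhead_s, **kwargs)
    tn = dslash_time(n_gpus, overhead_s, **kwargs)
    return t1 / (n_gpus * tn)


# Compare a large assumed overhead (host-driven exchange) with a small
# one (communication issued from inside the kernel). Values are made up.
for n in (1, 2, 4, 8, 16, 32):
    host = parallel_efficiency(n, overhead_s=20e-6)
    gpu = parallel_efficiency(n, overhead_s=2e-6)
    print(f"{n:3d} GPUs  host-driven: {host:.2f}  GPU-centric: {gpu:.2f}")
```

In this model the compute term falls as 1/N while the overhead term stays constant, so reducing the fixed overhead matters most at the largest GPU counts, which is the regime the abstract describes as limiting MPI-based strong scaling.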

Primary authors

Mathias Wagner (NVIDIA), Kate Clark (NVIDIA), Jiqun Tu (NVIDIA Corporation), Evan Weinberg (NVIDIA Corporation)
