Conveners
Software development and Machines
- Peter Boyle (Brookhaven National Laboratory)
Adapting QUDA for exascale and beyond means being prepared for a diversity of architectures, parallel abstractions, and new programming paradigms. We report on the rewrite of QUDA in preparation for the 2.0 release, which will embrace all of the above while also significantly streamlining the development of new kernels for new use cases. We do so without compromising performance, or...
The ability to strong scale is crucial for Lattice QCD simulations, so Lattice QCD has constantly demanded higher network and memory bandwidth. Well-balanced systems with favorable GPU-to-network ratios, such as the Juelich Booster, are available, although never in sufficient numbers. However, API overheads and the necessary synchronizations between GPU and CPU have become prohibitively expensive,...
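The link between strong scaling and network bandwidth can be illustrated with a toy surface-to-volume model: as a fixed lattice is spread over more GPUs, each GPU does less local work but exchanges relatively more boundary (halo) data. The 64^4 global lattice, the halo depth of one site and the even four-dimensional decomposition below are illustrative assumptions, not parameters from the abstract.

```python
# Toy model of strong scaling: a fixed global L^4 lattice is split evenly
# over more GPUs. Assumptions (not from the abstract): a 4D hypercubic
# decomposition, a halo depth of one site, and every dimension actually split.


def local_extent(global_extent: int, ranks_per_dim: int) -> int:
    """Sites per dimension owned by one rank; must divide evenly."""
    if global_extent % ranks_per_dim != 0:
        raise ValueError("global extent must be divisible by ranks_per_dim")
    return global_extent // ranks_per_dim


def halo_to_volume(l: int) -> float:
    """Halo sites exchanged per local site for an l^4 sub-lattice.

    Each of the 4 dimensions has 2 faces of l^3 sites, so the halo holds
    8 * l^3 sites against l^4 sites of local work, i.e. a ratio of 8 / l.
    """
    return (8 * l**3) / l**4


if __name__ == "__main__":
    GLOBAL_L = 64  # hypothetical 64^4 global lattice
    for ranks_per_dim in (2, 4, 8):
        l = local_extent(GLOBAL_L, ranks_per_dim)
        print(f"{ranks_per_dim**4:5d} GPUs: local {l}^4, "
              f"halo/volume = {halo_to_volume(l):.2f}")
```

Going from 16 to 256 to 4096 GPUs shrinks the local volume from 32^4 to 16^4 to 8^4, and the halo-to-volume ratio doubles at each step (0.25, 0.50, 1.00), so communication bandwidth rather than arithmetic increasingly sets the time per iteration.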
We present HotQCD's software suite for performing lattice QCD calculations on GPUs. Started in late 2017 and intended as a full replacement for the previous single-GPU lattice QCD code used by the HotQCD collaboration, our software suite has grown into an extensive toolkit for lattice QCD calculations distributed over multiple GPUs on many compute nodes. The code is built on C++, CUDA...
We report on progress on the QDP-JIT library, which acts as a drop-in replacement for the QDP++ library on which Chroma is built. QDP-JIT now targets both NVIDIA and AMD GPU machines, such as Summit, the upcoming Frontier supercomputer, or the new USQCD machine with AMD GPUs at Jefferson Lab. Our new implementation aims to add the one feature QDP++ lacks: performance.
We use the original...
I will describe the latest dynamical domain-wall fermion (DWF) ensemble generation efforts by the RBC/UKQCD collaboration, focusing on a 2+1-flavor 96^3×192×12 ensemble with the Iwasaki gauge action at the physical point and lattice spacing a ≈ 0.07 fm, running on the Summit machine at Oak Ridge National Laboratory. Basic properties of the ensemble, as well as some details of the algorithmic improvements, will be given.
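The quoted lattice dimensions can be turned into rough sizes and memory figures. This back-of-the-envelope sketch assumes that the trailing 12 is the fifth-dimension extent L_s of the domain-wall discretization, treats a = 0.07 fm as exact, and counts uncompressed double-precision storage; production codes may store fields differently.

```python
# Back-of-the-envelope size of a 96^3 x 192 x 12 domain-wall ensemble.
# Assumptions: the trailing 12 is the fifth-dimension extent L_s, a = 0.07 fm
# is taken as exact, and fields are stored as uncompressed double precision.

NX = NY = NZ = 96   # spatial extent in lattice units
NT = 192            # temporal extent in lattice units
LS = 12             # fifth-dimension extent (assumed)
A_FM = 0.07         # lattice spacing in fm (approximate in the abstract)

BYTES_PER_COMPLEX = 16  # one double-precision complex number

sites_4d = NX * NY * NZ * NT
sites_5d = sites_4d * LS

# Gauge field: 4 SU(3) link matrices per 4D site, 9 complex entries each.
gauge_bytes = sites_4d * 4 * 9 * BYTES_PER_COMPLEX
# One 5D fermion vector: 4 spins x 3 colours = 12 complex entries per 5D site.
fermion_bytes = sites_5d * 12 * BYTES_PER_COMPLEX

if __name__ == "__main__":
    print(f"box          L = {NX * A_FM:.2f} fm, T = {NT * A_FM:.2f} fm")
    print(f"4D sites     {sites_4d:,}")
    print(f"5D sites     {sites_5d:,}")
    print(f"gauge field  {gauge_bytes / 1e9:.1f} GB")
    print(f"fermion vec  {fermion_bytes / 1e9:.1f} GB")
```

Under these assumptions the box is about 6.72 fm × 13.44 fm, with 169,869,312 four-dimensional sites, roughly 98 GB for one gauge configuration and roughly 391 GB for a single five-dimensional fermion vector, which helps explain why generating such an ensemble calls for a machine of Summit's scale.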
Lattice QCD calculations require significant computational effort, and most of the computer time is typically spent in the numerical inversion of the Dirac-Wilson operator. One of the simplest methods for solving large, sparse linear systems is the conjugate gradient (CG) method. In this work we present an implementation of CG that can be executed on different devices, including CPUs, GPUs and...
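For reference, the textbook CG iteration is sketched below in pure Python. It is not the authors' portable implementation. CG requires a Hermitian positive-definite operator, so for the Wilson-Dirac operator D it is usually applied to the normal equations D†D x = D†b; here a small real symmetric positive-definite matrix stands in for D†D, and the helper names (`conjugate_gradient`, `matvec`, `axpy`) are illustrative.

```python
# Textbook conjugate gradient for a symmetric positive-definite system A x = b.
# A small dense matrix stands in for D^dagger D; a lattice code would apply
# the Dirac stencil in apply_A instead of a dense matrix-vector product.

from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]


def conjugate_gradient(apply_A: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    x = [0.0] * len(b)      # initial guess x0 = 0, so r0 = b
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    b_norm2 = dot(b, b)
    for _ in range(max_iter):
        if rr <= tol * tol * b_norm2:   # stop when |r| / |b| <= tol
            break
        Ap = apply_A(p)
        alpha = rr / dot(p, Ap)         # step length along search direction
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)     # p = r + beta * p
        rr = rr_new
    return x


def matvec(A: List[List[float]]) -> Callable[[Vector], Vector]:
    """Wrap a dense matrix as a linear operator."""
    return lambda v: [dot(row, v) for row in A]


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0],
         [1.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    print(conjugate_gradient(matvec(A), b))
```

The exact solution of this 3×3 system is x = (2/9, 1/9, 13/9), which CG reaches to within the tolerance in at most three iterations in exact arithmetic. Note that the solver only touches the operator through `apply_A`, which is what makes a device-portable implementation possible: only the operator application and the vector kernels need per-device versions.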
We report on an implementation of a multigrid solver on the supercomputer Fugaku, which uses the Arm-based A64FX CPU. On Fugaku, a highly optimized BiCGStab solver with a domain-decomposed preconditioner for clover fermions, the QCD Wide SIMD library (QWS), is available. Because multigrid solvers are built from several components, parts of QWS, such as the clover kernel, can be reused. As the original...
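To illustrate the kind of solver QWS provides, the sketch below implements right-preconditioned BiCGStab (van der Vorst) in pure Python with real arithmetic. It is not QWS code: a small non-symmetric dense matrix stands in for the clover-Wilson operator, and a simple Jacobi (diagonal) preconditioner is swapped in for the domain-decomposed one; in a multigrid setup the `precond` slot is where a multigrid cycle would go. All function names are illustrative.

```python
# Right-preconditioned BiCGStab in pure Python, real arithmetic.
# Stand-ins: a dense non-symmetric matrix for the clover-Wilson operator and
# a Jacobi preconditioner for QWS's domain-decomposed preconditioner.

from math import sqrt
from typing import Callable, List, Optional

Vector = List[float]
Operator = Callable[[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bicgstab(apply_A: Operator, b: Vector, precond: Optional[Operator] = None,
             tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b; `precond` applies an approximate inverse M^-1."""
    M_inv = precond if precond is not None else (lambda vec: list(vec))
    n = len(b)
    x = [0.0] * n
    b_norm = sqrt(dot(b, b))
    if b_norm == 0.0:
        return x
    r = list(b)              # x0 = 0, so r0 = b
    r_hat = list(r)          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)          # no breakdown handling in this sketch
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = M_inv(p)
        v = apply_A(y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if sqrt(dot(s, s)) < tol * b_norm:   # converged after the BiCG half-step
            x = [xi + alpha * yi for xi, yi in zip(x, y)]
            break
        z = M_inv(s)
        t = apply_A(z)
        omega = dot(t, s) / dot(t, t)        # minimize |s - omega * t|
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if sqrt(dot(r, r)) < tol * b_norm:
            break
        rho = rho_new
    return x


def matvec(A: List[List[float]]) -> Operator:
    """Wrap a dense matrix as a linear operator."""
    return lambda vec: [dot(row, vec) for row in A]


def jacobi(A: List[List[float]]) -> Operator:
    """Diagonal preconditioner M^-1 = diag(A)^-1 (a simple stand-in)."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda vec: [vi / d for vi, d in zip(vec, diag)]


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0],
         [2.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    print(bicgstab(matvec(A), b))
    print(bicgstab(matvec(A), b, precond=jacobi(A)))
```

Both calls should approximate the exact solution x = (0.25, 0, 1.5). Keeping the preconditioner as a pluggable operator mirrors the component-based structure the abstract describes, where an existing kernel can be reused inside a larger solver.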
We present Lyncs, a Python API for Lattice QCD currently under development. Lyncs aims to bring several widely used Lattice QCD libraries under a common framework. Lyncs flexibly links to libraries for CPUs and GPUs in a way that can accommodate additional computing architectures as they arise, ensuring performance portability of the calculations while maintaining the same high-level...