D9: Use tensor cores to accelerate math intensive kernels in QUDA

28 Jul 2021, 15:00
1h
Poster: Software development and Machines

Speaker

Jiqun Tu (NVIDIA Corporation)

Description

We will present our recent efforts to use tensor cores, which are available on NVIDIA GPUs starting from the Volta architecture, to speed up the math-intensive kernels in QUDA. A lightweight abstraction of the CUDA PTX matrix multiply-add (MMA) instructions has been added in order to efficiently stage data through the different layers of the GPU memory hierarchy. Specifically, the efforts include:

  • Use tensor cores to accelerate the fifth-dimension domain wall fermion (DWF) operators in the multi-splitting preconditioned conjugate gradient algorithm, utilizing the HMMA tensor core instruction;
  • Use tensor cores to accelerate the dense matrix multiplications in the setup steps of multi-grid;
  • Use tensor cores to accelerate the math-intensive multi-BLAS kernels;
  • Use the double precision DMMA instruction to accelerate the contraction workflow.
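As a rough illustration of the kind of warp-level tile multiply-add that underlies the items above, the following sketch uses the CUDA WMMA API (a higher-level wrapper that compiles down to HMMA instructions); the actual QUDA abstraction wraps the PTX MMA instructions directly and handles staging through shared memory, so this is an assumption-laden simplification, not QUDA's implementation:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Hypothetical single-tile GEMM, D = A * B, on one 16x16x16 tensor core tile.
// Launched with one warp (32 threads); a, b hold half-precision operands,
// c receives the float accumulator. All tiles use leading dimension 16.
__global__ void wmma_gemm_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);       // zero the accumulator registers
    wmma::load_matrix_sync(a_frag, a, 16);   // stage operands into register fragments
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // one tensor core multiply-add
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

In a real kernel this tile operation sits inside a loop that streams larger operands global memory → shared memory → fragments, which is the data-staging problem the abstraction described above addresses; the DMMA case is analogous but uses `double` fragments with an 8x8x4 tile shape.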

Primary authors

Jiqun Tu (NVIDIA Corporation), Evan Weinberg, Kate Clark (NVIDIA), Mathias Wagner (NVIDIA)

Presentation materials