23–28 Oct 2022
Villa Romanazzi Carducci, Bari, Italy
Europe/Rome timezone

Optimized GPU usage in High Energy Physics applications

26 Oct 2022, 11:00
30m
Area Poster (Floor -1) (Villa Romanazzi)

Poster

Track 1: Computing Technology for Physics Research

Poster session with coffee break

Speaker

Tim Voigtlaender (KIT - Karlsruhe Institute of Technology (DE))

Description

Machine Learning (ML) applications have become common tools in many High Energy Physics (HEP) analyses and benefit significantly from GPU resources. GPU clusters are therefore important for meeting the rapidly increasing demand for GPU resources in HEP. The Karlsruhe Institute of Technology (KIT) provides such a GPU cluster for HEP, accessible from the physics institute via its batch system and via the Grid. Because the exact hardware needs of these applications depend heavily on the ML hyperparameters, a flexible resource setup is necessary to use the available resources as efficiently as possible. To this end, the Multi-Instance GPU (MIG) feature of the NVIDIA A100 GPUs was studied. Several neural network training scenarios performed on the GPU cluster at KIT are discussed to illustrate the possible performance gains and the setup that was used.
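As a rough illustration of why a flexible setup helps, the sketch below picks the smallest MIG instance that fits a training job's GPU memory need. It is not the setup used at KIT. It assumes the 40 GB variant of the A100 and its standard MIG profiles; the function name `smallest_fitting_profile` and the example memory values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MigProfile:
    name: str            # NVIDIA profile name
    compute_slices: int  # compute slices used, out of 7 on an A100
    memory_gb: int       # GPU memory assigned to the instance


# Assumption: A100 with 40 GB of memory; the 80 GB variant has larger memory slices.
A100_40GB_PROFILES = (
    MigProfile("1g.5gb", 1, 5),
    MigProfile("2g.10gb", 2, 10),
    MigProfile("3g.20gb", 3, 20),
    MigProfile("4g.20gb", 4, 20),
    MigProfile("7g.40gb", 7, 40),
)


def smallest_fitting_profile(
    required_memory_gb: float,
    profiles: Sequence[MigProfile] = A100_40GB_PROFILES,
) -> Optional[MigProfile]:
    """Return the profile with the fewest compute slices whose memory
    covers the request, or None if no single instance is large enough."""
    candidates = [p for p in profiles if p.memory_gb >= required_memory_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.compute_slices, p.memory_gb))


if __name__ == "__main__":
    # Hypothetical memory needs, e.g. for different batch sizes or network widths.
    for need_gb in (4, 12, 40):
        profile = smallest_fitting_profile(need_gb)
        print(need_gb, "GB ->", profile.name if profile else "no single instance fits")
```

A small training that fits into a `1g.5gb` instance leaves the remaining six compute slices of the same GPU free for other jobs, which is the kind of gain a partitioned setup aims for.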

Significance

The basic building blocks we use, HTCondor and NVIDIA's MIG feature, are described in other publications. However, we are one of only a handful of Grid sites that provide such resources to the Grid. Furthermore, the resources are shared with end-users whose resource requirements are more complex than those of Grid jobs. Our experience and ideas on how to use GPUs efficiently in such an environment appear to be unique.

Experiment context, if any

CMS

Primary author

Tim Voigtlaender (KIT - Karlsruhe Institute of Technology (DE))

Co-authors

Gunter Quast (KIT - Karlsruhe Institute of Technology (DE))
Manuel Giffels (KIT - Karlsruhe Institute of Technology (DE))
Matthias Schnepf
Roger Wolf (KIT - Karlsruhe Institute of Technology (DE))
