19–25 Oct 2024

Improving overall GPU sharing and usage efficiency with Kubernetes

23 Oct 2024, 14:42
18m
Room 2.A (Seminar Room)

Talk | Track 7 - Computing Infrastructure | Parallel (Track 7)

Speaker

Diana Gaponcic (IT-PW-PI)

Description

GPUs and accelerators are changing traditional High Energy Physics (HEP) deployments while also being key to enabling efficient machine learning. The challenge remains to improve the overall efficiency and sharing of what are currently expensive and scarce resources.

In this paper we describe the common patterns of GPU usage in HEP, including interactive access with spiky requirements and low overall usage, as well as more predictable but potentially bursty workloads such as distributed machine learning. We then explore the mechanisms available to share and partition GPUs, covering time slicing, virtualization, physical partitioning (Multi-Instance GPU, MIG) and the Multi-Process Service (MPS) on NVIDIA devices.
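
As an illustration of the physical-partitioning approach, the sketch below (not taken from the paper) shows how a Kubernetes workload can request a single MIG slice instead of a whole GPU, using the official kubernetes Python client. The resource name nvidia.com/mig-1g.5gb assumes an A100 partitioned with MIG and exposed through the device plugin's "mixed" strategy; the pod, image and namespace names are placeholders only.

# Minimal sketch: request one MIG slice (1g.5gb profile) rather than a full GPU.
# Assumes a cluster where the NVIDIA device plugin advertises MIG devices
# under the "mixed" strategy; pod/image/namespace names are illustrative.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mig-example"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda-test",
                image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",
                command=["nvidia-smi", "-L"],
                resources=client.V1ResourceRequirements(
                    # One 1g.5gb MIG slice (roughly 1/7 of an A100) instead of a whole GPU.
                    limits={"nvidia.com/mig-1g.5gb": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)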

We conclude with the results of an extensive set of benchmarks covering multiple representative HEP use cases, including traditional GPU workloads as well as machine learning. We highlight the limitations of each option and the use cases where it fits best. Finally, we cover the deployment aspects and the different options available for a centralized GPU pool that can significantly improve overall GPU usage efficiency.
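
As a hedged sketch of one such deployment option, the snippet below publishes a time-slicing configuration for the NVIDIA Kubernetes device plugin as a ConfigMap, so that each physical GPU in a shared pool is advertised as several schedulable devices. The sharing.timeSlicing schema follows the plugin's documented format; the ConfigMap name, key and namespace are illustrative and not taken from the paper.

# Sketch: enable time slicing on a shared GPU pool by publishing the NVIDIA
# device plugin's sharing config as a ConfigMap. ConfigMap name, key and
# namespace are placeholders; the replica count is an example value.
from kubernetes import client, config

config.load_kube_config()

TIME_SLICING_CONFIG = """\
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # each physical GPU is advertised as 4 schedulable GPUs
"""

configmap = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="nvidia-device-plugin-config"),
    data={"config.yaml": TIME_SLICING_CONFIG},
)

client.CoreV1Api().create_namespaced_config_map(
    namespace="gpu-operator", body=configmap
)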
