23–28 Oct 2022
Villa Romanazzi Carducci, Bari, Italy
Europe/Rome timezone

Supporting multiple hardware architectures at CMS: the integration and validation of Power9

25 Oct 2022, 16:10
30m
Area Poster (Floor -1) (Villa Romanazzi)

Speaker

Daniele Spiga (Universita e INFN, Perugia (IT))

Description

Computing resources in the Worldwide LHC Computing Grid (WLCG) have been based entirely on the x86 architecture for more than two decades. In the near future, however, heterogeneous non-x86 resources, such as ARM, POWER and RISC-V, will become a substantial fraction of the resources provided to the LHC experiments, owing to their presence in existing and planned world-class HPC installations. The CMS experiment, one of the four large detectors at the LHC, has started to prepare for this situation: the CMS software stack (CMSSW) is already compiled for multiple architectures. To allow production use, however, the tools for workload management and job distribution must be extended to exploit heterogeneous architectures.

Taking advantage of the first sizable IBM Power9 allocation, available on the Marconi100 HPC system at CINECA, CMS developed all the modifications needed in its workload management system. After a successful proof of concept, a full physics validation was performed in order to bring the system into production. This experience is highly valuable for commissioning the similar, even larger, Summit HPC system at Oak Ridge, where CMS also expects a resource allocation. Moreover, part of the compute power of these systems is provided by GPUs, which represents an extremely valuable opportunity to exploit the offloading capability already implemented in CMSSW.

The current status of the integration, including the exploitation of GPUs, the results of the validation, and the future plans will be presented and discussed.

Significance

The presentation shows how the CMS experiment is preparing to transparently integrate heterogeneous non-x86 resources at large scale, including its strategy for physics validation.

Experiment context, if any

CMS experiment

Primary authors

Christoph Wissing (Deutsches Elektronen-Synchrotron (DE))
Daniele Spiga (Universita e INFN, Perugia (IT))

Co-authors

Alan Malta Rodrigues (University of Notre Dame (US))
Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
Dirk Hufnagel (Fermi National Accelerator Lab. (US))
Hasan Ozturk (CERN)
Jordan Martins (Universidade do Estado do Rio de Janeiro (BR))
Kirill Skovpen (Ghent University (BE))
Marco Mascheroni (Univ. of California San Diego (US))
Saqib Haleem (National Centre for Physics (PK))
Todor Trendafilov Ivanov (University of Sofia - St. Kliment Ohridski (BG))
Dr Tommaso Boccali (INFN Sezione di Pisa)