Reaching new peaks for the future of the CMS HTCondor Global Pool

19 May 2021, 11:29
13m
Short Talk
Distributed Computing, Data Management and Facilities
Facilities and Networks

Speaker

Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecno)

Description

The CMS experiment at CERN employs a distributed computing infrastructure to satisfy its data processing and simulation needs. The CMS Submission Infrastructure team manages a dynamic HTCondor pool that aggregates mainly Grid clusters worldwide, along with HPC, Cloud and opportunistic resources. This CMS Global Pool, which currently spans over 70 computing sites worldwide and peaks at 300k CPU cores, can handle the simultaneous execution of up to 150k tasks. While the present infrastructure is sufficient for the current scale of computing power, the latest CMS estimates predict that at least a fourfold increase in the total amount of CPU will be required to cope with the massive data increase of the High-Luminosity LHC (HL-LHC) era, planned to start in 2027. This contribution presents the latest results of the CMS Submission Infrastructure team in exploring the scalability reach of the Global Pool, in order to detect and overcome in advance any barriers to the HL-LHC goals, while maintaining high efficiency in workload scheduling and resource utilization.

Primary author

Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecno)

Co-authors

Maria Acosta Flechas (Fermi National Accelerator Lab. (US))
Jeffrey Michael Dost (Univ. of California San Diego (US))
Saqib Haleem (National Centre for Physics (PK))
Kenyi Paolo Hurtado Anampa (University of Notre Dame (US))
Farrukh Aftab Khan (Fermi National Accelerator Lab. (US))
Edita Kizinevic (CERN)
Nicholas Peregonow (Fermi National Accelerator Lab. (US))
Marco Mascheroni (Univ. of California San Diego (US))

Presentation materials

Proceedings

Paper