20–24 Sept 2021
(teleconference)
Europe/Paris timezone

Operations and Monitoring of the CMS HTCondor pools

23 Sept 2021, 15:30
30m
(teleconference)


HTCondor user presentations Workshop session

Speaker

Saqib Haleem (National Centre for Physics (PK))

Description

The CMS Submission Infrastructure team manages a set of HTCondor pools that provide the vast computing resources CMS requires for tasks such as data processing, simulation and analysis. A set of tools that automates routine tasks and the maintenance of key infrastructure components has been introduced and refined over the years, allowing this infrastructure to be operated successfully. In parallel, a complex monitoring system that includes status dashboards and alarms has been developed, enabling operations to proceed with minimal human intervention. This contribution will describe our technology and implementation choices, how we monitor the performance of our pools in diverse critical dimensions, and how we react to the alarms and thresholds we have configured.

Speaker release: Yes

Primary authors

Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
Marco Mascheroni (Univ. of California San Diego (US))
Saqib Haleem (National Centre for Physics (PK))
