February 29, 2016 to March 4, 2016
ALBA Synchrotron
Europe/Zurich timezone

Effective HTCondor-based monitoring system

Mar 1, 2016, 4:30 PM
30m
ALBA Synchrotron

Carrer de la Llum 2-26 08290 Cerdanyola del Vallès, Barcelona, Spain Tel: +34 93 592 43 00 Coordinates: +41°29'12.38", +2°6'35.74" (41.486773, 2.109929)
HTCondor presentations and tutorials

Speaker

Justas Balcas (California Institute of Technology (US))

Description

The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based grid provisioning systems. Given the scale of the CMS global queue, operators found it increasingly difficult to monitor the pool, find issues, and fix them. To get a complete view of the pool, they had to consult several different web pages, each offering a different level of information, and sift through logs. A suitable monitoring system was therefore one of the crucial items to deliver before the beginning of Run 2, both to ensure early detection of issues and to give a good overview of the whole pool. Our new monitoring page (cms-gwmsmon.cern.ch) uses the HTCondor ClassAd mechanism to provide a complete picture of the whole CMS submission infrastructure. The monitoring page includes information from the HTCondor schedulers, the central manager, and the glideinWMS frontend and factory. It also incorporates information about users and tasks, making it easy for operators to provide support and debug issues.
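As a rough illustration of the kind of per-user view that can be built from job ClassAds, the sketch below counts jobs by owner and status. It is not the code behind cms-gwmsmon.cern.ch: the function name `summarize_jobs` and the sample data are hypothetical, and the ads are plain dictionaries standing in for whatever a scheduler query would return. Only the `Owner` and `JobStatus` attribute names and the status codes follow standard HTCondor job ClassAd conventions.

```python
from collections import Counter, defaultdict

# Standard HTCondor JobStatus integer codes for the most common states.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}


def summarize_jobs(job_ads):
    """Count jobs per owner and status from an iterable of ClassAd-like dicts.

    Hypothetical helper for illustration; each ad is expected to carry the
    'Owner' and 'JobStatus' attributes, and missing values are tolerated.
    """
    summary = defaultdict(Counter)
    for ad in job_ads:
        owner = ad.get("Owner", "unknown")
        status = JOB_STATUS.get(ad.get("JobStatus"), "Other")
        summary[owner][status] += 1
    # Convert to plain dicts so the result is easy to serialize for a web page.
    return {owner: dict(counts) for owner, counts in summary.items()}


# Hypothetical sample data standing in for ads returned by a scheduler query.
sample_ads = [
    {"Owner": "alice", "JobStatus": 2},
    {"Owner": "alice", "JobStatus": 1},
    {"Owner": "bob", "JobStatus": 5},
    {"Owner": "alice", "JobStatus": 2},
]

if __name__ == "__main__":
    print(summarize_jobs(sample_ads))
```

In a live pool the ads would come from the schedulers themselves rather than from a hard-coded list; that retrieval step is deliberately left out here, since the abstract does not say how the page collects them.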

Primary author

Justas Balcas (California Institute of Technology (US))

Co-authors

Brian Paul Bockelman (University of Nebraska (US)) Farrukh Aftab Khan (National Centre for Physics (PK)) James Letts (Univ. of California San Diego (US))
