21-25 May 2012
New York City, NY, USA
US/Eastern timezone

Service monitoring in the LHC experiments

22 May 2012, 13:30
4h 45m
Rosenthal Pavilion (10th floor) (Kimmel Center)

Poster
Distributed Processing and Analysis on Grids and Clouds (track 3)
Poster Session

Speakers

Alessandro Di Girolamo (CERN)
Fernando Harald Barreiro Megino (CERN IT ES)

Description

The computing infrastructure of the LHC experiments is distributed across the computing centers of the Worldwide LHC Computing Grid and must run with high reliability. It is therefore crucial to offer a unified view to shifters, who are generally not experts in the individual services, so that they can follow the status of resources and the health of critical systems and alert the experts whenever a system becomes unavailable. Several experiments have chosen to build their service monitoring on top of the flexible Service Level Status (SLS) framework developed by CERN IT. Based on examples from ATLAS, CMS and LHCb, this contribution will describe the complete development process of a service monitoring instance and explain the deployment models that can be adopted. We will also describe the software package used in ATLAS Distributed Computing to send health reports through the MSG messaging system and publish them to SLS on a lightweight web server.
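
As a rough illustration of what a health report of the kind mentioned above might look like, the following minimal sketch builds a small XML document carrying a service identifier, an availability percentage and a timestamp, and flags services that a shifter should escalate. It is not the ATLAS package or the actual SLS schema: the element names (`serviceupdate`, `id`, `availability`, `timestamp`), the function names and the 50% threshold are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_health_report(service_id, availability, when=None):
    """Return an XML string describing the health of one service.

    The element names are illustrative assumptions, not the real SLS schema.
    """
    if not 0 <= availability <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    when = when or datetime.now(timezone.utc)

    root = ET.Element("serviceupdate")
    ET.SubElement(root, "id").text = service_id
    ET.SubElement(root, "availability").text = str(availability)
    ET.SubElement(root, "timestamp").text = when.isoformat()
    return ET.tostring(root, encoding="unicode")


def needs_expert(availability, threshold=50):
    """Hypothetical escalation rule: alert experts below the threshold."""
    return availability < threshold


if __name__ == "__main__":
    # "example_service" is a made-up identifier.
    print(build_health_report("example_service", 80))
    print(needs_expert(80))
```

In a setup like the one described, a report of this kind would be produced periodically by each monitored service, transported by the messaging layer, and written to a location that a web server exposes to the monitoring framework; those steps are left out of the sketch.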

Primary authors

Alessandro Di Girolamo (CERN)
Diego Da Silva Gomes (Universidade do Estado do Rio de Janeiro (BR))
Fernando Harald Barreiro Megino (CERN IT ES)
José Flix
Peter Kreuzer (Rheinisch-Westfaelische Tech. Hoch. (DE))
Dr Stefan Roiser (CERN)
Vincent Roger Yvan Bernardoff (Univ. P. et Marie Curie (Paris VI) (FR))
