Speaker
Benjamin Gaidioz
(CERN)
Description
The ATLAS production system is one of the most critical components in the experiment's distributed system, and even more so now that real data is being processed.
Monitoring such a system is a non-trivial task, all the more so because two of its main characteristics are flexibility in the submission of job processing units and heterogeneity of the resources it uses.
In this paper we present the architecture of the monitoring system that is in production today and is used by ATLAS shifters and experts around the world as a main tool in their daily activities. We describe in detail the different sources of job execution information and the tools that aggregate system usage into a relevant set of statistics and collect site and resource status in near real time. A description of a shifter's routine usage of the application illustrates its tight integration with the other grid and experiment operations tools.
Author
Benjamin Gaidioz
(CERN)
Co-authors
Alexander Read
(University of Oslo)
Ricardo Rocha
(CERN)
Simone Campana
(CERN)
Xavi Espinal
(PIC/IFAE)