Strategies for experiment-specific monitoring in the Grid

Dr Nicolo Magini (CERN IT)


The LHC experiments perform most, if not all, of their computing activities on Grid resources. This requires an accurate and up-to-date picture of the status of the Grid services they use, and of the services that are specific to each experiment. A common way to achieve this is to periodically execute tests on the services, where the functionality tested may differ from one virtual organisation (VO) to another. The SAM framework, developed for EGEE operations, can easily be used to run and publish the results of arbitrary tests, from basic functionality tests to high-level operations taken from real production activities. This contribution describes in detail how the monitoring system of each LHC experiment has taken advantage of SAM.
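The idea of periodically running VO-specific tests against services and collecting the results can be illustrated with a minimal sketch. It does not use or reproduce the SAM framework's actual interfaces: the test functions, the VO-to-test mapping, the host names and the `ServiceTestResult` record are all hypothetical, chosen only to show how the set of tests can differ from one VO to another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceTestResult:
    """One test outcome for one service, as seen by one VO (hypothetical record)."""
    vo: str
    service: str
    test: str
    status: str  # "ok" or "error"


# Hypothetical test functions. Real tests would contact the service;
# these stand-ins only look at the host name so the sketch stays runnable.
def check_job_submission(service: str) -> bool:
    return not service.startswith("broken")


def check_file_copy(service: str) -> bool:
    return True


# Each VO selects its own tests (the assignment here is an assumption).
TESTS_BY_VO: Dict[str, List[Callable[[str], bool]]] = {
    "atlas": [check_job_submission, check_file_copy],
    "cms": [check_job_submission],
}


def run_tests(vo: str, services: List[str]) -> List[ServiceTestResult]:
    """Run every test configured for `vo` against every service once."""
    results = []
    for service in services:
        for test in TESTS_BY_VO[vo]:
            try:
                ok = test(service)
            except Exception:
                # A test that crashes is reported as a failure, not dropped.
                ok = False
            status = "ok" if ok else "error"
            results.append(ServiceTestResult(vo, service, test.__name__, status))
    return results
```

In a real deployment a scheduler would call something like `run_tests` at a fixed interval and publish the returned records to a central store; both steps are left out here.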

3. Impact

The work covered by this contribution has greatly improved the efficiency with which the LHC experiments use Grid resources. More accurate and prompt discovery of problems allows them to be fixed as soon as they appear, which increases the overall reliability of the Grid resources from the experiments' point of view. This information also allows the experiment applications to make better decisions whenever they have a choice of resources to use, for example by not sending jobs to problematic or overloaded computing resources.
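As a rough illustration of how an application might use such monitoring information when choosing resources, the sketch below filters candidate sites by their most recent test status. The function name, the status values and the site names are assumptions made for this example and are not taken from any experiment's software.

```python
from typing import Dict, List


def usable_sites(candidates: List[str], latest_status: Dict[str, str]) -> List[str]:
    """Return the candidates whose most recent test status is "ok".

    Sites with no recorded status are excluded, on the assumption
    that an untested site should not receive jobs.
    """
    return [site for site in candidates if latest_status.get(site) == "ok"]
```

Excluding untested sites is the conservative choice; an application that prefers to use every available resource could instead treat a missing status as acceptable.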

1. Short overview

This contribution describes how the LHC experiments implement their own Grid resource monitoring, either with internally developed tools or by reusing tools built for Grid operations, such as the Service Availability Monitor (SAM) used for EGEE operations.

4. Conclusions / Future plans

The need to commission the computing resources available to the experiments before the start of LHC data taking in 2008 requires a constant effort to improve the quality of the monitoring information. For this reason the work described here is still ongoing, and we foresee increasing use of the SAM framework by the experiments, both by expanding the current tests and by adding new tests for services that are not yet tested with this methodology.

LHC, Monitoring, SAM, High Energy Physics

Primary authors

Dr Alessandro Di Girolamo (CERN IT)
Dr Andrea Sciaba (CERN IT)
Dr Elisa Lanciotti (CERN IT)
Dr Enzo Miccio (CERN IT)
Dr Nicolo Magini (CERN IT)
Dr Patricia Mendez Lorenzo (CERN IT)
Dr Roberto Santinelli (CERN IT)
Dr Simone Campana (CERN IT)

