11–14 Feb 2008
<a href="http://www.polydome.org">Le Polydôme</a>, Clermont-Ferrand, FRANCE
Europe/Zurich timezone

Strategies for experiment-specific monitoring in the Grid

12 Feb 2008, 11:20
20m
Burgundy (<a href="http://www.polydome.org">Le Polydôme</a>, Clermont-Ferrand, FRANCE)

Oral Existing or Prospective Grid Services Monitoring, Accounting & Support

Speaker

Dr Nicolo Magini (CERN IT)

Description

The LHC experiments perform most, if not all, of their computing activities on Grid resources. This requires an accurate and up-to-date picture of the status of the Grid services they use, and of the services which are specific to each experiment. A common way to achieve this is to periodically execute tests on the services, where the functionality tested may differ from one VO to another. The SAM framework, developed for EGEE operations, can easily be used to run arbitrary tests and publish their results, from basic functionality tests to high-level operations taken from real production activities. This contribution describes in detail how the monitoring system of each LHC experiment has taken advantage of SAM.
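The idea of running VO-specific tests against a list of services and publishing one status per service can be sketched as follows. This is only an illustration under stated assumptions: the names `TestResult`, `run_tests` and `summarize`, the status strings, and the example endpoint are hypothetical and are not part of the SAM framework or its API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical record of one test execution (not a SAM data structure).
@dataclass
class TestResult:
    vo: str
    service: str
    test_name: str
    status: str  # "ok" or "error"
    timestamp: datetime

# A test is any callable that takes a service endpoint and returns True on success.
ServiceTest = Callable[[str], bool]

def run_tests(vo: str, services: List[str], tests: Dict[str, ServiceTest]) -> List[TestResult]:
    """Run every test of one VO against every service and collect the results."""
    results = []
    for service in services:
        for name, test in tests.items():
            try:
                passed = bool(test(service))
            except Exception:
                # A test that crashes counts as a failure of the service.
                passed = False
            status = "ok" if passed else "error"
            results.append(TestResult(vo, service, name, status, datetime.now(timezone.utc)))
    return results

def summarize(results: List[TestResult]) -> Dict[str, str]:
    """A service is "ok" only if all of its tests passed."""
    summary: Dict[str, str] = {}
    for r in results:
        if r.status != "ok" or r.service not in summary:
            summary[r.service] = r.status if summary.get(r.service) != "error" else "error"
    return summary

if __name__ == "__main__":
    # Hypothetical test set for one VO; a real test would contact the service.
    tests = {"endpoint-defined": lambda endpoint: len(endpoint) > 0}
    for r in run_tests("cms", ["se01.example.org"], tests):
        print(r.vo, r.service, r.test_name, r.status)
```

In such a scheme each VO supplies its own `tests` dictionary, so the same loop can run basic functionality checks for one experiment and higher-level operations for another; a scheduler (for example cron) would call `run_tests` periodically.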

1. Short overview

This contribution describes how the LHC experiments implement their own Grid resource monitoring, either with internally developed tools or by reusing tools built for Grid operations, such as the Service Availability Monitor (SAM) used for EGEE operations.

4. Conclusions / Future plans

The need to commission the computing resources available to the experiments before the start of LHC data taking in 2008 requires a constant effort to improve the quality of the monitoring information. For this reason the work described here is still ongoing, and we foresee increasing use of the SAM framework by the experiments, both by expanding the current tests and by adding new tests for services that are not yet tested with this methodology.

URL for further information:

https://lcg-sam.cern.ch:8443/sam/sam.py

3. Impact

The work covered by this contribution has greatly improved the efficiency with which the LHC experiments use Grid resources. More accurate and prompt discovery of problems allows them to be fixed as soon as they appear, thus increasing the overall reliability of the Grid resources from the experiment point of view. This information also allows the experiment applications to make better decisions whenever they are given a choice of resources to use, for example by not sending jobs to problematic or overloaded computing resources.
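As an illustration of how an experiment application might use such monitoring information when choosing resources, the following minimal sketch filters candidate sites by their latest test status and by load. The function name `select_sites`, the status strings, the queue-length threshold and the site names are assumptions made for this example and do not describe any experiment's actual workload management system.

```python
from typing import Dict, List

def select_sites(
    candidates: List[str],
    status: Dict[str, str],
    queued_jobs: Dict[str, int],
    max_queued: int,
) -> List[str]:
    """Keep only sites whose latest test status is "ok" and that are not overloaded.

    Sites with no known status are treated as unusable (a conservative assumption).
    Sites with no queue information are assumed to have an empty queue.
    """
    return [
        site
        for site in candidates
        if status.get(site) == "ok" and queued_jobs.get(site, 0) <= max_queued
    ]

if __name__ == "__main__":
    # Hypothetical site names and values.
    sites = ["SITE_A", "SITE_B", "SITE_C"]
    latest = {"SITE_A": "ok", "SITE_B": "error", "SITE_C": "ok"}
    queues = {"SITE_A": 10, "SITE_C": 500}
    print(select_sites(sites, latest, queues, max_queued=100))
```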

Keywords

LHC, Monitoring, SAM, High Energy Physics

Primary authors

Dr Alessandro Di Girolamo (CERN IT), Dr Andrea Sciaba (CERN IT), Dr Elisa Lanciotti (CERN IT), Dr Enzo Miccio (CERN IT), Dr Nicolo Magini (CERN IT), Dr Patricia Mendez Lorenzo (CERN IT), Dr Roberto Santinelli (CERN IT), Dr Simone Campana (CERN IT)
