12-16 April 2010
Uppsala University
Europe/Stockholm timezone

Site Status Board: WLCG monitoring from the experiment perspective

Apr 12, 2010, 4:03 PM
15m
Room IX (Uppsala University)

Oral presentation
Tracks: End-user environments, scientific gateways and portal technologies; Infrastructure Tools and Services

Speaker

Jacobo Tarragón Cros (CERN)

Description

Now that the LHC has started, the experiments require a high standard of reliability and performance in their computing activities. Monitoring these activities is not trivial, for two main reasons: first, assessing whether a site behaves properly depends heavily on each experiment's software model; second, the number of sites taking part in WLCG has increased drastically compared to previous HEP experiments.

Conclusions and Future Work

Production-level services are being built on top of the Site Status Board. At the same time, the application is constantly evolving, since it must adapt to the experiments' growing needs. Future changes will focus on faster browsing of historical data, more reliable information gathering, and more flexible metric definitions.

Detailed analysis

The Site Status Board (SSB) web application, developed within the Experiment Dashboard framework, has been designed to provide an overall view of the sites' performance from the experiment perspective. Originally designed for the LHC virtual organizations (VOs), it allows each experiment to define a set of activities, also known as views. For each view, the experiment administrator defines the metrics to be collected. For instance, CMS currently has five views ('computing shifters', 'site commissioning', 'space monitoring', ...). For the 'computing shifters' view, the metrics include the number of running jobs, the transfer status and the availability of software at the site, among others.
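To make the relationship between experiments, views and metrics concrete, here is a minimal Python sketch. The class and method names (`Metric`, `View`, `Experiment`, `metrics_for`) are hypothetical and do not describe the SSB's actual schema or API; only the CMS view names and the 'computing shifters' metrics are taken from the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    """A quantity collected for every site, e.g. the number of running jobs."""
    name: str


@dataclass
class View:
    """An activity defined by an experiment; groups the metrics to collect."""
    name: str
    metrics: list = field(default_factory=list)


@dataclass
class Experiment:
    """A virtual organization and the views its administrator has defined."""
    name: str
    views: dict = field(default_factory=dict)

    def add_view(self, view: View) -> None:
        # Views are keyed by name, so redefining a view replaces it.
        self.views[view.name] = view

    def metrics_for(self, view_name: str) -> list:
        # Names of the metrics the administrator attached to one view.
        return [m.name for m in self.views[view_name].metrics]


# Hypothetical configuration mirroring the CMS example in the text.
cms = Experiment("CMS")
cms.add_view(View("computing shifters", [
    Metric("running jobs"),
    Metric("transfer status"),
    Metric("software availability"),
]))
cms.add_view(View("site commissioning"))
cms.add_view(View("space monitoring"))

print(cms.metrics_for("computing shifters"))
# ['running jobs', 'transfer status', 'software availability']
```

Keeping metrics attached to views, rather than globally to the experiment, reflects the idea that each activity needs its own set of checks.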

The SSB collects the status of the metrics over time and presents it in several formats.
Where this information is provided, the SSB also includes pointers describing possible errors and their solutions. With the SSB, the virtual organizations can easily analyse the current status of their sites while keeping track of how the metric results evolve over time.
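Keeping every collected value, rather than only the latest one, is what allows both the current status and its evolution to be displayed. The sketch below illustrates that idea in Python under stated assumptions: `Sample`, `StatusHistory`, the site name `T2_XX_Example` and the `example.org` link are all hypothetical, and the optional `link` field stands in for the pointers to error descriptions and solutions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One collected value of a metric for a site at a given time."""
    time: datetime
    site: str
    metric: str
    status: str
    link: Optional[str] = None  # optional pointer to error/solution details


class StatusHistory:
    """Stores every sample so both current and past statuses can be shown."""

    def __init__(self) -> None:
        self._samples = []

    def record(self, sample: Sample) -> None:
        self._samples.append(sample)

    def history(self, site: str, metric: str) -> list:
        """All samples for one site/metric pair, oldest first."""
        rows = [s for s in self._samples if s.site == site and s.metric == metric]
        return sorted(rows, key=lambda s: s.time)

    def latest(self, site: str, metric: str) -> Optional[Sample]:
        """The most recent sample, or None if nothing was collected yet."""
        rows = self.history(site, metric)
        return rows[-1] if rows else None


# Usage with hypothetical data: samples may arrive out of order.
h = StatusHistory()
h.record(Sample(datetime(2010, 4, 12, 12, 0), "T2_XX_Example", "transfer status",
                "ERROR", link="https://example.org/transfer-errors"))
h.record(Sample(datetime(2010, 4, 12, 10, 0), "T2_XX_Example", "transfer status", "OK"))

print(h.latest("T2_XX_Example", "transfer status").status)  # ERROR
print([s.status for s in h.history("T2_XX_Example", "transfer status")])  # ['OK', 'ERROR']
```

A real deployment would keep the samples in a database and index them by site, metric and time; the in-memory list here only shows the access pattern.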

Impact

The SSB is widely used by CMS and LHCb for several activities: computing shifts, site commissioning and space monitoring in the case of CMS, and job and space monitoring in the case of LHCb. ATLAS and ALICE are also evaluating the SSB.

Keywords: grid, monitoring, site, status
URL for further information: http://dashb-ssb.cern.ch/ssb.html
