
2–6 Mar 2009
Le Ciminiere, Catania, Sicily, Italy
Europe/Rome timezone

SAM Test Results, Availability and Reliability Visualisation Portal

3 Mar 2009, 11:40
20m
Michelangelo (120) (Le Ciminiere, Catania, Sicily, Italy)

Viale Africa 95100 Catania
Oral
Planned or on-going scientific work using the grid
Monitoring

Speaker

Pablo Saiz (CERN)

Description

The portal's main purpose is to provide each Virtual Organisation (VO) with a customised view of the SAM (Service Availability Monitoring) test results and of other metrics such as availability (as defined by GridView). The aim is to accommodate the different needs of the VOs in terms of site naming convention and test criticality. The portal, based on the Dashboard framework, is a complementary tool to help VO administrators locate site problems and to estimate

Keywords

Service Availability Monitoring (SAM), Site availability, Site reliability

Conclusions and Future Work

The portal offers several functionalities that the VOs appreciate. The foreseen next step is the development of a site flavour of the same portal that would provide the site administrators with a cross-VO view of their site's performance. In the longer run, SAM is looking into reusing parts of the portal for the visualisation of test results on the Regional Operation Center (ROC) level, when the new testing infrastructure is deployed.

Detailed analysis

The current version of the interface is composed of three main parts. The first part allows the VOs to view the SAM test results and to select those that are relevant for a given activity (e.g. user analysis, data access). The second part allows VOs to define several groups of test types that they consider critical for a given activity or a given group of sites (e.g. Tier1s only). The third part provides the VOs with metrics such as availability and reliability (as defined by GridView) based on the sets of tests previously defined.
This helps the VOs to identify and fix problems that can occur in a Grid infrastructure and to monitor the behaviour of sites. The portal is generic enough that exactly the same code is executed for all the VOs, so providing the interface for another VO is only a matter of configuration.
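
As a rough illustration of the second and third parts, the sketch below treats a site as up in a time bin only if every test in a VO-defined critical group passed, then derives availability and reliability from the binned statuses. The test names, the status labels, the sample data, and the simplified formulas (availability = up / (total - unknown); reliability = up / (total - scheduled downtime - unknown)) are assumptions made for illustration, not the portal's or GridView's actual implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical critical-test groups that a VO might define per activity.
CRITICAL_TESTS: Dict[str, List[str]] = {
    "user_analysis": ["job-submit", "sw-area"],
    "data_access": ["srm-put", "srm-get"],
}


def bin_status(results: Dict[str, str], activity: str) -> str:
    """Collapse one time bin of test results into a single site status.

    `results` maps a test name to "ok", "error" or "maintenance"
    (assumed labels). Only tests critical for `activity` are considered.
    """
    statuses = [results.get(test) for test in CRITICAL_TESTS[activity]]
    if any(s == "maintenance" for s in statuses):
        return "scheduled_down"
    if any(s is None for s in statuses):
        return "unknown"  # a critical test has no result in this bin
    return "up" if all(s == "ok" for s in statuses) else "down"


def availability(bins: Iterable[str]) -> float:
    """Fraction of bins with known status in which the site was up."""
    known = [b for b in bins if b != "unknown"]
    return known.count("up") / len(known) if known else 0.0


def reliability(bins: Iterable[str]) -> float:
    """Like availability, but scheduled downtime is also excluded."""
    considered = [b for b in bins if b not in ("unknown", "scheduled_down")]
    return considered.count("up") / len(considered) if considered else 0.0


# Four hypothetical hourly bins for one site.
hourly = [
    {"job-submit": "ok", "sw-area": "ok"},
    {"job-submit": "error", "sw-area": "ok"},
    {"job-submit": "maintenance", "sw-area": "ok"},
    {"job-submit": "ok"},  # sw-area result missing
]
bins = [bin_status(r, "user_analysis") for r in hourly]
```

Because the critical groups live in a plain mapping, switching activity or VO changes only the data passed in, which mirrors the configuration-only approach described above.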

URL for further information

http://dashb-sam.cern.ch

Impact

In its current state, the portal already meets the requirements expressed by the VOs. It is available for the four major LHC experiments as well as the OPS VO. ATLAS, LHCb and CMS use the full range of functionality provided by the portal.
In order to present useful information to the VOs, an extension to the SAM database schema was needed. This extension holds information about the VO topology (assignment of sites to tiers, VO-specific site naming convention). This is the only part that cannot be common to all the VOs, and it has to be updated regularly. For this purpose, a generic collector has been developed, but it is still up to each VO to provide the information needed to keep the topology consistent.
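
To make the topology extension more concrete, the following sketch shows one way VO-specific topology could be represented and used to relabel test results. The record fields, the site names, and the `to_vo_view` helper are all hypothetical; the actual schema extension and collector are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TopologyEntry:
    vo_site_name: str  # name the VO uses for this site (assumed field)
    tier: int          # tier level assigned by the VO (assumed field)


# Hypothetical topology supplied by one VO, keyed by grid-level site name.
VO_TOPOLOGY: Dict[str, TopologyEntry] = {
    "EXAMPLE-SITE-A": TopologyEntry("T1_Example_A", 1),
    "EXAMPLE-SITE-B": TopologyEntry("T2_Example_B", 2),
}


def to_vo_view(
    results: List[Tuple[str, str]],
    topology: Dict[str, TopologyEntry],
    tier: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Relabel (site, status) pairs with the VO's own site names.

    Sites absent from the VO topology are skipped; if `tier` is given,
    only sites of that tier are kept.
    """
    view = []
    for site, status in results:
        entry = topology.get(site)
        if entry is None:
            continue
        if tier is not None and entry.tier != tier:
            continue
        view.append((entry.vo_site_name, status))
    return view
```

A call such as `to_vo_view(results, VO_TOPOLOGY, tier=1)` would then give a Tier-1-only view under the VO's naming convention, provided the VO keeps its topology entries up to date.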

Primary authors

Alessandro DI GIROLAMO (CERN)
Andrea SCIABA (CERN)
Benjamin GAIDIOZ (CERN)
Brian BOCKELMAN (University of Nebraska)
Gerhild MAIER (CERN)
Julia ANDREEVA (CERN)
Pablo Saiz (CERN)
Ricardo BRITO DA ROCHA (CERN)
Roberto SANTINELLI (CERN)
Stefano BELFORTE (INFN, Sezione di Trieste Universita & INFN, Trieste)
William OLLIVIER (ENS des Telecommunicat. de Bretagne-Brest-France)
