Speakers
Alessandro Di Girolamo
(CERN)
Dr
Andrea Sciaba
(CERN)
Description
For several years the LHC experiments have relied on the WLCG Service Availability Monitoring framework (SAM) to run functional tests on their distributed computing systems. The SAM tests have become an essential tool for measuring the reliability of the Grid infrastructure and for ensuring reliable computing operations, both for the sites and for the experiments.
Recently the old SAM framework was replaced with a completely new system based on Nagios and ActiveMQ, both to better support the transition to EGI and its more distributed infrastructure support model, and to implement several scalability and functionality enhancements.
This required all LHC experiments and the WLCG support teams to migrate their tests, acquire expertise in the new system, validate the new availability and reliability computations, and adopt new visualisation tools.
In this contribution we describe in detail the current state of the art of functional testing in WLCG: how the experiments use the new SAM/Nagios framework, the advanced functionality made available by the new framework and the future developments that are foreseen, with a strong focus on the improvements in terms of stability and flexibility brought by the new system.
Author
Dr
Andrea Sciaba
(CERN)
Co-authors
Akshat Kakkar
(Bhabha Atomic Research Centre (BARC))
Alessandro Di Girolamo
(CERN)
Amol Wakankar
(Bhabha Atomic Research Centre (BARC))
Biswajit Sarkar
(Department of Atomic Energy (DAE))
Guidone Negri
(CERN)
Julia Andreeva
(CERN)
Maarten Litmaath
(CERN)
Maria Dolores Saiz Santos
(Conseil Europeen Recherche Nucl. (CERN))
Nicolo Magini
(CERN)
Pablo Saiz
(CERN)
Partha Dhara
(Variable Energy Cyclotron Centre, Kolkata (India))
Dr
Stefan Roiser
(CERN)
Suja Ramachandran
(Indira Gandhi Centre for Atomic Res)