Speaker
Mr
David Collados
(CERN)
Description
Authors: David Collados, Judit Novak, John Shade, Konstantin Skaburskas, Wojciech Lapka
It is now four years since the first prototypes of tools and tests started to monitor the Worldwide LHC Computing Grid (WLCG) services. One of these tools is the Service
Availability Monitoring (SAM) framework, which superseded the SFT tool, and has become a keystone for the monthly WLCG availability and reliability computations.
During this time, the grid has evolved into a robust, production-level infrastructure, in no small part thanks to the extensive monitoring infrastructure, which includes testing,
visualization and reporting. Experience gained with monitoring has led to emerging grid monitoring standards, and provided valuable input for the Operations Automation
Strategy aimed at the regionalization of monitoring services. This change in scope, together with an ever-increasing number of services and infrastructures, makes
enhancements in the architecture of existing monitoring tools a necessity. This paper describes the present architecture of SAM, an enhanced and distributed model for
monitoring WLCG services, and the required changes in SAM to adopt this new model inside the EGEE-III project.
Primary author
Mr
David Collados
(CERN)