Speaker
James Casey
(CERN)
Description
During 2006, the Worldwide LHC Computing Grid Project (WLCG) set up several
working groups on fabric and application monitoring, with the mandate of
increasing the reliability and availability of the grid infrastructure through
better monitoring of the grid fabric.
This talk will discuss the ‘Grid Service Monitoring’ Working Group, whose aim is
to evaluate the existing monitoring system and to create a coherent architecture that
lets the existing system keep running while increasing the quality and quantity of
the monitoring information gathered.
We will describe the stakeholders in this project in detail, focusing in particular
on the needs of site administrators, which existing solutions did not serve well.
We will present several standards for service metric gathering and grid monitoring
data exchange, and show where each fits in the architecture.
Finally, we will describe how a Nagios-based prototype deployment was used to
validate our ideas, and the progress made in turning this prototype into a
production-ready system.
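The abstract does not say how the prototype's probes are written. As a minimal sketch, assuming only the standard Nagios plugin conventions (exit codes 0 to 3 for OK, WARNING, CRITICAL and UNKNOWN, and a single output line with optional performance data after a `|`), a service check might look like the following. The `check_http` probe, the "HTTP" service label, and the 1 s / 5 s thresholds are hypothetical illustrations, not the WLCG probes themselves.

```python
import sys
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(response_time, warn, crit):
    """Map a measured response time (seconds) to a Nagios status code."""
    if response_time >= crit:
        return CRITICAL
    if response_time >= warn:
        return WARNING
    return OK


def format_result(service, status, response_time, warn, crit):
    """Build the single output line: human-readable text, '|', then perfdata
    in the form label=value[UOM];warn;crit;min."""
    return (f"{service} {STATUS_NAMES[status]}: response time {response_time:.3f}s "
            f"| time={response_time:.3f}s;{warn};{crit};0")


def check_http(url, service="HTTP", warn=1.0, crit=5.0, timeout=10.0):
    """Hypothetical probe: time a request to `url` and return (code, message)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # wait for at least the first byte of the body
    except (OSError, ValueError) as exc:  # network errors, HTTP errors, bad URLs
        return CRITICAL, f"{service} CRITICAL: {exc}"
    elapsed = time.monotonic() - start
    status = classify(elapsed, warn, crit)
    return status, format_result(service, status, elapsed, warn, crit)


def main(argv):
    """Entry point: prints one status line and returns the exit code Nagios sees."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_service.py URL")
        return UNKNOWN
    code, message = check_http(argv[1])
    print(message)
    return code


# Only run the check when arguments are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Keeping the status logic in `classify` and `format_result`, separate from the network call, means the mapping from measurements to Nagios states can be checked without a live service. This is one plausible way to structure a probe, not a description of the prototype's design.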
Submitted on behalf of Collaboration: WLCG
Author
James Casey
(CERN)
Co-author
Ian Neilson
(CERN)