Speaker
Dr Andreas Heiss (Forschungszentrum Karlsruhe)
Description
Within the Worldwide LHC Computing Grid (WLCG), a Tier-1 centre like the German
GridKa computing facility has to provide significant CPU and storage resources as
well as several Grid services with a high level of quality. GridKa currently supports
all four LHC experiments (ALICE, ATLAS, CMS and LHCb) as well as four non-LHC
high-energy physics experiments, and is about to significantly extend its services for
other communities within the German Grid initiative D-Grid. In order to ensure the
simultaneous usability of the resources by all VOs as well as the continuous import
of data from CERN and the distribution of data to associated Tier-2 sites, a
sophisticated monitoring model is essential.
We present the GridKa monitoring concept, which is based on the Ganglia and Nagios
systems combined with additional tools to monitor Grid services and infrastructure.
Because of the complex dependencies between the large number of monitored hosts and
services, a clear, easy-to-use 'dashboard' showing a summarized view of the
monitoring information is an essential tool. This 'dashboard' gives a quick
overview of the status and performance of services during the day, and serves as the
first source of information for a deeper problem analysis when an automatic alarm
notification is sent at night or on weekends.
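To illustrate what a summarized view might involve, the following is a minimal sketch of rolling individual check results up into one status per service group. It is not the GridKa implementation. The state codes follow the standard Nagios plugin return-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `summarize`, the group, host and service names, and the choice to rank UNKNOWN between WARNING and CRITICAL are all assumptions made for this example.

```python
from collections import defaultdict

# Standard Nagios plugin return codes.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Assumption for this sketch: UNKNOWN ranks above WARNING but below CRITICAL.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def summarize(checks):
    """Roll up check results into one worst-case status per group.

    `checks` is an iterable of (group, host, service, state_code) tuples.
    Returns {group: {"status": name, "counts": {state_name: n}}}.
    """
    worst = {}
    counts = defaultdict(lambda: defaultdict(int))
    for group, _host, _service, state in checks:
        counts[group][STATE_NAMES[state]] += 1
        if group not in worst or SEVERITY[state] > SEVERITY[worst[group]]:
            worst[group] = state
    return {
        group: {"status": STATE_NAMES[worst[group]], "counts": dict(counts[group])}
        for group in worst
    }


if __name__ == "__main__":
    # Hypothetical check results; all names are illustrative only.
    sample = [
        ("storage", "host-a", "disk-pool", 0),
        ("storage", "host-b", "disk-pool", 2),
        ("batch", "host-c", "scheduler", 1),
        ("batch", "host-d", "scheduler", 0),
    ]
    for group, info in summarize(sample).items():
        print(group, info["status"], info["counts"])
```

Taking the worst state per group keeps the top-level view small while still pointing the operator to the group that needs a closer look. The per-state counts show how widespread a problem is.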
Author
Dr Andreas Heiss (Forschungszentrum Karlsruhe)
Co-authors
Mr Axel Jaeger (Forschungszentrum Karlsruhe)
Mr Bernhard Verstege (Forschungszentrum Karlsruhe)
Mr Bruno Hoeft (Forschungszentrum Karlsruhe)
Dr Holger Marten (Forschungszentrum Karlsruhe)