CHEP 2016 Conference, San Francisco, October 8-14, 2016

Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis
Timezone: America/Los_Angeles

Integrated monitoring of the ATLAS online computing farm

11 Oct 2016, 15:30

1h 15m

San Francisco Marriott Marquis

Poster Track 7: Middleware, Monitoring and Accounting Posters A / Break

Speaker: Daniel Fazio (CERN)

The online farm of the ATLAS experiment at the LHC, consisting of nearly 4000 heterogeneous PCs, provides configuration and control of the detector and performs the collection, processing, selection and transfer of event data from the front-end electronics to mass storage.
The status and health of every host must be monitored continuously to ensure the correct and reliable operation of the whole online system. This monitoring is the first line of defense: it should not only raise prompt alerts on failure but, whenever possible, also warn of impending issues.
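
The distinction between alerting on a failure and warning of an impending one maps naturally onto the two-threshold convention of Nagios-compatible check plugins, which Icinga 2 can execute: exit code 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The following is a minimal sketch of such a check, assuming a parameter for which higher values are worse; the disk usage readings and the 80/90 % thresholds are hypothetical examples, not values from the poster.

```python
from enum import IntEnum


class State(IntEnum):
    """Exit codes of the Nagios plugin convention (also used by Icinga 2)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def evaluate(value, warn, crit):
    """Map a reading to a state, assuming higher values are worse.

    The warning threshold sits below the critical one, so a degrading
    parameter raises WARNING before it turns into a failure.
    """
    if warn > crit:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value is None:
        return State.UNKNOWN  # no reading available: health cannot be judged
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.WARNING
    return State.OK


if __name__ == "__main__":
    # Hypothetical disk usage readings in percent, thresholds 80 % / 90 %.
    for reading in (42.0, 85.0, 97.5, None):
        state = evaluate(reading, warn=80.0, crit=90.0)
        label = "no reading" if reading is None else f"{reading}%"
        print(f"disk usage {label} -> {state.name} (exit code {int(state)})")
```

A real plugin would end with `sys.exit(int(state))` so that the monitoring core reads the state from the process exit code.
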
The monitoring system should be able to check up to 100,000 health parameters and raise alerts on a selected subset of them.
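
One way to check a very large number of parameters while alerting only on a subset is to evaluate every metric but filter notifications through a separate list. The sketch below assumes the metrics are read from a Ganglia gmond XML dump (simplified here: real dumps carry more attributes per element, such as `TN` and `TMAX`). The thresholds, the `ALERTING` subset and the host names are hypothetical examples, not the configuration described in the poster.

```python
import xml.etree.ElementTree as ET

# Hypothetical (warning, critical) thresholds for a few Ganglia metric names.
THRESHOLDS = {
    "load_one": (8.0, 16.0),
    "part_max_used": (80.0, 90.0),
    "cpu_wio": (20.0, 40.0),
}
# Hypothetical subset allowed to raise alerts; all other thresholded
# metrics are still evaluated and recorded, just without notification.
ALERTING = {"part_max_used"}

# Simplified gmond-style XML dump used as sample input.
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="example">
<HOST NAME="pc-001" IP="10.0.0.1">
<METRIC NAME="load_one" VAL="17.3" TYPE="float" UNITS=" "/>
<METRIC NAME="part_max_used" VAL="93.0" TYPE="float" UNITS="%"/>
<METRIC NAME="os_name" VAL="Linux" TYPE="string" UNITS=""/>
</HOST>
<HOST NAME="pc-002" IP="10.0.0.2">
<METRIC NAME="load_one" VAL="0.4" TYPE="float" UNITS=" "/>
<METRIC NAME="part_max_used" VAL="41.0" TYPE="float" UNITS="%"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def state_of(value, warn, crit):
    """Two-threshold evaluation, assuming higher values are worse."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


def scan(xml_text):
    """Evaluate every thresholded metric; return (all_results, alerts)."""
    root = ET.fromstring(xml_text)
    results, alerts = [], []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME")
            if name not in THRESHOLDS:
                continue  # no threshold defined: nothing to evaluate
            state = state_of(float(metric.get("VAL")), *THRESHOLDS[name])
            record = (host.get("NAME"), name, state)
            results.append(record)
            if state != "OK" and name in ALERTING:
                alerts.append(record)
    return results, alerts


if __name__ == "__main__":
    results, alerts = scan(SAMPLE)
    print(f"{len(results)} values checked, {len(alerts)} alert(s): {alerts}")
```

Running `scan(SAMPLE)` evaluates four metric values; the critical `load_one` reading on `pc-001` is recorded but produces no alert, because only `part_max_used` belongs to the alerting subset.
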
In this paper we present the implementation and validation of our new monitoring and alerting system, based on Icinga 2 and Ganglia. We describe how the load distribution and high-availability features of Icinga 2 allowed us to build a centralised yet scalable system, with a configuration model that allows full flexibility while still guaranteeing complete coverage of the farm. Finally, we cover the integration of Icinga 2 with Ganglia and other data sources, such as SNMP for system information and IPMI for hardware health.
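
The abstract does not detail how Icinga 2 shares work inside a high-availability zone, and the sketch below is not Icinga 2's internal algorithm. It only illustrates the general idea under stated assumptions: each live endpoint (hypothetical names `master-1` and `master-2`) derives the same deterministic host-to-endpoint assignment from a hash of the host name, the surviving endpoints take over all hosts when one drops out, and a coverage check confirms that every host is owned by exactly one endpoint.

```python
import hashlib
from collections import Counter


def owner(host, endpoints):
    """Return the endpoint responsible for checking `host`.

    The live endpoints are sorted and the host name is hashed with SHA-256
    (not Python's per-process salted hash), so every endpoint that knows the
    same endpoint list computes the same owner without coordination.
    """
    alive = sorted(endpoints)
    if not alive:
        raise RuntimeError("no endpoint available in the zone")
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return alive[int.from_bytes(digest[:8], "big") % len(alive)]


def distribute(hosts, endpoints):
    """Split the host list among the live endpoints."""
    plan = {ep: [] for ep in endpoints}
    for host in hosts:
        plan[owner(host, endpoints)].append(host)
    return plan


def check_coverage(hosts, plan):
    """Return (unassigned, duplicated) host sets; both empty means full coverage."""
    counts = Counter(h for assigned in plan.values() for h in assigned)
    unassigned = {h for h in hosts if counts[h] == 0}
    duplicated = {h for h, n in counts.items() if n > 1}
    return unassigned, duplicated


if __name__ == "__main__":
    hosts = [f"pc-{i:04d}" for i in range(4000)]  # hypothetical host names
    # Second case simulates master-1 going down: master-2 takes every host.
    for live in (["master-1", "master-2"], ["master-2"]):
        plan = distribute(hosts, live)
        sizes = {ep: len(assigned) for ep, assigned in plan.items()}
        print(live, sizes, check_coverage(hosts, plan))
```

Plain modulo hashing reassigns many hosts whenever the number of endpoints changes; consistent hashing would limit that churn, but with only a few endpoints per zone the simpler scheme keeps the example easy to follow.
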

Primary keyword: Monitoring

Primary author: Diana Scannicchio (University of California Irvine (US))

Co-authors: Christopher Jon Lee (University of Cape Town (ZA)); Costin Gament (University Politehnica of Bucharest (RO)); Daniel Fazio (CERN); Franco Brasolin (Sezione di Bologna (INFN)-Universita e INFN); Matthew Shaun Twomey (University of Washington (US)); Sergio Ballestrero (A.D.A.M. Applications of Detectors and accelerators to Medicine)

Presentation materials: poster-450.pdf
