Improved ATLAS HammerCloud Monitoring for local Site Administration

Not scheduled
15m
OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495
poster presentation Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing

Speaker

Michael Boehler (Albert-Ludwigs-Universitaet Freiburg (DE))

Description

Every day hundreds of tests are run on the Worldwide LHC Computing Grid for the ATLAS, CMS, and LHCb experiments to evaluate the performance and reliability of the different computing sites. All this activity is steered, controlled, and monitored by the HammerCloud testing infrastructure. Sites with failing functionality tests are automatically excluded from the ATLAS computing grid. It is therefore essential to provide local site administrators with a detailed and well-organized web interface in which they can easily spot and promptly solve site issues. Additional functionality has been developed to extract and visualize the most relevant information. Site administrators can now be pointed easily to major issues that lead to site blacklisting, as well as to minor issues that are usually not conspicuous enough to warrant blacklisting a site but can still cause undesired effects such as a non-negligible job failure rate. This contribution summarizes the developments and optimizations of the HammerCloud web interface and gives an overview of typical use cases.

Primary author

Michael Boehler (Albert-Ludwigs-Universitaet Freiburg (DE))

Co-authors

Federica Legger (Ludwig-Maximilians-Univ. Muenchen (DE))
Francesco Giovanni Sciacca (Universitaet Bern (CH))
Friedrich Hoenig (Ludwig-Maximilians-Univ. Muenchen (DE))
Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE))
Valentina Mancinelli (Universita e INFN (IT))