10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Grid Site Availability Evaluation and Monitoring at CMS

11 Oct 2016, 15:30
1h 15m
San Francisco Marriott Marquis

Poster Track 7: Middleware, Monitoring and Accounting Posters A / Break

Description

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyze the vast quantity of scientific data recorded every year.

The computing resources are grouped into sites and organized in a tiered structure. A tier consists of sites in various countries around the world. Each site provides computing and storage to the CMS computing grid. In total, about 125 sites contribute resources, ranging from a hundred to well over ten thousand computing cores and from tens of TBytes to tens of PBytes of storage.

In such a large computing setup, scheduled and unscheduled outages occur continually and must not significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. CMS developed a sophisticated site evaluation and monitoring system for Run 1 of the LHC based on tools of the Worldwide LHC Computing Grid (WLCG). Some sites are supplementing their computing with cloud resources, while others focus on increased use of opportunistic resources. For Run 2 of the LHC, the site evaluation and monitoring system is being overhauled to enable faster detection of and reaction to failures and a more dynamic handling of computing resources. Planned enhancements will better distinguish site issues from central service issues and make evaluations more transparent and informative to site support staff.

Primary Keyword: Computing middleware
Secondary Keyword: Computing facilities

Authors

Andrea Sciaba (CERN)
Gaston Lyons Pacini (Fermi National Accelerator Lab. (US))
Giuseppe Bagliesi (Universita di Pisa & INFN (IT))
Rokas Maciulaitis (Vilnius University (LT))
Stephan Lammel (Fermi National Accelerator Lab. (US))
