21–27 Mar 2009
Prague
Europe/Prague timezone

Bringing the CMS Distributed Computing System into Scalable Operations

23 Mar 2009, 16:50
20m
Panorama (Prague)

Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
oral Grid Middleware and Networking Technologies

Speaker

Dr Jose Hernandez (CIEMAT)

Description

Establishing efficient and scalable operations of the CMS distributed computing system relies critically on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows, and the underlying computing infrastructure, which is located at more than 50 computing centres worldwide interconnected by the Worldwide LHC Computing Grid. Computing challenges undertaken periodically by CMS in past years, with increasing scale and complexity, have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Programme with the following mandate: validating the infrastructure for organized processing and user analysis, including the sites and the workload and data management tools; validating the distributed production system by performing functionality, reliability and scale tests; helping sites to commission, configure and optimize their networking and storage through scale testing of data transfers and data processing; and improving the efficiency of data access across the CMS computing system, from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing, as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers, which aim to stress the experiment and Grid data management and workload management systems; site commissioning procedures and tools to monitor and improve site availability and reliability; and activities targeted at commissioning the distributed production, user analysis and monitoring systems.
Presentation type: oral

Primary authors

Dr Alessandra Fanfani (INFN and University of Bologna)
Dr Andrea Sciaba (CERN)
Dr Ian Fisk (FNAL)
Dr James Letts (UCSD)
Dr Jose Hernandez (CIEMAT)
Dr Josep Flix (PIC/CIEMAT)
Dr Nicolo Magini (CERN)
Dr Stefano Belforte (INFN, Sezione di Trieste)
Dr Thomas Kress (RWTH)
Dr Vincenzo Miccio (INFN and University of Milano)
