3–7 Nov 2008
Ettore Majorana Foundation and Centre for Scientific Culture
Europe/Zurich timezone

Large Scale Job Management and Experience in Recent Data Challenges within the LHC CMS experiment.

3 Nov 2008, 17:25
25m

Ettore Majorana Foundation and Centre for Scientific Culture

Via Guarnotta, 26 - 91016 ERICE (Sicily) - Italy Tel: +39-0923-869133 Fax: +39-0923-869226 E-mail: hq@ccsem.infn.it
Parallel Talk 1. Computing Technology Computing Technology for Physics Research

Speaker

Dr Stuart Wakefield (Imperial College London)

Description

From its conception, the job management system has been distributed to increase scalability and robustness. The system consists of several applications (called ProdAgents), each of which manages Monte Carlo, reconstruction and skimming jobs on collections of sites within different Grid environments (OSG, NorduGrid, LCG) and submission systems (GlideIn, local batch, etc.). Production of simulated data in CMS will take place mainly on so-called Tier2 resources (small to medium size computing centers); approximately 50% of the CMS Tier2 resources are allocated to running simulation jobs. The so-called Tier1s (medium to large size computing centers with high-capacity tape storage systems) will be used mainly for skimming and reconstructing detector data. During the last one and a half years the system has also been adapted so that it can be configured to convert Data Acquisition (DAQ) / High Level Trigger (HLT) output from the CMS detector to the CMS data format and to manage the real-time data stream from the experiment. At the same time, the system has been upgraded to accommodate the increasing scale of CMS production and adapted to the procedures used by its operators. In this paper we discuss the current (high-level) architecture of ProdAgent, the experience of using this system in computing challenges, feedback from these challenges, and future work, including migration to a set of core libraries to facilitate convergence between the different data management projects within CMS that deal with analysis, simulation, and initial reconstruction of real data. This migration is important because it will decrease the code footprint of these projects and increase the maintainability of the code base.

Primary author

Dr David Evans (Fermilab, Batavia, IL, USA)

Co-authors

Mr Ahmad Hassan (CERN, Geneva, Switzerland)
Dr Ajit Mohapatra (Wisconsin)
Dr David Mason (Fermilab, Batavia, IL, USA)
Dr Dirk Hufnagel (CERN, Geneva, Switzerland)
Mr Frank van Lingen (Caltech)
Dr Mike Miller (MIT)
Dr Oliver Gutsche (Fermilab, Batavia, IL, USA)
Dr Simon Metson (H.H. Wills Physics Laboratory, Bristol University)
Dr Stuart Wakefield (Imperial College London)
