21-27 March 2009
Prague
Europe/Prague timezone

Monitoring the world-wide daily computing operations in ATLAS LHC experiment

23 Mar 2009, 08:00
1h

Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
Board: Monday 077
poster Distributed Processing and Analysis Poster session

Speaker

Dr Xavier Espinal (PIC/IFAE)

Description

The ATLAS distributed computing activities involve about 200 computing centers distributed world-wide and need people on shift covering 24 hours per day. Data distribution, data reprocessing, user analysis and Monte Carlo event simulation run continuously. Reliable performance of the whole ATLAS computing community is of crucial importance to meet the ambitious physics goals of the ATLAS experiment. Distributed computing software and monitoring tools are evolving continuously to achieve this target. The world-wide daily operations shift group is the first responder to all faults, alarms and outages. The shifters are responsible for finding, reporting and following up on problems at almost every level of a complex distributed infrastructure and a complex processing model. In this paper we present the operations model, followed by the experiences of running the world-wide daily operations group for the past year. We will present the most common problems encountered and the expected future evolution, aimed at making efficient use of data, resources and manpower and at improving communication between sites and the experiment.
Presentation type (oral | poster) oral

Primary author

Dr Xavier Espinal (PIC/IFAE)

Co-author

Dr Kaushik De (UTA)
