Response of the ATLAS Spanish Tier2 for the collisions collected in the first run at LHC
Presented by Dr. Santiago GONZALEZ DE LA HOZ on 14 Apr 2010 from 14:00 to 14:20
Session: High Energy Physics
Track: Scientific results obtained using distributed computing technologies
The distributed analysis tests during the STEP09 and UAT exercises were a success for ATLAS and for the sites involved in the ATLAS computing model. The services were exercised at record levels with good efficiencies, and real users continued to get real work done at the same time. Problems found included: data access was troublesome under heavy load; pilot factories need to be better organised; and monitoring could be improved. Solutions have been developed for these problems.
A Scale Test of Experiment Production (STEP09) was executed in June 2009, followed by the User Analysis Tests (UAT) in October 2009. The STEP09 full production activity stressed a number of critical areas, including tape writing/reading at Tier1 as well as analysis. Tier2 sites participated in the Monte Carlo simulation and user analysis components of the challenge. User analysis jobs in the EGEE cloud were run through both WMS job submission and pilot jobs. UAT was a follow-on test to the STEP09 exercise and the last one before data taking. Its goal was to get many user analysis jobs running over worldwide resources, which has the advantage of including potential problem jobs that might be missed in a more controlled test like STEP09. The goal of this work was to cover the major Tier2 activities (Monte Carlo production, data distribution and the user analysis challenge) during the STEP09 and UAT exercises from a site point of view, in this case that of the ATLAS Spanish Tier2. The exercises revealed limitations in a number of areas. Improvements were made, defining the “final” WLCG operation environment that will be used for the first pp run of the LHC.
The STEP09 and UAT exercises involved all major offline activities, done in conjunction with other LHC experiments: Monte Carlo Production, Full Chain Data Distribution, Reprocessing at Tier1s, User Analysis Challenge and ATLAS Central Services Infrastructure. The tests were successful exercises for ATLAS and for the sites because they showed that data distribution was generally good, ATLAS central services worked well, analysis was tested at very high rates at Tier2s, and reprocessing from tape worked well with CMS concurrently active at shared Tier1s (for instance the Spanish Tier1). STEP09 and UAT were useful exercises for our Tier2, allowing us to solve the highlighted problems as soon as possible. It was the first time we had that level of feedback and information. Storage resources are sometimes undersized, but this will not be a long-term problem; data transfer showed a timeout problem that may not be related to storageware; and intra-VO fairshare (50% production, 50% analysis) was tested.
The Spanish ATLAS Tier2 sites showed robustness, stability and good performance, and are ready for data taking. The ATLAS computing system is ready as well. The Distributed Data Management (DDM) system improved during the last year, and the PanDA Monte Carlo and user analysis system increased global efficiencies and running stability. The last updates will be made well in advance so that the sites are ready for LHC data taking and big interventions on the computing system are avoided.
EGEE, CLOUD, LHC, ATLAS, TIER, GRID
Location: Uppsala University
Room: Room X
- Dr. Santiago GONZALEZ DE LA HOZ Instituto de Física Corpuscular (IFIC)-Universitat de València-CSIC
- Mr. Fco. Javier SANCHEZ MARTINEZ IFIC-Valencia
- Dr. Mohamed KACI IFIC-Valencia
- Dr. Gabriel AMOROS IFIC-Valencia
- Mr. Alvaro FERNANDEZ IFIC-Valencia