2–6 Mar 2009
Le Ciminiere, Catania, Sicily, Italy
Europe/Rome timezone

ATLAS Distributed Analysis tests in the Spanish Cloud

4 Mar 2009, 11:20
20m
Raffaello (80) (Le Ciminiere, Catania, Sicily, Italy)

Viale Africa 95100 Catania
Oral | Scientific results obtained using grid technology | High Energy Physics

Speaker

Dr Santiago Gonzalez De La Hoz (IFIC-Valencia/CERN)

Description

ATLAS distributed analysis challenges need to be performed in order to validate site and cloud readiness for the full-scale user load. Breaking points and bottlenecks that result from the site or cloud design or configuration need to be identified. For this test we use real analysis code from physicists and the Ganga LCG/EGEE backend, with its data-based brokering and splitting, and both POSIX I/O and copy mode for accessing the data. A set of metrics and an error classification will be provided.

Impact

The ATLAS detector is going to take real data in autumn 2009. Spanish sites have to be validated in order to be ready for physics analysis. With these tests we will identify breaking points and bottlenecks that result from the site design or configuration, taking into account that the needs of analysis jobs differ from those of production. The essential impact is ensuring the robustness and effectiveness of the complex system of GRID analysis.

Conclusions and Future Work

Distributed analysis tests are necessary to stress the facilities at a simulated full user load. This first test was very useful: it allowed us to identify “big” problems (missing software, missing or bad published information) and already pointed out some limitations (e.g. the network connection at some sites). In the future, we would like to test new parameters to improve access to the data using DCAP and LUSTRE, and to repeat the exercise with more jobs in order to stress the sites harder.

Keywords

Distributed Analysis, GRID, LCG/EGEE, ATLAS, GANGA, LUSTRE, DCAP

Detailed analysis

The needs of analysis jobs differ from those of production, so although a site may function well for MC (Monte Carlo) production, it may perform poorly for analysis. With these tests we are trying to identify what works well and what does not. The distributed analysis challenge framework is built on Ganga LCG/EGEE jobs: it submits an application to the sites, tracks the job statuses periodically as they run, and allows us to easily produce reports after they complete. The jobs read the data directly from the Storage Element using POSIX I/O, and some performance metrics are recorded, for instance success/failure rate, CPU/walltime, events per second, and an error classification (e.g. different I/O errors). The results led to the discovery of network and switch problems and software bugs. The same application and input data were used at all sites.
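
As an illustration of how the metrics listed above could be aggregated per site, here is a minimal Python sketch. It is not the code used in the tests: the `JobRecord` fields, the `summarize` function, the site names and the error labels are all hypothetical, and the sketch assumes that CPU/walltime and events per second are computed over successful jobs only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    """One finished analysis job (hypothetical record layout)."""
    site: str
    succeeded: bool
    cpu_seconds: float
    wall_seconds: float
    events: int
    error: Optional[str] = None  # error class for failed jobs, e.g. "io_error"


def summarize(records):
    """Aggregate per-site success rate, CPU/walltime, event rate and error counts."""
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec.site].append(rec)

    summary = {}
    for site, jobs in by_site.items():
        ok = [j for j in jobs if j.succeeded]
        failed = [j for j in jobs if not j.succeeded]
        wall = sum(j.wall_seconds for j in ok)
        cpu = sum(j.cpu_seconds for j in ok)
        events = sum(j.events for j in ok)
        summary[site] = {
            "success_rate": len(ok) / len(jobs),
            # Efficiency and throughput are undefined if no job succeeded.
            "cpu_over_wall": cpu / wall if wall else None,
            "events_per_second": events / wall if wall else None,
            "errors": dict(Counter(j.error or "unclassified" for j in failed)),
        }
    return summary


# Made-up sample data for two hypothetical sites.
sample = [
    JobRecord("SITE_A", True, cpu_seconds=50.0, wall_seconds=100.0, events=1000),
    JobRecord("SITE_A", False, cpu_seconds=0.0, wall_seconds=10.0, events=0,
              error="io_error"),
    JobRecord("SITE_B", False, cpu_seconds=0.0, wall_seconds=5.0, events=0),
]
report = summarize(sample)
```

A low CPU/walltime ratio at a site with a high success rate would suggest that jobs spend most of their time waiting for data, which is the kind of I/O or network limitation these tests aim to expose.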

URL for further information

http://gangarobot.cern.ch/st/

Author

Dr Santiago Gonzalez De La Hoz (IFIC-Valencia/CERN)

Co-authors

Ms Alejandro Lamas (IFIC-Valencia), Mr Alvaro Fernandez (IFIC-Valencia), Ms Elena Oliver (IFIC-Valencia), Dr Farida Fassi (IN2P3-Lyon), Dr Gabriel Amoros (IFIC-Valencia), Mr Javier Sanchez (IFIC-Valencia), Laura Del Cano (UAM-Madrid), Dr Luis March (UAM-Madrid), Luis Muñoz (UAM-Madrid), Dr Mohammed Kaci (IFIC-Valencia), Mr Miguel Villaplana (IFIC-Valencia), Pablo Fernandez (UAM-Madrid), Mr Roger Vives (IFIC-Valencia), Dr Xavier Espinal (PIC/IFAE), Andreu Pacheco (IFAE-Barcelona), Carlos Borrego (IFAE-Barcelona), Jordi Nadal (IFAE-Barcelona), Jose del Peso (UAM-Madrid), Dr Jose Salt (IFIC-Valencia), Juan Pardo (UAM-Madrid), Marc Campos (IFAE-Barcelona)
