1–3 Mar 2006
CERN
Europe/Zurich timezone

CRAB: a tool for CMS distributed analysis in the grid environment

1 Mar 2006, 18:00
30m
40/5-A01 (CERN)

Oral contribution, Track 1b: Astrophysics/Astroparticle physics - Fusion - High-Energy physics

Speaker

Federica Fanzago (INFN-PADOVA)

Description

The CMS experiment will produce a large amount of data (a few PBytes per year) that will be distributed and stored in many computing centres in the countries participating in the CMS collaboration, and made available for analysis to physicists distributed world-wide. CMS will use a distributed architecture based on grid infrastructure to analyse data stored at remote sites, to restrict data access to authorized users and to ensure the availability of remote resources. Data analysis in a distributed environment is a complex computing task: it requires knowing which data are available, where the data are stored and how to access them.

The CMS collaboration is developing a user-friendly tool, CRAB (CMS Remote Analysis Builder), whose aim is to simplify the work of end users in creating and submitting analysis jobs to the grid. Its purpose is to allow generic users, without specific knowledge of the grid infrastructure, to access and analyse remote data as easily as in a local environment, hiding the complexity of the distributed computational services.

Users develop their analysis code in an interactive environment, decide which data to analyse, and provide CRAB with the data parameters (keywords to select the data and the total number of events) and instructions on how to handle the produced output (return the files to the User Interface or store them on remote storage). CRAB creates a wrapper around the analysis executable to be run on the remote resources, including the CMS environment setup and the output management, and splits the analysis into a number of jobs according to the user-provided number of events. Job submission is performed with the grid workload management commands; the user executable is sent to the remote resource via the input sandbox, together with the job. Data discovery, resource availability, status monitoring and output retrieval of the submitted jobs are fully handled by CRAB. The tool is written in Python and is installed on the User Interface, the user's access point to the grid.
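The event-based job splitting described above can be sketched roughly as follows. This is a minimal illustration of the idea, not CRAB's actual code; the function name and the job-specification fields are hypothetical:

```python
def split_jobs(total_events, events_per_job, first_event=0):
    """Split an analysis over `total_events` events into job specs of at
    most `events_per_job` events each (hypothetical sketch of the
    event-based splitting a tool like CRAB performs)."""
    jobs = []
    event = first_event
    remaining = total_events
    while remaining > 0:
        n = min(events_per_job, remaining)
        # Each job record says where to start reading and how many events to process.
        jobs.append({"first_event": event, "max_events": n})
        event += n
        remaining -= n
    return jobs

# e.g. 10000 events in chunks of 3000 -> 4 jobs (3000, 3000, 3000, 1000)
```

The last job simply absorbs the remainder, so the user only needs to supply the total number of events and a chunk size.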
Up to now CRAB has been installed on ~45 UIs, and about 210 different kinds of data are available at ~40 remote sites. The weekly rate of submitted jobs is ~10000, with a success rate of about 75% (jobs that reach the remote sites and produce output), while the remaining 25% abort due to site setup problems or grid service failures. In this report we explain how CRAB is interfaced with other CMS/grid services, and we report on users' daily experience with the tool while analysing the simulated data needed to prepare the Physics Technical Design Report.

Summary

Report about CRAB, a tool for CMS analysis in the grid environment: how it is interfaced
with CMS/grid services, and users' experience.

Authors

Alessandra Fanfani (Bologna University), Daniele Spiga (Perugia University), Federica Fanzago (INFN-PADOVA), Marco Corvo (CERN/CNAF), Stefano Lacaprara (INFN Legnaro)

Co-authors

Giovanni Ciraolo (Firenze University), Nicola De Filippis (Bari University), Nikolai Smirnov (INFN-PADOVA)

Presentation materials