Feb 13 – 17, 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

A Scalable Distributed Data Management System for ATLAS

Feb 14, 2006, 4:20 PM
20m
Auditorium (Tata Institute of Fundamental Research)

Homi Bhabha Road Mumbai 400005 India
oral presentation Grid Middleware and e-Infrastructure Operation

Speaker

Dr David Cameron (European Organization for Nuclear Research (CERN))

Description

The ATLAS detector, currently under construction at CERN's Large Hadron Collider, presents data handling requirements of an unprecedented scale. From 2008 the ATLAS distributed data management (DDM) system must manage tens of petabytes of event data per year, distributed around the world: the collaboration comprises 1800 physicists from more than 150 universities and laboratories in 34 countries. The ATLAS DDM project was established in spring 2005 to develop the system, Don Quijote 2 (DQ2), drawing on operational experience from a previous generation of data management tools. The foremost design objective was to achieve the scalability, robustness and flexibility required to meet the data handling needs of the ATLAS Computing Model, from raw data archiving through global managed production and analysis to individual physics analysis at home institutes. On a foundation of basic file-handling Grid middleware, the design layers a set of loosely coupled components that provide logical organization at the dataset level (datasets being hierarchical, versioned file collections), supporting in a flexible and scalable way the data aggregations by which data is replicated, discovered and analyzed around the world. A combination of central services, distributed site services and agents handles data transfer, bookkeeping and monitoring. Implementation approaches were carefully chosen to meet performance and robustness requirements. Fast and lightweight REST-style web services knit together the components, which use, through standardized interfaces, cataloging and file movement tools chosen for their performance and maturity, with the expectation that these choices will evolve over time. In this paper we motivate and describe the architecture of the system, its implementation, the current state of its deployment for production and analysis operations throughout ATLAS, and the work remaining to achieve readiness for data taking.
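
As a rough illustration of the dataset concept described above, the following Python sketch models a dataset as a named, versioned collection of files, together with a small catalog that records which sites hold a replica. It is not the DQ2 interface: every class, method and field name here (FileEntry, Dataset, Catalog, new_version, add_replica and so on) is a hypothetical chosen for this example, and the hierarchical aspect of datasets is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FileEntry:
    """One logical file. Field names are illustrative assumptions."""
    lfn: str   # logical file name
    guid: str  # unique identifier for the file
    size: int  # size in bytes


@dataclass
class Dataset:
    """A named, versioned collection of files (hypothetical model)."""
    name: str
    versions: List[Tuple[FileEntry, ...]] = field(default_factory=list)

    def new_version(self, files: List[FileEntry]) -> int:
        """Freeze `files` as the next version; return its 1-based number."""
        self.versions.append(tuple(files))
        return len(self.versions)

    def files(self, version: Optional[int] = None) -> Tuple[FileEntry, ...]:
        """Return the files of one version (the latest if none is given)."""
        if not self.versions:
            return ()
        if version is None:
            version = len(self.versions)
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"{self.name} has no version {version}")
        return self.versions[version - 1]


class Catalog:
    """Tracks datasets and the sites holding a replica of each one."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}
        self._sites: Dict[str, Set[str]] = {}

    def register(self, dataset: Dataset) -> None:
        """Add a dataset; names must be unique within the catalog."""
        if dataset.name in self._datasets:
            raise ValueError(f"dataset {dataset.name} already registered")
        self._datasets[dataset.name] = dataset
        self._sites[dataset.name] = set()

    def add_replica(self, name: str, site: str) -> None:
        """Record that `site` holds a replica of dataset `name`."""
        if name not in self._datasets:
            raise KeyError(name)
        self._sites[name].add(site)

    def sites(self, name: str) -> List[str]:
        """Return the sites holding dataset `name`, sorted for stable output."""
        if name not in self._datasets:
            raise KeyError(name)
        return sorted(self._sites[name])


# Example usage with made-up names and sizes.
ds = Dataset("example.dataset.0001")
first = FileEntry("file_a.root", "guid-0001", 1024)
second = FileEntry("file_b.root", "guid-0002", 2048)
ds.new_version([first])
ds.new_version([first, second])

catalog = Catalog()
catalog.register(ds)
catalog.add_replica(ds.name, "SITE_A")
catalog.add_replica(ds.name, "SITE_B")
```

Storing each version as an immutable tuple means that an earlier version keeps its file list when a new one is added, which is one simple way to let replication and analysis refer to a fixed set of files.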

Primary authors

Dr David Cameron (European Organization for Nuclear Research (CERN))
Mr Miguel Branco (European Organization for Nuclear Research (CERN))
Dr Torre Wenaus (Brookhaven National Laboratory)
