28–29 May 2013
CERN
Europe/Zurich timezone

The ATLAS Distributed Data Management System & Databases

29 May 2013, 15:50
20m
60/6-015 - Room Georges Charpak (Room F) (CERN)

Speaker

Vincent Garonne (CERN)

Description

The ATLAS Distributed Data Management (DDM) system is responsible for the global management of petabytes of high-energy physics data. The current system, DQ2, has a critical dependency on Relational Database Management Systems (RDBMS) such as Oracle. RDBMS are well suited to enforcing data integrity in online transaction processing applications; however, concerns have been raised about their scalability for data-warehouse-like workloads. In particular, analysing archived data and aggregating transactional data for summary purposes are problematic. We have therefore evaluated new approaches to handling vast amounts of data, investigating a class of technologies commonly referred to as NoSQL databases. This class includes distributed filesystems such as HDFS, which support parallel execution of computational tasks on distributed data, as well as schema-less approaches via key-value stores such as HBase. In this talk we will describe our use cases in ATLAS, share our experiences with various databases used in production, and present the database technologies envisaged for the next-generation DDM system, Rucio. Rucio is an evolution of the ATLAS DDM system that addresses the scalability issues observed in DQ2.

Author

Vincent Garonne (CERN)

Co-authors

Cedric Serfon (CERN)
Mario Lassnig (University of Innsbruck (AT))
Martin Barisits (CERN)
Ralph Vigne (University of Vienna (AT))
Thomas Beermann (Bergische Universitaet Wuppertal (DE))
