21–25 May 2012
New York City, NY, USA
US/Eastern timezone

ATLAS DDM/DQ2 & NoSQL databases: Use cases and experiences

21 May 2012, 17:50
25m
Room 802 (Kimmel Center)

Parallel: Software Engineering, Data Stores and Databases (track 5)

Speaker

Mario Lassnig (CERN)

Description

The Distributed Data Management System DQ2 is responsible for the global management of petabytes of ATLAS physics data. DQ2 has a critical dependency on Relational Database Management Systems (RDBMS), like Oracle, as RDBMS are well suited to enforcing data integrity in online transaction processing applications. Despite these advantages, concerns have recently been raised about the scalability of data-warehouse-like workloads against the relational schema, in particular for the analysis of archived data or the aggregation of data for summary purposes. Therefore, we have considered new approaches to handling very large amounts of data. More specifically, we investigated a new class of database technologies commonly referred to as NoSQL databases. This class includes distributed file systems like HDFS, which support parallel execution of computational tasks on distributed data, as well as schema-less approaches via key-value/document stores, like HBase, Cassandra or MongoDB. These databases provide solutions to particular types of problems: for example, NoSQL databases have demonstrated horizontal scalability, high throughput, automatic fail-over mechanisms, and easy replication support over LAN and WAN. In this talk, we will describe our use cases in ATLAS and share our experiences with NoSQL databases in a comparative study with Oracle.

Author

Co-authors

Angelos Molfetas (CERN), Gancho Dimitrov (Brookhaven National Laboratory (US)), Graeme Andrew Stewart (CERN), Luca Canali (CERN), Mario Lassnig (CERN), Martin Barisits (Vienna University of Technology (AT)), Vincent Garonne (CERN)
