9–13 Jul 2018
Sofia, Bulgaria
Europe/Sofia timezone

Distributed Data Collection for the Next Generation ATLAS EventIndex Project

11 Jul 2018, 12:15
15m
Hall 8 (National Palace of Culture)

Presentation, Track 4 - Data Handling

Speaker

Alvaro Fernandez Casani (Univ. of Valencia and CSIC (ES))

Description

The ATLAS EventIndex is currently running in production to build a
complete catalogue of events for experiments with large amounts of data.

The current approach indexes all final produced data files at CERN Tier0
and at hundreds of grid sites. A distributed data collection architecture uses
Object Stores to hold the conveyed information temporarily, and references to
the stored objects are sent through a Messaging System. The final backend for
all indexed data is a central Hadoop infrastructure at CERN; an Oracle
relational database provides faster access to a subset of this information.
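The pattern described above (payload parked in an Object Store, with only a reference travelling through the Messaging System) can be illustrated with a minimal in-memory sketch. It is not the EventIndex implementation: the class `InMemoryObjectStore`, the functions `produce` and `consume`, the message fields, and the site name in the usage note are all hypothetical, a Python `queue.Queue` stands in for the messaging system, and a plain list stands in for the Hadoop backend. The Supervisor role is not modelled.

```python
import json
import queue
import uuid


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: maps keys to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def __len__(self):
        return len(self._objects)


def produce(store, bus, site, records):
    """Store the indexed records as one object and send only a reference."""
    key = f"{site}/{uuid.uuid4().hex}"
    store.put(key, json.dumps(records).encode("utf-8"))
    # The message carries the reference and a record count, not the payload.
    bus.put({"site": site, "key": key, "count": len(records)})
    return key


def consume(store, bus, backend):
    """Read one reference, fetch the object, and append it to the backend."""
    message = bus.get_nowait()
    records = json.loads(store.get(message["key"]).decode("utf-8"))
    if len(records) != message["count"]:
        raise ValueError("record count does not match the message")
    backend.extend(records)
    # The object store is only temporary storage, so clean up after ingestion.
    store.delete(message["key"])
    return len(records)
```

As a usage example, calling `produce(store, bus, "SITE_A", records)` followed by `consume(store, bus, backend)` moves the records into `backend` and leaves the store empty. Keeping messages small in this way means the messaging layer only has to carry references, whatever the size of the indexed payload.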

In the future of ATLAS, the event, rather than the file, should be the atomic
information unit for metadata, in order to accommodate future data processing
and storage technologies. Files will no longer be static quantities: they may
aggregate data dynamically and allow processing at event-level granularity in
heavily parallel computing environments. This also simplifies the handling of
loss and/or extension of data. In this sense the EventIndex will evolve
towards a generalized event WhiteBoard, with the ability to build collections
and virtual datasets for end users.

This paper describes the current Distributed Data Collection Architecture of the
ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor
entities, and of the protocol and the information temporarily stored in the
Object Store. It also shows the data flow rates and performance achieved since
the new approach, with the Object Store as temporary store, was put in
production in July 2017.

We review the challenges imposed by the expected increase in rates, which will
reach 35 billion new real events per year in Run 3 and 100 billion new real
events per year in Run 4. For simulated events the numbers are even higher, with
100 billion events/year in Run 3 and 300 billion events/year in Run 4.
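To give a feel for these yearly totals, the short calculation below converts them into average events per second. It assumes ingestion spread uniformly over a 365-day year, which the abstract does not state; real data taking and production campaigns are bursty, so peak rates would be higher. The names `EXPECTED_EVENTS_PER_YEAR` and `average_rate` are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: uniform load over a 365-day year

# Yearly totals quoted in the abstract.
EXPECTED_EVENTS_PER_YEAR = {
    ("Run 3", "real"): 35e9,
    ("Run 3", "simulated"): 100e9,
    ("Run 4", "real"): 100e9,
    ("Run 4", "simulated"): 300e9,
}


def average_rate(events_per_year):
    """Average number of events per second for a given yearly total."""
    return events_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    for (run, kind), total in EXPECTED_EVENTS_PER_YEAR.items():
        print(f"{run} {kind}: {average_rate(total):,.0f} events/s on average")
```

Under this uniform-load assumption, 35 billion real events per year corresponds to roughly 1,100 events per second, and 300 billion simulated events per year to roughly 9,500 events per second.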

We also outline the challenges we face in adapting this approach to
the future Event WhiteBoard in ATLAS.

Primary authors

Alvaro Fernandez Casani (Univ. of Valencia and CSIC (ES))
Dario Barberis (Università e INFN Genova (IT))
Javier Sanchez (Universidad de Valencia (ES))
Carlos García Montoro (IFIC)
Santiago Gonzalez De La Hoz (IFIC-Valencia)
Jose Salt (IFIC-Valencia)
