Description
The ATLAS EventIndex has been running in production since mid-2015,
reliably collecting information worldwide about all produced events and storing
it in a central Hadoop infrastructure at CERN. A subset of this information
is copied to an Oracle relational database for fast access.
The system design and its optimization serve event picking, from requests of
a few events up to scales of tens of thousands of events. In addition, data
consistency checks are performed for large production campaigns. Detecting
duplicate events within the scope of physics collections has recently arisen as
an important use case.
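
As an illustration of the duplicate-detection use case, the following is a minimal sketch that assumes an event is identified by its run number and event number within a collection. The record layout, the `find_duplicates` helper, and the sample values are hypothetical and do not reflect the EventIndex schema or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class EventRecord(NamedTuple):
    # Hypothetical record layout for illustration only.
    run_number: int
    event_number: int
    file_guid: str


def find_duplicates(
    records: Iterable[EventRecord],
) -> Dict[Tuple[int, int], List[str]]:
    """Return each (run, event) key seen more than once, with the files holding it."""
    seen: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for rec in records:
        seen[(rec.run_number, rec.event_number)].append(rec.file_guid)
    # Keep only keys that appear in more than one record.
    return {key: guids for key, guids in seen.items() if len(guids) > 1}


# Made-up sample collection: one event appears in two files.
collection = [
    EventRecord(358031, 1001, "guid-A"),
    EventRecord(358031, 1002, "guid-A"),
    EventRecord(358031, 1001, "guid-B"),
]
duplicates = find_duplicates(collection)
```

At production scale a check of this kind would presumably run as a distributed job over the stored index rather than in memory; the grouping logic is the part the sketch is meant to show.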
This paper describes the general architecture of the project, as well as the data
flow and operational issues that recent developments address to improve the
throughput of the overall system. In this direction, the data collection system
is reducing the usage of the messaging infrastructure to overcome the
performance shortcomings detected during production peaks; instead, an object
storage approach is used to convey the event index information, and messages are
used to signal its location and status. Recent changes in the Producer/Consumer
architecture are also presented in detail, as well as the monitoring
infrastructure.
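
The split between object storage for the payload and messages for location and status can be sketched as follows. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the object store, a `queue.Queue` stands in for the messaging broker, and the key layout, message fields, and function names are all hypothetical rather than taken from the EventIndex implementation.

```python
import json
import queue

# Stand-ins for the real services (assumptions for this sketch).
object_store = {}            # object key -> payload bytes
message_bus = queue.Queue()  # carries small control messages only


def produce(job_id, event_records):
    """Upload the bulk payload, then send a small message pointing at it."""
    key = f"eventindex/{job_id}.json"  # hypothetical key layout
    object_store[key] = json.dumps(event_records).encode("utf-8")
    message_bus.put(
        json.dumps({"location": key, "status": "DONE", "count": len(event_records)})
    )


def consume():
    """Read one control message and fetch the payload it refers to."""
    msg = json.loads(message_bus.get())
    if msg["status"] != "DONE":
        return []
    records = json.loads(object_store[msg["location"]].decode("utf-8"))
    # Cross-check the payload against the count announced in the message.
    if len(records) != msg["count"]:
        raise ValueError("payload does not match the announced record count")
    return records


produce("job-42", [{"run": 358031, "event": 1001}, {"run": 358031, "event": 1002}])
received = consume()
```

The point of the arrangement is that the message size stays constant regardless of how many events a job produced, so the broker load no longer grows with the payload during production peaks.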
Primary Keyword (Mandatory) | Distributed data handling |
---|---|
Secondary Keyword (Optional) | Object stores |