Design of a resilient, high-throughput, persistent storage system for the ATLAS Phase-II DAQ system

May 27, 2021, 5:00 AM
Adam Abed Abud (University of Liverpool (GB))


The ATLAS experiment will undergo a major upgrade to adapt to the HL-LHC. The Trigger and Data Acquisition system (TDAQ) will record data at unprecedented rates: detectors will be read out at 1 MHz, generating around 5 TB/s of data. Within TDAQ, the Dataflow system (DF) introduces a novel design: readout data are buffered on persistent storage while the event filtering system selects 10 kHz of events, for a total throughput of around 60 GB/s. This poses new challenges for DF: designing and implementing a distributed, reliable, persistent storage system that supports several TB/s of aggregated throughput while providing tens of PB of capacity. In this paper, after describing some of these challenges, we present the ongoing R&D to address each of them: data safety, indexing at high rates in a distributed system, and high-performance management of storage capacity. Finally, the performance achieved with a working prototype is shown.
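
The rates quoted above can be cross-checked with a short back-of-envelope calculation. The sketch below is illustrative only and is not part of the DF design: it uses the rounded figures from the abstract, assumes decimal units (1 TB = 10^12 bytes), and takes a hypothetical capacity of 10 PB as a stand-in for "tens of PB". All function and constant names are invented for this example.

```python
# Back-of-envelope sizing from the rounded figures quoted in the abstract.
# Assumptions of this sketch: decimal units, and a hypothetical 10 PB capacity.

MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

READOUT_RATE_HZ = 1_000_000        # 1 MHz detector readout (from the abstract)
READOUT_BYTES_PER_S = 5 * TB       # around 5 TB/s of readout data (from the abstract)
SELECTED_RATE_HZ = 10_000          # 10 kHz of selected events (from the abstract)
ASSUMED_CAPACITY_BYTES = 10 * PB   # hypothetical value standing in for "tens of PB"


def mean_event_size(bytes_per_s, rate_hz):
    """Average bytes per event implied by a data rate and an event rate."""
    return bytes_per_s / rate_hz


def selected_throughput(event_size_bytes, selected_rate_hz):
    """Bytes per second leaving the buffer for the selected events."""
    return event_size_bytes * selected_rate_hz


def buffer_fill_time(capacity_bytes, input_bytes_per_s):
    """Seconds of readout data the buffer can hold if nothing is deleted."""
    return capacity_bytes / input_bytes_per_s


event_size = mean_event_size(READOUT_BYTES_PER_S, READOUT_RATE_HZ)
output_rate = selected_throughput(event_size, SELECTED_RATE_HZ)
fill_time = buffer_fill_time(ASSUMED_CAPACITY_BYTES, READOUT_BYTES_PER_S)

print(f"mean event size:     {event_size / MB:.1f} MB")
print(f"selected throughput: {output_rate / GB:.1f} GB/s")
print(f"buffer fill time:    {fill_time:.0f} s")
```

With these rounded inputs the mean event size comes out at about 5 MB and the selected throughput near 50 GB/s, the same order as the quoted figure of around 60 GB/s. Under the assumed 10 PB capacity, the buffer would hold roughly half an hour of readout data.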

Primary authors

Andrei Kazarov (NRC Kurchatov Institute PNPI (RU))
Adam Abed Abud (University of Liverpool (GB))

