Speaker
Tommaso Colombo
(CERN and Universität Heidelberg)
Description
The ATLAS detector at CERN records proton-proton collisions delivered by the
Large Hadron Collider (LHC). The ATLAS Trigger and Data-Acquisition (TDAQ)
system identifies, selects, and stores interesting collision data. Event data
are received from the detector readout electronics at an average rate of 100 kHz.
The typical event data size is 1 to 2 MB. Overall, the ATLAS TDAQ can be seen as
a distributed software system executed on a farm of roughly 2000 commodity PCs.
The worker nodes are interconnected by an Ethernet network, which is expected
to carry a sustained throughput of several tens of GB/s when the LHC restarts
in 2015.
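The figures quoted above can be cross-checked with some simple arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, decimal units are assumed (1 GB = 1000 MB), and events are assumed to be spread evenly over the farm. It shows that reading out every event in full at the input rate would need 100 to 200 GB/s, which is more than the expected sustained throughput; the abstract does not state how the difference arises.

```python
# Figures quoted in the abstract; the names are illustrative only.
INPUT_RATE_HZ = 100_000          # average event rate from the readout electronics
EVENT_SIZE_MB = (1.0, 2.0)       # typical event size range
N_WORKER_NODES = 2000            # approximate size of the farm


def full_readout_gb_per_s(rate_hz, event_size_mb):
    """Bandwidth needed if every event were transferred in full (decimal GB)."""
    return rate_hz * event_size_mb / 1000.0


def per_node_event_rate_hz(rate_hz, n_nodes):
    """Average event rate per node, assuming an even spread over the farm."""
    return rate_hz / n_nodes


low, high = (full_readout_gb_per_s(INPUT_RATE_HZ, s) for s in EVENT_SIZE_MB)
node_rate = per_node_event_rate_hz(INPUT_RATE_HZ, N_WORKER_NODES)

print(f"full readout: {low:.0f} to {high:.0f} GB/s")
print(f"per node: {node_rate:.0f} events/s, {1000.0 / node_rate:.0f} ms per event")
```

Under these assumptions each node has on average 20 ms to collect and process an event, which is why added data-collection latency matters.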
A particular type of challenge posed by this system, and by DAQ systems in
general, is the inherently bursty nature of the data traffic from the readout
buffers to the worker nodes. This can cause instantaneous network congestion and
therefore performance degradation. The effect is particularly pronounced for
lossy network technologies, such as Ethernet, which may drop packets when
switch buffers overflow.
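The effect of a burst on a finite switch buffer can be illustrated with a toy model. The sketch below is not the ATLAS network or its measured behaviour: it assumes a single output port that forwards one packet per tick, a fixed buffer size in packets, and senders that each transmit one packet per tick; the function name and all parameters are hypothetical.

```python
def simulate_burst(n_senders, packets_per_sender, buffer_packets, drain_per_tick=1):
    """Toy model: senders transmit simultaneously into one output queue.

    Returns (delivered, dropped) packet counts. Packets arriving while the
    queue is full are dropped, as a lossy switch would do.
    """
    queue = 0
    delivered = 0
    dropped = 0
    remaining = [packets_per_sender] * n_senders

    while any(remaining) or queue:
        # Arrivals: every sender that still has data sends one packet.
        for i in range(n_senders):
            if remaining[i]:
                remaining[i] -= 1
                if queue < buffer_packets:
                    queue += 1
                else:
                    dropped += 1
        # Departures: the output port forwards a fixed number of packets.
        sent = min(queue, drain_per_tick)
        queue -= sent
        delivered += sent

    return delivered, dropped


# Ten sources answering at once overflow the buffer ...
burst = simulate_burst(n_senders=10, packets_per_sender=20, buffer_packets=50)
# ... while the same amount of data from one source passes without loss.
smooth = simulate_burst(n_senders=1, packets_per_sender=200, buffer_packets=50)
print("burst  (delivered, dropped):", burst)
print("smooth (delivered, dropped):", smooth)
```

In this model the total data volume is identical in both cases; only the simultaneity of the senders causes the loss.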
In this presentation we report on the design of the data-flow software for the
2015-2018 data-taking period of the ATLAS experiment. This software will be
responsible for transporting the data across the distributed data-acquisition
system. We will focus on the strategies employed to manage network congestion,
and thereby to minimize the data-collection latency and maximize the system
performance.
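One generic way to manage this kind of congestion is to limit how much data a collecting node has requested but not yet received. The abstract does not say which strategies are employed, so the sketch below is only an assumption for illustration: the function names, the packet-based credit count, and the batching scheme are all hypothetical.

```python
def plan_requests(n_sources, fragment_packets, credits):
    """Group sources into batches so that at most `credits` packets
    are requested at any one time (at least one source per batch)."""
    per_batch = max(1, credits // fragment_packets)
    sources = list(range(n_sources))
    return [sources[i:i + per_batch] for i in range(0, n_sources, per_batch)]


def peak_burst_packets(batches, fragment_packets):
    """Largest number of packets that can arrive together for one batch."""
    return max((len(b) for b in batches), default=0) * fragment_packets


# Hypothetical numbers: 10 sources, 20 packets each, 50-packet switch buffer.
shaped = plan_requests(n_sources=10, fragment_packets=20, credits=50)
unshaped = plan_requests(n_sources=10, fragment_packets=20, credits=200)

print("shaped:  ", len(shaped), "rounds, peak", peak_burst_packets(shaped, 20))
print("unshaped:", len(unshaped), "round, peak", peak_burst_packets(unshaped, 20))
```

The trade-off is visible in the two plans: the limited one keeps each burst below the assumed buffer size but needs more request rounds, so the limit has to be tuned against the data-collection latency.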
We will discuss the results of systematic measurements performed on the
production hardware. These results highlight the causes of network congestion
and the effects on the overall system performance. Based on these results, a
simulation of the distributed system communication has been developed. The
simulation makes it possible to explore different ways of addressing the sources
and effects of network congestion without physical intervention on the system.
These investigations will support the
choice of the best data-flow control strategy for the coming data-taking period.
Primary author
Dr Fabrizio Salvatore
(University of Sussex (GB))
Co-author
Tommaso Colombo
(CERN and Universität Heidelberg)