ACAT 2014

Name: ACAT 2014
Start: 2014-09-01T08:00:00+02:00
End: 2014-09-05T18:00:00+02:00
Location: Faculty of Civil Engineering

1–5 Sept 2014

Europe/Prague timezone

Secretary

acat2014@particle.cz

Data-flow performance optimization on unreliable networks: the ATLAS data-acquisition case

2 Sept 2014, 08:00

Faculty of Civil Engineering, Czech Technical University in Prague, Thakurova 7/2077, Prague 166 29, Czech Republic

Board: 103

Type: Poster
Track: Computing Technology for Physics Research
Session: Poster session

Tommaso Colombo (CERN and Universität Heidelberg)

The ATLAS detector at CERN records proton-proton collisions delivered by the Large Hadron Collider (LHC). The ATLAS Trigger and Data-Acquisition (TDAQ) system identifies, selects, and stores interesting collision data. The data are received from the detector readout electronics at an average rate of 100 kHz, with a typical event size of 1 to 2 MB. Overall, the ATLAS TDAQ system can be seen as a distributed software system running on a farm of roughly 2000 commodity PCs. The worker nodes are interconnected by an Ethernet network that is expected to sustain a throughput of several tens of GB/s when the LHC restarts in 2015.

A particular challenge posed by this system, and by DAQ systems in general, is the inherently bursty nature of the data traffic from the readout buffers to the worker nodes. Such bursts can cause instantaneous network congestion and, in turn, performance degradation. The effect is particularly pronounced on unreliable network interconnects such as Ethernet.

In this presentation we report on the design of the data-flow software for the 2015-2018 data-taking period of the ATLAS experiment. This software will be responsible for transporting the data across the distributed data-acquisition system. We focus on the strategies employed to manage network congestion, and thereby minimize the data-collection latency and maximize the system performance. We discuss the results of systematic measurements performed on the production hardware, which highlight the causes of network congestion and its effects on overall system performance. Based on these results, a simulation of the communication in the distributed system has been developed. The simulation makes it possible to explore different solutions to the sources and effects of network congestion without physical intervention on the system. These investigations will support the choice of the best data-flow control strategy for the coming data-taking period.
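To make the congestion problem concrete, the following is a minimal toy model of a single worker node collecting one event from many readout buffers through one switch output port. It is an illustration only, not the data-flow software or the simulation described in the abstract, which does not specify its strategies. The function name `collect_event`, every parameter, the whole-fragment drop rule, and the fixed retransmission timeout are assumptions made for this sketch. It compares requesting all fragments at once with capping the number of outstanding requests, one possible form of traffic shaping.

```python
def collect_event(num_sources, packets_per_fragment, buffer_packets,
                  max_outstanding, timeout_ticks):
    """Toy model (hypothetical) of one worker node gathering one event.

    Each round the node requests fragments from up to `max_outstanding`
    sources that have not delivered yet. All requested sources answer at
    once, so their packets hit the switch port buffer as a single burst.
    Assumptions: the buffer is empty at the start of a round, it keeps
    only whole fragments that fit in `buffer_packets`, the rest are
    dropped, and dropped fragments are re-requested after a fixed timeout.
    Time is counted in ticks, one tick per packet drained to the node.

    Returns (elapsed_ticks, dropped_packets).
    """
    if buffer_packets < packets_per_fragment:
        raise ValueError("buffer must hold at least one whole fragment")
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")

    remaining = num_sources      # fragments still missing
    elapsed = 0
    dropped_packets = 0

    while remaining > 0:
        requested = min(remaining, max_outstanding)
        # Whole fragments that fit in the port buffer during this burst.
        accepted = min(requested, buffer_packets // packets_per_fragment)
        lost = requested - accepted

        dropped_packets += lost * packets_per_fragment
        elapsed += accepted * packets_per_fragment   # time to drain the buffer
        if lost:
            elapsed += timeout_ticks                 # wait before re-requesting
        remaining -= accepted

    return elapsed, dropped_packets


if __name__ == "__main__":
    # Illustrative numbers only; they are not ATLAS measurements.
    unshaped = collect_event(100, 10, 200, max_outstanding=100, timeout_ticks=2000)
    shaped = collect_event(100, 10, 200, max_outstanding=20, timeout_ticks=2000)
    print("request everything at once:", unshaped)
    print("at most 20 outstanding requests:", shaped)
```

In this model, capping the outstanding requests at what the buffer can absorb avoids drops altogether, and collection time is then set by the link drain rate alone rather than by retransmission timeouts. Real switches, protocols, and traffic patterns are considerably more complex than this sketch assumes.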

Dr Fabrizio Salvatore (University of Sussex (GB))

Tommaso Colombo (CERN and Universität Heidelberg)

proceedings.pdf
