4–8 Nov 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Design of the data distribution network for the ALICE Online-Offline (O2) facility

5 Nov 2019, 12:00
15m
Riverbank R5 (Adelaide Convention Centre)

Oral, Track 1 – Online and Real-time Computing

Speaker

Gvozden Neskovic (Johann-Wolfgang-Goethe Univ. (DE))

Description

ALICE (A Large Ion Collider Experiment), one of the large LHC experiments, is currently undergoing a significant upgrade. The increase in data rates planned for LHC Run 3, together with triggerless continuous readout operation, requires a new type of networking and data processing infrastructure.

The new ALICE O2 (online-offline) computing facility consists of two types of nodes: First Level Processors (FLPs), which contain custom PCIe cards to receive data from the detectors, and Event Processing Nodes (EPNs), which are compute-dense nodes equipped with GPGPUs for fast online data compression. FLPs first buffer the detector data for a time interval into SubTimeFrame (STF) objects. All corresponding STFs, one from each FLP, are then aggregated into a TimeFrame (TF) object located on a designated EPN, where it can be processed. The data distribution network connects FLP and EPN nodes, enabling efficient TimeFrame aggregation and providing a high quality of service.
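The aggregation step described above can be illustrated with a minimal sketch. It is not the O2 implementation: the names `SubTimeFrame`, `TimeFrameBuilder`, `tf_id` and `flp_id` are hypothetical, and the sketch assumes that every FLP contributes exactly one STF per time interval.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTimeFrame:
    tf_id: int      # hypothetical: identifies the time interval
    flp_id: int     # hypothetical: identifies the FLP that produced the STF
    payload: bytes


class TimeFrameBuilder:
    """Runs on one EPN and collects STFs until a TimeFrame is complete."""

    def __init__(self, flp_ids):
        self.expected = frozenset(flp_ids)
        self.pending = defaultdict(dict)  # tf_id -> {flp_id: SubTimeFrame}

    def add(self, stf):
        """Store an STF; return the full TimeFrame once all FLPs have reported."""
        parts = self.pending[stf.tf_id]
        parts[stf.flp_id] = stf
        if self.expected.issubset(parts):
            del self.pending[stf.tf_id]
            # A TimeFrame is represented here as a list of STFs ordered by FLP.
            return [parts[flp] for flp in sorted(self.expected)]
        return None
```

With two FLPs, the first call to `add` for a given `tf_id` returns `None`, and the second returns the two STFs as one TimeFrame.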

We present design details of the data distribution network, which is based on InfiniBand HDR technology and tailored to the requirements of the ALICE O2 facility. Further, we will show a scheduling algorithm for TimeFrame distribution from FLP to EPN nodes that evenly utilizes all available processing capacity and avoids creating long-term network congestion.
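To make the scheduling goal concrete, the sketch below shows one simple way to spread TimeFrames evenly over EPNs. It is not the algorithm presented in the talk: it is a plain least-loaded policy, and the class `EpnScheduler`, the per-EPN `capacity` (the number of TimeFrames an EPN can hold at once) and the EPN names are assumptions made for illustration. It does not model the network at all.

```python
class EpnScheduler:
    """Hypothetical least-loaded scheduler for assigning TimeFrames to EPNs."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)              # EPN name -> max TFs in flight
        self.in_flight = {epn: 0 for epn in capacity}
        self.placement = {}                         # tf_id -> EPN name

    def assign(self, tf_id):
        """Pick the EPN with the lowest relative load, or None if all are full."""
        free = [e for e in self.capacity if self.in_flight[e] < self.capacity[e]]
        if not free:
            return None
        # Ties are broken by name so the choice is deterministic.
        target = min(free, key=lambda e: (self.in_flight[e] / self.capacity[e], e))
        self.in_flight[target] += 1
        self.placement[tf_id] = target
        return target

    def release(self, tf_id):
        """Mark a TimeFrame as processed, freeing a slot on its EPN."""
        epn = self.placement.pop(tf_id)
        self.in_flight[epn] -= 1
```

Returning `None` when every EPN is full leaves the back-pressure decision to the caller, for example keeping the STFs buffered on the FLPs until a slot is released.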

Consider for promotion No

Primary author

Gvozden Neskovic (Johann-Wolfgang-Goethe Univ. (DE))
