A design study for the upgraded ALICE O2 computing facility

14 Apr 2015, 16:30
15m
Village Center (Village Center)

oral presentation Track1: Online computing Track 1 Session

Speaker

Matthias Richter (University of Oslo (NO))

Description

An upgrade of the ALICE detector is currently being prepared for the Run 3 period of the Large Hadron Collider (LHC) at CERN, starting in 2020. The physics topics under study by ALICE during this period will require the inspection of all collisions at a rate of 50 kHz for minimum bias Pb-Pb and 200 kHz for pp and p-Pb collisions in order to extract physics signals embedded in a large background. The upgraded ALICE detector will produce more than 1 TByte/s of data. Both the collision rate and the data rate pose new challenges for the detector readout and computing system. Some detectors will not use a triggered readout, which will require continuous processing of the detector data. Although various online systems exist for event-based reconstruction, the application of a production system for time-based data processing and reconstruction is a novel case in HEP.

The project will benefit from the experience gained with the current ALICE High Level Trigger online system, which already implements a modular concept combining data transport, algorithms and heterogeneous hardware. Processing of individual events will, however, have to be replaced by continuous processing of the data stream, segmented according to a time-frame structure.

One challenge is the distribution of data within the compute nodes. Time-correlated data sets are received by the First Level Processors (FLP) and must be coherently transported to and aggregated on the Event Processing Nodes (EPN). Several approaches for the distribution of data are being studied. Aggregated time-frame data are processed on the EPN with the primary goal of reconstructing particle properties. On-the-fly, short-latency detector calibration is necessary for the reconstruction. The impact of the calibration strategy on the reconstruction performance is under study. Based on the partially reconstructed data, events corresponding to particular collisions can be assembled from the time-based data.
The original raw data are then replaced by these preprocessed data. This transformation, together with the application of lossless data compression algorithms, will reduce the data volume by a factor of 20 before the data are passed on to the storage system. Building on messaging solutions, the design and development of a flexible framework for transparent data flow, online reconstruction, and data compression has started. The system uses parallel processing at the level of processes and of threads within processes in order to achieve optimal utilization of CPU cores and memory. Furthermore, the framework provides the necessary abstraction to run common code on heterogeneous platforms, including various hardware accelerator cards. In this contribution we present the first results of a prototype, with estimates of scalability and feasibility for a full-scale system.
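The FLP-to-EPN aggregation step described above can be illustrated with a small sketch. It assumes the simplest possible policy, a round-robin mapping from time-frame index to EPN, which is only one of many conceivable approaches and is not claimed to be the one chosen for the O2 system. All names (`SubTimeFrame`, `select_epn`, `EventProcessingNode`, `distribute`) are hypothetical, and the transport layer is replaced by direct function calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTimeFrame:
    """Data collected by one FLP for one time frame (hypothetical layout)."""
    tf_id: int      # time-frame index, identical on all FLPs for the same period
    flp_id: int     # First Level Processor that produced this piece
    payload: bytes  # detector data for that time window


def select_epn(tf_id: int, n_epn: int) -> int:
    """Assumed round-robin policy: every FLP computes the same target EPN."""
    return tf_id % n_epn


class EventProcessingNode:
    """Collects sub-time-frames until every FLP has contributed."""

    def __init__(self, epn_id: int, n_flp: int) -> None:
        self.epn_id = epn_id
        self.n_flp = n_flp
        self._pending = defaultdict(dict)  # tf_id -> {flp_id: payload}
        self.complete = {}                 # tf_id -> payloads ordered by flp_id

    def receive(self, stf: SubTimeFrame) -> None:
        parts = self._pending[stf.tf_id]
        parts[stf.flp_id] = stf.payload
        # A time frame is complete once all FLPs have delivered their part.
        if len(parts) == self.n_flp:
            self.complete[stf.tf_id] = [parts[i] for i in sorted(parts)]
            del self._pending[stf.tf_id]


def distribute(n_flp: int, n_epn: int, n_tf: int):
    """Simulate n_tf time frames flowing from n_flp FLPs to n_epn EPNs."""
    epns = [EventProcessingNode(i, n_flp) for i in range(n_epn)]
    for tf_id in range(n_tf):
        for flp_id in range(n_flp):
            stf = SubTimeFrame(tf_id, flp_id, f"tf{tf_id}-flp{flp_id}".encode())
            epns[select_epn(tf_id, n_epn)].receive(stf)
    return epns
```

Because the target EPN depends only on the time-frame index, the FLPs need no coordination among themselves to deliver all pieces of one time frame to the same node. A production system would additionally have to handle load imbalance, node failures and late or missing contributions, none of which this sketch attempts.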

Primary author

Matthias Richter (University of Oslo (NO))
