Versatile firmware for the Common Readout Unit (CRU) of the LHC ALICE experiment.

5 Sept 2019, 14:50
25m
Aula de bioloxía

Oral Programmable Logic, Design Tools and Methods

Speaker

Olivier Bourrion (Centre National de la Recherche Scientifique (FR))

Description

For the next upgrade, the ALICE experiment will use a Common Readout Unit (CRU) at the heart of its data acquisition system. The CRU, based on the PCIe40 hardware designed for LHCb, is a common interface between the front-end electronics, the computing system and the trigger and timing system. The 475 CRUs will interface 10 different sub-detectors with 3 sub-systems and reduce the total data throughput from 3.5 TB/s to 635 GB/s. The ALICE common firmware framework is under development; it supports data taking in continuous and triggered modes, as well as clock, trigger and slow-control distribution. The architecture and results will be presented.
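As a rough sanity check of these headline figures, the sketch below (plain Python, using only the numbers quoted above) derives the average per-CRU input and output rates and the overall reduction factor; the real load is of course not evenly distributed across CRUs.

```python
# Back-of-the-envelope check of the aggregate figures quoted above.
# All inputs come from the abstract; the per-CRU averages are derived
# here purely for illustration.

N_CRU = 475
TOTAL_IN_TB_S = 3.5      # total detector data into the CRUs
TOTAL_OUT_GB_S = 635.0   # total data shipped towards the O2 system

per_cru_in_gb_s = TOTAL_IN_TB_S * 1000.0 / N_CRU   # ~7.4 GB/s average input
per_cru_out_gb_s = TOTAL_OUT_GB_S / N_CRU          # ~1.3 GB/s average output
reduction_factor = TOTAL_IN_TB_S * 1000.0 / TOTAL_OUT_GB_S

print(f"average input per CRU : {per_cru_in_gb_s:.1f} GB/s")
print(f"average output per CRU: {per_cru_out_gb_s:.1f} GB/s")
print(f"overall data reduction: x{reduction_factor:.1f}")
```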

Summary

The CRU will be located in the data acquisition farm and installed in dedicated computers, the First Level Processors (FLP), which are part of the online/offline system (O2).
Each CRU is read out via a 16-lane PCIe Gen3 interface and is connected to the front-end electronics with up to 24 bidirectional optical fibers operating at 4.48 Gb/s, and to the Trigger and Timing System (TTS) with a 10 Gb/s Passive Optical Network.
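To put the interface budget in perspective, the following sketch compares the aggregate upstream GBT payload of one CRU (24 links at 4.48 Gb/s) with the nominal PCIe Gen3 x16 throughput. The PCIe parameters (8 GT/s per lane, 128b/130b encoding) are standard values assumed for the estimate, not figures from this contribution.

```python
# Rough per-CRU link-budget comparison (illustrative only).
# The 24 x 4.48 Gb/s figure is quoted in the text; the PCIe Gen3
# parameters (8 GT/s per lane, 128b/130b encoding) are standard
# values assumed here, not figures from this contribution.

N_GBT_LINKS = 24
GBT_USER_GBPS = 4.48                  # user bandwidth per upstream link

PCIE_LANES = 16
PCIE_GTS_PER_LANE = 8.0               # PCIe Gen3 signalling rate (GT/s)
PCIE_ENCODING = 128.0 / 130.0         # 128b/130b line-coding efficiency

gbt_total_gbps = N_GBT_LINKS * GBT_USER_GBPS
pcie_total_gbps = PCIE_LANES * PCIE_GTS_PER_LANE * PCIE_ENCODING

print(f"GBT upstream payload: {gbt_total_gbps:.1f} Gb/s")
print(f"PCIe Gen3 x16 (raw) : {pcie_total_gbps:.1f} Gb/s")
print(f"headroom            : {pcie_total_gbps - gbt_total_gbps:.1f} Gb/s "
      "(before protocol and DMA overheads)")
```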

The common firmware of the CRU copes with the various interface requirements of the sub-detectors. The front-ends are read out via GBT in standard or wide-bus mode, and with different protocols: streaming or packet mode. Additionally, some front-ends receive the trigger and timing information from the CRU via the GBT downstream direction, while others have a direct connection to the TTS. The detectors also use the GBT downstream connection to relay Detector Control System (DCS) commands; these commands are issued through the O2 framework, which uses the FLP to forward them to the front-end via the CRU.
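The variety of front-end interfaces listed above can be pictured as a per-link configuration record. The Python sketch below is purely illustrative and does not reflect the firmware's actual register map or detector mapping; it only captures the dimensions along which the common firmware must be configurable (GBT bus width, readout protocol, trigger path, DCS forwarding).

```python
# Hypothetical per-link configuration record, for illustration only.
# It mirrors the options described in the text: standard vs wide GBT bus,
# streaming vs packet readout, trigger delivery via the CRU or directly
# from the TTS, and DCS command forwarding on the GBT downstream path.

from dataclasses import dataclass
from enum import Enum


class GbtBus(Enum):
    STANDARD = "standard"   # GBT frame with forward error correction
    WIDE = "wide"           # wide-bus frame with a larger user payload


class ReadoutProtocol(Enum):
    STREAMING = "streaming"
    PACKET = "packet"


class TriggerPath(Enum):
    VIA_CRU = "via_cru"     # trigger/timing relayed on the GBT downstream link
    DIRECT_TTS = "direct"   # front-end connected directly to the TTS


@dataclass
class LinkConfig:
    detector: str
    bus: GbtBus
    protocol: ReadoutProtocol
    trigger: TriggerPath
    forward_dcs: bool       # relay DCS commands on the GBT downstream path


# Example entry; the detector name and choices are placeholders,
# not the real detector mapping.
example = LinkConfig("TPC", GbtBus.WIDE, ReadoutProtocol.STREAMING,
                     TriggerPath.VIA_CRU, forward_dcs=True)
print(example)
```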

At its core, the common firmware performs the detector readout, either in continuous or in triggered mode. Continuous mode differs from triggered mode in that all data are transferred unfiltered through the CRU to the FLP memory, where they are further processed and shipped to the event-building nodes. The readout system is designed for continuous readout. However, during commissioning or under non-nominal operating conditions, quick recovery from possible data loss is ensured by performing data-stream consistency checks in the CRU and by transmitting real-time status information to the Central Trigger Processor (CTP). The CTP, which has a full-detector view, can then request the CRUs to temporarily throttle the data taking to allow recovery.
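The throttling mechanism can be sketched as a simple feedback loop: each CRU checks its data stream and buffer state and reports to the CTP, which may then ask individual CRUs to pause data taking. The Python model below is purely behavioural, with invented class and threshold names; the real mechanism is implemented in the CRU firmware and the CTP protocol.

```python
# Behavioural model of the CRU/CTP throttling loop described above.
# All names and thresholds are invented for illustration; the actual
# mechanism is implemented in the CRU firmware and the CTP protocol.

from dataclasses import dataclass


@dataclass
class CruStatus:
    cru_id: int
    buffer_fill: float        # 0.0 .. 1.0 occupancy of the readout buffer
    consistency_error: bool   # data-stream consistency check failed


def ctp_throttle_decision(statuses, fill_threshold=0.9):
    """Return the set of CRU ids that should temporarily throttle.

    The CTP has the full-detector view: any CRU reporting a consistency
    error or a nearly full buffer is asked to pause data taking until
    it has recovered.
    """
    return {
        s.cru_id
        for s in statuses
        if s.consistency_error or s.buffer_fill >= fill_threshold
    }


statuses = [CruStatus(0, 0.42, False),
            CruStatus(1, 0.95, False),   # buffer almost full
            CruStatus(2, 0.10, True)]    # consistency check failed
print("throttle CRUs:", sorted(ctp_throttle_decision(statuses)))
```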

For commissioning, but also for long-term support during operation, many testing and emulation tools are delivered with the common firmware (fake-data generators, a pattern player, a trigger emulator).
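As an illustration of what such tools do, the sketch below emulates a trivial fake-data generator that embeds an incrementing counter in each payload word, so that a downstream checker can detect dropped or reordered data. The word layout is invented for the example and is not the CRU data format.

```python
# Toy fake-data generator in the spirit of the test tools listed above.
# The word layout is invented for the example and is NOT the CRU's
# actual data format.

import itertools


def fake_data_words(link_id, n_words):
    """Yield payload words carrying an incrementing counter."""
    for counter in itertools.islice(itertools.count(), n_words):
        yield (link_id << 48) | (counter & 0xFFFFFFFFFFFF)


def check_sequence(words):
    """Return True if the embedded counters are strictly consecutive."""
    counters = [w & 0xFFFFFFFFFFFF for w in words]
    return all(b - a == 1 for a, b in zip(counters, counters[1:]))


words = list(fake_data_words(link_id=7, n_words=5))
print([hex(w) for w in words])
print("sequence ok:", check_sequence(words))  # flags dropped/reordered data
```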

Beyond the raw readout provided by the common firmware framework, each detector may integrate its own user logic to pre-process the data and thus reduce the data throughput. This user logic can be quite demanding in terms of logic resources, especially in the case of the Time Projection Chamber (TPC). Consequently, the resource usage of the common firmware must be minimized so that the user logic fits in the FPGA.
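Conceptually, the user logic is an optional processing stage inserted in the datapath between the common readout and the transfer to the FLP; when a detector provides none, the data pass through unchanged. The Python sketch below is only a behavioural analogy with invented names; in the real design this hook is an HDL interface inside the FPGA.

```python
# Behavioural analogy for the optional user-logic stage described above.
# Names are invented; in the real design this is an HDL interface inside
# the FPGA, not software.

def passthrough(words):
    """Default behaviour when a detector provides no user logic."""
    return list(words)


def drop_empty(words):
    """Toy 'user logic': drop empty (zero) words to reduce throughput."""
    return [w for w in words if w != 0]


def cru_datapath(words, user_logic=None):
    """Run the readout data through the optional user-logic stage."""
    stage = user_logic or passthrough
    return stage(words)


print(cru_datapath([1, 0, 2, 0, 3]))                         # raw readout
print(cru_datapath([1, 0, 2, 0, 3], user_logic=drop_empty))  # pre-processed
```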

The common firmware framework provides validated solutions for all these use cases and requirements. The chosen architecture, the various solutions implemented and the results obtained will be presented.

Primary author

Olivier Bourrion (Centre National de la Recherche Scientifique (FR))

Co-authors

Erno David (Hungarian Academy of Sciences (HU)), Filippo Costa (CERN), Joel Rene Bouvier (Centre National de la Recherche Scientifique (FR)), Jozsef Imrek (Hungarian Academy of Sciences (HU)), Sanjoy Mukherjee (Bose Institute (IN)), Tuan Mate Nguyen (Hungarian Academy of Sciences (HU))
