Description
The STS detector in the CBM experiment delivers data via multiple E-Links connected to GBTX ASICs. During data aggregation, these data must be received, combined into a smaller number of streams, and packed into so-called microslices, each containing data from a specific time period. The aggregation must account for data randomization caused by amplitude-dependent processing time in the FEE ASICs and by the differing occupancy of individual E-Links. During the development of the STS readout, continued progress in the available technology changed the requirements for data aggregation, as well as its architecture and algorithms. The contribution presents the solutions considered and discusses their properties.
Summary (500 words)
The Silicon Tracking System (STS) detector in the Compressed Baryonic Matter (CBM) experiment digitizes hits with SMX ASICs. The hit data are transferred via over 20000 E-Links to the GBTX ASICs in readout boards (ROBs) and then via standard GBT links. Finally, the data must be delivered via PCIe to the memory of entry nodes in the First Level Event Selector (FLES) system, which provides the software trigger. Data from multiple E-Links must be aggregated into the streams accepted by the PCIe interface boards. The requirements for the aggregation depend on the PCIe bus throughput, the input data width of the PCIe interface, and the throughput of the data link between the ROBs and the PCIe boards.
For efficient further processing, the data are packed into so-called microslices, each containing data from a specific time period. Neighboring microslices are later combined into timeslices used by the event reconstruction algorithms. However, the readout data delivered to the ROBs are not perfectly ordered in time. The hit processing time in the SMX depends on the hit amplitude. Additionally, the hit transmission latency depends on the occupancy of the particular SMX and on the number of connected output links. The resulting data randomization must be taken into account in the data aggregation.
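The packing step can be illustrated with a minimal software sketch. It is not the firmware implementation: the function name `pack_into_microslices`, the `(timestamp, payload)` hit representation, and the `period` parameter are assumptions made for illustration only. Hits are assigned to a microslice by integer division of their timestamp, so hits that arrive out of order still land in the correct microslice.

```python
from collections import defaultdict


def pack_into_microslices(hits, period):
    """Group hits into microslices of `period` timestamp units.

    `hits` is an iterable of (timestamp, payload) pairs in arrival order,
    which is not necessarily time order (hypothetical representation).
    Returns a dict mapping microslice index -> list of hits, ordered by index.
    """
    slices = defaultdict(list)
    for timestamp, payload in hits:
        # The microslice index depends only on the timestamp,
        # not on the order in which the hit arrived.
        slices[timestamp // period].append((timestamp, payload))
    return dict(sorted(slices.items()))


if __name__ == "__main__":
    arrivals = [(5, "a"), (12, "b"), (3, "c"), (25, "d")]
    for index, content in pack_into_microslices(arrivals, period=10).items():
        print(index, content)
```

Note that hits inside one microslice keep their arrival order here; whether they must additionally be sorted depends on the aggregation scheme.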
The first proposed version of the STS readout assumed intermediate FPGA-based data processing boards (DPBs) connected via a 10 Gb/s Aurora link to the PCIe FLES interface boards (FLIBs). A heap-sorter-based solution, which fully sorts incoming data by timestamp, was implemented and tested. It enabled a reduction of the data volume. Unfortunately, such a concentrator proved extremely sensitive to overflows caused by fluctuations in the hit intensity and to timestamps corrupted by transmission errors.
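The behavior of a sorter with limited capacity can be modeled in a few lines of software. The sketch below is only an analogy to the FPGA heap sorter, built on Python's standard `heapq` module; the function name `heap_sort_stream` and the `depth` parameter are hypothetical. The output is fully sorted only as long as no hit is displaced by more than `depth` positions, which hints at why bursts of hits or a corrupted timestamp are harmful in such a scheme.

```python
import heapq


def heap_sort_stream(hits, depth):
    """Sort a stream of (timestamp, payload) hits with a heap of limited depth.

    Software model only: once the heap holds more than `depth` hits,
    the hit with the smallest timestamp is emitted.
    """
    heap = []
    out = []
    for seq, (timestamp, payload) in enumerate(hits):
        # `seq` breaks ties so that payloads are never compared.
        heapq.heappush(heap, (timestamp, seq, payload))
        if len(heap) > depth:
            out.append(heapq.heappop(heap))
    while heap:  # drain the remaining hits at the end of the stream
        out.append(heapq.heappop(heap))
    return [(timestamp, payload) for timestamp, _, payload in out]


if __name__ == "__main__":
    mild = [(3, "a"), (1, "b"), (2, "c"), (5, "d"), (4, "e")]
    print(heap_sort_stream(mild, depth=2))    # disorder fits into the heap
    severe = [(5, "a"), (4, "b"), (3, "c"), (2, "d"), (1, "e")]
    print(heap_sort_stream(severe, depth=1))  # disorder exceeds the heap depth
```

In the second call the disorder exceeds the available depth, so the emitted stream is no longer sorted. A hardware sorter facing the same situation has to drop or misplace data.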
The next STS readout architecture eliminated the intermediate DPB layer. Its functionality was integrated with the FLIB into a new Common Readout Interface (CRI) board. The CRI implements GBT links for ROB connectivity and the FLES Interface Module (FLIM) for PCIe. The requirement for perfect sorting was relaxed, and the heap sorter could be replaced with a bin sorter, which handles intermittent peaks in the hit intensity without catastrophic effects but still with significant data loss. Both solutions described above significantly modified the data stream. They therefore required special diagnostic versions of the FPGA firmware or additional resources for debugging.
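The idea of a bin sorter can be sketched as follows. This is a simplified software model under stated assumptions, not the CRI firmware: the names `bin_sort_stream`, `bin_width`, and `n_bins` are hypothetical, and the policy of dropping hits that arrive after their bin has been flushed is one possible choice. Hits are only coarsely ordered, by bin, and a fixed window of open bins slides forward as newer hits arrive.

```python
from collections import deque


def bin_sort_stream(hits, bin_width, n_bins):
    """Coarsely sort (timestamp, payload) hits into time bins.

    Keeps `n_bins` open bins. A hit beyond the window flushes the oldest
    bins; a hit older than the window is dropped (data loss).
    Returns (emitted, dropped), where emitted is a list of (bin_index, hits).
    """
    base = 0  # index of the oldest open bin
    window = deque([] for _ in range(n_bins))
    emitted, dropped = [], []
    for timestamp, payload in hits:
        b = timestamp // bin_width
        if b < base:
            dropped.append((timestamp, payload))  # bin already flushed
            continue
        while b >= base + n_bins:
            # Slide the window: flush the oldest bin, open a new one.
            emitted.append((base, window.popleft()))
            window.append([])
            base += 1
        window[b - base].append((timestamp, payload))
    for offset, content in enumerate(window):  # flush what is left
        emitted.append((base + offset, content))
    return emitted, dropped


if __name__ == "__main__":
    arrivals = [(1, "a"), (12, "b"), (3, "c"), (25, "d"), (4, "e")]
    bins, lost = bin_sort_stream(arrivals, bin_width=10, n_bins=2)
    print(bins)
    print(lost)
```

In this model a late hit is lost, but the stream itself stays consistent, which mirrors the trade-off described above: no catastrophic failure, yet possible data loss.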
Progress in PCIe technology has enabled a significant simplification of the data aggregation scheme. It became possible to transmit the hit data, extended with a source identifier and supplemented with time epoch markers, via PCIe to the computer's memory. Using the local time counter eliminates the disastrous effects of corrupted timestamps, which can now be detected in software. The minimal modification of the data stream reduces the need for special diagnostic firmware.
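A minimal sketch of this tagging scheme is given below. The record layout, the function name `tag_stream`, and the assumption that the epoch markers are derived from the local time counter at arrival are illustrative only and not taken from the actual firmware. The point is that the raw hit word passes through unmodified, so a corrupted hit timestamp cannot disturb the stream structure.

```python
def tag_stream(arrivals, epoch_len):
    """Tag hits with their source and insert epoch markers.

    `arrivals` is a list of (local_time, link_id, raw_hit) tuples ordered by
    the local time counter (hypothetical representation). The raw hit word,
    including its own timestamp field, is not interpreted here.
    """
    out = []
    current_epoch = None
    for local_time, link_id, raw_hit in arrivals:
        epoch = local_time // epoch_len
        if epoch != current_epoch:
            # The marker depends only on the local counter,
            # never on the timestamp stored inside the hit.
            out.append(("EPOCH", epoch))
            current_epoch = epoch
        out.append(("HIT", link_id, raw_hit))
    return out


if __name__ == "__main__":
    arrivals = [(0, 1, 0xAA), (5, 2, 0xBB), (10, 1, 0xCC)]
    for record in tag_stream(arrivals, epoch_len=10):
        print(record)
```

Software reading such a stream could compare each hit's own timestamp with the surrounding epoch markers and flag hits that are implausibly far off.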
Implementing that approach inspired the development of fast lossless data concentrators. They were designed for 256-bit and 512-bit wide PCIe output data streams (considered for the second version of FLIM) and backported to the 64-bit wide PCIe data stream used by the first version of FLIM.
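The principle of lossless concentration into a wide output word can be sketched in software. The model below is hypothetical: the function name `concentrate`, the per-cycle input representation, and the assumption that one hit word fills one lane of the output word (for example, four 64-bit lanes in a 256-bit word) are chosen for illustration and do not describe the actual concentrator design.

```python
def concentrate(cycles, lanes):
    """Pack valid input words into output words of `lanes` lanes each.

    `cycles` is a list of clock cycles; each cycle is a list with one entry
    per input link, either a word or None when the link has no data.
    Returns (output_words, pending), where pending holds words that do not
    yet fill a complete output word. No word is ever discarded.
    """
    pending = []
    output_words = []
    for cycle in cycles:
        # Compact the cycle: keep only the inputs that carry a word.
        pending.extend(word for word in cycle if word is not None)
        while len(pending) >= lanes:
            output_words.append(tuple(pending[:lanes]))
            del pending[:lanes]
    return output_words, pending


if __name__ == "__main__":
    cycles = [[1, None, 2], [None, None, 3], [4, 5, None]]
    words, rest = concentrate(cycles, lanes=4)
    print(words)
    print(rest)
```

Setting `lanes=1` in this model corresponds to the narrow output case, where every valid input word is emitted as its own output word.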