With the beginning of LHC Run 3, the upgraded ALICE detector will record Pb-Pb collisions at an interaction rate of 50 kHz using continuous readout, resulting in raw data rates of over 3.5 TB/s, a hundredfold increase over Run 2. Since permanent storage at this rate is infeasible and exceeds available capacities, a sequence of highly effective compression and data reduction steps is required. Most of these steps perform lossy compression based on techniques such as zero suppression, track finding, and clusterization without compromising the physics. The final compression step is lossless entropy coding.
Huffman coding, as used for entropy coding in Run 2, is fast and simple, but it can spend up to one bit per encoded symbol more than the entropy of the source, leaving room for optimization.
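The gap between Huffman coding and the entropy can be illustrated with a minimal Python sketch. The four-symbol distribution below is a made-up example, not ALICE data, and the helper names `shannon_entropy` and `huffman_code_lengths` are hypothetical. The sketch only compares the average Huffman code length with the Shannon entropy for a strongly skewed source, which is where the inefficiency is most visible.

```python
import heapq
import math


def shannon_entropy(probs):
    """Shannon entropy in bits per symbol of a {symbol: probability} map."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} of a Huffman code for probs."""
    if len(probs) == 1:
        # A single symbol still needs one bit in a prefix code.
        return {s: 1 for s in probs}
    # Heap entries: (probability, tie breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level down.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical, strongly skewed distribution (illustrative only).
probs = {"a": 0.90, "b": 0.05, "c": 0.03, "d": 0.02}

lengths = huffman_code_lengths(probs)
entropy = shannon_entropy(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)

print(f"entropy          : {entropy:.3f} bits/symbol")
print(f"Huffman average  : {avg_len:.3f} bits/symbol")
print(f"overhead         : {avg_len - entropy:.3f} bits/symbol")
```

Because every Huffman code word has an integer length of at least one bit, the most probable symbol here still costs a full bit although its information content is far lower. This is the overhead that the approach described next aims to remove.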
For Run 3, entropy coding based on asymmetric numeral systems (ANS) is under evaluation. ANS achieves compression by encoding source symbols into a single, infinite-precision integer state variable using arithmetic operations based on each symbol's probability, thus overcoming the one-bit inefficiency of Huffman coders. This allows for compression ratios very close to the entropy. Furthermore, the mathematical properties of ANS allow efficient parallelization using vector units on CPUs or GPUs.
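The state update described above can be sketched with a toy range-variant ANS (rANS) coder in Python. This is a minimal illustration under stated assumptions, not the implementation presented in this contribution: it relies on Python's arbitrary-precision integers to stand in for the infinite-precision state, takes symbol frequencies directly from the message, and uses the hypothetical helper names `build_tables`, `rans_encode` and `rans_decode`. Practical coders instead keep a finite-precision state and stream out bits through renormalization, which is omitted here.

```python
from collections import Counter


def build_tables(message):
    """Return (freqs, cum, total): symbol counts, cumulative starts, and sum."""
    freqs = Counter(message)
    cum = {}
    total = 0
    for sym in sorted(freqs):
        cum[sym] = total
        total += freqs[sym]
    return freqs, cum, total


def rans_encode(message, freqs, cum, total, state=1):
    """Fold all symbols into one big integer state.

    Symbols are processed in reverse so that decoding, which undoes the
    most recent step first, yields them in forward order.
    """
    for sym in reversed(message):
        f = freqs[sym]
        state = (state // f) * total + cum[sym] + (state % f)
    return state


def rans_decode(state, count, freqs, cum, total):
    """Recover `count` symbols from the state; also return the final state."""
    out = []
    for _ in range(count):
        slot = state % total
        # Find the symbol whose cumulative range contains the slot.
        sym = next(s for s in freqs if cum[s] <= slot < cum[s] + freqs[s])
        state = freqs[sym] * (state // total) + slot - cum[sym]
        out.append(sym)
    return out, state


# Hypothetical skewed message (illustrative only, not detector data).
message = "a" * 90 + "b" * 5 + "c" * 3 + "d" * 2

freqs, cum, total = build_tables(message)
state = rans_encode(message, freqs, cum, total)
decoded, final_state = rans_decode(state, len(message), freqs, cum, total)

print("encoded state bit length:", state.bit_length())
print("round trip ok           :", decoded == list(message))
```

Each encoding step grows the state by a factor of roughly `total / freqs[sym]`, so frequent symbols add only a fraction of a bit. The bit length of the final state can therefore be compared against the message length times the entropy, whereas a Huffman code would need at least one bit per symbol.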
This contribution describes a custom implementation of an ANS encoder and decoder required to cope with the various distributions of source data that have to be taken into account to achieve the desired high compression ratios for the ALICE detectors, in particular the Time Projection Chamber (TPC) and the Inner Tracking System (ITS). To ensure that the encoding bandwidth is on par with the high data rates, our implementation leverages the SIMD vector units of CPUs, and on-GPU data compression is under evaluation.