Description
In 2021 the NA62 experiment at CERN is restarting data taking with upgraded instrumentation. In this context we present the commissioning test of the new L0 trigger processor, which offers enhanced bandwidth, updated interconnection technology and increased logic capabilities with respect to its predecessor. We also present the latest performance of two additional compute-intensive components dedicated to the online processing of RICH detector information: a ring reconstruction algorithm on GPU for electron identification and a fast neural network developed with HLS tools on FPGA for ring multiplicity counting. Finally, we evaluate the impact of introducing such features in the TDAQ system.
Summary (500 words)
NA62 is a fixed target experiment at CERN aiming for precision measurements of the rarest decay modes of the K+ meson.
The apparatus is 270 meters long and composed of many sub-detector modules dedicated to particle identification and kinematic reconstruction.
The online data selection is managed by a two-level trigger system: the lowest level (L0) is implemented in hardware and constrained by a 1 millisecond latency,
while the other is implemented in software running on a dedicated PC farm.
Among the most challenging requirements are a background rejection of 10 orders of magnitude and a peculiar beam structure consisting of spills about 5 seconds long, each populated by 3×10^12 particles.
The L0 trigger processor adopted so far (L0TP) was implemented on a Terasic DE4 board equipped with eight 1GbE links; it proved robust overall, but technology has advanced in the meantime.
The system has now been ported onto a more recent platform, a Xilinx VCU118 board featuring 10GbE links.
We present a comparison between the old system and the new one (L0TP+) under realistic conditions, both in the lab and in beam tests.
The increased resources on L0TP+ allow for better integration with the Run Control of the experiment and for the addition of new features for the online processing of detector data.
One of the inputs of L0TP is the data from the Ring Imaging Cherenkov (RICH) detector, which requires a processing throughput of up to 10 MHz.
No track information is available, and the current solution is based only on hit multiplicity and temporal clustering.
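As an illustration of what a selection based on hit multiplicity and temporal clustering can look like, the following minimal Python sketch groups hit times into clusters and keeps those with enough hits. It is not the NA62 firmware logic: the function names, the 5 ns window and the 3-hit threshold are hypothetical values chosen only for the example.

```python
from typing import List, Tuple


def cluster_hits_in_time(hit_times_ns: List[float], window_ns: float) -> List[List[float]]:
    """Group hits into clusters: a hit joins the current cluster if it lies
    within window_ns of that cluster's first hit, otherwise it opens a new one."""
    clusters: List[List[float]] = []
    for t in sorted(hit_times_ns):
        if clusters and t - clusters[-1][0] <= window_ns:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def multiplicity_primitives(
    hit_times_ns: List[float],
    window_ns: float = 5.0,  # hypothetical clustering window
    min_hits: int = 3,       # hypothetical multiplicity threshold
) -> List[Tuple[float, int]]:
    """Return (mean time, multiplicity) for every cluster with at least min_hits hits."""
    return [
        (sum(c) / len(c), len(c))
        for c in cluster_hits_in_time(hit_times_ns, window_ns)
        if len(c) >= min_hits
    ]


# Toy input: two groups of hits close in time and one isolated hit.
hits = [100.0, 101.0, 102.5, 250.0, 400.0, 401.0, 402.0, 403.0]
primitives = multiplicity_primitives(hits)
```

With this toy input the isolated hit at 250 ns is discarded and two primitives survive, with multiplicities 3 and 4.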
Two online systems were developed to refine the logic that processes the RICH data stream.
The first one is a neural network for ring counting implemented on FPGA.
The model consists of a fully connected architecture with fewer than 150 neurons; it was trained using the TensorFlow framework, with QKeras for weight quantization.
The FPGA code is generated using High Level Synthesis (HLS) techniques and reaches about 80% test accuracy when deployed on device.
We present bench test results that demonstrate the effectiveness of this approach in satisfying the highly demanding timing requirements of NA62-RICH, i.e. a throughput of 10 million event classifications per second.
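To make the timing requirement concrete, the short Python sketch below turns the two figures quoted in this summary (10 million classifications per second and a 1 millisecond L0 latency) into a per-event time budget. The 200 MHz clock frequency is a hypothetical value used only to show the arithmetic; it is not a property of the actual design.

```python
EVENT_RATE_HZ = 10_000_000  # from the text: 10 million classifications per second
L0_LATENCY_S = 1e-3         # from the text: 1 millisecond L0 latency
CLOCK_HZ = 200_000_000      # hypothetical FPGA clock frequency, for illustration only


def time_per_event_ns(rate_hz: int) -> float:
    """Average time available per event at a sustained rate."""
    return 1e9 / rate_hz


def max_initiation_interval(clock_hz: int, rate_hz: int) -> int:
    """Largest number of clock cycles between accepted inputs of a pipelined
    design that still sustains rate_hz."""
    return clock_hz // rate_hz


def events_within_latency(rate_hz: int, latency_s: float) -> int:
    """Number of events arriving during one latency window at the maximum rate."""
    return round(rate_hz * latency_s)


budget_ns = time_per_event_ns(EVENT_RATE_HZ)
max_ii = max_initiation_interval(CLOCK_HZ, EVENT_RATE_HZ)
in_flight = events_within_latency(EVENT_RATE_HZ, L0_LATENCY_S)
```

Under these assumptions each event has a 100 ns budget, a pipelined design must accept a new input at least every 20 clock cycles, and up to 10,000 events can arrive within one latency window if the maximum rate is sustained.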
We highlight the innovative aspects of the workflow together with details related to the reduced numerical representation on FPGA.
Thanks to its small footprint on the VCU118 resources, the module could be integrated in L0TP+.
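The reduced numerical representation mentioned above can be illustrated with a generic signed fixed-point quantizer applied to the weights of one small fully connected layer. This is a plain Python sketch of the arithmetic idea only: it does not use the QKeras or HLS tool APIs, and the bit widths, weights and inputs are hypothetical.

```python
from typing import List


def quantize_fixed_point(value: float, total_bits: int = 8, frac_bits: int = 6) -> float:
    """Round value to a signed fixed-point grid with total_bits bits (sign
    included) and frac_bits fractional bits, saturating at the range limits.
    Python's round() resolves exact ties to the nearest even integer."""
    scale = 2 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    q = max(q_min, min(q_max, round(value * scale)))
    return q / scale


def dense_relu(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer followed by a ReLU activation."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical float weights of a 2-input, 2-neuron layer.
float_weights = [[0.3, -0.7], [1.2, 0.05]]
quant_weights = [[quantize_fixed_point(w) for w in row] for row in float_weights]

# Compare the layer output with float and with quantized weights.
x = [1.0, 0.5]
biases = [0.0, 0.0]
out_float = dense_relu(x, float_weights, biases)
out_quant = dense_relu(x, quant_weights, biases)
```

With 6 fractional bits every in-range weight moves by at most 2^-7, which is the kind of trade-off between accuracy and FPGA resources that a quantized model explores; out-of-range values saturate instead of wrapping around.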
A second system was developed for geometrical reconstruction of the rings on an NVIDIA GPU board and was tested on site during the 2018 experiment runs.
The key component enabling GPU for online usage is NaNet, an FPGA-based Network Interface Card that exchanges data with the GPU memory without host CPU supervision.
We present here a preliminary assessment of the system performance in electron identification during the first months of the 2021 NA62 data taking.
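This summary does not specify the algorithm used for the geometrical ring reconstruction on GPU. As a generic illustration of the task, the Python sketch below fits a single circle to a set of hit positions with a standard algebraic least-squares method; the function name and the toy data are hypothetical, and a GPU implementation would run many such fits in parallel.

```python
import math
from typing import List, Tuple


def fit_circle(points: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Algebraic least-squares circle fit. Returns (center_x, center_y, radius).
    Needs at least three points that are not collinear."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Work in coordinates centered on the centroid of the hits.
    u = [p[0] - mean_x for p in points]
    v = [p[1] - mean_y for p in points]
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    rhs_u = 0.5 * sum(a * (a * a + b * b) for a, b in zip(u, v))
    rhs_v = 0.5 * sum(b * (a * a + b * b) for a, b in zip(u, v))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or degenerate")
    # Solve the 2x2 linear system for the center (Cramer's rule).
    uc = (rhs_u * svv - rhs_v * suv) / det
    vc = (rhs_v * suu - rhs_u * suv) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return uc + mean_x, vc + mean_y, radius


# Toy hits lying exactly on a circle of center (1, 2) and radius 3.
angles = [0.1, 0.9, 2.0, 3.5, 5.0]
ring_hits = [(1.0 + 3.0 * math.cos(a), 2.0 + 3.0 * math.sin(a)) for a in angles]
cx, cy, r = fit_circle(ring_hits)
```

For hits lying exactly on a circle the fit recovers the center and radius up to floating-point precision; with noisy hits it returns the algebraic least-squares estimate.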
In conclusion, the trigger scenario in HEP is evolving thanks to new distributed computing paradigms based on heterogeneous nodes and low-latency interconnects.
The L0TP+ of NA62, enriched with parallel and specialized systems, is an example of this trend, which will increasingly take hold in the near future.