Description
The NA62 experiment at the CERN SPS aims to measure the branching ratio of the very rare kaon decay K+ -> pi+ nu nubar.
NaNet is a reconfigurable design for an FPGA-based PCI Express Network Interface Card with processing, RDMA and GPUDirect capabilities and support for multiple link technologies.
NaNet has been employed to implement a real-time distributed processing pipeline in the low-level trigger of the experiment, operating on the data streams produced by the RICH detector with an orchestrated combination of heterogeneous computing devices (CPUs, FPGAs and GPUs).
Recent results collected during NA62 runs are presented and discussed.
Summary
The NA62 experiment at the CERN SPS aims to measure the branching ratio of the very rare kaon decay K+ -> pi+ nu nubar.
A centralized level-0 hardware trigger processor (L0TP) processes in real time the streams of primitives coming from the detectors' readout boards in order to reject the considerable background.
Our approach aims to improve L0TP performance by distributing this processing over the whole chain, starting from the earliest stages (i.e. the readout boards), and operating on the data streams with an orchestrated combination of heterogeneous computing devices (CPUs, FPGAs and GPUs).
The enabling element of this real-time distributed stream computing architecture is NaNet, an FPGA-based PCI Express Network Interface Card with processing, RDMA and GPUDirect capabilities, supporting multiple link technologies (1/10/40GbE and custom ones).
We have demonstrated the effectiveness of our design by retrofitting the RICH detector to compute the Cherenkov ring parameters within 350 µs: the FPGA receives the data and coalesces events split across 4 data streams, while the GPU is in charge of the ring reconstruction, executing a parallel algorithm based on geometric considerations (Histogram).
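To make the idea of a histogram-based ring search concrete, here is a minimal, purely sequential Python sketch. It rests on assumptions not stated in this abstract: candidate centres are scanned on a regular grid, hit-to-centre distances are histogrammed for each candidate, and the candidate with the most populated distance bin is taken as the ring. The function name `histogram_ring_fit`, its parameters and all numerical values are hypothetical and chosen for illustration only; this is not the GPU kernel used in the experiment.

```python
import math


def histogram_ring_fit(hits, grid_step=10.0, extent=300.0,
                       bin_width=5.0, max_radius=300.0):
    """Scan a grid of candidate centres and return (cx, cy, radius, votes).

    For each candidate centre, distances to all hits are histogrammed;
    the candidate whose fullest distance bin holds the most hits wins.
    All default values are illustrative (arbitrary length units).
    """
    n_bins = int(max_radius / bin_width)
    n_steps = int(2 * extent / grid_step) + 1
    best = (0.0, 0.0, 0.0, 0)
    for i in range(n_steps):
        cx = -extent + i * grid_step
        for j in range(n_steps):
            cy = -extent + j * grid_step
            hist = [0] * n_bins
            for x, y in hits:
                b = int(math.hypot(x - cx, y - cy) / bin_width)
                if b < n_bins:  # ignore hits beyond the histogram range
                    hist[b] += 1
            votes = max(hist)
            if votes > best[3]:
                b = hist.index(votes)
                # Report the centre of the winning distance bin as the radius.
                best = (cx, cy, (b + 0.5) * bin_width, votes)
    return best


# Synthetic event: 16 hits on a ring of radius 102 centred at (20, -30).
hits = [(20.0 + 102.0 * math.cos(2 * math.pi * k / 16),
         -30.0 + 102.0 * math.sin(2 * math.pi * k / 16)) for k in range(16)]
cx, cy, radius, votes = histogram_ring_fit(hits)
```

In a sketch like this each candidate centre is independent of the others, which is what makes the approach a natural fit for parallel execution (for example, one thread per grid point on a GPU); the radius resolution is limited by the chosen bin width.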
Driven by the significant ramp-up in the computing resources of FPGA architectures, along with the recent availability of frameworks for high-level synthesis, we evaluated the Intel FPGA SDK for OpenCL environment to assess the feasibility of ring reconstruction directly on the FPGA hosted on the NaNet card.
The Histogram algorithm has been ported to this framework and optimized to make the most of the pipelined architecture.
The results obtained are discussed and compared with the GPU implementation.
Recent results collected during NA62 runs, along with a detailed description of the latest developments in the NaNet architecture and an outlook on future project developments, are also presented and discussed.