Speaker
Description
Machine learning algorithms are being used more frequently in the {first-level triggers in collider experiments}, with Graph Neural Networks (GNNs) pushing the hardware requirements of FPGA-based triggers beyond the current state of the art. {As a first online event processing stage, first-level trigger systems process $O({10}~{\text{M}})$ events per second with a hard real-time latency requirement of $O({1}~{\text{us}})$.}
To meet the stringent demands of high-throughput and low-latency environments, we propose a concept for latency-optimized pre-processing of sparse sensor data, enabling efficient GNN hardware acceleration by removing dynamic input sparsity.
Our approach rearranges data coming from a large number of First-In-First-Out (FIFO) interfaces, typically sensor frontends, to a smaller number of FIFO interfaces connected to a GNN hardware accelerator.
In order to achieve high throughput while minimizing the hardware utilization, we developed a hierarchical stream compaction pipeline optimized for FPGAs.
We implemented our concept in the Chisel design language and integrate our open-source package as a parameterizable IP core with FuseSoC. For demonstration, we implemented one configuration of our IP core as pre-processing stage in a GNN-based first-level trigger for the {Electromagnetic Calorimeter (ECL) inside the Belle II detector}. Additionally we evaluate latency, throughput, resource utilization, and scalability for a wide range of parameters, to enable broader use for {other large scale scientific experiments}.
Significance
Our work presents a reusable and scalable data orchestration solution for real-time GNN applications in FPGA-based triggers for particle physics. It enables the hardware acceleration of GNNs on FPGAs in low-latency applications, by eliminating structured sparsity in large heterogeneous sensor arrays. Eliminating sparsity is an important step for real-time algorithms, as it greatly improves compute efficiency and allows for straight-forward calculation of worst-case execution times.
Experiment context, if any | Belle II Experiment |
---|