3–6 Oct 2022
Southern Methodist University
America/Chicago timezone

Resource Efficient and Low Latency GNN-based Particle Tracking on FPGA

3 Oct 2022, 16:15
15m
Southern Methodist University


Speakers

Bo-Cheng Lai
Shi-Yu Huang

Description

Charged particle tracking is important in high-energy particle physics. At the CERN Large Hadron Collider (LHC), tracking algorithms identify the trajectories of charged particles created in collisions. Existing tracking algorithms are typically based on the combinatorial Kalman filter, whose complexity increases quadratically with the number of hits. This poor scalability will be exacerbated by the expected dramatic increase in beam intensities. New tracking algorithms based on Graph Neural Networks (GNNs) have therefore been introduced to improve the scalability of particle tracking, and these GNN algorithms are implemented on Field Programmable Gate Arrays (FPGAs) to meet the strict latency requirements of fast particle tracking. However, the previous design on a Xilinx Virtex UltraScale+ VU9P FPGA can accommodate only a small GNN (28 nodes / 56 edges) because the complex graph processing patterns require significant resources. A collision event (660 nodes / 1320 edges) must be partitioned into smaller sub-graphs for the GNN processing to fit on the VU9P FPGA. Dividing a collision event into smaller sub-graphs increases the risk of missing important trajectories that span sub-graphs.
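To illustrate why partitioning can lose information, here is a minimal sketch; it is not the authors' method. It assumes a hypothetical contiguous partitioning of node indices into fixed-size sub-graphs and counts the edges whose endpoints land in different sub-graphs. The function name, the chain-shaped toy graph, and the partitioning rule are all assumptions made for illustration.

```python
def count_cut_edges(edges, nodes_per_subgraph):
    """Count edges whose endpoints fall in different sub-graphs.

    Assumption: node i is assigned to sub-graph i // nodes_per_subgraph
    (a hypothetical contiguous partitioning, chosen only for illustration).
    """
    cut = 0
    for src, dst in edges:
        if src // nodes_per_subgraph != dst // nodes_per_subgraph:
            cut += 1
    return cut


# Toy graph (an assumption): a chain of 660 nodes, each linked to its successor.
toy_edges = [(i, i + 1) for i in range(659)]

# With 28-node sub-graphs, an edge that crosses a partition boundary is not
# visible to a GNN that processes one sub-graph at a time.
lost = count_cut_edges(toy_edges, nodes_per_subgraph=28)
```

Even in this simple chain, some edges cross partition boundaries; a real collision graph with more connectivity between hits would typically lose more.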
In this work, we introduce a resource-efficient, low-latency architecture to accelerate large GNN processing on FPGAs. The design exploits GNN processing patterns and trajectory data properties to significantly improve parallelism and computation throughput. We propose a highly parallel architecture with configurable parameters that let users adjust latency, resource utilization, and parallelism. A customized data allocation addresses the irregular processing patterns and attains high processing parallelism. We further exploit the properties of trajectories between inner and outer detector layers to reduce unnecessary dependencies and edges in the graph.
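As a rough sketch of the kind of edge reduction described above, and not the authors' implementation, the snippet below keeps only candidate edges that connect a hit to a hit on the next detector layer outward. The adjacency rule, the `layer_of` mapping, and the function name are assumptions; the abstract does not specify which edges are removed.

```python
def prune_edges(edges, layer_of):
    """Keep only edges that go from layer k to layer k + 1.

    Assumption: `layer_of` maps a hit (node) index to its detector layer,
    with layers numbered from the innermost (0) outward. This rule is
    hypothetical and chosen only to illustrate layer-based pruning.
    """
    return [(src, dst) for src, dst in edges
            if layer_of[dst] - layer_of[src] == 1]


# Toy example: five hits spread over three layers.
layer_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
candidate_edges = [(0, 2), (1, 3), (0, 1), (0, 4), (2, 4), (4, 2)]

# Same-layer, layer-skipping, and inward-pointing edges are dropped.
kept = prune_edges(candidate_edges, layer_of)
```

Fewer edges mean fewer per-edge computations and fewer data dependencies between them, which is the general motivation for this kind of pruning.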
The design is synthesized using hls4ml and implemented on a Xilinx Virtex UltraScale+ VU9P FPGA. The proposed design supports a graph of 660 nodes and 1560 edges with an initiation interval of 200 ns.
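For a sense of scale, the initiation interval of a pipelined design is the time between the starts of successive inputs, so 200 ns implies an upper bound on graph throughput. The short calculation below assumes the pipeline is kept continuously fed, which is an assumption rather than a claim from the abstract; the function name is illustrative.

```python
NS_PER_SECOND = 1_000_000_000


def max_graphs_per_second(initiation_interval_ns):
    """Upper bound on inputs accepted per second for a given interval.

    Assumption: a new graph is available at every interval, so the
    pipeline never stalls waiting for input.
    """
    return NS_PER_SECOND // initiation_interval_ns


# A 200 ns interval corresponds to at most 5 million graphs per second.
rate = max_graphs_per_second(200)
```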

Primary authors

Co-authors

Abdelrahman Elabd
Javier Duarte (Univ. of California San Diego (US))
Jin-Xuan Hu
Mark Neubauer (Univ. Illinois at Urbana Champaign (US))
Markus Atkinson (Univ. Illinois at Urbana Champaign (US))
Scott Hauck
Shih-Chieh Hsu (University of Washington Seattle (US))
Vesal Razavimaleki (Univ. Illinois at Urbana-Champaign (US))
