Description
The ever-increasing data rates and ultra-low-latency requirements of particle physics experiments demand innovations for real-time decision-making. Transformer Neural Networks (TNNs) have demonstrated state-of-the-art performance in classification tasks, including jet tagging, but CPU and GPU implementations cannot meet the latency constraints of real-time trigger systems. This work introduces two novel TNN architectures optimized for Field-Programmable Gate Arrays (FPGAs). The first prioritizes latency, achieving a speedup of >1000× over GPU implementations while maintaining accuracy. The second explores trade-offs between latency and accuracy through design-space exploration and a custom post-training quantization scheme that identifies optimal bit-widths, yielding significant reductions in hardware resource utilization with negligible accuracy degradation. Experiments demonstrate the effectiveness of these designs, setting new benchmarks for real-time machine learning in high-energy physics.
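To illustrate the kind of bit-width selection a post-training quantization pass might perform, the sketch below greedily picks, per layer, the smallest fixed-point width whose reconstruction error stays under a tolerance. It is a minimal illustration only, not the authors' actual scheme: the layer names, candidate widths, tolerance, and random stand-in weights are all assumptions.

```python
# Illustrative post-training bit-width search (not the authors' method).
# Weights are random placeholders; quantization is plain symmetric uniform
# rounding; the search keeps the smallest width with acceptable MSE.
import numpy as np

rng = np.random.default_rng(0)
layers = {name: rng.normal(size=(64, 64)).astype(np.float32)
          for name in ("attention_qkv", "attention_out", "mlp_in", "mlp_out")}

def quantize(w, bits):
    """Symmetric uniform quantization to `bits` bits (sign bit included)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def pick_bitwidth(w, candidate_bits=(4, 6, 8, 10, 12, 16), tol=1e-3):
    """Return the smallest candidate bit-width whose MSE is below `tol`."""
    for bits in candidate_bits:
        err = float(np.mean((w - quantize(w, bits)) ** 2))
        if err < tol:
            return bits, err
    return candidate_bits[-1], err  # fall back to the widest candidate

for name, w in layers.items():
    bits, err = pick_bitwidth(w)
    print(f"{name:14s} -> {bits:2d} bits (MSE {err:.2e})")
```

In a real flow the error metric would be evaluated on the downstream task (e.g. jet-tagging accuracy) rather than per-layer weight MSE, and the chosen widths would feed the FPGA synthesis.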
| Talk's Q&A | During the talk |
| --- | --- |
| Talk duration | 15'+7' |
| Will you be able to present in person? | Yes |