8–12 Sept 2025
Hamburg, Germany
Europe/Berlin timezone

Efficient Transformers for Jet Tagging

10 Sept 2025, 11:00
30m
ESA W 'West Wing'

Poster

Track 1: Computing Technology for Physics Research

Poster session with coffee break

Speaker

Vivekanand Gyanchand Sahu (University of California San Diego)

Description

We present a suite of optimizations to the Particle Transformer (ParT), a state-of-the-art model for jet tagging, targeting the stringent latency and memory constraints of real-time environments such as HL-LHC triggers. To address the quadratic scaling and compute bottlenecks of standard attention, we integrate FlashAttention for exact, fused-kernel attention with reduced memory I/O, and Linformer to lower attention complexity from O(n²) to O(n) via low-dimensional projections, substantially improving scalability for longer sequences. We further apply INT8 dynamic quantization to compress matrix multiplications, reducing latency and GPU memory usage without retraining. Evaluations on the JetClass and HLS4ML datasets show that these techniques, individually and in combination, deliver significant inference speedups, FLOP reductions, and memory savings while maintaining near-baseline accuracy. Additional experiments explore sequence-ordering strategies, including physics-motivated projection matrices, and employ interpretability analyses of attention maps and embeddings to better understand model behavior. The combined approach enables efficient, accurate transformer-based jet classification suitable for high-rate trigger systems.
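To make the attention optimizations concrete, the following PyTorch sketch combines the two ideas above: the key/value sequence axis is projected from length n down to a fixed k (the Linformer reduction), and the attention itself is computed with torch.nn.functional.scaled_dot_product_attention, which dispatches to fused FlashAttention-style kernels on supported hardware. The class name, projection dimension k, and head layout are illustrative assumptions and not the implementation used in this work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinformerSelfAttention(nn.Module):
    """Sketch of Linformer-style attention: keys and values are projected
    from sequence length n down to a fixed k, so the attention matrix is
    (n x k) instead of (n x n), i.e. linear in sequence length.
    Assumes a fixed n == seq_len, e.g. jets padded to a set particle count."""

    def __init__(self, dim: int, seq_len: int, k: int = 32, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj_k = nn.Linear(seq_len, k, bias=False)  # E: n -> k
        self.proj_v = nn.Linear(seq_len, k, bias=False)  # F: n -> k
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k_, v = self.qkv(x).chunk(3, dim=-1)
        # Project the sequence axis of K and V down to length k.
        k_ = self.proj_k(k_.transpose(1, 2)).transpose(1, 2)
        v = self.proj_v(v.transpose(1, 2)).transpose(1, 2)

        # Split heads: (b, heads, n_or_k, head_dim).
        def split(t):
            return t.view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)

        q, k_, v = split(q), split(k_), split(v)
        # scaled_dot_product_attention uses fused (FlashAttention-style)
        # kernels where available; here the K/V length is k, not n.
        o = F.scaled_dot_product_attention(q, k_, v)
        o = o.transpose(1, 2).reshape(b, n, d)
        return self.out(o)

# Example: a batch of 4 jets, 128 particle tokens, embedding dimension 64.
attn = LinformerSelfAttention(dim=64, seq_len=128, k=32, num_heads=4)
print(attn(torch.randn(4, 128, 64)).shape)  # torch.Size([4, 128, 64])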
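Similarly, a minimal sketch of the post-training INT8 step using PyTorch's torch.ao.quantization.quantize_dynamic. The stand-in model is hypothetical, and note that PyTorch's dynamic quantization targets CPU inference; the GPU deployment described above would go through a different backend.

import torch
from torch.ao.quantization import quantize_dynamic

# Hypothetical stand-in for a trained ParT-style tagger; in practice this
# would be the trained jet-tagging transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

# Dynamic quantization rewrites the nn.Linear matrix multiplications to
# INT8 with activation scales computed on the fly, so no retraining or
# calibration data is required, matching the "without retraining" claim.
qmodel = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(qmodel(x).shape)  # torch.Size([1, 10])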

References

Interpreting and Accelerating Transformers for Jet Tagging (Talk at FastML)

Significance

This presentation goes beyond a status update by showcasing the novel integration of FlashAttention, Linformer, and INT8 quantization in the Particle Transformer (ParT) for jet classification. It highlights the synergistic impact of these optimizations in reducing inference time and memory usage without sacrificing accuracy. By systematically evaluating their combined effects, the work provides practical insights for real-time deployment in HL-LHC triggers, marking a significant step toward production-ready transformer models in high-energy physics.

Experiment context, if any

CMS

Authors

Aaron Wang (University of Illinois Chicago (US))
Abhijith Gandrakota (Fermi National Accelerator Lab. (US))
Elham Khoda (University of Washington (US))
Javier Mauricio Duarte (Univ. of California San Diego (US))
Jennifer Ngadiuba (FNAL)
Vivekanand Gyanchand Sahu (University of California San Diego)
Zihan Zhao (Univ. of California San Diego (US))

Presentation materials

There are no materials yet.