Speaker
Description
Normalized Transformer architectures have shown significant improvements in training efficiency on large-scale natural language processing tasks. Motivated by these results, we explore the application of normalization techniques to the Particle Transformer (ParT) for jet classification in high-energy physics. We construct a normalized vanilla Transformer classifier and a normalized ParT (n-ParT) and evaluate the resulting training acceleration on the Top Tagging and JetClass datasets. Our approach combines normalization strategies with the inductive biases of ParT to improve convergence speed and model performance. Preliminary results indicate that normalization yields faster training while maintaining classification accuracy, suggesting a promising direction for deploying efficient Transformer-based models in particle physics analyses.
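To make the idea of a "normalized" Transformer layer concrete, the sketch below shows one nGPT-style normalized residual update in PyTorch. It is a minimal illustration under our own assumptions (hidden size, learnable step sizes alpha, and the use of nn.MultiheadAttention as a stand-in sub-layer); it is not the actual n-ParT implementation, which additionally incorporates ParT's pairwise-interaction features.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NormalizedResidualBlock(nn.Module):
        """Minimal sketch of a normalized Transformer block (illustrative only).

        After each sub-layer, hidden states are re-projected onto the unit
        hypersphere, and the residual update is taken as a small learnable
        step along the sphere, in the spirit of normalized-Transformer training.
        """
        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
            )
            # Learnable per-feature step sizes for the normalized updates (assumed values).
            self.alpha_attn = nn.Parameter(torch.full((dim,), 0.05))
            self.alpha_mlp = nn.Parameter(torch.full((dim,), 0.05))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, particles, dim), assumed already unit-normalized.
            attn_out, _ = self.attn(x, x, x)
            # Normalized residual step: move toward the normalized sub-layer output,
            # then re-normalize so the states stay on the unit hypersphere.
            x = F.normalize(x + self.alpha_attn * (F.normalize(attn_out, dim=-1) - x), dim=-1)
            mlp_out = self.mlp(x)
            x = F.normalize(x + self.alpha_mlp * (F.normalize(mlp_out, dim=-1) - x), dim=-1)
            return x

    # Tiny usage sketch: a batch of 2 jets with 50 particle tokens of dimension 128.
    x = F.normalize(torch.randn(2, 50, 128), dim=-1)
    block = NormalizedResidualBlock(dim=128)
    out = block(x)  # shape (2, 50, 128); states remain unit-normalized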
Significance
This work introduces a normalized variant of the Particle Transformer (n-ParT), applying normalization strategies from NLP to improve training efficiency in jet classification tasks. Preliminary results show faster convergence without sacrificing accuracy on the Top Tagging and JetClass benchmarks, marking a meaningful step toward scalable and efficient Transformer-based models in high-energy physics.