Description
The challenging environment of real-time systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies that only smaller models, with lower capacity and weaker inductive bias, are feasible. To address this issue, we use knowledge distillation to combine the performance of large models with the speed of small models. In this paper, we present an implementation of knowledge distillation for jet tagging and demonstrate an overall boost in the jet tagging performance of student models. Furthermore, by using a teacher model with a strong inductive bias of Lorentz symmetry, we show that we can induce the same bias in the student model, leading to better robustness against arbitrary Lorentz boosts.
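As a point of reference, the sketch below shows a standard soft-label knowledge distillation loss (Hinton-style): a frozen teacher provides temperature-softened targets that the small student is trained to match alongside the true jet labels. This is a minimal illustration of the general technique, not the authors' exact setup; the temperature `T`, blending weight `alpha`, and model names are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross entropy with a KL term that pulls the
    student's temperature-softened outputs toward the teacher's."""
    # Hard-label term: ordinary cross entropy on the true jet labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft

# Usage sketch: the teacher (e.g. a Lorentz-equivariant network, assumed here)
# is kept frozen; only the small student is updated.
# with torch.no_grad():
#     teacher_logits = teacher(jet_batch)
# loss = distillation_loss(student(jet_batch), teacher_logits, labels)
```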