1–4 Nov 2022
Rutgers University
US/Eastern timezone

Equivariant Neural Networks for Particle Physics: PELICAN

1 Nov 2022, 12:15
20m
Multipurpose Room (aka Livingston Hall), Livingston Student Center, Rutgers University

Speaker

Alexander Bogatskiy (Flatiron Institute, Simons Foundation)

Description

Much attention has been paid to the application of common machine learning methods in physics experiments and theory. However, far less attention has been paid to the methods themselves and their viability as physics modeling tools. One of the most fundamental aspects of modeling physical phenomena is the identification of the symmetries that govern them. Incorporating symmetries into a model can make it more physically sound, reduce the risk of over-parameterization, and consequently improve robustness and predictive power. As the use of neural networks continues to grow in particle physics, more effort will need to be invested in narrowing the gap between the black-box models of machine learning and the analytic models of physics.

Building on previous work, we demonstrate how careful choices in the details of network design, yielding a model both simpler and more grounded in physics than traditional approaches, can deliver state-of-the-art performance on problems including jet tagging and four-momentum reconstruction. We present the Permutation-Equivariant and Lorentz-Invariant or Covariant Aggregator Network (PELICAN), which is based on three key ideas: symmetry under permutations of particles, Lorentz symmetry, and the ambiguity of the aggregation process in graph neural networks. For the first, we use the most general permutation-equivariant layer acting on square arrays, which can be viewed as a powerful generalization of message passing. For the second, we use classical theorems of invariant theory to reduce the 4-vector inputs to an array of Lorentz-invariant quantities. Finally, the flexibility of the aggregation process commonly used in graph neural networks can be leveraged for improved accuracy, in particular to allow variable scaling with the size of the input. We demonstrate the performance of this architecture on two problems: top tagging and W-boson momentum reconstruction.
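
As a rough illustration of the second and third key ideas, the following is a minimal PyTorch sketch, not the authors' implementation: it reduces N input 4-momenta to the N x N array of Lorentz-invariant dot products, then applies a permutation-equivariant aggregation whose output scales with a learnable power of N. The metric convention, tensor shapes, and the names minkowski_dots and ScaledSumAggregation are assumptions made for this sketch.

import torch

def minkowski_dots(p: torch.Tensor) -> torch.Tensor:
    # p: (batch, N, 4) four-momenta, assumed here in the (E, px, py, pz) convention.
    # Returns the (batch, N, N) array d[i, j] = p_i . p_j under diag(+1, -1, -1, -1).
    metric = torch.tensor([1.0, -1.0, -1.0, -1.0], device=p.device)
    return torch.einsum("bim,m,bjm->bij", p, metric, p)

class ScaledSumAggregation(torch.nn.Module):
    # Permutation-equivariant row aggregation with learnable scaling:
    # out[i] = sum_j d[i, j] / N**(1 - alpha); alpha = 1 gives a plain sum,
    # alpha = 0 a mean, and intermediate values interpolate between the two.
    def __init__(self):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.tensor(0.5))

    def forward(self, d: torch.Tensor) -> torch.Tensor:
        n = d.shape[-1]
        return d.sum(dim=-1) / n ** (1.0 - self.alpha)

# Usage: two events with three particles each.
p = torch.randn(2, 3, 4)                 # random stand-in 4-momenta
d = minkowski_dots(p)                    # (2, 3, 3) Lorentz-invariant inputs
out = ScaledSumAggregation()(d)          # (2, 3) permutation-equivariant features

Because the network sees only the dot products d[i, j], its outputs are automatically Lorentz-invariant, and permuting the particles merely permutes the rows and columns of d, so equivariance under relabeling is built in as well.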

Primary authors

Alexander Bogatskiy (Flatiron Institute, Simons Foundation)
Timothy Hoffman
Xiaoyang Liu (University of Chicago)
David Miller (University of Chicago (US))
Jan Tuzlic Offermann (University of Chicago (US))
