# ML4Jets2020

America/New_York
KC 802 (Kimmel Center for University Life)

### KC 802

#### Kimmel Center for University Life

60 Washington Square S, New York, NY 10012
Description

Machine learning has become a hot topic in particle physics over the past several years. In particular, there has been a lot of progress in the area of particle and event identification, reconstruction, fast simulation and others. One significant area of research and development has focused on jet physics. In this workshop, we will discuss current progress in this area, focusing on new breakthrough ideas and existing challenges. The ML4Jets workshop will be open to the full community and will include LHC experiments as well as theorists and phenomenologists interested in this topic.  This year's workshop is hosted at NYU in New York City.

This workshop follows successful workshops in 2017 and 2018.

Registration is now closed - please email the organizing committee if you are still interested in attending the workshop in person.

International Organizing Committee:
Kyle Cranmer (NYU), Ben Nachman (LBNL), Maurizio Pierini (CERN), Tilman Plehn (Heidelberg), Jesse Thaler (MIT)

Local Organizing Committee:
Kyle Cranmer (NYU) and Sebastian Macaluso (NYU)

Support:
This event is supported by CERN and U.S. National Science Foundation Cooperative Agreement OAC-1836650 (IRIS-HEP)

Map of lunch spots etc.

Participants
• Alan Mathew Kahn
• Alba Soto Ontoso
• Alejandro Gomez Espinosa
• Alexander Bogatskiy
• Alexander Linus Sopio
• Alison Lister
• Anders Andreassen
• Anja Butter
• Ariana Haghju
• Barry Dillon
• Ben Nachman
• Benjamin Tannenwald
• Bernard Brickwedde
• Binish Batool
• Bryan Ostdiek
• Charanjit Kaur
• Chase Owen Shimmin
• Christina Gao
• Christine Angela McLean
• Claudius Krause
• Cristina Ana Mantilla Suarez
• Dalila Salamani
• Dan Guest
• Daniel Williams
• David Miller
• David Shih
• Dogukan Kizbay
• Dylan Sheldon Rankin
• Eilam Gross
• Elham E Khoda
• Emma Grace Castiglia
• Engin Eren
• Eric Metodiev
• Erik Buhmann
• Frederic Alexandre Dreyer
• Gage DeZoort
• Garvita Agarwal
• George Stein
• Gilles Louppe
• Gregor Kasieczka
• Hannah Bossi
• Heiko Mueller
• Hugo Borges
• Huilin Qu
• Ines Ochoa
• Jack Collins
• Jan Offermann
• Jana Schaarschmidt
• Jannicke Andree Pearkes
• Jean-Francois Arguin
• Jeong Han Kim
• Jesse Thaler
• Joey Huston
• Johann Brehmer
• Jonathan Shlomi
• Joseph Walker
• João Pedro de Gonçalves
• Julia Lynne Gonski
• Karla Pena
• Kayoung Ban
• Kimmo Kallonen
• Kyle Stuart Cranmer
• Laura Havener
• Lauren Hay
• Luigi Sabetta
• Marat Freytsis
• Marco Bellagente
• Marco Farina
• Marko Jerčić
• Martin Erdmann
• Matthew Buckley
• Matthew Drnevich
• Matthew Schwartz
• Maurizio Pierini
• Maxim Perelstein
• Miaoyuan Liu
• Michel Luchmann
• Mukharbek Organokov
• Myeonghun Park
• Nhan Tran
• Niclas Eich
• Nilai Sarda
• Oz Amram
• Pablo Martín
• Patrick Komiske
• Philip Harris
• Philipp Windischhofer
• Pilette Jacinthe
• Prasanth Shyamsundar
• Ramon Winterhalder
• Reyer Edmond Band
• Ridhi Chawla
• Risi Kondor
• Robert John Bainbridge
• Salvatore Rappoccio
• Sang Eon Park
• Sanmay Ganguly
• Sascha Daniel Diefenbacher
• Savannah Jennifer Thais
• Sean Joseph Gasiorowski
• Sebastian Macaluso
• Serena Palazzo
• Shih-Chieh Hsu
• Sijun Xu
• Simone Francescato
• Stefano Carrazza
• Sung Hak Lim
• Syed Waqar Ahmed
• Tae Kim
• Taoli Cheng
• Thabang Lebese
• Tianji Cai
• Tommaso Dorigo
• Vinicius Massami Mikuni
• Yang-Ting Chien
• Yingying Li
• Wednesday, January 15
• Registration KC 802

Registration and payment for dinner

• Introduction KC 802

Conveners: Ben Nachman (Lawrence Berkeley National Lab. (US)), Tilman Plehn
• 1
Welcome and Logistics
Speakers: Kyle Stuart Cranmer (New York University (US)), Sebastian Macaluso (New York University)
• 2
Theory Introduction (20'+10')
Speaker: Prof. Maxim Perelstein
• 3
Experiment Introduction (20'+10')
Speaker: Alison Lister (University of British Columbia (CA))
• 4
ML Introduction (20'+10')
Speaker: Gilles Louppe (New York University (US))
• 10:40 AM
Coffee break KC 802

• Architectures KC 802

Conveners: Matthew Schwartz, Taoli Cheng (University of Montreal)
• 5
CapsNets Continuing the Convolutional Quest

Convolutional Neural Networks are an important tool for image classification both in and outside of particle physics. Capsule networks allow us to expand on the standard CNN setup, both to increase the network's performance and to give insight into its decision-making processes. We demonstrate the use of Capsule Networks by separating a resonance decaying to top quarks from both the QCD di-jet and the top continuum backgrounds, benchmarking them against classical analysis methods as well as other machine learning approaches. Further, we show the capsules’ ability to handle high activity environments such as associated top-Higgs production. Throughout all of this we find that the capsule structure allows us to interpret both the results and inner workings of the network. Finally, we present recent results of combining capsule networks with a Bayesian approach to uncertainty estimation.

Speaker: Sascha Daniel Diefenbacher (Hamburg University (DE))
• 6
Quark-gluon discrimination with point clouds

Quark-gluon tagging refers to the task of identifying the origin of a jet as produced from the hadronization of a gluon or a quark. Common methods rely on jet constituent properties to disentangle the two objects to varying degrees of success. In this talk, an innovative method of classifying jets according to their constituents is introduced. The method uses the information of the constituents to build a graph-based neural network aided by attention mechanisms. The implementation is similar to the one presented in [1], achieving improved performance relative to the methods described in [2].

[1] C. Chen, L. Z. Fragonara, and A. Tsourdos, Gapnet: Graph attention based point neural network for exploiting local feature of point cloud, 2019.
[2] H. Qu and L. Gouskos, ParticleNet: Jet Tagging via Particle Clouds, [arXiv:1902.08570]

Speaker: Vinicius Massami Mikuni (Universitaet Zuerich (CH))
• 7
Modeling a top jet classifier with two-point energy correlation and geometry of soft emission

We introduce a two-point energy correlation spectra analysis for the classification of top jets and QCD jets. The two-point energy correlation spectra based on the angle between constituents, which is the main parameter of the kinematics of parton shower and heavy particle decay, are useful for tagging Higgs jets with a multilayer perceptron (MLP) or logistic regression. On the other hand, the substructure of a top jet is more complicated than that of a Higgs jet, and additional variables are required. We use trimmed jets and subjets together with the spectra to encode the pattern of soft radiation and the three-point correlation into the two-point correlation spectra. For the classification model with these new inputs, we use a multilayer perceptron analyzing each category of two-point correlation spectra independently. We further compare the classification result with that of a convolutional neural network (CNN) with jet images. The performances of our method and that of the CNN are comparable within the uncertainty between Pythia8 and Herwig7 generated jets.

Speaker: Sung Hak Lim (KEK)
• 8
Deep Learning Jet Substructure from Two-Particle Correlation

Deciphering the complex information contained in jets produced in collider events requires a physical organization of the jet data. In this talk I will discuss the use of two-particle correlations (2PCs), obtained by pairing individual particles, as the initial jet representation from which a probabilistic model can be built. Particle momenta, as well as particle types and vertex information, are included in the correlation. We construct a novel two-particle correlation neural network (2PCNN) architecture by combining neural network based filters on 2PCs and a deep neural network for capturing jet kinematic information. We apply the 2PCNN to boosted boson and heavy flavor tagging and achieve excellent performance compared to models based on telescoping deconstruction. Major correlation pairs exploited in the trained models are also identified, which sheds light on the physical significance of certain jet substructures.

Speaker: Dr Yang-Ting Chien (Stony Brook University)
• 9
CLARIANT: Covariant Lorentz Group Architecture for Artificial Neural Networks

We present a new neural network architecture, CLARIANT: a Lorentz covariant neural network architecture for learning the kinematics and properties of complex systems of particles. The novel design of this network implements activations as vectors that transform according to arbitrary finite-dimensional representations of the underlying symmetry group that governs particle physics, the Lorentz group. The fundamental nonlinearity in the network is given by the tensor product of such vectors, stored as collections of irreducible components, followed by the Clebsch-Gordan decomposition back into irreducibles. Consequently, the architecture of the network is inherently covariant under Lorentz transformations and is capable of learning not only fully Lorentz-invariant objectives such as classification probabilities, but also Lorentz-covariant vector-valued objectives such as 4-momenta, while exactly respecting the action of the group. Imposing the symmetry leads to a significantly smaller ansatz (fewer learnable parameters than competing non-covariant networks), and potentially a much more interpretable model. To demonstrate the capability and performance of this network, we study the ability to classify systems of charged and neutral particles at the Large Hadron Collider resulting from the production and decay of highly energetic quarks and gluons. Specifically, we choose the benchmark task of classifying and discriminating jets formed from the hadronic decays of Lorentz-boosted massive particles from the background of light quarks and gluon jets. We show that we are able to achieve similar performance compared to other state-of-the-art neural networks trained to perform this classification task while also maintaining significantly broader generality regarding the structural origin of the physical processes involved.

Speaker: Alexander Bogatskiy
• 12:50 PM
Lunch
• Generative Models KC 802

Conveners: Jana Schaarschmidt (University of Washington (US)), Martin Erdmann (Rheinisch Westfaelische Tech. Hoch. (DE))
• 10
Introduction
Speakers: Jana Schaarschmidt (University of Washington (US)), Martin Erdmann (Rheinisch Westfaelische Tech. Hoch. (DE))
• 11
DijetGAN: A Generative-Adversarial Network approach for the simulation of QCD Dijet events at the LHC

In the coming years, the experiments at the LHC will collect a significant amount of data which will require a similarly large increase in number of Monte Carlo (MC) events. This will force experiments to move to fast simulation to be able to produce the required number of MC events. At the same time, theorists are developing better but more time-consuming event generators that will put further pressure on the limited computing resources.
In this context, Machine Learning (ML) techniques are being considered as a way to significantly speed up both the generation and the simulation steps of the MC production. In the work that will be presented, two Generative Adversarial Networks (GANs) are used to generate and simulate dijet events. The GANs are trained on events generated using MadGraph5 + Pythia8, and simulated and reconstructed in Delphes3. A number of kinematic distributions, both at MC truth-level and after the detector simulation, are considered and the GANs can reproduce the input distributions.

Speaker: Serena Palazzo (The University of Edinburgh (GB))
• 12
A generator cell for LHC event GANs

We present a network for generative modeling of LHC events. We use Lorentz boosts, rotations, momentum and energy conservation to build a network cell generating a 2-body particle decay. We allow for modifications of the resulting four-vectors following a StyleGAN approach. We train the generator using the Lorentz Boost Network as a pre-stage of the critic’s network. We present first evaluations of the generator quality using Drell-Yan processes.

Speaker: Niclas Eich (RWTH Aachen University (DE))
• 13
How to GAN LHC Events

Event generation for the LHC can be supplemented by generative adversarial networks, which generate physical events and avoid highly inefficient event unweighting. For top pair production we show how such a network describes intermediate on-shell particles, phase space boundaries, and tails of distributions. It can be extended in a straightforward manner to include for instance off-shell contributions, higher orders, or approximate detector effects.

Speaker: Ramon Winterhalder (Universität Heidelberg)
• 14
GAN based event subtraction for Monte Carlo methods

We propose a novel method to subtract distributions represented by samples. We train a subGAN that takes event samples from two distributions to generate samples filling the difference between the distributions. While the algorithm can have various applications for Monte Carlo methods, we illustrate its performance for Z + jets NLO event generation.

Speaker: Anja Butter
• 15
Teaching a Computer to Integrate

As the integrated luminosity of the LHC increases, the number of Monte Carlo (MC) events required increases as well. Generating these events will eventually become cost-prohibitive, so improvements in event generation are required. The major inefficiency in MC generation comes from generating unweighted events. Using machine learning techniques, I will propose a new phase space integrator that will reduce the cost of generating high multiplicity events. Additionally, my approach improves on previous attempts in computational costs for high multiplicity events. The flow integrator will allow event generators to keep up with the needs of the LHC.

Speakers: Claudius Krause (Fermilab), Christina Gao
• 3:55 PM
Coffee break
• 16
Lund jet images from generative and cycle-consistent adversarial networks

We introduce a generative model to simulate radiation patterns within a jet using the Lund jet plane. We show that using an appropriate neural network architecture with a stochastic generation of images, it is possible to construct a generative model which retrieves the underlying two-dimensional distribution to within a few percent. We compare our model with several alternative state-of-the-art generative techniques. Finally, we show how a mapping can be created between different categories of jets, and use this method to retroactively change simulation settings or the underlying process on an existing sample. These results provide a framework for significantly reducing simulation times through fast inference of the neural network as well as for data augmentation of physical measurements.

Speakers: Frederic Alexandre Dreyer (Oxford), Stefano Carrazza (CERN)
• 17
Fast Calorimeter Simulation in ATLAS: FastCaloSimV2

The ATLAS physics program relies on very large samples of GEANT4 simulated events, which provide a highly detailed and accurate simulation of the ATLAS detector. But this accuracy comes with a high price in CPU, predominantly caused by the calorimeter simulation. The sensitivity of many physics analyses is already limited by the available Monte Carlo statistics and will be even more so in the future. Therefore, sophisticated fast simulation tools are developed. The calorimeter shower simulation of most samples in Run-3 will be based on a new parametrized description of longitudinal and lateral energy deposits (FastCaloSimV2). FastCaloSimV2 includes machine learning approaches to achieve a fast and accurate description, and to ensure its applicability to a broad variety of physics cases. In this talk, we will describe this new tool, focussing on the modelling of hadronic showers, and demonstrate its potential for physics applications.

Speaker: Sean Joseph Gasiorowski (University of Washington (US))
• 18
Fast Calorimeter Simulation in ATLAS with DNNs

The ATLAS physics program relies on very large samples of GEANT4 simulated events, which provide a highly detailed and accurate simulation of the ATLAS detector. But this accuracy comes with a high price in CPU, predominantly caused by the calorimeter simulation. The sensitivity of many physics analyses is already limited by the available Monte Carlo statistics and will be even more so in the future. Therefore, sophisticated fast simulation tools are developed. Prototypes are being developed using cutting edge machine learning approaches to learn the appropriate calorimeter response, which are expected to improve modeling of correlations within showers. Two different approaches, using Variational Auto-Encoders or Generative Adversarial Networks, are trained to model the shower simulation. These new tools are described and first results presented.

Speaker: Dalila Salamani (Universite de Geneve (CH))
• 19
Generative Models in CALICE

In this talk, we demonstrate the usage of Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) for modeling electromagnetic showers in the central region of the Silicon-Tungsten (Si-W) Electromagnetic Calorimeter of the proposed International Large Detector (ILD). After successful completion of the training processes, the properties of synthesized showers are compared to the showers from a full detector simulation using Geant4. Our results demonstrate the potential of using such networks for fast calorimeter simulation for the ILD detector in the future and open the possibility to complement current simulation techniques.

Speaker: Engin Eren
• Workshop Dinner Bier Haus (Radegast Hall & Biergarten)

### Bier Haus

113 North 3rd Street Williamsburg, Brooklyn NY 11249

Workshop dinner at Radegast Hall, Bier Haus

• Thursday, January 16
• Decorrelation and Semi/Unsupervised approaches KC 802

Conveners: Chase Owen Shimmin (Yale University (US)), Nhan Viet Tran (Fermi National Accelerator Lab. (US))
• 20
Data ex Machina: Machine Learning with Jets in CMS Open Data

In this talk, I explore unsupervised and supervised machine learning techniques using CMS Open Data. I introduce a metric between jets based on the earth (or energy) mover's distance: the “work” required to rearrange one event into the other. Using this metric, I will probe the metric space of jets using unsupervised methods. Further, training supervised jet classifiers directly on data can potentially overcome the problematic reliance on simulated training data. I apply weakly supervised methods to train quark/gluon classifiers directly on the data and probe what the machine has learned. To enable machine learning research for jet physics using real LHC data, this dataset of over one million jets is made publicly available along with corresponding simulation.

Speaker: Eric Metodiev (Massachusetts Institute of Technology)
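The "work required to rearrange one event into the other" can be made concrete with a deliberately small toy. The sketch below is not the speaker's implementation: it assumes both events contain the same number of particles, each carrying an equal share of the total energy, so that the optimal transport plan can be taken to be a one-to-one matching. It also assumes particle positions are plain (rapidity, azimuth) pairs with Euclidean distance and ignores the periodicity of the azimuthal angle. The function name `toy_emd` is hypothetical, and the brute-force search over matchings is only practical for a handful of particles.

```python
from itertools import permutations
from math import hypot


def toy_emd(event_a, event_b):
    """Toy energy mover's distance between two equally sized events.

    Each event is a list of (rapidity, azimuth) positions. Every particle is
    assumed to carry the energy fraction 1/n, so the cheapest rearrangement
    is a one-to-one matching of particles.
    """
    n = len(event_a)
    if n != len(event_b):
        raise ValueError("this toy version needs events of equal size")
    if n == 0:
        return 0.0

    def matching_cost(perm):
        # Total distance moved when particle i of A goes to particle perm[i] of B.
        return sum(
            hypot(a[0] - event_b[j][0], a[1] - event_b[j][1])
            for a, j in zip(event_a, perm)
        )

    # Brute force over all matchings: fine for a few particles only.
    best = min(matching_cost(perm) for perm in permutations(range(n)))
    return best / n  # each particle moves an energy fraction of 1/n


event_a = [(0.0, 0.0), (1.0, 0.0)]
event_b = [(0.0, 1.0), (1.0, 1.0)]
print(toy_emd(event_a, event_b))
```

For realistic events with unequal energies and multiplicities, the minimization runs over general transport plans and is usually handed to a dedicated optimal transport solver.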
• 21
Representation Learning for Collider Events

Variational Autoencoders (VAEs) can be trained to learn representations of metric spaces. I will show how a VAE trained to minimize the Earth Mover's Distance (EMD) between input and reconstructed jets learns to represent jet features associated with hierarchically different energy scales in orthogonal directions of its latent space. I will also illustrate the relationship between the scale-dependent dimensionality of the learnt representation and the dimensionality of the metric space.

Speaker: Jack Collins (SLAC)
• 22
Tagger-mass decorrelation: experience within CMS

Jet substructure tagging of highly boosted heavy resonances decaying to quarks has become an important tool for Standard Model (SM) measurements and searches for beyond the SM physics. Background estimation typically relies on at least 3 data sideband regions that can be separated from the signal region with the physics process of interest by a set of two uncorrelated variables. For searches with boosted objects, jet substructure taggers that are decorrelated from the jet mass have proven very useful. This talk discusses such taggers, with and without the use of machine learning, and explains their relevance to physics analyses.

Speaker: Huilin Qu (Univ. of California Santa Barbara (US))
• 23
Disco Fever

With great classification power comes great responsibility: Now that deep-learning is the de-facto standard for jet classification in high-energy physics, attention needs to be paid to aspects beyond performance. A key issue is the question of decorrelation - how a classifier output can be made independent of other salient variables such as the jet's mass. Achieving reliable decorrelation is crucial for stabilising the network response against systematic uncertainties and for building robust analysis strategies in background rich environments. So far, the most powerful decorrelation approaches are based on adversarial training: two networks performing competing tasks. These are notoriously difficult to train, as the two networks must be carefully tuned against one another, and their objective is unbounded from below. We show how a positive regulariser term based on the distance correlation metric can achieve state-of-the-art decorrelation performance with much simpler training.

Speaker: Gregor Kasieczka (Hamburg University (DE))
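To make the distance correlation metric mentioned in the abstract above concrete, here is a minimal NumPy sketch of the standard sample distance correlation between two variables, for example a classifier output and the jet mass. It is an illustration of the statistic itself, not the speakers' training code; the function name `distance_correlation` is hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two equally long samples.

    Returns a value in [0, 1]. In the population limit it is zero only when
    the two variables are statistically independent.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Pairwise Euclidean distance matrices within each sample.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov_sq = (A * B).mean()    # squared distance covariance
    dvar_x_sq = (A * A).mean()  # squared distance variance of x
    dvar_y_sq = (B * B).mean()  # squared distance variance of y

    denom = np.sqrt(dvar_x_sq * dvar_y_sq)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov_sq, 0.0) / denom))


x = np.linspace(-1.0, 1.0, 201)
print(np.corrcoef(x, x**2)[0, 1])       # linear correlation misses the dependence
print(distance_correlation(x, x**2))    # distance correlation does not
```

Because the statistic is non-negative, a term built from it can be added to a classification loss with a positive weight (a hypothetical hyperparameter here) without making the objective unbounded from below, which is the contrast with adversarial training drawn in the abstract.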
• 24
A Normalizing Flow Model for Boosted Jets

In this work, we consider dijet production and present a model for the distribution of three-momenta of particles constituting the two boosted jets. Our method involves starting with a simple probability distribution for the momenta in the rest frame of the jets’ parent. Then, we use the Lorentz transformation to map the rest frame momentum distribution into a model of the boosted momenta parameterized by the velocity of the jets’ parent. Maximum likelihood estimation can then quickly estimate this velocity for a dataset. Future work could use this model for efficiently generating jet data, tagging jets, or inferring other physical parameters.

Speaker: Matthew Drnevich (NYU)
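The central ingredient of the abstract above, mapping rest-frame momenta into the lab frame through a Lorentz transformation parameterized by the parent's velocity, can be sketched as follows. This is a generic boost in natural units (c = 1) and not the speaker's model; the function name `boost` and the four-momentum layout (E, px, py, pz) are assumptions made for illustration.

```python
import numpy as np


def boost(four_momentum, beta):
    """Boost a four-momentum (E, px, py, pz) by the velocity vector beta.

    A particle at rest in the original frame ends up moving with velocity
    beta in the new frame. Units with c = 1 are assumed.
    """
    p4 = np.asarray(four_momentum, dtype=float)
    beta = np.asarray(beta, dtype=float)
    beta_sq = float(beta @ beta)
    if beta_sq >= 1.0:
        raise ValueError("the boost velocity must satisfy |beta| < 1")
    if beta_sq == 0.0:
        return p4.copy()

    gamma = 1.0 / np.sqrt(1.0 - beta_sq)
    energy, momentum = p4[0], p4[1:]
    beta_dot_p = float(beta @ momentum)

    new_energy = gamma * (energy + beta_dot_p)
    # Only the momentum component along beta changes under the boost.
    new_momentum = momentum + ((gamma - 1.0) * beta_dot_p / beta_sq + gamma * energy) * beta
    return np.concatenate(([new_energy], new_momentum))


# A unit-mass particle at rest, seen from a frame where its parent moves with beta_z = 0.6.
print(boost([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.6]))
```

Since the map is smooth and invertible for any fixed velocity, a simple rest-frame density can be pushed through it to give a lab-frame density that depends on the velocity, which is the kind of parameter a maximum likelihood fit can then estimate.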
• 25
Metrics and Machine Learning Algorithms for Collider Space

When the space of collider events is equipped with a metric, many simple-to-use machine learning algorithms can be applied to perform the task of jet tagging. Here we explore several different generalizations of the Energy Mover’s Distance. The computed distance matrices are fed into both supervised and unsupervised learning models, and their performances in distinguishing various types of jets are quantified and compared. This in turn offers an estimate on the suitability of the metrics themselves for the underlying event space at hand, aiding the selection of the most appropriate metric-model pair. The framework thus paves the way for future applications of metric-based machine learning for collider physics.

Speaker: Ms Tianji Cai (University of California, Santa Barbara)
• 11:00 AM
Coffee break KC 802

• ML Beyond HEP KC 802

Convener: Kyle Stuart Cranmer (New York University (US))
• 26
Optimal Transport
Speaker: Jonathan Niles-Weed (NYU)
• 27
Beyond monotonic, autoregressive sequence modeling
Speaker: Kyunghyun Cho
• 28
Guided discussion
• 1:00 PM
Lunch
• Anomaly detection (LHCO) KC 802

Conveners: David Shih (Rutgers University), Gregor Kasieczka (Hamburg University (DE))
• 29
LHCO Introduction and overview
Speakers: David Shih (Rutgers University), Gregor Kasieczka (Hamburg University (DE))
• 30
Unsupervised new physics searches with data-driven inference

We have developed a framework for performing unsupervised new physics searches using jet substructure in di-jet events, where the likelihood functions are inferred in a data-driven manner. This framework is based on a machine learning algorithm called Latent Dirichlet Allocation, a statistical model initially used for describing the topical structure of documents. In this talk I will present an overview of and results from this recent work. The results will include: a proof-of-principle analysis on boosted final states from top-quark pair-production, a new physics search for a 3 TeV W' boson decaying to boosted jets, and results from the LHC Olympics challenge.

Speaker: Dr Barry Dillon (Jozef Stefan Institute)
• 31
Tag N’ Train: Combining Autoencoders and CWoLa for Better Unsupervised Searches

As our jet classifiers grow in complexity, limitations in simulating QCD will start to bottleneck our ability to train classifiers that perform as well on data as they do in simulation. One proposed approach to avoid this problem is the CWoLa method, in which the classifier is trained directly on data to distinguish between statistical mixtures of classes. The main challenge when applying this technique is that it can be difficult to find information that is orthogonal to the classification task and can be used to select the mixed samples in data. To address this, we introduce a new approach, called Tag N’ Train (TNT), where one uses a weak classifier to tag signal-rich samples that are used to train a stronger classifier. To demonstrate the power of this approach we apply it to an unsupervised dijet search. In the search, separate autoencoders are trained on the leading and sub-leading jets in the sample. Then, one defines signal-rich and background-rich samples of events based on the autoencoder reconstruction loss of the leading jet. This allows one to use the CWoLa method to train a new classifier for the sub-leading jet to distinguish between these two mixed samples. This procedure can then be swapped to train a classifier for the leading jet. We show that the resulting TNT classifiers perform significantly better than using the autoencoders as classifiers, thus greatly enhancing the sensitivity of the search.

Speaker: Oz Amram (Johns Hopkins University (US))
• 32
Variational Autoencoders for Anomalous Jet Tagging

We present a detailed study of Variational Autoencoders (VAEs) applied to anomalous jet tagging. By taking in low-level jet constituents' information, and only training with background jets in an unsupervised manner, the VAE is able to encode important information for reconstructing jets, while learning an expressive posterior distribution in the latent space. The encoder (inference) and decoder (generation) can be used together or separately to identify out-of-distribution anomalous jets. We employed different techniques to regularize the latent representation, and show how the behavior changes. When using the VAE as an anomaly detector, we present two approaches to detect anomalies: directly comparing in input space or, instead, working in latent space. Results of tagging performance for different jet types and the full kinematic range are shown. In addition, we study a few tricks to make the VAE more sensitive to anomalies.

Speaker: Taoli Cheng (University of Montreal)
• 3:50 PM
Coffee break
• 33
Comparing weak and unsupervised anomaly detection

A comparison of CWoLa hunting and autoencoders.

Speaker: Pablo Martín
• 34
LHC Olympics 2020: Columbia University

We present results of an anomaly detection method via sequence modeling.

Speaker: Alan Mathew Kahn (Columbia University (US))
• 35
LHC Olympics 2020: MIT

Speaker: Nilai Sarda (MIT)
• 36
LHC Olympics 2020: Berkeley Cosmology
Speaker: George Stein
• 37
LHCO2020: Outcome of the Challenge

We will unveil the answer to Black Box 1 and discuss the outcomes of the challenge. Note that there are two talks here - one that was hurriedly prepared for the unveiling of the results at the time of the challenge and a second, polished set of slides that we have prepared without time pressure after the workshop.

Speakers: Ben Nachman (Lawrence Berkeley National Lab. (US)), Gregor Kasieczka (Hamburg University (DE)), David Shih (Rutgers University)
• Friday, January 17
• Machine Learning Inference and Interpretation KC 914

### KC 914

#### Kimmel Center for University Life

60 Washington Square S, New York, NY 10012
Conveners: Anja Butter, Jesse Thaler (MIT)
• 38
Deep-Learning Jets with Uncertainties and More

Bayesian neural networks allow us to keep track of uncertainties, for example in top tagging, by learning a tagger output together with an error band. We illustrate the main features of Bayesian versions of established deep-learning taggers. We show how they capture statistical uncertainties from finite training samples, systematics related to the jet energy scale, and stability issues through pile-up. Altogether, Bayesian networks offer many new handles to understand and control deep learning at the LHC without introducing a visible prior effect and without compromising the network performance.

Speaker: Michel Luchmann (Universität Heidelberg)
• 39
Bounding high-dimensional uncertainties with adversarial approaches
Speaker: Chase Owen Shimmin (Yale University (US))
• 40
OmniFold: Simultaneously Unfolding All Observables

Unfolding is the procedure by which the "recorded" detector-level distribution of an observable is corrected for detector effects and other sources of noise to obtain the "true" particle-level distribution. In high-energy particle physics, unfolding is a ubiquitous part of measurements at the LHC. The current state-of-the-art procedure, Iterated Bayesian Unfolding (IBU), is typically applied to only a one-dimensional recorded distribution to obtain a one-dimensional true distribution, ignorant of other correlations present and requiring a separate unfolding for each observable. In this talk, I will exhibit a method, called Omniscient Unfolding (OmniFold), that takes advantage of the full phase-space information at both detector and truth levels to solve the unfolding problem. OmniFold learns a universal weighting of truth events such that the distribution of any observable can be calculated. The method is demonstrated and compared to IBU using the Herwig and Pythia event generators, the Delphes simulation package, and a variety of jet substructure observables, showing equal or improved robustness in all cases.

Speaker: Patrick Komiske (Massachusetts Institute of Technology)
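For readers unfamiliar with the binned baseline named in the abstract above, the following is a minimal sketch of Iterated Bayesian Unfolding for a single one-dimensional histogram. It is not the OmniFold method and not the speaker's code. It assumes perfect efficiency (each column of the response matrix sums to one) and non-empty folded bins; the function name and argument layout are hypothetical.

```python
import numpy as np


def iterative_bayesian_unfolding(measured, response, prior, n_iterations=4):
    """Unfold a binned detector-level distribution.

    measured[i]    : counts observed in detector-level bin i
    response[i, j] : P(measured in bin i | true bin j), columns summing to 1
    prior[j]       : starting guess for the particle-level counts
    """
    measured = np.asarray(measured, dtype=float)
    response = np.asarray(response, dtype=float)
    truth = np.asarray(prior, dtype=float).copy()

    for _ in range(n_iterations):
        folded = response @ truth  # expected detector-level counts
        # Bayes' theorem: P(true bin j | measured bin i) under the current estimate.
        posterior = response * truth / folded[:, None]
        # Redistribute the measured counts over the truth bins.
        truth = posterior.T @ measured
    return truth


response = np.array([[0.8, 0.2, 0.0],
                     [0.2, 0.6, 0.2],
                     [0.0, 0.2, 0.8]])
true_counts = np.array([100.0, 300.0, 600.0])
measured = response @ true_counts
flat_prior = np.full(3, measured.sum() / 3)
print(iterative_bayesian_unfolding(measured, response, flat_prior, n_iterations=100))
```

Each observable needs its own response matrix and its own run of this loop, which is the limitation the abstract contrasts with a single per-event weighting that can be reused for any observable.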
• 41
GANning away detector effects

ML tools based on generative models, such as cycleGANs and invertible architectures, can be used to address the problem of unfolding detector effects, a challenge for data analysis at hadronic colliders.

Speaker: Marco Bellagente (Universität Heidelberg)
• 10:20 AM
Coffee break
• 42
Looking into Jets with Machine Learning

In this talk, we review how machine learning is changing the way we are thinking about jets. First, we present a simplified model to aid in machine learning research for jet physics, that captures the essential ingredients of parton shower generators in full physics simulations. We study how to unify generation and inference, where we aim to invert the generative model to estimate the clustering history (or posterior distribution on histories) conditioned on the observed particles. For this task, we introduce new algorithms (in the context of jet physics), together with visualizations, and metrics to compare them and probe the generative model.

Speaker: Sebastian Macaluso (New York University)
• 43
Realigning the goals of machine learning with the goals of physics

One of the most common applications of machine learning in high energy physics is in event selection (and categorization). The physics goals of event selection and categorization are to improve the significance of a potential excess (for signal discovery/upper limit setting analyses), and to reduce the uncertainty of a parameter measurement (parameter measurement analyses).

Event selection using machine learning is based on the "signal is better than background" heuristic. While it is clear how the heuristic would help with the physics goals, it turns out that the two are not completely aligned. In fact, certain signal events could be worse for the sensitivity of an analysis than certain background events.

In this talk we will provide optimal event selector and categorizer training prescriptions designed to maximize the expected statistical significance of an excess (by changing how ML outputs are used), and minimize the statistical uncertainty of a measurement (by changing the supervisory signal used in training the ML algorithms). Along the way, we will point out exactly how our methods realign the goals of event selection and categorization with the physics goals. Finally, we will indicate how our method can be extended to minimize the systematic uncertainties in parameter measurements as well.

Speaker: Prasanth Shyamsundar (University of Florida)
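
A toy calculation can make the tension between event selection and sensitivity in the abstract above concrete. The sketch below is not the speaker's prescription. It assumes the common large-background approximation Z ≈ s/√b for a counting experiment, with independent categories combined in quadrature, and the yields are invented for illustration.

```python
import math


def significance(s, b):
    """Approximate expected significance of a single counting bin (assumes s << b regime)."""
    return s / math.sqrt(b)


def combined_significance(bins):
    """Combine independent (s, b) categories in quadrature."""
    return math.sqrt(sum(s * s / b for s, b in bins))


# Hypothetical yields: a signal-rich region and a background-dominated region.
pure = (10.0, 10.0)    # (signal, background)
dirty = (1.0, 100.0)

z_pure_only = significance(*pure)
# Merging both regions into one counting bin dilutes the signal-rich region.
z_inclusive = significance(pure[0] + dirty[0], pure[1] + dirty[1])
# Keeping the regions as separate categories retains the information.
z_categorized = combined_significance([pure, dirty])

print(f"pure only:   {z_pure_only:.3f}")
print(f"inclusive:   {z_inclusive:.3f}")
print(f"categorized: {z_categorized:.3f}")
```

In this toy setup, accepting the extra signal event from the background-dominated region into a single inclusive bin lowers the expected significance, whereas treating it as its own category never does.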
• 44
ROB: Reproducible Open Benchmarks for Data Analysis Platform

In this talk, we present exploratory work to enable benchmark tests for physics challenges, such as “The Machine Learning Landscape of Top Taggers” comparison or the LHCOlympics2020. We introduce the “Reproducible Open Benchmarks for Data Analysis Platform” (ROB) for this task and aim to show a demo in which ROB is applied to a sample case. Given a benchmark workflow, users would provide code with their algorithm (e.g. as Docker containers) and trained parameters. The back-end would then process the workflow (the algorithms could also be part of a downstream analysis task) and evaluate the metrics on a test dataset. Finally, plots and tables would be updated.

Speakers: Sebastian Macaluso (New York University), Heiko Mueller
• 11:50 AM
Lunch
• Experimental methods KC 914

### KC 914

#### Kimmel Center for University Life

60 Washington Square S, New York, NY 10012
Convener: Philip Coleman Harris (Massachusetts Inst. of Technology (US))
• 45
Deep Learning based Energy Reconstruction and Event Generation for the CALICE AHCAL

The CALICE collaboration is developing highly granular calorimeters for the application of particle flow reconstruction to calorimetry in future linear collider experiments. An engineering prototype for an analogue hadron calorimeter (AHCAL) was assembled by the CALICE collaboration. Events measured by the AHCAL include 5-dimensional information in 22k channels: the 3D location, energy and timing of each hit are recorded. Pion test beam runs and a Monte Carlo simulation of the test beam setup are used to determine the energy resolution of the AHCAL. An improvement over the energy resolution obtained with standard reconstruction can be achieved by employing supervised machine learning approaches.

Deep neural networks can be used to reconstruct the event energy from the full 5D calorimeter image. Multiple architectures are presented and compared to traditional approaches. Using locally connected layers with cell-wise hit energy weighting shows promising results for shower leakage compensation. Convolutional networks improve energy resolution but suffer from overfitting at the phase space boundaries. Combined approaches and networks incorporating 5D information are discussed, and practical lessons learned for regression tasks are presented. Beyond regression, we also show recent progress in interpreting the latent space of Variational Autoencoders for the generation of hadronic showers.

Speaker: Erik Buhmann (Hamburg University (DE))
• 46
Convolutional neural networks with event images for pileup mitigation [cancelled due to illness]

The addition of multiple, nearly simultaneous proton-proton collisions to hard-scatter collisions (in-time pileup) is a significant challenge for most physics analyses at the LHC. Many techniques have been proposed to mitigate the impact of pileup on jets and other reconstructed objects. This study investigates the application of convolutional neural networks to pileup mitigation by treating events as images. By optimally combining low-level information about the event, the neural network can potentially provide an event-wise pileup energy correction. The impact of this correction is studied in the context of a global event observable: the missing transverse momentum, a variable particularly sensitive to pileup. The potential benefits of a neural network approach are analyzed alongside other constituent pileup mitigation techniques and the ATLAS default reconstruction algorithm.

Speaker: Bernard Brickwedde (Johannes Gutenberg Universitaet Mainz (DE))
• 47
Secondary Vertex finding in Jets with Graph Neural Networks

Secondary vertex finding is a crucial task for identifying jets containing heavy flavor hadron decays.
Bottom jets in particular have a very distinctive topology of 𝑏→𝑐→𝑠 decay which gives rise to two secondary vertices with high invariant mass and several associated charged tracks.

Existing secondary vertex finding algorithms search for intersecting particle tracks, and group them into secondary vertices based on geometrical constraints only. We propose an algorithm where the vertex finding step is performed with a graph neural network. Particle tracks are represented as nodes in a fully connected graph, and the task of vertex finding is cast as a node and edge classification task.

We present performance metrics for evaluating vertex finding performance, and compare the performance of several different graph network architectures on a simulated dataset.

Speaker: Eilam Gross (Weizmann Institute of Science (IL))
• 48
Mixture Density Networks for tracking in dense environments on ATLAS

The high collision energy and luminosity of the LHC allow the study of jets and hadronically decaying tau leptons at extreme energies with the ATLAS detector. These signatures lead to topologies in which charged particles have an angular separation smaller than the size of the ATLAS Inner Detector sensitive elements, and consequently to a reduced track reconstruction efficiency. To regain part of the lost track reconstruction efficiency, a neural network (NN) based approach was adopted in the ATLAS pixel detector in 2011 for estimating particle hit multiplicity, hit positions and associated uncertainties. The currently used algorithms and their performance in ATLAS will be summarized in the talk. An alternative algorithm based on a Mixture Density Network (MDN) is currently being studied, and its initial performance is promising. An overview of the MDN algorithm and its performance will be given, and comparisons will be made with the NNs currently used in ATLAS tracking.

Speaker: Elham E Khoda (University of British Columbia (CA))
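
As background for the Mixture Density Network mentioned in the abstract above, the sketch below shows the generic MDN training objective for a one-dimensional target: the negative log-likelihood of the target under a Gaussian mixture whose weights, means, and widths would be predicted by the network. The abstract gives no implementation details, so the function name, the choice of Gaussian components, and the example numbers are assumptions and do not describe the ATLAS algorithm.

```python
import math


def mdn_negative_log_likelihood(weights, means, sigmas, target):
    """Negative log-likelihood of `target` under a 1D Gaussian mixture.

    weights are assumed positive and summing to 1; sigmas are assumed positive.
    In an MDN these three lists would be outputs of the network for one input.
    """
    log_terms = []
    for w, mu, sigma in zip(weights, means, sigmas):
        log_gauss = (-0.5 * ((target - mu) / sigma) ** 2
                     - math.log(sigma)
                     - 0.5 * math.log(2.0 * math.pi))
        log_terms.append(math.log(w) + log_gauss)
    # Log-sum-exp for numerical stability when components are far from the target.
    largest = max(log_terms)
    return -(largest + math.log(sum(math.exp(t - largest) for t in log_terms)))


# Hypothetical two-component prediction for a hit position (arbitrary units).
nll_near = mdn_negative_log_likelihood([0.7, 0.3], [0.0, 2.0], [0.5, 0.5], target=0.1)
nll_far = mdn_negative_log_likelihood([0.7, 0.3], [0.0, 2.0], [0.5, 0.5], target=5.0)
```

Because the predicted widths enter the loss directly, a network trained this way returns per-component uncertainties along with the position estimates.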
• 49
Deep learning methods to improve Particle Flow reconstruction

The canonical particle flow algorithm estimates the neutral energy deposition in the calorimeter by first matching calorimeter deposits to track directions and then subtracting the track momenta from the matched cluster energy deposition. We propose a deep-learning-based method for estimating the energy fraction of individual components in each calorimeter cell. We build the dataset with a toy detector (with different resolutions per calorimeter layer) simulated using GEANT, and apply image-based deep neural network models to regress the fraction of neutral energy per detector cell. The performance of several different models is compared.

Speaker: Sanmay Ganguly (Weizmann Institute of Science (IL))
• 50
Machine Learning Based Jet $p_{T}$ Reconstruction in ALICE

Reconstructing the jet transverse momentum ($p_{\rm T}$) is a challenging task, particularly in heavy-ion collisions due to the large fluctuating background from the underlying event. While ALICE's standard area-based method effectively corrects for the average background, it does not account for region-to-region fluctuations. These residual fluctuations are handled in an unfolding procedure following the background subtraction, which is made easier when these fluctuations are reduced.

A novel method to correct the jet $p_{\rm T}$ on a jet-by-jet basis using machine learning techniques to reduce these fluctuations will be presented. This approach uses jet properties, including the constituents of the jet, to create a mapping between the corrected and uncorrected jet $p_{\rm T}$. The performance of this approach is evaluated using jets from PYTHIA simulations embedded into ALICE Pb--Pb data. Various machine learning techniques are compared, including shallow neural network, random forest, and linear regression algorithms. This method introduces some dependence on the fragmentation of the jet, and investigations into the extent and impact of this bias will be shown. In comparison to the area-based method, these machine learning based estimators show a significantly improved performance, which enables measurements of jets to lower transverse momenta and larger jet radii.

Speaker: Hannah Bossi (Yale University (US))
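
The area-based baseline referred to in the abstract above is commonly written as $p_{\rm T}^{\rm corr} = p_{\rm T}^{\rm raw} - \rho A$, where $\rho$ is an event-wide median background density and $A$ is the jet area. The sketch below illustrates that form only. The function names and numbers are hypothetical, and details of the ALICE procedure (such as which jets enter the median) are not given in the abstract and are not modelled here.

```python
import statistics


def background_density(background_jets):
    """Median pT per unit area over a list of (pt, area) background jets."""
    return statistics.median(pt / area for pt, area in background_jets)


def area_corrected_pt(raw_pt, area, rho):
    """Subtract the average background expected under the jet area."""
    return raw_pt - rho * area


# Hypothetical event: three background jets and one signal jet of area 0.5.
background_jets = [(50.0, 0.5), (60.0, 0.5), (70.0, 0.5)]
rho = background_density(background_jets)        # 120.0 GeV per unit area
corrected = area_corrected_pt(100.0, 0.5, rho)   # 100 - 120 * 0.5 = 40.0 GeV
```

Since $\rho$ is a single number per event, this correction removes only the average background. The residual region-to-region fluctuations are what the jet-by-jet machine learning correction in the abstract aims to reduce.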
• 2:50 PM
Coffee break KC 914

• Applications KC 914

Conveners: Christine Angela McLean (SUNY Buffalo), Dan Guest (University of California Irvine (US))
• 51
Jet substructure tagging and pileup mitigation

Recent advances in neural networks and the harsh pileup conditions in the second half of LHC Run 2, with 38 pileup (PU) interactions on average, have sparked significant developments in techniques for jet tagging and missing transverse momentum reconstruction. Through the study of jet substructure properties, jets originating from quarks, gluons, W/Z/Higgs bosons, top quarks and pileup interactions are distinguished, with new approaches surpassing the performance previously achieved at lower pileup. This talk will give an overview of the new jet substructure and pileup mitigation tools and of advances in the performance of jet and missing transverse momentum reconstruction in CMS.

Speaker: Alejandro Gomez Espinosa (ETH Zurich (CH))
• 52
Machine learning approaches to the identification of jets originating from heavy-flavor quarks

The identification of jets originating from heavy-flavor quarks (b-quark, c-quark) is central to the LHC physics program. High-performance heavy-flavor tagging is necessary both in precise standard model measurements and in searches for new physics. Heavy-flavor jets have distinct characteristics, but the production rate of such jets is several orders of magnitude smaller than that of the backgrounds. To identify b- and c-jets with the necessary background rejection, ATLAS uses BDTs, RNNs, and deep learning techniques to combine many low-level discriminating observables reconstructed in LHC collision events. We present the latest heavy-flavor jet tagging algorithms developed by the ATLAS collaboration and discuss their expected performance in simulation as well as their measured performance in collision data.

Speaker: Philipp Windischhofer (University of Oxford (GB))
• 53
Searching for long-lived particles with a neural-network-based displaced jet tagger

We present a neural-network-based tagger that is trained to identify the presence of displaced jets arising from the decays of new long-lived particle (LLP) states in data recorded by the CMS detector at the CERN LHC. Information from individual particles and secondary vertices within jets is refined through the use of convolutional networks before being combined with high-level engineered variables via a dense network. The LLP lifetime is an input parameter of the network, which allows for hypothesis testing over several orders of magnitude in lifetime, from cτ = 10 μm to 10 m. We define a method based on truth information from Monte Carlo simulation to reliably label jets originating from an LLP decay, for the purposes of supervised training. The training is performed by streaming ROOT trees containing O(100M) jets directly into the TensorFlow queue and threading system. This custom workflow allows a flexible selection of input features and the asynchronous preprocessing of data, such as the resampling and shuffling of batches on the CPU, in parallel to training on the GPU. Domain adaptation is performed with control samples of pp collision data to ensure good agreement between data and Monte Carlo simulation. The tagger performance shows only a moderate dependence on the new-physics model. The tagger is applied in a search for split supersymmetry in final states with jets and significant missing energy.

Speaker: Robert John Bainbridge (Imperial College (GB))
• 54
Jet or Event? - Physics at Future $e^-e^+$ Colliders

Information loss caused by dimension reduction in jet clustering is one of the major limitations on the measurement precision of hadronic events at future $e^-e^+$ colliders, where the precision frontier of particle physics for the next decades is expected to be defined. Such measurements are key for probing, e.g., the nature of the Higgs boson, since hadronic events are dominant in Higgs data. We show that this difficulty can be addressed well using machine-learning (ML) techniques at the event level. For this purpose, we pursue a comparative ML-based study between jet-level and event-level analyses in benchmark scenarios with two, four, and six expected jets per event. We explore how the precision of the benchmark measurements improves with the help of information beyond the jet level. As an application of this method, we analyze the precision of measuring the Higgs total width at 240 GeV $e^-e^+$ colliders (which involves analyzing hadronic events with two, four, and six expected jets) and its dependence on the detector resolution, and show that the precision can be significantly improved compared to that presented in the literature. We expect that the proposed method can be broadly applied to many other hadronic-event measurements at future $e^-e^+$ colliders.

Speaker: Mr Sijun Xu (Department of Physics, The Hong Kong University of Science and Technology)
• 55
The Di-Higgs Photography with Deep Neural Networks

We search for a hint of new physics concealed in the structure of the Standard Model (SM) via double Higgs production. Focusing on the relatively overlooked bbWW* final state, we portray the entire final state using charged/neutral hadron, lepton, and reconstructed neutrino images. We design various types of residual neural networks (ResNet), which efficiently exploit the correlations among the images, to disentangle the di-Higgs images of anomalous Higgs self-coupling from the SM backgrounds. The proposed method has the potential to improve the precision of the Higgs self-coupling measurement, and is widely applicable to disentangling higher-dimensional operators in the effective field theory (EFT) framework.

Speaker: Dr Jeong Han Kim (University of Notre Dame)
• 56
Cornering charming Higgs decays

This talk discusses how to identify events with fat jets from charming Higgs decays, $H\to cc$, at the LHC. To suppress the overwhelmingly large backgrounds and reduce false positives, we consider applying a combination of jet shape observables and imaging techniques, using a selection of neural network architectures.

Speaker: Mr Joseph Walker (University of Durham)
• 57
Using machine learning to constrain the Higgs total width

Despite the discovery of the Higgs boson decay in five separate channels, many parameters of the Higgs boson remain largely unconstrained. In this paper, we present a new approach to constraining the Higgs total width by requiring the Higgs to be resolved as a single high-pT jet and measuring the visible and partially visible Higgs boson cross section. This approach complements existing approaches from the off-shell technique and lepton colliders. To measure the Higgs boson decays, we rely on new ideas from machine learning for jet classification and a modified jet reconstruction that uses a dedicated missing energy regression. With some assumptions, this approach is found to be capable of yielding similar sensitivity to the off-shell projections with the full High Luminosity-LHC dataset. We outline the theoretical and experimental limitations of this approach and present a path towards making a truly model-independent measurement of the Higgs boson total width.

Speakers: Dylan Sheldon Rankin (Massachusetts Inst. of Technology (US)), Cristina Ana Mantilla Suarez (Johns Hopkins University (US))
• 58
Closeout KC 802

Speaker: Tilman Plehn