Jan 15 – 17, 2020
Kimmel Center for University Life
America/New_York timezone

Data ex Machina: Machine Learning with Jets in CMS Open Data

Jan 16, 2020, 9:00 AM
KC 802 (Kimmel Center for University Life)

60 Washington Square S, New York, NY 10012


Eric Metodiev (Massachusetts Institute of Technology)


In this talk, I explore unsupervised and supervised machine learning techniques using CMS Open Data. I introduce a metric between jets based on the earth (or energy) mover's distance: the “work” required to rearrange one jet into another. Using this metric, I probe the space of jets with unsupervised methods. I then turn to supervised learning, where training jet classifiers directly on data can potentially overcome the problematic reliance on simulated training data. I apply weakly supervised methods to train quark/gluon classifiers directly on the data and probe what the machine has learned. To enable machine learning research for jet physics using real LHC data, the dataset used in this work, containing over one million jets, is made publicly available along with the corresponding simulation.
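To make the "work" picture concrete, here is a minimal sketch of an energy mover's distance between two jets, written as a small linear program. It is an illustration under stated assumptions, not the code used in the talk: each jet is taken to be a list of particle transverse momenta with (rapidity, azimuth) coordinates, the cost is the transported momentum times angular distance divided by a radius parameter `R`, and a second term penalizes any difference in total momentum. The function name `jet_emd` and its parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def jet_emd(pts_a, coords_a, pts_b, coords_b, R=1.0):
    """Sketch of an energy mover's distance between two jets.

    pts_*    : particle transverse momenta, shape (n,) and (m,)
    coords_* : particle (rapidity, azimuth) pairs, shape (n, 2) and (m, 2)
    R        : assumed radius parameter weighting transport against
               the penalty for mismatched total momentum
    """
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    n, m = len(pts_a), len(pts_b)

    # Pairwise angular distance, wrapping the azimuth difference into [-pi, pi).
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    diff[..., 1] = (diff[..., 1] + np.pi) % (2 * np.pi) - np.pi
    theta = np.sqrt((diff ** 2).sum(axis=-1))

    # Flow variables f_ij are flattened row-major: index i * m + j.
    cost = (theta / R).ravel()

    # No particle may send or receive more momentum than it carries.
    A_ub = np.zeros((n + m, n * m))
    for i in range(n):
        A_ub[i, i * m:(i + 1) * m] = 1.0   # sum_j f_ij <= pts_a[i]
    for j in range(m):
        A_ub[n + j, j::m] = 1.0            # sum_i f_ij <= pts_b[j]
    b_ub = np.concatenate([pts_a, pts_b])

    # The total flow equals the smaller of the two total momenta.
    A_eq = np.ones((1, n * m))
    b_eq = [min(pts_a.sum(), pts_b.sum())]

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    # Transport cost plus the cost of creating or destroying momentum.
    return res.fun + abs(pts_a.sum() - pts_b.sum())
```

A dense linear program like this is only practical for jets with a modest number of particles; dedicated optimal transport solvers are the usual choice when many pairwise distances are needed.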

Primary authors

Eric Metodiev (Massachusetts Institute of Technology)
Patrick Komiske (Massachusetts Institute of Technology)
Radha Mastandrea (Massachusetts Institute of Technology)
Preksha Naik (Massachusetts Institute of Technology)
Jesse Thaler (Massachusetts Institute of Technology)
