15–17 Jan 2020
Kimmel Center for University Life
America/New_York timezone

LHCOlympics2020 R&D

Despite an impressive and extensive effort by the LHC collaborations, there is currently no convincing evidence for new particles produced in high-energy collisions.  At the same time, there has been a growing interest in machine learning techniques to enhance potential signals using all of the available information.  

In the spirit of the first LHC Olympics (circa 2005-2006) [1st, 2nd, 3rd, 4th], we are organizing the 2020 LHC Olympics. Our goal is to ensure that the LHC search program is sufficiently well-rounded to capture "all" rare and complex signals. The final state for these Olympics is narrowly focused (generic dijet events), but the observable phase space and the potential BSM parameter space(s) are large: all hadrons in the event can be used for learning, whether through "cuts", supervised machine learning, or unsupervised machine learning.

For setting up, developing, and validating your methods, we provide background events and a benchmark signal model. You can download these from this page. To help you get started, we have also prepared simple Python scripts that read in the data and do some basic processing.

The final test will happen two weeks before the ML4Jets2020 workshop. We will release a new dataset in which the "background" is similar, but not identical, to the one in the development set (as is true in real data!). The goal of the challenge is to see who can "best" identify BSM physics in the dataset: is a signal present (yes/no), and if so, at what mass and with what cross-section? There are many ways to quantify "best", and we will use all of the submissions to explore the pros and cons of the various approaches.

To keep the scope limited, all signals will be of the form X -> hadrons, where X is a new massive particle with an O(TeV) mass. Every event is required to contain at least one R = 1.0 jet with pT > 1.2 TeV. For each event, we provide a list of all hadrons (pT, eta, phi, pT, eta, phi, ...), zero-padded up to 700 hadrons.
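As an illustration of the event format described above, here is a minimal sketch of how one zero-padded event could be unpacked into per-hadron rows. It is not the official reader script: the function names (`unpack_event`, `total_invariant_mass`) are hypothetical, the example event is synthetic, and the sketch assumes that padded slots have pT equal to zero and that hadrons can be treated as massless when forming four-vectors.

```python
import numpy as np

MAX_HADRONS = 700  # padding length stated in the text
N_FEATURES = 3     # (pT, eta, phi) per hadron


def unpack_event(flat_row):
    """Reshape one flat, zero-padded event into an (n_hadrons, 3) array."""
    row = np.asarray(flat_row, dtype=float)
    if row.size != MAX_HADRONS * N_FEATURES:
        raise ValueError("expected %d values, got %d"
                         % (MAX_HADRONS * N_FEATURES, row.size))
    hadrons = row.reshape(MAX_HADRONS, N_FEATURES)
    # Assumption: padding entries have pT == 0, real hadrons have pT > 0.
    return hadrons[hadrons[:, 0] > 0]


def total_invariant_mass(hadrons):
    """Invariant mass of all hadrons combined (massless-hadron assumption)."""
    pt, eta, phi = hadrons.T
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = pt * np.cosh(eta)  # E = |p| for a massless particle
    m2 = energy.sum() ** 2 - px.sum() ** 2 - py.sum() ** 2 - pz.sum() ** 2
    return float(np.sqrt(max(m2, 0.0)))  # guard against rounding below zero


# Synthetic event: two back-to-back hadrons, the rest is zero padding.
example = np.zeros(MAX_HADRONS * N_FEATURES)
example[:6] = [600.0, 0.0, 0.0, 600.0, 0.0, np.pi]

hadrons = unpack_event(example)
mass = total_invariant_mass(hadrons)
print(hadrons.shape, mass)
```

The energy units in the example are arbitrary; check the provided scripts for the actual file layout and units before relying on any of these conventions.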

We strongly encourage you to publish your original research methods using these datasets, either before or after the unveiling of the results. Anyone who participates will be part of a summary paper to be prepared following the workshop.

Please do not hesitate to ask questions: we will use the ML4Jets Slack channel to discuss technical questions related to this challenge.

Good luck!

Gregor Kasieczka, Ben Nachman, and David Shih