Jan 15 – 17, 2020
Kimmel Center for University Life
America/New_York timezone

Tag N’ Train: Combining Autoencoders and CWoLa for Better Unsupervised Searches

Jan 16, 2020, 3:10 PM
KC 802 (Kimmel Center for University Life)


60 Washington Square S, New York, NY 10012


Oz Amram (Johns Hopkins University (US))


As jet classifiers grow in complexity, limitations in simulating QCD will start to bottleneck our ability to train classifiers that perform as well on data as they do in simulation. One proposed way to avoid this problem is the CWoLa (Classification Without Labels) method, in which a classifier is trained directly on data to distinguish between statistical mixtures of classes. The main challenge in applying this technique is finding information that is orthogonal to the classification task and can be used to select the mixed samples in data. To address this, we introduce a new approach, called Tag N’ Train (TNT), in which a weak classifier tags signal-rich samples that are then used to train a stronger classifier. To demonstrate the power of this approach, we apply it to an unsupervised dijet search. In the search, separate autoencoders are trained on the leading and sub-leading jets in the sample. Signal-rich and background-rich samples of events are then defined based on the autoencoder reconstruction loss of the leading jet. This allows the CWoLa method to be used to train a new classifier for the sub-leading jet that distinguishes between these two mixed samples. The roles of the two jets can then be swapped to train a classifier for the leading jet. We show that the resulting TNT classifiers perform significantly better than the autoencoders used as classifiers, thus greatly enhancing the sensitivity of the search.
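The tag-then-train flow described in the abstract can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and not the setup used in the talk: the Gaussian toy features, the 5% signal fraction, the 10% tagging cut, a PCA projection standing in for the autoencoder, and a logistic regression standing in for the CWoLa classifier. The toy also assumes that the two jets' features are independent within each class, which is the condition CWoLa relies on. Truth labels are used only to evaluate the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dijet events (hypothetical features, not real jet observables).
n_events, sig_frac = 20000, 0.05
is_signal = rng.random(n_events) < sig_frac  # truth, used only for evaluation

# Leading jet: 3 features; signal is shifted along the low-variance direction.
jet1 = rng.normal(0.0, [1.0, 1.0, 0.3], size=(n_events, 3))
jet1[is_signal, 2] += 0.5

# Sub-leading jet: 2 features; signal is shifted in both.
jet2 = rng.normal(0.0, 1.0, size=(n_events, 2))
jet2[is_signal] += 1.5


def reconstruction_loss(x, n_components=2):
    """Stand-in for an autoencoder: squared error after a PCA projection."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]  # top principal directions
    reconstructed = centered @ basis.T @ basis
    return ((centered - reconstructed) ** 2).sum(axis=1)


def train_logistic(x, y, lr=0.5, n_steps=500):
    """Logistic regression fitted by plain gradient descent."""
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    w = np.zeros(xb.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        w -= lr * xb.T @ (p - y) / len(y)
    return w


def auc(scores, truth):
    """Rank-based (Mann-Whitney) area under the ROC curve."""
    ranks = scores.argsort().argsort() + 1
    n_pos = truth.sum()
    n_neg = len(truth) - n_pos
    return (ranks[truth].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Tag: the weak classifier (reconstruction loss of the leading jet)
# splits the events into signal-rich and background-rich samples.
weak_score = reconstruction_loss(jet1)
tagged = weak_score > np.quantile(weak_score, 0.9)

# Train: a CWoLa-style classifier on the sub-leading jet learns to
# separate the two mixed samples, using the tags as noisy labels.
w = train_logistic(jet2, tagged.astype(float))
tnt_score = jet2 @ w[:2] + w[2]

auc_weak = auc(weak_score, is_signal)
auc_tnt = auc(tnt_score, is_signal)
print(f"weak tagger AUC (leading jet):        {auc_weak:.3f}")
print(f"TNT classifier AUC (sub-leading jet): {auc_tnt:.3f}")
```

Swapping `jet1` and `jet2` in the last two steps would give the corresponding classifier for the leading jet. The relative performance of the two scores in this toy is fixed by how the features were generated, so it says nothing about the results reported in the talk.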

Primary author

Oz Amram (Johns Hopkins University (US))


Cristina Ana Mantilla Suarez (Johns Hopkins University (US))
