Oct 10 – 14, 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Tracking Machine Learning Challenge

Oct 10, 2016, 11:30 AM
15m
Sierra A (San Francisco Marriott Marquis)

Oral Track 5: Software Development

Speaker

Paolo Calafiura (Lawrence Berkeley National Lab. (US))

Description

The instantaneous luminosity of the LHC is expected to increase at the HL-LHC, so that the pile-up can reach a level of 200 interactions per bunch crossing, almost a factor of 10 higher than the luminosity reached at the end of Run 1. In addition, the experiments plan a 10-fold increase of the readout rate. This will be a challenge for the ATLAS and CMS experiments, in particular for the tracking, which will be performed with a new all-silicon tracker in both experiments. In terms of software, the increased combinatorial complexity will have to be dealt with within a flat computing budget at best.
Preliminary studies show that the CPU time to reconstruct the events explodes with the increased pile-up level. The increase is dominated by the tracking, and within tracking by the pattern recognition stage. In addition to traditional CPU optimisation and better use of parallelism, the exploration of completely new approaches to pattern recognition has started.
To reach out to Computer Science specialists, a Tracking Machine Learning challenge (trackML) has been set up, building on the experience of the successful Higgs Machine Learning challenge in 2014 (see the talk by Glen Cowan at CHEP 2015). It brings together ATLAS and CMS physicists and computer scientists. A few relevant points:

  • A dataset consisting of a simulation of a typical full-silicon LHC experiment has been created, listing for each event the measured 3D points and the list of 3D points associated with each true track. The dataset is large enough to allow the training of data-hungry machine learning methods; the orders of magnitude are one million events, 10 billion tracks and 1 terabyte.
  • Participants in the challenge should find the tracks in an additional test dataset, i.e. build the list of 3D points belonging to each track (deriving the track parameters is not the topic of the challenge).
  • A figure of merit has been defined which combines the CPU time, the efficiency and the fake rate, with an emphasis on CPU time (an illustrative sketch follows this list).
  • The challenge platforms allow measuring the figure of merit and ranking the different algorithms submitted.

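The exact definition of the figure of merit is not given here; the following is a minimal, purely illustrative Python sketch, assuming a simple majority-matching rule between true tracks and submitted track candidates and an arbitrary weight on CPU time. None of these choices are the challenge's actual metric.

from collections import Counter

def efficiency_and_fake_rate(truth, submission):
    # truth and submission both map hit_id -> track label
    # (true particle id in truth, candidate track id in submission).
    true_tracks, reco_tracks = {}, {}
    for hit, pid in truth.items():
        true_tracks.setdefault(pid, set()).add(hit)
    for hit, tid in submission.items():
        reco_tracks.setdefault(tid, set()).add(hit)

    # A true track counts as found if a single candidate contains
    # more than half of its hits (assumed matching rule).
    found = 0
    for hits in true_tracks.values():
        best = Counter(submission[h] for h in hits if h in submission).most_common(1)
        if best and best[0][1] > 0.5 * len(hits):
            found += 1
    efficiency = found / len(true_tracks)

    # A candidate counts as fake if no true particle dominates its hits.
    fakes = 0
    for hits in reco_tracks.values():
        best = Counter(truth[h] for h in hits if h in truth).most_common(1)
        if not best or best[0][1] <= 0.5 * len(hits):
            fakes += 1
    fake_rate = fakes / len(reco_tracks) if reco_tracks else 0.0
    return efficiency, fake_rate

def figure_of_merit(efficiency, fake_rate, cpu_seconds, cpu_weight=2.0):
    # Hypothetical combination: reward efficiency, penalise fakes, and
    # penalise CPU time most strongly, reflecting the stated emphasis.
    return efficiency - fake_rate - cpu_weight * cpu_seconds

For example, with truth = {1: 'A', 2: 'A', 3: 'B'} and submission = {1: 0, 2: 0, 3: 1}, the sketch reports an efficiency of 1.0 and a fake rate of 0.0.
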
The emphasis is on exposing innovative approaches rather than hyper-optimising known ones. Machine learning specialists have shown a keen interest in participating in the challenge, with new approaches such as convolutional neural networks, deep neural networks, Monte Carlo tree search and others.
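
As a purely illustrative sketch of how an image-based approach such as a convolutional neural network might ingest the data (this is not a method described by the authors), the snippet below bins the measured 3D points of one event into a 2D occupancy image in (phi, z); the image size and coordinate range are arbitrary assumptions.

import numpy as np

def hits_to_image(x, y, z, n_phi=128, n_z=128, z_range=(-3000.0, 3000.0)):
    # Bin hit positions (assumed in mm) into an (n_phi, n_z) occupancy image
    # that a convolutional network could take as input.
    phi = np.arctan2(y, x)  # azimuth in [-pi, pi]
    phi_bin = np.clip(((phi + np.pi) / (2 * np.pi) * n_phi).astype(int), 0, n_phi - 1)
    z_bin = np.clip(((z - z_range[0]) / (z_range[1] - z_range[0]) * n_z).astype(int), 0, n_z - 1)
    image = np.zeros((n_phi, n_z), dtype=np.float32)
    np.add.at(image, (phi_bin, z_bin), 1.0)  # count hits per cell
    return image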

Primary Keyword (Mandatory) Algorithms
Secondary Keyword (Optional) Artificial intelligence/Machine learning
Tertiary Keyword (Optional) Outreach

Primary author

David Rousseau (LAL-Orsay, FR)

Co-authors

Andreas Salzburger (CERN), Cecile Germain (Universite Paris Sud), Davide Costanzo (University of Sheffield (GB)), Isabelle Guyon, Dr Jean-Roch Vlimant (California Institute of Technology (US)), Markus Elsing (CERN), Michael Aaron Kagan (SLAC National Accelerator Laboratory (US)), Paolo Calafiura (Lawrence Berkeley National Lab. (US)), Rebecca Carney (Stockholm University (SE)), Riccardo Cenci (U), Steven Andrew Farrell (Lawrence Berkeley National Lab. (US)), Tobias Golling (Universite de Geneve (CH)), Tony Tong (Harvard University (US)), Vincenzo Innocente (CERN)
