
Data Science @ LHC 2015 Workshop

Europe/Zurich
222/R-001 (CERN)

Andrew Lowe (Hungarian Academy of Sciences (HU)), Cecile Germain-Renaud (LRI), Daniel Whiteson (University of California Irvine (US)), David Rousseau (LAL-Orsay, FR), Gilles Louppe (CERN), Jean-Roch Vlimant (California Institute of Technology (US)), Kyle Stuart Cranmer (New York University (US)), Maria Spiropulu (California Institute of Technology (US)), Maurizio Pierini (California Institute of Technology (US)), Vladimir Gligorov (CERN)
Description

The LHC experiments produce some of the largest and most complex datasets in science: real-time analysis of data at rates of 100 TB/s, and offline analysis of datasets of order 100 EB, are anticipated and planned for. Meanwhile, the field of data science has moved beyond classical statistical methods to produce advanced, intelligent methods for data analysis, pattern recognition and model inference. This workshop will engage the two communities in cross-exchanges and applications that can accelerate progress on big questions in basic science.

Topics to be addressed include: cutting-edge pattern-recognition methods for elementary particle identification; intelligent detectors that learn from their failures and self-adjust to improve their performance; fast reconstruction of charged-particle tracks; high-rate event-selection algorithms that learn to select rare physics processes; advanced data techniques that can guide discovery; and other challenges that can profit from advanced computational methods and resources.

The workshop includes plenary presentations, tutorials and hands-on, hackathon-style ML exercises, as well as directed and undirected discussion and brainstorming time.

Subscribe to the participants' mailing list for discussions on the topic and for announcements before and during the workshop by sending an email to HEP-data-science+subscribe@googlegroups.com.

Follow the workshop's official account @DataScienceLHC. Feel free to tweet using the recommended hashtag #DSLHC15.

The workshop will take place at CERN and is open to anyone with an interest in applying data science to high-energy physics. There are no fees, but registration for in-person attendance is required for organisational purposes. For non-CERN users, registration is a prerequisite for gaining access to the CERN site during the workshop.

Registration is now closed. However, the event will be available by videoconference and on the CERN webcast.

For accommodation, access to CERN and laptop registration, see the registration page.

Participants
  • Adam Elwood
  • Adinda de Wit
  • Adrian Bevan
  • Akshay Katre
  • Albert Puig Navarro
  • Alejandro Gomez Espinosa
  • Alexander Zamyatin
  • Alexei Klimentov
  • Allen Egon Cholakian
  • Amithabh Shrinivas
  • Ananya Ananya
  • Andre Georg Holzner
  • Andrea Coccaro
  • Andrea Bocci
  • Andrea Giammanco
  • Andrew Hard
  • Andrew Lowe
  • Andrey Ustyuzhanin
  • Andrzej Siodmok
  • André David
  • Anjishnu Bandyopadhyay
  • Anna Zaborowska
  • Arash Jofrehei
  • Avishek Chatterjee
  • Balazs Kegl
  • Balazs Ujvari
  • Balint Radics
  • Ben Couturier
  • Bhawna Gomber
  • Bianca-Cristina Cristescu
  • Bohmer Felix
  • Bornheim Adi
  • Borun Chowdhury
  • Caterina Doglioni
  • Catrin Bernius
  • Cecile Germain
  • Chao Wang
  • Christian Contreras-Campana
  • Christian L. Müller
  • Christian Ohm
  • Christos Leonidopoulos
  • Clemens Lange
  • Cristiano Alpigiani
  • Daniel Hay Guest
  • Daniel Kudlowiez Franch
  • Daniel Meister
  • Daniel Patrick O'Hanlon
  • Daniela Bortoletto
  • Daniela Paredes
  • Daniele Bonacorsi
  • Darren Price
  • David Rousseau
  • Davide Castelvecchi
  • Demis Hassabis
  • Demouth Julien
  • Devdatta Majumder
  • Dirk Duellmann
  • Donghee Kang
  • Dustin Burns
  • Eamonn Maguire
  • Ece Akilli
  • Eduardo Rodrigues
  • Eilam Gross
  • Elizabeth Sexton-Kennedy
  • Elli Papadopoulou
  • Ellie Dobson
  • Emma Tolley
  • Farbin Amir
  • Federico De Guio
  • Federico Ferri
  • Federico Preiato
  • Florencia Canelli
  • Foreman-Mackey Dan
  • Francesco Guescini
  • Francesco Lo Sterzo
  • Francesco Spano
  • Francisco Anuar Arduh
  • Frank Deppisch
  • Frederic Alexandre Dreyer
  • Fuquan Wang
  • Gabor Boros
  • Geoffrey Nathan Smith
  • Georg Zobernig
  • Georgios Krintiras
  • Gergely Devenyi
  • Giacomo Artoni
  • Gianfranco Bertone
  • Gilles Louppe
  • Giovanni Punzi
  • Greeshma Koyithatta Meethaleveedu
  • Gregory Dobler
  • Guenter Duckeck
  • Harinder Singh Bawa
  • Head Timothy Daniel
  • Hongtao Yang
  • Ian Fisk
  • Ian Michael Snyder
  • Igor Altsybeev
  • Iosif-Charles Legrand
  • Ivan Glushkov
  • Jack Wright
  • James Catmore
  • James K
  • Jared Vasquez
  • Jean-Roch Vlimant
  • Jeffrey Wayne Hetherly
  • Jesse Heilman
  • Jitendra Kumar
  • Joaquin Hoya
  • John Apostolakis
  • Jonathan Shlomi
  • Joschka Lingemann
  • Jose David Ruiz Alvarez
  • Josh Bendavid
  • Jovan Mitrevski
  • Judita Mamuzic
  • Juerg Beringer
  • Juergen Schmidhuber
  • Karel Ha
  • Karim El Morabit
  • Katharine Leney
  • Khristian Kotov
  • Konstantinos Karakostas
  • Kyle Cranmer
  • Kyle Martin Tos
  • Laser Seymour Kaplan
  • Lashkar Kashif
  • Leonardo Cristella
  • Levente Torok, PhD
  • Lily Asquith
  • Lorenzo Bianchini
  • Lorenzo Moneta
  • Lucian Stefan Ancu
  • Luke de Oliveira
  • M Spiropúlu
  • Maciej Pawel Szymanski
  • Marc Schoenauer
  • Marcel Rieger
  • Marco A. Harrendorf
  • Marco Meoni
  • Marco Rovee
  • Maria Girone
  • Marie Lanfermann
  • Marilyn Marx
  • Mario Lassnig
  • Mariyan Petrov
  • Matteo Negrini
  • Maurizio Pierini
  • Mauro Donega
  • Mazin Woodrow Khader
  • Messmer Peter
  • Miaoyuan Liu
  • Michael Kagan
  • Michela Paganini
  • Michelangelo Mangano
  • Michele Floris
  • Miguel Vidal Marono
  • Mikael Kuusela
  • Mikael Mieskolainen
  • Mirena Ivova Paneva
  • Mirkoantonio Casolino
  • Mohammed Mahmoud Mohammed
  • Nan Lu
  • Nikola Lazar Whallon
  • Nikos Karastathis
  • Olaf Nackenhorst
  • Oliver Maria Kind
  • Olivier Bondu
  • Olivier Mattelaer
  • Othmane Rifki
  • Pablo De Castro Manzano
  • Panagiotis Spentzouris
  • Paolo Calafiura
  • Patricia Rebello Teles
  • Patrick Koppenburg
  • Patrick Rieck
  • Pedro Vieira De Castro Ferreira Da Silva
  • Peter Elmer
  • Peter Sadowski
  • Philip Chang
  • Pierre Baldi
  • Pietro Vischia
  • Pooja Saxena
  • QAMAR UL HASSAN
  • Qi Zeng
  • Rachel Yohay
  • Raghav Kunnawalkam Elayavalli
  • Raghava Varma
  • Reina Coromoto Camacho Toro
  • Renato Aparecido Negrao De Oliveira
  • Riccardo Iaconelli
  • Riccardo Russo
  • Richard Wilkinson
  • Robert Roser
  • Roberto Ruiz de Austri
  • Roth Gunter
  • Ruchi Gupta
  • Rui Zhang
  • Russell Woods Smith
  • Ryan Heller
  • Sabine Kraml
  • Samuel James Greydanus
  • Samuel Meehan
  • Savannah Thais
  • Savvas Kyriacou
  • Sean Flowers
  • Sean-Jiun Wang
  • Sergei Gleyzer
  • Sezen Sekmen
  • Shawn Williamson
  • Shih-Chieh Hsu
  • Sijin Qian
  • Simeon Bamford
  • Simone Amoroso
  • Soren Stamm
  • Stefano Mattei
  • Steven Alkire
  • Steven Randolph Schramm
  • Suchita Kulkarni
  • Tamas Almos Vami
  • Tanmay Sarkar
  • Thorben Quast
  • Tibor Kiss
  • Tobias Golling
  • Tobias Tekampe
  • Tom Stevenson
  • Tomo Lazovich
  • Tova Holmes
  • Tuan Mate Nguyen
  • Tyler Henry Ruggles
  • Valentin Volkl
  • Valentin Y Kuznetsov
  • Valerio Ippolito
  • Vincent Alexander Croft
  • Vincenzo Innocente
  • Virginia Azzolini
  • Vitaliano Ciulli
  • Vittorio Raoul Tavolaro
  • Vladimir Gligorov
  • Wells Wulsin
  • Wen Guan
  • William Dmitri Breaden Madden
  • Wooyoung Moon
  • Xabier Cid Vidal
  • Xiangyang Ju
  • Xiaohu Sun
  • Yasaman Fereydooni
  • Zongchang Yang
Webcast
There is a live webcast for this event.
    • 08:30 13:00
      Welcome and Introduction 222/R-001

      Convener: Jean-Roch Vlimant (California Institute of Technology (US))
      • 09:00
        Welcome 15m
        Speaker: Jean-Roch Vlimant (California Institute of Technology (US))
      • 09:15
        Data and Science in HEP 45m
        Speaker: Vincenzo Innocente (CERN)
      • 10:00
        Data Science in industry 45m
        Speaker: Ellie Dobson (Pivotal)
      • 10:45
        Coffee Break 15m
      • 11:00
        ML at ATLAS & CMS: setting the stage 40m
        In the early days of the LHC, the canonical problems of classification and regression were mostly addressed using simple cut-based techniques. Today, ML techniques (some already pioneered in pre-LHC or non-collider experiments) play a fundamental role in the toolbox of any experimentalist. The talk will introduce, through a representative collection of examples, the problems addressed with ML techniques at the LHC. The goal of the talk is to set the stage for a constructive discussion with non-HEP ML practitioners, focusing on the specificities of HEP applications.
        Speaker: Mauro Donega (Eidgenoessische Tech. Hochschule Zuerich (CH))
      • 11:40
        Preparing for the future: opportunities for ML in ATLAS & CMS 40m
        ML is an established tool in HEP and there are many examples which demonstrate its importance for the kind of classification and regression problem we have in our field. However, there is also a big potential for future applications in yet untapped areas. I will summarise these opportunities and highlight recent, ongoing and planned studies of novel ML applications in HEP. Certain aspects of the problems we are faced with in HEP are quite unique and represent interesting benchmark problems for the ML community as a whole. Hence, efficient communication and close interaction between the ML and HEP community is expected to lead to promising cross-fertilisation. This talk attempts to serve as a starting point for such a prospective collaboration.
        Speaker: Tobias Golling (Universite de Geneve (CH))
    • 13:00 14:00
      Lunch Break 1h 222/R-001

    • 14:00 14:45
      Symposium 222/R-001

      Convener: Jean-Roch Vlimant (California Institute of Technology (US))
      • 14:00
        Deep Learning RNNaissance 45m
        In recent years, our deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. They are now widely used in industry. I will briefly review deep supervised / unsupervised / reinforcement learning, and discuss the latest state-of-the-art results in numerous applications.

        Bio: Since age 15 or so, Prof. Jürgen Schmidhuber's main scientific ambition has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. He has pioneered self-improving general problem solvers since 1987, and Deep Learning Neural Networks (NNs) since 1991. The recurrent NNs (RNNs) developed by his research groups at the Swiss AI Lab IDSIA & USI & SUPSI and TU Munich were the first RNNs to win official international contests. They have revolutionised connected handwriting recognition, speech recognition, machine translation, optical character recognition, image caption generation, and are now in use at Google, Microsoft, IBM, Baidu, and many other companies. Founders & staff of DeepMind (sold to Google for over 600M) include 4 former PhD students from his lab. His team's Deep Learners were the first to win object detection and image segmentation contests, and achieved the world's first superhuman visual classification results, winning nine international competitions in machine learning & pattern recognition (more than any other team). They also were the first to learn control policies directly from high-dimensional sensory input using reinforcement learning. His research group also established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age's extreme form of minimal art. Since 2009 he has been a member of the European Academy of Sciences and Arts. He has published 333 peer-reviewed papers, earned seven best paper/best video awards, the 2013 Helmholtz Award of the International Neural Networks Society, and the 2016 IEEE Neural Networks Pioneer Award. He is president of NNAISENSE, which aims at building the first practical general-purpose AI.
        Speaker: Juergen Schmidhuber (IDSIA)
    • 14:45 15:00
      Coffee Break 15m 222/R-001

    • 15:00 18:00
      Monday Afternoon Session 222/R-001

      Convener: David Rousseau (LAL-Orsay, FR)
      • 15:00
        Feature Extraction 45m
        Feature selection and reduction are key to robust multivariate analyses. In this talk I will discuss the pros and cons of various variable-selection methods, focusing on those most relevant in the context of HEP.
        Speaker: Dr Sergei Gleyzer (University of Florida (US))
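        To make the filter-style selection ideas concrete, here is a minimal sketch of one such criterion: ranking variables by the separation of the signal and background means in units of their combined spread. The toy dataset and the function name are illustrative assumptions, not material from the talk.

        ```python
        import numpy as np

        def rank_features(X_sig, X_bkg, k=2):
            """Filter-style variable ranking: score each feature by the separation of
            the signal and background means in units of the combined spread."""
            separation = np.abs(X_sig.mean(axis=0) - X_bkg.mean(axis=0))
            spread = np.sqrt(X_sig.var(axis=0) + X_bkg.var(axis=0))
            return np.argsort(separation / spread)[::-1][:k]

        # Toy data: four features, of which feature 2 discriminates strongly and
        # feature 0 weakly; features 1 and 3 carry no signal at all.
        rng = np.random.default_rng(0)
        X_bkg = rng.normal(0.0, 1.0, size=(5000, 4))
        X_sig = rng.normal(0.0, 1.0, size=(5000, 4))
        X_sig[:, 2] += 1.0
        X_sig[:, 0] += 0.3

        top = rank_features(X_sig, X_bkg)   # ranks feature 2 first, then feature 0
        ```

        Real analyses would use more robust criteria (redundancy between variables, importance extracted from a trained classifier), which is where the methods surveyed in the talk come in.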
      • 15:45
        TMVA tutorial 2h 15m
        This tutorial will both give an introduction on how to use TMVA in ROOT 6 and showcase some new features, such as modularity, variable importance, and interfaces to R and Python. After explaining the basic functionality, the typical steps required during a real-life application (such as variable selection, pre-processing, tuning and classifier evaluation) will be demonstrated on simple examples. The first part of the tutorial will use the usual ROOT interface (please make sure you have ROOT 6.04 installed somewhere). The second part will use the new server notebook functionality of ROOT as a Service. If you are at CERN but outside the venue, or outside CERN, please consult the attached notes.
        Speakers: Helge Voss (Max-Planck-Gesellschaft (DE)), Dr Sergei Gleyzer (University of Florida (US))
    • 19:00 20:00
      Reception 1h Restaurant 1


    • 09:00 13:00
      Tuesday Morning Session 222/R-001

      Convener: Cecile Germain-Renaud (LRI)
      • 09:00
        ME technique plus experience ttH 45m
        The Matrix Element Method (MEM) is a HEP-specific technique to directly calculate the likelihood for a collision event based on the "matrix elements" of quantum field theory and a simplified detector description. The goal of this talk is to describe the matrix element method and current implementations, and to compare them with other multivariate approaches.
        Speaker: Lorenzo Bianchini (Eidgenoessische Tech. Hochschule Zuerich (CH))
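        The per-event likelihood at the heart of the MEM can be made concrete in one dimension: convolve a parameter-dependent truth-level density (standing in for |M|²) with a detector transfer function, then scan the summed log-likelihood. The exponential density, Gaussian smearing and all constants below are toy assumptions, not the talk's implementation.

        ```python
        import numpy as np

        # Toy 1D analogue of the MEM: the per-event likelihood is the convolution of a
        # theta-dependent truth-level density with a Gaussian detector transfer
        # function, computed by numerical integration over the truth variable.
        grid = np.linspace(0.0, 10.0, 2000)
        dy = grid[1] - grid[0]

        def event_likelihood(x, theta, sigma=0.3):
            truth = theta * np.exp(-theta * grid)      # truth density, slope theta
            transfer = np.exp(-0.5 * ((x - grid) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
            return np.sum(truth * transfer) * dy       # "phase-space integral" (1D here)

        rng = np.random.default_rng(0)
        y_true = rng.exponential(scale=1.0 / 1.5, size=500)    # generated with theta = 1.5
        x_obs = y_true + rng.normal(0.0, 0.3, size=500)        # after detector smearing

        thetas = np.linspace(0.5, 3.0, 26)
        log_like = [sum(np.log(event_likelihood(x, t)) for x in x_obs) for t in thetas]
        theta_hat = thetas[int(np.argmax(log_like))]           # recovers a value near 1.5
        ```

        The computational pain of the real method comes from the fact that the phase-space integral is high-dimensional, which is what tools like MadWeight (covered in the Tuesday tutorial) address.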
      • 09:45
        ABC Method 45m
        Approximate Bayesian computation (ABC) is the name given to a collection of Monte Carlo algorithms used for fitting complex computer models to data. The methods rely upon simulation, rather than likelihood based calculation, and so can be used to calibrate a much wider set of simulation models. The simplest version of ABC is intuitive: we sample repeatedly from the prior distribution, and accept parameter values that give a close match between the simulation and the data. This has been extended in many ways, for example, reducing the dimension of the datasets using summary statistics and then calibrating to the summaries instead of the full data; using more efficient Monte Carlo algorithms (MCMC, SMC, etc); and introducing modelling approaches to overcome computational cost and to minimize the error in the approximation. The two key challenges for ABC methods are i) dealing with computational constraints; and ii) finding good low dimensional summaries. Much of the early work on i) was based upon finding efficient sampling algorithms, adapting methods such as MCMC and sequential Monte Carlo methods, to more efficiently find good regions of parameter space. Although these methods can dramatically reduce the amount of computation needed, they still require hundreds of thousands of simulations. Recent work has instead focused on the use of meta-models or emulators. These are cheap statistical surrogates that approximate the simulator, and which can be used in place of the simulator to find the posterior distribution. A key question when using these methods concerns the experimental design: where should we next run the simulator, in order to maximise our information about the posterior distribution?
        Speaker: Richard Wilkinson (University of Sheffield)
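        The "simplest version of ABC" described above fits in a few lines of rejection sampling. The toy calibration problem below (inferring a Gaussian mean with a flat prior and the sample mean as summary statistic) is an illustrative assumption, not the speaker's example.

        ```python
        import numpy as np

        def abc_rejection(s_obs, prior_sample, simulate, summary, eps, n_draws, rng):
            """Rejection ABC: draw parameters from the prior, simulate a dataset, and
            keep the draws whose summary statistic lands within eps of the observed one."""
            accepted = []
            for _ in range(n_draws):
                theta = prior_sample(rng)
                if abs(summary(simulate(theta, rng)) - s_obs) < eps:
                    accepted.append(theta)
            return np.array(accepted)

        # Toy problem: infer the mean of a unit-variance Gaussian from 200 observations,
        # with a flat prior on [-5, 5] and the sample mean as the summary statistic.
        rng = np.random.default_rng(1)
        data = rng.normal(2.0, 1.0, size=200)
        posterior = abc_rejection(
            s_obs=data.mean(),
            prior_sample=lambda r: r.uniform(-5.0, 5.0),
            simulate=lambda mu, r: r.normal(mu, 1.0, size=200),
            summary=np.mean,
            eps=0.1,
            n_draws=10000,
            rng=rng,
        )
        # The accepted draws approximate the posterior; their mean is near 2.
        ```

        The inefficiency is visible immediately: most of the 10,000 simulations are thrown away, which is exactly the motivation for the MCMC/SMC variants and emulator-based methods the talk discusses.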
      • 10:30
        Coffee Break 15m
      • 10:45
        Approximate Likelihood 45m
        Most physics results at the LHC end in a likelihood ratio test. This includes discovery and exclusion for searches as well as mass, cross-section, and coupling measurements. The use of Machine Learning (multivariate) algorithms in HEP is mainly restricted to searches, which can be reduced to classification between two fixed distributions: signal vs. background. I will show how we can extend the use of ML classifiers to distributions parameterized by physical quantities like masses and couplings, as well as by nuisance parameters associated to systematic uncertainties. This allows one to approximate the likelihood ratio while still using a high-dimensional feature vector for the data. Both the MEM and ABC approaches mentioned above aim to provide inference on model parameters (like cross-sections, masses, couplings, etc.). ABC is fundamentally tied to Bayesian inference and focuses on the "likelihood free" setting where only a simulator is available and one cannot directly compute the likelihood for the data. The MEM approach tries to directly compute the likelihood by approximating the detector. This approach is similar to ABC in that it provides parameter inference in the "likelihood free" setting by using a simulator, but it does not require one to use Bayesian inference and it cleanly separates issues of statistical calibration from the approximations that are being made. The method is much faster to evaluate than the MEM approach and does not require a simplified detector description. Furthermore, it is a generalization of the LHC experiments' current use of multivariate classifiers for searches and integrates well into our existing statistical procedures.
        Speaker: Kyle Stuart Cranmer (New York University (US))
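        One way to see how a classifier can stand in for a likelihood ratio: for samples from two densities p0 and p1, a well-calibrated classifier score s(x) satisfies s/(1−s) ≈ p1(x)/p0(x). A numpy-only sketch on a 1D Gaussian toy (the logistic model and all parameters are illustrative, not the talk's actual implementation):

        ```python
        import numpy as np

        # Train a logistic classifier to separate samples drawn from p0 = N(0,1) and
        # p1 = N(1,1); its calibrated score s(x) then estimates the likelihood ratio
        # via s / (1 - s). For this toy the exact ratio is exp(x - 0.5).
        rng = np.random.default_rng(0)
        x0 = rng.normal(0.0, 1.0, size=20000)     # "background" ~ p0
        x1 = rng.normal(1.0, 1.0, size=20000)     # "signal" ~ p1
        x = np.concatenate([x0, x1])
        y = np.concatenate([np.zeros_like(x0), np.ones_like(x1)])

        w, b = 0.0, 0.0                           # logistic model s = sigmoid(w*x + b)
        for _ in range(3000):                     # full-batch gradient descent
            s = 1.0 / (1.0 + np.exp(-(w * x + b)))
            w -= 0.5 * np.mean((s - y) * x)
            b -= 0.5 * np.mean(s - y)

        def ratio_hat(xq):
            s = 1.0 / (1.0 + np.exp(-(w * xq + b)))
            return s / (1.0 - s)                  # approximate p1/p0
        ```

        The talk's extension is to parameterize the classifier in the physics and nuisance parameters, and to calibrate the resulting ratio so it can be used inside the standard statistical procedures.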
      • 11:30
        Stochastic optimization: beyond mathematical programming 45m
        Stochastic optimization, including bio-inspired algorithms, is gaining momentum in areas where more classical optimization algorithms fail to deliver satisfactory results, or simply cannot be directly applied. This presentation will introduce baseline stochastic optimization algorithms, and illustrate their efficiency in different domains, from continuous non-convex problems to combinatorial optimization problems, to problems for which a non-parametric formulation can help explore unforeseen solution spaces.
        Speaker: Marc Schoenauer (INRIA)
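        A classic baseline of the kind the talk introduces is the (1+1) evolution strategy with 1/5th-success-rule step-size adaptation. The Rastrigin test function and all constants below are illustrative choices, not the speaker's material.

        ```python
        import numpy as np

        def rastrigin(x):
            """Standard multimodal (non-convex) test function; global minimum 0 at the origin."""
            x = np.asarray(x, dtype=float)
            return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

        def one_plus_one_es(f, x0, sigma=0.5, iters=2000, seed=0):
            """(1+1)-ES: mutate, keep the better point, and adapt the step size so that
            roughly one proposal in five succeeds (the classic 1/5th success rule:
            1.1^p * 0.98^(1-p) = 1 at p ~ 0.2)."""
            rng = np.random.default_rng(seed)
            x = np.asarray(x0, dtype=float)
            fx = f(x)
            for _ in range(iters):
                cand = x + sigma * rng.standard_normal(x.shape)
                fc = f(cand)
                if fc <= fx:
                    x, fx = cand, fc
                    sigma *= 1.1    # success: widen the search
                else:
                    sigma *= 0.98   # failure: narrow it
            return x, fx

        x_best, f_best = one_plus_one_es(rastrigin, [3.0, -2.0])  # improves on f([3,-2]) = 13
        ```

        No gradients are required, which is why such methods apply where mathematical programming cannot.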
      • 12:15
        Software R&D for Next Generation of HEP Experiments, Inspired by Theano 45m
        In the next decade, the frontiers of High Energy Physics (HEP) will be explored by three machines: the High Luminosity Large Hadron Collider (HL-LHC) in Europe, the Long Baseline Neutrino Facility (LBNF) in the US, and the International Linear Collider (ILC) in Japan. These next-generation experiments must address two fundamental problems in the current generation of HEP experimental software: the inability to take advantage of, and adapt to, the rapidly evolving processor landscape, and the difficulty for physicists of developing and maintaining increasingly complex software systems. I will propose a strategy, inspired by the automatic optimization and code generation in Theano, to simultaneously address both problems. I will describe three R&D projects with short-term physics deliverables aimed at developing this strategy. The first project is to develop a maximally sensitive General Search for New Physics at the LHC by applying the Matrix Element Method running on the GPUs of HPCs. The second is to classify and reconstruct Liquid Argon Time Projection Chamber (LArTPC) events with Deep Learning techniques. The final project is to optimize tomographic reconstruction of LArTPC events, inspired by medical imaging.
        Speaker: Amir Farbin (University of Texas at Arlington (US))
    • 13:00 14:00
      Lunch Break 1h 222/R-001

    • 14:00 14:45
      Symposium 222/R-001

      Convener: Jean-Roch Vlimant (California Institute of Technology (US))
      • 14:00
        Better Cities through Imaging 45m
        I will describe how persistent, synoptic imaging of an urban skyline can be used to better understand a city, in analogy to the way persistent, synoptic imaging of the sky can be used to better understand the heavens. At the newly created Urban Observatory at the Center for Urban Science and Progress (CUSP), we are combining techniques from the domains of astronomy, computer vision, remote sensing, and machine learning to address a myriad of questions related to urban informatics. I will go through several specific methodological examples, including energy consumption, public health, and air quality, which can lead to improved city functioning and quality of life.
        Speaker: Gregory Dobler (NYU CUSP)
    • 14:45 15:00
      Coffee Break 15m 222/R-001

    • 15:00 18:10
      Tutorials II: Matrix Element Method (MEM) 222/R-001

      Convener: Vladimir Gligorov (CERN)
      • 15:00
        Introduction 1m
        Speaker: Kyle Stuart Cranmer (New York University (US))
      • 15:01
        Cross Disciplinary Discussion 19m
      • 15:20
        MadWeight Tutorial 1h
        This tutorial will introduce the matrix-element method, which will then be used to extract the top-quark mass from a data sample. For this you will learn how to use MadWeight, a program that performs the phase-space integration and returns the weight of the matrix-element method. We will discuss the speed issue and the various options in MadWeight to reduce the CPU time of your computation.
        Speaker: Olivier Pierre C Mattelaer (IPPP Durham)
      • 16:20
        Break 20m
      • 16:40
        MemTk (Matrix Element Toolkit) Tutorial 50m
        This session will include a tutorial for tools developed for MEM calculation and lightning talks about recent experience with MEM in ATLAS and CMS.
        Speakers: Oliver Maria Kind (Humboldt-Universitaet zu Berlin (DE)), Patrick Rieck (Humboldt-Universitaet zu Berlin (DE)), Soren Stamm (Humboldt-Universitaet zu Berlin (DE))
      • 17:30
        Recent Experience, Challenges, & Discussion 30m
        Speakers: Lorenzo Bianchini (Eidgenoessische Tech. Hochschule Zuerich (CH)), Olaf Nackenhorst (Universite de Geneve (CH)), Olivier Pierre C Mattelaer (IPPP Durham), Patrick Rieck (Humboldt-Universitaet zu Berlin (DE))
    • 09:00 13:00
      Wednesday Morning Session 222/R-001

      Convener: Maria Spiropulu (California Institute of Technology (US))
      • 09:00
        Data science in ALICE 45m
        ALICE is the LHC experiment dedicated to the study of heavy-ion collisions. In particular, the detector features low-momentum tracking and vertexing, and comprehensive particle identification capabilities. In a single central heavy-ion collision at the LHC, thousands of particles per unit rapidity are produced, making the data volume, track reconstruction and search for rare signals particularly challenging. Data science and machine learning techniques could help to tackle some of the challenges outlined above. In this talk, we will discuss some early attempts to use these techniques for the processing of detector signals and for physics analysis. We will also highlight the most promising areas for the application of these methods.
        Speaker: Michele Floris (CERN)
      • 09:45
        Deep Learning and its Applications in the Natural Sciences 45m
        Starting from a brief historical perspective on scientific discovery, this talk will review some of the theory and open problems of deep learning and describe how to design efficient feedforward and recursive deep learning architectures for applications in the natural sciences. In particular, the focus will be on multiple-particle problems at different scales: in biology (e.g. prediction of protein structures), chemistry (e.g. prediction of molecular properties and reactions), and high-energy physics (e.g. detection of exotic particles, jet substructure and tagging, "dark matter and dark knowledge").
        Speaker: Pierre Baldi (UCI)
      • 10:30
        Coffee Break 15m
      • 10:45
        A ground-up construction of deep learning 45m
        I propose to give a ground-up construction of deep learning as it stands in its modern state. Starting from its beginnings in the 1990s, I plan on showing the relevant (for physics) differences in optimization, construction, activation functions, initialization, and other tricks that have accrued over the last 20 years. In addition, I plan on showing why deeper, wider basic feedforward architectures can be used. Coupling this with MaxOut layers, modern GPUs, and both L1 and L2 forms of regularization, we have the current "state of the art" in basic feedforward networks. I plan on discussing pre-training using deep autoencoders and RBMs, and explaining why this has fallen out of favor when you have lots of labeled data. While discussing each of these points, I propose to explain why these particular characteristics are valuable for HEP. Finally, the last topic on basic feedforward networks: interpretation. I plan on discussing latent representations of important variables (i.e., mass, pT) that are contained in a dense or distributed fashion inside the hidden layers, as well as nifty ways of extracting variable importance. I also propose a short discussion on dark knowledge, i.e., training very deep, very wide neural nets and then using their outputs as targets for a smaller, shallower neural network; this has been shown to be incredibly useful for focusing the network to learn important information. Why is this relevant for physics? Well, we could think of trigger-level or hardware-level applications, where we need FPGA-level (for example) implementations of nets that cannot be very deep.
Then I propose to discuss (relatively briefly) the use cases of convolutional networks (one current area of research for me) and recurrent neural networks in physics, as well as giving a broad overview of what they are and what domains they typically belong to, i.e., jet-image work with convolutional nets, or jet tagging that reads in info from each track in the case of RNNs.
        Speaker: Luke Percival De Oliveira (SLAC National Accelerator Laboratory (US))
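        The "ground-up" machinery the talk builds on really is only a few dozen lines. The XOR toy below (one tanh hidden layer, sigmoid output, cross-entropy loss, plain full-batch gradient descent; all sizes, seeds and learning rates are illustrative choices) shows a complete forward and backward pass.

        ```python
        import numpy as np

        # Minimal feedforward network trained on XOR from scratch.
        rng = np.random.default_rng(0)
        X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
        y = np.array([[0.], [1.], [1.], [0.]])

        W1, b1 = rng.normal(0.0, 1.0, size=(2, 8)), np.zeros(8)   # input -> hidden
        W2, b2 = rng.normal(0.0, 1.0, size=(8, 1)), np.zeros(1)   # hidden -> output
        lr = 0.5

        for _ in range(5000):
            h = np.tanh(X @ W1 + b1)                      # forward pass
            p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
            dout = (p - y) / len(X)                       # grad of CE loss w.r.t. pre-sigmoid
            dW2, db2 = h.T @ dout, dout.sum(axis=0)
            dh = (dout @ W2.T) * (1.0 - h ** 2)           # backprop through tanh
            dW1, db1 = X.T @ dh, dh.sum(axis=0)
            W1 -= lr * dW1; b1 -= lr * db1
            W2 -= lr * dW2; b2 -= lr * db2

        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        pred = (p > 0.5).astype(int).ravel()              # should reproduce the XOR truth table
        ```

        Everything the abstract lists (activations, initialization, regularization, MaxOut, GPUs) is a refinement of exactly this loop.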
      • 11:30
        Neuromorphic silicon chips 45m
        Neuromorphic silicon chips have been developed over the last 30 years, inspired by the design of biological nervous systems and offering an alternative paradigm for computation, with real-time massively parallel operation and potentially large power savings with respect to conventional computing architectures. I will present the general principles with a brief investigation of the design choices that have been explored, and I'll discuss how such hardware has been applied to problems such as classification.
        Speakers: Giacomo Indiveri (INI Zurich), Sim Bamford (INI Labs)
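        The basic computational unit such chips emulate, a leaky integrate-and-fire neuron, can be sketched in software in a few lines; the time constants, threshold and input below are illustrative values, not figures from the talk.

        ```python
        import numpy as np

        def lif_spikes(current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
            """Leaky integrate-and-fire neuron: the membrane potential leaks toward zero,
            integrates the input current, and emits a spike (then resets) at threshold."""
            v, spike_times = 0.0, []
            for step, i_in in enumerate(current):
                v += (dt / tau) * (-v + i_in)   # Euler step of the membrane equation
                if v >= v_thresh:
                    spike_times.append(step * dt)
                    v = v_reset
            return spike_times

        # A constant supra-threshold input drives regular spiking (roughly 45 spikes
        # over one simulated second with these constants).
        spikes = lif_spikes(np.full(1000, 1.5))
        ```

        The neuromorphic point is that hardware implements this dynamics directly in analog or mixed-signal silicon, massively in parallel, rather than stepping it in a loop as above.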
      • 12:15
        Artificial Intelligence and the Future of Science 45m
        Dr. Demis Hassabis is the Co-Founder and CEO of DeepMind, the world’s leading General Artificial Intelligence (AI) company, which was acquired by Google in 2014 in their largest ever European acquisition. Demis will draw on his eclectic experiences as an AI researcher, neuroscientist and videogames designer to discuss what is happening at the cutting edge of AI research, its future impact especially in helping with scientific advances in other fields such as physics, and how developing AI may help us better understand the human mind.
        Speaker: Demis Hassabis (Google DeepMind)
        • Response 5m
          Speaker: Maria Spiropulu (California Institute of Technology (US))
    • 13:00 14:00
      Lunch Break 1h 222/R-001

    • 14:00 14:45
      Symposium 222/R-001

      Convener: Jean-Roch Vlimant (California Institute of Technology (US))
      • 14:00
        Scalable Gaussian Processes and the search for exoplanets 45m
        Gaussian Processes are a class of non-parametric models that are often used to model stochastic behavior in time series or spatial data. A major limitation for the application of these models to large datasets is the computational cost. The cost of a single evaluation of the model likelihood scales as the third power of the number of data points. In the search for transiting exoplanets, the datasets of interest have tens of thousands to millions of measurements with uneven sampling, rendering naive application of a Gaussian Process model impractical. To attack this problem, we have developed robust approximate methods for Gaussian Process regression that can be applied at this scale. I will describe the general problem of Gaussian Process regression and offer several applicable use cases. Finally, I will present our work on scaling this model to the exciting field of exoplanet discovery and introduce a well-tested open source implementation of these new methods.
        Speaker: Daniel Foreman-Mackey (University of Washington)
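        The cubic cost the abstract refers to comes from factorising the n×n kernel matrix. Here is a minimal numpy sketch of exact (deliberately non-scalable) GP regression with a squared-exponential kernel; the toy data and hyperparameters are illustrative assumptions.

        ```python
        import numpy as np

        def gp_predict(x_train, y_train, x_test, length=1.0, sigma_n=0.1):
            """Naive GP regression with a squared-exponential kernel. The Cholesky
            factorisation of the n x n kernel matrix is the O(n^3) step that makes
            large datasets impractical and motivates scalable approximations."""
            def kern(a, b):
                return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
            K = kern(x_train, x_train) + sigma_n ** 2 * np.eye(len(x_train))
            L = np.linalg.cholesky(K)                                   # O(n^3)
            alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
            return kern(x_test, x_train) @ alpha                        # posterior mean

        # Toy data: 50 noiseless samples of sin(x); the posterior mean interpolates them.
        x = np.linspace(0.0, 6.0, 50)
        mu = gp_predict(x, np.sin(x), np.array([1.5, 3.0]))
        ```

        With tens of thousands to millions of unevenly sampled points, as in transit searches, this direct solve is exactly what the talk's approximate methods replace.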
    • 14:45 15:00
      Coffee Break 15m 222/R-001

    • 15:00 18:00
      Tutorial III: Deep Learning 222/R-001

      Convener: Andrew Lowe (Hungarian Academy of Sciences (HU))
      • 15:00
        Deep Learning Tutorial 3h
        This tutorial will introduce the latest deep learning software packages and explain how to get started using deep neural networks. We will train deep neural networks using the Theano and Pylearn2 software packages in Python, and then replicate results from Baldi et al. (2014), "Searching for exotic particles in high-energy physics with deep learning". We will also discuss techniques for model selection and automatic hyperparameter tuning. NOTE: In order to run the examples yourself, please install Theano on your system prior to arrival, and then download pylearn2 from GitHub and put it in your Python path. http://deeplearning.net/software/theano/install.html https://github.com/lisa-lab/pylearn2/tree/master/pylearn2
        Speaker: Peter Sadowski (University of California Irvine)
    • 19:00 22:00
      Buffet & Cocktail 3h Restaurant 1


    • 09:00 13:00
      Thursday Morning Session 222/R-001

      Convener: Xabier Cid Vidal (CERN)
      • 09:00
        Data science in LHCb 45m
        Machine learning is used at all stages of the LHCb experiment. It is routinely used: in the process of deciding which data to record and which to reject forever, as part of the reconstruction algorithms (feature engineering), and in the extraction of physics results from our data. This talk will highlight current use cases, as well as ideas for ambitious future applications, and how we can collaborate on them.
        Speaker: Tim Head (Ecole Polytechnique Federale de Lausanne (CH))
      • 09:45
        Reusing ML tools and approaches for HEP data analysis 45m
        In this talk I will give an overview of the ML tools and services developed by the Yandex School of Data Analysis (YSDA) team. In particular, I will focus on the approaches our team developed in collaboration with LHCb on HEP data analysis (uGB+FL, GB-reweighting). Each approach is implemented in the hep_ml Python package. To get acquainted with this tool, you can install it right away in your own environment or experiment with it on the Reproducible Experiment Platform. I will give initial guidance on how to get started with it.
        Speaker: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
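        The GB-reweighting approach in hep_ml generalizes classical histogram reweighting, using gradient-boosted trees to choose the binning adaptively in many dimensions. As an illustration of the underlying idea only, here is the one-dimensional histogram version in plain numpy (this is not the hep_ml API; names are illustrative):

```python
import numpy as np

def histogram_reweight(source, target, bins=20):
    """Per-bin weights that make `source`'s distribution match `target`'s.

    GB-reweighting replaces the fixed histogram with bins chosen
    iteratively by gradient-boosted decision trees, which scales
    this idea to many correlated variables.
    """
    edges = np.histogram_bin_edges(np.concatenate([source, target]), bins)
    n_src, _ = np.histogram(source, edges)
    n_tgt, _ = np.histogram(target, edges)
    ratio = np.where(n_src > 0, n_tgt / np.maximum(n_src, 1), 1.0)
    idx = np.clip(np.digitize(source, edges) - 1, 0, bins - 1)
    return ratio[idx]

rng = np.random.default_rng(1)
source = rng.normal(0.0, 1.0, 5000)   # e.g. simulated events
target = rng.normal(0.5, 1.0, 5000)   # e.g. recorded data
w = histogram_reweight(source, target)
```

        After reweighting, the weighted source distribution matches the target, which is the typical simulation-to-data correction task this technique is used for.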
      • 10:30
        Coffee Break 15m
      • 10:45
        Real Time Processing 45m
        The LHC provides experiments with an unprecedented amount of data. Experimental collaborations need to meet storage and computing requirements for the analysis of this data: this is often a limiting factor in the physics program that would be achievable if the whole dataset could be analysed. In this talk, I will describe the strategies adopted by the LHCb, CMS and ATLAS collaborations to overcome these limitations and make the most of LHC data: data parking, data scouting, and real-time analysis.
        Speakers: Caterina Doglioni (Lund University (SE)), Dustin James Anderson (California Institute of Technology (US)), Vladimir Gligorov (CERN)
      • 11:30
        The Retina Algorithm 45m
        Charged-particle reconstruction is one of the most demanding computational tasks in HEP, and it is becoming increasingly important to perform it in real time. We envision that HEP would greatly benefit from achieving the long-term goal of making track reconstruction happen transparently as part of the detector readout ("detector-embedded tracking"). We describe here a track-reconstruction approach based on a massively parallel pattern-recognition algorithm, inspired by studies of how the brain processes visual images in nature (the "retina algorithm"). It turns out that high-quality tracking in large HEP detectors is possible with very small latencies when this algorithm is implemented in specialized processors based on current state-of-the-art, high-speed/high-bandwidth digital devices.
        Speakers: Giovanni Punzi (Universita di Pisa & INFN (IT)), Luciano Frances Ristori (Fermi National Accelerator Lab. (US))
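        To illustrate the idea (a toy numpy sketch, not the authors' implementation), a retina-style response map: each cell of a track-parameter grid accumulates a Gaussian response from every hit, and a track appears as a peak in the map. In hardware, all cells are evaluated in parallel:

```python
import numpy as np

def retina_response(hits, slopes, intercepts, sigma=0.05):
    """Response of each (m, q) cell of a 2D track-parameter grid:
    the sum over hits of a Gaussian in the hit's distance to the
    candidate line y = m*x + q."""
    x, y = hits[:, 0], hits[:, 1]
    m = slopes[:, None, None]
    q = intercepts[None, :, None]
    d = y[None, None, :] - (m * x[None, None, :] + q)
    return np.exp(-0.5 * (d / sigma) ** 2).sum(axis=2)

# Hits from a straight track y = 0.5*x + 0.2 with small smearing.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
hits = np.column_stack([x, 0.5 * x + 0.2 + rng.normal(0, 0.01, 10)])

slopes = np.linspace(-1, 1, 41)
intercepts = np.linspace(-0.5, 0.5, 41)
R = retina_response(hits, slopes, intercepts)
i, j = np.unravel_index(R.argmax(), R.shape)
best_m, best_q = slopes[i], intercepts[j]
```

        The peak cell recovers the track parameters; interpolating around the peak gives sub-cell resolution, as in the retina papers.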
      • 12:15
        Machine learning, computer vision, and probabilistic models in jet physics 45m
        In this talk we present recent developments in the application of machine learning, computer vision, and probabilistic models to the analysis and interpretation of LHC events. First, we will introduce the concept of jet images and computer vision techniques for jet tagging. Jet images connect jet substructure and tagging with the fields of computer vision and image processing for the first time, improving the performance in identifying highly boosted W bosons with respect to state-of-the-art methods. They also provide a new way to visualize the discriminant features of different classes of jets, adding a new capability to understand the physics within jets and to design more powerful jet tagging methods. Second, we will present fuzzy jets: a new paradigm for jet clustering using machine learning methods. Fuzzy jets treat jet clustering as an unsupervised learning task and incorporate a probabilistic assignment of particles to jets to learn new features of the jet structure. In particular, we will show how fuzzy jets can learn the shape of jets, providing a new observable that improves W boson and top tagging performance in highly boosted final states.
        Speakers: Ben Nachman (SLAC National Accelerator Laboratory (US)), Michael Aaron Kagan (SLAC National Accelerator Laboratory (US))
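        As a rough sketch of the mixture-model view behind fuzzy jets (not the authors' algorithm; the seeding, the fixed Gaussian width, and all names are illustrative simplifications):

```python
import numpy as np

def fuzzy_cluster(points, pt, sigma=0.3, n_iter=50):
    """Soft jet clustering: a pT-weighted EM update of two fixed-width
    Gaussian centres in the (eta, phi) plane. Each particle gets a
    probabilistic membership in every jet instead of a hard assignment."""
    centres = points[[0, -1]].astype(float)  # toy seeding: two far-apart particles
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-0.5 * d2 / sigma**2)
        resp /= resp.sum(axis=1, keepdims=True)   # soft memberships
        w = resp * pt[:, None]                    # pT-weighted responsibilities
        centres = (w[:, :, None] * points[:, None, :]).sum(0) / w.sum(0)[:, None]
    return centres, resp

# Two toy "jets" of particles in the (eta, phi) plane.
rng = np.random.default_rng(3)
jet1 = rng.normal([0.0, 0.0], 0.1, size=(30, 2))
jet2 = rng.normal([1.5, 1.0], 0.1, size=(30, 2))
points = np.vstack([jet1, jet2])
pt = rng.uniform(1, 10, size=60)
centres, memberships = fuzzy_cluster(points, pt)
```

        The learned widths (held fixed here for simplicity) are the kind of shape observable the fuzzy-jets work exploits for tagging.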
    • 13:00 → 14:00
      Lunch Break 1h 222/R-001

      222/R-001

      CERN

      200
      Show room on map
    • 14:00 → 14:45
      Symposium 222/R-001

      222/R-001

      CERN

      200
      Show room on map
      Convener: Jean-Roch Vlimant (California Institute of Technology (US))
      • 14:00
        High-dimensional model estimation and model selection 45m
        I will review concepts and algorithms from high-dimensional statistics for linear model estimation and model selection. I will particularly focus on the so-called p>>n setting where the number of variables p is much larger than the number of samples n. I will focus mostly on regularized statistical estimators that produce sparse models. Important examples include the LASSO and its matrix extension, the Graphical LASSO, and more recent non-convex methods such as the TREX. I will show the applicability of these estimators in a diverse range of scientific applications, such as sparse interaction graph recovery and high-dimensional classification and regression problems in genomics.
        Speaker: Christian Mueller (Simons Foundation)
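        As a small illustration of the sparsity mechanism the talk discusses, here is the LASSO fitted by cyclic coordinate descent with soft-thresholding on a p >> n toy problem (a generic numpy sketch, not the speaker's code; names are illustrative):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent: minimises
    (1/2n)||y - X b||^2 + lam * ||b||_1. The soft-thresholding update
    sets small coefficients exactly to zero, producing the sparse
    models that make the LASSO usable when p >> n."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

# p >> n toy problem: only 3 of 100 variables are truly active.
rng = np.random.default_rng(4)
n, p = 40, 100
X = rng.normal(size=(n, p))
true_beta = np.zeros(p); true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + rng.normal(0, 0.1, n)
beta = lasso_cd(X, y, lam=0.1)
```

        The estimator recovers the three active variables and zeroes out most of the rest, at the cost of a small shrinkage bias on the active coefficients.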
    • 14:45 → 15:00
      Coffee Break 15m 222/R-001

      222/R-001

      CERN

      200
      Show room on map
    • 15:00 → 18:45
      Thursday Afternoon Session 222/R-001

      222/R-001

      CERN

      200
      Show room on map
      Convener: Maurizio Pierini (CERN)
      • 15:00
        An introduction to machine learning with Scikit-Learn 2h 15m
        This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.
        Speaker: Dr Gilles Louppe (CERN)
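        A minimal example in the spirit of the tutorial, chaining the Scikit-Learn facilities mentioned in the abstract (pipelines, model selection, evaluation) on synthetic stand-in data; the dataset and parameter grid are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a signal-vs-background HEP dataset.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A pipeline chains preprocessing and classifier into one estimator;
# grid search performs cross-validated model selection over the
# regularisation strength.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5)
grid.fit(X_train, y_train)
score = grid.score(X_test, y_test)
```

        Because the scaler lives inside the pipeline, it is refit on each cross-validation fold, avoiding the information leakage that scaling the full dataset up front would cause.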
      • 17:15
        TMVA R/Scikit-learn interface 45m
        In this tutorial we show how to use external classifiers from R and scikit-learn within TMVA.
        Speaker: Dr Sergei Gleyzer (University of Florida (US))