Data Science @ LHC 2015 Workshop

Name: Data Science @ LHC 2015 Workshop
Start: 2015-11-09T08:30:00+01:00
End: 2015-11-13T18:00:00+01:00
Location: CERN

9 Nov 2015, 08:30 → 13 Nov 2015, 18:00 Europe/Zurich

222/R-001 (CERN)

200

Andrew Lowe (Hungarian Academy of Sciences (HU)), Cecile Germain-Renaud (LRI), Daniel Whiteson (University of California Irvine (US)), David Rousseau (LAL-Orsay, FR), Gilles Louppe (CERN), Jean-Roch Vlimant (California Institute of Technology (US)), Kyle Stuart Cranmer (New York University (US)), Maria Spiropulu (California Institute of Technology (US)), Maurizio Pierini (California Institute of Technology (US)), Vladimir Gligorov (CERN)

Description

The LHC experiments have been producing the largest volumes of complex data. Real-time analysis of data streams at 100 TB/s and analyses of 100 EB of data are anticipated and planned for. The field of data science, going beyond statistical methods, has been producing advanced, intelligent methods for data analysis, pattern recognition and model inference. This workshop will engage the two communities in cross-exchanges and applications that can accelerate progress on big questions in basic science.

Topics that will be addressed include cutting-edge pattern recognition methods for elementary particle identification; intelligent detectors that learn from their failures and self-adjust to increase their performance; fast reconstruction of charged particle tracks; high-rate event selection algorithms that learn to select rare physics processes; advanced data techniques that can guide discovery; and other challenges that can profit from advanced computational methods and resources.

The workshop includes plenary presentations, tutorials and hands-on, hackathon-style ML exercises, as well as directed and undirected discussion and brainstorming time.

Subscribe to the participants mailing list for discussions on the topic and announcements before and during the workshop by sending email to: HEP-data-science+subscribe@googlegroups.com

Follow the official workshop account @DataScienceLHC. Feel free to tweet using the recommended hashtag #DSLHC15.

The workshop will take place at CERN and is open to anyone with an interest in Data Science applications to High Energy Physics. There are no fees, but registration for in-person attendance is necessary for organizational purposes. Registration is a prerequisite for non-CERN users to gain access to the CERN site during the workshop.

Registration is closed at this time. However, the event will be available by video conference and on the CERN webcast.

For accommodation and access to CERN, as well as laptop registration, check the registration page.

Participants

249

Webcast

There is a live webcast for this event

Monday 9 November
- 08:30 → 13:00
  Welcome and Introduction 222/R-001
  
  Convener: Jean-Roch Vlimant (California Institute of Technology (US))
  - 09:00
    
    Welcome 15m
    
    Speaker: Jean-Roch Vlimant (California Institute of Technology (US))
    
    Intro_DSLHC15_9Nov2015.pdf
    
    Recording
  - 09:15
    
    Data and Science in HEP 45m
    
    Speaker: Vincenzo Innocente (CERN)
    
    HEP@ML15.pdf
    
    Recording
  - 10:00
    
    Data Science in industry 45m
    
    Speaker: Ellie Dobson (Pivotal)
    
    CERN talk.pdf
    
    Recording
  - 10:45
    
    Coffee Break 15m
  - 11:00
    
    ML at ATLAS&CMS : setting the stage 40m
    
    In the early days of the LHC the canonical problems of classification and regression were mostly addressed using simple cut-based techniques. Today, ML techniques (some already pioneered in pre-LHC or non collider experiments) play a fundamental role in the toolbox of any experimentalist. The talk will introduce, through a representative collection of examples, the problems addressed with ML techniques at the LHC. The goal of the talk is to set the stage for a constructive discussion with non-HEP ML practitioners, focusing on the specificities of HEP applications.
    
    Speaker: Mauro Donega (Eidgenoessische Tech. Hochschule Zuerich (CH))
    
    MLatLHC.pdf
    
    Recording
  - 11:40
    Preparing for the future: opportunities for ML in ATLAS & CMS 40m
    
    ML is an established tool in HEP and there are many examples which demonstrate its importance for the kinds of classification and regression problems we have in our field. However, there is also great potential for future applications in as yet untapped areas. I will summarise these opportunities and highlight recent, ongoing and planned studies of novel ML applications in HEP. Certain aspects of the problems we face in HEP are quite unique and represent interesting benchmark problems for the ML community as a whole. Hence, efficient communication and close interaction between the ML and HEP communities is expected to lead to promising cross-fertilisation. This talk attempts to serve as a starting point for such a prospective collaboration.
    
    Speaker: Tobias Golling (Universite de Geneve (CH))
    
    DataScience_Golling_Nov092015.pdf
    
    Recording
    
    Response 5m
    
    Speaker: Cecile Germain-Renaud (LRI)
    
    ReponseV2.pdf
    
    ReponseV2.pptx
- 13:00 → 14:00
  
  Lunch Break 1h 222/R-001
  
- 14:00 → 14:45
  Symposium 222/R-001
  
  Convener: Jean-Roch Vlimant (California Institute of Technology (US))
  - 14:00
    
    Deep Learning RNNaissance 45m
    
    In recent years, our deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. They are now widely used in industry. I will briefly review deep supervised / unsupervised / reinforcement learning, and discuss the latest state of the art results in numerous applications. Bio : Since age 15 or so, Prof. Jürgen Schmidhuber's main scientific ambition has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. He has pioneered self-improving general problem solvers since 1987, and Deep Learning Neural Networks (NNs) since 1991. The recurrent NNs (RNNs) developed by his research groups at the Swiss AI Lab IDSIA & USI & SUPSI and TU Munich were the first RNNs to win official international contests. They have revolutionised connected handwriting recognition, speech recognition, machine translation, optical character recognition, image caption generation, and are now in use at Google, Microsoft, IBM, Baidu, and many other companies. Founders & staff of DeepMind (sold to Google for over 600M) include 4 former PhD students from his lab. His team's Deep Learners were the first to win object detection and image segmentation contests, and achieved the world's first superhuman visual classification results, winning nine international competitions in machine learning & pattern recognition (more than any other team). They also were the first to learn control policies directly from high-dimensional sensory input using reinforcement learning. His research group also established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age's extreme form of minimal art. 
    Since 2009 he has been a member of the European Academy of Sciences and Arts. He has published 333 peer-reviewed papers, earned seven best paper/best video awards, the 2013 Helmholtz Award of the International Neural Networks Society, and the 2016 IEEE Neural Networks Pioneer Award. He is president of NNAISENSE, which aims at building the first practical general purpose AI.
    
    Speaker: Juergen Schmidhuber (IDSIA)
    
    deep2015white_new.pdf
- 14:45 → 15:00
  
  Coffee Break 15m 222/R-001
  
- 15:00 → 18:00
  Monday Afternoon Session 222/R-001
  
  Convener: David Rousseau (LAL-Orsay, FR)
  - 15:00
    
    Feature Extraction 45m
    
    Feature selection and reduction are key to robust multivariate analyses. In this talk I will discuss the pros and cons of various variable selection methods, focusing on those most relevant in the context of HEP.
    
    Speaker: Dr Sergei Gleyzer (University of Florida (US))
    
    Feature_Selection_Sergei_Gleyzer.pdf
    
    Feature_Selection_Sergei_Gleyzer.pptx
    
    Recording
  - 15:45
    
    TMVA tutorial 2h 15m
    
    This tutorial will give an introduction to using TMVA in ROOT 6 and showcase some new features, such as modularity, variable importance, and interfaces to R and Python. After explaining the basic functionality, the typical steps required in a real-life application (such as variable selection, pre-processing, tuning and classifier evaluation) will be demonstrated on simple examples. The first part of the tutorial will use the usual ROOT interface (please make sure you have ROOT 6.04 installed somewhere). The second part will use the new server notebook functionality of ROOT as a Service. If you are at CERN but outside the venue, or outside CERN, please consult the attached notes.
    
    Speakers: Helge Voss (Max-Planck-Gesellschaft (DE)), Dr Sergei Gleyzer (University of Florida (US))
    
    Notes_about_New_TMVA_features_demo.pdf
    
    Recording
    
    TMVA exercises tgz
    
    TMVA exercises pdf
    
    TMVA_New_Features_Sergei_Gleyzer.pdf
    
    TMVA_New_Features_Sergei_Gleyzer.pptx
    
    TMVA_New_Features_Tutorial.pdf
    
    TMVA pptx
- 19:00 → 20:00
  
  Reception 1h Restaurant 1
  
Tuesday 10 November
- 09:00 → 13:00
  Tuesday Morning Session 222/R-001
  
  Convener: Cecile Germain-Renaud (LRI)
  - 09:00
    
    ME technique plus experience ttH 45m
    
    The Matrix Element Method (MEM) is a HEP-specific technique to directly calculate the likelihood for a collision event based on the “matrix elements” of quantum field theory and a simplified detector description. This talk will describe the matrix element method, its current implementations, and comparisons with other multivariate approaches.
    
    Speaker: Lorenzo Bianchini (Eidgenoessische Tech. Hochschule Zuerich (CH))
    
    bianchini_DataAtLHC.pdf
    
    Recording
  - 09:45
    ABC Method 45m
    
    Approximate Bayesian computation (ABC) is the name given to a collection of Monte Carlo algorithms used for fitting complex computer models to data. The methods rely upon simulation, rather than likelihood based calculation, and so can be used to calibrate a much wider set of simulation models. The simplest version of ABC is intuitive: we sample repeatedly from the prior distribution, and accept parameter values that give a close match between the simulation and the data. This has been extended in many ways, for example, reducing the dimension of the datasets using summary statistics and then calibrating to the summaries instead of the full data; using more efficient Monte Carlo algorithms (MCMC, SMC, etc); and introducing modelling approaches to overcome computational cost and to minimize the error in the approximation. The two key challenges for ABC methods are i) dealing with computational constraints; and ii) finding good low dimensional summaries. Much of the early work on i) was based upon finding efficient sampling algorithms, adapting methods such as MCMC and sequential Monte Carlo methods, to more efficiently find good regions of parameter space. Although these methods can dramatically reduce the amount of computation needed, they still require hundreds of thousands of simulations. Recent work has instead focused on the use of meta-models or emulators. These are cheap statistical surrogates that approximate the simulator, and which can be used in place of the simulator to find the posterior distribution. A key question when using these methods concerns the experimental design: where should we next run the simulator, in order to maximise our information about the posterior distribution?
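The simplest version of ABC described in the abstract above (sample from the prior, simulate, accept parameter values whose simulation matches the data) can be sketched as follows. This is a minimal illustration and not the speaker's code: the Gaussian toy simulator, the flat prior on [-5, 5], the sample mean as summary statistic, and the tolerance of 0.1 are all assumptions chosen for the example.

```python
import random

def simulator(mu, n, rng):
    """Toy simulator (assumed): n Gaussian draws with mean mu and unit width."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def summary(data):
    """Low-dimensional summary statistic: here simply the sample mean."""
    return sum(data) / len(data)

def rejection_abc(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is close to the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)              # draw from the flat prior
        simulated = simulator(mu, len(observed), rng)
        if abs(summary(simulated) - target) < tolerance:
            accepted.append(mu)                  # approximate posterior sample
    return accepted

rng = random.Random(0)
observed = simulator(1.5, 50, rng)               # stand-in for real data
posterior_sample = rejection_abc(observed, 20000, 0.1, rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), round(posterior_mean, 2))
```

Only a small fraction of the prior draws survive the cut, which illustrates the computational constraint named in the abstract: tightening the tolerance improves the approximation but requires more simulator runs.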
    
    Speaker: Richard Wilkinson (University of Sheffield)
    
    CERN_Wilkinson_talk.pdf
    
    Recording
    
    Response 5m
    
    Speaker: Josh Bendavid (California Institute of Technology (US))
    
    abcresponse-Nov10-2015.pdf
    
    Recording
  - 10:30
    
    Coffee Break 15m
  - 10:45
    
    Approximate Likelihood 45m
    
    Most physics results at the LHC end in a likelihood ratio test. This includes discovery and exclusion for searches as well as mass, cross-section, and coupling measurements. The use of Machine Learning (multivariate) algorithms in HEP is mainly restricted to searches, which can be reduced to classification between two fixed distributions: signal vs. background. I will show how we can extend the use of ML classifiers to distributions parameterized by physical quantities like masses and couplings as well as nuisance parameters associated with systematic uncertainties. This allows one to approximate the likelihood ratio while still using a high-dimensional feature vector for the data. Both the MEM and ABC approaches mentioned above aim to provide inference on model parameters (like cross-sections, masses, couplings, etc.). ABC is fundamentally tied to Bayesian inference and focuses on the “likelihood free” setting where only a simulator is available and one cannot directly compute the likelihood for the data. The MEM approach tries to directly compute the likelihood by approximating the detector. The approach presented here is similar to ABC in that it provides parameter inference in the “likelihood free” setting by using a simulator, but it does not require one to use Bayesian inference and it cleanly separates issues of statistical calibration from the approximations that are being made. The method is much faster to evaluate than the MEM approach and does not require a simplified detector description. Furthermore, it is a generalization of the LHC experiments' current use of multivariate classifiers for searches and integrates well into our existing statistical procedures.
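The link between a classifier and a likelihood ratio can be illustrated with a toy example. A classifier trained on balanced samples from two distributions p0 and p1 approaches s(x) = p1(x) / (p0(x) + p1(x)), so s / (1 - s) estimates p1(x) / p0(x). The sketch below is not the speaker's implementation: the one-dimensional unit Gaussians centred at 0 and 1, the hand-written logistic regression, and the learning rate and step count are assumptions made for illustration. For these two Gaussians the exact ratio is exp(x - 0.5).

```python
import math
import random

def train_logistic(xs, ys, steps=300, lr=0.5):
    """Fit s(x) = sigmoid(w*x + b) by full-batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            s = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (s - y) * x
            grad_b += s - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def likelihood_ratio(x, w, b):
    """Turn the classifier output s(x) into an estimate of p1(x) / p0(x)."""
    s = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return s / (1.0 - s)

rng = random.Random(1)
sample0 = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # drawn from p0 (assumed)
sample1 = [rng.gauss(1.0, 1.0) for _ in range(2000)]   # drawn from p1 (assumed)
xs = sample0 + sample1
ys = [0.0] * len(sample0) + [1.0] * len(sample1)

w, b = train_logistic(xs, ys)
estimate = likelihood_ratio(2.0, w, b)
exact = math.exp(2.0 - 0.5)
print(estimate, exact)
```

The classifier only ever sees samples, never the densities themselves, which is what makes the idea usable when the data come from a simulator.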
    
    Speaker: Kyle Stuart Cranmer (New York University (US))
    
    DSatLHC.pdf
    
    Recording
    
    scikit-learn-wrapper-demo.ipynb
  - 11:30
    Stochastic optimization: beyond mathematical programming 45m
    
    Stochastic optimization, which includes bio-inspired algorithms, is gaining momentum in areas where more classical optimization algorithms fail to deliver satisfactory results, or simply cannot be directly applied. This presentation will introduce baseline stochastic optimization algorithms and illustrate their efficiency in different domains, from continuous non-convex problems, to combinatorial optimization problems, to problems for which a non-parametric formulation can help explore unforeseen solution spaces.
    
    Speaker: Marc Schoenauer (INRIA)
    
    cernSchoenauer-final.pdf
    
    Recording
    
    Response 5m
    
    Speaker: André David (CERN)
    
    151110 Stochatisc Data@LHC.pdf
    
    Recording
  - 12:15
    
    Software R&D for Next Generation of HEP Experiments, Inspired by Theano 45m
    
    In the next decade, the frontiers of High Energy Physics (HEP) will be explored by three machines: the High Luminosity Large Hadron Collider (HL-LHC) in Europe, the Long-Baseline Neutrino Facility (LBNF) in the US, and the International Linear Collider (ILC) in Japan. These next generation experiments must address two fundamental problems in the current generation of HEP experimental software: the inability to take advantage of and adapt to the rapidly evolving processor landscape, and the difficulty physicists face in developing and maintaining increasingly complex software systems. I will propose a strategy, inspired by the automatic optimization and code generation in Theano, to simultaneously address both problems. I will describe three R&D projects with short-term physics deliverables aimed at developing this strategy. The first project is to develop a maximally sensitive General Search for New Physics at the LHC by applying the Matrix Element Method running on the GPUs of HPCs. The second is to classify and reconstruct Liquid Argon Time Projection Chamber (LArTPC) events with Deep Learning techniques. The final project is to optimize tomographic reconstruction of LArTPC events, inspired by medical imaging.
    
    Speaker: Amir Farbin (University of Texas at Arlington (US))
    
    DataScienceAtLHCWS.pdf
    
    Recording
- 13:00 → 14:00
  
  Lunch Break 1h 222/R-001
  
- 14:00 → 14:45
  Symposium 222/R-001
  
  Convener: Jean-Roch Vlimant (California Institute of Technology (US))
  - 14:00
    
    Better Cities through Imaging 45m
    
    I will describe how persistent, synoptic imaging of an urban skyline can be used to better understand a city, in analogy to the way persistent, synoptic imaging of the sky can be used to better understand the heavens. At the newly created Urban Observatory at the Center for Urban Science and Progress (CUSP), we are combining techniques from the domains of astronomy, computer vision, remote sensing, and machine learning to address a myriad of questions related to urban informatics. I will go through several specific methodological examples including energy consumption, public health, and air quality which can lead to improved city functioning and quality of life.
    
    Speaker: Gregory Dobler (NYU CUSP)
    
    dobler_dslhc15.pdf
- 14:45 → 15:00
  
  Coffee Break 15m 222/R-001
  
- 15:00 → 18:10
  Tutorials II: Matrix Element Method (MEM) 222/R-001
  
  Convener: Vladimir Gligorov (CERN)
  - 15:00
    
    Introduction 1m
    
    Speaker: Kyle Stuart Cranmer (New York University (US))
  - 15:01
    
    Cross Disciplinary Discussion 19m
  - 15:20
    
    MadWeight Tutorial 1h
    
    This tutorial will introduce the matrix-element method, which will be used to extract the top-quark mass from a data sample. For this you will learn how to use MadWeight, a program which performs the phase-space integration and returns the weight of the matrix-element method. We will discuss the speed issue and the various options in MadWeight to reduce the CPU time of your computation.
    
    Speaker: Olivier Pierre C Mattelaer (IPPP Durham)
    
    tutorial.pdf
    
    tutorial.tgz
  - 16:20
    
    Break 20m
  - 16:40
    
    MemTk (Matrix Element Toolkit) Tutorial 50m
    
    This session will include a tutorial for tools developed for MEM calculation and lightning talks about recent experience with MEM in ATLAS and CMS.
    
    Speakers: Oliver Maria Kind (Humboldt-Universitaet zu Berlin (DE)), Patrick Rieck (Humboldt-Universitaet zu Berlin (DE)), Soren Stamm (Humboldt-Universitaet zu Berlin (DE))
    
    MemToolkit_part1_Concept.pdf
    
    MemToolkit_part2_Demo.pdf
    
    Recording
  - 17:30
    
    Recent Experience, Challenges, & Discussion 30m
    
    Speakers: Lorenzo Bianchini (Eidgenoessische Tech. Hochschule Zuerich (CH)), Olaf Nackenhorst (Universite de Geneve (CH)), Olivier Pierre C Mattelaer (IPPP Durham), Patrick Rieck (Humboldt-Universitaet zu Berlin (DE))
    
    MemToolkit_part3_Application_public.pdf
    
    nackenhorst_DS_workshop15_101115_v2.pdf
Wednesday 11 November
- 09:00 → 13:00
  Wednesday Morning Session 222/R-001
  
  Convener: Maria Spiropulu (California Institute of Technology (US))
  - 09:00
    Data science in ALICE 45m
    
    ALICE is the LHC experiment dedicated to the study of Heavy Ion collisions. In particular, the detector features low momentum tracking and vertexing, and comprehensive particle identification capabilities. In a single central heavy ion collision at the LHC, thousands of particles per unit rapidity are produced, making the data volume, track reconstruction and search for rare signals particularly challenging. Data science and machine learning techniques could help to tackle some of the challenges outlined above. In this talk, we will discuss some early attempts to use these techniques for the processing of detector signals and for the physics analysis. We will also highlight the most promising areas for the application of these methods.
    
    Speaker: Michele Floris (CERN)
    
    mfloris-20150112-machinelearning.pdf
    
    Recording
    
    Response 5m
    
    Speaker: Christian Mueller (Simons Foundation)
    
    AliceDataScienceResponse.pdf
    
    Recording
  - 09:45
    Deep Learning and its Applications in the Natural Sciences 45m
    
    Starting from a brief historical perspective on scientific discovery, this talk will review some of the theory and open problems of deep learning and describe how to design efficient feedforward and recursive deep learning architectures for applications in the natural sciences. In particular, the focus will be on multiple particle problems at different scales: in biology (e.g. prediction of protein structures), chemistry (e.g. prediction of molecular properties and reactions), and high-energy physics (e.g. detection of exotic particles, jet substructure and tagging, "dark matter and dark knowledge").
    
    Speaker: Pierre Baldi (UCI)
    
    CERN2015short16x9-1.pdf
    
    Recording
    
    Response 5m
    
    Speaker: Balázs Kégl (Linear Accelerator Laboratory)
    
    answerToBaldi1511.pdf
    
    Recording
  - 10:30
    
    Coffee Break 15m
  - 10:45
    
    A ground-up construction of deep learning 45m
    
    I propose to give a ground-up construction of deep learning as it is in its modern state. Starting from its beginnings in the 1990s, I plan on showing the relevant (for physics) differences in optimization, construction, activation functions, initialization, and other tricks that have accrued over the last 20 years. In addition, I plan on showing why deeper, wider basic feedforward architectures can be used. Coupling this with MaxOut layers, modern GPUs, and both L1 and L2 forms of regularization, we have the current "state of the art" in basic feedforward networks. I plan on discussing pre-training using deep autoencoders and RBMs, and explaining why this has fallen out of favor when you have lots of labeled data. While discussing each of these points, I propose to explain why these particular characteristics are valuable for HEP. Finally, the last topic on basic feedforward networks is interpretation. I plan on discussing latent representations of important variables (i.e., mass, pT) that are contained in a dense or distributed fashion inside the hidden layers, as well as nifty ways of extracting variable importance. I also propose a short discussion on dark knowledge, i.e., training very deep, very wide neural nets and then using their outputs as targets for a smaller, shallower neural network; this has been shown to be incredibly useful for focusing the network to learn important information. Why is this relevant for physics? Well, we could think of trigger-level or hardware-level applications, where we need FPGA-level (for example) implementations of nets that cannot be very deep.
    Then I propose to discuss (relatively briefly) the use cases of convolutional networks (one current area of research for me) and recurrent neural networks in physics, as well as giving a broad overview of what they are and what domains they typically belong to, e.g., jet image work with convolutional nets, or jet tagging that can read in info from each track in the case of RNNs.
    
    Speaker: Luke Percival De Oliveira (SLAC National Accelerator Laboratory (US))
    
    Recording
    
    talk-lukedeo.pdf
  - 11:30
    Neuromorphic silicon chips 45m
    
    Neuromorphic silicon chips have been developed over the last 30 years, inspired by the design of biological nervous systems and offering an alternative paradigm for computation, with real-time massively parallel operation and potentially large power savings with respect to conventional computing architectures. I will present the general principles with a brief investigation of the design choices that have been explored, and I'll discuss how such hardware has been applied to problems such as classification.
    
    Speakers: Giacomo Indiveri (INI Zurich), Sim Bamford (INI Labs)
    
    SimBamFord2015-11Cern.pdf
    
    SimBamFord2015-11Cern.ppt
    
    Response 5m
    
    Speaker: Jean-Roch Vlimant (California Institute of Technology (US))
    
    NHResponse_DSLHC15_11Nov2015.pdf
    
    Recording
  - 12:15
    Artificial Intelligence and the Future of Science 45m
    
    Dr. Demis Hassabis is the Co-Founder and CEO of DeepMind, the world’s leading General Artificial Intelligence (AI) company, which was acquired by Google in 2014 in their largest ever European acquisition. Demis will draw on his eclectic experiences as an AI researcher, neuroscientist and videogames designer to discuss what is happening at the cutting edge of AI research, its future impact especially in helping with scientific advances in other fields such as physics, and how developing AI may help us better understand the human mind.
    
    Speaker: Demis Hassabis (Google DeepMind)
    
    Response 5m
    
    Speaker: Maria Spiropulu (California Institute of Technology (US))
- 13:00 → 14:00
  
  Lunch Break 1h 222/R-001
  
- 14:00 → 14:45
  Symposium 222/R-001
  
  Convener: Jean-Roch Vlimant (California Institute of Technology (US))
  - 14:00
    
    Scalable Gaussian Processes and the search for exoplanets 45m
    
    Gaussian Processes are a class of non-parametric models that are often used to model stochastic behavior in time series or spatial data. A major limitation for the application of these models to large datasets is the computational cost. The cost of a single evaluation of the model likelihood scales as the third power of the number of data points. In the search for transiting exoplanets, the datasets of interest have tens of thousands to millions of measurements with uneven sampling, rendering naive application of a Gaussian Process model impractical. To attack this problem, we have developed robust approximate methods for Gaussian Process regression that can be applied at this scale. I will describe the general problem of Gaussian Process regression and offer several applicable use cases. Finally, I will present our work on scaling this model to the exciting field of exoplanet discovery and introduce a well-tested open source implementation of these new methods.
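The cubic cost mentioned in the abstract above comes from factorising the n x n covariance matrix each time the likelihood is evaluated. The sketch below shows the naive computation with a Cholesky decomposition written in plain Python. It is an illustration only and is unrelated to the open source implementation announced in the talk: the squared-exponential kernel, its unit amplitude and length scale, and the white-noise term are assumptions.

```python
import math

def sq_exp_kernel(t1, t2, amp=1.0, scale=1.0):
    """Squared-exponential covariance between two sample times (assumed kernel)."""
    return amp * math.exp(-0.5 * ((t1 - t2) / scale) ** 2)

def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix: O(n^3)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def gp_log_likelihood(times, values, noise=0.1):
    """Log-likelihood of zero-mean GP data; works for unevenly sampled times."""
    n = len(times)
    cov = [[sq_exp_kernel(times[i], times[j]) + (noise ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    # Forward substitution solves low * z = values, so z.z = values^T cov^-1 values.
    z = []
    for i in range(n):
        s = sum(low[i][j] * z[j] for j in range(i))
        z.append((values[i] - s) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))

times = [0.0, 0.3, 1.1, 1.9, 4.0]            # uneven sampling
values = [math.sin(t) for t in times]
log_like = gp_log_likelihood(times, values)
print(log_like)
```

The triple loop in `cholesky` is where the third-power scaling appears, which is why datasets with tens of thousands to millions of points call for approximate methods.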
    
    Speaker: Daniel Foreman-Mackey (University of Washington)
    
    dfm-dslhc-cern_new.pdf
    
    Recording
- 14:45 → 15:00
  
  Coffee Break 15m 222/R-001
  
- 15:00 → 18:00
  Tutorial III: Deep Learning 222/R-001
  
  Convener: Andrew Lowe (Hungarian Academy of Sciences (HU))
  - 15:00
    
    Deep Learning Tutorial 3h
    
    This tutorial will introduce the latest deep learning software packages and explain how to get started using deep neural networks. We will train deep neural networks using the Theano and Pylearn2 software packages in Python, and then replicate results from Baldi et al. 2014, "Searching for exotic particles in high-energy physics with deep learning". We will also discuss techniques for model selection and automatic hyperparameter tuning. NOTE: In order to run the examples yourself, please install Theano on your system prior to arrival, and then download Pylearn2 from GitHub and put it in your Python path. http://deeplearning.net/software/theano/install.html https://github.com/lisa-lab/pylearn2/tree/master/pylearn2
    
    Speaker: Peter Sadowski (University of California Irvine)
    
    lhc2015_dl_tutorial.pdf
    
    lhc2015-dl-tutorial.zip
    
    Recording
- 19:00 → 22:00
  
  Buffet & Cocktail 3h Restaurant 1
  
Thursday 12 November
- 09:00 → 13:00
  Thursday Morning Session 222/R-001
  
  Convener: Xabier Cid Vidal (CERN)
  - 09:00
    Data science in LHCb 45m
    
    Machine learning is used at all stages of the LHCb experiment. It is routinely used: in the process of deciding which data to record and which to reject forever, as part of the reconstruction algorithms (feature engineering), and in the extraction of physics results from our data. This talk will highlight current use cases, as well as ideas for ambitious future applications, and how we can collaborate on them.
    
    Speaker: Tim Head (Ecole Polytechnique Federale de Lausanne (CH))
    
    ds4lhc-lhcb-tim-head.pdf
    
    Recording
    
    Response 5m
    
    Speaker: Dr Gilles Louppe (CERN)
    
    Recording
    
    slides.pdf
  - 09:45
    
    Reusing ML tools and approaches for HEP data analysis 45m
    
    In my talk I will give an overview of the ML tools and services that the Yandex School of Data Analysis (YSDA) team has developed. In particular, I will focus on approaches that our team developed in collaboration with LHCb on HEP data analysis (uGB+FL, GB-reweighting). Each approach is implemented within the hep_ml Python package. To get acquainted with this tool you can install it right away in your environment or experiment with it within the Reproducible Experiment Platform. I will give initial guidance on how to get started with it.
    
    Speaker: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
    
    01-howto-Classifiers.ipynb
    
    02-howto-Factory.ipynb
    
    Recording
    
    Ustyuzhanin_ML_reuse.pdf
  - 10:30
    
    Coffee Break 15m
  - 10:45
    
    Real Time Processing 45m
    
    The LHC provides experiments with an unprecedented amount of data. Experimental collaborations need to meet storage and computing requirements for the analysis of this data: this is often a limiting factor in the physics program that would be achievable if the whole dataset could be analysed. In this talk, I will describe the strategies adopted by the LHCb, CMS and ATLAS collaborations to overcome these limitations and make the most of LHC data: data parking, data scouting, and real-time analysis.
    
    Speakers: Caterina Doglioni (Lund University (SE)), Dustin James Anderson (California Institute of Technology (US)), Vladimir Gligorov (CERN)
    
    20151113_RealTimeAnalysisLHC-4.pdf
    
    Recording
  - 11:30
    
    The Retina Algorithm 45m
    
    Charged-particle reconstruction is one of the most demanding computational tasks found in HEP, and it becomes increasingly important to perform it in real time. We envision that HEP would greatly benefit from achieving a long-term goal of making track reconstruction happen transparently as part of the detector readout ("detector-embedded tracking"). We describe here a track-reconstruction approach based on a massively parallel pattern-recognition algorithm, inspired by studies of the processing of visual images by the brain as it happens in nature ('RETINA algorithm'). It turns out that high-quality tracking in large HEP detectors is possible with very small latencies, when this algorithm is implemented in specialized processors, based on current state-of-the-art, high-speed/high-bandwidth digital devices.
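The pattern-recognition idea in the abstract above can be sketched in software under strong simplifications. In this toy version, every cell of a grid in track-parameter space holds a reference track and sums Gaussian-weighted distances of all hits to that track; cells with a large response indicate track candidates. The two-dimensional straight-line geometry, the six detector planes, the grid spacing and the response width are all assumptions, and the hardware implementation discussed in the talk is not reproduced here.

```python
import math
import random

# Hypothetical positions of six parallel detector planes along x.
LAYERS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

def make_hits(slope, intercept, resolution, n_noise, rng):
    """Hits from one straight track y = slope*x + intercept, plus random noise hits."""
    hits = [(x, slope * x + intercept + rng.gauss(0.0, resolution)) for x in LAYERS]
    hits += [(rng.choice(LAYERS), rng.uniform(-3.0, 3.0)) for _ in range(n_noise)]
    return hits

def cell_response(hits, slope, intercept, sigma):
    """Sum of Gaussian weights of the hit distances to the cell's reference track."""
    return sum(
        math.exp(-((y - (slope * x + intercept)) ** 2) / (2.0 * sigma ** 2))
        for x, y in hits
    )

def strongest_cell(hits, slopes, intercepts, sigma):
    """Scan the (slope, intercept) grid and return the cell with the largest response."""
    return max(
        ((m, q) for m in slopes for q in intercepts),
        key=lambda cell: cell_response(hits, cell[0], cell[1], sigma),
    )

rng = random.Random(7)
hits = make_hits(slope=0.5, intercept=-1.0, resolution=0.02, n_noise=10, rng=rng)
slopes = [i / 10 for i in range(-10, 11)]
intercepts = [i / 10 for i in range(-20, 21)]
best_slope, best_intercept = strongest_cell(hits, slopes, intercepts, sigma=0.1)
print(best_slope, best_intercept)
```

Each cell's response depends only on the hits and on that cell's own parameters, so all cells can be evaluated independently; this is the property that makes a massively parallel implementation natural.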
    
    Speakers: Giovanni Punzi (Universita di Pisa & INFN (IT)), Luciano Frances Ristori (Fermi National Accelerator Lab. (US))
    
    DataScience-Punzi.odp
    
    DataScience-Punzi.pdf
    
    Recording
  - 12:15
    Machine learning, computer vision, and probabilistic models in jet physics 45m
    
    In this talk we present recent developments in the application of machine learning, computer vision, and probabilistic models to the analysis and interpretation of LHC events. First, we will introduce the concept of jet images and computer vision techniques for jet tagging. Jet images connected jet substructure and tagging with the fields of computer vision and image processing for the first time. They improve the identification of highly boosted W bosons with respect to state-of-the-art methods and provide a new way to visualize the discriminating features of different classes of jets, which helps to understand the physics within jets and to design more powerful jet tagging methods. Second, we will present fuzzy jets: a new paradigm for jet clustering using machine learning methods. Fuzzy jets view jet clustering as an unsupervised learning task and incorporate a probabilistic assignment of particles to jets to learn new features of the jet structure. In particular, we will show how fuzzy jets can learn the shape of jets, providing a new observable that improves W boson and top tagging performance in highly boosted final states.
    
    Speakers: Ben Nachman (SLAC National Accelerator Laboratory (US)), Michael Aaron Kagan (SLAC National Accelerator Laboratory (US))
    
    Recording
    
    SLAC_StanfordHEPML.pdf
    
    Response 5m
    
    Speaker: Ian Fisk (Simons Foundation)
    
    Response.pdf
    
    Response.pptx
- 13:00 → 14:00
  
  Lunch Break 1h 222/R-001
  
- 14:00 → 14:45
  Symposium 222/R-001
  
  Convener: Jean-Roch Vlimant (California Institute of Technology (US))
  - 14:00
    
    High-dimensional model estimation and model selection 45m
    
    I will review concepts and algorithms from high-dimensional statistics for linear model estimation and model selection. I will particularly focus on the so-called p>>n setting where the number of variables p is much larger than the number of samples n. I will focus mostly on regularized statistical estimators that produce sparse models. Important examples include the LASSO and its matrix extension, the Graphical LASSO, and more recent non-convex methods such as the TREX. I will show the applicability of these estimators in a diverse range of scientific applications, such as sparse interaction graph recovery and high-dimensional classification and regression problems in genomics.
    
    Speaker: Christian Mueller (Simons Foundation)
    
    LHCDataScience.pdf
    
    Recording
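
As a minimal sketch of the sparse, regularized estimation described in the abstract above (an illustration only, not code from the talk), the following fits a LASSO model in a p >> n setting using cyclic coordinate descent with soft-thresholding. The function names, the synthetic data, the penalty value `alpha = 0.3` and the number of sweeps are all assumptions chosen for the example.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(X, y, w, alpha):
    """(1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n = len(X)
    rss = sum(
        (y[i] - sum(X[i][j] * w[j] for j in range(len(w)))) ** 2 for i in range(n)
    )
    return rss / (2.0 * n) + alpha * sum(abs(v) for v in w)


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Cyclic coordinate descent for the objective defined above."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Synthetic p >> n problem (hypothetical): 40 samples, 100 variables, 3 of them active.
rng = random.Random(0)
n_samples, n_features = 40, 100
true_w = [0.0] * n_features
true_w[0], true_w[1], true_w[2] = 3.0, -2.0, 1.5
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [
    sum(X[i][j] * true_w[j] for j in range(n_features)) + rng.gauss(0.0, 0.1)
    for i in range(n_samples)
]

alpha = 0.3
w_hat = lasso_coordinate_descent(X, y, alpha)
n_nonzero = sum(1 for v in w_hat if v != 0.0)
print("non-zero coefficients:", n_nonzero, "of", n_features)
```

The soft-thresholding step is what sets coefficients exactly to zero, which is why the L1 penalty yields sparse models; larger values of `alpha` give sparser solutions at the cost of more shrinkage on the retained coefficients.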
- 14:45 → 15:00
  
  Coffee Break 15m 222/R-001
  
- 15:00 → 18:45
  Thursday Afternoon Session 222/R-001
  
  Convener: Maurizio Pierini (CERN)
  - 15:00
    
    An introduction to machine learning with Scikit-Learn 2h 15m
    
    This tutorial gives an introduction to the scientific ecosystem for data analysis and machine learning in Python. After a short introduction of machine learning concepts, we will demonstrate on High Energy Physics data how a basic supervised learning analysis can be carried out using the Scikit-Learn library. Topics covered include data loading facilities and data representation, supervised learning algorithms, pipelines, model selection and evaluation, and model introspection.
    
    Speaker: Dr Gilles Louppe (CERN)
    
    Jupyter notebook
    
    Recording
  - 17:15
    
    TMVA R/Scikit-learn interface 45m
    
    In these tutorials we show how to use external classifiers from R and scikit-learn within TMVA.
    
    Speaker: Dr Sergei Gleyzer (University of Florida (US))
    
    Notes_about_New_TMVA_features_demo.pdf
    
    Recording
    
    TMVA_New_Features_Sergei_Gleyzer.pdf
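
As a rough illustration of the workflow outlined in the Scikit-Learn tutorial abstract above (data representation, a supervised learner, a pipeline, model selection, evaluation and introspection), here is a minimal sketch. It is not the tutorial's notebook: the synthetic dataset stands in for High Energy Physics data, the choice of gradient boosting and the parameter grid are assumptions, and it presumes a reasonably recent scikit-learn release that provides `sklearn.model_selection`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data representation: X is an (n_samples, n_features) array, y holds the class labels.
# Synthetic stand-in for a signal-versus-background sample.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pipeline: preprocessing and classifier are fitted together as one estimator.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", GradientBoostingClassifier(random_state=0)),
    ]
)

# Model selection: cross-validated grid search over a small, illustrative grid.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [2, 3]}
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Evaluation on data not used for fitting or for model selection.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

# Introspection: feature importances of the selected model.
importances = search.best_estimator_.named_steps["clf"].feature_importances_

print("best parameters:", search.best_params_)
print("test ROC AUC: %.3f" % test_auc)
```

Keeping the scaler inside the pipeline means it is refitted on each cross-validation training fold, so no information from the validation folds leaks into the preprocessing.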
Friday 13 November
- 09:00 → 12:00
  Tutorials V: Caffe/Theano 222/R-001
  
  Convener: Jean-Roch Vlimant (California Institute of Technology (US))
  - 09:00
    
    NVidia Tutorial 3h
    
    This tutorial will present Caffe, a powerful deep learning framework with a Python interface that runs on CPUs and GPUs, and explain how to use it to build and train Convolutional Neural Networks on NVIDIA GPUs. The session requires no prior experience with GPUs or Caffe.
    
    Speakers: Gunter Roeth (NVidia), Julien Demouth (NVidia), Peter Messmer (NVidia)
    
    Getting_started_with_caffe_v2.pdf
    
    Recording
    
    For the tutorial: after creating your account, go to https://nvlabs.qwiklab.com and log in. You will see the LHC lab.
  - 10:20
    
    Coffee Break 20m
- 12:00 → 12:30
  
  Closing Remarks 30m 222/R-001
  
  Speaker: Maria Spiropulu (California Institute of Technology (US))
  
  DS@LHC15.pdf
  
  Recording
- 12:30 → 14:00
  
  Lunch Break 1h 30m 222/R-001
  
- 14:00 → 16:00
  Open Data & round table on data access 222/R-001
  
  Convener: Jamie Shiers (CERN)
  - 14:00
    
    Welcome & Introduction 10m
    
    Speaker: Jamie Shiers (CERN)
    
    DPHEP Blue Too Summary.docx
    
    DPHEP Blue Too Summary.pdf
    
    DPHEP-OpenDataPanel.pdf
    
    DPHEP-OpenDataPanel.pptx
    
    DPHEP / WLCG Workshop
  - 14:10
    
    ALICE 5m
    
    Speaker: Markus Bernhard Zimmermann (Westfaelische Wilhelms-Universitaet Muenster (DE))
    
    openData.pdf
  - 14:15
    
    ATLAS 5m
    
    Speaker: Claire Adam Bourdarios (Laboratoire de l'Accelerateur Lineaire (FR))
    
    ATLASinput.pdf
  - 14:20
    
    CMS 5m
    
    Speaker: Kati Lassila-Perini (Helsinki Institute of Physics (FI))
    
    CMSOpenDataNov2015.pdf
    
    COPD: About CMS Data
  - 14:25
    
    LHCb 5m
    
    Speaker: Silvia Amerio (Universita e INFN, Padova (IT))
    
    amerio_DP_LHCb_13112015.pdf
  - 14:30
    
    BaBar 5m
    
    Speaker: Concetta Cartaro (SLAC)
    
    BaBar-OpenData-Nov2015.pdf
  - 14:35
    
    Open Data @CERN 15m
    
    Speaker: Sunje Dallmeier-Tiessen (Humboldt-Universitaet zu Berlin (DE), CERN)
    
    DataScience_ODPanel_Nov13rd_SDT.pdf
    
    DataScience_ODPanel_Nov13rd_SDT.pptx
  - 14:50
    
    LEP & Recast 5m
    
    Speaker: Kyle Stuart Cranmer (New York University (US))
    
    A Twitter conversation about theorists and data products
    
    DSLHC-open-data.pdf
  - 14:55
    
    Open data and science reproducibility 20m
    
    Speaker: Prof. Victoria Stodden (University of Illinois at Urbana-Champaign)
    
    CERN-Nov132015-STODDEN.pdf
  - 15:15
    
    Discussion 45m
