Heavy Flavour Data Mining workshop

Name: Heavy Flavour Data Mining workshop
Start: 2016-02-18T08:00:00+01:00
End: 2016-02-20T17:00:00+01:00
Location: University of Zurich, Irchel Campus

18 Feb 2016, 08:00 → 20 Feb 2016, 17:00 Europe/Zurich

Y16 G15 (University of Zurich, Irchel Campus)

Andrey Ustyuzhanin (Yandex School of Data Analysis (RU)), Francesco Dettori (CERN), Marc Olivier Bettler (CERN), Marcin Chrzaszcz (University of Zurich (CH)), Thomas Blake (University of Warwick), Tim Head (Ecole Polytechnique Federale de Lausanne (CH))

Description

This workshop is designed to provide an overview of, and hands-on experience with, popular tools and methods in various fields of Machine Learning. Physicists are welcome to share the challenges they are facing, to facilitate collaboration with ML practitioners. Active Machine Learning practitioners are invited to share their experience and the tools they use to achieve meaningful results in their domains of interest, which might include Natural Language Processing, Image Recognition and Robotics. Although these areas seem far from HEP, many of their tools and algorithms could be applied to HEP challenges as well.

To foster interaction, the workshop will use Open Space Technology, an approach recognized for boosting creativity in a variety of contexts that "can lead to surprising results and fascinating new questions".

Winners of the Physics Prize of the "Flavours of Physics" challenge, organized by CERN and Yandex on Kaggle, will present their solutions. A practical introduction to ML toolkits will be given in tutorials on scikit-learn (by Gilles Louppe, a core developer of scikit-learn), REP, hep_ml and Deep Learning tools.

Thursday 18 February
- 08:30 → 09:00
  
  Registration 30m
- 09:00 → 09:10
  
  Welcome 10m
- 09:10 → 12:30
  HEP challenges
  
  Convener: Marcin Chrzaszcz (Universitaet Zuerich (CH), Institute of Nuclear Physics (PL))
  - 09:10
    
    Data Science at LHCb 40m
    
    Speaker: Tim Head (Ecole Polytechnique Federale de Lausanne (CH))
    
    tim-uzh.pdf
  - 09:50
    
    Summary of "Flavours of Physics" Challenge 40m
    
    Speaker: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
    
    Ustyuzhanin__FoP_Summary.pdf
  - 10:30
    
    Coffee Break 20m
  - 10:50
    
    Data Doping solution for "Flavours of Physics" challenge 40m
    
    Speaker: Dr Vicens Gaitan
    
    Zurich_MachineLearning_VicensGaitan.pdf
  - 11:30
    
    Transfer Learning solution for "Flavours of Physics" challenge 20m
    
    Speaker: Dr Alexander Rakhlin
    
    Flavours of Physics.pdf
  - 11:50
    
    Pitfalls of evaluating a classifier's performance in high energy physics applications 30m
    
    Speaker: Dr Gilles Louppe (New York University (US))
    
    slides.pdf
- 12:30 → 13:30
  
  Lunch 1h
- 13:30 → 15:00
  
  HEP challenges discussions
  
  Convener: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
- 15:00 → 19:00
  Machine Learning tools & tutorials
  
  Convener: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
  - 15:00
    
    An introduction to machine learning with Scikit-Learn 2h
    
    https://github.com/glouppe/tutorial-scikit-learn
    
    Speaker: Dr Gilles Louppe (New York University (US))
    
    Materials
  - 17:00
    
    Coffee break 20m
  - 17:20
    
    Boosting applications for HEP 1h 30m
    
    Speaker: Aleksei Rogozhnikov (Yandex School of Data Analysis (RU))
    
    Rogozhnikov_boosting_in_HEP.pdf
- 19:00 → 20:30
  
  Reception/Dinner 1h 30m
Friday 19 February
- 09:00 → 12:30
  Data Science Applications
  
  Convener: Tim Head (Ecole Polytechnique Federale de Lausanne (CH))
  - 09:00
    
    Classifier output calibration to probability 40m
    
    Speaker: Tatiana Likhomanenko (National Research Centre Kurchatov Institute (RU))
    
    Likhomanenko_Classifier_calibration.pdf
  - 09:40
    
    Classifiers for centrality determination in proton-nucleus and nucleus-nucleus collisions 20m
    
    Centrality, as a geometrical property of the collision, is crucial for the physical interpretation of proton-nucleus and nucleus-nucleus experimental data. However, it cannot be directly accessed in event-by-event data analysis. Current methods of centrality estimation in A-A and p-A collisions usually rely on a single detector (either the signal in zero-degree calorimeters or the multiplicity in some semi-central rapidity range). In the present work, we develop an approach to centrality determination that is based on machine-learning techniques and uses information from several detector subsystems simultaneously. Different event classifiers are proposed and evaluated for their selectivity in terms of the number of participant nucleons and the impact parameter of the collision. The authors acknowledge Saint-Petersburg State University for research grant 11.38.242.2015.
    
    Speaker: Igor Altsybeev (St. Petersburg State University (RU))
    
    classifiers_for_centrality_in_pPb_PbPb_Altsybeev_19.02.2016.pdf
  - 10:00
    
    Data Fusion Surrogate Modeling on Incomplete Factorial Design of Experiments 40m
    
    This work concerns the construction of surrogate models for a specific aerodynamic database. Such a database is generally obtained from wind-tunnel testing or from CFD aerodynamic simulations and contains aerodynamic coefficients for different flight conditions and configurations (such as Mach number, angle of attack and vehicle configuration angle) encountered over different space-vehicle missions. The main peculiarity of an aerodynamic database is its specific design of experiments, which is a union of grids of low- and high-fidelity data of considerably different sizes. Universal algorithms cannot accurately approximate such strongly non-uniform data. In this work, a fast and accurate algorithm was developed that takes into account the different fidelities of the data and the special design of experiments.
    
    Speaker: Prof. Eugene Burnaev (IITP)
    
    Burnaev_v5_final.pdf
  - 10:40
    
    Coffee Break 20m
  - 11:00
    
    Mathematics of Big Data 40m
    
    Speaker: Prof. Dmitry Vetrov (Skoltech, Yandex School of Data Analysis, Higher School of Economics)
    
    ZurichVetrov.pdf
  - 11:40
    
    Efficient Elastic Net Regularization for Sparse Linear Models in the Multilabel Setting 30m
    
    Speaker: Mr Zachary Chase Lipton (University of California, Amazon)
    
    multilabel_deep_learning.1.pdf
  - 12:10
    
    Optimized Methods to Apply Neural Networks in HEP 20m
    
    The different steps of applying neural networks (NNs) in HEP are considered, and possible optimization methods for each step are discussed. The proposed methods were applied in the CMS single top quark analysis, and corresponding examples are presented in the talk.
    
    Speaker: Lev Dudko (M.V. Lomonosov Moscow State University (RU))
    
    nn_workshop_19.02.16.pdf
- 12:30 → 13:30
  
  Lunch 1h
- 13:30 → 15:00
  
  HEP challenges discussions
  
  Convener: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
- 15:00 → 18:50
  Machine Learning tools & tutorials
  - 15:00
    
    OpenML: Collaborative machine learning 40m
    
    Speaker: Dr Joaquin Vanschoren (Eindhoven University of Technology)
    
    OpenMLDemo.ipynb
    
    OpenML Zurich.pdf
  - 15:40
    
    Coffee break 30m
  - 16:10
    
    Calibration curves as tool to test for over- and underfitting 20m
    
    We present a simple approach to testing whether the bias, the regularization strength or other hyperparameters are set correctly. The main idea is to tune the hyperparameters so that, after applying a suitable isotonic regression, the train and test calibration curves intersect on the diagonal.
    
    Speaker: Dr Artem Vorozhtsov (Yandex)
    
    calibration_curve_meta_learning.pdf
    
    calibration_curve_meta_learning.pptx
  - 16:30
    
    Reproducible Experiment Platform & Everware 2h
    
    Speakers: Aleksei Rogozhnikov (Yandex School of Data Analysis (RU)), Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
    
    REP tutorial
    
    Ustyuzhanin_Towards RR.pdf
    
    Ustyuzhanin_Towards RR.pptx
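The calibration-curve check described in the 16:10 talk "Calibration curves as tool to test for over- and underfitting" can be sketched in plain Python. This is a minimal, hedged sketch and not the speaker's implementation: it assumes that "applying isotonic regression" means fitting a pool-adjacent-violators (PAV) map on the training scores, and it summarises a calibration curve by an unweighted mean distance from the diagonal. The function names (`isotonic_fit`, `isotonic_predict`, `calibration_curve`, `calibration_gap`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Step = Tuple[float, float]  # (upper score bound of a block, calibrated probability)


def isotonic_fit(scores: Sequence[float], labels: Sequence[int]) -> List[Step]:
    """Fit a non-decreasing map score -> P(y=1) with pool-adjacent-violators (PAV)."""
    blocks: List[List[float]] = []  # each block: [sum of labels, count, max score]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1.0, score])
        # Pool with the previous block while its mean exceeds ours (a monotonicity violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            label_sum, count, max_score = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = max_score
    return [(max_score, label_sum / count) for label_sum, count, max_score in blocks]


def isotonic_predict(steps: List[Step], score: float) -> float:
    """Return the probability of the first block whose upper bound covers the score."""
    for max_score, prob in steps:
        if score <= max_score:
            return prob
    return steps[-1][1]  # scores above the training range get the top probability


def calibration_curve(probs: Sequence[float], labels: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float]]:
    """Return (mean predicted probability, observed positive fraction) per non-empty bin."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        acc[i][0] += p
        acc[i][1] += y
        acc[i][2] += 1
    return [(sp / n, sy / n) for sp, sy, n in acc if n > 0]


def calibration_gap(curve: List[Tuple[float, float]]) -> float:
    """Unweighted mean distance of a calibration curve from the diagonal."""
    return sum(abs(obs - pred) for pred, obs in curve) / len(curve)


# Toy data, made up for illustration: raw classifier scores and true labels.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 1, 0, 1, 1, 0, 1]
steps = isotonic_fit(train_scores, train_labels)
train_probs = [isotonic_predict(steps, s) for s in train_scores]
train_curve = calibration_curve(train_probs, train_labels, n_bins=2)
```

Because PAV gives every block the mean of its own labels, the calibrated training curve lies on the diagonal by construction. In a hyperparameter scan, one would therefore map held-out scores through the map fitted on training scores and prefer the value (e.g. regularization strength) whose test curve stays closest to the diagonal; this reading of the criterion is an assumption. Tied scores are not pooled in this sketch; scikit-learn's `IsotonicRegression` and `sklearn.calibration.calibration_curve` are more complete alternatives.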
Saturday 20 February
- 09:00 → 11:50
  Data Science Applications
  
  Convener: Marc Olivier Bettler (CERN)
  - 09:00
    
    Automatic Tuning of Hyperparameters 40m
    
    The training process of a machine learning algorithm includes tuning hyperparameters, such as the regularization coefficient of a linear model or the depth of a decision tree. Unfortunately, this is usually done manually, which is very expensive to repeat on a regular basis. Moreover, the growing number of hyperparameters in modern, complex machine learning methods further complicates the problem. In our talk, we review methods that make hyperparameter tuning more autonomous, i.e. less dependent on help from experts.
    
    Speaker: Mr Alexander Fonarev (Skoltech)
    
    fonarev_hyperparams.pdf
  - 09:40
    
    Deep Learning for event reconstruction 40m
    
    Speaker: Amir Farbin (University of Texas at Arlington (US))
    
    HeavyFlavor-DL.pdf
  - 10:20
    
    Coffee Break 20m
  - 10:40
    
    Fast multimodal clustering: searching for optimal patterns 40m
    
    In Machine Learning, we usually deal with object-attribute tables. However, the underlying objects may have other modalities besides attributes. For instance, an object may have a certain attribute only under specific conditions. Real examples come from gene expression data, where a gene can be active (expressed) in particular situations at a certain moment in time, implying a ternary relation with triples (g,s,t). Another example comes from resource-sharing systems such as Flickr or BibSonomy, where a user u can assign a certain tag t to a resource r. One may ask how to find homogeneous patterns in such data, such as groups of genes with similar properties or communities. This talk presents several definitions of “optimal patterns” in triadic data and the results of an experimental comparison of five triclustering algorithms on real-world and synthetic datasets. The evaluation is carried out using criteria such as resource efficiency, noise tolerance, and quality scores involving the cardinality, density, coverage and diversity of the patterns. An ideal triadic pattern is a totally dense maximal cuboid (formal triconcept). The relaxations of this notion under consideration are: OAC-triclusters; triclusters optimal with respect to the least-squares criterion; and graph partitions obtained by spectral clustering. We show that searching for an optimal tricluster cover is an NP-complete problem, whereas determining the number of such covers is #P-complete. Our extensive computational experiments lead to a clear strategy for choosing a solution for a given dataset, guided by the principle of Pareto optimality with respect to the proposed criteria. At the end of the talk, we will outline future prospects of multimodal triclustering and its relationship with tensor factorisation.
    
    Speaker: Dr Dmitry Ignatov (HSE)
    
    Slides2.pdf
  - 11:20
    
    Summary of open space discussions 20m
    
    Speaker: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
    
    open-space-overview.pdf
    
    open-space-overview.pptx
- 11:45 → 12:40
  Machine Learning tools & tutorials
  
  Convener: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
  - 11:50
    
    Nvidia tutorial 10m
    
    alison.pdf
    
    Deep Learning with GPU
  - 12:00
    
    TensorFlow introduction & tutorial #1 30m
    
    Introduction to deep learning, with a hands-on tutorial and a demonstration of TensorFlow using the HiggsML challenge dataset.
    
    Speaker: Rafal Jozefowicz (Google)
    
    tensorflow_introduction.pdf
- 12:40 → 13:30
  
  Lunch 50m
- 13:30 → 14:50
  Machine Learning tools & tutorials
  - 13:30
    
    TensorFlow introduction & tutorial (continued) 1h 20m
    
    Speaker: Rafal Jozefowicz (Google)
- 14:50 → 15:00
  
  Closing Remarks 10m
  
  Speakers: Andrey Ustyuzhanin (Yandex School of Data Analysis (RU)), Marcin Chrzaszcz (Universitaet Zuerich (CH), Institute of Nuclear Physics (PL))
  
  mchrzasz.pdf
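As an illustration of the simplest automatic alternative to the manual tuning discussed in the "Automatic Tuning of Hyperparameters" abstract, here is a minimal random-search sketch in plain Python. It is an assumption that random search is among the methods covered in the talk; the search space, the toy objective `toy_validation_loss` and all other names are hypothetical stand-ins for a real "train the model, then evaluate on validation data" loop. The two hyperparameters mirror the examples in the abstract: a regularization coefficient (sampled log-uniformly, since it spans several decades) and a tree depth (an integer).

```python
import math
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Space = Dict[str, Tuple[str, float, float]]  # name -> (kind, low, high)

# Hypothetical search space. "log": continuous, log-uniform; "int": uniform integer.
SPACE: Space = {
    "reg_coef": ("log", 1e-3, 1e3),  # e.g. regularization coefficient of a linear model
    "max_depth": ("int", 1, 10),     # e.g. depth of a decision tree
}


def sample(rng: random.Random, space: Space) -> Params:
    """Draw one configuration from the search space."""
    params: Params = {}
    for name, (kind, lo, hi) in space.items():
        if kind == "log":
            params[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        elif kind == "int":
            params[name] = rng.randint(int(lo), int(hi))  # inclusive bounds
        else:
            raise ValueError(f"unknown parameter kind: {kind!r}")
    return params


def random_search(objective: Callable[[Params], float], space: Space,
                  n_trials: int = 100, seed: int = 0) -> Tuple[Params, float]:
    """Evaluate n_trials random configurations; return the one with the lowest loss."""
    rng = random.Random(seed)  # a fixed seed makes the search reproducible
    best_params: Params = {}
    best_loss = math.inf
    for _ in range(n_trials):
        params = sample(rng, space)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(params: Params) -> float:
    """Hypothetical stand-in for a validation loss; minimal at reg_coef=10, max_depth=4."""
    return (math.log10(params["reg_coef"]) - 1.0) ** 2 + 0.1 * (params["max_depth"] - 4) ** 2


best, loss = random_search(toy_validation_loss, SPACE, n_trials=300, seed=1)
print(best, loss)
```

Random search needs no model of the objective, which makes it a common baseline; model-based approaches such as Bayesian optimization instead use past trials to choose the next configuration, usually at the cost of more machinery.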