IML Machine Learning Working Group Meeting - Bayesian Optimization and Generative/Adversarial Models
Europe/Zurich
4/3-006 - TH Conference Room (CERN)
Sergei, News
- Next meeting (January): multi-class and multi-objective classification/regression, catch-all, workshop reports
- Contrary to what we announced in November, the topic is not going to be tagging. Due to the large number of proposed contributions, we plan to have a dedicated workshop instead (see below)
- Community White Paper (CWP) of the HEP Software Foundation to be produced by sometime next summer; it includes machine learning. IML should contribute to it.
- IML Workshop planned for March 20-22 (see slides for details)
- Tagging workshop (with hands-on session)
- CWP discussion
- Tutorials
Enrico Guiraud, Generative models and EM algorithm
- Introduction to generative models
- Main ingredients
- Latent variables (not observed; e.g. if the model is made of 3 Gaussians, the parameters of the individual Gaussians are the latent variables)
- Observed variables
- Model parameters
- Mathematically: the conditional probability of the observed data given the hidden states (latent variables) and the parameters
- Can be treated as a likelihood maximization problem
- Directly maximizing the log-likelihood is very difficult/impossible due to the large sum over hidden states
- EM algorithm: work around the problem by maximizing the "free energy", obtained by introducing variational distributions q(s)
- Details on the choice of the q(s) are discussed in the slides
- An example is shown with the "noisy-OR" model
- A more complex example is applied to the MNIST data set
- Recovers "digits"-like latents
- Difference with respect to other approaches: the model is explicit (you have to write a model in terms of hidden variables)
- Questions
- Kyle: Latent variables in HEP are Monte Carlo truth, but much more complex: millions of hidden variables in geant simulation
- Possibility to merge two kinds of generative models: a non-explicit one for a simplified model of the detector, plus EM minimization to get access to hidden variables
- Does not replace a full simulation, but it could provide insight on a few carefully chosen hidden variables
- Sergei: can think about using this for a fast simulation.
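To make the E and M steps discussed above concrete, here is a minimal NumPy sketch of EM for a mixture of two 1-D Gaussians. This is only an illustrative toy under assumed settings (the means, mixing weights, and iteration count are arbitrary); the noisy-OR and MNIST models from the talk are considerably more complex.

```python
# Minimal EM sketch for a two-component 1-D Gaussian mixture (toy example).
import numpy as np

rng = np.random.default_rng(0)
# Observed data: which component generated each point is the latent variable
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

# Initial guesses for the model parameters
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E step: responsibilities q(s) = posterior over the latent component
    r = pi * gauss(x[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M step: re-estimate parameters, maximizing the free-energy bound
    n = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
    pi = n / len(x)

# mu should now be close to the true means (-2, 3)
```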
Gilles Louppe, Learning to Pivot with Adversarial Networks
- How to use a generative model to constrain a classifier
- One of the typical problems in physics: how to incorporate/treat systematic uncertainties coming from model uncertainties
- Goal: find a classifier which is not sensitive to systematic variations of nuisance parameters
- Slide 4: it means finding a classifier f which is a "pivotal quantity"
- 2 networks:
- one is the classifier
- one is the adversary, which produces the posterior value of the nuisance parameters based on the output of the classifier
- If the adversary can produce a meaningful posterior of z it means that the classifier depends on the nuisance parameter
- Want to train the classifier so that the adversary performs as badly as possible (its output carries no information about the nuisance)
- Details on the architecture shown in the slides
- Strategy: minimize a loss function built with the loss function of the classifier minus the loss function of the adversary
- (Proof on the mini-max optimization and minimization algorithm shown in the slides)
- Weight of the adversary can be controlled by a parameter λ, controls the trade off between accuracy and robustness
- Toy example discussed in the slides: two classes (gaussians) where the exact relative position is not known
- Shows that the method works and that the robustness comes at the price of poorer classification performance
- HEP-inspired example also shown (W and QCD jets discrimination)
- Nuisance: pileup (extreme cases of 0 or 50 pileup events considered)
- Slide 17: optimize λ with respect to some other objective, e.g. the final median statistical significance of the signal
- Questions
- Sergei Slide 18: what are the bands? Experiment repeated multiple times.
- Interesting to observe that the threshold changes
- Yes, but this is effectively a different classifier.
- Sergei: What do the curves look like for 10 < λ < 500, same general shape?
- Answer: looked at it, didn’t show but generally yes
- Sergei: did you try with multiple nuisance parameters? Not yet, but extension should be trivial
- Tatiana: What's the proportion of events with 0 and 50 pile-up events? 50-50
- It seems the result does not depend on whether the training is done with the z=1 or z=0 class
- Question deals with robustness of the result to various levels of pile-up
- Tatiana: Did you compare with the approach of Mike Williams (uBoost https://arxiv.org/abs/1305.7248)? no
- one difference: this method was developed for non-observable parameters.
- Subsequent discussion focused on whether the choice of pile-up as a nuisance parameter was justified.
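The two-network setup above can be caricatured in a few lines of NumPy: a 1-D logistic "classifier" and a logistic "adversary" that tries to infer a binary nuisance z from the classifier output, trained alternately with the combined objective L_classifier − λ·L_adversary. This is only a sketch of the idea, not the architecture from the slides; the toy data, λ value, and learning rate are all arbitrary assumptions.

```python
# Toy sketch of adversarial "pivot" training with scalar logistic models.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
y = rng.integers(0, 2, n)                    # class label
z = rng.integers(0, 2, n)                    # binary nuisance (e.g. low/high pile-up)
x = rng.normal(0.0, 1.0, n) + 2.0 * y + 1.0 * z  # feature shifted by both

sig = lambda t: 1.0 / (1.0 + np.exp(-t))

w, b = 0.1, 0.0     # classifier f(x) = sigmoid(w*x + b)
u, c = 0.1, 0.0     # adversary r(f) = sigmoid(u*f + c), predicts z from f
lam, lr = 0.5, 0.1  # lambda trades classification accuracy against robustness

for _ in range(500):
    f = sig(w * x + b)
    # adversary step: minimize BCE(r, z) over (u, c)
    r = sig(u * f + c)
    dr = r - z
    u -= lr * np.mean(dr * f)
    c -= lr * np.mean(dr)
    # classifier step: minimize BCE(f, y) - lam * BCE(r, z) over (w, b)
    r = sig(u * f + c)
    dr = r - z
    g = (f - y) - lam * dr * u * f * (1 - f)  # chain rule through the adversary
    w -= lr * np.mean(g * x)
    b -= lr * np.mean(g)
```

With λ = 0 this reduces to ordinary logistic regression; increasing λ pushes the classifier output to be less informative about z, at the price of poorer classification, exactly the trade-off reported in the talk.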
Mike Williams, Event generator tuning using Bayesian Optimization
- Introduction on what Bayesian optimization is
- Investigate if it can be used for MC tuning -> generate posterior distributions for Pythia parameters based on data
- This method can automatically assign uncertainties to the parameters
- Use the "Monash" tune as true data, see if they can recover the Monash parameters via Bayesian optimization
- Slide 3: if a parameter is not really constrained by a particular observation it correctly gets a huge error bar
- Convergence is relatively fast (order of 50 queries * nParameters)
- Global fit of all 20 parameters is less precise than "block" training, but still very satisfactory
- Potential improvements discussed on the slides: expert knowledge, knowledge transfer, pre-simulation of small samples, treatment of discrete parameters, extension to larger parameter spaces
- Questions
- Sergei: Bayesian optimization of ML hyperparameters, has anybody tried that?
- Baldi et al.
- Some other people did it, for instance there is an entry on Tim Head's blog
- Sergei: Does the tool (Spearmint) work in parallel? Yes, but there are a few choices on how to parallelize (e.g. Pythia -> trivial parallelization). The tool allows parallel optimization with different data sets, useful if you have distributed resources.
- Sergei: Next step? Will you try this with real data?
- Yes, will be tested on real data. Was tested on ee data: it finds a better chi2 than the default tune, but some parameters seem unphysical
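To make the Bayesian-optimization loop concrete, here is a toy 1-D sketch in pure NumPy: a Gaussian-process surrogate plus a lower-confidence-bound acquisition, minimizing a stand-in "chi2 vs. data" objective. Mike's study used Spearmint driving Pythia; everything below (the objective, kernel length scale, acquisition, and query budget) is an illustrative assumption, not the actual setup.

```python
# Toy 1-D Bayesian optimization: GP surrogate + lower-confidence-bound acquisition.
import numpy as np

rng = np.random.default_rng(2)
f = lambda p: (p - 0.7) ** 2   # stand-in objective ("chi2" between tune and data)

def gp_posterior(Xo, yo, Xq, ell=0.2, noise=1e-4):
    """GP mean/std at query points Xq given observations (Xo, yo), RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    K = k(Xo, Xo) + noise * np.eye(len(Xo))
    Ks, Kss = k(Xo, Xq), k(Xq, Xq)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ yo
    var = np.diag(Kss - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

X = rng.uniform(0.0, 1.0, 3)           # a few initial random queries
y = f(X)
grid = np.linspace(0.0, 1.0, 200)

for _ in range(20):                    # small query budget, as in the talk's scaling
    mu, sd = gp_posterior(X, y, grid)
    # LCB acquisition: query where (mean - sd) is lowest, balancing
    # exploitation (low mean) against exploration (high uncertainty)
    cand = grid[np.argmin(mu - sd)]
    X, y = np.append(X, cand), np.append(y, f(cand))

best = X[np.argmin(y)]                 # should be near the true minimum at 0.7
```

The spread of the GP posterior around the best point is what lets this approach assign uncertainties to the fitted parameters, as highlighted in the talk.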
Jonah Bernhard, Applying Bayesian parameter estimation to relativistic heavy-ion collisions
- Postponed due to technical problems with Vidyo