IML Machine Learning Working Group Meeting - Bayesian Optimization and Generative/Adversarial Models

4/3-006 - TH Conference Room (CERN)


Sergei, News

  • Next meeting (January): multi-class and multi-objective classification/regression, catch-all, workshop reports 
    • Contrary to what we announced in November, the topic is not going to be tagging. We plan to have a dedicated workshop instead (see below), due to the large number of proposed contributions
  • Community White Paper (CWP) of the HEP Software Foundation, to be produced by sometime next summer, includes machine learning. IML should contribute to it.
  • IML Workshop planned for March 20-22 (see slides for details)
    • Tagging workshop session (with hands-on)
    • CWP discussion
    • Tutorials


Enrico Guiraud, Generative models and EM algorithm 

  • Introduction to generative models
  • Main ingredients
    • Latent variables (not observed; e.g. if the model is made of 3 Gaussians, the parameters of the individual Gaussians are the latent variables)
    • Observed variables
    • Model parameters
  • Mathematically: conditional probability of the observed data given the hidden states (latent variables) and the parameters
    • Can be treated as a likelihood maximization problem
    • Directly maximizing the log-likelihood is very difficult or impossible due to the large sum over hidden states
    • EM algorithm: works around the problem by maximizing the "free energy", obtained by introducing variational distributions q(s)
    • Details on the choice of the q(s) are discussed in the slides
  • An example is shown with the "noisy-OR" model
  • A more complex example is applied to the MNIST data set
    • Recovers "digits"-like latents
  • Difference with respect to other approaches: the model is explicit (you have to write a model in terms of hidden variables)
  • Questions
    • Kyle: Latent variables in HEP are Monte Carlo truth, but much more complex: millions of hidden variables in geant simulation
      • Possibility to merge the two kinds of generative models: a non-explicit one for a simplified model of the detector, and EM minimization to get access to the hidden variables
      • Does not replace a full simulation, but it could provide insight on a few carefully chosen hidden variables
    • Sergei: one can think about using this for fast simulation.
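The E-step/M-step iteration described above can be sketched for a simple 1-D Gaussian mixture. This is a minimal, self-contained illustration, not code from the talk; the data, the number of components, and all numerical values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed data: a mixture of two 1-D Gaussians (hypothetical values)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

# Initial guesses for the model parameters
pi = np.array([0.5, 0.5])      # mixture weights
mu = np.array([-1.0, 1.0])     # component means
sigma = np.array([1.0, 1.0])   # component standard deviations

def gauss(t, m, s):
    return np.exp(-0.5 * ((t - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: variational distribution q(s) = posterior responsibility
    # of each hidden component for each observed point
    r = pi * gauss(x[:, None], mu, sigma)        # shape (n_points, 2)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters, maximizing the free energy
    n_k = r.sum(axis=0)
    pi = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
```

Here the responsibilities r play the role of the hidden states; the fitted means should land near the generating values (-2 and 3).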


Gilles Louppe, Learning to Pivot with Adversarial Networks

  • How to use a generative model to constrain a classifier
  • One of the typical problems in physics: how to incorporate/treat systematic uncertainties coming from model uncertainties
  • Goal: find a classifier which is not sensitive to systematic variations of nuisance parameters
  • Slide 4: it means finding a classifier f which is a "pivotal quantity"
  • 2 networks:
    • one is the classifier
    • one is the adversary, which produces the posterior value of the nuisance parameters based on the output of the classifier
      • If the adversary can produce a meaningful posterior of z it means that the classifier depends on the nuisance parameter
      • The goal is to make the adversary perform as poorly as possible
  • Details on the architecture shown in the slides
  • Strategy: minimize a combined loss, built as the loss of the classifier minus the loss of the adversary
    • (Proof on the mini-max optimization and minimization algorithm shown in the slides)
    • The weight of the adversary is controlled by a parameter λ, which sets the trade-off between accuracy and robustness
  • Toy example discussed in the slides: two classes (gaussians) where the exact relative position is not known 
    • Shows that the method works and that the robustness comes at the price of poorer classification performance
  • HEP-inspired example also shown (W and QCD jets discrimination)
    • Nuisance: pileup (extreme cases of 0 or 50 pileup events considered)
  • Slide 17: optimize λ with respect to some other objective, e.g. the final median statistical significance of the signal
  • Questions
    • Sergei (slide 18): what are the bands? The experiment repeated multiple times.
      • Interesting to observe that the threshold changes.
      • Yes, but this is effectively a different classifier.
    • Sergei: what do the curves look like for 10 < λ < 500, same general shape?
      • Answer: looked at it; not shown, but generally yes
    • Sergei: did you try with multiple nuisance parameters? Not yet, but the extension should be trivial
    • Tatiana: What's the proportion of events with 0 and 50 pile-up events? 50-50
      • It seems the result does not depend on whether the training is done with the Z=1 or Z=0 class
      • Question deals with robustness of the result to various levels of pile-up
    • Tatiana: did you compare with the approach of Mike Williams (uBoost)?
      • One difference: this method was developed for non-observable parameters.
      • Subsequent discussion focused on whether the choice of pile-up as a nuisance parameter was justified.
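The alternating minimax training described above can be sketched with linear toy models. Everything here (data distributions, the linear classifier and adversary, the λ value, the learning rate) is a hypothetical illustration, not the architecture from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical toy data: background x ~ N(0, 1), signal x ~ N(1 + z, 1),
# with a binary nuisance z that shifts the signal mean
y = rng.integers(0, 2, n)                 # class labels
z = rng.integers(0, 2, n).astype(float)   # nuisance realization per event
x = rng.normal(y * (1.0 + z), 1.0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train(lam, steps=2000, lr=0.05):
    w = b = a = c = 0.0       # classifier f and adversary r, both linear
    for _ in range(steps):
        p = sigmoid(w * x + b)            # classifier output f(x)
        z_hat = a * p + c                 # adversary's guess of z from f(x)
        # Adversary step: gradient descent on its squared-error loss L_r
        a -= lr * 2 * np.mean((z_hat - z) * p)
        c -= lr * 2 * np.mean(z_hat - z)
        # Classifier step: gradient descent on  L_f - lam * L_r
        z_hat = a * p + c
        g = (p - y) - lam * 2 * (z_hat - z) * a * p * (1 - p)
        w -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    p = sigmoid(w * x + b)
    return np.mean((p > 0.5) == y)        # classification accuracy

acc_plain = train(0.0)    # lam = 0: plain classifier
acc_pivot = train(10.0)   # lam > 0: trades accuracy for robustness to z
```

As in the talk, larger λ penalizes any dependence of f(x) on z that the adversary can exploit, typically at the cost of raw classification accuracy.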


Mike Williams, Event generator tuning using Bayesian Optimization

  • Introduction on what Bayesian optimization is
  • Investigate whether it can be used for MC tuning -> generate posterior distributions for Pythia parameters based on data
  • This method can automatically assign uncertainties to the parameters
  • Use the "Monash" tune as true data, and see if the Monash parameters can be recovered via Bayesian optimization
  • Slide 3: if a parameter is not really constrained by a particular observation it correctly gets a huge error bar
    • Convergence is relatively fast (order of 50 queries * nParameters)
  • Global fit of all 20 parameters is less precise than "block" training, but still very satisfactory
  • Potential improvements discussed on the slides: expert knowledge, knowledge transfer, pre-simulation of small samples, treatment of discrete parameters, extension to larger parameter spaces
  • Questions
    • Sergei: Bayesian optimization of ML hyperparameters, has anybody tried that?
      • Baldi et al.
      • Some other people did it; for instance there is an entry on Tim Head's blog
    • Sergei: does the tool (Spearmint) work in parallel? Yes, but there are a few choices on how to parallelize (e.g. Pythia -> trivial parallelization). The tool allows parallel optimization with different data sets, useful if you have distributed resources.
    • Sergei: Next step? Will you try this with real data?
      • Yes, it will be tested on real data. It was tested on e+e- data: it finds a better chi2 than the default tune, but some parameters seem unphysical
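The query loop of Bayesian optimization can be sketched with a tiny Gaussian-process surrogate and an expected-improvement acquisition function. The objective below is a hypothetical stand-in for "run the generator and compare to data"; the kernel length scale, candidate grid, and iteration counts are all illustrative:

```python
import numpy as np
from math import erf, sqrt, pi

def kernel(a, b, ell=0.2):
    # Squared-exponential covariance between two sets of 1-D points
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def objective(theta):
    # Hypothetical stand-in for a chi2-like data/MC discrepancy,
    # minimal at the "true" parameter value 0.7
    return (theta - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 201)         # candidate parameter values
rng = np.random.default_rng(2)
X = list(rng.uniform(0.0, 1.0, 3))        # a few initial random queries
Y = [objective(t) for t in X]

for _ in range(15):
    Xa, Ya = np.array(X), np.array(Y)
    K = kernel(Xa, Xa) + 1e-6 * np.eye(len(Xa))
    Ks = kernel(Xa, grid)
    # GP posterior mean and variance on the candidate grid
    mu = Ks.T @ np.linalg.solve(K, Ya)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0),
                  1e-12, None)
    sd = np.sqrt(var)
    # Expected improvement over the best observation so far (minimization)
    imp = Ya.min() - mu
    zs = imp / sd
    cdf = 0.5 * (1.0 + np.vectorize(erf)(zs / sqrt(2.0)))
    pdf = np.exp(-0.5 * zs ** 2) / sqrt(2.0 * pi)
    ei = imp * cdf + sd * pdf
    t_next = grid[int(np.argmax(ei))]     # query where EI is largest
    X.append(t_next)
    Y.append(objective(t_next))

best = X[int(np.argmin(Y))]               # best parameter found
```

The expensive objective is queried only where the acquisition function expects the largest gain, which is what keeps the number of generator runs small (of order tens of queries per parameter, as noted above).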


Jonah Bernhard,  Applying Bayesian parameter estimation to relativistic heavy-ion collisions

  • Postponed due to technical problems with Vidyo


Timetable
    • 15:00 15:15
      News and group updates 15m
      Speakers: Lorenzo Moneta (CERN), Michele Floris (CERN), Paul Seyfert (Universita & INFN, Milano-Bicocca (IT)), Dr Sergei Gleyzer (University of Florida (US)), Steven Randolph Schramm (Universite de Geneve (CH))
    • 15:15 15:35
      Generative Models and EM Learning Algorithm 20m
      Speaker: Enrico Guiraud (Università degli Studi e INFN Milano (IT))
    • 15:35 15:55
      Learning to Pivot with Adversarial Networks 20m
      Speakers: Gilles Louppe (New York University (US)), Kyle Stuart Cranmer (New York University (US)), Michael Aaron Kagan (SLAC National Accelerator Laboratory (US))
    • 15:55 16:15
      Simulation tuning using Bayes optimization 20m
      Speaker: J Michael Williams (Massachusetts Inst. of Technology (US))
    • 16:15 16:35
      Applying Bayesian parameter estimation to relativistic heavy-ion collisions (postponed) 20m
      Speaker: Jonah Bernhard
    • 16:35 16:36
      Experimenting with Generative Adversarial Networks for Fast Simulation (postponed) 1m
      Speaker: Sebastian Neubert (Ruprecht-Karls-Universitaet Heidelberg (DE))