IML Machine Learning Working Group Meeting - Bayesian Optimization and Generative/Adversarial Models
Europe/Zurich
4/3-006 - TH Conference Room (CERN)
Sergei, News
- Next meeting (January): multi-class and multi-objective classification/regression, catch-all, workshop reports
- Contrary to what we announced in November, the topic is not going to be tagging. Due to the large number of proposed contributions, we plan to have a dedicated workshop instead (see below)
- Community White Paper (CWP) of the HEP Software Foundation to be produced by sometime next summer; it includes machine learning. IML should contribute to it.
- IML Workshop planned for March 20-22 (see slides for details)
- Tagging workshop (with hands-on session)
- CWP discussion
- Tutorials
Enrico Guiraud, Generative models and EM algorithm
- Introduction to generative models
- Main ingredients
- Latent variables (not observed; e.g. if the model is made of 3 Gaussians, the parameters of the individual Gaussians are the latent variables)
- Observed variables
- Model parameters
- Mathematically: the conditional probability of the observed data given the hidden states (latent variables) and the parameters
- Can be treated as a likelihood maximization problem
- Directly maximizing the log-likelihood is very difficult/impossible due to the large sum over hidden states
- EM algorithm: work around the problem by maximizing the "free energy", obtained by introducing variational distributions q(s)
- Details on the choice of the q(s) are discussed in the slides
- An example is shown with the "noisy-OR" model
- A more complex example is applied to the MNIST data set
- Recovers "digits"-like latents
- Difference with respect to other approaches: the model is explicit (you have to write a model in terms of hidden variables)
- Questions
- Kyle: Latent variables in HEP are Monte Carlo truth, but much more complex: millions of hidden variables in geant simulation
- Possibility to merge two kinds of generative models: a non-explicit one for a simplified model of the detector, plus EM minimization to get access to hidden variables
- Does not replace a full simulation, but it could provide insight on a few carefully chosen hidden variables
- Sergei: can think about using this for a fast simulation.
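To make the E and M steps discussed above concrete, here is a minimal NumPy sketch of EM for a mixture of two 1-D Gaussians. This is only an illustrative toy under assumed settings (the means, mixing weights, and iteration count are arbitrary); the noisy-OR and MNIST models from the talk are considerably more complex.

```python
# Minimal EM sketch for a two-component 1-D Gaussian mixture (toy example).
import numpy as np

rng = np.random.default_rng(0)
# Observed data: which component generated each point is the latent variable
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

# Initial guesses for the model parameters
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E step: responsibilities q(s) = posterior over the latent component
    r = pi * gauss(x[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M step: re-estimate parameters, maximizing the free-energy bound
    n = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
    pi = n / len(x)

# mu should now be close to the true means (-2, 3)
```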
Gilles Louppe, Learning to Pivot with Adversarial Networks
- How to use a generative model to constrain a classifier
- One of the typical problems in physics: how to incorporate/treat systematic uncertainties coming from model uncertainties
- Goal: find a classifier which is not sensitive to systematic variations of nuisance parameters
- Slide 4: it means finding a classifier f which is a "pivotal quantity"
- 2 networks:
- one is the classifier
- one is the adversary, which produces the posterior value of the nuisance parameters based on the output of the classifier
- If the adversary can produce a meaningful posterior of z it means that the classifier depends on the nuisance parameter
- Want to train the classifier so that the adversary performs as badly as possible (its output carries no information about the nuisance)
- Details on the architecture shown in the slides
- Strategy: minimize a loss function built with the loss function of the classifier minus the loss function of the adversary
- (Proof on the mini-max optimization and minimization algorithm shown in the slides)
- Weight of the adversary can be controlled by a parameter λ, controls the trade off between accuracy and robustness
- Toy example discussed in the slides: two classes (gaussians) where the exact relative position is not known
- Shows that the method works and that the robustness comes at the price of poorer classification performance
- HEP-inspired example also shown (W and QCD jets discrimination)
- Nuisance: pileup (extreme cases of 0 or 50 pileup events considered)
- Slide 17: optimize λ with respect to some other objective, e.g. the final median statistical significance of the signal
- Questions
- Sergei Slide 18: what are the bands? Experiment repeated multiple times.
- Interesting to observe that the threshold changes
- Yes, but this is effectively a different classifier.
- Sergei: What do the curves look like for 10 < λ < 500, same general shape?
- Answer: looked at it, didn’t show but generally yes
- Sergei: did you try with multiple nuisance parameters? Not yet, but extension should be trivial
- Tatiana: What's the proportion of events with 0 and 50 pile-up events? 50-50
- It seems the result does not depend on whether the training is done with the z=1 or z=0 class
- Question deals with robustness of the result to various levels of pile-up
- Tatiana: Did you compare with the approach of Mike Williams (uBoost https://arxiv.org/abs/1305.7248)? no
- one difference: this method was developed for non-observable parameters.
- Subsequent discussion focused on whether the choice of pile-up as a nuisance parameter was justified.
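The two-network setup above can be caricatured in a few lines of NumPy: a 1-D logistic "classifier" and a logistic "adversary" that tries to infer a binary nuisance z from the classifier output, trained alternately with the combined objective L_classifier − λ·L_adversary. This is only a sketch of the idea, not the architecture from the slides; the toy data, λ value, and learning rate are all arbitrary assumptions.

```python
# Toy sketch of adversarial "pivot" training with scalar logistic models.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
y = rng.integers(0, 2, n)                    # class label
z = rng.integers(0, 2, n)                    # binary nuisance (e.g. low/high pile-up)
x = rng.normal(0.0, 1.0, n) + 2.0 * y + 1.0 * z  # feature shifted by both

sig = lambda t: 1.0 / (1.0 + np.exp(-t))

w, b = 0.1, 0.0     # classifier f(x) = sigmoid(w*x + b)
u, c = 0.1, 0.0     # adversary r(f) = sigmoid(u*f + c), predicts z from f
lam, lr = 0.5, 0.1  # lambda trades classification accuracy against robustness

for _ in range(500):
    f = sig(w * x + b)
    # adversary step: minimize BCE(r, z) over (u, c)
    r = sig(u * f + c)
    dr = r - z
    u -= lr * np.mean(dr * f)
    c -= lr * np.mean(dr)
    # classifier step: minimize BCE(f, y) - lam * BCE(r, z) over (w, b)
    r = sig(u * f + c)
    dr = r - z
    g = (f - y) - lam * dr * u * f * (1 - f)  # chain rule through the adversary
    w -= lr * np.mean(g * x)
    b -= lr * np.mean(g)
```

With λ = 0 this reduces to ordinary logistic regression; increasing λ pushes the classifier output to be less informative about z, at the price of poorer classification, exactly the trade-off reported in the talk.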
Mike Williams, Event generator tuning using Bayesian Optimization
- Introduction on what Bayesian optimization is
- Investigate if it can be used for MC tuning -> generate posterior distributions for Pythia parameters based on data
- This method can automatically assign uncertainties to the parameters
- Use the "Monash" tune as true data, see if they can recover the Monash parameters via Bayesian optimization
- Slide 3: if a parameter is not really constrained by a particular observation it correctly gets a huge error bar
- Convergence is relatively fast (order of 50 queries * nParameters)
- Global fit of all 20 parameters is less precise than "block" training, but still very satisfactory
- Potential improvements discussed on the slides: expert knowledge, knowledge transfer, pre-simulation of small samples, treatment of discrete parameters, extension to larger parameter spaces
- Questions
- Sergei: Bayesian optimization of ML hyperparameters, has anybody tried that?
- Baldi et al.
- Some other people did it, for instance there is an entry on Tim Head's blog
- Sergei: Does the tool (Spearmint) work in parallel? Yes, but there are a few choices on how to parallelize (e.g. Pythia -> trivial parallelization). The tool allows parallel optimization with different data sets, useful if you have distributed resources.
- Sergei: Next step? Will you try this with real data?
- Yes, will be tested on real data. Was tested on ee data: it finds a better chi2 than the default tune, but some parameters seem unphysical
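To make the Bayesian-optimization loop concrete, here is a toy 1-D sketch in pure NumPy: a Gaussian-process surrogate plus a lower-confidence-bound acquisition, minimizing a stand-in "chi2 vs. data" objective. Mike's study used Spearmint driving Pythia; everything below (the objective, kernel length scale, acquisition, and query budget) is an illustrative assumption, not the actual setup.

```python
# Toy 1-D Bayesian optimization: GP surrogate + lower-confidence-bound acquisition.
import numpy as np

rng = np.random.default_rng(2)
f = lambda p: (p - 0.7) ** 2   # stand-in objective ("chi2" between tune and data)

def gp_posterior(Xo, yo, Xq, ell=0.2, noise=1e-4):
    """GP mean/std at query points Xq given observations (Xo, yo), RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    K = k(Xo, Xo) + noise * np.eye(len(Xo))
    Ks, Kss = k(Xo, Xq), k(Xq, Xq)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ yo
    var = np.diag(Kss - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

X = rng.uniform(0.0, 1.0, 3)           # a few initial random queries
y = f(X)
grid = np.linspace(0.0, 1.0, 200)

for _ in range(20):                    # small query budget, as in the talk's scaling
    mu, sd = gp_posterior(X, y, grid)
    # LCB acquisition: query where (mean - sd) is lowest, balancing
    # exploitation (low mean) against exploration (high uncertainty)
    cand = grid[np.argmin(mu - sd)]
    X, y = np.append(X, cand), np.append(y, f(cand))

best = X[np.argmin(y)]                 # should be near the true minimum at 0.7
```

The spread of the GP posterior around the best point is what lets this approach assign uncertainties to the fitted parameters, as highlighted in the talk.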
Jonah Bernhard, Applying Bayesian parameter estimation to relativistic heavy-ion collisions
- Postponed due to technical problems with Vidyo