IML meeting: May 17, 2016
Peak people on Vidyo: 18
Peak people in the room: 21

Sergei: Intro and news
- IML is now recognized as an LPCC group
- Paul Seyfert is replacing Tim Head as the LHCb contact
- Lorenzo Moneta is the LPCC representative
- This meeting focuses on regression, but some planned contributions for today have been delayed to the next meeting (scheduling time constraints)
- Next meeting: July 5

Sergei: Regression in TMVA update
- Generally works very well out of the box
- Some known shortcomings to be addressed this summer
  - Single loss function
  - Distributed memory management
- Additional features to be added
  - Multi-objective regression
- Questions deferred to after the other talks, as they will touch on these topics

David: b-jet energy regression
- The energy reconstructed for jets is underestimated (in particular due to neutrinos in the decays), especially for b-jets
- A number of corrections address this
  - Five corrections are already applied; the regression is a sixth step (L6)
- The focus is on Hbb, looking at jet pT and the dijet mass (trying to make the peak as narrow as possible around 125 GeV)
- Makes use of the TMVA BDT regression package, with only minor changes from the defaults
- One limitation encountered was the single loss function in TMVA
- The regression takes the width from 18.2 GeV to 16.5 GeV, a width/mean improvement of 20%
- Validated in data by using the two leptons on the other side of the event to balance against the jets
- Also tried another package outside of TMVA and saw similar results (21% improvement with the best loss function); see the sketch after this section
- Question (Sergei): did you try anything other than BDTs for regression in TMVA?
  - David: only tried BDTs within TMVA
  - Outside of TMVA, tried scikit-learn; slightly worse performance, but less time was spent optimizing
  - Expect approximately the same performance after more optimization
- Question (Steven): Hbb is stats-limited, so I guess there is not much focus beyond BDTs, but any future ideas?
  - David: looking into ttbar samples (b-jets from top decays), avoiding the Hbb sample dependence and providing much more statistics
  - So far this hasn't been done and only BDTs have been used, but want to think about this more in the future
- Question (Sergei): in terms of the features, any optimization?
  - David: started with a huge list and cut features out one at a time
  - Basic hyperparameter optimization was done too, standard optimization
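- Sketch (not from the talk): a minimal BDT regression outside TMVA with a configurable loss function, in the spirit of the scikit-learn cross-check mentioned above; the dataset and features below are synthetic placeholders, not the analysis inputs

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 10000
    X = rng.normal(size=(n, 5))  # stand-ins for jet-level input features
    # stand-in regression target (e.g. a gen/reco energy ratio)
    y = 1.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Compare two loss functions ("squared_error" is spelled "ls" in older scikit-learn releases)
    for loss in ("squared_error", "huber"):
        bdt = GradientBoostingRegressor(loss=loss, n_estimators=200, max_depth=3)
        bdt.fit(X_train, y_train)
        residuals = bdt.predict(X_test) - y_test
        print(loss, "residual RMS:", np.std(residuals))
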

Marian: Function approximation and fitting in ALICE
- Huge number of use cases for regression within ALICE (see slide 5 for a long list)
- Motivations:
  - Typically, analytical models are not sufficiently precise (an N-dimensional correction needs to be applied)
  - The underlying model is too complex, CPU limitations
  - Parameters of known functions are unknown and have to be fitted
  - Fast prototyping is needed to consider multiple models
  - Generic code
- Implemented a local polynomial regression with a kernel smoother (a single parameter sets the smoothness)
- The code has been tested in real applications up to 5 dimensions
- Working on visualization tools, interfaced to TFormula through a static function
  - Used for "interactive" visualization of the fit function projection and residuals
- Interface and code examples are provided on slide 11
- Space-point distortion calibration is a major challenge for the ALICE TPC
  - The full analytical solution is very CPU/GPU consuming at the required time granularity
  - A linear approximation of the correction seems promising
  - Successfully tested in 1D (variation with time only in the z-direction)
  - Need to expand the approximation to 3D and ensure it works as needed
- Very low momentum track finder: loop finder
  - Strongly distorted tracks, high density (cluster pileup)
  - Unknown z position and t0 offset, to be fitted to estimate the distortion
- Comment (Sergei): you have developed your own code for local regression
  - Might be good to try out-of-the-box BDT regression as a baseline step (before moving to deep learning or similar)
- Question (Sergei): for the local regression work, are there things you are doing that are not yet in ROOT/etc. and that you would find useful to add?
  - Marian: yes, will discuss with the ROOT developers this week

Tom: TMVA cross-validation update
- Update on a previous talk (linked in the slides, from a past IML meeting)
- Train the classifier with k-1 folds and test with the remaining fold, repeat k times (see the sketch after this section)
  - Then take the average of your metrics (for example)
- A basic version is already integrated in TMVA (latest ROOT version)
- Currently only works for MVAs that implement "OptimizeTuningParameters"
  - Just BDTs and SVMs for now (?)
- Comment (Sergei): fairly new feature, will be demonstrated in tomorrow's ALICE workshop via a Jupyter notebook
- Question (Sergei): in terms of new metrics, can you be a bit more specific?
  - Tom: the idea is to have the same output as for a single MVA, but with each fold listed and then an average
- Question (Steven): Have you looked at the number of folds and determined when it becomes a waste of CPU or similar?
  - Tom: dataset specific; the idea is to look at your dataset and try different numbers of folds to see which performs best
- Question (Gilles - Vidyo): Nested cross-validation, i.e. the best parameters of the model and an independent estimate of how good that model is?
  - Tom: not in the current implementation, but with hyperparameter tuning you could
  - Sergei: short answer is not yet, but it's planned
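- Sketch (not from the talk): the generic k-fold scheme described above (train on k-1 folds, test on the held-out fold, average the per-fold metric), illustrated here with scikit-learn on a placeholder dataset rather than with the TMVA cross-validation code

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=5000, n_features=10, random_state=0)  # placeholder dataset

    k = 5
    fold_aucs = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        clf = GradientBoostingClassifier(n_estimators=100)  # the MVA being cross-validated
        clf.fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        fold_aucs.append(roc_auc_score(y[test_idx], scores))  # one metric per fold

    print("per-fold AUC:", np.round(fold_aucs, 3), "average:", round(np.mean(fold_aucs), 3))
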

Serguei: Multivariate histogram comparison
- The slides are complete, please see them for full details; below is a very short summary
- Given a pair of histograms, how do we assess whether they are similar?
- Usually we consider a test statistic, of which there are several definitions, such as chi^2
- We can use the distributions of some test statistics as inputs to MVAs
- Consider the significance of the difference with multiple test statistics bin-by-bin (see the sketch at the end of these minutes)
  - Use these to calculate statistical moments of the distribution
- Hypothesis testing requires knowledge of the distributions of the test statistics; these can be obtained from MC
- Many advantages of this approach, especially the simple extension to multiple dimensions
  - Can be extended to include any univariate test statistic as additional coordinates
- TODO: need some help here for what we say
- Question (Sergei): The example was only 2D, can this be extended?
  - Serguei: Two dimensions were shown, but it can be extended in a straightforward fashion
- Comment (Sergei): This is still a direct cut approach with multiple variables
  - It will be interesting to see ML applied on top of it
  - Very useful for anomaly detection, a direct application
- Question (Sergei): How is the bin size determined?
  - Serguei: the result is independent of the bin size
- Comment (Sergei): Interesting to try this on the "anomaly detection RAMP dataset" from the HSF workshop
  - It will be interesting to see how this compares to the other methods

All: AOB
- Sergei: new CERN SFT/IT service called SWAN (Service for Web-based ANalysis), swan.cern.ch
  - Runs in the cloud, interfaced to other CERN services
  - Useful for tutorials/etc., with directly executable examples (no need to worry about installing software)
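
Note: histogram comparison sketch (referenced above)
- A minimal, generic illustration of the bin-by-bin test-statistic idea from the multivariate histogram comparison talk, not the code from the talk: compare two histograms with a chi^2-like statistic and estimate its null distribution with toy pseudo-experiments; the samples below are synthetic placeholders

    import numpy as np

    rng = np.random.default_rng(1)
    bins = np.linspace(-3.0, 3.0, 31)

    def chi2_stat(h1, h2):
        # chi2-like statistic summed over bins with entries in at least one histogram
        mask = (h1 + h2) > 0
        return np.sum((h1[mask] - h2[mask]) ** 2 / (h1[mask] + h2[mask]))

    # Two samples drawn from the same parent distribution (placeholder data)
    ref = rng.normal(size=20000)
    test = rng.normal(size=20000)
    t_obs = chi2_stat(np.histogram(ref, bins=bins)[0], np.histogram(test, bins=bins)[0])

    # Null distribution of the statistic from toys: repeatedly split the pooled sample in two
    pooled = np.concatenate([ref, test])
    toys = []
    for _ in range(500):
        rng.shuffle(pooled)
        h_a = np.histogram(pooled[: ref.size], bins=bins)[0]
        h_b = np.histogram(pooled[ref.size :], bins=bins)[0]
        toys.append(chi2_stat(h_a, h_b))

    p_value = np.mean(np.asarray(toys) >= t_obs)
    print("observed statistic:", round(float(t_obs), 2), "p-value from toys:", round(float(p_value), 3))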