IML meeting: May 17, 2016
Peak people on Vidyo: 18
Peak people in the room: 21

Sergei: Intro and news
- IML is now recognized as an LPCC group
- Paul Seyfert is replacing Tim Head as the LHCb contact
- Lorenzo Moneta is the LPCC representative
- This meeting focuses on regression, but some planned contributions for today have been delayed to the next meeting (scheduling time constraints)
- Next meeting: July 5

Sergei: Regression in TMVA update
- Generally works very well out of the box
- Some known shortcomings to be addressed this summer
  - Single loss function
  - Distributed memory management
- Additional features to be added
  - Multi-objective regression
- Questions deferred to after the other talks, as they will touch on these topics

David: b-jet energy regression
- The energy reconstructed for jets is underestimated (in particular due to neutrinos in the decays), especially for b-jets
- A number of corrections address this
  - Five corrections are already applied; the regression is a sixth step (L6)
- The focus is on Hbb, looking at jet pT and the dijet mass (trying to make the peak as narrow as possible around 125 GeV)
- Makes use of the TMVA BDT regression package, with only minor changes from the defaults
- One limitation encountered was the single loss function in TMVA
- The regression takes the width from 18.2 GeV to 16.5 GeV, a width/mean improvement of 20%
- Validated in data by using the two leptons on the other side of the event to balance against the jets
- Also tried another package outside of TMVA and saw similar results (21% improvement with the best loss function); see the sketch after this section
- Question (Sergei): did you try anything other than BDTs for regression in TMVA?
  - David: only tried BDTs within TMVA
  - Outside of TMVA, tried scikit-learn; slightly worse performance, but less time was spent optimizing
  - Expect approximately the same performance after more optimization
- Question (Steven): Hbb is stats-limited, so I guess there is not much focus beyond BDTs, but any future ideas?
  - David: looking into ttbar samples (b-jets from top decays), avoiding the Hbb sample dependence and providing much more statistics
  - So far this hasn't been done and only BDTs have been used, but want to think about this more in the future
- Question (Sergei): in terms of the features, any optimization?
  - David: started with a huge list and cut features out one at a time
  - Basic hyperparameter optimization was done too, standard optimization
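- Sketch (not from the talk): a minimal BDT regression outside TMVA with a configurable loss function, in the spirit of the scikit-learn cross-check mentioned above; the dataset and features below are synthetic placeholders, not the analysis inputs

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 10000
    X = rng.normal(size=(n, 5))  # stand-ins for jet-level input features
    # stand-in regression target (e.g. a gen/reco energy ratio)
    y = 1.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Compare two loss functions ("squared_error" is spelled "ls" in older scikit-learn releases)
    for loss in ("squared_error", "huber"):
        bdt = GradientBoostingRegressor(loss=loss, n_estimators=200, max_depth=3)
        bdt.fit(X_train, y_train)
        residuals = bdt.predict(X_test) - y_test
        print(loss, "residual RMS:", np.std(residuals))
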

Marian: Function approximation and fitting in ALICE
- Huge number of use cases for regression within ALICE (see slide 5 for a long list)
- Motivations:
  - Typically, analytical models are not sufficiently precise (an N-dimensional correction needs to be applied)
  - The underlying model is too complex, CPU limitations
  - Parameters of known functions are unknown and have to be fitted
  - Fast prototyping is needed to consider multiple models
  - Generic code
- Implemented a local polynomial regression with a kernel smoother (a single parameter sets the smoothness)
- The code has been tested in real applications up to 5 dimensions
- Working on visualization tools, interfaced to TFormula through a static function
  - Used for "interactive" visualization of the fit function projection and residuals
- Interface and code examples are provided on slide 11
- Space-point distortion calibration is a major challenge for the ALICE TPC
  - The full analytical solution is very CPU/GPU consuming at the required time granularity
  - A linear approximation of the correction seems promising
  - Successfully tested in 1D (variation with time only in the z-direction)
  - Need to expand the approximation to 3D and ensure it works as needed
- Very low momentum track finder: loop finder
  - Strongly distorted tracks, high density (cluster pileup)
  - Unknown z position and t0 offset, to be fitted to estimate the distortion
- Comment (Sergei): you have developed your own code for local regression
  - Might be good to try out-of-the-box BDT regression as a baseline step (before moving to deep learning or similar)
- Question (Sergei): for the local regression work, are there things you are doing that are not yet in ROOT/etc. and that you would find useful to add?
  - Marian: yes, will discuss with the ROOT developers this week

Tom: TMVA cross-validation update
- Update on a previous talk (linked in the slides, from a past IML meeting)
- Train the classifier with k-1 folds and test with the remaining fold, repeat k times (see the sketch after this section)
  - Then take the average of your metrics (for example)
- A basic version is already integrated in TMVA (latest ROOT version)
- Currently only works for MVAs that implement "OptimizeTuningParameters"
  - Just BDTs and SVMs for now (?)
- Comment (Sergei): fairly new feature, will be demonstrated in tomorrow's ALICE workshop via a Jupyter notebook
- Question (Sergei): in terms of new metrics, can you be a bit more specific?
  - Tom: the idea is to have the same output as for a single MVA, but with each fold listed and then an average
- Question (Steven): Have you looked at the number of folds and determined when it becomes a waste of CPU or similar?
  - Tom: dataset specific; the idea is to look at your dataset and try different numbers of folds to see which performs best
- Question (Gilles - Vidyo): Nested cross-validation, i.e. the best parameters of the model and an independent estimate of how good that model is?
  - Tom: not in the current implementation, but with hyperparameter tuning you could
  - Sergei: short answer is not yet, but it's planned
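- Sketch (not from the talk): the generic k-fold scheme described above (train on k-1 folds, test on the held-out fold, average the per-fold metric), illustrated here with scikit-learn on a placeholder dataset rather than with the TMVA cross-validation code

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=5000, n_features=10, random_state=0)  # placeholder dataset

    k = 5
    fold_aucs = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        clf = GradientBoostingClassifier(n_estimators=100)  # the MVA being cross-validated
        clf.fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        fold_aucs.append(roc_auc_score(y[test_idx], scores))  # one metric per fold

    print("per-fold AUC:", np.round(fold_aucs, 3), "average:", round(np.mean(fold_aucs), 3))
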

Serguei: Multivariate histogram comparison
- The slides are complete, please see them for full details; below is a very short summary
- Given a pair of histograms, how do we assess whether they are similar?
- Usually we consider a test statistic, of which there are several definitions, such as chi^2
- We can use the distributions of some test statistics as inputs to MVAs
- Consider the significance of the difference with multiple test statistics bin-by-bin (see the sketch at the end of these minutes)
  - Use these to calculate statistical moments of the distribution
- Hypothesis testing requires knowledge of the distributions of the test statistics; these can be obtained from MC
- Many advantages of this approach, especially the simple extension to multiple dimensions
  - Can be extended to include any univariate test statistic as additional coordinates
- TODO: need some help here for what we say
- Question (Sergei): The example was only 2D, can this be extended?
  - Serguei: Two dimensions were shown, but it can be extended in a straightforward fashion
- Comment (Sergei): This is still a direct cut approach with multiple variables
  - It will be interesting to see ML applied on top of it
  - Very useful for anomaly detection, a direct application
- Question (Sergei): How is the bin size determined?
  - Serguei: the result is independent of the bin size
- Comment (Sergei): Interesting to try this on the "anomaly detection RAMP dataset" from the HSF workshop
  - It will be interesting to see how this compares to the other methods

All: AOB
- Sergei: new CERN SFT/IT service called SWAN (Service for Web-based ANalysis), swan.cern.ch
  - Runs in the cloud, interfaced to other CERN services
  - Useful for tutorials/etc., with directly executable examples (no need to worry about installing software)
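
Note: histogram comparison sketch (referenced above)
- A minimal, generic illustration of the bin-by-bin test-statistic idea from the multivariate histogram comparison talk, not the code from the talk: compare two histograms with a chi^2-like statistic and estimate its null distribution with toy pseudo-experiments; the samples below are synthetic placeholders

    import numpy as np

    rng = np.random.default_rng(1)
    bins = np.linspace(-3.0, 3.0, 31)

    def chi2_stat(h1, h2):
        # chi2-like statistic summed over bins with entries in at least one histogram
        mask = (h1 + h2) > 0
        return np.sum((h1[mask] - h2[mask]) ** 2 / (h1[mask] + h2[mask]))

    # Two samples drawn from the same parent distribution (placeholder data)
    ref = rng.normal(size=20000)
    test = rng.normal(size=20000)
    t_obs = chi2_stat(np.histogram(ref, bins=bins)[0], np.histogram(test, bins=bins)[0])

    # Null distribution of the statistic from toys: repeatedly split the pooled sample in two
    pooled = np.concatenate([ref, test])
    toys = []
    for _ in range(500):
        rng.shuffle(pooled)
        h_a = np.histogram(pooled[: ref.size], bins=bins)[0]
        h_b = np.histogram(pooled[ref.size :], bins=bins)[0]
        toys.append(chi2_stat(h_a, h_b))

    p_value = np.mean(np.asarray(toys) >= t_obs)
    print("observed statistic:", round(float(t_obs), 2), "p-value from toys:", round(float(p_value), 3))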