IML meeting: March 4, 2016
Peak people on Vidyo: 24
Peak people in the room: 21

Sergei: Intro and news
- Michele Floris is joining IML for ALICE
- Today: Part II of HEP-ML tools
- Next IML meeting on April 14, focus is Deep Learning
  - Contact us if you want to present your Deep Learning related studies!
- QCHS2016 will have a parallel session on statistics and statistical methods, with a focus on ML
  - Contact Tommaso Dorigo or Sergei Gleyzer for more information

Marcin: Report on the Heavy Flavor Data Mining Workshop
- Good mixture of the ML and physics communities, lots of good discussions
- Had four ML tutorials, uploaded to indico; people are welcome to look into them
  - scikit-learn
  - Google TensorFlow
  - NVIDIA Deep Learning
  - REP (Reproducible Experiment Platform)
- Awarded physics prizes to the winners of the LHCb "Flavours of Physics" challenge
  - Vicens Gaitan: "data doping"
  - Alexander Rakhlin: "transfer learning"
- Trialed an open space discussion, which worked very well (no convenors, just open discussion)
- Lots of other interesting talks on ML-related topics in HEP
  - Please take a look at the indico
- A very useful means of increasing collaboration with the ML community
- Question (Tobias): What topics were covered in the open space discussion?
  - Marcin: We prepared a list of topics, and people also proposed their own topics of interest
  - Split into groups based on interests
  - The spread was very wide: large-scale data storage optimization, GPUs, regularization, parameter optimization, ...
  - A full summary is on indico
  - Sergei: Might be nice to put together a brief write-up on this
- Question (Sergei): The title of the workshop is "heavy flavor"; what were the connections?
  - Marcin: We had lots of problems finding a name; it was not really heavy-flavor oriented
  - The name only comes from the connection to the LHCb heavy flavor challenge
  - We didn't discriminate; it was really a general discussion on ML in HEP

Ozgur: SVM-Hint
- A general introduction to SVMs is provided in the slides
- SVM interface with ROOT, based on the widely used libSVM
- Uses discovery-significance-based algorithms, which outperform other measures for physics searches
  - Uses the Asimov significance estimator
- Used TMVA's BDT and SVM as benchmarks for performance
- Studied CPU performance for BDT vs SVM and TMVA vs SVM-Hint, with and without threads
  - SVM-Hint with 12 threads has the fastest timing performance, and it scales with the number of inputs
- Studied a simplified T2tt model with Delphes fast simulation
  - Considered 25 variables to discriminate signal vs background
  - Separated into four subsets: (1) all variables, (2) only low-level variables, (3) only high-level variables, (4) a smaller subset
  - SVM-Hint outperforms the TMVA BDT here; both benefit from a high number of variables
- Code is available on github
- Question (Steven): High- vs low-level variables, which helps more?
  - Ozgur: High-level variables help more, ~2.5 vs ~6 sigma
- Question (Adrian): For the BDT, do you use the same grid search as for the SVM (#trees, depth, etc.)?
  - Ozgur: We use TMVA with 8 configurations and took the cut value proposed by TMVA's BDT
  - So the answer is no, we are not using the Asimov significance on the BDT side
- Question (Sergei): Can you tell us more about the significance calculation?
  - Ozgur: Slide 17 shows the details; the estimator is built on the Asimov dataset (see the sketch below)
  - Sergei: In what regime is this better? When S ~ B?
  - Ozgur: We are normally looking at lower background
  - Sergei: So it is less optimistic than just using S/sqrt(B)
  - Sergei: Which tool did you use to compute this? RooStats?
  - Ozgur: We are using our own implementation of the formula on slide 17
  - Lorenzo: This is an approximation; you need something like RooStats for a full treatment
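The slide-17 formula is not reproduced in these minutes. For reference, estimators of this kind are usually based on the standard Asimov median-significance approximation Z_A = sqrt(2*((s+b)*ln(1+s/b) - s)). A minimal Python sketch assuming that standard form (not SVM-Hint's actual implementation):

    import numpy as np

    def asimov_significance(s, b):
        """Asimov median discovery significance,
        Z_A = sqrt(2*((s+b)*ln(1+s/b) - s)).
        For b >> s this approaches the familiar s/sqrt(b)."""
        s, b = float(s), float(b)
        return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

    # Compare with the naive estimate, which is more optimistic:
    s, b = 10.0, 25.0
    print(asimov_significance(s, b))  # ~1.88
    print(s / np.sqrt(b))             # 2.00

As noted above, this is only an approximation; a full treatment needs a tool such as RooStats.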
- Question (Sergei): What datasets were used?
  - Ozgur: The configurations are publicly available, but the data sizes are big
  - They are on DESY resources, so if you have access they can be provided
  - Same configuration as for the Snowmass studies
  - The small toy sample used for slide 9 is available
- Question (Adrian): What is the memory usage like for SVM-Hint compared to the TMVA implementation?
  - Ozgur: We had some problems with TMVA's SVM, but I don't remember for sure
  - Can reproduce this example and check the memory usage
- Question (Vidyo): What versions of TMVA and ROOT have you been using?
  - Ozgur: A version from 2014, the officially available version on the webpage; the studies have been going on for a while
  - Vidyo: OK, that will be quite old then
- Question (Sergei): What guided the fourth set of variables (slide 11)?
  - Ozgur: The main variables used in this analysis in CMS, so studying the core selection
- Question (Sergei): Cross-validation?
  - Ozgur: Separated the samples into three; used them for two-fold cross-validation for both BDT and SVM
- Question (Vidyo): There is also an SVM implementation in scikit-learn; how does SVM-Hint compare to it?
  - Ozgur: Haven't looked at it, but am confident about libSVM's performance
  - Vidyo: It also relies on libSVM, right?
  - Gilles: scikit-learn should be similar in terms of performance, as it also relies on libSVM
- Question (Sergei): Is the version on git usable?
  - Ozgur: Tested and working on Linux, but it doesn't work on Mac
- Question (Sergei): Is there a way to provide an automatic ranking of features?
  - Ozgur: No, that is not available right now

Andrew: BDTlib
- Andrew can't seem to connect on Vidyo, so Sergei will try to present the slides
- Gradient boosted decision tree package, grew out of attempts to do regression
- Features of the package:
  - Feature variable ranking
  - Variety of loss functions (only one variant in TMVA)
  - Ease of adding your own loss function
  - Regression
  - Classification (recently developed and not yet tested)
  - Easy to use and well-commented code
  - Stores the trees to XML and reads stored trees back from XML
- Andrew joined at this point, taking over from Sergei
- Outperforms TMVA, mostly coming from the new loss functions
- A simple code example is provided
- Package is available on github for people who want to try it out
- Question (Steven): Have you compared with XGBoost, scikit-learn, etc.?
  - Andrew: Compared with scikit-learn using similar loss functions; it seems to perform about the same
  - But only a quick comparison, not a formal check-through, and not compared with the others
- Question (Sergei): On the variable ranking, what are you doing there?
  - Andrew: You can see how much a split reduces the error
  - Keep track of how much each variable reduces the error each time you split on it
  - Sum up the total error reduction for each variable, which gives a ranking (see the sketch below)
  - BDTs lend themselves very well to variable ranking; you calculate it as you go
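A minimal Python sketch of that bookkeeping (illustrative only, not BDTlib's code; the input format is invented for the example):

    from collections import defaultdict

    def rank_variables(trees):
        """Sum, per variable, the error reduction of every split made on it,
        then sort. `trees` is a list of trees, each recorded as a list of
        (variable_name, error_before - error_after) pairs."""
        importance = defaultdict(float)
        for splits in trees:
            for var, error_reduction in splits:
                importance[var] += error_reduction
        return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

    # Two toy trees and the splits they made:
    trees = [[("MET", 0.12), ("HT", 0.05), ("MET", 0.02)],
             [("njets", 0.08), ("MET", 0.03)]]
    print(rank_variables(trees))
    # [('MET', 0.17), ('njets', 0.08), ('HT', 0.05)]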
- Comment (Sergei): Work is going on to make it easier to have different loss functions in TMVA
  - To address different cases, as it's true that we need different things in HEP for different studies
  - Andrew: Yes, I found this very helpful in my studies; it would be good to have in TMVA
- Question (Joosep): If I start to use this package, I would like to know whether it will still be supported in three years
  - When I choose software, I don't necessarily want the best performance, but I want it to be well supported
  - What's the case here?
  - Joosep: OK, then this won't really be usable for physics
  - Isn't this a waste of our time? Shouldn't IML be about reducing this reinventing of the wheel?
  - Sergei: For learning how things work, for a graduate student this is useful
  - But the end result is demonstrating that these loss functions should be added to TMVA, which is then supported on a long timescale and is applicable for physics
  - Steven: Once IML becomes more formal, we can have focus groups with the experiments collaborating
  - Then this will be supported with effort from each experiment
  - That will reduce this duplication and increase long-term support, thus benefiting the experiments
  - Tim: There is no better way to learn what something does than to do it yourself
  - It is interesting to have talks explaining how these work, so that people learn
  - However, ultimately we should "delete" it and have some common implementation which is supported
  - Sergei: If the tool provides some feature which doesn't exist elsewhere, it can be used until that feature is added to the main supported tools

Dan: Deep learning with Python
- Deep learning has become very popular (possibly just buzz, but popular nonetheless)
- "What is Theano?" - that's what the ML guy said he does
- Theano and scikit-learn are ~equally popular; scikit-learn is not DL while Theano is DL (different choices)
- These are becoming the dominant packages in the field of ML
- Within HEP the situation is different: the main tool in use is TMVA
- Python-based packages have huge advantages
  - Major support for free from outside the HEP community
  - Python is arguably easier to write/use
  - The software is already there; the newest cutting-edge algorithm is already integrated in most cases
- What's holding us back?
  - TMVA has lots of inertia, it is already used and understood
  - Lack of "glue packages": TMVA works as-is in our code, while the others need some help
  - Our analyses run almost entirely in C++
  - However, our optimization could be done in anything we want: Python or otherwise
- Lots of interesting results haven't been imported back to C++ (jet images, etc.)
  - BDTs from scikit-learn can be imported back into TMVA
  - When using modern DL, you often cannot re-import so easily: HDF5, JSON, YAML, NPY, etc.
- Wrote "Lightweight Neural Networks" (a sketch of the underlying idea is given at the end of these minutes)
  - Minimal dependencies
  - Just applies a neural network
  - Not a complicated thing, on github
  - Can add more as needed, but very simple so easy to expand
- General idea:
  - Don't write more frameworks, they already exist
  - Help with glue packages instead, to make existing frameworks usable in our code
  - "If you want to use DL right now, we should talk"
- Question (Sergei): There will be a lot more discussion on the DL side next meeting
  - The general point on glue packages and building on existing frameworks is a good one
  - If we can make standalone things that work, that is very beneficial
  - Dan: This is part of the design of the lightweight package
  - It relies only on Boost and Eigen
  - Sergei: Would like more support for plug-and-play of models trained with other tools into TMVA
  - Joosep: This is amazing, exactly what we need in the experiments
  - Can those who are used to Python ML provide info/help to those who are not?
- Question (Lorenzo): You are using numpy; is this satisfactory for you, or do you need more?
  - Dan: root_numpy is good, but if this were more standard, it would make things a lot easier
  - Lorenzo: If we made this part of ROOT directly, would it help?
  - Dan: Absolutely
  - Lorenzo: Everything needs to be in memory though, right?
  - Dan: Usually not a limitation on modern machines, but it can be a challenge; you may need to train in batches
  - You can get around it, but it may be annoying
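To illustrate the "just applies a neural network" point: applying an already-trained feed-forward network is only a handful of matrix multiplications, which is why a lightweight C++ implementation depending only on Boost and Eigen is sufficient. A numpy sketch of the same idea (the JSON layout below is invented for the example and is not the package's actual format):

    import json
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def apply_network(weights_file, inputs):
        """Feed-forward pass: each layer computes x -> activation(W x + b).
        The JSON layout ({"layers": [{"weights": ..., "bias": ...}, ...]})
        is illustrative only."""
        with open(weights_file) as f:
            model = json.load(f)
        x = np.asarray(inputs, dtype=float)
        layers = model["layers"]
        for i, layer in enumerate(layers):
            W = np.asarray(layer["weights"])
            b = np.asarray(layer["bias"])
            x = W @ x + b
            if i < len(layers) - 1:  # ReLU on hidden layers, linear output
                x = relu(x)
        return x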