IML meeting: March 4, 2016
Peak people on Vidyo: 24
Peak people in the room: 21

Sergei: Intro and news
- Michele Floris is joining IML for ALICE
- Today: Part II of HEP-ML tools
- Next IML meeting on April 14, focus is Deep Learning
  - Contact us if you want to present your Deep Learning related studies!
- QCHS2016 will have a parallel session on statistics and statistical methods, with a focus on ML
  - Contact Tommaso Dorigo or Sergei Gleyzer for more information

Marcin: Report on the Heavy Flavor Data Mining Workshop
- Good mixture of the ML and physics communities, lots of good discussions
- Had four ML tutorials, uploaded to indico; people are welcome to look into them
  - scikit-learn
  - Google TensorFlow
  - NVIDIA Deep Learning
  - REP (Reproducible Experiment Platform)
- Awarded physics prizes to the winners of the LHCb "Flavours of Physics" challenge
  - Vicens Gaitan: "data doping"
  - Alexander Rakhlin: "transfer learning"
- Trialed an open space discussion, which worked very well (no convenors, just open discussion)
- Lots of other interesting talks on ML-related topics in HEP
  - Please take a look at the indico
- A very useful means of increasing collaboration with the ML community
- Question (Tobias): What topics were covered in the open space discussion?
  - Marcin: We prepared a list of topics, and people also proposed their own topics of interest
  - Split into groups based on interests
  - The spread was very wide: large-scale data storage optimization, GPUs, regularization, parameter optimization, ...
  - A full summary is on indico
  - Sergei: Might be nice to put together a brief write-up on this
- Question (Sergei): The title of the workshop is "heavy flavor"; what were the connections?
  - Marcin: We had lots of problems finding a name; it was not really heavy-flavor oriented
  - The name only comes from the connection to the LHCb heavy flavor challenge
  - We didn't discriminate; it was really a general discussion on ML in HEP

Ozgur: SVM-Hint
- A general introduction to SVMs is provided in the slides
- SVM interface with ROOT, based on the widely used libSVM
- Uses discovery-significance-based algorithms, which outperform other measures for physics searches
  - Uses the Asimov significance estimator
- Used TMVA's BDT and SVM as benchmarks for performance
- Studied CPU performance for BDT vs SVM and TMVA vs SVM-Hint, with and without threads
  - SVM-Hint with 12 threads has the fastest timing performance, and it scales with the number of inputs
- Studied a simplified T2tt model with Delphes fast simulation
  - Considered 25 variables to discriminate signal vs background
  - Separated into four subsets: (1) all variables, (2) only low-level variables, (3) only high-level variables, (4) a smaller subset
  - SVM-Hint outperforms the TMVA BDT here; both benefit from a high number of variables
- Code is available on github
- Question (Steven): High- vs low-level variables, which helps more?
  - Ozgur: High-level variables help more, ~2.5 vs ~6 sigma
- Question (Adrian): For the BDT, do you use the same grid search as for the SVM (#trees, depth, etc.)?
  - Ozgur: We use TMVA with 8 configurations and took the cut value proposed by TMVA's BDT
  - So the answer is no, we are not using the Asimov significance on the BDT side
- Question (Sergei): Can you tell us more about the significance calculation?
  - Ozgur: Slide 17 shows the details; the estimator is built on the Asimov dataset (see the sketch below)
  - Sergei: In what regime is this better? When S ~ B?
  - Ozgur: We are normally looking at lower background
  - Sergei: So it is less optimistic than just using S/sqrt(B)
  - Sergei: Which tool did you use to compute this? RooStats?
  - Ozgur: We are using our own implementation of the formula on slide 17
  - Lorenzo: This is an approximation; you need something like RooStats for a full treatment
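The slide-17 formula is not reproduced in these minutes. For reference, estimators of this kind are usually based on the standard Asimov median-significance approximation Z_A = sqrt(2*((s+b)*ln(1+s/b) - s)). A minimal Python sketch assuming that standard form (not SVM-Hint's actual implementation):

    import numpy as np

    def asimov_significance(s, b):
        """Asimov median discovery significance,
        Z_A = sqrt(2*((s+b)*ln(1+s/b) - s)).
        For b >> s this approaches the familiar s/sqrt(b)."""
        s, b = float(s), float(b)
        return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

    # Compare with the naive estimate, which is more optimistic:
    s, b = 10.0, 25.0
    print(asimov_significance(s, b))  # ~1.88
    print(s / np.sqrt(b))             # 2.00

As noted above, this is only an approximation; a full treatment needs a tool such as RooStats.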
- Question (Sergei): What datasets were used?
  - Ozgur: The configurations are publicly available, but the data sizes are big
  - They are on DESY resources, so if you have access they can be provided
  - Same configuration as for the Snowmass studies
  - The small toy sample used for slide 9 is available
- Question (Adrian): What is the memory usage like for SVM-Hint compared to the TMVA implementation?
  - Ozgur: We had some problems with TMVA's SVM, but I don't remember for sure
  - Can reproduce this example and check the memory usage
- Question (Vidyo): What versions of TMVA and ROOT have you been using?
  - Ozgur: A version from 2014, the officially available version on the webpage; the studies have been going on for a while
  - Vidyo: OK, that will be quite old then
- Question (Sergei): What guided the fourth set of variables (slide 11)?
  - Ozgur: The main variables used in this analysis in CMS, so studying the core selection
- Question (Sergei): Cross-validation?
  - Ozgur: Separated the samples into three; used them for two-fold cross-validation for both BDT and SVM
- Question (Vidyo): There is also an SVM implementation in scikit-learn; how does SVM-Hint compare to it?
  - Ozgur: Haven't looked at it, but am confident about libSVM's performance
  - Vidyo: It also relies on libSVM, right?
  - Gilles: scikit-learn should be similar in terms of performance, as it also relies on libSVM
- Question (Sergei): Is the version on git usable?
  - Ozgur: Tested and working on Linux, but it doesn't work on Mac
- Question (Sergei): Is there a way to provide an automatic ranking of features?
  - Ozgur: No, that is not available right now

Andrew: BDTlib
- Andrew can't seem to connect on Vidyo, so Sergei will try to present the slides
- Gradient boosted decision tree package, grew out of attempts to do regression
- Features of the package:
  - Feature variable ranking
  - Variety of loss functions (only one variant in TMVA)
  - Ease of adding your own loss function
  - Regression
  - Classification (recently developed and not yet tested)
  - Easy to use and well-commented code
  - Stores the trees to XML and reads stored trees back from XML
- Andrew joined at this point, taking over from Sergei
- Outperforms TMVA, mostly coming from the new loss functions
- A simple code example is provided
- Package is available on github for people who want to try it out
- Question (Steven): Have you compared with XGBoost, scikit-learn, etc.?
  - Andrew: Compared with scikit-learn using similar loss functions; it seems to perform about the same
  - But only a quick comparison, not a formal check-through, and not compared with the others
- Question (Sergei): On the variable ranking, what are you doing there?
  - Andrew: You can see how much a split reduces the error
  - Keep track of how much each variable reduces the error each time you split on it
  - Sum up the total error reduction for each variable, which gives a ranking (see the sketch below)
  - BDTs lend themselves very well to variable ranking; you calculate it as you go
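A minimal Python sketch of that bookkeeping (illustrative only, not BDTlib's code; the input format is invented for the example):

    from collections import defaultdict

    def rank_variables(trees):
        """Sum, per variable, the error reduction of every split made on it,
        then sort. `trees` is a list of trees, each recorded as a list of
        (variable_name, error_before - error_after) pairs."""
        importance = defaultdict(float)
        for splits in trees:
            for var, error_reduction in splits:
                importance[var] += error_reduction
        return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

    # Two toy trees and the splits they made:
    trees = [[("MET", 0.12), ("HT", 0.05), ("MET", 0.02)],
             [("njets", 0.08), ("MET", 0.03)]]
    print(rank_variables(trees))
    # [('MET', 0.17), ('njets', 0.08), ('HT', 0.05)]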
- Comment (Sergei): Work is going on to make it easier to have different loss functions in TMVA
  - To address different cases, as it's true that we need different things in HEP for different studies
  - Andrew: Yes, I found this very helpful in my studies; it would be good to have in TMVA
- Question (Joosep): If I start to use this package, I would like to know whether it will still be supported in three years
  - When I choose software, I don't necessarily want the best performance, but I want it to be well supported
  - What's the case here?
  - Joosep: OK, then this won't really be usable for physics
  - Isn't this a waste of our time? Shouldn't IML be about reducing this reinventing of the wheel?
  - Sergei: For learning how things work, for a graduate student this is useful
  - But the end result is demonstrating that these loss functions should be added to TMVA, which is then supported on a long timescale and is applicable for physics
  - Steven: Once IML becomes more formal, we can have focus groups with the experiments collaborating
  - Then this will be supported with effort from each experiment
  - That will reduce this duplication and increase long-term support, thus benefiting the experiments
  - Tim: There is no better way to learn what something does than to do it yourself
  - It is interesting to have talks explaining how these work, so that people learn
  - However, ultimately we should "delete" it and have some common implementation which is supported
  - Sergei: If the tool provides some feature which doesn't exist elsewhere, it can be used until that feature is added to the main supported tools

Dan: Deep learning with Python
- Deep learning has become very popular (possibly just buzz, but popular nonetheless)
- "What is Theano?" - that's what the ML guy said he does
- Theano and scikit-learn are ~equally popular; scikit-learn is not DL while Theano is DL (different choices)
- These are becoming the dominant packages in the field of ML
- Within HEP the situation is different: the main tool in use is TMVA
- Python-based packages have huge advantages
  - Major support for free from outside the HEP community
  - Python is arguably easier to write/use
  - The software is already there; the newest cutting-edge algorithm is already integrated in most cases
- What's holding us back?
  - TMVA has lots of inertia, it is already used and understood
  - Lack of "glue packages": TMVA works as-is in our code, while the others need some help
  - Our analyses run almost entirely in C++
  - However, our optimization could be done in anything we want: Python or otherwise
- Lots of interesting results haven't been imported back to C++ (jet images, etc.)
  - BDTs from scikit-learn can be imported back into TMVA
  - When using modern DL, you often cannot re-import so easily: HDF5, JSON, YAML, NPY, etc.
- Wrote "Lightweight Neural Networks" (a sketch of the underlying idea is given at the end of these minutes)
  - Minimal dependencies
  - Just applies a neural network
  - Not a complicated thing, on github
  - Can add more as needed, but very simple so easy to expand
- General idea:
  - Don't write more frameworks, they already exist
  - Help with glue packages instead, to make existing frameworks usable in our code
  - "If you want to use DL right now, we should talk"
- Question (Sergei): There will be a lot more discussion on the DL side next meeting
  - The general point on glue packages and building on existing frameworks is a good one
  - If we can make standalone things that work, that is very beneficial
  - Dan: This is part of the design of the lightweight package
  - It relies only on Boost and Eigen
  - Sergei: Would like more support for plug-and-play of models trained with other tools into TMVA
  - Joosep: This is amazing, exactly what we need in the experiments
  - Can those who are used to Python ML provide info/help to those who are not?
- Question (Lorenzo): You are using numpy; is this satisfactory for you, or do you need more?
  - Dan: root_numpy is good, but if this were more standard, it would make things a lot easier
  - Lorenzo: If we made this part of ROOT directly, would it help?
  - Dan: Absolutely
  - Lorenzo: Everything needs to be in memory though, right?
  - Dan: Usually not a limitation on modern machines, but it can be a challenge; you may need to train in batches
  - You can get around it, but it may be annoying
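To illustrate the "just applies a neural network" point: applying an already-trained feed-forward network is only a handful of matrix multiplications, which is why a lightweight C++ implementation depending only on Boost and Eigen is sufficient. A numpy sketch of the same idea (the JSON layout below is invented for the example and is not the package's actual format):

    import json
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def apply_network(weights_file, inputs):
        """Feed-forward pass: each layer computes x -> activation(W x + b).
        The JSON layout ({"layers": [{"weights": ..., "bias": ...}, ...]})
        is illustrative only."""
        with open(weights_file) as f:
            model = json.load(f)
        x = np.asarray(inputs, dtype=float)
        layers = model["layers"]
        for i, layer in enumerate(layers):
            W = np.asarray(layer["weights"])
            b = np.asarray(layer["bias"])
            x = W @ x + b
            if i < len(layers) - 1:  # ReLU on hidden layers, linear output
                x = relu(x)
        return x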