IML meeting: June 16, 2017

Peak people on Vidyo: 33
Peak people in the room: 28

Sergei: Intro and news
- CMS now has an ML forum, convened by Sergei and Maurizio Pierini, with a workshop in early July
- The ATLAS ML workshop happened two weeks ago
- New monthly LPCC data science seminar series
  - See the series category on indico: indico.cern.ch/category/9320/
- The ML Community White Paper (CWP) effort continues
  - Aiming towards the HSF Annecy workshop, June 26-30
  - Important effort for the future of ML in HEP
- New Marie Curie training network on statistics and ML in HEP, INSIGHTS
  - Mostly for PhD students and early-career researchers
  - Around 12 or 13 PhD students over around 4 years
  - Students will work on HEP applications, software, and tools
- MLHEP Summer School in Reading, UK (17-23 July)
- Fermilab ML meeting, July 14
- Next meeting will be July 13 in Salle Bohr (40-S2-B01)
  - Topic: trigger applications and community training
  - See indico for details: https://indico.cern.ch/event/638056/

Matthew Feickert: HEPML Resources - Knowledge Repository for HEP ML Work
- Started a github repository under the IML project: https://github.com/iml-wg/HEP-ML-Resources
- Goal: snapshot the ML work and knowledge of the HEP community
  - Current ML information, software, lectures and seminars, papers, workshops
  - Links are in the slides
- Built while learning about ML and discovering how much was already out there
- Intended to provide a centralized area, inspired by the "Awesome X" repositories
- If you think this is a good idea, please get involved
  - Right now the repository is focused on ATLAS, as that is the speaker's collaboration
  - It would be great to get more resources from other experiments to ensure good representation
  - Support is needed to keep it current and updated
- To contribute, take a look at the contributing document in the repository
  - There are several different ways to contribute, depending on time investment
- Question (Rob Fletcher): github has a wiki feature; in addition to just markdown it might be useful
  - Matthew: yes, that's an excellent idea
- Question (Sergei): great resource - it is also on us (authors of papers, etc.) to put our work there once it is public

Josh Bendavid: Use of Machine Learning Techniques for improved Monte Carlo Integration
- There is an enormous amount of detail and explanation in the slides; below is a small sampling, please see the slides for the full story
- Integration: given an arbitrary multidimensional function, find its integral
- Generation: given an arbitrary multidimensional function, generate an unweighted set of vectors distributed according to it as a probability density
- Typical algorithm: construct an appropriate sampling function, then generate a large number of events to evaluate the integral
- VEGAS: constructs a product of 1D histograms; quite simple, but non-trivial correlations introduce a hard limit on the achievable precision
- Foam: divides phase space into hyper-rectangles with optimized boundaries, so it can capture non-trivial correlations
  - A close analogue of simple decision trees
- Why not try boosted decision trees? They are known to work better than single trees
- Similar things can be done with deep neural networks, focusing on generative models
  - Several different approaches (generative adversarial networks, autoencoders, ...)
- For any given state of a generative network, if the input and output spaces have the same dimensionality, the probability density can be computed
- Introduce a function approximator, such as standard DNN regression, together with a weak iterative procedure
  - This addresses problems where the function and/or its derivatives are difficult or expensive to evaluate
- Comparisons of VEGAS vs Foam vs BDTs vs generative DNNs (slide 33)
  - ML does an excellent job, with minimal function evaluations
- 9D integration compared (slide 36)
  - The BDT does well but with more evaluations than VEGAS; the generative DNN scales much better with dimensionality
- Since ML is used for importance sampling, there is always an estimate of how good a job it is doing (a minimal sketch of such a weight-based diagnostic follows after this section)
  - May need to generate more or fewer events depending on the network precision
  - However, the final performance of the integration is controllable
- Question (Graeme): are the diagnostics for one specific network or for a group?
  - Josh: for one specific network
  - First train the network, then generate a large number of samples and compute the integration weight for each sample
  - From that, get the distribution on slide 37
  - Graeme: the comments made about the tail might depend on the particular DNN - might want to try different networks to see if it varies
  - Josh: agreed, fair point
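As an illustration of the weight-based diagnostic mentioned above, here is a minimal numpy sketch of importance-sampling integration with a learned sampling density; it is not Josh's code, and `target_density` and `sampler` (with its `sample`/`density` methods) are hypothetical placeholders for the integrand and for any trained generative model.

```python
import numpy as np

def importance_sampling_estimate(target_density, sampler, n_samples=100_000):
    """Estimate the integral of target_density using samples from a trained model.

    Sketch only: `sampler` stands in for a trained generative model that can draw
    samples and report the density q(x) it used to generate them.
    """
    x = sampler.sample(n_samples)      # points drawn from the learned density q(x)
    q = sampler.density(x)             # q(x) evaluated at each sample
    f = target_density(x)              # true integrand evaluated at each sample
    w = f / q                          # per-event integration weights

    integral = w.mean()                              # estimate of the integral
    stat_unc = w.std(ddof=1) / np.sqrt(n_samples)    # statistical uncertainty

    # The spread of the weight distribution is the diagnostic: a perfect match
    # between q(x) and f(x) gives constant weights, while long tails mean more
    # samples are needed to reach a given precision.
    effective_n = w.sum() ** 2 / (w ** 2).sum()
    return integral, stat_unc, effective_n
```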
Bob Stienen: SUSY-AI and the iDark project
- In BSM searches, no signs of new physics have been found, so we set limits
- It takes a long time to exclude regions of parameter space
  - Excluding a single model point can take hours
  - This can only be done fully if you're in one of the experimental collaborations
- SUSY-AI works in the 19-dimensional pMSSM space
  - 310324 model points with known exclusions are used as input data
  - Want to interpolate on this set of model points for points that were not generated
- Replacing the complex exclusion procedure with ML takes milliseconds rather than hours
- It does a very good job of representing the parameter space
  - Not perfect, not a 1-1 correspondence, but 93% accuracy at both 8 and 13 TeV
- Already good, but want to do better
  - What is the probability that, given my classifier output, the prediction is equal to the majority class in that bin?
  - Only believe the SUSY-AI prediction if it has an X% probability of being correct (a minimal sketch of this confidence-threshold idea follows after this section)
  - Horizontal lines on slide 8
  - If we insist on 95% confidence, we get 99% accuracy using 70% of the total data
  - If we go to 99% confidence, we get 99.7% accuracy on 50% of the total data
  - If you use SUSY-AI where it is 99.7% accurate and simulation for the other points, you cut the time in half and are certain of the result
- That is the current status; now looking forward
- The next SUSY-AI will not be pMSSM-exclusive anymore, it will support different model types
  - Users will specify the configuration
- Stacking will try running multiple classifiers on the same dataset at the same time
  - SUSY-AI currently runs on the combined result of 22 individual analyses and trains on the "Result" column (slide 14)
  - This reduces the amount of information from SUSY-AI: you lose the physical meaning of which analysis excludes a point
  - Now you can get information on which analysis says what
- Server-client option added
  - Speeds up the time to load classifiers for a given model point (avoids reloading for every operation)
- Given a single model point in parameter space, can we extract information from its vicinity?
  - "Boundary exploration"
  - This cannot easily be done with the ordinary workflow; maybe SUSY-AI can help with it
- Obtaining data is still a problem
  - Time consuming to generate, hard to make public
  - iDark will host a public database and plotting interface
  - idarksurvey.com for an online demo
- General summary
  - SUSY-AI is already fast and reliable, but is being further improved
  - The next version of SUSY-AI will be public in a few weeks
  - The lack of data will be addressed by iDarkSurvey
- Question (Steven): one can imagine expanding this to other areas, such as DM surveys - has this been considered?
  - Bob: absolutely, there is a use there in moving away from simplified models
  - Can create this multi-dimensional information within this algorithm
  - Currently aimed at SUSY and named SUSY-AI, but it generalizes to "AInalyses"
  - Anyone with data on any parameter space could in principle make a classifier that runs through this program
  - Yes, we want to generalize this to dark matter surveys and other problems
- Question (Sergei): did you study methods other than random forests?
  - Bob: tried everything in scikit-learn; random forests came out as the most reliable and fastest to train
  - We stay with this algorithm; it behaves correctly at energies higher than sampled, as the exclusion boundary doesn't change with energy
  - Sergei: probably because this was done early on, before neural networks were available in scikit-learn
  - Bob: yes, we want to try neural networks, it may be worthwhile to do so
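The following is a minimal scikit-learn sketch of the SUSY-AI idea described above (a random-forest classifier on model-point parameters, trusted only where its confidence exceeds a threshold); it is not SUSY-AI code. The toy data and labels are placeholders, and `predict_proba` is used as a stand-in for the calibrated per-bin probability of being correct described in the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the pMSSM dataset: X would be the 19 model parameters,
# y the known excluded (1) / allowed (0) label from the full recasting chain.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(10_000, 19))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Confidence threshold: only trust the classifier where its predicted class
# probability exceeds the chosen level; fall back to full simulation elsewhere.
threshold = 0.95
confidence = clf.predict_proba(X_test).max(axis=1)
trusted = confidence >= threshold

coverage = trusted.mean()   # fraction of points decided by ML rather than simulation
accuracy = (clf.predict(X_test)[trusted] == y_test[trusted]).mean()
print(f"coverage: {coverage:.2%}, accuracy on trusted points: {accuracy:.2%}")
```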
- "Boundary exploration" - Cannot easily be done with the ordinary workflow, maybe SUSY-AI can help with this - Obtaining data is still a problem - Time consuming to generate, hard to make it public - iDark will host public database and plotting interface - idarksurvey.com for online demo - General summary - SUSY-AI already fast and reliable, but being further improved - Next version of SUSY-AI will be public in a few weeks - Lack of data will be addressed by iDarkSurvey - Question (Steven): can imagine expanding this to other areas, such as DM surveys, has this been considered? - Bob: Absolutely, is a use there in moving away from simplified models - Can create this multi-dim information within this algorithm - Currently aimed at SUSY, named SUSY-AI, but generalizes to AInalyses - Anyone with data on any parameter space could in principle make a classifier that runs through this program - Yes, we want to generalize this to dark matter surveys and other problems - Question (Sergei): did you study other methods than random forests - Bob: tried everything in scikit-learn, they came out to be the most reliable and fastest training - Remain with this algorithm, will behave correctly at energies higher than sampled as exclusion boundary doesn't change with energy - Sergei: probably since did it early on, done before NN, as weren't in scikit-learn yet - Bob: yeah, want to try with neural networks, may be worthwhile to do so Zahari Kassabov: Learning parton densities with neural networks - The NNPDF Methodology - Functions f_i(x,M_x^2) need to be learned from data - PDF of parton i carrying a fraction of momentum x at scale M_x - NNPDF 3.1 NNLO produced last month, public arxiv:1706.00428 - We want to both determine the PDFs and obtain a sensible estimate of their uncertainty - Uncertainties on input experimental data - Degenerate minima (+inefficiencies on the minimization) - Theory uncertainties (value of alpha_s, etc) - Not a well researched topic in ML - Constraints come from convolutions, and not so much data, only 4285 data points - Not really "big data", but still a complicated problem - 7 physical processes from 14 experiments over ~30 years - Compared to standard ML problem - Require statistically sound uncertainty estimate - Problem is regression but available data has complex dependence on PDFs - There are some physical constraints - NNPDF approach - Since we don't have constraints, we should have a very general model - Use a neural network, fully connected, two sigmoid hidden layers, one linear layer - 296 network parameters - Propagate experimental uncertainties by doing many fits with different fluctuations of data - Experiments give us covariance matrix - We sample data from experiments according to the covariance matrix - Statistics of PDFs calculated from the ensemble of PDF replicas - Fit PDFs using genetic algorithms, would really like to improve this - At each iteration, generate 80 mutants and select best mutant - Very easy to implement and understand, good dealing with complex analytic behaviour, doesn't require computing gradient - May not be close to global minimum, requires many function evaluations (convolutions), needs tuning - Closure tests - Assume the underlying PDF is known - Generate data, fluctuating around the prediction of the true PDF - Perform a fit and compare with assumed PDF - Check that the results are consistent - Various levels of closure test (slide 22) - No questions Daniel Krefl: ML of CY volumes - Experimental Mathematics - Even in formal theory or 
Daniel Krefl: ML of Calabi-Yau (CY) volumes
- Experimental mathematics: even in formal theory and mathematics, we are starting to add ML to our toolbox
- The question: is the minimum volume a function of the geometry?
  - Calculate Vmin numerically (complex but possible)
  - Use ML to investigate the function
- Can generate an effectively infinite train and test set
  - Generated ~10k data points with a 75/25 train/test split
  - Drawn from 100k data points, but related diagrams are removed to ensure train and test sets are distinct
- First ansatz: linear regression
- Second ansatz: using CNNs, "deep and wide" nets (a minimal sketch of such a two-branch model follows at the end of these minutes)
- Question (Michele): why linear regression?
  - Daniel: to stabilize the tails; the problem is learning the tails of this distribution
  - Michele: don't you also lose some information?
  - Daniel: yes, but here we need the tails under control and can sacrifice some information from the center to achieve that
- Found that the minimum volume can be approximated from topological data via ML models
- The concept can be applied to many other conjectured relations
  - "Experimental mathematics"
  - A discovery engine to find new relations (via statistical evidence)
  - Which should subsequently be made rigorous
  - Or maybe not: would an AI do mathematics in an approximate, data-driven way?
- Question (Graeme): the inputs you had at the top and bottom - are they the same inputs?
  - Daniel: yes, it looks like an autoencoder, but it's not
  - Both branches are fitted simultaneously, and they take the same inputs
  - Michele: how did you merge your branches?
  - Daniel: just concatenate them
  - Sergei: we're moving towards more automatic features
  - Daniel: this tries to combine both hand-crafted and automatic features
- Question (Paul): you will give a more in-depth talk next week - at which event?
  - Daniel: the String Theory colloquium on Tuesday
  - It will go into more detail on the theory side rather than the ML
  - https://indico.cern.ch/event/646694/
- Question (Sergei): which tools?
  - Daniel: standard tools, python and Keras based

Long-Gang Pang: EoS-meter of QCD transition from deep learning
- Postponed to a future meeting due to Vidyo issues
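Referring back to Daniel Krefl's "deep and wide" network that feeds the same inputs to two branches and merges them by concatenation, here is a minimal Keras sketch; the layer sizes, activations, and dense (rather than convolutional) deep branch are illustrative assumptions, not taken from the talk.

```python
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

# Minimal sketch of a "deep and wide" two-branch regression model: a wide
# (essentially linear) branch and a deep branch on the same inputs, merged by
# concatenation as mentioned in the talk. All sizes are placeholder assumptions.

n_features = 10  # placeholder for the number of topological input features

inputs = Input(shape=(n_features,))

# Wide branch: a single linear transformation of the inputs
wide = Dense(1, activation="linear")(inputs)

# Deep branch: a small fully connected stack on the same inputs
deep = Dense(64, activation="relu")(inputs)
deep = Dense(64, activation="relu")(deep)
deep = Dense(1, activation="linear")(deep)

# Merge the branches and regress the target (here, the minimum volume)
merged = Concatenate()([wide, deep])
output = Dense(1, activation="linear")(merged)

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer="adam", loss="mse")
model.summary()
```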