IML meeting: February 24, 2017
Peak people on Vidyo: 34
Peak people in the room: 22

Steven: Intro and news
- Next "meeting" will be the IML workshop, March 20 to 22
- See indico for details: https://indico.cern.ch/event/595059/overview

Andrew Carnes: Internally-Parallelized Boosted Decision Trees
- Work done in TMVA
- Debugged regression evaluation
  - Found a bug in the evaluation of regression predictions
  - 1 million events in 10 trees should take ~1 s, but TMVA was taking ~15 min
  - Found that the progress bar was being redrawn for every event (1 million times per second)
  - Fixed this progress bar bug; time reduced to 2 seconds, a ~460x speed gain
- Provided a nice intro to BDTs
- New loss functions
  - Replaced the hard-coded loss function with an abstract class
  - Currently least squares, absolute deviation, and Huber loss functions
  - Validated all three, working as expected
  - Available as of ROOT v6.08
  - TMVA user guide updated, jupyter notebook available
- Parallelization
  - Can't build boosted trees in parallel, as each new tree depends on the previous trees
  - Instead targeted the work within each boosting step, i.e. the building of a single tree
  - Looping over the collection of training data is the longest process
  - Broke it up into chunks that run in parallel: multiple workers, each looping over its own chunk of the events (see the sketch after this talk's notes)
  - Large gains in speed; the speed-up grows with the number of cores, though sub-linearly, as expected given the remaining serial parts
  - Reduction in time of about 1.6x for 4 cores, 2.6x for 16 cores
  - Some of the intensive processes couldn't be parallelized, so the speed-up asymptotes at ~3x
  - Parallelization will be added to a ROOT release soon
- Question (Steven): the progress bar, how often is it drawn now?
  - Andrew: around 100 times in total now, rather than once per event
- Question (Steven): regarding push_back and sizes, could you define a max size per thread and use direct access instead of push_back?
  - Andrew: yes, may be possible, good idea, but ran out of time
  - Danilo: use one vector for each processing slot; you pay the price of a final merge, but are N times as fast during the fitting procedure
  - Andrew: thought about it, and it is a valid solution that should be implemented, but my fellowship is ending
  - Danilo: ROOT's TThreadedObject allows for a different object per thread, does this transparently, and merges at the end
- Question (Paul): it appears that with 1 thread the parallelized version is slower than the serial reference
  - Andrew: yes, the parallelization brings some overhead which is not eliminated for 1 thread
- Question (Tobias): should compare to other packages to see how the parallelization performs
  - Andrew: yes, good idea, should do this more in the future
- Comment (Sergei): did a great job tackling things, but there is lots of room for future work
  - Would be good to outline what else can be done
  - Very nice area; a good place to cross-check with other packages and see how they tackled it (if at all)
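A minimal sketch of the chunk-and-merge pattern discussed above (one result buffer per worker, a single merge at the end), written in Python for brevity; TMVA's actual implementation is C++, and the per-event computation and names here are invented for illustration:

    # Chunked event loop: each worker loops over its own chunk and fills
    # its own buffer, so no synchronization is needed inside the loop;
    # the partial results are merged once at the end.
    from multiprocessing import Pool

    def process_chunk(chunk):
        local = []                          # one buffer per processing slot
        for x, target in chunk:
            local.append(target - 2.0 * x)  # hypothetical per-event residual
        return local

    def parallel_event_loop(events, n_workers=4):
        # Split the event collection into one chunk per worker
        size = (len(events) + n_workers - 1) // n_workers
        chunks = [events[i:i + size] for i in range(0, len(events), size)]
        with Pool(n_workers) as pool:
            partials = pool.map(process_chunk, chunks)
        # Pay the one-off price of a final merge instead of
        # synchronizing on every push_back
        merged = []
        for part in partials:
            merged.extend(part)
        return merged

    if __name__ == "__main__":
        events = [(float(i), 2.0 * i + 0.5) for i in range(1000000)]
        print(len(parallel_event_loop(events)))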
Andrew Lowe: Rapid development platforms for machine learning
- Surveyed tools with point-and-click interfaces for rapid prototyping and testing of ideas
- Purpose of the talk is to promote awareness of these tools
- ROOT is dominant in HEP, but has its critics
  - Complex algorithms are difficult to do interactively
  - You end up with huge programs
  - Is this the best choice for prototyping?
- Tried a series of tools taken from some surveys
- H2O Flow
  - Web interface, both python and R, like a jupyter notebook
  - Can build models just by clicking buttons
  - Web interface screenshots in the slides
  - Doesn't offer too many model types, but has most of the popular ones, notably an extensive and fast deep learning implementation
- RapidMiner
  - Market leader and the most popular tool of its kind
  - Basic and educational editions are free, but limited to 1 GB of data
  - The professional edition is expensive, as are the training courses
  - Full-featured, very powerful, general purpose
  - Drag-and-drop canvas GUI to build models and define the workflow
  - Graphical results/representations were a bit primitive compared to the other options, especially given the price
- KNIME
  - Second most popular tool of its kind
  - Open source, unlike RapidMiner, but "extensions" for advanced capabilities can be purchased
  - Also has free community extensions
  - Similar drag-and-drop interface to RapidMiner
  - Presentation of results is also a bit crude; typically the raw data is taken out and plotted somewhere else
- Orange
  - Free and open source, not as full-featured as the other tools
  - Lacks statistical functions
  - A number of interesting options currently under the "prototype" category
  - Again, a drag-and-drop interface
  - Has a Spark interface, which may be interesting to those wanting to try distributed computing
- WEKA
  - Free and open source, powerful and versatile, large community support
  - Offers four interface options for data mining; the Knowledge Flow interface is available from v3.6
    - Explorer is like the ROOT TBrowser
    - Experimenter looks for statistically significant differences between datasets
    - Knowledge Flow is like the drag-and-drop interfaces of RapidMiner and KNIME
    - Auto-WEKA runs through a large set of models and hyperparameters and tries to find the best model for the data (see the sketch at the end of this talk's notes)
  - Went from installation to a first trained model in about 30 minutes
- Rattle GUI
  - Not as full-featured as the other tools
  - Builds on top of R, loads separate R packages on request
  - No visual programming workflow like the others
  - Lots of tools available
- Suggested workflow:
  - Use a rapid development platform to explore the "ideas space" and do fast scouting
  - Then write a prototype in a high-level language for the final production version (plots/publication/etc.)
- Personal preference for WEKA, but that is subjective
- Comment (Sergei): jupyter notebook integration should go part of the way towards supporting interactive rapid development
  - Pause/resume training, model building in visual form, etc.
  - For those not aware, please try it out and give feedback
- Question (Ilija): have you tried Oracle Data Miner?
  - Andrew: no, have not tried it; wanted to focus on tools that are open source, free, or freemium
  - Ilija: we already have lots of data in Oracle, and we have the tool for free; would be interesting to try out
- Question (Ilija): do you have somewhere with all the frameworks installed which we can use to test them out?
  - Andrew: didn't have problems installing and running them on my laptop, easy to set up
    - If you want to try RapidMiner, sign up for the educational distribution
    - The KNIME core is easy to install, but the extra components take a long time
    - For WEKA, make sure to use the most recent stable version; by default there is a cap on the amount of data that can be read in, but it can be changed
    - All of them, though, are easy to install
- Question (Sergei to Ilija): if your data is already in Oracle, do you already have a licence?
  - Ilija: all ATLAS ADC data is straight in Oracle, same with Rucio
    - We do have a licence for that, but nobody asks the Oracle sysadmins to enable the tool
    - So we have this nice tool to use, but nobody is using it
    - Very much reminds me of RapidMiner, and we already have all the data there
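A minimal sketch of the idea behind Auto-WEKA's automated model search, mentioned above: scan a set of candidate models and hyperparameter grids, score each by cross-validation, and keep the best. This uses scikit-learn rather than WEKA, and the candidate models and grids are arbitrary examples:

    # Auto-WEKA-style search, illustrated with scikit-learn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Candidate (model, hyperparameter grid) pairs to scan
    candidates = [
        (RandomForestClassifier(), {"n_estimators": [50, 100], "max_depth": [3, None]}),
        (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    ]

    best_score, best_model = -1.0, None
    for estimator, grid in candidates:
        search = GridSearchCV(estimator, grid, cv=5)  # exhaustive grid + CV scoring
        search.fit(X, y)
        if search.best_score_ > best_score:
            best_score, best_model = search.best_score_, search.best_estimator_

    print(best_model, best_score)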
Joeri Hermans: Distributed Deep Learning using Apache Spark and Keras
- Joeri could not connect; the contributions are on the agenda
- Sergei gave some info
  - Looking at parallelization of gradient descent, etc.
  - Using Apache Spark and Keras
  - Take a look when you can, and contact Joeri with questions

Gerardo Gutierrez: Parallelization in Machine Learning with Multiple Processes
- MPI current status and future
  - Standard of communication for HPC/grid computing
  - Several implementations: OpenMPI, MPICH, IBM, Intel, etc.
  - Support for remote memory access, shared memory, etc.
- Implementation within ROOT
  - Trying to communicate ROOT objects between processes via MPI
  - Trying to adapt MPI with a better design for ROOT
  - Implement TMVA algorithms in parallel
  - Several features already prototyped
  - Simpler than the C implementation of MPI; can use any serializable object instead of only pre-defined types
  - Examples given for sending/receiving a TMatrixD (see the sketch at the end of these notes)
- New architecture for TMVA parallelization
  - A new TMVA base class has MPI, threads, Spark, etc. implementations
  - Some jupyter notebook examples: ParallelExecutor (MultiProc) and ParallelExecutorMpi (OpenMPI)
- ROOTMpi is a modern interface for MPI that uses new C++ features and ROOT
- TMVA is developing a modern architecture for multiple parallelization paradigms
- Question (Paul): is ROOTMpi used elsewhere, or only for TMVA?
  - Gerardo: for all of ROOT, not just TMVA
  - The examples right now are in TMVA, to compare with the other TMVA parallelization efforts
  - Sergei: this is also for looking towards HPC resources in the future
- Question (Sergei): we also have Spark etc. for distributed computing; how would you motivate this vs the others?
  - Gerardo: Spark is not as directly oriented towards HPC resource handling as MPI
  - Need to make a proper comparison, but expect MPI to have better performance
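A minimal sketch of the send-any-serializable-object idea in Python via mpi4py; ROOTMpi itself is C++, and the plain nested list here is only a stand-in for a TMatrixD:

    # mpi4py's lowercase send/recv serialize (pickle) arbitrary Python
    # objects, rather than being limited to pre-defined MPI datatypes,
    # analogous to ROOTMpi sending any serializable ROOT object.
    # Run with, e.g.: mpirun -n 2 python send_matrix.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        matrix = [[1.0, 2.0], [3.0, 4.0]]   # stand-in for a TMatrixD
        comm.send(matrix, dest=1, tag=0)    # object is pickled transparently
    elif rank == 1:
        matrix = comm.recv(source=0, tag=0)
        print("rank 1 received:", matrix)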