IML meeting: February 24, 2017
Peak people on Vidyo: 34
Peak people in the room: 22

Steven: Intro and news
- Next "meeting" will be the IML workshop, March 20 to 22
- See indico for details: https://indico.cern.ch/event/595059/overview

Andrew Carnes: Internally-Parallelized Boosted Decision Trees
- Work done in TMVA
- Debugged regression evaluation
  - Found a bug in the evaluation of regression predictions
  - 1 million events in 10 trees should take ~1 s, but TMVA was taking ~15 min
  - Found that the progress bar was being redrawn for every event (1 million times per second)
  - Fixed this progress bar bug; time reduced to 2 seconds, a ~460x speed gain
- Provided a nice intro to BDTs
- New loss functions
  - Replaced the hard-coded loss function with an abstract class
  - Currently least squares, absolute deviation, and Huber loss functions
  - Validated all three, working as expected
  - Available as of ROOT v6.08
  - TMVA user guide updated, jupyter notebook available
- Parallelization
  - Can't build boosted trees in parallel, as each new tree depends on the previous trees
  - Instead targeted the work within each boosting step, i.e. the building of a single tree
  - Looping over the collection of training data is the longest process
  - Broke it up into chunks that run in parallel: multiple workers, each looping over its own chunk of the events (see the sketch after this talk's notes)
  - Large gains in speed; the speed-up grows with the number of cores, though sub-linearly, as expected given the remaining serial parts
  - Reduction in time of about 1.6x for 4 cores, 2.6x for 16 cores
  - Some of the intensive processes couldn't be parallelized, so the speed-up asymptotes at ~3x
  - Parallelization will be added to a ROOT release soon
- Question (Steven): the progress bar, how often is it drawn now?
  - Andrew: around 100 times in total now, rather than once per event
- Question (Steven): regarding push_back and sizes, could you define a max size per thread and use direct access instead of push_back?
  - Andrew: yes, may be possible, good idea, but ran out of time
  - Danilo: use one vector for each processing slot; you pay the price of a final merge, but are N times as fast during the fitting procedure
  - Andrew: thought about it, and it is a valid solution that should be implemented, but my fellowship is ending
  - Danilo: ROOT's TThreadedObject allows for a different object per thread, does this transparently, and merges at the end
- Question (Paul): it appears that with 1 thread the parallelized version is slower than the serial reference
  - Andrew: yes, the parallelization brings some overhead which is not eliminated for 1 thread
- Question (Tobias): should compare to other packages to see how the parallelization performs
  - Andrew: yes, good idea, should do this more in the future
- Comment (Sergei): did a great job tackling things, but there is lots of room for future work
  - Would be good to outline what else can be done
  - Very nice area; a good place to cross-check with other packages and see how they tackled it (if at all)
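A minimal sketch of the chunk-and-merge pattern discussed above (one result buffer per worker, a single merge at the end), written in Python for brevity; TMVA's actual implementation is C++, and the per-event computation and names here are invented for illustration:

    # Chunked event loop: each worker loops over its own chunk and fills
    # its own buffer, so no synchronization is needed inside the loop;
    # the partial results are merged once at the end.
    from multiprocessing import Pool

    def process_chunk(chunk):
        local = []                          # one buffer per processing slot
        for x, target in chunk:
            local.append(target - 2.0 * x)  # hypothetical per-event residual
        return local

    def parallel_event_loop(events, n_workers=4):
        # Split the event collection into one chunk per worker
        size = (len(events) + n_workers - 1) // n_workers
        chunks = [events[i:i + size] for i in range(0, len(events), size)]
        with Pool(n_workers) as pool:
            partials = pool.map(process_chunk, chunks)
        # Pay the one-off price of a final merge instead of
        # synchronizing on every push_back
        merged = []
        for part in partials:
            merged.extend(part)
        return merged

    if __name__ == "__main__":
        events = [(float(i), 2.0 * i + 0.5) for i in range(1000000)]
        print(len(parallel_event_loop(events)))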
Andrew Lowe: Rapid development platforms for machine learning
- Surveyed tools with point-and-click interfaces for rapid prototyping and testing of ideas
- Purpose of the talk is to promote awareness of these tools
- ROOT is dominant in HEP, but has its critics
  - Complex algorithms are difficult to do interactively
  - You end up with huge programs
  - Is this the best choice for prototyping?
- Tried a series of tools taken from some surveys
- H2O Flow
  - Web interface, both python and R, like a jupyter notebook
  - Can build models just by clicking buttons
  - Web interface screenshots in the slides
  - Doesn't offer too many model types, but has most of the popular ones, notably an extensive and fast deep learning implementation
- RapidMiner
  - Market leader and the most popular tool of its kind
  - Basic and educational editions are free, but limited to 1 GB of data
  - The professional edition is expensive, as are the training courses
  - Full-featured, very powerful, general purpose
  - Drag-and-drop canvas GUI to build models and define the workflow
  - Graphical results/representations were a bit primitive compared to the other options, especially given the price
- KNIME
  - Second most popular tool of its kind
  - Open source, unlike RapidMiner, but "extensions" for advanced capabilities can be purchased
  - Also has free community extensions
  - Similar drag-and-drop interface to RapidMiner
  - Presentation of results is also a bit crude; typically the raw data is taken out and plotted somewhere else
- Orange
  - Free and open source, not as full-featured as the other tools
  - Lacks statistical functions
  - A number of interesting options currently under the "prototype" category
  - Again, a drag-and-drop interface
  - Has a Spark interface, which may be interesting to those wanting to try distributed computing
- WEKA
  - Free and open source, powerful and versatile, large community support
  - Offers four interface options for data mining; the Knowledge Flow interface is available from v3.6
    - Explorer is like the ROOT TBrowser
    - Experimenter looks for statistically significant differences between datasets
    - Knowledge Flow is like the drag-and-drop interfaces of RapidMiner and KNIME
    - Auto-WEKA runs through a large set of models and hyperparameters and tries to find the best model for the data (see the sketch at the end of this talk's notes)
  - Went from installation to a first trained model in about 30 minutes
- Rattle GUI
  - Not as full-featured as the other tools
  - Builds on top of R, loads separate R packages on request
  - No visual programming workflow like the others
  - Lots of tools available
- Suggested workflow:
  - Use a rapid development platform to explore the "ideas space" and do fast scouting
  - Then write a prototype in a high-level language for the final production version (plots/publication/etc.)
- Personal preference for WEKA, but that is subjective
- Comment (Sergei): jupyter notebook integration should go part of the way towards supporting interactive rapid development
  - Pause/resume training, model building in visual form, etc.
  - For those not aware, please try it out and give feedback
- Question (Ilija): have you tried Oracle Data Miner?
  - Andrew: no, have not tried it; wanted to focus on tools that are open source, free, or freemium
  - Ilija: we already have lots of data in Oracle, and we have the tool for free; would be interesting to try out
- Question (Ilija): do you have somewhere with all the frameworks installed which we can use to test them out?
  - Andrew: didn't have problems installing and running them on my laptop, easy to set up
    - If you want to try RapidMiner, sign up for the educational distribution
    - The KNIME core is easy to install, but the extra components take a long time
    - For WEKA, make sure to use the most recent stable version; by default there is a cap on the amount of data that can be read in, but it can be changed
    - All of them, though, are easy to install
- Question (Sergei to Ilija): if your data is already in Oracle, do you already have a licence?
  - Ilija: all ATLAS ADC data is straight in Oracle, same with Rucio
    - We do have a licence for that, but nobody asks the Oracle sysadmins to enable the tool
    - So we have this nice tool to use, but nobody is using it
    - Very much reminds me of RapidMiner, and we already have all the data there
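A minimal sketch of the idea behind Auto-WEKA's automated model search, mentioned above: scan a set of candidate models and hyperparameter grids, score each by cross-validation, and keep the best. This uses scikit-learn rather than WEKA, and the candidate models and grids are arbitrary examples:

    # Auto-WEKA-style search, illustrated with scikit-learn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Candidate (model, hyperparameter grid) pairs to scan
    candidates = [
        (RandomForestClassifier(), {"n_estimators": [50, 100], "max_depth": [3, None]}),
        (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    ]

    best_score, best_model = -1.0, None
    for estimator, grid in candidates:
        search = GridSearchCV(estimator, grid, cv=5)  # exhaustive grid + CV scoring
        search.fit(X, y)
        if search.best_score_ > best_score:
            best_score, best_model = search.best_score_, search.best_estimator_

    print(best_model, best_score)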
Joeri Hermans: Distributed Deep Learning using Apache Spark and Keras
- Joeri could not connect; the contributions are on the agenda
- Sergei gave some info
  - Looking at parallelization of gradient descent, etc.
  - Using Apache Spark and Keras
  - Take a look when you can, and contact Joeri with questions

Gerardo Gutierrez: Parallelization in Machine Learning with Multiple Processes
- MPI current status and future
  - Standard of communication for HPC/grid computing
  - Several implementations: OpenMPI, MPICH, IBM, Intel, etc.
  - Support for remote memory access, shared memory, etc.
- Implementation within ROOT
  - Trying to communicate ROOT objects between processes via MPI
  - Trying to adapt MPI with a better design for ROOT
  - Implement TMVA algorithms in parallel
  - Several features already prototyped
  - Simpler than the C implementation of MPI; can use any serializable object instead of only pre-defined types
  - Examples given for sending/receiving a TMatrixD (see the sketch at the end of these notes)
- New architecture for TMVA parallelization
  - A new TMVA base class has MPI, threads, Spark, etc. implementations
  - Some jupyter notebook examples: ParallelExecutor (MultiProc) and ParallelExecutorMpi (OpenMPI)
- ROOTMpi is a modern interface for MPI that uses new C++ features and ROOT
- TMVA is developing a modern architecture for multiple parallelization paradigms
- Question (Paul): is ROOTMpi used elsewhere, or only for TMVA?
  - Gerardo: for all of ROOT, not just TMVA
  - The examples right now are in TMVA, to compare with the other TMVA parallelization efforts
  - Sergei: this is also for looking towards HPC resources in the future
- Question (Sergei): we also have Spark etc. for distributed computing; how would you motivate this vs the others?
  - Gerardo: Spark is not as directly oriented towards HPC resource handling as MPI
  - Need to make a proper comparison, but expect MPI to have better performance
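A minimal sketch of the send-any-serializable-object idea in Python via mpi4py; ROOTMpi itself is C++, and the plain nested list here is only a stand-in for a TMatrixD:

    # mpi4py's lowercase send/recv serialize (pickle) arbitrary Python
    # objects, rather than being limited to pre-defined MPI datatypes,
    # analogous to ROOTMpi sending any serializable ROOT object.
    # Run with, e.g.: mpirun -n 2 python send_matrix.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        matrix = [[1.0, 2.0], [3.0, 4.0]]   # stand-in for a TMatrixD
        comm.send(matrix, dest=1, tag=0)    # object is pickled transparently
    elif rank == 1:
        matrix = comm.recv(source=0, tag=0)
        print("rank 1 received:", matrix)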