IML meeting: January 26, 2018
Peak people on Vidyo: 56
Peak people in the room: 23

Steven: Intro and news
- Markus Stoye is the new CMS coordinator
- New ALICE coordinator will be announced soon
- IML annual workshop
  - Core workshop: April 9-11
  - Full-day hackathon: April 12
  - Call for abstracts is now open, due March 12
  - Call for hackathon project proposals will open soon
- List of common questions and answers in the slides
- Next IML monthly meeting is February 28 on software+infrastructure

EP-IT data science seminar before the regular meeting
- Soumith Chintala (Facebook): Automatic Differentiation and Deep Learning
- Recording at http://cds.cern.ch/record/2302087

Andrea Valassi: ROC curves, AUCs, and alternatives in HEP event selection and in other domains
- Got interested from the LHCb challenge, where the winner maximises the area under a ROC curve
- Now trying to understand what the AUC really is, and why it is used in other domains
- Binary classifiers: true positive TP, false positive FP, true negative TN, false negative FN
  - Different domains have different focus and terminology for these four cases
- Prevalence = S/(S+B)
- ROC and PRC (precision-recall) curves
  - ROC doesn't depend on prevalence, while the PRC changes dramatically
- Domain-specific challenges
  1. Qualitative imbalance
  2. Quantitative imbalance
  3. Prevalence known? Time invariance?
  4. Dimensionality? Scale invariance?
  5. Ranking? Binning?
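The prevalence point above can be illustrated with a small sketch (pure NumPy; the Gaussian score distributions and sample sizes are illustrative assumptions, not from the talk). The rank-based AUC estimate is essentially unchanged when the background sample grows, while precision at a fixed cut collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

def roc_auc(pos, neg):
    # Rank-based (Mann-Whitney) AUC estimate: the probability that a
    # randomly chosen positive outscores a randomly chosen negative
    ranks = np.concatenate([pos, neg]).argsort().argsort() + 1.0
    u = ranks[: len(pos)].sum() - len(pos) * (len(pos) + 1) / 2.0
    return u / (len(pos) * len(neg))

def precision(pos, neg, thr):
    # Precision at a fixed score threshold: TP / (TP + FP)
    tp = (pos > thr).sum()
    fp = (neg > thr).sum()
    return tp / (tp + fp)

# Same underlying score distributions, two very different prevalences
pos = rng.normal(1.0, 1.0, 2000)        # "signal" scores
neg_50 = rng.normal(0.0, 1.0, 2000)     # prevalence ~50%
neg_1 = rng.normal(0.0, 1.0, 200_000)   # prevalence ~1%

print(roc_auc(pos, neg_50), roc_auc(pos, neg_1))                # nearly identical
print(precision(pos, neg_50, 1.0), precision(pos, neg_1, 1.0))  # very different
```

In the talk's HEP language, precision corresponds to purity, and the fraction of signal passing the cut is the signal efficiency.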
- Medical diagnostics: optimise diagnostic accuracy
  - Different people may want to optimise different things
  - Most popular metric = (TP+TN)/(TP+TN+FP+FN)
  - Catch: for a super rare disease (1 in a million), the accuracy barely changes whether or not the test ever detects it
  - For this reason, the field moved to the ROC curve
  - AUC interpretation: the probability that the test result of a random sick subject indicates greater suspicion than that of a random healthy subject
  - Found a further limitation: ROC isn't great for highly imbalanced datasets, so some moved to the PRC
  - Active area of research; ROC AUC is not always the best choice for medical diagnostics
- Information retrieval: distinction between relevant and non-relevant documents
  - Metrics evaluate classifiers based on the PRC
- HEP example
  - To minimise the statistical error on a cross-section measurement, maximise efficiency times purity
  - Eff*purity is qualitatively relevant, numerically nice, and is correct for cross-sections
  - However, it's not perfect for all situations; different cases want different metrics
  - In this type of case, the AUC is irrelevant
- Binary classifiers in HEP
  - Event reconstruction --> (software) trigger --> physics analysis
  - TN are relevant for reco, but not for trigger and physics analysis
  - TN enters the definition of ROC and AUC, so these are not good/relevant for trigger+analysis
  - Physics is the only one of the three fields that uses binning (local efficiencies instead of global)
- Huge amount of additional detail in the slides on different metrics
- Summary
  - Different disciplines/problems have different challenges, motivating different metrics
  - The most relevant metrics in HEP event selection are purity and signal efficiency
  - AUCs may not be the optimal choice (often not)
  - For every problem, we should identify the optimal metric
- Comment (Steven): At least in my cases, we may use ROC, but not typically AUC
- Andrea: for particle ID or event selection?
- Steven: I personally mostly do particle ID
- Andrea: in that case, ROC without AUC may make sense
- Paul: to be truly optimal we need to know lots of the things on the right of slide 17
  - For practical purposes, the great advantage of the AUC is that it is computable fast
- Andrea: started with the Kaggle challenge and got confused
  - If you train people that the AUC is right, they will use it without thinking
  - It's a mind change; people will otherwise do things even when they don't make sense
- Question (Vitaliano Ciulli): slide 20, is this a statistical idea, or a really viable approach?
- Andrea: A really viable approach
  - You have a matrix element that depends on a parameter, and you can take the derivative
  - Can try to train an ML variable to reproduce that, to see how it is spread in space
  - It can be done; when you have many dimensions of MC you try different variables
  - Try to map this to the space distribution and do some calculations
  - More at the level of crazy ideas
- Question (Joosep Pata): true negatives are called irrelevant, but if we use the same MC samples, then the true negatives are well defined
  - The background is always the same
- Andrea: Of course, when using a sample, the number is well defined
  - Take the trigger: if your numbers are what comes out of L0, the important thing is the L1 rate
  - The L1 rate must be the same irrespective of the L0 rate
  - When you do your physics publications, in the end only the TP, FP, FN count
- Joosep: when using data, yes.
  When using MC for a high-level Higgs analysis, it's not clear
- Andrea: Interesting to try to think about which variables you really need
  - In some cases, not exactly the same
  - Sometimes efficiency*purity, sometimes the absolute scale, sometimes the absolute number of background events selected (trigger rate)
  - The dimensionality of the problem is usually 2 or 3
  - It's the true negative that is almost always the missing one of the four

Daniel Krefl: Riemann-Theta Boltzmann machine
- arXiv:1712.07581
- Boltzmann machine (BM)
  - Two-part system: hidden and visible sectors that are arbitrarily connected
  - Often binary-valued states
  - In computer science, called energy-based models, as this is a statistical mechanics system
  - The probability of the system being in a specific state is given by the Boltzmann distribution
  - Practically not feasible; for applications, only restricted BMs (RBMs) have been considered
    - Removes the self-couplings
  - If the self-couplings could be included, the machines would be more powerful
    - Could model non-trivial covariances of the system
- Change the domain of the state values
  - One set continuous, the other quantized
  - With some algebra, this can be calculated using a Riemann-Theta function
  - Still an infinite sum, but you can mathematically prove that for a given precision you only need to sum a finite number of terms
  - You can thus evaluate the partition function efficiently
  - Gradients can also be calculated analytically
- Use the theta function as a neural network activation --> theta neural network
  - Each node learns its own activation function to model the system
  - This means you can learn using much smaller networks, as part of the information goes into the activation function itself
  - However, evaluation of the Riemann-Theta function is expensive (but practically possible)
- Wrote a new framework, as they were really changing the basic building blocks
  - riemann.ai/theta
  - Very easy interface, inspired by Keras
  - SGD and a genetic optimizer out of the box
  - Easy to extend
    functionality (object oriented)
  - Currently CPU based, but working on GPU and FPGA support; a better math back-end is in progress
  - The expected speedup will bring large-scale applications into reach
- Quick announcement of a workshop from April 30 to May 4 in Hainan, China
  - Local costs covered, just need to buy the plane ticket
- Question (Steven): you said that the evaluation was more expensive, roughly how much?
- Daniel: roughly a factor of 10, but we think we can recover most of this with the coming developments
- Question (Michela Paganini): do you believe these equations could be implemented in PyTorch instead of building a new framework?
- Daniel: didn't look into PyTorch, as it wasn't as popular at the time
  - I would need to look into how deep you can go into the system
  - Here speed is critical; need to have the activation functions running in C
  - If the framework is there to define custom activation functions in a speedy way, one could do this stuff in that framework
  - Our motivation was that we wanted something where we had full control for R&D for the moment
  - If we decide to make this bigger, we may want to move to something else
  - If we move this to another group, distribution is a problem; we'd need to be big enough that it is distributed with the main branch
- Question (Hossein Afsharnia): is this influenced by the distribution that we have over the input data?
- Daniel: have to distinguish between two things, BMs and the TNNs derived from BMs
  - BMs are a device to learn the underlying input density of the data
  - The TNN in turn is a new neural network layer with a new activation function
  - The visible sector is continuous; the hidden sector encodes the state in a quantized space
  - So the input itself is continuous
- Question (Anton Poluektov): works as a good feature detector, but how does the training work?
- Daniel: train in two steps
  - Learn the probability density of each part of the picture in the first step
  - Then generate the density for the feature vector in the second step

Savannah Thais: NIPS 2017 summary (HEP perspective)
- Largest ML conference; this year ~8000 participants (up from ~5000 last year)
- 800 accepted papers, 53 workshops, 9 tutorials
- Deep learning for physical sciences workshop
  - 30 accepted papers, 5 invited talks, 6 contributed talks
- Deep topology classifiers for a more efficient trigger at the LHC
  - Sequence of PFlow candidates fed into an RNN, then processed with an LSTM or FRU
  - Images processed with a CNN
  - Initially to select ttbar events (currently dominated by W+jet and QCD with the single-lepton trigger)
- Electromagnetic shower classification using a DenseNet
  - Outperforms other feature- and cell-based classifications
- Particle classification, energy regression, and simulation
  - DNN for classification using flattened cell information
  - Energy reconstruction using a CNN
  - Basic GAN for an electromagnetic calorimeter, generating 3D energy arrays
- Lots of jet contributions
  - Tips and tricks for training GANs with physics constraints
    - Nice summary of common issues when using GANs
  - DeepJet: generic physics-object-based multiclass classification
    - CNN jet classifier using particle candidate features
  - Neural message passing for jet physics
    - Graph embedding of jets, outperforms the previously studied RNN embedding
  - Adversarial learning to eliminate systematic errors
    - Use adversarial learning to reduce systematics
  - Data augmentation, pivot adversarial network, and tangent propagation
    - DA and pivot outperform the baseline, tangent does not
- Particle track reconstruction with deep learning
  - Image-based approach to track reconstruction
  - RNN with individual layers of the detector, and CNN with a 3D image of the full detector
  - Also looked at point-based ML with an RNN to predict the spacepoint in the next layer
- A few simulation and modelling contributions
  - Improvements to
    inference compilation for probabilistic programming
    - Interfaces with existing scientific simulations
    - Example of an interface with SHERPA
  - Graph memory networks for molecular activity prediction
    - Interesting RNN structure to model molecular behaviour
    - Standard RNN connected to a matrix RNN (external memory)
  - Nanophotonic particle simulation and inverse design
    - Use a NN to produce a range of measurements
    - Can also run the network backwards to design materials for a desired spectrum
- Physics-influenced ML: can incorporate QFT information to constrain the models the algorithm will learn
  - How can physics inform deep learning methods
  - Towards a hybrid approach to physical process modelling
- Good attendance from both scientists and ML experts
  - HEP very well represented, half of the organizers from CERN
- A lot of our LHC contributions use toy datasets
  - Good to see what is feasible, but ultimately we want to actually use the techniques in our experiments
- A lot of the work right now is classification problems, some with simulation
- Lots of other interesting work being done in related fields that we can learn from
  - Predictive neural networks
    - RNN where the basic block is a 3-mode tensor that computes a combination of two input vectors
    - Show promise: much better accuracy in a shorter amount of time than standard RNNs
- Interesting symposium on explainable machine learning
  - Hosted a debate, and also a challenge
  - This is particularly important for HEP
- Lots of other interesting symposia and workshops occurred; a good selection is listed in the slides
- Question (Hossein Afsharnia): is this a kind of reinforcement learning, slide 14?
- Savannah: Trying to combine insight from standard NNs with reinforcement representations, a hybrid of the two

Lars Varming Joergensen: Spes Spirae
- Trying to measure the motor symptoms of Parkinson's disease
- Second most common neurodegenerative disease (after Alzheimer's)
- Mostly affects older people, about 1% of people over 65
  - Some get it at a younger age; I was diagnosed 1.5 years ago
- After being diagnosed, was surfing the internet and trying to find ways to measure the symptoms of Parkinson's
- Motor symptoms, nothing unseen
- No cure; the medicine is for the symptoms, and it's up to the patient what kind of medicine to take
- Want to understand these symptoms better
  - Tremors
  - Rigidity of movement
  - Slowness of movement
  - Postural instability
  - Sometimes sleeping problems and constipation, which are harder to measure
- Cause of the disease: lack of dopamine in the parts of the brain responsible for movement --> cells die
  - By the time you are diagnosed, usually 70-80% of the dopamine-producing neurons are already gone
  - Can't give replacement dopamine, as it can't pass through the blood-brain barrier
- When starting discussions with the head of the neurology department, the question was "why do you think you can do better now"
  - Technology has changed; it's much easier and more comfortable to wear things 24 hours a day
- Hardware
  - Much better battery lifetime makes it possible to measure essentially 24/7
  - Very sensitive sensor package, as sensors are much better now
  - Simple design, preferably aesthetically pleasing
  - E-ink screen and buzzers to alert the patient when to take pills or exercise (or when the battery is low)
  - Two buttons (green and red) make it possible to delay pill-taking or exercises depending on whether the moment is good or not
- Requirements
  - Should last 10-14 days between charging
  - All electronics must have low power consumption (no wifi/bluetooth)
  - Data downloaded from the watch during charging
    - Slow download because of the low-power memory
    - Download will probably take about 2 hours (5GB), but recharging will take a few
      hours too, so it's OK
- Now that we have the data, need to ensure the patient benefits
  - See their own data: are the symptoms getting worse?
  - Need two databases: one anonymised, and one for the neurologist or patient
  - You can also use this to see if a new drug is actually working
    - One of the main reasons to buy this in the first place
  - Hopefully, since the patient gets a better treatment, the health insurance company may pay for the device
- Question of money
  - Retail probably 50-100 euros
  - This is really cheap: when I started treatment, a month of medicine was 98 euros, while in the USA the same thing was $360
  - In Europe, the price has since gone down to 65 euros; in the US it has gone to $640/month
  - Patients in the USA are therefore not taking drugs, as they can't afford them
  - If a device that costs 50-100 euros can help get you the right drugs, it's not expensive
  - The health insurance company might be interested
- The big payoff for us is the anonymised database
  - Ask about medical history (medical surgeries, history of Parkinson's in the family, etc)
  - Hopefully we could then unleash ML on this to try to extract new relationships
  - Could identify better progression markers for Parkinson's, maybe even help to find a cure
  - 5-10 million patients in the world; what if 100k patients were wearing our watch?
- Next steps
  - Want to build the first 20-40 prototypes
  - Check how well all of the software works
  - Will likely take about a year, then go into mass production
  - Might be able to use CERN IT infrastructure for the trial period
- Question (Hossein Afsharnia): about working on the data that you have on this device
  - Is there any group working on this now?
- Lars: Not that I am aware of
- Hossein: I only hear about this data now; is it open source, can we work on it?
  - Can we use it as a kind of pattern, which could be useful for future work, on your advice?
- Lars: The idea of the database is that it should be open source
  - Don't want to make it a priority that one group is the only one who can touch the data
  - The more we can extract from the data, the better for the patients
- Question (Anton Poluektov): one of the obvious solutions is to release an app for the Apple Watch
- Lars: the problem with that is the app takes some power, as it runs continuously
  - It would drain the watch much too quickly
  - We have looked at this, at different watches, to see if it was possible
  - If we really want to do this, it has to be a dedicated device
  - Pebble: originally a charge every 10 days; after the app, it was every 2 days
- Anton: really taking data at 26 Hz continuously, so then 5GB raw?
- Lars: yes, exactly
- Paul: a wearable that tracks how you move at night sounds like a Fitbit
- Lars: that's a completely closed-off environment; they don't let anything other than their software on it
- Paul: yes, I mean you are on the right track, such a product can be accepted by the customer
- Lars: someone told me last week that we should talk to Swatch, as they are used to making things look nice
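The 26 Hz / 5 GB / 10-14 day figures from the exchange above can be sanity-checked with a bit of arithmetic; the sensor channel layout was not given in the talk, so the check only derives the bytes-per-sample that those figures imply:

```python
# Back-of-the-envelope check of the quoted numbers: continuous 26 Hz
# sampling over a 14-day charging interval, against ~5 GB of raw data.
SAMPLE_RATE_HZ = 26
DAYS = 14
RAW_BYTES = 5e9  # ~5 GB quoted in the discussion

samples = SAMPLE_RATE_HZ * 60 * 60 * 24 * DAYS
bytes_per_sample = RAW_BYTES / samples
print(f"{samples:,} samples in {DAYS} days")
print(f"~{bytes_per_sample:.0f} bytes per 26 Hz sample implied by 5 GB")
```

About 31 million samples, so roughly 160 bytes per sample: consistent with a fairly rich multi-channel sensor record rather than a single accelerometer axis.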