IML meeting: July 5, 2016
Peak people on Vidyo: 24
Peak people in the room: 25

Lorenzo: Intro and news
- Today's meeting is dedicated to anomaly detection
- Particularly important for monitoring data quality in experiments
- Next meeting is August 25th, focusing on unsupervised learning
- Link to the next meeting's agenda is in the slides

James: Anomaly detection in ATLAS and RAMP at HSF
- Different people have different views of anomaly detection
  - Automatic detection of events which are somehow different from the bulk of the data
  - These events are therefore worth the attention/scrutiny of experts
- Can be supervised: train to recognize specific anomalous cases
- Can be semi-supervised: train on the bulk without anomalies; strongly related to one-class classification
- Can be unsupervised: automatically identify the bulk by some means and thus identify anomalies
- Three kinds of anomalies
  - Point: a few points outside of the main bulk
  - Contextual: the point at which a given value is observed is anomalous, though the value itself could otherwise have been sensible
  - Collective: population-level differences
- Useful for monitoring and detection of problems in several areas (DAQ/trigger, distributed computing, reconstruction and data quality)
- In physics analysis, useful to look for unusual events (point anomalies) or collective behavior
- Suppose you have two samples that are supposed to be statistically identical
  - Two MC samples, two different data runs, etc.
  - How can you verify that A and B are identical?
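The semi-supervised (one-class) setting described above can be sketched with scikit-learn: fit a one-class classifier on clean "bulk" data only, then flag new points it considers outliers. This is a minimal toy illustration, not any experiment's actual pipeline; all data and parameters here are made up.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# The "bulk": points from a standard 2-D Gaussian, assumed free of anomalies
bulk = rng.normal(0.0, 1.0, size=(500, 2))

# Semi-supervised setting: the classifier only ever sees the clean bulk
clf = OneClassSVM(nu=0.05, gamma="scale").fit(bulk)

# New data: mostly bulk-like points, plus a few far-away point anomalies
normal_new = rng.normal(0.0, 1.0, size=(20, 2))
anomalies = rng.normal(6.0, 0.5, size=(5, 2))

# predict() returns +1 for inliers and -1 for outliers
pred_normal = clf.predict(normal_new)
pred_anom = clf.predict(anomalies)
print((pred_normal == 1).mean(), (pred_anom == -1).mean())
```

The `decision_function` of such a model also gives a continuous anomaly score, which is the "natural quantification of the degree of abnormality" mentioned later in the DQ discussion.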
- Standard approach: overlay histograms of specific variables and look for differences
- ML approach: train a classifier to distinguish A from B, histogram the classifier score, and check that distribution for differences
  - Results in one distribution to check rather than thousands
  - If the classifier is able to separate A and B, then there must be a difference between the samples
- RAMP = Rapid Analytics and Model Prototyping
  - Real-time data challenge; idea developed by Paris-Saclay
  - Participants in the same room, or at least working in real time
  - Once submitted, code can be viewed and cloned by others (competitive-collaborative environment)
  - Makes use of IPython and Jupyter notebooks
- Anomaly detection RAMP at HSF
  - Around 30 participants
  - "Reference dataset" was a subset of the HiggsML dataset
  - "Distorted dataset" was a distorted version of a different subset of the HiggsML dataset
  - Chose the area under the ROC curve (AUC) as the performance metric
  - Leaderboard shows the progression, with a clear benefit from a new variable that was then picked up by other participants
  - RAMP-style competitions can be very productive and lead to rapid developments
- Switching topic, ATLAS activities fall into three main areas
  - DAQ: contextual anomaly detection in a time series
  - Distributed computing
  - Data quality monitoring and physics analysis (one-class classification)
- DAQ: a NARX neural network is trained to predict a corridor within which the next point in a time series should fall
- Distributed computing: at any given time, something is going to be down
  - Traditional approach: keep re-trying jobs until they work (inefficient, unpredictable delays)
  - ML could be applied to guide the application of novel fault-tolerance strategies
  - Determine when a retry is needed, and when it's just not going to work and the job should be moved to another site
  - Could significantly reduce turnaround times for production
  - Joint WLCG demonstrator project with LHCb
- DQ: two datasets that should be statistically compatible, but are they?
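The classifier-based two-sample comparison described above can be sketched as follows: label sample A as 0 and sample B as 1, train any classifier, and compute the AUC on held-out data. An AUC consistent with 0.5 means the classifier cannot tell the samples apart; a significantly larger AUC flags a difference. This is a toy sketch (synthetic Gaussian data, an arbitrary classifier choice), not the RAMP or ATLAS code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Sample A (reference) and sample B, identical except one shifted variable
A = rng.normal(0.0, 1.0, size=(2000, 5))
B = rng.normal(0.0, 1.0, size=(2000, 5))
B[:, 0] += 1.0  # the "distortion" hiding in one of many variables

# Label the samples and train a classifier to distinguish them
X = np.vstack([A, B])
y = np.concatenate([np.zeros(len(A)), np.ones(len(B))])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# AUC ~ 0.5: samples indistinguishable; AUC well above 0.5: they differ
print(round(auc, 3))
```

The benefit is exactly the one noted in the talk: instead of eyeballing thousands of overlaid histograms, one checks a single score distribution (or a single AUC number).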
- First approach is similar to what was done in the RAMP
- Second approach is a one-class classifier that only sees the reference data (semi-supervised)
  - Provides a natural quantification of the degree of abnormality
  - An autoencoder was tested as an example of a simple one-class classifier
- In the future, would also like to investigate use in early warning systems (strange events being picked out as they are reconstructed)
- Question (room): on the NARX neural network, how is this related to RNNs?
  - James: I believe it's very similar; not the same, but strongly related
- Question (Sergei): which software do you use for the simple NN and the autoencoder?
  - James: scikit-learn and the scikit-learn add-on sknn
- Question (Steven): what about expected changes between runs, such as different pileup levels?
  - James: would need to re-train with a new reference; no simple way around it
  - Steven: what about using something like a time series, but a pileup series instead?
  - James: may be possible, but likely not enough time steps to do this reliably

Viktor: Anomaly detection in CMS data quality monitoring
- Future upgrades to the detectors and the LHC will make data classification/certification a much bigger problem
  - Significantly increased amount of data, but the same number of people to look at it
- Focus on the CMS hadronic calorimeter (HCAL); the number of channels is going to double soon
- Method shown in the last IML meeting on clusterization via statistical moments
  - Applied the method as described, focusing on the 1st and 2nd moments of the significance distribution
  - Selected a few runs that were previously identified as either good or bad
  - Bad runs were classified correctly using these variables
- Able to spot problems in HCAL occupancy distributions, normally done by shifters looking at plots
- Also able to identify problematic run timing distributions
- However, the clusterization is luminosity dependent
  - Seen clearly when scaling up to include more runs: the low-luminosity runs and very short runs are outliers
- In the future, if this is to be used for classification, it must be done lumi-section by lumi-section (not run by run)
- Should also be split by HCAL sub-detector, not just the full detector (later: extend to all CMS systems)
- Comment (Viktor): curious to see what variables were used by ATLAS in the previous talk
  - James: the plots shown were just toy examples, not yet at the point of selecting variables
- Question (Lorenzo): you have a dependence on luminosity, is that correct?
  - Viktor: occupancy is different at different luminosities
- Question (room): did you base your results on histograms, or on clustering algorithms?
  - Viktor: two histograms, one reference, one you are trying to classify
  - Calculate a significance of the difference between the histograms
  - Then look whether you can see a difference using that
- Question (Steven): likely improved separation if you split by lumi-sections or similar to reduce the pileup/etc. dependence
  - Viktor: yes, need to look by lumi-section

Maxim: Anomaly Detection and Yandex
- Will cover both supervised anomaly detection at CMS and unsupervised anomaly detection at LHCb, starting with CMS
- Goal is to let experts deal only with non-trivial cases
  - System learns to predict the expert's response from "good" vs "bad" labelled data
  - Most obvious cases are covered automatically; ambiguous ones are left to the experts
  - System continuously learns from the experts
- Divides data into three groups: almost surely good, almost surely bad, and ambiguous
- Three performance metrics: rejection rate, pollution rate, and loss rate
- Sequential learning procedure defined in eight steps (slide 14)
- Can save a lot of time with minimal or even zero loss/pollution rates
  - Feasible to save 50-85% of the manual work under reasonable constraints
- However, the expert decision was chosen as the reference
  - Experts also have some intrinsic loss and pollution rates
- Future work:
  - Additional features, increased robustness
  - Replace "good" and "bad" with specifications of the problematic sub-detector
  - Studies were done with 2010 data; need to be updated
- Moving to LHCb studies of unsupervised learning
  - Monitoring different trigger streams
  - Try to separate different runs
  - Project is in its initial phase, but a correlation between reported problems and classifier quality has been observed
- Question (Sergei): would be interesting for those doing DQ monitoring to keep track of the mis-classification rate
  - How often does the shifter do something wrong?
  - Once they have input from these methods, watch whether this rate improves
- Question (Viktor): how do you select the features to use?
  - Maxim: brief summary of the feature-extraction procedure on slide 7
  - Some fixed-size features are extracted for each event
  - Results in more than 1000 features; believe that each feature can provide increased quality
- Question (Sergei): so nothing here is detector-level, these are more physics observables?
  - Maxim: yes, correct; the question is whether we can identify defects using physical features

Andrew: Multiple loss functions in TMVA
- BDTlib: a package focusing on multiple loss functions so the user can focus on the data that matters to them
- Some visible improvements with different loss-function choices that were not previously supported
- Consensus to integrate this package into TMVA; starting to do this now
- Plans to parallelize the BDTs in TMVA
  - Multiple benefits from this (training, evaluation, etc.)
- Question (Steven): when would we expect this to be in TMVA?
  - Andrew: in the next TMVA build, but just the different loss functions (parallelization will come later)
  - Sergei: that means later this summer