IML meeting: April 28, 2017

Peak people on Vidyo: 30
Peak people in the room: 21

Michele: Intro and news
- Several upcoming events, please see the slides for details
- Summarised feedback from the workshop, thanks to everyone who provided feedback
- Next meeting will be May 24, probably in 4-3-006 at CERN (to be confirmed)
- See indico for details: https://indico.cern.ch/event/631610/

Michael Andrews: IML Challenge 2017
- Runner-up presentation
- Only a small number of variables was provided, so new features had to be engineered
  - Wanted to use the energy corresponding to different parts of the detector
  - Looked at the scaling of variables with respect to jet pT
  - If quantities are correlated, dividing them can make them look flat, so also multiplied them
  - Inputs now span many orders of magnitude, so applied preprocessing (see the first sketch after this section)
- Was training on a MacBook and lxplus, so had to shrink the network given the challenge's time constraints
- Several small tweaks (see the second sketch after this section), such as:
  - Found sigmoid worked better in this case than ReLU
  - Used glorot_normal for initialization instead of glorot_uniform to speed up training
- Classifier is fairly confident when it has a quark, but not so sure about gluons
- Ended up with a score of 0.7921; the winning score was 0.795
- Question (Pooja): what motivated you to use a NN rather than other classifiers?
  - Michael: for these types of problems, where you have such low-level features, neural nets and SVMs work well; I was more familiar with NNs, so that's what I chose
  - Pooja: an SVM or random forest would also be potentially powerful
  - Michael: we had 48 hours on a work day to do this, so we had to go with first instincts
  - Michele: this goes back to our feedback, make it longer next time
- Question (Steven): sigmoid performing better than ReLU, is it due to the smaller number of neurons?
  - Michael: the difference is ~1% between the two on a subset of ~20k units, so it was only tested on that, not the full dataset
  - Sigmoid is better for inputs between -1 and 1; ReLU is a line that goes on indefinitely
  - Should be checked on the full dataset
  - Andre: yes, likely you need more non-linearity before ReLU is needed
- Sergei: now that you have more time, it would be interesting to see what happens if you let it run with more parameters/etc
- Michele (to all): please keep playing with the dataset, it's still there
- Steven (to all): if you make significant progress, let us know and we would be happy to have a talk on what you learned
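A minimal sketch of the kind of feature engineering and preprocessing described above, using pandas. The column names (jet_pt, em_energy) are hypothetical placeholders, not the actual challenge variables:

```python
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build ratio/product features, then compress their dynamic range."""
    out = df.copy()
    # Dividing correlated quantities can flatten their dependence on jet pT...
    out["em_over_pt"] = df["em_energy"] / df["jet_pt"]
    # ...but the product is kept too, in case the ratio hides useful information
    out["em_times_pt"] = df["em_energy"] * df["jet_pt"]
    # Inputs span many orders of magnitude: apply a sign-preserving log
    # transform, then standardise each column to zero mean and unit variance
    for col in out.columns:
        out[col] = np.sign(out[col]) * np.log1p(np.abs(out[col]))
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out
```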
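A second sketch covering the network choices mentioned in the talk and the Q&A (sigmoid activations instead of ReLU, glorot_normal initialization instead of glorot_uniform). The layer sizes and optimizer are illustrative assumptions, not the actual architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 20  # hypothetical number of engineered input variables

model = Sequential([
    # Sigmoid reportedly beat ReLU here; glorot_normal sped up training
    Dense(32, activation="sigmoid", kernel_initializer="glorot_normal",
          input_shape=(n_features,)),
    Dense(16, activation="sigmoid", kernel_initializer="glorot_normal"),
    Dense(1, activation="sigmoid"),  # quark (1) vs gluon (0) probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```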
Eric Aquaronne: Power and ML, IBM
- IBM decided to join community efforts
  - A big example is when Microsoft sued Linux, and IBM gave 500 patents to Linux to break the lawsuit
- IBM is a major player in UNIX, but is the underdog in Linux compared to Intel
- Then Google called, wanting to work together, BUT IBM needed to convert their chip to handle the x86 addressing mode (little endian instead of big endian)
  - OpenPOWER is the byproduct of this
- Can have 8 threads on a single processor
- Worked with NVIDIA and compiled their bus interface into the hardware (NVLink)
  - 80 GB/s rather than the normal 16 GB/s; a comparison of the gains was shown
- Long-term development: POWER6 started in 2007, and this is now the 4th generation of chips
- Now the design is open and collaborative, rather than being a single product where every part is made by IBM
  - A big benefit from these changes is that it's now much easier for people to use
  - Many people are now using IBM without knowing it (hidden under a different "brand")
- Created PowerAI as an enterprise-level deep learning distribution
  - Scales from a single machine to clusters to the cloud
  - PowerAI has large commercial-sector usage, including small clusters
- Benchmarks compared to Intel show large gains
- Moving in the direction of having dedicated ML systems
  - If you do ML, you should read HPC journals; the two fields are getting very close together
- Question (Steven): do you see this becoming possible at the personal-computer scale?
  - Eric: this is typically ~600 W, so a bit high right now, but nothing prevents it from happening
  - As it's all open now, someone could go and make a laptop right now; IBM won't do this itself, but the open group may do so
- Question (Andre): CMS is looking at some POWER systems for reco
  - Saw slight differences in the extreme tail for Intel vs AMD systems
  - We optimize everything, so curious to know what the cost is in flops/$
  - Eric: OpenPOWER systems are cheaper and faster by far compared to standard high-end systems
  - The old paradigm of IBM systems being more expensive is dead; the new model is open and cheaper
  - By killing a farm of Intel machines and replacing it with POWER, one site got 66% less floor usage and a more compact footprint
- Question (Sergei): this is more of an academic group; is there some way members of this group can evaluate resources/hardware without buying?
  - Especially working with the software
  - Eric: you can test directly with power.jarvice.com (buy time on an open cloud)
  - Sergei: fantastic that you can rent it, but is it possible to try/trial it without paying?
  - Lionel: this is coming, IBM is building a system in Europe that people can use to play with
  - Eric: contact me for a link to a jupyter notebook
Daniel Smith: Generative adversarial networks in Liquid Argon
- Project is to develop methods to make MC-trained networks perform as well on data as they have already been shown to perform on MC
- Several examples shown of MC classification performance demonstrating track-like and EM-like separation
- Test-beam experiment, so we know what particles are entering the TPC in data and can validate the classification
- Ran a blind MC-trained classifier on the data: track performance much worse in data than in MC
  - The LArTPC simulation software is new, so assumptions were also made
  - Differences can affect the network in unpredictable ways
  - For example, in reality an electron will deposit charge on every wire, but in MC it was only on the two closest wires
- Proposal: a modified GAN that alters the MC set, creating a data-driven filter for MC
  - Chose an architecture which makes it very quick to train
  - The last layer of the generator is a merge layer that takes the original MC and sums the generator's output onto it (see the sketch after this section)
  - Done so the generator never gets too far away from the MC being passed in
  - Problems encountered and their causes are listed on the slides
- With the filtered MC, the network's performance on data is much closer to what is seen in MC samples
- The GAN is proving very effective at filtering MC to be more similar to data
- Need to optimize the training and explore different applications for the filter
- Question (Fernanda): in this process of re-filtering MC and re-training, what have you learned about other shortcomings of MC?
  - Daniel: so far, all the big things I ran into are in this presentation
  - Issues left out relate to training the GAN (stability, etc)
  - Haven't found anything groundbreaking about differences between MC and data yet
  - The big goal is to figure out what the differences are, just don't have anything yet
- Question (Eric): did you run this on your laptop?
  - Daniel: no, ran it on a CERN computer, not sure what type
  - Eric: data size/training times?
  - Daniel: on the order of 100 GB to make the sample, about 2 million patches
  - Training is about 30 seconds per epoch, so it takes ~3 hours to train
  - Designed the network to be intentionally small so training doesn't take too long (can try a few during a day)
- Question (Room): when you have a classification of showers and tracks, what do you do with that info?
  - Daniel: in the reco process of LArTPCs, we do hit finding, then track finding, and try to build up from there
  - Right now the process involves blindly following hits in the event, then blindly trying to connect them to make tracks
  - So right now it will use hits within showers, which is a big waste of time
  - This classifier would find groups of hits in showers and remove them from the track finding
  - This would go into the reco chain and be a very helpful way of labelling showers before the next reco steps
- Question (Michael): how do you know data is track- or shower-like, do you do it by eye?
  - Daniel: had to eye-scan, as the methods to do the separation don't exist yet
  - Michael: is it possible to train on the data directly?
  - Daniel: yes, but it wouldn't be as much fun, and we want these systems ready/available when production starts (so we don't need to collect data first)
  - Robert Sulej: in this test-beam data we can separate easily, but in full data it's a complex mix, so we couldn't easily train with that
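A minimal sketch of the residual-generator idea described above: the generator's final merge layer adds its learned correction back onto the original MC patch, so the filtered MC can never drift far from its input. The patch size, layer widths, activations, and optimizer are assumptions, not the actual network from the talk:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Add, Dense, Flatten, Reshape

PATCH = (16, 16)  # hypothetical wire-vs-time patch size

def build_generator():
    mc = Input(shape=PATCH)
    x = Dense(128, activation="relu")(Flatten()(mc))
    delta = Reshape(PATCH)(Dense(PATCH[0] * PATCH[1], activation="tanh")(x))
    # Merge layer: output = original MC + learned correction
    return Model(mc, Add()([mc, delta]), name="generator")

def build_discriminator():
    patch = Input(shape=PATCH)
    x = Dense(128, activation="relu")(Flatten()(patch))
    # 1 = real data patch, 0 = filtered MC patch
    return Model(patch, Dense(1, activation="sigmoid")(x), name="discriminator")

gen, disc = build_generator(), build_discriminator()
disc.compile(optimizer="adam", loss="binary_crossentropy")

# Train the generator through a frozen discriminator, as in a standard GAN
disc.trainable = False
mc_in = Input(shape=PATCH)
gan = Model(mc_in, disc(gen(mc_in)))
gan.compile(optimizer="adam", loss="binary_crossentropy")
```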
Andrew Lowe: Language-agnostic data analysis workflows and reproducible research
- When doing ML, there are lots of questions about which language/toolkit is best
  - Be a polyglot
- Lots of code examples in the slides of translating between languages
- Some notebooks/etc support using code chunks in different languages
  - Store the output of one step in a format that can be read by the next step (a minimal sketch follows this section)
- Whether or not your experiment imposes restrictions on public data/etc, your colleagues will benefit from reproducible-research paradigms
- Frameworks for integrating reproducible research into articles were presented
  - Lots of examples for R Markdown in particular
- ROOT can now be used as a code execution engine (currently a proof of concept)
- Key messages:
  - It's already possible to write a reproducible analysis in your favourite languages
  - Can mix and match programming languages
  - There are ways to exchange data between code chunks of different languages
  - Can embed ROOT code in a reproducible analysis
- Question (Michele): how do you version control with this?
  - Andrew: the advantage of R Markdown is that it's plain text; the disadvantage is that you need to render it to see plots/figures
  - This means it plays better with version control than jupyter/similar notebooks
  - Whatever your ML framework is, you can mix and match
  - If you want to do something in pieces, or compare different options, you can do that without breaking out of the existing workflow
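A minimal sketch of the "store output of one step in a format the next step can read" idea: a Python step writes its result to a language-neutral CSV file that a later chunk in any language can pick up. The file names and the jet_pt column are hypothetical:

```python
import pandas as pd

# Read the previous step's output, whatever language produced it
events = pd.read_csv("step1_events.csv")

# This step's work: a simple selection
selected = events[events["jet_pt"] > 30.0]

# Write the result where the next step (R, Julia, ROOT, ...) can read it;
# e.g. in R the next chunk could simply call read.csv("step2_selected.csv")
selected.to_csv("step2_selected.csv", index=False)
```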
Ford Garberson: Using machine learning to screen for autism and other childhood ailments
- Previously on the ATLAS experiment; left a few years ago for data science
- A few years later, had a child and began thinking about the future
- Autism affects 2-3% of children, and can be very severe or mild
  - For children of scientists, the rate can be 2 or 3 times higher than this
- How can we identify autism as early as possible, so treatment can start early?
  - A skilled specialist can diagnose at ~1.5 years old
  - Usually diagnosed much later due to limited access to specialists, especially in rural areas
- Goal is to create an app backed by ML to quickly identify those at highest risk
- Two flagship instruments for autism identification:
  - ADI-R: 93 multiple-choice questions, filled in by an expert clinician after a few hours with the parent
  - ADOS: ~30 multiple-choice questions, filled in by an expert after a highly standardized ~1-hour session with the child
- Can we capture some of the benefits of these instruments in an app?
  - Have the app ask simplified questions directly
  - Ask the parent to record their child for ~1 minute and upload the video
- Where ML comes in: for example, identifying the most important questions and how relevant each one is
- However, ML doesn't perform as well in real life; some confounding factors:
  - Parents may not understand questions or may give biased responses
  - Video questionnaire watchers are not as trained as clinicians, and the video is short and often doesn't show symptoms clearly
- Other competitors exist, but this is the first group trying ML, and it's doing the best
  - Even larger separation from the competition if we allow for an "inconclusive" classification of up to 25%
- Lots of other potential tools to add:
  - ML on photos of the child's expressions
  - ML on audio of the child's voice
  - ML on the motion of the child's movements
  - ML on data about how the child plays tablet games
- While ~2% of kids have autism, ~13% have some diagnosable psychological condition, so this could expand to other conditions
- Small team, two data scientists so far
  - Hiring a data scientist now, aiming at more senior people
  - If the position is filled, still contact Ford; they intend to open more positions in the future
- Can try the app: https://cognoa.com/parents/dtc
- Question (Steven): what types of ML tools/etc do you use?
  - Ford: haven't tried anything too fancy so far
  - The competition is just summing the number of questions with a given answer
  - Getting a real training event costs thousands of dollars, so we have to extract as much info as possible from limited data
  - Not too much benefit in going to deep learning or similar
  - Looking at random forests and BDTs at the moment (see the sketch after this section)
  - Much larger dataset now (hundreds of thousands of people have used the app)
  - However, we don't usually know whether the app user ends up having autism or not
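A minimal sketch of the approach outlined in the answer above: a random forest over multiple-choice questionnaire answers, with feature importances used to rank the most informative questions. The data here is random placeholder data, and the shapes and labels are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(500, 30))  # 500 children x 30 multiple-choice answers
y = rng.integers(0, 2, size=500)        # placeholder labels: 1 = diagnosed, 0 = not

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank questions by how much they contribute to the classifier's decisions
ranking = np.argsort(clf.feature_importances_)[::-1]
print("Most informative questions:", ranking[:5])
```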