IML meeting: April 28, 2017

Peak people on Vidyo: 30
Peak people in the room: 21

Michele: Intro and news
- Several upcoming events, please see the slides for details
- Summarised feedback from the workshop, thanks to everyone who provided feedback
- Next meeting will be May 24, probably in 4-3-006 at CERN (to be confirmed)
- See indico for details: https://indico.cern.ch/event/631610/

Michael Andrews: IML Challenge 2017
- Runner-up presentation
- Only a small number of variables was provided, so new features had to be engineered
  - Wanted to use the energy corresponding to different parts of the detector
  - Looked at the scaling of variables with respect to jet pT
  - If quantities are correlated, dividing them can make them look flat, so also multiplied them
  - Inputs now span many orders of magnitude, so applied preprocessing (see the first sketch after this section)
- Was training on a MacBook and lxplus, so had to shrink the network given the challenge's time constraints
- Several small tweaks (see the second sketch after this section), such as:
  - Found sigmoid worked better in this case than ReLU
  - Used glorot_normal for initialization instead of glorot_uniform to speed up training
- Classifier is fairly confident when it has a quark, but not so sure about gluons
- Ended up with a score of 0.7921; the winning score was 0.795
- Question (Pooja): what motivated you to use a NN rather than other classifiers?
  - Michael: for these types of problems, where you have such low-level features, neural nets and SVMs work well; I was more familiar with NNs, so that's what I chose
  - Pooja: an SVM or random forest would also be potentially powerful
  - Michael: we had 48 hours on a work day to do this, so we had to go with first instincts
  - Michele: this goes back to our feedback, make it longer next time
- Question (Steven): sigmoid performing better than ReLU, is it due to the smaller number of neurons?
  - Michael: the difference is ~1% between the two on a subset of ~20k units, so it was only tested on that, not the full dataset
  - Sigmoid is better for inputs between -1 and 1; ReLU is a line that goes on indefinitely
  - Should be checked on the full dataset
  - Andre: yes, likely you need more non-linearity before ReLU is needed
- Sergei: now that you have more time, it would be interesting to see what happens if you let it run with more parameters/etc
- Michele (to all): please keep playing with the dataset, it's still there
- Steven (to all): if you make significant progress, let us know and we would be happy to have a talk on what you learned
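A minimal sketch of the kind of feature engineering and preprocessing described above, using pandas. The column names (jet_pt, em_energy) are hypothetical placeholders, not the actual challenge variables:

```python
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build ratio/product features, then compress their dynamic range."""
    out = df.copy()
    # Dividing correlated quantities can flatten their dependence on jet pT...
    out["em_over_pt"] = df["em_energy"] / df["jet_pt"]
    # ...but the product is kept too, in case the ratio hides useful information
    out["em_times_pt"] = df["em_energy"] * df["jet_pt"]
    # Inputs span many orders of magnitude: apply a sign-preserving log
    # transform, then standardise each column to zero mean and unit variance
    for col in out.columns:
        out[col] = np.sign(out[col]) * np.log1p(np.abs(out[col]))
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out
```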
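A second sketch covering the network choices mentioned in the talk and the Q&A (sigmoid activations instead of ReLU, glorot_normal initialization instead of glorot_uniform). The layer sizes and optimizer are illustrative assumptions, not the actual architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 20  # hypothetical number of engineered input variables

model = Sequential([
    # Sigmoid reportedly beat ReLU here; glorot_normal sped up training
    Dense(32, activation="sigmoid", kernel_initializer="glorot_normal",
          input_shape=(n_features,)),
    Dense(16, activation="sigmoid", kernel_initializer="glorot_normal"),
    Dense(1, activation="sigmoid"),  # quark (1) vs gluon (0) probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```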
Eric Aquaronne: Power and ML, IBM
- IBM decided to join community efforts
  - A big example is when Microsoft sued Linux, and IBM gave 500 patents to Linux to break the lawsuit
- IBM is a major player in UNIX, but is the underdog in Linux compared to Intel
- Then Google called, wanting to work together, BUT IBM needed to convert their chip to handle the x86 addressing mode (little endian instead of big endian)
  - OpenPOWER is the byproduct of this
- Can have 8 threads on a single processor
- Worked with NVIDIA and compiled their bus interface into the hardware (NVLink)
  - 80 GB/s rather than the normal 16 GB/s; a comparison of the gains was shown
- Long-term development: POWER6 started in 2007, and this is now the 4th generation of chips
- Now the design is open and collaborative, rather than being a single product where every part is made by IBM
  - A big benefit from these changes is that it's now much easier for people to use
  - Many people are now using IBM without knowing it (hidden under a different "brand")
- Created PowerAI as an enterprise-level deep learning distribution
  - Scales from a single machine to clusters to the cloud
  - PowerAI has large commercial-sector usage, including small clusters
- Benchmarks compared to Intel show large gains
- Moving in the direction of having dedicated ML systems
  - If you do ML, you should read HPC journals; the two fields are getting very close together
- Question (Steven): do you see this becoming possible at the personal-computer scale?
  - Eric: this is typically ~600 W, so a bit high right now, but nothing prevents it from happening
  - As it's all open now, someone could go and make a laptop right now; IBM won't do this itself, but the open group may do so
- Question (Andre): CMS is looking at some POWER systems for reco
  - Saw slight differences in the extreme tail for Intel vs AMD systems
  - We optimize everything, so curious to know what the cost is in flops/$
  - Eric: OpenPOWER systems are cheaper and faster by far compared to standard high-end systems
  - The old paradigm of IBM systems being more expensive is dead; the new model is open and cheaper
  - By killing a farm of Intel machines and replacing it with POWER, one site got 66% less floor usage and a more compact footprint
- Question (Sergei): this is more of an academic group; is there some way members of this group can evaluate resources/hardware without buying?
  - Especially working with the software
  - Eric: you can test directly with power.jarvice.com (buy time on an open cloud)
  - Sergei: fantastic that you can rent it, but is it possible to try/trial it without paying?
  - Lionel: this is coming, IBM is building a system in Europe that people can use to play with
  - Eric: contact me for a link to a jupyter notebook
Daniel Smith: Generative adversarial networks in Liquid Argon
- Project is to develop methods to make MC-trained networks perform as well on data as they have already been shown to perform on MC
- Several examples shown of MC classification performance demonstrating track-like and EM-like separation
- Test-beam experiment, so we know what particles are entering the TPC in data and can validate the classification
- Ran a blind MC-trained classifier on the data: track performance much worse in data than in MC
  - The LArTPC simulation software is new, so assumptions were also made
  - Differences can affect the network in unpredictable ways
  - For example, in reality an electron will deposit charge on every wire, but in MC it was only on the two closest wires
- Proposal: a modified GAN that alters the MC set, creating a data-driven filter for MC
  - Chose an architecture which makes it very quick to train
  - The last layer of the generator is a merge layer that takes the original MC and sums the generator's output onto it (see the sketch after this section)
  - Done so the generator never gets too far away from the MC being passed in
  - Problems encountered and their causes are listed on the slides
- With the filtered MC, the network's performance on data is much closer to what is seen in MC samples
- The GAN is proving very effective at filtering MC to be more similar to data
- Need to optimize the training and explore different applications for the filter
- Question (Fernanda): in this process of re-filtering MC and re-training, what have you learned about other shortcomings of MC?
  - Daniel: so far, all the big things I ran into are in this presentation
  - Issues left out relate to training the GAN (stability, etc)
  - Haven't found anything groundbreaking about differences between MC and data yet
  - The big goal is to figure out what the differences are, just don't have anything yet
- Question (Eric): did you run this on your laptop?
  - Daniel: no, ran it on a CERN computer, not sure what type
  - Eric: data size/training times?
  - Daniel: on the order of 100 GB to make the sample, about 2 million patches
  - Training is about 30 seconds per epoch, so it takes ~3 hours to train
  - Designed the network to be intentionally small so training doesn't take too long (can try a few during a day)
- Question (Room): when you have a classification of showers and tracks, what do you do with that info?
  - Daniel: in the reco process of LArTPCs, we do hit finding, then track finding, and try to build up from there
  - Right now the process involves blindly following hits in the event, then blindly trying to connect them to make tracks
  - So right now it will use hits within showers, which is a big waste of time
  - This classifier would find groups of hits in showers and remove them from the track finding
  - This would go into the reco chain and be a very helpful way of labelling showers before the next reco steps
- Question (Michael): how do you know data is track- or shower-like, do you do it by eye?
  - Daniel: had to eye-scan, as the methods to do the separation don't exist yet
  - Michael: is it possible to train on the data directly?
  - Daniel: yes, but it wouldn't be as much fun, and we want these systems ready/available when production starts (so we don't need to collect data first)
  - Robert Sulej: in this test-beam data we can separate easily, but in full data it's a complex mix, so we couldn't easily train with that
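A minimal sketch of the residual-generator idea described above: the generator's final merge layer adds its learned correction back onto the original MC patch, so the filtered MC can never drift far from its input. The patch size, layer widths, activations, and optimizer are assumptions, not the actual network from the talk:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Add, Dense, Flatten, Reshape

PATCH = (16, 16)  # hypothetical wire-vs-time patch size

def build_generator():
    mc = Input(shape=PATCH)
    x = Dense(128, activation="relu")(Flatten()(mc))
    delta = Reshape(PATCH)(Dense(PATCH[0] * PATCH[1], activation="tanh")(x))
    # Merge layer: output = original MC + learned correction
    return Model(mc, Add()([mc, delta]), name="generator")

def build_discriminator():
    patch = Input(shape=PATCH)
    x = Dense(128, activation="relu")(Flatten()(patch))
    # 1 = real data patch, 0 = filtered MC patch
    return Model(patch, Dense(1, activation="sigmoid")(x), name="discriminator")

gen, disc = build_generator(), build_discriminator()
disc.compile(optimizer="adam", loss="binary_crossentropy")

# Train the generator through a frozen discriminator, as in a standard GAN
disc.trainable = False
mc_in = Input(shape=PATCH)
gan = Model(mc_in, disc(gen(mc_in)))
gan.compile(optimizer="adam", loss="binary_crossentropy")
```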
Andrew Lowe: Language-agnostic data analysis workflows and reproducible research
- When doing ML, there are lots of questions about which language/toolkit is best
  - Be a polyglot
- Lots of code examples in the slides of translating between languages
- Some notebooks/etc support using code chunks in different languages
  - Store the output of one step in a format that can be read by the next step (a minimal sketch follows this section)
- Whether or not your experiment imposes restrictions on public data/etc, your colleagues will benefit from reproducible-research paradigms
- Frameworks for integrating reproducible research into articles were presented
  - Lots of examples for R Markdown in particular
- ROOT can now be used as a code execution engine (currently a proof of concept)
- Key messages:
  - It's already possible to write a reproducible analysis in your favourite languages
  - Can mix and match programming languages
  - There are ways to exchange data between code chunks of different languages
  - Can embed ROOT code in a reproducible analysis
- Question (Michele): how do you version control with this?
  - Andrew: the advantage of R Markdown is that it's plain text; the disadvantage is that you need to render it to see plots/figures
  - This means it plays better with version control than jupyter/similar notebooks
  - Whatever your ML framework is, you can mix and match
  - If you want to do something in pieces, or compare different options, you can do that without breaking out of the existing workflow
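A minimal sketch of the "store output of one step in a format the next step can read" idea: a Python step writes its result to a language-neutral CSV file that a later chunk in any language can pick up. The file names and the jet_pt column are hypothetical:

```python
import pandas as pd

# Read the previous step's output, whatever language produced it
events = pd.read_csv("step1_events.csv")

# This step's work: a simple selection
selected = events[events["jet_pt"] > 30.0]

# Write the result where the next step (R, Julia, ROOT, ...) can read it;
# e.g. in R the next chunk could simply call read.csv("step2_selected.csv")
selected.to_csv("step2_selected.csv", index=False)
```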
Ford Garberson: Using machine learning to screen for autism and other childhood ailments
- Previously on the ATLAS experiment; left a few years ago for data science
- A few years later, had a child and began thinking about the future
- Autism affects 2-3% of children, and can be very severe or mild
  - For children of scientists, the rate can be 2 or 3 times higher than this
- How can we identify autism as early as possible, so treatment can start early?
  - A skilled specialist can diagnose at ~1.5 years old
  - Usually diagnosed much later due to limited access to specialists, especially in rural areas
- Goal is to create an app backed by ML to quickly identify those at highest risk
- Two flagship instruments for autism identification:
  - ADI-R: 93 multiple-choice questions, filled in by an expert clinician after a few hours with the parent
  - ADOS: ~30 multiple-choice questions, filled in by an expert after a highly standardized ~1-hour session with the child
- Can we capture some of the benefits of these instruments in an app?
  - Have the app ask simplified questions directly
  - Ask the parent to record their child for ~1 minute and upload the video
- Where ML comes in: for example, identifying the most important questions and how relevant each one is
- However, ML doesn't perform as well in real life; some confounding factors:
  - Parents may not understand questions or may give biased responses
  - Video questionnaire watchers are not as trained as clinicians, and the video is short and often doesn't show symptoms clearly
- Other competitors exist, but this is the first group trying ML, and it's doing the best
  - Even larger separation from the competition if we allow for an "inconclusive" classification of up to 25%
- Lots of other potential tools to add:
  - ML on photos of the child's expressions
  - ML on audio of the child's voice
  - ML on the motion of the child's movements
  - ML on data about how the child plays tablet games
- While ~2% of kids have autism, ~13% have some diagnosable psychological condition, so this could expand to other conditions
- Small team, two data scientists so far
  - Hiring a data scientist now, aiming at more senior people
  - If the position is filled, still contact Ford; they intend to open more positions in the future
- Can try the app: https://cognoa.com/parents/dtc
- Question (Steven): what types of ML tools/etc do you use?
  - Ford: haven't tried anything too fancy so far
  - The competition is just summing the number of questions with a given answer
  - Getting a real training event costs thousands of dollars, so we have to extract as much info as possible from limited data
  - Not too much benefit in going to deep learning or similar
  - Looking at random forests and BDTs at the moment (see the sketch after this section)
  - Much larger dataset now (hundreds of thousands of people have used the app)
  - However, we don't usually know whether the app user ends up having autism or not
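A minimal sketch of the approach outlined in the answer above: a random forest over multiple-choice questionnaire answers, with feature importances used to rank the most informative questions. The data here is random placeholder data, and the shapes and labels are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(500, 30))  # 500 children x 30 multiple-choice answers
y = rng.integers(0, 2, size=500)        # placeholder labels: 1 = diagnosed, 0 = not

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank questions by how much they contribute to the classifier's decisions
ranking = np.argsort(clf.feature_importances_)[::-1]
print("Most informative questions:", ranking[:5])
```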