# Fast Machine Learning for Science Workshop

Timezone: America/Chicago
Southern Methodist University

Description

We are pleased to announce a four-day event, "Fast Machine Learning for Science", which will be hosted virtually by Southern Methodist University from November 30 to December 3, 2020. The first three days (Nov 30 - Dec 2) will be workshop-style, with invited and contributed talks. The last day will be dedicated to technical demonstrations and coding tutorials.

As advances in experimental methods produce ever-larger datasets and higher-resolution, more complex measurements, machine learning (ML) is rapidly becoming a major tool for analyzing complex datasets across many disciplines. Following the rapid rise of ML through deep learning algorithms, the investigation of processing technologies and strategies to accelerate deep learning and inference is well underway. We envision that this will enable a revolution in experimental design and data processing as part of the scientific method, greatly accelerating discovery. This workshop is aimed at current and emerging methods and scientific applications for deep learning and inference acceleration, including novel methods of efficient ML algorithm design, ultrafast on-detector inference and real-time systems, acceleration as-a-service, hardware platforms, coprocessor technologies, distributed learning, and hyperparameter optimization.

Abstract submission deadline: October 30, 2020 3:59PM CDT

Organizing Committee:
Allison Deiana (Southern Methodist University)
Rohin Narayan (Southern Methodist University)
Thomas Coan (Southern Methodist University)
Elizabeth Fielding (Southern Methodist University)

Scientific Committee:
Javier Duarte (UCSD)
Phil Harris (MIT)
Burt Holzman (Fermilab)
Scott Hauck (U. Washington)
Shih-Chieh Hsu (U. Washington)
Sergo Jindariani (Fermilab)
Mia Liu (Purdue University)
Allison McCarn Deiana (Southern Methodist University)
Mark Neubauer (U. Illinois Urbana-Champaign)
Maurizio Pierini (CERN)
Nhan Tran (Fermilab)

Participants
• A J Meir
• Aaron Bundock
• Abdul Khan
• Abinash Medhi
• Achyut Khanal
• Aidar Ilyasov
• Alan Taylor
• ALBERT ROSSI
• Alec Gunny
• Alex Gekow
• Alex Kish
• Alex Preston
• Alexander Chkodrov
• Alexander Josef Grohsjean
• Alexandre Sousa
• Alexandros Trasias
• Alexey Grobov
• Alexia Natsi
• Alexx Perloff
• Aliaa Seleem
• Alibek Kaliyev
• Alice Campani
• Allen Ansari
• Allison Deiana
• Alpana Alpana
• Amir Gholami
• Amy Tee
• Anatoliy Martynyuk
• andre c
• Andre Sznajder
• Andreas Augoustis
• Andrej Lozar
• Andrej Seljak
• Andres Felipe Quintero Parra
• Andres Meza
• Andres Ramirez Morales
• Andrew Edmonds
• Andrew Mejia
• Andrew Mogan
• Andrew Reis
• Andrew Zheng
• Anil Sonay
• Ankit Rai
• Anna Landa
• Anne-Sophie Berthold
• Anthony Tanaydin
• Antonio Giannini
• Antonio Gioiosa
• Antonio Gomez
• Anup Kumar Sikdar
• Anuruddha Bhattacharjee
• Arno Straessner
• ARTUR KALINOWSKI
• Artur Lobanov
• Artur Trofymov
• Ashish Sharma
• Ashwin Samudre
• Atanu Maulik
• Atar Singh Kushwah
• Athanasios Mattas
• Athanasios Michailoudis
• Awais Bin Zahid
• Babak Abi
• Babar Khan
• BALASUBRAMANIAM DAKSHINAMOORTHI
• Bartłomiej Borzyszkowski
• Belina von Krosigk
• Benedikt Wach
• Benjamin Hawks
• Bertrand Echenard
• Bin Huang
• biruk seyoum
• Blaž Leban
• Brett Mayes
• Bryan Foo
• Burt Holzman
• Caitlin Endler
• Carla Reyes
• Carlos Chavez Barajas
• Carsten Hensel
• Caterina Aruta
• Caterina Doglioni
• Cecilia Tosciri
• Chaitanya Paikara
• Chang-Seong Moon
• changhyun yoo
• Charalampos Tsafaridis
• Chatura Kuruppu
• CHEUK-PING WONG
• Chris Hayward
• Chris Walerczyk
• Christian Herwig
• Christina McConville
• Christof Sauer
• Christopher Hilgenberg
• Claire Antel
• Claire Savard
• Clara Cala Franco
• Cristina Ana Mantilla Suarez
• DAFNI GIANNAKERI
• Dana Douqa
• Daniel Guerrero
• Daniel Joseph Antrim
• Daniel Reynolds
• Daniele Bonacorsi
• Danny Noonan
• Darren Price
• David Miller
• David Neuffer
• David Rodriguez
• David Rousseau
• David Yu
• Debajyoti Sengupta
• Dedeepya Chinnam
• Deep Chatterjee
• Dejan Golubovic
• Dennis Schaart
• Dhanush Anil Hangal
• Dhiraj Kalita
• Diana Kafkes
• Diana Patricia Mendez
• Diego Figueiredo
• Diego Stalder
• DIMITRA BALLA
• Dimitri Bourilkov
• Dimitris Proios
• Dinesh Bhatia
• Diogo Guerra
• Dirk Krucker
• Dmitri Strukov
• Dmitry Litvintsev
• Domenico Giordano
• Don Athula Wickremasinghe
• Dr T V RAJINI KANTH
• Dritan Kodra
• Duc Hoang
• Durdana Balakishiyeva
• Dustin Bracy
• Dustin Werran
• Dylan Sheldon Rankin
• Ebony Postell
• Edith Karina Aquino Cantero
• Edoardo Gorini
• Efe Yazgan
• Egidijus Kukstas
• Eleftheria-Pigi Miliou
• Eleni Christidou
• Elham E Khoda
• Eliana Gianfelice
• Elijah Cruda
• Elizabeth Berzin
• Elizabeth Sexton-Kennedy
• Elliot Parrish
• Elvira Rossi
• Ema Puljak
• Emanuele Usai
• Emily Filmer
• Emmanouil Vourliotis
• Eric Church
• Eric Godat
• Eric Guzman
• Erik Katsavounidis
• Esra Barlas-Yucel
• Ethan Marx
• Evangelos Kourlitis
• Evgenia Vogiatzi
• Fanyi Zhao
• Farah Fahim
• Farouk Mokhtar
• Federica Cuna
• Ferdinand Schenck
• Fernando Carrio Argos
• Filitsa Kougioumtzoglou
• Finn Jonathan Labe
• Florian Bury
• FNU Vinsensius
• Francesca Del Corso
• Francesco Cirotto
• Francesco Conventi
• Francesco Fiori
• Francis Ho
• Francisco Javier Perez Gomez
• Fred Olness
• Gabriel Santucci
• Gabriele Benelli
• Gabriele Sirri
• Gage DeZoort
• Garyfallia Paspalaki
• Geoff Hassall
• Georgios Karathanasis
• Georgios-Fotios Angelis
• Giacomo Boldrini
• Giacomo Scanavini
• Gianantonio Pezzullo
• Giles Chatham Strong
• Gino Marceca
• Giorgos Symeonidis
• Giovanni Franzoni
• Giuseppe Di Guglielmo
• GKOURAM MIRZAEV
• Goutham Makkena
• Gregor Kasieczka
• Gregori Rigakis
• Guillaume Quétant
• Hamza Javed
• Hannah Mejia
• Hanno Meyer zu Theenhausen
• Haosen Tan
• Harsh Purwar
• Hasan Mohamed
• Heather Gray
• Helenka Casler
• Heng-Ye Liao
• Heraclitos Lefcochilos-Fogelquist
• Hernan Andres Morales Navarrete
• Hichem Bouchamaoui
• Hollie Gardner
• Horacio Crotte Ledesma
• Huilin Qu
• Iacopo Longarini
• Idowu Itiola
• Ilaria Risso
• Ioanna Elissavet Chatziioannou
• Ioanna Papavergou
• Ishwar Singh
• Israel Kurtz
• J-C Chiao
• Jacopo Pazzini
• Jaggar Henzerling
• Jaiden Parlone
• James Kahn
• Jamie Dyer
• Jan Tuzlic Offermann
• Jannicke Pearkes
• Jasmine Liu
• Javier Mauricio Duarte
• Jayalakshmi Jain
• Jaydip Singh
• Jaymie Ruddock
• Jean-Roch Vlimant
• Jeffrey Krupa
• Jeremy Hewes
• Jessica Esquivel
• Jianhui Wang
• Jiapeng Liu
• Jie Xiao
• Jieun Hong
• Jing-Ge Shiu
• Jinwoo Kim
• Jiwoong Kim
• Joakim Olsson
• Johann Christoph Voigt
• Johannes Haller
• john emmett
• John Partee
• Jonathan Eisch
• Jonathan Lindbloom
• Jordan McElwee
• Jorge Mendez
• Josh McFayden
• Joshua Agar
• Joshua Mills
• Joshua Qualls
• Joshua Yao-Yu Lin
• Jovan Mitrevski
• Joy Shi
• Joydeep Chatterjee
• João Victor Da Fonseca Pinto
• Juan Cepeda
• Julia Layne
• Julia Lynne Gonski
• Julia Vazquez
• Julian Höfer
• Junghwan Goh
• Justin Selig
• Jyothisraj Johnson
• Kalliopi Spanidou
• Kanik Palodhi
• Kapil Sareen
• Karl Warburton
• Kate Scholberg
• Katharina Dort
• Katie Mason
• Katya Govorkova
• Keegan Harrig
• Kenneth Yamamoto
• Kevin Pedro
• Khawla Jaffel
• Kin Ho Lo
• Konstantinos Foutzopoulos
• Kris Ghimire
• Kristin Marie Dona
• Kristina Jaruskova
• Kyle Hazelwood
• Kyle Kolodziej
• Kyun Woo Hong
• Laura Lazarescou
• Lauri Laatu
• Lea Di Noto
• Lei Zhang
• Leonardo Cristella
• Leonid Didukh
• Liiana Teodsorescu
• Liz Kneale
• Lorenzo Moneta
• Louis-Guillaume Gagnon
• Louise Skinnari
• Luana Parsons Franca
• Luca Giommi
• Luca Lavezzo
• Luis Mora Lepin
• Maha Bilal
• Mahima Garg
• Makoto Uchida
• Malgorzata Makos
• Manish Kumar
• Manoj Pandey
• Manolis Kargiantoulakis
• Manuel Jesus Rodriguez Alonso
• Marcela Garcia Hernandez
• Marcin Paluch
• Marcin Zielinski
• Marco Lorusso
• Marco Pagani
• Maria Antonova
• Maria Domenica Galati
• Maria Kalafatidou
• Maria Moreno Llacer
• Maria Silva
• Mariano Dominguez
• Mark Dimitsas
• Mark Neubauer
• Markus Diefenthaler
• Markus Helbig
• Markus Julian Atkinson
• Marta Felcini
• Martin Beyer
• Martin Harrison
• Masako Iwasaki
• Matteo Cremonesi
• Matteo Migliorini
• Matteo Rossi
• Matthew Bellis
• Matthias Schroeder
• Mauro Donega
• Maximilian Graf
• Meghna Bhattacharya
• Mehdi Nikfar
• Mel Schwan
• Miaoyuan Liu
• Michael Churchill
• MICHAEL MILES
• Michael Zhao
• Michaela Blott
• Michele Faucci Giannelli
• Michelle Ntampaka
• Min Liu
• Ming-Feng Ho
• Ming-Feng Tu
• MING-HSING CHIU
• Mingshi Ji
• Mohamed Elashri
• Mohammed Mahmoud Mohammed
• Moises Garcia
• Morgan Chen
• Muge Karagoz
• Mukharbek Organokov
• Munira Khan
• Mykola Khandoga
• N Sushree Ipsita
• Nabin Poudyal
• Naif Tarafdar
• Nam Tran
• Nanxi Yao
• Narcisa Guran
• Nello Bruscino
• Nemer Chiedde
• Nhan Tran
• Nicholas Kinnaird
• Nicholas Luongo
• Nick Fritzsche
• Nick Manganelli
• Nicola Fulvio Calabria
• Nikolaos Rompotis
• Nikolas Cruz Camacho
• Nilotpal Kakati
• Nisha Kurkure
• Oksu Seon
• Oleg Filatov
• Olga Taran
• Olivia Weng
• omri avrahami
• P White
• Panagiotis Lingos
• Panos Stamoulis
• Pantelis Kontaxakis
• Paolo Calafiura
• Patricia Rebello Teles
• Paul Feichtinger
• Paul Hientz
• Pavani Palla
• Pawel Klimek
• Peilong Wang
• Peter Moore
• Petya Vasileva
• Philip Coleman Harris
• Pietro Vischia
• Pooja Maurya
• Prabhjot Singh
• Prachi Sharma
• Predrag Milenovic
• Preeti Rajpoot
• Priyanka Asnani
• Quoc Trung Ho
• Raghav Kansal
• Rahmat Rahmat
• RAJEEV KUMAR
• Rajesh Tamang
• Ricardo Rocha
• Ricardo Vilalta
• Richard Guarino
• Richard Weiss
• Rikel Djoko
• Rishabh Uniyal
• Robert John Bainbridge
• Robert Kalescky
• Robert Kralik
• Robert Ortega
• Roger Proksch
• Rohin Thampilali Narayan
• Roman Kogler
• Roya Jafari
• Rui Shi
• Rui Zhang
• Rupamoy Bhattacharyya
• Ryan Allen Rivera
• Ryan Forelli
• Ryan Hooper
• Sachinthya Wagaarachchi
• Saichand Varanasi
• Salvatore Danzeca
• Sam Foreman
• Sam Jenkins
• Sandeep Garg
• Sang Eon Park
• SangEun Lee
• Sanmay Ganguly
• Santosh Parajuli
• Sara Morales Vigo
• Savannah Jennifer Thais
• Scarlet Rachel Norberg
• Scott Hauck
• Scott Perkins
• Sean Hughes
• Sebastiano Raiz
• Seema Sharma
• Sehoon Kim
• Selina Dhinsey
• Semen Lebedev
• Sergey Furletov
• Sergio Di Domizio
• Sergio Garza
• Sergo Jindariani
• Sheila Silva Do Amaral
• Shih-Chieh Hsu
• Shih-Kai Lin
• Shihua Huang
• Shreya Saha
• Shuaiyan Kang
• Shubhangi Krishan Maurya
• Shukui Zhang
• Sid Swarupananda
• Silvia Auricchio
• Sioni Paris Summers
• Sitong An
• Sofia Murillo Sanchez
• Soniya Samani
• Sotirios Tsongas
• Sougata Sen
• Srishti Nagu
• Stanislava Sevova
• Starr Corbin
• Stella Wermuth
• Stephanie Majewski
• Stephen Arrowsmith
• Stephen Robertson
• Steven Walton
• Sudeshna Ganguly
• Sudha Ahuja
• sungwon KIM
• suravinda Kospalage
• Tao WU
• Thabang Lebese
• THEODOROS MANOUSIS
• Thomas Calvet
• Thomas Coan
• Thomas Klijnsma
• Thomas Kutter
• Thomas Primidis
• Thotho Mabiku
• Tianqiao Zhao
• Tim Greenshaw
• Tobi Delbruck
• Tobias Golling
• Tobias Welti
• Tom Cheng
• Tomasz Wachala
• Tomislav Seva
• Tommaso Diotalevi
• Tomás Müller
• Torben Ferber
• Trung-Hieu Tran
• Truong Nguyen
• Umberto Tamponi
• Valentina Cicero
• valerio pascucci
• Vesal Razavimaleki
• Victor Goicoechea
• VINCENT TOGO
• Vincenzo Cacchio
• Vishal Bhardwaj
• Wei Mu
• Wei Tong
• Wesley Ketchum
• William Hinton
• William Tang
• Winnie Houng
• Xiangwen Shang
• Xiaohu Sun
• Xin Zheng
• Xinyu Liu
• Xinyun Lu
• Xuan Li
• Xueyuan Li
• YALING LIU
• Yanxiao Han
• Yavar Taheri Yeganeh
• Yeon-Jae Jwa
• Yihui Lai
• Yongbin Feng
• Yoshinari Hayato
• You Lin
• Yuan-Ru Lin
• Yulia Furletova
• Yunlong Fan
• Yuying Guo
• Yıldıray Kömürcü
• Zachary Ho
• Zachary Shelton
• Zhen Dong
• Zhenbin Wu
• Zhiqiang Que
• Zhongbo Kang
• Zuhal Seyma Demiroglu
• Ömer Özak
Timetable
• Monday, 30 November
• 10:00 AM 10:05 AM
Welcome and Orientation 5m
Speaker: Allison McCarn Deiana (Southern Methodist University (US))
• 10:05 AM 10:10 AM
Welcome from SMU 5m
Speaker: Dr Karisa Cloward (Southern Methodist University)
• 10:10 AM 10:15 AM
Overview of Workshop 5m
Speaker: Nhan Tran (Fermi National Accelerator Lab. (US))
• 11:00 AM 11:15 AM
Coffee Break 15m
• 11:15 AM 11:45 AM
Efficient Neural Network Training and Inference 30m
Speaker: Amir Gholami
• 11:55 AM 12:55 PM
Lunch 1h
• 12:55 PM 1:25 PM
Imaging: Electron Microscopy 30m
Speaker: Josh Agar (Lehigh University)
• 1:35 PM 2:05 PM
• 2:15 PM 2:30 PM
Coffee Break 15m
• 2:30 PM 2:38 PM
Quantifying DNA Damage in Comet Assay images using Neural Networks 8m

Proton therapy for cancer treatment is a rapidly growing field, and increasing evidence suggests that it induces more complex DNA damage than photon therapy. An accurate comparison of the two treatments requires quantifying the damage caused, and one method for doing so is the comet assay. The program outlined here is based on a neural network architecture and aims to speed up the analysis of comet assay images and to provide an accurate, quantified assessment of the DNA damage levels apparent in them.

The comet assay is an established technique in which DNA fragments are spread out under the influence of an electric field, producing a comet-like object. The elongation and intensity of the comet tail (consisting of DNA fragments) indicate the level of damage incurred. Many methods, using a variety of algorithms, exist to measure this damage. These can be time-consuming, so often only a small fraction of the comets in an image are analysed. The automatic analysis presented here aims to improve on this.
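The tail metrics described above can be computed once a comet has been segmented. The following is an illustrative sketch and not the authors' program. It takes a background-subtracted 1D intensity profile along the comet axis, where a hypothetical `head_end` index marks the end of the head. It returns three common comet assay metrics: the percentage of DNA in the tail, the tail length, and the extent tail moment (tail length times the fraction of DNA in the tail). The pixel calibration is also an assumption.

```python
def comet_metrics(profile, head_end, pixel_size_um=1.0):
    """Return (percent_tail_dna, tail_length_um, tail_moment) for one comet.

    profile: background-subtracted intensities along the comet axis,
             ordered from the head towards the tail
    head_end: hypothetical index of the first tail pixel
    pixel_size_um: assumed calibration, physical size of one pixel
    """
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive intensity")
    tail = profile[head_end:]
    percent_tail = 100.0 * sum(tail) / total
    # Tail length: from the head boundary to the last tail pixel with signal.
    lit = [i for i, value in enumerate(tail) if value > 0]
    tail_length = (lit[-1] + 1) * pixel_size_um if lit else 0.0
    # Extent tail moment: tail length weighted by the fraction of DNA in the tail.
    tail_moment = tail_length * percent_tail / 100.0
    return percent_tail, tail_length, tail_moment


pct, length, moment = comet_metrics([10, 10, 5, 3, 2, 0], head_end=2)
print(f"{pct:.1f}% tail DNA, tail length {length}, tail moment {moment:.2f}")
```

In a full pipeline, the profile would be taken from the pixels inside each detected comet's mask rather than typed in by hand.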

Object detection and localisation, implemented with a Mask R-CNN neural network, are used to perform instance segmentation of the comets. The identified comet instances are then saved as masks which, when overlaid onto the original image, provide the pixel coordinates of the identified comets. The model has achieved a minimum accuracy of 90% in identifying comets in an image. It was trained via transfer learning from a model pretrained on Microsoft's COCO dataset, which contains over 200,000 labelled images. This significantly reduced both the training time and the number of images required for training (fewer than 70 images were used here).
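The mask-to-coordinates step can be shown without any deep-learning library. This sketch assumes each predicted instance mask is available as a 2D list of 0/1 values. Mask R-CNN implementations typically output per-pixel scores, which must first be thresholded. The sketch extracts a comet's pixel coordinates and bounding box. It also computes intersection-over-union (IoU) against a reference mask, a common criterion for deciding whether a detection is correct. The abstract does not state how the 90% figure was defined, so IoU is only an illustrative choice here.

```python
def mask_pixels(mask):
    """Return (row, col) coordinates of every nonzero pixel in a 2D binary mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def bounding_box(coords):
    """Return (min_row, min_col, max_row, max_col) enclosing the given pixels."""
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def mask_iou(a, b):
    """Intersection-over-union of two same-shaped binary masks."""
    pa, pb = set(mask_pixels(a)), set(mask_pixels(b))
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


predicted = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]
reference = [[0, 1, 0],
             [0, 1, 0],
             [0, 1, 0]]
print(bounding_box(mask_pixels(predicted)))  # comet location in pixel coordinates
print(mask_iou(predicted, reference))
```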

To supplement the training and testing of the network, a Monte Carlo model is being developed to create simulated comet assay images.

Speaker: Selina Dhinsey
• 2:40 PM 2:46 PM
Autoencoders for anomaly detection in real-time at the LHC 6m

At the LHC, data are collected at 40 MHz, but only 1 kHz of data can be stored for physics studies. A typical LHC experiment therefore operates a real-time selection system that must decide whether each event should be stored or discarded. The first stage of this system, the Level-1 (L1) trigger, runs on custom electronics boards equipped with FPGAs, and an L1 algorithm must operate within O(1 μs) latency. In this system, we aim to run an unsupervised algorithm designed to identify outliers, which could highlight the occurrence of new phenomena in LHC collisions. To this end, we design an autoencoder that processes particle four-momenta, and we use hls4ml to deploy the model on an FPGA and evaluate its resource consumption and latency in various configurations.

Speaker: Katya Govorkova (CERN)
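The following minimal sketch shows how an autoencoder's output could be turned into an anomaly decision. It assumes two things the abstract does not specify, although both are common choices. First, the anomaly score is the mean squared reconstruction error. Second, the threshold is set so that only a chosen fraction of events is flagged. The function names are hypothetical, and the autoencoder itself is not shown.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def threshold_for_rate(scores, input_rate_hz, target_rate_hz):
    """Choose a threshold so that about target/input of the events score >= it."""
    keep_fraction = target_rate_hz / input_rate_hz
    ranked = sorted(scores, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[n_keep - 1]


# Toy scores for 1000 events; in practice these come from the autoencoder.
scores = [float(i) for i in range(1000)]
cut = threshold_for_rate(scores, input_rate_hz=1000.0, target_rate_hz=10.0)
flagged = sum(1 for s in scores if s >= cut)
print(cut, flagged)
```

With the 40 MHz input and 1 kHz budget quoted above, the kept fraction would be 1/40,000. On a toy sample of 1000 events, that fraction rounds down to keeping only the single highest score.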
• 2:48 PM 2:56 PM
Design of a reconfigurable autoencoder algorithm for detector front-end ASICs 8m

The next generation of particle detectors will feature unprecedented readout rates and will require optimized lossy data compression and transmission from front-end application-specific integrated circuits (ASICs) to the off-detector trigger processing logic. Typically, channel aggregation and thresholding are applied, removing information useful for particle reconstruction in the process. A new approach to this challenge is to embed machine learning (ML) algorithms directly in ASICs on the detector front-end, allowing intelligent data compression before transmission. We present an algorithm optimized for the High-Granularity Calorimeter (HGCal) endcap to be installed in the CMS experiment for the high-luminosity upgrade of the Large Hadron Collider. We trained a neural-network (NN) autoencoder to achieve optimal compression fidelity for physics reconstruction while respecting hardware constraints on internal parameter precision, computational (circuit) complexity, and area footprint. The autoencoder improves over non-ML algorithms in reconstructing low-energy signals in high-occupancy environments. Quantization-aware training is performed using QKeras, and the network is implemented in RTL using the hls4ml compiler tool. Finally, we discuss the flexibility of our solution, in which sensors may be individually tuned to optimize performance across the full detector and over the range of expected run conditions during the detector's lifetime.

Speaker: Giuseppe Di Guglielmo (Columbia University)
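The following sketch makes the precision constraint concrete. It maps a single value onto a signed fixed-point grid with W total bits and I integer bits, using round-to-nearest with saturation. This is in the spirit of QKeras quantizers and the `ap_fixed<W, I>` types used in hls4ml designs, but it is not either library's implementation. QKeras's `quantized_bits` parameterizes integer bits differently. `ap_fixed` defaults to truncation and wrap-around rather than rounding and saturation.

```python
def quantize_fixed(x, total_bits, int_bits):
    """Round x onto a signed fixed-point grid, saturating at the range limits.

    total_bits: word width W, including the sign bit
    int_bits: integer bits I, including the sign bit (ap_fixed<W, I> convention)
    """
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits                # value of one least-significant bit
    lo = -(2.0 ** (int_bits - 1))           # most negative representable value
    hi = 2.0 ** (int_bits - 1) - step       # most positive representable value
    q = round(x / step) * step              # Python's round() sends ties to even
    return min(max(q, lo), hi)


weights = [0.3, -0.71, 5.0, -5.0]
print([quantize_fixed(w, total_bits=8, int_bits=2) for w in weights])
```

Quantization-aware training applies this kind of rounding in the forward pass during training, so the network learns weights that survive the reduced precision.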
• 2:58 PM 3:04 PM
Large and compressed Convolutional Neural Networks on FPGAs with hls4ml 6m

We present ultra-low-latency deep neural networks with large convolutional layers on FPGAs using the hls4ml library. Taking benchmark models trained on public datasets, we discuss various options for reducing the model size and, consequently, the FPGA resource consumption: pruning, quantization to fixed-point precision, and extreme quantization down to binary or ternary precision. We demonstrate how inference latencies of O(10) μs can be obtained while maintaining high accuracy.
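The three size-reduction options listed above can be illustrated on a flat list of weights. This is a post-training sketch for illustration only. It uses simple magnitude pruning, sign-based binarization, and fixed-threshold ternarization. In practice, such constraints are usually applied during training to preserve accuracy. The sparsity and threshold values here are arbitrary assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def binarize(weights):
    """Map each weight to +1 or -1 by its sign (zero maps to +1)."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def ternarize(weights, threshold):
    """Map each weight to -1, 0 or +1, zeroing those within +/- threshold."""
    return [0.0 if abs(w) <= threshold else (1.0 if w > 0 else -1.0) for w in weights]


w = [0.5, -0.05, 0.2, -0.8, 0.01]
print(magnitude_prune(w, sparsity=0.4))
print(binarize(w))
print(ternarize(w, threshold=0.1))
```

With binary or ternary weights, each multiplication reduces to an addition, a subtraction, or a no-op. This is what makes extreme quantization so cheap on an FPGA.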

• 3:06 PM 3:12 PM
Convolutional Neural Network Fast Inference Deployment on FPGAs 6m

From self-driving cars to particle physics, convolutional neural networks (CNNs) have many applications. To greatly decrease inference latency, CNNs and other deep learning architectures can be deployed to hardware in the form of Field Programmable Gate Arrays (FPGAs). The open-source package hls4ml is used to perform model conversion and RTL synthesis. The work presented here describes methods by which the generated Verilog/VHDL can be optimized further to reduce latency and hardware resource requirements.

Speaker: Andrew Harmon Reis (Southern Methodist University (US))
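On an FPGA, latency and resource usage are driven largely by how many multiply-accumulate (MAC) operations a layer needs. As a back-of-the-envelope aid, the following sketch counts MACs for a 2D convolution layer. This estimate is an assumption of this note, not a method from the talk. Actual DSP and logic usage also depends on numerical precision, the degree of parallelization, and synthesis optimizations.

```python
def conv2d_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a 2D convolution along one axis."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv2d_macs(in_h, in_w, in_ch, out_ch, kernel, stride=1, padding=0):
    """Multiply-accumulate operations for one square-kernel 2D convolution layer."""
    out_h = conv2d_output_size(in_h, kernel, stride, padding)
    out_w = conv2d_output_size(in_w, kernel, stride, padding)
    # Each output value needs kernel * kernel * in_ch multiply-accumulates.
    return out_h * out_w * out_ch * kernel * kernel * in_ch


print(conv2d_macs(32, 32, 3, 16, kernel=3))             # 'valid' padding
print(conv2d_macs(32, 32, 3, 16, kernel=3, padding=1))  # 'same' padding
```

Bias additions are ignored here; they add one operation per output value.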
• 3:14 PM 3:20 PM
Convolutional Neural Networks for real-time processing of ATLAS Liquid-Argon Calorimeter signals with FPGAs 6m

Physicists use the Large Hadron Collider (LHC) at CERN in Geneva to produce proton-proton (pp) collisions in order to study rare particle-physics processes at high energies. For the Phase-II upgrade, the LHC and its particle detectors will be prepared for high-luminosity operation, starting in 2027. One challenge is the high level of signal pile-up caused by up to 200 simultaneous pp collisions. Moreover, in the ATLAS Liquid-Argon (LAr) Calorimeters, the signals of up to 25 subsequent collisions overlap, which makes it even more difficult to reconstruct the energy deposited in the detector.

To cope with this, the readout electronics of the ATLAS LAr Calorimeters will be upgraded to allow real-time processing of the full sequence of digitized pulses sampled at 40 MHz. Conventional signal processing applies an optimal filter to reconstruct the energy of detector hits. However, the high level of pile-up and a new trigger scheme require a more advanced signal reconstruction method.

We have developed a dilated convolutional neural network (CNN) that improves the efficiency of identifying significant energy deposits above a given noise threshold and reduces the number of incorrectly identified hits compared with an optimal filter. Since the CNN is intended to run on a Field Programmable Gate Array (FPGA), the number of parameters and mathematical operations is kept under tight control. A second network aims to reconstruct the hit energy, using the output of the hit-identification network. The CNN training data are generated by a dedicated simulation program, AREUS, which provides realistic signal sequences including all noise sources.
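In a dilated convolution, the kernel taps are spaced `d` samples apart. This lets a few layers cover a long stretch of the 40 MHz pulse train (25 ns per sample) with few parameters. The following sketch is a generic illustration. The abstract does not give the actual network's kernel sizes, dilations, or weights, so the values here are hypothetical. The kernel used below is a toy difference filter.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Causal 1D dilated convolution over a sampled pulse sequence ('valid' output).

    Output sample t combines signal[t], signal[t - d], signal[t - 2d], ...,
    so each output depends on (len(kernel) - 1) * d + 1 consecutive samples.
    """
    span = (len(kernel) - 1) * dilation
    out = []
    for t in range(span, len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            acc += weight * signal[t - k * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_sizes, dilations):
    """Samples seen by one output of a stack of dilated convolution layers."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dilated_conv1d(samples, kernel=[1.0, -1.0], dilation=2))
print(receptive_field([3, 3, 3], [1, 2, 4]))
```

For example, three layers with kernel size 3 and dilations 1, 2, and 4 see 15 samples, or 375 ns at 40 MHz, while using only 9 weights per channel.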

Moreover, we implemented the CNN structure in firmware in an automated way, translating the CNN training output file into VHDL targeting an Intel Stratix 10 FPGA. Linearized sigmoid activation functions are tested and compared with the full-precision calculation, and very good agreement between the FPGA and computer-based calculations is observed. We also analyzed the FPGA resource usage and the maximum frequency at which the algorithm can be executed.
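A linearized sigmoid replaces the exponential with a few straight-line segments. These segments map to cheap adders and comparators in firmware. The abstract does not specify the exact segments used. The following sketch assumes the simplest three-segment form, with slope 1/4 (the sigmoid's slope at the origin), and measures its worst-case deviation from the full-precision function.

```python
import math


def sigmoid(x):
    """Full-precision logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_sigmoid(x, slope=0.25):
    """Three-segment linear approximation: 0, a line through (0, 0.5), then 1."""
    return min(1.0, max(0.0, slope * x + 0.5))


grid = [i / 100 for i in range(-800, 801)]
max_err = max(abs(sigmoid(x) - linear_sigmoid(x)) for x in grid)
print(f"max |sigmoid - linear| on [-8, 8]: {max_err:.4f}")
```

With this choice, the largest deviation is about 0.12 and occurs where the line saturates at x = ±2. Adding more segments reduces the error at the cost of more logic.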

The presentation will summarize the latest performance results obtained with the CNN approach and the most recent prototype implementations in FPGA firmware.

Speakers: Anne-Sophie Berthold (Technische Universitaet Dresden (DE)) , Nick Fritzsche (Technische Universitaet Dresden (DE))
• 3:22 PM 3:28 PM
FastCaloGAN: a tool for fast simulation of the ATLAS calorimeter system with Generative Adversarial Networks 6m

Building on the recent success of deep learning algorithms, Generative Adversarial Networks (GANs) are used to model the response of the ATLAS calorimeter to different particle types. They simulate calorimeter showers for photons, electrons, and pions over a range of energies (between 256 MeV and 4 TeV) across the full detector η range. The properties of showers in single-particle events and of jets in dijet events are compared with full detector simulation performed by GEANT4. The good performance of FastCaloGAN demonstrates the potential of GANs to provide fast calorimeter simulation for the ATLAS experiment.

Speaker: Michele Faucci Giannelli (INFN e Universita Roma Tor Vergata (IT))
• 3:30 PM 3:36 PM
A OneAPI backend of hls4ml to speed up Neural Network inference on CPUs 6m

A recent effort to explore neural network inference on FPGAs using high-level synthesis (HLS), focusing on low-latency applications in the trigger subsystems of the LHC, resulted in a framework called hls4ml. Deep learning models converted to HLS using the hls4ml framework can be executed on CPUs, but with subpar performance. We present an extension of hls4ml using the new Intel oneAPI toolkit that converts deep learning models into high-performance Data Parallel C++ optimized for Intel x86 CPUs. We show that inference time on Intel CPUs improves by a factor of several hundred over the previous HLS-based implementation, and by several times over unmodified Keras/TensorFlow.

• 3:38 PM 3:44 PM
A Quartus backend for hls4ml: deploying low-latency Neural Networks on Intel FPGAs 6m

We describe the new Quartus backend of hls4ml, designed to deploy neural networks on Intel FPGAs. We list the supported network components and layer architectures (dense, binary/ternary, and convolutional neural networks) and evaluate performance on a benchmark problem previously used to develop the Vivado backend of hls4ml. We also introduce support for recurrent layers and a new asynchronous calling model that increases performance for larger models. Finally, we demonstrate the use of this new model to optimize large, sparse networks.

Speaker: Hamza Javed (Pakistan Institute of Engin. and (PK))
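Simple arithmetic gives a rough estimate of the benefit of sparsity on an FPGA. The following sketch uses a simplified cost model that is an assumption of this note, not a description of the Quartus backend. Under this model, each nonzero weight costs one multiplication. A reuse factor R (hls4ml exposes a `ReuseFactor` setting with this meaning) lets each physical multiplier be shared R times, trading latency for resources.

```python
import math


def dense_multipliers(weights, reuse_factor=1):
    """Rough multiplier count for a dense layer under a simplified cost model.

    Assumes one multiplication per nonzero weight (zero weights are optimized
    away) and that each physical multiplier is time-shared `reuse_factor` times.
    """
    nonzero = sum(1 for row in weights for w in row if w != 0)
    return math.ceil(nonzero / reuse_factor)


layer = [[0.5, 0.0, -0.2],
         [0.0, 0.0, 0.7],
         [0.1, 0.0, 0.0],
         [0.0, 0.3, 0.0]]
print(dense_multipliers(layer), dense_multipliers(layer, reuse_factor=4))
```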
• Tuesday, 1 December
• 10:00 AM 10:30 AM
HPC 30m
Speaker: Robert Kalescky (Southern Methodist University)
• 10:40 AM 11:10 AM
Health Sensing, Detection, and Monitoring 30m
Speakers: Nabil Alshurafa (Northwestern University) , Sougata Sen (Northwestern University)
• 11:20 AM 11:35 AM
Coffee Break 15m
• 11:35 AM 12:05 PM
Beyond CMOS 30m
Speaker: Dimitri Strukov (UCSB)
• 12:15 PM 1:15 PM
Lunch 1h
• 1:15 PM 1:45 PM
Deep Learning Acceleration of Progress in Fusion Energy Research 30m

Accelerated progress in delivering accurate predictions in science and industry has been accomplished by engaging advanced statistical methods featuring artificial intelligence, deep learning, and machine learning (AI/DL/ML). Associated techniques have enabled new avenues of data-driven discovery in key scientific application areas. One example is the quest to deliver fusion energy, identified by the 2015 CNN televised series "Moonshots for the 21st Century" as one of five prominent grand challenges facing the world today. An especially time-urgent and challenging problem in developing a fusion energy reactor is the need to reliably predict and avoid large-scale major disruptions in magnetically confined tokamak systems. Examples include the EUROFUSION Joint European Torus (JET) today and the burning-plasma ITER device in the near future. ITER is a ground-breaking $25B international burning-plasma experiment that could exceed "breakeven" fusion power by a factor of 10 or more, with "first plasma" targeted for 2026 in France. A key challenge is to deliver significantly improved prediction methods, with better than 95% predictive accuracy. These methods must provide advance warning so that disruption avoidance/mitigation strategies can be applied effectively before critical damage is done to ITER.

This presentation describes advances in the deployment of deep learning recurrent and convolutional neural networks in Princeton's deep learning code, "FRNN". These networks have enabled the rapid analysis of large, complex datasets on supercomputing systems and have accelerated progress in predicting tokamak disruptions with unprecedented accuracy and speed (Ref. Nature, April 26, 2019). This was the first adaptable predictive DL software trained on leadership-class systems to deliver accurate disruption predictions across different tokamak devices (DIII-D in the US and JET in the UK). It has the unique capability to carry out efficient "transfer learning": after training on a large database from one experiment (i.e., DIII-D), it can accurately predict disruption onset on an unseen device (i.e., JET). Moreover, the FRNN inference engine has recently been deployed in a real-time plasma control system on the DIII-D tokamak facility in San Diego, CA. This opens up exciting avenues for moving from passive disruption prediction to active real-time control, with subsequent optimization for reactor scenarios.

Speaker: Bill Tang (Princeton University)
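Simple alarm logic illustrates the step from a trained model to an actionable warning. As a simplification, the following sketch assumes that the network emits one disruptivity score per time step. It raises an alarm the first time the score crosses a fixed threshold. The warning time is then the interval between that alarm and the disruption. The function names, time units, and values are hypothetical and do not reproduce FRNN's code.

```python
def first_alarm(times_ms, scores, threshold):
    """Return the first time at which the disruptivity score exceeds the threshold."""
    for t, s in zip(times_ms, scores):
        if s > threshold:
            return t
    return None


def warning_time(times_ms, scores, threshold, disruption_time_ms):
    """Advance warning in ms before the disruption, or None if no timely alarm."""
    t_alarm = first_alarm(times_ms, scores, threshold)
    if t_alarm is None or t_alarm > disruption_time_ms:
        return None
    return disruption_time_ms - t_alarm


times = [0, 10, 20, 30, 40]          # ms, hypothetical sampling
scores = [0.1, 0.2, 0.6, 0.9, 0.95]  # hypothetical model outputs
print(warning_time(times, scores, threshold=0.5, disruption_time_ms=45))
```

The threshold trades off two risks: a lower value gives earlier warnings but more false alarms on non-disruptive discharges.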
• 1:55 PM 2:10 PM
Coffee Break 15m
• 2:10 PM 2:18 PM
Making ML easier at CERN with Kubeflow 8m

Different groups at CERN have been adapting existing workflows and processes to rely on machine learning, in areas such as trigger farms, fast simulation, anomaly detection, and reinforcement learning.

To help end users with these tasks, a service must hide the underlying infrastructure complexity, integrate well with existing identity and storage services, and ease tasks such as data preparation, model training, and serving.

In this talk we present a new solution available at CERN based on Kubeflow, the ML platform running on top of Kubernetes. We describe how the underlying resources (CPUs and GPUs) are offered to end users while hiding the complex details that allow the service to scale horizontally. We also describe how these resources are shared with the goal of optimizing resource usage. We show how existing on-premise capacity can be extended to external resources (public clouds) transparently to users, for use cases where on-demand usage is cost-effective, such as covering peak periods.

In the second part of the talk we cover the complete ML lifecycle. Examples include:
• quick code development and iteration using notebooks;
• submission of analysis pipelines that let workloads scale out easily, including direct conversion of a notebook into a pipeline;
• distributed model training, submitted via either a web interface or an API;
• hyperparameter tuning with multiple search algorithms;
• model storage and serving.

Speaker: Dejan Golubovic (CERN)
• 2:20 PM 2:26 PM
Using an Optical Processing Unit for tracking and calorimetry at the LHC 6m

Experiments at the HL-LHC and beyond will have ever-higher readout rates, so it is essential to explore new hardware paradigms for large-scale computations. We have considered the Optical Processing Unit (OPU) from LightOn (https://lighton.ai). This analog device multiplies a binary one-megapixel image by a fixed 10⁶ × 10⁶ random matrix, producing a one-megapixel output image at a rate of 2 kHz. It can be used across the whole branch of machine learning that relies on random matrices, in particular for dimensionality reduction. In this talk, we explore the potential of the OPU for two typical HEP use cases:

1) “Tracking”: high-energy proton collisions at the LHC yield billions of records, each typically containing 100,000 3D points that correspond to the trajectories of 10,000 particles. Using two datasets from previous tracking challenges, we investigate the potential of the OPU to solve similar or related problems in high-energy physics. We cover dimensionality reduction, data representation, and preliminary results.

2) “Calorimeter event classification”: high-energy proton collisions at the Large Hadron Collider have been simulated, each collision being recorded as an image representing the energy flux in the detector. The task is to train a classifier to separate signal from background. The OPU allows fast end-to-end classification without building intermediate objects (such as jets). This technique is presented and compared with more classical particle physics approaches.

Speaker: David Rousseau (IJCLab-Orsay)
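The OPU's core operation can be simulated at toy scale in software. The following sketch assumes the commonly described model of the device, y = |R x|². Here R is a fixed matrix of i.i.d. complex Gaussian entries and x is a binary input. The sizes (64 inputs, 8 outputs) are tiny stand-ins for the one-megapixel hardware, chosen to show dimensionality reduction. The function names are hypothetical.

```python
import random


def complex_gaussian_matrix(n_out, n_in, seed=0):
    """Fixed random matrix with i.i.d. complex Gaussian entries (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_in)]
            for _ in range(n_out)]


def opu_like_projection(x_bits, matrix):
    """Simulate y = |R x|^2 for a binary input vector x (entries 0 or 1)."""
    out = []
    for row in matrix:
        z = sum((r for r, b in zip(row, x_bits) if b), 0j)
        out.append(abs(z) ** 2)
    return out


R = complex_gaussian_matrix(n_out=8, n_in=64)    # 64 -> 8 dimensions
x = [1 if i % 3 == 0 else 0 for i in range(64)]  # toy binary "image"
features = opu_like_projection(x, R)
print(len(features), [round(f, 2) for f in features[:3]])
```

The output features could then feed a simple linear classifier. On the real device, the matrix product is performed optically rather than in software.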
• 2:28 PM 2:34 PM
Level 1 trigger track quality machine learning models on FPGAs for the Phase 2 upgrade of the CMS experiment 6m

In 2026, the LHC will be upgraded to the HL-LHC, which will provide up to 10 times as many proton-proton collisions per bunch crossing. To keep up with the increased data rates, the CMS collaboration is upgrading its Level-1 Trigger system to run particle selection and reconstruction algorithms on FPGAs, in real time with the data collection system. One such algorithm measures the quality of reconstructed tracks to classify them as "real" or "fake". In this work, we develop supervised machine learning algorithms for track quality classification and test these models on simulated FPGAs using the open-source hls4ml and Conifer packages.

Speaker: Claire Savard (University of Colorado Boulder (US))
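Conifer targets boosted decision trees (BDTs), so one natural model for this task is a small tree ensemble. The following sketch shows how such an ensemble is evaluated. Each tree routes a track's features through threshold comparisons to a leaf value, and the sum of leaf values is mapped to a score between 0 and 1. The features (track χ² and number of stubs), thresholds, and leaf values are hypothetical and chosen only for illustration. A firmware version would use fixed-point comparisons rather than floats.

```python
import math

# A tree is a list of nodes; node 0 is the root.
# Internal node: (feature_index, threshold, left_child, right_child)
# Leaf node:     ("leaf", value)


def eval_tree(tree, x):
    """Walk one decision tree from the root to a leaf and return the leaf value."""
    node = tree[0]
    while node[0] != "leaf":
        feature, threshold, left, right = node
        node = tree[left] if x[feature] <= threshold else tree[right]
    return node[1]


def bdt_score(trees, x):
    """Sum the tree outputs and map the total to a probability-like score."""
    raw = sum(eval_tree(tree, x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw))


# Hypothetical features: x = [track chi2, number of stubs]
trees = [
    [(0, 10.0, 1, 2), ("leaf", 1.0), ("leaf", -1.0)],  # low chi2 -> more "real"
    [(1, 4.5, 1, 2), ("leaf", -0.5), ("leaf", 0.5)],   # more stubs -> more "real"
]
real_like = bdt_score(trees, [3.0, 6])
fake_like = bdt_score(trees, [25.0, 4])
print(real_like, fake_like)
```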
• 2:36 PM 2:42 PM
Adversarial mixture density network for particle reconstruction: a case study in collider simulation 6m

An adversarial mixture density network (AMDN) with Gaussian kernels is used to simulate muon reconstruction in collider detectors. The network is trained on events generated with MadGraph5, Pythia8, and the Delphes3 fast detector simulation configured for the Compact Muon Solenoid (CMS). The network reproduces the relevant kinematic distributions with a very good level of agreement, together with the underlying correlations between reconstructed variables. Without prior collider-specific constraints, the trained network also learns the azimuthal symmetry, a key feature of CMS simulation. Popular generative models such as generative adversarial networks (GANs) have demonstrated wide success in various research areas. Our work shows, however, that an alternative algorithmic approach, more specific to Monte Carlo simulation in collider physics, can be favourable. Such an approach can help tackle the increasing computing demands of simulation in collider experiments.

Speaker: Kin Ho Lo (University of Florida (US))
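For each input, a mixture density network outputs the parameters of a probability distribution rather than a single value. With Gaussian kernels, these parameters are mixture weights, means, and widths. The following sketch shows two operations such a model relies on, assuming a 1D target. The first is the negative log-likelihood used as a training loss. The second is sampling, used to generate a reconstructed value. The adversarial component of the AMDN and the network itself are not shown, and all numbers are hypothetical.

```python
import math
import random


def mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of a scalar y under a 1D Gaussian mixture."""
    density = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        density += w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return -math.log(density)


def sample_mixture(weights, means, sigmas, rng):
    """Draw one value: pick a component by weight, then sample that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sigmas[k])


rng = random.Random(42)
# In an MDN these parameters would be network outputs conditioned on the input
# (e.g. a generated muon's kinematics); here they are fixed, hypothetical values.
params = ([0.7, 0.3], [1.0, 1.2], [0.02, 0.10])
print(mixture_nll(1.01, *params))
print([round(sample_mixture(*params, rng), 3) for _ in range(5)])
```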
• 2:44 PM 2:50 PM
Application of a neural network based technique for track identification in Nuclear Track Detectors (NTD) 6m

Nuclear Track Detectors (NTDs) have been in use for decades, mainly as detectors of heavily ionizing particles. Their natural detection thresholds make them an ideal choice for searches for rare, heavily ionizing hypothesized particles (e.g. monopoles, strangelets) against a large low-Z background, both in cosmic rays and at particle accelerators. However, identifying particle tracks in NTDs is a significant challenge: conventional image analysis software comes up short, and human experts must intervene. This makes scanning NTDs a painstakingly slow process that is prone to human error. In recent years, machine learning techniques have opened up new possibilities in image analysis. In this work, we take a technique we developed previously, which combines sequential application of convolution and deconvolution, and upgrade it with an artificial neural network. This has further reduced the need for manual intervention and produces better results than commercially available software. It also promises to dramatically speed up the scanning process, thereby facilitating the more widespread

Speaker: Dr Kanik Palodhi (University of Calcutta, Kolkata)
• 2:52 PM 3:00 PM
Ultra Low-latency, Low-area Inference Accelerators using Heterogeneous Deep Quantization with QKeras and hls4ml 8m

While the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices demand efficient inference, that is, smaller models with lower latency and energy consumption. One technique to limit model size is quantization, i.e. using fewer bits to represent weights and biases. Such an approach usually results in a loss of accuracy. In this CERN-Google collaboration, we introduce a novel method for designing optimally heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated deployment on chip. With a per-layer, per-parameter-type automatic quantization procedure, sampling from a large base of quantizers, model energy consumption and size are minimized while high accuracy is maintained. This is crucial for the event selection procedure in proton-proton collisions at the CERN Large Hadron Collider, where resources are limited and a latency of O(1) μs is required. When implemented on FPGA hardware, the method achieves nanosecond inference and reduces resource consumption by a factor of 50.
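
To make "heterogeneous quantization" concrete, the sketch below rounds each layer's weights to a signed fixed-point grid whose width is chosen per layer, and totals the weight bits as a crude size proxy. The fixed-point convention (total bits including the sign bit, integer bits excluding it), the layer names, and the bit choices are all hypothetical; the automatic search over quantizers described in the abstract is not shown, and this is not the QKeras API.

```python
import math

def quantize(x, bits, integer_bits):
    """Round x to a signed fixed-point grid: `bits` total (incl. sign), `integer_bits` before the point."""
    frac_bits = bits - integer_bits - 1  # one bit reserved for the sign
    step = 2.0 ** -frac_bits
    lo, hi = -(2.0 ** integer_bits), 2.0 ** integer_bits - step
    q = math.floor(x / step + 0.5) * step  # round half up onto the grid
    return min(max(q, lo), hi)             # saturate at the representable range

def quantize_model(layers, config):
    """Apply a per-layer (bits, integer_bits) choice; return weights and total weight bits."""
    quantized, total_bits = {}, 0
    for name, weights in layers.items():
        bits, integer_bits = config[name]
        quantized[name] = [quantize(w, bits, integer_bits) for w in weights]
        total_bits += bits * len(weights)
    return quantized, total_bits

layers = {"dense1": [0.3, -0.72, 0.05, 0.9], "dense2": [0.51, -0.12]}
# Hypothetical heterogeneous choice: 8-bit first layer, 4-bit output layer
config = {"dense1": (8, 0), "dense2": (4, 0)}
q, cost = quantize_model(layers, config)
print(q["dense2"], cost)  # → [0.5, -0.125] 40
```

A uniform 8-bit choice would cost 48 bits here; giving less sensitive layers fewer bits is what a per-layer search trades off against accuracy.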

• 3:02 PM 3:08 PM
Matrix Element Regression with Deep Neural Networks -- breaking the CPU barrier 6m

The Matrix Element Method (MEM) is a powerful method to extract information from measured events at collider experiments. Compared to multivariate techniques built on large sets of experimental data, the MEM does not rely on an examples-based learning phase but directly exploits our knowledge of the physics processes. This comes at a price, both in terms of complexity and computing time, since the required multi-dimensional integral of a rapidly varying function needs to be evaluated for every event and physics process considered. This can be mitigated by optimizing the integration, as is done in the MoMEMta package, but the computing time remains a concern and often makes the use of the MEM in full-scale analyses impractical or impossible. In this paper, we investigate the use of a Deep Neural Network (DNN) built by regression of the MEM integral as an ansatz for analysis, especially in the search for new physics.
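
Schematically, the MEM weight of a reconstructed event $\mathbf{x}$ under hypothesis $\alpha$ integrates the squared matrix element $\mathcal{M}_\alpha$ over parton-level configurations $\mathbf{y}$, folded with a transfer function $W$ and normalized by the cross section $\sigma_\alpha$ (parton densities and flux factors are absorbed into the phase-space measure $d\Phi$ here). A regression network $f_\theta$ can then be fit to weights precomputed by an integrator such as MoMEMta. The mean-squared-error loss below is an illustrative assumption, not taken from the abstract; whether to regress $P$ itself or, for example, its logarithm is a design choice the abstract does not specify.

```latex
P(\mathbf{x} \mid \alpha) = \frac{1}{\sigma_\alpha} \int d\Phi(\mathbf{y})\,
  \left|\mathcal{M}_\alpha(\mathbf{y})\right|^2 \, W(\mathbf{x} \mid \mathbf{y}),
\qquad
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N}
  \left( f_\theta(\mathbf{x}_i) - P(\mathbf{x}_i \mid \alpha) \right)^2
```

Once trained, evaluating $f_\theta$ costs one forward pass per event instead of one multi-dimensional integration per event and hypothesis.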

Speaker: Florian Bury (UCLouvain - CP3)
• 3:10 PM 3:18 PM
An Early Exploration into the Interplay between Quantization and Pruning of Neural Networks 8m

Machine Learning (ML) is already being used as a powerful tool in High Energy Physics, but the typically high computational cost of running ML workloads is often a bottleneck in data processing pipelines. Even on high-performance hardware such as Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs), the speed and size of these models are often heavily constrained by available hardware resources. Model optimization techniques such as pruning and quantization have been used to alleviate these costs, but often not together, or without a full understanding of how the two techniques interact. We explore this interplay between quantization and pruning, targeting FPGAs and ASICs, to determine how to obtain the best performance when both are applied. In this presentation, we explore these techniques by optimizing the three-hidden-layer hls4ml jet substructure tagging model, finding that we can successfully compress the model to 3-5% of its original size while retaining performance comparable to the original network. Finally, we discuss some next steps toward understanding how the different optimization techniques affect the model internally, beyond standard performance metrics.
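
A minimal way to see how the two techniques compound is to prune first (zero the smallest-magnitude weights), then quantize the survivors, and count the bits left relative to a dense float32 baseline. The sketch below does exactly that on a toy weight list; magnitude pruning, the 6-bit fixed-point format, and the bit-count size metric are illustrative assumptions, not the procedure or resource metric used in the talk.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize(x, bits, integer_bits=0):
    """Signed fixed point: `bits` total including sign, saturating at the range limits."""
    step = 2.0 ** -(bits - integer_bits - 1)
    lo, hi = -(2.0 ** integer_bits), 2.0 ** integer_bits - step
    return min(max(math.floor(x / step + 0.5) * step, lo), hi)

def relative_size(weights, bits, baseline_bits=32):
    """Bits for the nonzero weights relative to a dense float32 layer (hypothetical metric)."""
    nonzero = sum(1 for w in weights if w != 0.0)
    return nonzero * bits / (len(weights) * baseline_bits)

weights = [0.8, -0.05, 0.33, 0.01, -0.6, 0.12, -0.02, 0.45, 0.07, -0.9]
pruned = magnitude_prune(weights, sparsity=0.5)
compressed = [quantize(w, bits=6) for w in pruned]
print(relative_size(compressed, bits=6))  # → 0.09375
```

Order matters in general: quantizing first can round small weights to zero and change which ones the pruning step sees, which is one facet of the interplay the talk investigates.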

Speaker: Mr Benjamin Hawks (Fermi National Accelerator Laboratory)
• 3:20 PM 3:26 PM
Real-time Artificial Intelligence for Accelerator Control: A Study at the Fermilab Booster 6m

We describe a method for precisely regulating the gradient magnet power supply (GMPS) at the Fermilab Booster accelerator complex using a neural network (NN). We demonstrate preliminary results by training a surrogate machine-learning model on real accelerator data, and using the surrogate model in turn to train the NN for its regulation task. We additionally show how the neural networks that will be deployed for control purposes may be compiled to execute on field-programmable gate arrays (FPGAs). This capability is important for operational stability in complicated environments such as an accelerator facility.

Speaker: Christian Herwig (Fermi National Accelerator Lab. (US))
• 3:28 PM 3:36 PM
Building the tools to run large scale machine learning with FPGAs with two new approaches: AIGEAN and FAAST 8m

FPGA programming is becoming easier as vendors begin to provide environments, such as those for machine learning (ML), that enable programming at higher levels of abstraction. The vendor platforms target FPGAs in a single host server. Scaling to larger systems of FPGAs requires communication through the hosts, which significantly impacts performance. We demonstrate the deployment of ML algorithms on single FPGAs through FAAST, a newly developed FPGA-based infrastructure framework. We also present a new framework, AIGEAN, for running heterogeneous systems of multiple FPGAs and CPUs that can leverage direct FPGA-to-FPGA communication links. AIGEAN and FAAST take as input an ML algorithm created with a standard ML framework and a specification of the available FPGA and CPU resources. The outputs are software and hardware cores that can compute one or more ML layers. These layers can be distributed across a heterogeneous cluster of CPUs and FPGAs for execution. As part of this work, we present an optimized FPGA implementation of CNNs. We show that in some cases FPGAs can exceed the performance of other accelerators, including GPUs.

Speaker: Naif Tarafdar (University of Toronto)
• 3:38 PM 3:44 PM
Muon detection using deep learning, applied to CONNIE events 6m

The CONNIE experiment (Coherent Neutrino-Nucleus Interaction Experiment) is a collaboration of institutions from several countries in South America, the United States, and Switzerland. The data collected by CONNIE can be used to search for periodic and stochastic time variations in the flux of particles arriving at the detectors. The experiment uses 12 high-resistivity CCDs (charge-coupled devices) placed near the Angra dos Reis nuclear reactor (Planta Almirante Alvaro Alberto, Rio de Janeiro, Brazil) to detect the antineutrinos generated in the reactor by measuring low-energy recoils from coherent elastic neutrino-nucleus scattering (CEνNS). The sensors have recorded images over the last 2 years in 3-hour exposures; the majority of particles in these images are muons and beta particles, which are considered background. This work uses a deep learning algorithm to classify and detect muons in the images, both to remove them for neutrino studies and to build a time series that can serve as a stability monitor of the detection system.

Speaker: Mr Javier Bernal (Facultad De Ingenieria UNA)
• 3:46 PM 3:52 PM
GPU-accelerated machine learning inference as a service for computing in neutrino experiments 6m

Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
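
The two speedup figures in the abstract can be related with Amdahl's law. Under the simplifying assumption (ours, not stated in the abstract) that the factor of 17 already includes network and server overhead and that the rest of the reconstruction chain is unchanged, one can back out what fraction of the CPU-only runtime the hit-identification task must have taken. The function names are hypothetical.

```python
def amdahl_speedup(fraction, task_speedup):
    """Overall speedup when `fraction` of the original runtime is sped up by `task_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / task_speedup)


def implied_fraction(overall_speedup, task_speedup):
    """Invert Amdahl's law to recover the accelerated fraction of the runtime."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / task_speedup)


# Figures quoted in the abstract: 17x on hit identification, 2.7x end to end
f = implied_fraction(2.7, 17.0)
print(round(f, 3))                # → 0.669
print(round(1.0 / (1.0 - f), 2))  # → 3.02 (ceiling if the task took zero time)
```

Under this model roughly two thirds of the CPU-only time is spent in hit identification, and even an infinitely fast accelerator would cap the end-to-end gain near 3×, so further improvements would have to come from the remaining CPU-bound steps.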

Speaker: Mike Wang
• Wednesday, 2 December
• 10:00 AM 10:30 AM
Accelerator-based Neutrinos 30m
Speaker: Jeremy Edmund Hewes (University of Cincinnati (US))
• 10:40 AM 11:10 AM
Neutrino Astrophysics 30m
Speaker: Kate Scholberg (Duke University)
• 11:18 AM 11:20 AM
Acknowledgments 2m
Speaker: Philip Coleman Harris (Massachusetts Inst. of Technology (US))
• 11:20 AM 11:35 AM
Coffee Break 15m
• 11:35 AM 12:05 PM
Cosmology 30m
Speaker: Michelle Ntampaka (STSCI)
• 12:15 PM 1:15 PM
Lunch 1h
• 1:15 PM 1:45 PM
ASICs and Circuits 30m
Speaker: Tobi Delbruck (ETH Zurich)
• 1:55 PM 2:25 PM
Electron-Ion Collider 30m
Speaker: Markus Diefenthaler (Jefferson Lab)
• 2:35 PM 2:50 PM
Coffee Break 15m
• 2:50 PM 2:58 PM
Development of ML FPGA filter for particle identification with transition radiation detector. 8m

Transition Radiation Detectors (TRDs) have the attractive feature of being able to separate particles by their Lorentz (gamma) factor. A new TRD, based on GEM technology, is being developed as an R&D project for the future Electron Ion Collider (EIC) and for an upgrade of the GlueX experiment. This detector combines a high-precision GEM tracker with TRD functionality and is optimized for electron identification.
Modern concepts of trigger-less readout and data streaming will produce a very large volume of data to be read out from detectors. From a resource standpoint, it appears strongly advantageous to perform both data pre-processing and data reduction at earlier stages of data acquisition. Following this trend, we began developing an FPGA-based machine learning algorithm for real-time particle identification with the GEMTRD. This research is important for the streaming readout systems now being developed at JLab for the EIC. The report will describe the first steps in the development of an ML-FPGA filter for the GEMTRD.

Speaker: Sergey Furletov (Jefferson Lab)
• 3:00 PM 3:08 PM
AI-assisted Tracking Algorithm 8m

In this work we describe the development of machine learning models to assist the CLAS12 detector tracking algorithm. Several networks were implemented to help the tracking algorithm overcome drift chamber inefficiencies, using auto-encoders to de-noise wire chamber signals and for corruption detection. A classifier network was used to identify track candidates among numerous combinatorial segments, using different types of networks including Convolutional Neural Networks (CNN), Multi-Layer Perceptrons (MLP), and Extremely Randomized Trees (ERT). The final implementation achieved an accuracy above 99%. Integrating AI-assisted tracking into the CLAS12 reconstruction workflow provided a code speedup of up to a factor of 4.

Speaker: Gagik Gavalian (Jefferson Lab)
• 3:10 PM 3:18 PM
Anomaly Detection with Spiking Neural Networks on Neuromorphic Chips 8m

We describe anomaly detection applications on neuromorphic chips, exploiting Spiking Neural Networks on the Intel Loihi chip. We describe different workflows to train models directly on Loihi or to convert conventional neural networks into Spiking Neural Networks. As a benchmark, we consider the problem of gravitational-wave detection without a priori assumptions about the wave profile. We discuss baseline models and compare their reach to that of Spiking Neural Networks.
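
Spiking networks replace continuous activations with neurons that integrate input over time and emit discrete spikes. The sketch below simulates a generic discrete-time leaky integrate-and-fire (LIF) neuron, a common spiking model; it does not reproduce Loihi's actual neuron model or the conversion workflows from the talk, and the time constant and threshold values are hypothetical.

```python
def lif_spikes(inputs, tau=10.0, threshold=1.0, dt=1.0):
    """Simulate a leaky integrate-and-fire neuron; return the time-step indices of spikes.

    inputs: input current per time step (hypothetical units)
    tau: membrane time constant, in the same units as dt
    """
    v, spikes = 0.0, []
    decay = 1.0 - dt / tau  # forward-Euler leak of the membrane potential toward zero
    for t, current in enumerate(inputs):
        v = v * decay + current * dt  # leak, then integrate the input
        if v >= threshold:
            spikes.append(t)
            v = 0.0  # hard reset after firing
    return spikes

print(lif_spikes([0.3] * 12))  # → [3, 7, 11]
```

A steady input produces a regular spike train whose rate grows with input strength, while weak or absent input produces none; downstream layers read information from these spike times or rates, which is what makes event-driven hardware energy efficient.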

Speaker: Bartlomiej Pawel Borzyszkowski
• 3:20 PM 3:28 PM
Deep Learning based acceleration of Gravitational Waves 8m

In gravitational-wave detectors, regression techniques are applied to remove noise artifacts in order to improve the ability to observe and extract information from astrophysical signals. We present a deep learning-based noise regression method called DeepClean that can subtract linear and non-linear noise in gravitational-wave data from the Advanced LIGO detectors. We also discuss our work toward a new computing model in gravitational-wave data analysis in which GPU and FPGA acceleration of machine learning inference can be deployed on an as-a-service basis. We use DeepClean as a use case for exploring such computing models, in order to achieve real-time capabilities and the overall flexibility that such models provide.
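
Noise regression predicts the noise in the detector output from auxiliary sensors and subtracts the prediction. The sketch below shows only the linear special case with a single auxiliary "witness" channel (an assumption for illustration; witness channels are not named in the abstract): a least-squares coupling coefficient is fit and the scaled witness is subtracted. DeepClean itself replaces this fixed linear coupling with a deep network so it can also capture non-linear couplings; the function names and data here are hypothetical.

```python
def fit_coupling(witness, strain):
    """Least-squares coefficient a minimizing sum((strain - a * witness) ** 2)."""
    num = sum(w * h for w, h in zip(witness, strain))
    den = sum(w * w for w in witness)
    return num / den

def subtract_noise(witness, strain):
    """Remove the linearly coupled witness noise from the strain series."""
    a = fit_coupling(witness, strain)
    return [h - a * w for w, h in zip(witness, strain)]

# Toy data: a 'signal' orthogonal to the witness, plus witness noise coupled with a = 0.8
signal = [0.1, 0.2, 0.2, 0.0]
witness = [1.0, 0.5, -1.0, 2.0]
strain = [s + 0.8 * w for s, w in zip(signal, witness)]
cleaned = subtract_noise(witness, strain)
```

Because the toy signal is uncorrelated with the witness, the fit recovers the coupling exactly and the cleaned series equals the signal; with real data the fit is done on training segments and the signal is assumed not to appear in the witness channels.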

Speaker: Alec Gunny
• 3:30 PM 3:38 PM
Accelerating Graph Neural Networks on FPGAs for Particle Track Reconstruction using OpenCL and hls4ml 8m

Current charged particle tracking algorithms at the CERN Large Hadron Collider (LHC) scale quadratically or worse with the increasing number of overlapping proton-proton collisions in an event (pileup). As the LHC moves into its high-luminosity phase, pileup is expected to increase to an average of 200 overlapping collisions, highlighting the need for new algorithmic strategies. Recent work has shown that graph neural networks (GNNs) are well suited to classifying segments of tracks. The real-time data filter at the LHC (L1 trigger) requires sub-microsecond latencies that can only be met by devices like field-programmable gate arrays (FPGAs). Accelerating neural networks on FPGAs enables energy-efficient processing of large datasets with execution times that meet the L1 trigger latency requirements.

In this talk, we present two complementary FPGA implementations of an interaction network, a type of GNN, using OpenCL, an open standard for writing programs that execute across heterogeneous acceleration platforms, and hls4ml, an open-source compiler of machine learning models into firmware. The OpenCL implementation adopts a CPU-plus-FPGA coprocessing approach: the CPU host program manages the application, while all computational operations are accelerated by dedicated kernels deployed to the FPGA that exploit the FPGA hardware architecture to parallelize operations. The hls4ml implementation uses Xilinx high-level synthesis tools to convert the GNN model to FPGA firmware, making it suitable for both FPGA-only and coprocessing applications. We will present comparisons of the two implementations in terms of their resource usage, latency, and tracking performance on the publicly available TrackML benchmark dataset.
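
An interaction network classifies track segments (graph edges) by passing messages between hits (graph nodes). The sketch below walks through one round in plain Python: an edge model computes a message from each sender/receiver pair, messages are summed at the receiver, a node model updates each hit, and an edge classifier scores each segment. It loosely follows the general interaction-network structure; the single-layer "MLPs", their weights, and the toy hits are hypothetical, and the trained models from the talk are not reproduced.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(vec):
    return [max(0.0, v) for v in vec]

def interaction_network_step(nodes, edges, edge_params, node_params, out_params):
    """One message-passing round followed by a per-edge score in (0, 1).

    nodes: list of node feature vectors; edges: non-empty list of (sender, receiver) pairs
    *_params: (weights, bias) tuples for hypothetical single-layer models
    """
    # 1. Edge (relational) model: a message from each (sender, receiver) feature pair
    messages = [relu(linear(*edge_params, nodes[s] + nodes[r])) for s, r in edges]
    # 2. Sum the incoming messages at each receiver node
    agg = [[0.0] * len(messages[0]) for _ in nodes]
    for (_, r), m in zip(edges, messages):
        agg[r] = [a + mi for a, mi in zip(agg[r], m)]
    # 3. Node (object) model: update each node from its features and aggregated messages
    updated = [relu(linear(*node_params, x + a)) for x, a in zip(nodes, agg)]
    # 4. Edge classifier: score each segment from its updated endpoint features
    return [sigmoid(linear(*out_params, updated[s] + updated[r])[0]) for s, r in edges]

# Toy graph: three hits with two features each, two candidate segments
nodes = [[1.0, 0.2], [1.5, 0.25], [2.0, 0.3]]
edges = [(0, 1), (1, 2)]
edge_params = ([[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]], [0.0, 0.0])
node_params = ([[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]], [0.0, 0.0])
out_params = ([[0.3, -0.2, 0.3, -0.2]], [0.1])
scores = interaction_network_step(nodes, edges, edge_params, node_params, out_params)
```

Each step is a small dense computation repeated over nodes or edges, which is why both the OpenCL kernels and the hls4ml firmware can parallelize it across the FPGA fabric.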

Speaker: Aneesh Heintz (Cornell University (US))
• 3:40 PM 3:46 PM
SONIC: Coprocessors as a service for deep learning inference in high energy physics 6m

In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running.

Speaker: Dylan Sheldon Rankin (Massachusetts Inst. of Technology (US))
• Thursday, 3 December
• 9:00 AM 12:00 PM
Tutorial Session
• 9:00 AM
hls4ml Tutorial 3h
Speaker: Sioni Paris Summers (CERN)
• 12:00 PM 1:00 PM
Lunch 1h
• 1:00 PM 4:00 PM
Tutorial Session
• 1:00 PM
hls4ml Tutorial 3h
Speaker: Sioni Paris Summers (CERN)