ICALEPCS 2021: 2nd Data Science and Machine Learning Workshop

Remote

Manuel Gonzalez Berges (CERN), Marco Lonza (Elettra - Trieste)

Workshop Description

The fields of large-scale data analytics and machine learning have made impressive progress in recent years. These techniques have been applied successfully to problems in areas such as health, language processing and search engines, and many tools have been developed to facilitate their application (e.g. libraries like Scikit-learn, TensorFlow, Keras, PyTorch and Apache Spark).

A growing number of applications in accelerators and experimental physics installations have appeared in recent years. The workshop aims to share the experience gained in developing some of these applications, whether successful or not. This should contribute to the continued growth of the use of these methods in our systems.

Related topics: data analytics, statistical analysis, data mining, deep learning, neural networks, expert systems, automatic optimization, robotics, etc.

 

Organizational Details:

Given the conference limitations due to Covid, the workshop will last only half a day, from 12:00 to 16:00 (UTC) on 15 October.

We have decided not to include any tutorials. As a pre-workshop activity, you can follow the tutorials of the previous edition on its Indico site: https://indico.cern.ch/event/828418/. For Alfredo Canziani's tutorials on supervised and unsupervised learning (and other related topics), you can go directly to his YouTube channel (link below).

We are currently preparing the agenda. Despite the large number of participants and the remote format, we aim to make the workshop as interactive as possible. We will start with an introductory session in which experts will give an overview of techniques and applications in the field. This will be followed by informal presentations from the participants, intended to trigger discussion among all of us. We are not expecting fully polished presentations as in the conference, but rather short presentations of problems that you have solved, failed to solve, or are currently tackling using data science or machine learning. If you have a proposal for a contribution, please submit a short abstract on this Indico page (left menu). Please also indicate the time you would need for the presentation.

For the abstract submission you will need to log in to the Indico page with your CERN credentials, your organization's credentials if it is part of eduGAIN, or a public service account (e.g. Facebook, Google). If you experience any issues with the submission, you can send the abstract by email to:

manuel.gonzalez@cern.ch and marco.lonza@elettra.eu

Reference links:

ICALEPCS 2021 Official Page: https://indico.ssrf.ac.cn/event/1/

ICALEPCS 2021 Workshops: https://indico.ssrf.ac.cn/event/1/page/16-workshops

Alfredo Canziani YouTube channel: https://www.youtube.com/c/AlfredoCanziani/videos

    • 12:00 12:10
      Workshop Introduction 10m
      Speakers: Manuel Gonzalez Berges (CERN), Marco Lonza (Elettra Sincrotrone Trieste)
    • 12:10 13:55
      Overview Session
      • 12:10
        Machine learning for accelerators: a physicist approach 35m

        Machine learning has become ubiquitous today as an umbrella term for similar but distinct concepts: statistical learning, neural networks, and reinforcement learning. These different approaches allow a wide range of problems to be tackled: deriving complex parameter sets from stochastic data, discovering or simplifying complex relationships, and substituting for diagnostics, e.g. particularly beam-destructive ones.
        This talk will give a glimpse of the different areas of ML, recommend some tools and their usage, and describe some developments currently envisaged at BESSY II. Furthermore, it will cover the engineering issues that have to be addressed to roll out an ML application successfully within a large-scale infrastructure.

        Speaker: Pierre Schnizer (BESSY)
      • 12:45
        AI/ML Operational challenges at SLAC's accelerators & Collaborating Facilities 35m

        Particle accelerators are used in a wide array of medical, industrial, and scientific applications, ranging from cancer treatment to understanding fundamental laws of physics. While each of these applications brings with it different operational requirements, a common challenge concerns how to optimally adjust controllable settings of the accelerator to obtain the desired beam characteristics. For example, at highly flexible user facilities like the Linac Coherent Light Source (LCLS) and FACET-II at SLAC National Accelerator Laboratory, requests for a wide array of custom beam configurations must be met in a limited window of time to ensure the success of each experiment – a task which can be difficult both in terms of tuning time and the final achievable solution quality. At present, the operation of most accelerator facilities relies heavily on manual tuning by highly skilled human operators, sometimes with the aid of simplified physics models and local optimization algorithms. As a complement to these existing tools, approaches based on machine learning are poised to enhance our ability to achieve higher-quality beams, fulfill requests for custom beam parameters more quickly, and aid the development of novel operating schemes. With a focus on practical experiences at SLAC and collaborating institutions, I will discuss recent developments in using ML for online optimization, the creation of ML-enhanced virtual diagnostics to aid beam measurements, the use of ML to create fast-executing online models (i.e. digital twins) of accelerator systems, and ML-aided accelerator design. I will also discuss the high-level open-source software and workflows we are developing and using at SLAC for ML development and deployment, highlight open questions and challenges that we have encountered, and give an outlook on the pathways we are currently taking to address those challenges.

        Speaker: Auralee Linscott Edelen
      • 13:20
        ML and optimization algorithms for CERN accelerators 35m

        The presentation will go through the recent projects in the field that are regularly shared at the CERN ML and Data Analytics community forum. The current status of the development and the results obtained so far will be highlighted.

        Speaker: Verena Kain (CERN)
    • 13:55 14:10
      Break 15m
    • 14:10 16:10
      Project Presentations
      • 14:10
        LSTM model for the automatic LHC collimator alignment 25m

        A collimation system is installed in the Large Hadron Collider (LHC) to protect its sensitive equipment from unavoidable beam losses. An alignment procedure determines the settings of each collimator by moving the collimator jaws towards the beam until a characteristic loss pattern, consisting of a sharp rise followed by a slow decay, is observed in downstream beam loss monitors. This indicates that the collimator jaw intercepted the reference beam halo and is thus aligned to the beam. The latest alignment software, introduced in 2018, relies on supervised machine learning (ML) to detect such spike patterns in real time. This enables the automatic alignment of the collimators, with a significant reduction in the alignment time. This paper analyses the first-use performance of this new software, focusing on solutions to the identified bottleneck caused by waiting a fixed duration of time when detecting spikes. It is proposed to replace the supervised ML model with a Long Short-Term Memory (LSTM) model able to detect spikes in time windows of varying lengths, waiting for a variable duration of time determined by the spike itself. This will allow the automatic alignment to be sped up further.

        Speaker: Gabriella Azzopardi (CERN)
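As an illustration of the loss pattern described in the abstract (not the operational CERN software), the following NumPy sketch hand-codes a detector for the "sharp rise followed by a slow decay" signature that the LSTM model learns to recognise in windows of varying length; all thresholds and signal shapes here are invented.

```python
# Toy illustration of the collimator-alignment spike signature:
# a sharp rise above the baseline followed by a slow monotonic decay.
import numpy as np

def looks_like_spike(window, rise_factor=5.0):
    """Return True if the window shows a sharp rise then a slow decay.
    Thresholds are illustrative, not tuned operational values."""
    window = np.asarray(window, dtype=float)
    peak = int(np.argmax(window))
    baseline = np.median(window[:peak]) if peak > 0 else window[0]
    sharp_rise = window[peak] > rise_factor * max(abs(baseline), 1e-9)
    after = window[peak:]
    # "slow decay": no sample after the peak rises by more than 5% of it
    slow_decay = after.size > 2 and np.all(np.diff(after) <= 0.05 * window[peak])
    return bool(sharp_rise and slow_decay)

t = np.arange(60)
spike = 0.01 + np.where(t >= 20, np.exp(-(t - 20) / 15.0), 0.0)  # rise + decay
flat = np.full(60, 0.01)                                          # quiet monitor
print(looks_like_spike(spike), looks_like_spike(flat))  # True False
```

An LSTM replaces hand-picked thresholds like these with a learned decision over the raw window, which is what makes variable waiting times possible.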
      • 14:35
        Report on the Artificial Intelligence workshop at EuXFEL 15m

        Experiments at large-scale facilities like the European X-ray Free Electron Laser (EuXFEL) generate data at very high rates, leaving experimentalists with data samples potentially including millions of unlabelled images, and instrument scientists with online monitoring of data from different sources. The analysis and understanding of such data must cope with the high data rate, both to improve the operation of the facility and the production of high-quality scientific results. Machine learning includes many methods that may contribute to the analysis of such large datasets. The EuXFEL organized a machine learning workshop with the goal of brainstorming solutions to common issues and establishing networks of researchers that could improve the current machine learning applications at the facility. This presentation gives a quick summary of the workshop.

        Speaker: Danilo Ferreira de Lima
      • 14:50
        Automatic Serial Femtosecond Crystallography online analysis with Reinforcement Learning 20m

        Data analysis pipelines typically require experiment- or data-dependent parameters to be tuned. Serial Femtosecond Crystallography (SFX) is a powerful technique that allows structural information to be disclosed in a time-resolved fashion. We present an attempt to tune the input parameters of CrystFEL, a software tool widely used for SFX data analysis, using a model-free actor-critic Reinforcement Learning method, providing fast online feedback that allows users to adapt their experiments to maximize data quality and output on the fly.

        Speaker: Danilo Ferreira De Lima (Ruprecht Karls Universitaet Heidelberg (DE))
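To make the idea concrete, here is a minimal, hypothetical sketch in the spirit of the abstract: a model-free policy-gradient loop tuning a single made-up pipeline parameter against a black-box quality score. It uses plain REINFORCE with a running-average reward baseline standing in for the critic, which is simpler than the actor-critic method the talk describes; the objective function and all values are invented.

```python
# Hypothetical sketch: gradient-free tuning of one pipeline parameter.
import numpy as np

rng = np.random.default_rng(0)

def quality(threshold):
    """Stand-in for an SFX pipeline score (e.g. an indexing rate);
    in reality this would come from running CrystFEL. Peak at 5.0."""
    return -(threshold - 5.0) ** 2

mu, sigma, lr = 0.0, 1.0, 0.05   # Gaussian policy over the parameter
baseline = quality(mu)           # running reward baseline ("critic"-lite)
for _ in range(2000):
    a = rng.normal(mu, sigma)                        # sample a setting
    r = quality(a)                                   # evaluate the pipeline
    mu += lr * (r - baseline) * (a - mu) / sigma**2  # REINFORCE update
    baseline += 0.1 * (r - baseline)
print(abs(mu - 5.0) < 2.0)  # the policy mean moves toward the optimum
```

A true actor-critic method would replace the scalar baseline with a learned value function, reducing the variance of the updates further.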
      • 15:10
        Report on LEAPS Integrated Platform workshop 20m
        Speaker: Marco Calvi (PSI)
      • 15:30
        Machine Learning for the Tune Estimation in the LHC 20m

        The betatron tune in the Large Hadron Collider (LHC) is measured using a Base-Band Tune (BBQ) system. The processing of these BBQ signals is often perturbed by 50 Hz noise harmonics present in the beam. This causes the tune measurement algorithm, currently based on peak detection, to provide incorrect tune estimates during the acceleration cycle, with values that oscillate between neighbouring harmonics. The LHC tune feedback (QFB) cannot be used to its full extent in these conditions, as it relies on stable and reliable tune estimates. In this work, we propose new tune estimation algorithms designed to mitigate this problem through different techniques. As ground truth of the real tune measurement does not exist, we developed a surrogate model, which allowed us to perform a comparative analysis of a simple weighted moving average, Gaussian Processes and different deep learning techniques. The simulated dataset used to train the deep models was also improved using a variant of Generative Adversarial Networks (GANs) called SimGAN. In addition, we demonstrate how these methods perform with respect to the present tune estimation algorithm.

        Speaker: Leander Grech (University of Malta (MT))
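As a point of reference for the simplest baseline mentioned in the abstract, here is an illustrative NumPy sketch (the tune value, noise model and smoothing constant are invented, not LHC data) of smoothing raw peak-detection tune estimates with an exponentially weighted moving average:

```python
# Illustrative smoothing of noisy per-measurement tune estimates.
import numpy as np

def ewma(estimates, alpha=0.2):
    """Exponentially weighted moving average of raw tune estimates."""
    out = np.empty(len(estimates), dtype=float)
    acc = float(estimates[0])
    for i, q in enumerate(estimates):
        acc = alpha * q + (1 - alpha) * acc
        out[i] = acc
    return out

rng = np.random.default_rng(1)
true_q = 0.31  # fractional tune (illustrative value)
raw = true_q + 0.002 * rng.standard_normal(500)
raw[::25] += 0.01  # occasional jumps to a neighbouring noise harmonic
smooth = ewma(raw)
print(smooth.std() < raw.std())  # smoothing reduces the scatter: True
```

The appeal of this baseline is its simplicity; its weakness, which motivates the learned models in the talk, is that it lags genuine tune changes during the acceleration cycle.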
      • 15:50
        A peek at some recent ML activities at DESY 10m

        DESY hosts a number of large, complex facilities, such as PETRA, FLASH and the European XFEL, as well as some smaller-scale test facilities. At many of these facilities, the use of ML techniques and approaches has increased strongly in recent years. This talk will give a brief overview of some of these ongoing activities within the machine division at DESY.

        Speaker: Raimund Kammering (DESY)
      • 16:00
        Machine Learning for Failure Detection on RF Cavities 10m

        In this talk, we present the most recent activities in failure detection for the RF cavities at XFEL, mostly for quench detection and predictive maintenance. We show that a two-layer recurrent neural network can be trained to classify the behaviour of the recorded cavity signals and, even with relatively inaccurate labelling, achieve convincing results in detecting signals that precede a failure or exhibit normal behaviour.

        Speaker: Antonin Sulc (DESY)
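The data flow of such a classifier can be sketched in plain NumPy (the architecture details, sizes and random weights below are invented placeholders, not the trained DESY model): an untrained two-layer recurrent net mapping one recorded signal trace to a failure probability.

```python
# Schematic two-layer RNN: signal trace in, failure probability out.
import numpy as np

rng = np.random.default_rng(0)
H = 16  # hidden size (assumption)

def init(n_in, n_out):
    """Random input weights, recurrent weights, and bias for one layer."""
    return (0.1 * rng.standard_normal((n_out, n_in)),
            0.1 * rng.standard_normal((n_out, n_out)),
            np.zeros(n_out))

W1, U1, b1 = init(1, H)   # layer 1: signal sample -> hidden state
W2, U2, b2 = init(H, H)   # layer 2: hidden -> hidden
w_out = 0.1 * rng.standard_normal(H)

def failure_prob(signal):
    """Run the two-layer RNN over one RF signal trace."""
    h1 = np.zeros(H)
    h2 = np.zeros(H)
    for x in signal:
        h1 = np.tanh(W1 @ np.array([x]) + U1 @ h1 + b1)
        h2 = np.tanh(W2 @ h1 + U2 @ h2 + b2)
    logit = w_out @ h2
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> P(precedes failure)

p = failure_prob(np.sin(np.linspace(0, 6, 200)))  # mock cavity trace
print(0.0 < p < 1.0)  # True: the output is a valid probability
```

Training would fit the weights against the (noisily) labelled traces; the point here is only the recurrent signal flow.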
    • 16:10 16:25
      Submitted Topics & Problems for discussion
    • 16:25 16:30
      Conclusion 5m