ICALEPCS 2023: 3rd Data Science and Machine Learning Workshop

Africa/Johannesburg
Century City Convention Centre, 4 Energy Lane, Century City, Cape Town

Gianluca Valentino (University of Malta (MT)), Manuel Gonzalez Berges (CERN)
Workshop Description

New methods and applications in data analytics and machine learning are appearing at an increasing pace, and their use at scientific installations is also growing rapidly. Example applications include anomaly detection, better diagnostics, automatic control, improved performance, optimization of engineering designs or procedures, automated data extraction and analysis from logbooks and logs, and coding assistants. The workshop will give participants an opportunity to learn some of the fundamental concepts underlying the tools that are already in place. It will also be an occasion to share the status of the projects that participants are involved in.

Intended audience:

People with some basic knowledge of data science and basic programming skills. In particular, we need some participants who are involved in DS/ML projects and can share their experience.

Organizational Details:

The workshop will take place on the 7th of October. If there is enough interest, parallel discussions on selected topics can be organized in the last part of the workshop. These will be based on the input given during the Indico registration.

We are currently preparing the agenda. There will be two main sessions: a first one with an introductory tutorial, and a second one where participants will have the chance to present their work in an informal setting. This can include completed or ongoing projects, problems that need a solution, questions on a topic, etc.

For the abstract submission you will need to log in to the Indico page with your CERN credentials, your organization's credentials if it is part of EduGAIN, or a public service account (e.g. Facebook or Google). If you experience any issues with the submission, you can send the abstract via email to:

manuel.gonzalez@cern.ch and gianluca.valentino@um.edu.mt

Links to previous workshops:

ICALEPCS 2021: 2nd Data Science and Machine Learning Workshop: https://indico.cern.ch/event/1075165/

ICALEPCS 2019: 1st Data Science and Machine Learning Workshop: https://indico.cern.ch/event/828418/

Alfredo Canziani YouTube channel: https://www.youtube.com/c/AlfredoCanziani/videos

Zoom meeting: ICALEPCS 2023: 3rd Data Science and ML Workshop
Zoom Meeting ID: 64251878404
Host: Manuel Gonzalez Berges
Passcode: 90728755
    • 8:00 AM
      Welcome Coffee
    • 1
    • 2
      Tutorial I: Linear and Logistic Regression
      Speaker: Gianluca Valentino (University of Malta (MT))
    • 9:30 AM
      Coffee Break
    • 3
      Tutorial II: Neural Networks, Unsupervised Learning and Advanced Topics
      Speaker: Gianluca Valentino (University of Malta (MT))
    • 11:30 AM
      Lunch Break
    • Project presentations/demos
      • 4
        Neural Networks for Anomaly Detection in LINACs, Injectors, and Transfer Lines

        Maximizing the uptime of accelerators relies heavily on the ability to detect and diagnose changes in the machine. The application of machine learning to anomaly detection remains a rich area of research. RadiaSoft has been developing methods for anomaly detection in collaboration with Jefferson Lab, Brookhaven National Lab, and SLAC. Here we provide a survey of recent innovations in anomaly detection for particle accelerators and present results from our recent work. Our studies focus on the low-energy injector at CEBAF, the AGS to RHIC transfer line at BNL, and industrial accelerators for radiotherapy and imaging. We focus on two neural network architectures: inverse models and variational autoencoders. This talk will provide high-level context for how these methods are used for anomaly detection, along with results from our studies using both simulation and measurement data.

        Speaker: Jon Edelen (RadiaSoft)
      • 5
        A Potential of Use of Language Processing in Accelerator Control Systems

        Particle accelerators rely on complex control systems for their operation. As accelerators grow in scale and complexity, developing and maintaining effective control systems becomes increasingly challenging. In this presentation, we will explore the potential for applying natural language processing (NLP) techniques to improve accelerator operations by closely examining the use of textual data.

        We will present our applications of NLP algorithms to logbook data from DESY and BESSY. Initial results demonstrate feasibility for using NLP to automatically parse log entries, categorize events, detect problems, and surface important information.

        However, challenges remain in handling physics terminology, noisy data, and model generalization. This presentation will provide an overview of how natural language processing can be applied to accelerator logbooks in the field of accelerator controls.

        Speaker: Antonin Sulc (DESY)
      • 6
        Use of Machine learning for Denoising Beam Profile Measurements

        Several CERN accelerators are being equipped with Beam Gas Ionization (BGI) profile monitors using high-resolution Timepix3 detectors, resulting in very powerful, non-destructive measurements [1].
        The images produced by these detectors contain the signal from ionization electrons as well as noise coming from different sources (mainly beam losses) and other artifacts like noisy pixels or the RF shield.
        Several approaches are being studied to remove the noise and the artifacts from the images to improve the beam measurement. The presentation will give an overview of these approaches.

        [1] https://bgi.web.cern.ch/introduction/

        Speaker: Javier Martinez Samblas
      • 7
        Addressing protein serial crystallography 36 GB/s data-rate challenge with FPGAs and GPUs

        Serial crystallography [1] is a technique used at synchrotrons and X-ray free electron lasers to solve protein structures from random still diffraction images of thousands of small crystals. The technique is one of the most data intensive techniques at X-ray facilities. With novel detectors, like the 9 MPixel JUNGFRAU [2] currently commissioned at the Paul Scherrer Institute, it is possible to acquire a continuous stream of images at 36 GB/s.

        Such large data rates challenge the current way images are handled in crystallography, i.e., it is no longer possible to save every image and every pixel to disk storage irrespective of their value for the scientific question [3]. On-the-fly data analysis and compression become key to the sustainable operation of high data rate detectors. Given the very high data throughput, such analysis requires computing accelerators, like field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPUs).

        In this presentation, I will talk about our practical experience of implementing data science methods for on-the-fly analysis on computing accelerators [4]. I will give a practical example of spot finding algorithms that we implemented on GPUs and FPGAs (with high-level synthesis), highlighting the differences between the two approaches. I will also give an outlook on our early-stage developments in image analysis with machine learning methods.

        [1] T. Weinert et al. (2019). Science, 365, 61-65.
        [2] F. Leonarski et al. (2018). Nat. Methods, 15, 799–804.
        [3] F. Leonarski et al. (2020). Struct. Dyn., 7, 014305.
        [4] F. Leonarski et al. (2023). J. Synchrotron Rad., 30, 227.

        Speaker: Filip Leonarski
      • 8
        Common Problems in Early Stage Projects at the ISIS Neutron and Muon Source

        At the ISIS Neutron and Muon Source, we are still at a relatively early stage in our effort to integrate machine learning into the operation of the accelerator. Consultation with various teams across the accelerator has highlighted three key areas where machine learning can be leveraged most effectively: fault diagnosis and prediction, the use of virtual diagnostics, and intelligent control of the machine. However, for each of these themes we have encountered complications that may limit their development or practical use, and we are keen to discuss them with other facilities that may have more knowledge and experience in mitigating these issues. Items to be considered include high-dimensional feature selection, dealing with highly correlated outputs, and how to match models trained on physics simulations with the live behavior of the machine.

        Speaker: Kathryn Baker
      • 9
        Additional presentations and/or discussions
    • 2:30 PM
      Coffee Break
    • 10