On Tuesday 15 and Wednesday 16 August, the CERN openlab 2023 summer students will present their work at two dedicated public Lightning Talk sessions.
In a 5-minute presentation, each student will introduce the audience to their project, explain the technical challenges they have faced and describe the results of what they have been working on for the past two months.
It will be a great opportunity for the students to showcase the progress they have made so far, and for the audience to learn about a variety of information-technology projects, the solutions the students have come up with and the future challenges they have identified.
Abstract: This project aims to develop a model-based Reinforcement Learning (RL) algorithm, based on the GP-MPC (Gaussian Process Model Predictive Control) approach, for controlling accelerator systems using machine learning. By leveraging the data efficiency and uncertainty-quantification capabilities of Gaussian Processes, the algorithm will overcome limitations related to sparse beam instrumentation and the large number of training iterations that model-free methods require. The final outcome will be the integration of the modified algorithm into CERN's Generic Optimisation Framework, facilitating easy deployment in operational scenarios and enabling intelligent accelerator control.
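The uncertainty quantification that makes Gaussian Processes attractive for such a controller can be illustrated with a minimal GP regression sketch in plain NumPy (the function names and toy data below are illustrative, not taken from the project itself):

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(Kss - Ks.T @ v)
    return mean, var

# A handful of noiseless observations of sin(x)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
mean, var = gp_posterior(x, y, np.array([1.0, 10.0]))
# Uncertainty is small where data exists (x = 1) and large far from it
# (x = 10) -- this is what lets an MPC-style controller plan cautiously.
```

The posterior variance is exactly the signal a GP-MPC loop uses to avoid over-trusting the model in regions the beam instrumentation has never sampled.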
Adiabatic protocols are a critical subroutine in many quantum technologies. Accelerating these processes is challenging, not only due to suboptimal hardware implementations, but also for fundamental reasons. Indeed, a class of adiabatic theorems bounds the maximum fidelity that a particular protocol may accomplish. Furthermore, phase transitions and narrow-gap systems are inherently difficult to address.
In this work, we integrate counterdiabatic driving with quantum optimal control techniques (COLD), as proposed in PRX Quantum 4, 010312. Using symbolic calculus, we expand the formulation of the adiabatic gauge potential ansatz to any spin Hamiltonian. Finally, we put COLD to the test on frustrated systems and 2D topologies.
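As a sketch of the formalism (following the counterdiabatic construction the abstract builds on, with ℏ = 1; the expansion order ℓ is a free choice of the ansatz):

```latex
H_{\mathrm{CD}}(t) = H\bigl(\lambda(t)\bigr) + \dot{\lambda}\,\mathcal{A}_\lambda,
\qquad
\mathcal{A}_\lambda^{(\ell)} = i \sum_{k=1}^{\ell} \alpha_k
\underbrace{[H,[H,\dots,[H,}_{2k-1\ \text{commutators}}\ \partial_\lambda H]\dots]],
```

where the coefficients \(\alpha_k\) are fixed variationally by minimising \(\operatorname{Tr}[G_\lambda^2]\) with \(G_\lambda = \partial_\lambda H + i[\mathcal{A}_\lambda, H]\). Evaluating the nested commutators for an arbitrary spin Hamiltonian is exactly where the symbolic calculus mentioned above comes in.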
In order to provide urgent and life-saving resources to forcibly displaced people, a rapid estimate of their number is essential. Instead of measuring this number directly, a popular approach is to estimate the number of shelter tents from satellite imagery. Traditionally, however, this has required manual labeling, which is labor- and time-consuming. In this project, we propose a solution: using machine-learning techniques on satellite imagery to estimate the number of shelter tents automatically.
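Once a model has produced a binary segmentation map of "tent" pixels, counting tents reduces to counting connected components. A minimal sketch of that final counting step (the segmentation map here is a synthetic toy, not real satellite data):

```python
import numpy as np

def count_blobs(mask):
    """Count 4-connected components of True pixels in a binary mask --
    a stand-in for counting detected tents in a segmentation map."""
    mask = mask.copy()          # don't mutate the caller's array
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j]:
                count += 1
                stack = [(i, j)]
                while stack:    # iterative flood fill of one component
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y, x]:
                        mask[y, x] = False
                        stack += [(y + 1, x), (y - 1, x),
                                  (y, x + 1), (y, x - 1)]
    return count

# Two separated bright regions -> two "tents"
grid = np.zeros((6, 6), dtype=bool)
grid[1:3, 1:3] = True
grid[4:6, 4:6] = True
n = count_blobs(grid)  # -> 2
```

In practice a library routine (e.g. a labelling function from an image-processing package) would replace the hand-rolled flood fill, but the principle is the same.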
This project focuses on creating a modular load-testing and benchmarking tool for file systems, specifically the EOS storage system. EOS stores data from CERN experiments as well as personal user data. It relies on interfaces and protocols such as FUSE and XRootd, which this project aims to test and analyze for errors or weaknesses. The goal was to create an automated workflow generator and load-testing tool capable of assessing performance, testing the infrastructure and comparing metrics after changes to software or hardware configurations.
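The core of such a load-testing harness is a loop that repeats an I/O workload and records latency statistics, so that the same metrics can be compared across configurations. A minimal sketch (the workload below is a generic small-file write, not EOS-specific):

```python
import os
import statistics
import tempfile
import time

def benchmark(operation, n_runs=5):
    """Run `operation` n_runs times and report latency statistics --
    the core loop of a file-system load-testing harness."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(latencies),
        "min_s": min(latencies),
        "max_s": max(latencies),
    }

def small_io():
    """Stand-in workload: write and fsync a small temporary file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 4096)
        os.fsync(fd)
    finally:
        os.close(fd)
        os.remove(path)

stats = benchmark(small_io)
```

A real tool would swap `small_io` for FUSE-mounted or XRootd operations and persist `stats` so runs before and after a configuration change can be diffed.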
Databases managed by IT-DA are crucial for CERN's operations. Although many mechanisms for preventing data loss are already in place, our data might still be vulnerable to ransomware attacks. This project proposes saving database copies outside of a potential attacker's reach, using a cloud feature called immutable buckets.
Keeping a project's dependencies updated to their most recent versions has a direct impact on efficiency and security; hence the need arises to automate this recurring process.
The Renovate tool helps us solve this issue by creating automatic Merge Requests with the most recent updates (minor and major) for every dependency.
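A typical Renovate setup is driven by a small `renovate.json` in the repository root. As an illustrative example (the schedule and rules here are hypothetical choices, not the project's actual configuration):

```json
{
  "extends": ["config:base"],
  "schedule": ["before 6am on monday"],
  "packageRules": [
    {
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    }
  ]
}
```

With a rule like this, low-risk minor and patch updates are merged automatically, while major updates still arrive as Merge Requests for human review.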
In my CERN openlab 2023 summer tenure, I undertook three cybersecurity projects. Firstly, I addressed the challenge of integrating two-factor authentication (2FA) standards—FIDO2 and OTP—across CERN systems. Despite intensive efforts, the dissonance between these protocols posed insurmountable obstacles to unification. Secondly, I engaged in translating and dissecting IRC chat logs and Telegram conversations of Romanian hacker collectives implicated in the MICI-BICA incident. My role involved decoding strategies, exposing potential threat vectors, and uncovering their tactics to safeguard CERN and affiliated institutions. Lastly, I am currently developing a Python tool for detecting the technologies used across 17,000+ CERN websites. This task entails migrating the tool to Python 3, automating core functions, and integrating it with the new Single Sign-On. The future plan involves optimizing the tool's capabilities using Go and integrating Nuclei, a versatile vulnerability scanner that leverages a YAML-based DSL.
High-energy physics (HEP) experiments rely on Monte Carlo (MC) simulation to test hypotheses and understand data distributions. To meet the demand for fast and large-scale simulated samples, novel Fast Simulation techniques have emerged, including neural network models. This project explores the application of a large-scale transformer-based model, leveraging recent advancements in foundation models (e.g., GPT-3, DALL-E 2), for fast simulation in HEP. My contribution involves working on preprocessing and loss-function design to enhance the efficiency and accuracy of the proposed transformer-based model, and comparing its results with other existing methodologies.
In order to monitor a system, we need telemetry – data about the system's behavior, emitted from the system itself. Telemetry data comes in three forms – logs, metrics and traces. Logs and metrics are already utilized by the current CERN Monitoring infrastructure. Logs provide information on individual events and include some local context, offering debugging/diagnostic information by describing the immediate surroundings of an event. Metrics, on the other hand, give system/service-level information through the aggregation of measurements across time. Using only logs and metrics leaves a discontinuity of information – we can observe individual events and we can observe the system state across time, but reasoning about the causality between the two is left to the developer/engineer. Traces aspire to bridge this gap: they link each individual event to the tree of invocation dependencies, from the user request that started the chain to all of the side effects it had in the system. The goal of this project is to create a proof-of-concept deployment of a distributed-tracing infrastructure, evaluate how it could be integrated with the existing monitoring infrastructure, and assess what these additions would unlock towards improving the verbosity and quality of information the monitoring team can provide its clients regarding the behavior of their systems.
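The invocation tree described above rests on a simple data model: every span records its own ID, its parent's ID, and a shared trace ID. A minimal sketch of that model (real deployments would use a tracing SDK such as OpenTelemetry rather than hand-rolled classes; the names here are illustrative):

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One timed operation linked to its parent, mimicking the
    span data model used by distributed-tracing systems."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    def finish(self):
        self.end = time.perf_counter()

# A request that fans out into two child operations shares one trace_id,
# so the whole invocation tree can be reassembled later.
trace_id = uuid.uuid4().hex
root = Span("handle_request", trace_id)
db = Span("query_db", trace_id, parent_id=root.span_id)
db.finish()
cache = Span("update_cache", trace_id, parent_id=root.span_id)
cache.finish()
root.finish()
```

Collecting spans like these from every service is what lets a tracing backend answer "which downstream calls did this user request trigger, and where did the time go?"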
In the realm of data processing and physics analysis at the Large Hadron Collider (LHC), deep-learning-based algorithms offer a notable advantage over traditional physics-based counterparts. This study explores cutting-edge methodologies for low-latency neural network inference on Field Programmable Gate Array (FPGA) devices. Specifically, we concentrate on the recalibration and classification of physics objects at a demanding rate of 40 MHz within the CMS framework. The primary objective of this work is to develop an ultra-low-latency neural network model, strategically combining techniques such as Quantization-Aware Training, Knowledge Distillation, transfer learning, and pruning schedules to achieve exceptionally low latency without compromising reconstruction performance.
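Of the techniques listed, magnitude pruning is the simplest to illustrate: the smallest weights are zeroed so that fewer multipliers need to fit in the FPGA fabric. A NumPy sketch of that basic step (a real schedule would prune gradually during training, not in one shot):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights -- the
    basic step behind iterative pruning schedules."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
w_pruned = magnitude_prune(w, 0.5)
# Roughly half the weights are now exactly zero, shrinking the
# multiplier count the synthesized design must instantiate.
```

In an actual low-latency pipeline this would be combined with quantization-aware training so the surviving weights also use few bits.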
TMVA provides a fast inference system that takes an ONNX model as input and produces compilation-ready standalone C++ scripts as output, which can be compiled and executed on CPU architectures. The idea of this project is to extend this capability: to generate, from the TMVA SOFIE model representation, code that can also run on Intel GPUs using SYCL and the Intel oneAPI libraries. This will allow for more efficient evaluation of these models on Intel accelerator hardware.
Machine learning has enabled computers to learn from data and improve their performance on tasks, revolutionizing various industries by automating processes, uncovering insights, and enhancing decision-making.
NISQ (Noisy Intermediate-Scale Quantum) refers to the current stage of quantum computing, characterized by quantum devices with a moderate number of qubits and significant noise. In the NISQ era, parametrized quantum circuits (PQC) are employed in machine learning as flexible quantum algorithms that can be executed on current noisy quantum devices. By introducing tunable parameters, these circuits can adapt to the limitations of NISQ devices, enabling quantum-enhanced solutions to classification and generative tasks while paving the way for practical applications of quantum machine learning.
Time series arise from the abundance of sequential data in various fields, and time series analysis uncovers patterns, trends, and dependencies within the temporal data, enabling informed decision-making and predictions for a wide range of applications.
In this project, we aim to replicate the findings of a paper that utilized parametrized quantum circuits (PQCs) to forecast time series data in the context of finance. By faithfully reproducing the original experiments, we seek to establish the reliability and applicability of PQCs for financial time series predictions.
Building upon these results, we further investigate the impact of hyperparameters and explore novel circuit architectures to optimize the performance of quantum machine learning in time series forecasting.
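The basic building block of such a forecaster is a parametrized circuit that encodes an input value, applies trainable rotations, and returns an expectation value. A one-qubit NumPy sketch of that pattern (a toy illustration, not the circuit from the paper being replicated):

```python
import numpy as np

# Pauli-Z observable for the final measurement
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    """Single-qubit rotation about the Y axis by angle theta."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def pqc_output(theta, x):
    """Encode input x, apply a trainable RY(theta), measure <Z>.
    This encode-then-rotate pattern is one simple way a PQC maps a
    scalar input to a scalar prediction."""
    state = np.array([1, 0], dtype=complex)  # |0>
    state = ry(x) @ state                    # data-encoding rotation
    state = ry(theta) @ state                # trainable rotation
    return float(np.real(state.conj() @ Z @ state))

# Two RY rotations compose, so the output is cos(x + theta):
out = pqc_output(0.3, 0.5)  # -> cos(0.8)
```

Training then means adjusting `theta` (in practice, many parameters across several qubits and layers) so that the circuit's expectation values track the target time series.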
The LHC generates an immense volume of data by colliding protons or heavy ions at extremely high energies, resulting in a multitude of particle interactions. These interactions are crucial for the experiments conducted at the LHC, such as ATLAS, CMS, ALICE, and LHCb, and require recording, processing, and analysis. To address this data challenge, the WLCG collaborates globally, offering computing resources and services to support these experiments. To understand which architecture is most suitable for certain kinds of jobs/workloads, we need to benchmark the workloads of the various experiments at CERN. My assignment involved benchmarking these workloads, especially MadGraph, on both CPUs and GPUs. The goal was to study the relationship between energy consumption and performance on different architectures.