CERN openlab Summer Student Lightning Talks (2/2)
On Monday 11th and Tuesday 12th of August, the 2025 CERN openlab summer students will present their work at two dedicated public Lightning Talk sessions.
In a 5-minute presentation, each student will introduce the audience to their project, explain the technical challenges they have faced and describe the results of the work they have carried out over the past two months.
It will be a great opportunity for the students to showcase the progress they have made so far, and for the audience to learn about a wide range of information-technology projects, the solutions the students have developed and the challenges they have identified for future work.
Please note
- Only the students giving a talk need to register for the event
- There are 18 places available on Monday and 19 places on Tuesday
- The event will be accessible via webcast for an external audience (Please invite your university professors and other students)
Day 1 information: https://indico.cern.ch/event/1543700/
Please note that pictures and videos might be taken during the event. The pictures and videos might be used for communication about the event. By joining the lecture, you agree to being featured in these communications.
-
13:30
→
13:37
Next Generation Triggers for CMS: Continual Learning for Decision Trees 7m 31/3-004 - IT Amphitheatre
In the coming upgrade of the CMS trigger system, many machine learning algorithms are being developed that will use low-level detector information to decide whether or not to keep data. These algorithms will be exposed to a dynamic inference environment that will not only differ from their training environment but also evolve over time as detector conditions change. This project will explore continual learning approaches to incrementally updating trigger algorithms with new data, focusing specifically on decision-tree-based algorithms. The project will examine changing environments in the CMS detector as well as design novel ways of training and updating decision trees. There will also be an opportunity to explore implementing these continual learning techniques within the wider CMS trigger ML training infrastructure.
Speaker: Sara Abdelrazeq (University of Bonn) -
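As a purely illustrative sketch of the incremental-update idea described above (not the CMS trigger code, features or data), the open-source river library provides streaming decision trees that can be updated one event at a time while their accuracy is tracked as conditions drift:

```python
# Illustrative only: continual learning with a streaming decision tree using the
# `river` library. The features, labels and event stream are toy placeholders.
import random
from river import tree, metrics

model = tree.HoeffdingTreeClassifier()   # decision tree that updates one sample at a time
acc = metrics.Accuracy()

def stream_of_events():
    # Hypothetical stand-in for low-level detector quantities arriving over time;
    # in reality these would reflect changing CMS detector conditions.
    for _ in range(10_000):
        x = {"pt": random.uniform(0, 100), "eta": random.uniform(-2.5, 2.5)}
        y = x["pt"] > 30          # toy "keep the event" label
        yield x, y

for x, y in stream_of_events():
    y_pred = model.predict_one(x)      # inference with the current tree
    acc.update(y, y_pred)              # track accuracy as the environment evolves
    model.learn_one(x, y)              # incrementally update the tree with the new sample

print(acc)
```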
13:37
→
13:44
Next Generation Triggers for CMS: Integrating the Kubeflow platform for Level 1 Trigger MLOps applications 7m 31/3-004 - IT Amphitheatre
The Level-1 Trigger is the earliest stage of event selection in the CMS experiment. Recent advancements in hardware and streamlined firmware synthesis tools have enabled an increasing number of machine learning (ML) algorithms to operate at Level-1. Given the constantly evolving detector environments, robust ML Operations (MLOps) pipelines are essential to ensure efficient data taking. Kubeflow is an advanced workflow engine built on Kubernetes and provides the ability to scale and manage ML pipelines across heterogeneous compute resources (CPUs and GPUs). This project aims to develop a complete Level-1 MLOps pipeline on the Kubeflow platform, consisting of: data extraction, transformation, and loading, as well as training, hyperparameter tuning, and firmware synthesis. This work has the potential to establish a new standard for MLOps in CMS trigger applications.
Speaker: Asma Basly -
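As a hedged sketch of what such a pipeline definition can look like in Kubeflow Pipelines (KFP v2), the stages named in the abstract can be chained as components; the component bodies, base images and parameters below are placeholders, not the actual Level-1 Trigger tooling:

```python
# Illustrative KFP v2 pipeline mirroring the stages listed above.
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def extract_and_transform(dataset_path: str) -> str:
    # ETL placeholder: would read low-level trigger data and write training-ready files.
    return dataset_path + "/processed"

@dsl.component(base_image="python:3.11")
def train_model(data: str, learning_rate: float) -> str:
    # Training / hyperparameter-tuning placeholder.
    return data + "/model.onnx"

@dsl.component(base_image="python:3.11")
def synthesize_firmware(model: str) -> str:
    # Firmware-synthesis placeholder (a vendor-toolchain step in the real pipeline).
    return model + ".bit"

@dsl.pipeline(name="l1-trigger-mlops-sketch")
def l1_pipeline(dataset_path: str = "/data/l1", learning_rate: float = 1e-3):
    etl = extract_and_transform(dataset_path=dataset_path)
    training = train_model(data=etl.output, learning_rate=learning_rate)
    synthesize_firmware(model=training.output)

if __name__ == "__main__":
    compiler.Compiler().compile(l1_pipeline, "l1_pipeline.yaml")
```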
13:44
→
13:51
Enhancing the FPGA synthesis service platform 7m 31/3-004 - IT Amphitheatre
HLS and logic synthesis are crucial steps in producing FPGA firmware before deployment on hardware, and are commonly the most resource-intensive tasks, taking hours to complete and requiring powerful computers. There is a rising need for a synthesis-as-a-service platform, a need that is beginning to be addressed by Gofer. This summer project aims to enhance Gofer, a Python-based FPGA synthesis-as-a-service platform, by implementing user-facing features such as user quotas, role-based permissions, and an improved web UI. Quotas will ensure fair resource allocation, while role-based access control (RBAC) will provide fine-grained permissions for managing users and groups. The redesigned web interface will include a real-time dashboard for monitoring synthesis jobs, managing quotas, and visualizing resource usage. Additionally, the OpenAPI specification and platform documentation will be used to auto-generate compatible client tools. These improvements will make Gofer more secure, scalable, user-friendly, and better suited for multi-user environments.
Speaker: Anushka Bilandani (Indian Institute of Information Technology, Pune) -
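A hypothetical sketch of how per-user quotas and role-based permissions can be enforced in a Python web service follows; none of the endpoints, data models or limits are taken from Gofer itself, they only illustrate the general pattern:

```python
# Generic quota + RBAC pattern in FastAPI; all names and limits are invented.
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()

# Toy in-memory state; a real service would back this with a database.
USER_ROLES = {"alice": "admin", "bob": "user"}
QUOTA_LIMIT = {"alice": 10, "bob": 2}          # max concurrent synthesis jobs
RUNNING_JOBS = {"alice": 0, "bob": 2}

def current_user(username: str) -> str:
    # Placeholder for real authentication (tokens, SSO, ...).
    if username not in USER_ROLES:
        raise HTTPException(status_code=401, detail="unknown user")
    return username

@app.post("/jobs")
def submit_job(user: str = Depends(current_user)):
    # Quota check: reject the job if the user already uses their full allocation.
    if RUNNING_JOBS[user] >= QUOTA_LIMIT[user]:
        raise HTTPException(status_code=429, detail="synthesis quota exceeded")
    RUNNING_JOBS[user] += 1
    return {"status": "queued", "owner": user}

@app.delete("/users/{name}")
def delete_user(name: str, user: str = Depends(current_user)):
    # Role-based access control: only admins may manage users.
    if USER_ROLES[user] != "admin":
        raise HTTPException(status_code=403, detail="admin role required")
    USER_ROLES.pop(name, None)
    return {"deleted": name}
```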
13:51
→
13:58
Automatic minutes summarisation in Indico using an LLM (Large Language Model) 7m 31/3-004 - IT Amphitheatre
The aim of this project is to create a feature that automatically generates summaries of meeting minutes stored in Indico. This feature will extract key information from the minutes and present it in a concise, easy-to-read format. To achieve this, the student will develop a plugin for Indico that utilizes an open-source Large Language Model (LLM) to summarize the minutes. The plugin will enable users to select the minutes they wish to summarize, using pre-written prompts to generate the summaries. The student will join the Indico team and collaborate with them to deliver this project. This is a great opportunity to enhance their skills in web development and AI, while working on a real-world project that will benefit the Indico community at CERN and beyond.
Speaker: Zeynep Caysar -
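As an illustrative sketch of summarising minutes with a pre-written prompt, the call below targets a generic OpenAI-compatible chat endpoint of the kind exposed by many open-source LLM servers; the endpoint URL, model name and prompt are assumptions, not the actual Indico plugin implementation:

```python
# Hedged sketch: pre-written prompt + open-source LLM server, not the Indico plugin code.
import requests

LLM_URL = "http://localhost:8000/v1/chat/completions"   # hypothetical local LLM server
SYSTEM_PROMPT = (
    "You summarise meeting minutes. Produce a short bullet list of decisions, "
    "action items, and open questions."
)

def summarise_minutes(minutes_text: str) -> str:
    payload = {
        "model": "an-open-source-llm",       # placeholder model name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": minutes_text},
        ],
        "temperature": 0.2,
    }
    response = requests.post(LLM_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(summarise_minutes("Minutes: the team agreed to release the new feature next month..."))
```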
13:58
→
14:05
Integrating Intel FPGA Acceleration into Alpaka Using SYCL: A Feasibility Study with CLUE 7m 31/3-004 - IT Amphitheatre
As part of the CERN openlab internship, this project investigates the integration of Intel FPGA acceleration into Alpaka, a performance-portable C++ abstraction library used in the CMS experiment. The goal is to evaluate the feasibility of using SYCL-based high-level synthesis (HLS) to support FPGAs as a backend in Alpaka. The Intel IA-840f card, featuring an Agilex 7 FPGA, is used for development and testing. To validate the toolchain and performance characteristics, the CLUE parallel clustering algorithm for the HGCAL (High Granularity Calorimeter) is ported and tested on the FPGA simulator using SYCL. The project involves analyzing memory placement strategies, evaluating initiation intervals, and optimizing data movement and kernel structure. This work lays the foundation for extending Alpaka's hardware abstraction to include reconfigurable architectures in high-throughput physics applications.
Speaker: Mohamad Khaled Charaf -
14:05
→
14:12
MLOps Infrastructure and Development for LHCb Monitoring Optimization 7m 31/3-004 - IT Amphitheatre
The LHCb experiment operates thousands of servers using advanced monitoring systems in an on-premise cloud environment. To enhance observability and streamline operations, this project implements a cloud-based MLOps infrastructure tailored to the LHCb environment. It includes workflows and pipelines for seamless model deployment and management, interfaces for efficient interaction, and tools to monitor and evaluate model performance. This robust setup enables the integration of advanced machine learning techniques to further improve the LHCb monitoring system.
Speaker: Hanna Czifrus (CERN) -
14:12
→
14:19
Matching Multi-Resolution CAD Models via Representation Learning 7m 31/3-004 - IT Amphitheatre
A system for comparing and matching resolution-variant CAD models using representation learning techniques, built for use in NVIDIA Omniverse digital twin and simulation environments.
Speaker: Max Decman -
14:19
→
14:26
CERN Digital Twin: CAD-to-USD with Dynamic LOD Rendering 7m 31/3-004 - IT Amphitheatre
This project develops an automated pipeline to convert CERN's CAD models into USD format for digital twin visualization in NVIDIA Omniverse. It integrates dynamic Level-of-Detail (LOD) rendering to optimize real-time performance, ensuring scalable and high-fidelity visualization of complex CERN facilities.
Speaker: Noa Emien Ette -
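One common way to express Level-of-Detail switching in USD is a variant set, so a renderer such as Omniverse can select a geometry resolution at runtime; the sketch below uses the pxr Python bindings with placeholder file names and prim paths, independent of the actual CERN pipeline:

```python
# Illustrative USD LOD sketch (variant sets); not the project's CAD-to-USD converter.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("detector_part.usda")
part = UsdGeom.Xform.Define(stage, "/DetectorPart").GetPrim()

lod = part.GetVariantSets().AddVariantSet("LOD")
for level in ("high", "medium", "low"):
    lod.AddVariant(level)

for level in ("high", "medium", "low"):
    lod.SetVariantSelection(level)
    with lod.GetVariantEditContext():
        # A real pipeline would reference a mesh converted from CAD at the matching
        # tessellation level; here we only define an empty mesh per variant.
        UsdGeom.Mesh.Define(stage, f"/DetectorPart/Geom_{level}")

lod.SetVariantSelection("high")     # default LOD seen by the viewer
stage.GetRootLayer().Save()
```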
14:26
→
14:33
A CXL-enabled, FPGA-based Near Memory Compute (NMC) accelerator for L1 data scouting at CMS 7m 31/3-004 - IT Amphitheatre
The L1 Trigger at CMS performs real-time event reconstruction at the 40 MHz bunch-crossing rate, identifying high-interest collision events for downstream processing. To enhance the trigger's online analysis capabilities, the L1 Data Scouting (L1DS) initiative aims to utilize emerging hardware technologies to manage data volume and latency challenges directly at this early stage. This project investigates the suitability of Micron's FPGA-based Near Memory Compute (NMC) accelerator, integrated via the Compute Express Link (CXL) protocol, as a scalable, low-latency compute platform for CMS L1DS applications. By benchmarking physics-driven workloads, such as jet clustering algorithms, on the NMC accelerator, we evaluate its performance and resource efficiency for CMS data processing. The project thus aims to demonstrate the potential of next-generation, memory-centric computing hardware to accelerate physics workflows in a high-throughput, latency-sensitive environment such as the CMS L1 Trigger.
Speaker: Zohaib Irfan -
14:33
→
14:48
Coffee Break 15m 31/3-009 - IT Amphitheatre Coffee Area
-
14:48
→
14:55
Abuse and anomaly detection using Machine Learning for GitLab runners 7m 31/3-004 - IT Amphitheatre
Part of the Version Control Systems (VCS) service at CERN is the provisioning of GitLab Runner infrastructure to facilitate CI/CD pipeline execution for the whole organization. We run more than 20,000 jobs monthly across 10 clusters customized for the unique use cases our researchers and developers need. However, the shared runners are not protected against abuse of the available resources or malicious practices. Users can run huge workloads and render a cluster unavailable for the majority of users if their jobs occupy all the resources. The student will enhance our monitoring capabilities by developing a Machine Learning-based abuse and anomaly detection service. This service will collect data from Prometheus and OpenSearch to establish a baseline of normal pipeline behavior, including metrics like CPU and memory usage, job execution durations, and the commands executed within each job. Leveraging this data, the student will build a system to identify suspicious activities, such as resource anomalies or malicious code execution, and automatically generate reports to alert the VCS team.
Speaker: Diana Nersesyan -
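As an illustrative sketch of the approach described above, per-job resource metrics can be pulled from Prometheus over its HTTP API and outliers flagged with an Isolation Forest; the Prometheus URL, metric names and labels are assumptions, not the VCS team's actual setup:

```python
# Hedged sketch: Prometheus query + unsupervised outlier detection with scikit-learn.
import requests
import numpy as np
from sklearn.ensemble import IsolationForest

PROM_URL = "http://prometheus.example.cern.ch"          # hypothetical endpoint
QUERY = 'sum by (job_id) (rate(container_cpu_usage_seconds_total{namespace="gitlab-runners"}[5m]))'

def fetch_cpu_per_job() -> dict:
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=30)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return {r["metric"]["job_id"]: float(r["value"][1]) for r in results}

def flag_anomalies(samples: dict) -> list:
    job_ids = list(samples)
    # One feature here for brevity; a real baseline would add memory, duration, etc.
    X = np.array([[samples[j]] for j in job_ids])
    labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
    return [j for j, label in zip(job_ids, labels) if label == -1]   # -1 marks outliers

if __name__ == "__main__":
    print(flag_anomalies(fetch_cpu_per_job()))
```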
14:55
→
15:02
Flash Storage Potential in Large-Scale Physics Workflows 7m 31/3-004 - IT Amphitheatre
This summer student project proposes a comprehensive benchmarking study of currently available access protocols to Pure Storage’s flash storage system. This initial phase aims to evaluate the performance, reliability, and efficiency of various access protocols under different workload scenarios, focusing on parameters such as data throughput, latency, scalability, and resource utilization. The study will provide valuable insights into the strengths and limitations of each protocol, enabling the identification of the most optimal configurations for different use cases. In a potential second phase, the project could extend to benchmarking Pure Storage technology when integrated with CERN’s EOS storage system. This phase would explore how the advanced features of Pure Storage, such as DirectFlash® technology and its DirectFlash Modules (DFMs), perform within the distributed and high-demand environment of EOS. The integration would be analyzed for its impact on data placement, latency, write amplification, and overall system efficiency.
Speaker: Robert-Paul Pasca -
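A minimal sketch of the kind of measurement such a benchmarking study involves is shown below: sequential-read throughput and per-read latency against a mounted path; the path, block size and protocol under test are placeholders:

```python
# Toy read benchmark; not the project's actual benchmarking harness.
import time
import statistics

PATH = "/mnt/flash-test/sample.bin"     # hypothetical file on the storage under test
BLOCK_SIZE = 4 * 1024 * 1024            # 4 MiB reads

def benchmark_read(path: str) -> None:
    latencies, total_bytes = [], 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(BLOCK_SIZE)
            latencies.append(time.perf_counter() - t0)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    print(f"throughput: {total_bytes / elapsed / 1e6:.1f} MB/s")
    print(f"p99 read latency: {statistics.quantiles(latencies, n=100)[98] * 1e3:.2f} ms")

if __name__ == "__main__":
    benchmark_read(PATH)
```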
15:02
→
15:09
Energy Efficiency and Performance Analysis of Monte Carlo Event Generators for High-Energy Physics 7m 31/3-004 - IT Amphitheatre
The objective of this project is to study the performance and energy consumption of Monte Carlo event generators used in High-Energy Physics (HEP) simulations. The project builds upon ongoing efforts to optimize event generators such as MadGraph5_aMC@NLO, evaluating their performance across various computational backends—including GPU acceleration with both NVIDIA and AMD GPUs, and CPU-based acceleration using vector instructions on Intel and ARM processors. The work involves the use of several profiling and monitoring tools to gain insights into hardware utilization, memory efficiency, and power consumption. This research has broader implications for the HEP community, addressing the urgent need for sustainable computational practices, improved development and profiling toolchains, and optimized computing infrastructure. It also offers the opportunity to contribute to cutting-edge research at the CERN IT department, collaborating with a team of experienced researchers.
Speaker: Raisa Rahman Richi (CERN openlab Summer Student) -
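One illustrative measurement of this kind is sampling GPU power draw with NVML while a workload runs, then integrating to an energy estimate; the command below is a placeholder for an actual event-generation job, and the sketch covers only the NVIDIA GPU case:

```python
# Hedged sketch: GPU power sampling with pynvml around a placeholder workload.
import subprocess
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

proc = subprocess.Popen(["sleep", "10"])        # stand-in for a MadGraph5_aMC@NLO run
samples = []
while proc.poll() is None:
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0   # NVML reports milliwatts
    samples.append(power_w)
    time.sleep(0.5)

energy_j = sum(samples) * 0.5                   # power (W) integrated over 0.5 s intervals
print(f"mean power: {sum(samples) / len(samples):.1f} W, energy: {energy_j:.1f} J")
pynvml.nvmlShutdown()
```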
15:09
→
15:16
Next Generation Triggers ML Optimization: Quantization Techniques for Scalable Inference 7m 31/3-004 - IT Amphitheatre
Quantization stores model parameters at lower numerical precision. It is a useful technique because it significantly reduces the size of a model while aiming to preserve its accuracy. Some runtimes, such as ONNX Runtime, offer dynamic quantization. The tasks will be to explore and evaluate the different techniques for model quantization, in order to give users a view of their model metrics under such compression. The outcomes of this project will provide decision support for users on whether to use quantization for inference and/or training.
Speaker: Najla Samer Sadek -
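A short sketch of the dynamic quantization mentioned above, using ONNX Runtime, followed by a simple latency comparison; the model file names and input shape are placeholders for whatever model a user brings:

```python
# Dynamic quantization with ONNX Runtime plus a toy latency comparison.
import time
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert fp32 weights to int8; activations are quantized dynamically at run time.
quantize_dynamic("model_fp32.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

def mean_latency(model_path: str, runs: int = 100) -> float:
    session = ort.InferenceSession(model_path)
    name = session.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # assumed input shape
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, {name: x})
    return (time.perf_counter() - start) / runs

for path in ("model_fp32.onnx", "model_int8.onnx"):
    print(path, f"{mean_latency(path) * 1e3:.2f} ms/inference")
```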
15:16
→
15:23
Extending Grafana for visualization of historical data from CERN industrial control systems 7m 31/3-004 - IT Amphitheatre
Dashboards play an important role in visualizing historical data from control systems, turning complex datasets into actionable insights. They enable operators, engineers and scientists to monitor system performance, identify patterns and detect anomalies with ease. At CERN, several custom web-based tools have been developed to enable visualization of historical data from WinCC OA-based control systems. Recently, there has been growing interest in complementing these custom solutions with Grafana, a popular solution for creating dynamic web dashboards with rich data visualization. Grafana provides built-in support for querying data from TimescaleDB/PostgreSQL, the database that will soon be used to store historical data from WinCC OA systems at CERN. However, the complexity and large size of the signal metadata in these systems present unique challenges. To address this, custom Grafana plugins are essential to enable users to efficiently browse and select signals for display in dashboard panels. This project aims to develop proof-of-concept Grafana plugins that will enrich the platform's functionality. These extensions will focus on providing high-performance, user-friendly widgets for browsing and selecting signal metadata, allowing users to create dynamic dashboards without detailed knowledge of the underlying database schema.
Speaker: Lasse Baerland Strand -
15:23
→
15:30
AI-Enhanced Operator Assistance for UNICOS Applications 7m 31/3-004 - IT Amphitheatre
The proposed project aims to investigate the potential of AI-driven assistants in supporting operators using UNICOS applications. UNICOS is the CERN SCADA system for technical infrastructure supporting the accelerator complex. The focus will be on exploring the technical feasibility and benefits of integrating an AI assistant into UNICOS to assist operators in real-time decision-making and operational efficiency. By enabling functionalities such as voice-controlled commands, real-time system monitoring, and smart parameter tuning, the AI assistant can reduce operators' cognitive load and provide faster access to critical information, ultimately enhancing system reliability and operational safety.
Speaker: Bernard Tam -
15:30
→
15:37
Orchestrating Distributed Hybrid Quantum-HPC Workflows with Kubernetes 7m 31/3-004 - IT Amphitheatre
This project focuses on orchestrating distributed hybrid quantum-classical workflows using Kubernetes, leveraging advanced features like Kueue for workload scheduling and Argo Workflows for managing complex, multi-step pipelines. The workflow supports tasks such as quantum circuit cutting and sampling-based quantum diagonalization, enabling scalable execution across quantum processing units (QPUs), GPUs, and CPUs. By containerizing each stage—partitioning, execution, and reconstruction—the system provides a modular, fault-tolerant, and resource-aware platform for hybrid quantum-HPC workloads.
Speaker: Mar Tejedor Ninou -
15:37
→
15:44
Identify and categorise journals on Zenodo 7m 31/3-004 - IT Amphitheatre
Zenodo is the world's largest general-purpose research repository, whose open-access nature makes it an essential resource for over 400,000 users globally. However, this openness also exposes it to misuse by malicious actors, particularly predatory publishers engaging in unethical practices. Such activities compromise the platform's credibility and the trust of its research community. This project prototyped a method to automatically identify and flag potentially malicious user accounts. The system was trained on years of existing data, using a combination of heuristics and machine learning models, including large language models, to detect signatures of predatory activity. This involved conducting behavioral and temporal pattern analysis to identify anomalous user conduct, alongside content analysis to find textual markers of fraudulent research. The resulting model, which effectively flags accounts focused primarily on minting DOIs rather than on legitimate research, is to be hosted on CERN's machine learning infrastructure, providing the Zenodo team with a sustainable tool to safeguard the platform's integrity.
Speaker: Karlo Vrancic -
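A hypothetical sketch of the heuristics-plus-ML combination described above: a few behavioural features per account feed a supervised classifier trained on past moderation decisions; the feature names, training data and threshold are invented purely for illustration:

```python
# Toy supervised flagging model; not Zenodo's actual features, data or model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["records_per_day", "doi_to_download_ratio", "duplicate_title_fraction"]

# Toy training data standing in for "years of existing data"; 1 = previously flagged account.
X_train = np.array([[0.2, 0.1, 0.0], [0.5, 0.3, 0.1], [25.0, 0.9, 0.8], [40.0, 0.95, 0.7]])
y_train = np.array([0, 0, 1, 1])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def flag_account(records_per_day: float, doi_ratio: float, dup_fraction: float) -> bool:
    score = clf.predict_proba([[records_per_day, doi_ratio, dup_fraction]])[0, 1]
    return score > 0.5    # accounts above the threshold are queued for human review

print(flag_account(records_per_day=30.0, doi_ratio=0.9, dup_fraction=0.6))
```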
15:44
→
15:51
Automatic serving-focused benchmarking of models 7m 31/3-004 - IT Amphitheatre
This project aims to develop an automated benchmarking system for machine learning models, evaluating them from a user-centric perspective, primarily focusing on latency and resource usage. It will support diverse model types and serving runtimes (e.g., TFServing, TorchServe, ONNX), and integrate profiling tools to visualize performance bottlenecks. The outcome will enhance quality of service and aid in diagnosing performance issues in production environments.
Speaker: Tomasz Wojnar -
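A sketch of a user-centric latency benchmark against an HTTP model-serving endpoint; the URL and payload follow the TensorFlow Serving REST convention as one example, and the real system would also cover other runtimes and record resource usage:

```python
# Hedged latency benchmark against a hypothetical TFServing endpoint.
import time
import statistics
import requests

ENDPOINT = "http://localhost:8501/v1/models/my_model:predict"   # placeholder endpoint
PAYLOAD = {"instances": [[0.1, 0.2, 0.3, 0.4]]}

def benchmark(n_requests: int = 200) -> None:
    latencies = []
    for _ in range(n_requests):
        t0 = time.perf_counter()
        requests.post(ENDPOINT, json=PAYLOAD, timeout=10).raise_for_status()
        latencies.append(time.perf_counter() - t0)
    pct = statistics.quantiles(latencies, n=100)
    print(f"p50 {pct[49]*1e3:.1f} ms, p95 {pct[94]*1e3:.1f} ms, p99 {pct[98]*1e3:.1f} ms")

if __name__ == "__main__":
    benchmark()
```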
15:51
→
15:58
Cross-Site Data Integrity Check with Ceph and Kafka 7m 31/3-004 - IT Amphitheatre
This project demonstrates a system for verifying data integrity in a geo-distributed Ceph storage cluster between the CERN sites of Meyrin and Prévessin. By capturing Ceph's bucket notifications with a Kafka pipeline, the system compares events from both sites in order to validate consistency under specific, simulated conditions. The primary objective is to provide an automated, script-based method for identifying potential data integrity issues.
Speaker: Dawid Wiktor Grabowski -
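As an illustrative sketch of the comparison step: S3-style bucket notifications published by both sites to Kafka are consumed, and every object is checked to eventually appear with the same ETag on the other site; the topic names, broker address and notification fields are assumptions, not the project's exact configuration:

```python
# Hedged sketch of a cross-site consistency check over Ceph bucket notifications.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ceph-notifications-meyrin", "ceph-notifications-prevessin",   # hypothetical topics
    bootstrap_servers="kafka.example.cern.ch:9092",
    value_deserializer=lambda raw: json.loads(raw.decode()),
)

seen = {"ceph-notifications-meyrin": {}, "ceph-notifications-prevessin": {}}

for message in consumer:
    for record in message.value.get("Records", []):            # S3 event notification format
        key = record["s3"]["object"]["key"]
        etag = record["s3"]["object"].get("eTag")
        seen[message.topic][key] = etag
        other = [t for t in seen if t != message.topic][0]
        if key in seen[other] and seen[other][key] != etag:
            print(f"MISMATCH for {key}: {etag} vs {seen[other][key]}")
```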
15:58
→
16:05
Bridging authentication for services in the Virtual Research Environment 7m 31/3-004 - IT Amphitheatre
The ESCAPE Virtual Research Environment (VRE) is a Jupyter-based platform that enables interactive analysis and seamless integration with various external services, allowing users to create an end-to-end scientific analysis. The VRE represents one of the key results delivered by the ESCAPE project; its development was, and still is, driven by the scientific needs of the ESCAPE communities, addressed together with a collaborative, open-source approach. The VRE is the first analysis facility prototype to grant seamless, concurrent access to three CERN services via plugins called extensions: Rucio, a scientific data management software; REANA, an analysis framework focused on reproducibility and re-analysis; and Zenodo, a multi-purpose repository that enables researchers to share and preserve research outputs of any size, in any format and from any discipline. One fundamental feature that the VRE needs is robust, seamless and user-transparent integration of all its extensions. The successful candidate would develop an OpenID Connect (OIDC) hook to integrate two of the VRE JupyterLab extensions with the ESCAPE Identity and Access Management (IAM) provider. This feature will allow any user to be connected and identified with the REANA and Zenodo services immediately after successfully logging into the VRE. Lowering the barrier to entry for inexperienced users and reducing the technical tasks needed to achieve an operational environment will ensure that scientists can perform better research with less technical overhead.
Speaker: Tomas Ondrejka -
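A hedged sketch of one way such an OIDC hook can work: an OAuth 2.0 token-exchange call (RFC 8693) that turns the user's IAM login token into a service-scoped token for a downstream service; the token endpoint, client credentials and audience values below are placeholders, not the actual ESCAPE IAM or VRE configuration:

```python
# Generic RFC 8693 token-exchange request; all identifiers are hypothetical.
import requests

TOKEN_ENDPOINT = "https://iam.example.org/token"     # hypothetical IAM token endpoint

def exchange_token(user_access_token: str, audience: str) -> str:
    data = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_access_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": audience,                         # e.g. the Reana or Zenodo API
    }
    resp = requests.post(TOKEN_ENDPOINT, data=data, auth=("vre-client-id", "vre-client-secret"))
    resp.raise_for_status()
    return resp.json()["access_token"]

# Usage: the extension would exchange the VRE login token once per service at start-up,
# so the user never handles service credentials directly.
# reana_token = exchange_token(session_token, audience="reana")
```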
16:05
→
16:30
https://indico.cern.ch/event/1508441/surveys/
-
16:30
→
17:00
Certificate handout ceremony by the CERN openlab CTO team 30m Restaurant 2
-
17:00
→
19:00
Food & Drinks 2h Restaurant 2
-