On Monday 12th and Tuesday 13th of August, the CERN openlab 2024 summer students will present their work at two dedicated public Lightning Talk sessions.
In a 5-minute presentation, each student will introduce the audience to their project, explain the technical challenges they have faced and describe the results of what they have been working on for the past two months.
It will be a great opportunity for the students to showcase the progress they have made so far, and for the audience to learn about a variety of information-technology projects, the solutions the students have come up with, and the potential future challenges they have identified.
Due to the intrinsic quantum mechanical nature of chemistry and materials science, applications in these fields are among the strongest candidates for algorithmic quantum advantage. However, to understand the problem size at the break-even point between quantum and classical solutions, one requires large-scale simulations of quantum circuits. Here, we propose to develop and investigate an emulation approach that reduces the cost of simulating Trotterized Hamiltonian evolution (a very common subroutine of many chemistry applications) by efficiently emulating multi-qubit rotations. To assess the practical impact and efficiency of our emulation strategy, we are conducting comprehensive benchmarks comparing the performance and resource utilization of emulated multi-qubit rotations against their decomposition into single- and two-qubit rotations. Following this evaluation, we aim to explore ways to integrate the emulation strategy into the Intel Quantum SDK or other quantum computing frameworks to enhance their efficiency and accessibility for quantum chemistry computations.
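To illustrate why multi-qubit rotations can be emulated cheaply, consider the common case of a rotation generated by a tensor product of Pauli-Z operators (an assumption for this sketch, not a claim about the project's exact implementation): the operator is diagonal in the computational basis, so applying it to a state vector is just a parity-dependent phase per amplitude, with no matrix construction or gate decomposition needed.

```python
import cmath

def rz_multi(state, qubits, theta):
    """Emulate exp(-i*theta/2 * Z x ... x Z) acting on the given qubits.

    The operator is diagonal, so each basis-state amplitude only picks up
    a phase whose sign depends on the parity of the bits on `qubits`.
    """
    out = []
    for idx, amp in enumerate(state):
        parity = sum((idx >> q) & 1 for q in qubits) % 2
        sign = 1 if parity == 0 else -1
        out.append(amp * cmath.exp(-1j * sign * theta / 2))
    return out
```

A decomposition into single- and two-qubit gates would instead require a ladder of CNOTs around a single-qubit Rz; the emulated version touches each amplitude exactly once.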
CERN poses a difficult challenge: with thousands of devices in charge of controlling critical systems, distributed across multiple departments, how can one effectively monitor their state and minimise response time in critical situations? One approach is to form a tree structure, with devices as leaves and computation nodes that decide whether to propagate an error from their children upwards. Starting from a prototype of such a tool, my supervisor and I have built an intuitive interface for designing this structure, and a distributed system that runs the tree and displays results in real time. While the main goal has been large-scale device monitoring, we focused on making the tools generic enough to be potentially applicable to other domains.
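The tree-based propagation idea can be sketched in a few lines; the class and the threshold rule below are illustrative assumptions, not the project's actual decision logic:

```python
class Node:
    """A node in the monitoring tree: leaves are devices, inner nodes
    decide whether an error from their children propagates upwards."""

    def __init__(self, name, children=None, threshold=1):
        self.name = name
        self.children = children or []
        self.error = False          # leaf state, set by the device
        self.threshold = threshold  # min. failing children to propagate

    def evaluate(self):
        # A leaf reports its own state; an inner node aggregates its
        # children and propagates only if enough of them are failing.
        if not self.children:
            return self.error
        failing = sum(child.evaluate() for child in self.children)
        return failing >= self.threshold
```

Evaluating the root then answers, in one pass, whether any monitored subsystem needs attention.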
The primary objective of this project is to evaluate the integration of Apache Knox, an open-source gateway for securing and centralizing access to multiple Hadoop clusters, with the CERN Single Sign-On infrastructure. The goal is to enhance security and streamline the authentication process for accessing distributed data from the Hadoop and HBase clusters hosted centrally at CERN. A successful integration of Apache Knox with CERN SSO can provide CERN with a robust and secure solution for managing access to distributed data, thereby improving data security.
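Knox delegates authentication through provider configuration in a topology file; one common way to federate to an external OIDC identity provider is Knox's pac4j federation provider. The fragment below is a generic sketch of that pattern, with placeholder hostnames and secrets, and is not CERN's actual configuration:

```xml
<topology>
  <gateway>
    <provider>
      <role>federation</role>
      <name>pac4j</name>
      <enabled>true</enabled>
      <param>
        <name>pac4j.callbackUrl</name>
        <value>https://knox.example.org/gateway/knoxsso/api/v1/websso</value>
      </param>
      <param>
        <name>clientName</name>
        <value>OidcClient</value>
      </param>
      <param>
        <name>oidc.id</name>
        <value>knox-client-id</value>
      </param>
      <param>
        <name>oidc.secret</name>
        <value>placeholder-secret</value>
      </param>
      <param>
        <name>oidc.discoveryUri</name>
        <value>https://sso.example.org/.well-known/openid-configuration</value>
      </param>
    </provider>
  </gateway>
</topology>
```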
In this study, we explore the effect of tuning hyperparameters to achieve the best model performance on an environmental AI-based model for drought prediction in the Alps, developed by EURAC Trento in the context of the interTwin project. We begin with a brief introduction to both interTwin and the environmental use case, followed by a comparative analysis of different hyperparameter combinations, showcasing their impact on model accuracy, computational efficiency, and overall predictive capability.
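Comparing hyperparameter combinations exhaustively amounts to a grid search; a minimal sketch (the abstract does not say which search strategy is used, so this is an illustrative assumption):

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Try every hyperparameter combination and keep the best.

    grid: dict mapping parameter name -> list of candidate values
    train_and_score: callable(params) -> validation score (higher is better)
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice the scoring callable would train the drought model and return a validation metric; more sample-efficient strategies (random search, Bayesian optimization) follow the same interface.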
ROOT's modern analysis interface, RDataFrame, is designed to meet the high computing demands of the forthcoming High-Luminosity LHC (HL-LHC). In this project, the essential aspects of LHC physics analysis workflows are showcased through an RDataFrame implementation of the Analysis Grand Challenge, executed within a distributed computing environment at CERN. We investigate the robustness and scalability of modern HEP data analysis workflows, as well as the processing capabilities of the distributed computing environments necessary for future LHC physics experiments. A first exploration of the recently released CMS open data is also foreseen as part of the project [1]. [1] The CMS Collaboration (2024) CMS releases 13 TeV proton collision data from 2016. Available at: https://opendata.cern.ch/docs/cms-releases-2016data-2024 (Accessed: 24 July 2024)
In today’s large-scale environments, keeping systems healthy and performing at their best is crucial. This work presents a practical approach to system monitoring that combines various data sources and methods. It explores how service logs and Prometheus metrics provide detailed insights into operational issues and their effects on system performance, while also enabling real-time monitoring and alerts via email. The project uses model-training techniques to spot problems before they affect system health. With it, organizations can proactively identify and address anomalies by applying data-driven, unsupervised learning methods, leading to better system performance. We also integrate Kubeflow, an open-source platform for managing machine-learning workflows, to simplify the deployment of these models. This combined approach provides a framework for monitoring and optimizing large-scale environments, ensuring data systems remain resilient and effective as operational demands evolve.
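The simplest unsupervised anomaly detector over a metric series is a z-score test: flag samples far from the series mean in units of its standard deviation. This sketch is illustrative of the general technique, not the project's actual models:

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Return indices of samples deviating more than `threshold`
    standard deviations from the mean of the series."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]
```

Real deployments would compute the statistics over a sliding window of Prometheus samples and feed flagged indices to the alerting pipeline.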
The project's goal is to use AI to streamline the CMS paper peer-review process by improving the standardization, readability, and overall efficiency of scientific papers, ultimately making the CMS-internal peer review faster. We are developing an AI-driven solution using open-source LLaMA Large Language Models (LLMs), fine-tuning them with the Parameter-Efficient Fine-Tuning (PEFT) approach on over 1300 published LaTeX source documents from the CMS experiment.
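PEFT methods such as LoRA (one common PEFT variant; the abstract does not specify which is used) freeze the pretrained weight matrix and learn only a low-rank update, drastically reducing the number of trainable parameters. The arithmetic, in a minimal dependency-free sketch:

```python
def lora_forward(W, A, B, x, alpha=16):
    """Compute y = (W + (alpha/r) * B @ A) @ x without materializing
    the full update: W stays frozen, only the small A and B are trained.

    W: d_out x d_in (frozen), B: d_out x r, A: r x d_in, x: length d_in.
    """
    r = len(A)
    scale = alpha / r
    # Base path through the frozen pretrained weights.
    y = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
    # Low-rank path: A projects down to r dimensions, B projects back up.
    ax = [sum(A[k][j] * x[j] for j in range(len(x))) for k in range(r)]
    for i in range(len(W)):
        y[i] += scale * sum(B[i][k] * ax[k] for k in range(r))
    return y
```

For a LLaMA-scale model the same idea is applied per attention projection, so only the A and B factors need to be stored and optimized.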
CERN has substantial interest in the use of digital twins for many technological, engineering, and scientific improvements. In this case, the digital twin assists with planning interventions. Using NVIDIA Omniverse, a 3D simulation and programming engine, a large-scale 3D digital twin of the CERN accelerator complex was created and simulated. This was done using CERN CAD data converted to the Universal Scene Description (USD) format. These models and data were used in the digital twin to create interactive tools for intervention planning and simulation, including navigating through areas, playing animations, analyzing access levels, and simulating intervention size and weight.
This project implements a disaster recovery system that replicates backups to long-term Oracle Cloud Object Storage. The objective is to protect against threat actors and primary-backup failures through geographic separation, data immutability, and faster retrieval times than tape storage.
Kubernetes revolutionized the way we manage storage with the Container Storage Interface (CSI), enabling seamless integration with block and file systems. But what about object storage? Enter the Container Object Storage Interface (COSI). In this talk, we'll explore how COSI brings object storage management to Kubernetes, making it as effortless as handling persistent volumes. We'll delve into its practical applications with S3, and how CERN is leveraging COSI with Ceph/RGW to enhance OpenShift PaaS and Kubernetes services. Join us for an insightful journey into the future of Kubernetes storage!
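In the COSI model, requesting a bucket looks much like requesting a persistent volume: a claim references a class that encodes how the backend (here, Ceph/RGW) should provision it. A generic sketch using the COSI v1alpha1 API; the resource names and class are illustrative, not CERN's actual configuration:

```yaml
# A BucketClaim requests an object-storage bucket much like a PVC
# requests a volume.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClaim
metadata:
  name: my-app-bucket
spec:
  bucketClassName: ceph-rgw-s3   # backed by a BucketClass for Ceph/RGW
  protocols:
    - s3
```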
In response to the significant engineering challenges hindering scientific research, CERN and European research institutes are pioneering a standardized digital twin framework to streamline and enhance the efficiency of scientific workflows. This talk will detail the development of advanced machine learning (ML) solutions at CERN, focusing on distributing model training and automating MLOps across high-performance computing (HPC) infrastructures. Key initiatives include integrating distributed deep learning and hyper-parameter optimization tools into traditional ML training workflows for optimal deployment and management. As a summer student, I am involved in applying and validating the ML Digital Twin Framework (itwinai) across diverse digital twin applications, from high-energy physics to environmental monitoring. This presentation will highlight my experiences with leading ML frameworks such as PyTorch, showcasing their impact on accelerating scientific innovation in digital twin technology.
The objective of this project is to identify the most suitable lightweight open-source large language model (LLM) for code summarization in the domain of industrial control systems. The focus is on selecting models capable of running on a single GPU with less than 40 GB of memory while providing fast, accurate summaries. The analysis showed promising results, demonstrating the viability of deploying these models cost-effectively for real-world applications. Moreover, fine-tuning efforts, particularly with Llama 3.1 8B, highlighted the potential of these models to deliver context-aware responses.
AdaptivePerf is an open-source, comprehensive, and low-overhead code profiler. It is based on Linux perf and can profile both on-CPU and off-CPU activity, producing thread/process trees and both non-time-ordered and time-ordered flame graphs by tracing the relevant syscalls.
My objective for the internship was to develop a metric profiler that can sample different continuous signals at runtime. It can draw on various external tools, recording their data for display alongside the time-ordered flame graphs.
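At its core, such a metric profiler polls an external signal source at a fixed interval and records timestamped samples that can later be aligned with the time-ordered flame graphs. A minimal sketch; the function and parameter names are illustrative, not AdaptivePerf's actual API:

```python
import time

def sample_metric(read_value, interval_s, n_samples):
    """Poll a metric source and record (timestamp, value) pairs.

    read_value: zero-argument callable wrapping the external tool
                (e.g. a sysfs read or a command's parsed output).
    """
    samples = []
    for _ in range(n_samples):
        # Monotonic timestamps allow alignment with profiler timelines.
        samples.append((time.monotonic(), read_value()))
        time.sleep(interval_s)
    return samples
```

In the real profiler, each signal source would run in its own sampler and stream into the visualization alongside the flame-graph timeline.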
LxPlus is a cluster of Linux machines at CERN used on average by over 1,000 users each day. Software distribution on LxPlus is done using CVMFS, a file system that serves as a scalable, reliable, and low-maintenance software distribution service. The goal of my project was to explore CVMFS usage data on LxPlus and to check hypotheses put forward by the CVMFS developers.
The Agassi model simulates the behaviour of a two-shell nuclear system in which both pairing and quadrupole interactions are present. Despite the simplicity of its Hamiltonian, it admits a very rich phase diagram, and there are currently no known methods to solve the model analytically at nonzero temperature. The aim of our project is to locate the phase transitions in the case of a finite model. To achieve this, we combine classical and quantum anomaly-detection techniques with algorithms that make it possible to find thermal states in a computationally feasible way.