On Monday 12 and Tuesday 13 August, the CERN openlab 2024 summer students will present their work at two dedicated public Lightning Talk sessions.
In a 5-minute presentation, each student will introduce the audience to their project, explain the technical challenges they have faced and describe the results of what they have been working on for the past two months.
It will be a great opportunity for the students to showcase the progress they have made so far, and for the audience to learn about a variety of information-technology projects, the solutions the students have come up with, and the future challenges they have identified.
This talk introduces WebAssembly (Wasm) and its innovative integration at CERN, emphasizing its potential to enhance computational efficiency in server environments. We will cover the fundamental concepts of WebAssembly, its potential incorporation into CERN's diverse computational ecosystem through Kubernetes, and present compelling benchmarks that demonstrate its performance benefits. Discover how Wasm is shaping the future of web and scientific computing, providing a versatile and efficient runtime for various applications.
The Tomcat service has been operating in Kubernetes for the past four years. With the advent of new technologies, deployment methods have also evolved. In particular, some applications and components of the current infrastructure are already managed declaratively via Terraform and a GitOps controller. This project aims to explore how ClusterAPI and Crossplane can employ a declarative approach to manage and deploy Kubernetes clusters in a cloud-native way, simplifying disaster recovery. Another important aspect of the project is interoperability; the proposed solution should be developed and tested both on-premise and on Oracle Cloud Infrastructure, further enhancing disaster recovery.
The Oracle service at CERN manages complex database setups involving clusters and multiple nodes. This project aims to detect discrepancies between the running setup and the metadata repositories that store configuration elements: the system needs to compare OEM information, run-time information, and configuration-management data stored in Syscontrol LDAP. Discrepancies may lead to service instability or unexpected behavior in the environment, and maintenance or monitoring scripts can also be affected. It is therefore crucial to detect misconfigurations, such as incorrect Oracle Home paths, in an efficient manner.
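The core of such a check can be sketched as a comparison of configuration snapshots drawn from two sources. The sketch below is purely illustrative — the keys, paths, and values are hypothetical, not real CERN configuration data:

```python
# Illustrative sketch: compare configuration snapshots from two sources
# (e.g. OEM run-time data vs. an LDAP-backed repository) and report
# per-key discrepancies such as a mismatched Oracle Home path.
# All keys and values below are hypothetical examples.

def find_discrepancies(runtime: dict, repository: dict) -> dict:
    """Return {key: (runtime_value, repository_value)} for every key
    that is missing on one side or differs between the two sources."""
    diffs = {}
    for key in runtime.keys() | repository.keys():
        a, b = runtime.get(key), repository.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs

runtime = {"oracle_home": "/u01/app/oracle/19c", "nodes": 3}
repository = {"oracle_home": "/u01/app/oracle/18c", "nodes": 3}
print(find_discrepancies(runtime, repository))
# → {'oracle_home': ('/u01/app/oracle/19c', '/u01/app/oracle/18c')}
```

A real implementation would pull these dictionaries from OEM and Syscontrol LDAP queries rather than literals, but the comparison logic stays the same.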
The OpenSearch service at CERN has been operating since 2016 on Puppet-managed servers. We currently manage 122 OpenSearch and OpenDistro clusters powered by a pool of 156 powerful physical machines running AlmaLinux 9. In the current architecture, multiple clusters live on the same host: resource isolation (e.g., CPU, RAM) is achieved at the process level, and disk space is managed with LVM. This deviates from the standard OpenSearch deployment. The objective of this project is to become familiar with and explore a containerized deployment of OpenSearch, using the standard Docker images on the CERN IT Kubernetes Service.
Using the Grafana Oracle Enterprise Manager plugin, create comprehensive dashboards dedicated to monitoring the health, performance, and configurations of Oracle services. These dashboards should provide clear and detailed visualizations of various metrics to ensure efficient tracking and management. Additionally, implement unified alerting systems to promptly notify relevant stakeholders of any service outages or performance incidents.
CouchDB is a document-oriented NoSQL database used as a metadata storage solution for deploying Java-based applications into Kubernetes clusters. This project focused on enhancing the existing CouchDB deployment by integrating advanced features such as CERN single sign-on (SSO) integration, granular access management, data replication between databases, schema validation, and clustering.
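The schema-validation idea can be illustrated with a minimal client-side check. Note that CouchDB's native mechanism for this is a `validate_doc_update` function written in JavaScript inside a design document; the Python sketch below, with made-up field names, only shows the shape of the check:

```python
# Hypothetical sketch of a metadata-document schema check performed
# before writing to the database. Field names are illustrative, not
# the actual schema used in the project.

REQUIRED = {"app_name": str, "version": str, "replicas": int}

def validate_doc(doc: dict) -> list:
    """Return a list of human-readable schema violations (empty if valid)."""
    errors = []
    for field, expected in REQUIRED.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    return errors

print(validate_doc({"app_name": "inventory", "version": "1.2.0"}))
# → ['missing field: replicas']
```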
Gathering, processing, and storing the logs produced by operations inside our clusters is vital for debugging, diagnostics, and security.
In that context, we will discuss how log collection inside CERN OKD Kubernetes clusters works, the challenges the system currently faces, and possible alternatives. We discuss our research on FluentD, Vector, and FluentBit, as well as on Operators, DaemonSets, and architectural approaches. We present a small-scale implementation and a proof of concept that can be leveraged for real production use at CERN.
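At its core, each of these collectors performs the same per-record work: parse a log line emitted by the container runtime and enrich it with cluster metadata before shipping it. A minimal sketch, with illustrative field names rather than the exact schema used at CERN:

```python
# Toy sketch of the per-record work a log collector (FluentD, Vector,
# FluentBit) performs: parse a JSON log line from a container runtime
# and enrich it with cluster metadata. Field names are illustrative.
import json

def enrich(line: str, cluster: str, namespace: str) -> dict:
    record = json.loads(line)
    record.update({"cluster": cluster, "namespace": namespace})
    return record

raw = '{"time": "2024-08-12T10:00:00Z", "log": "request served", "level": "info"}'
print(enrich(raw, cluster="okd-prod", namespace="monitoring"))
```

Production collectors add buffering, retries, and output plugins on top of this parse-and-enrich loop, which is where the architectural trade-offs between the three tools mostly lie.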
Detecting anomalies in the performance of WLCG compute nodes is challenging and has been addressed in various ways over the years. Recently, the HEPiX Benchmarking Group introduced a novel approach using the HEP Benchmark Suite to validate WLCG compute node performance in relation to fabric metrics. By running the HEP Benchmark Suite on the grid, a large dataset of performance metrics has been collected. Given the volume of metrics, advanced machine learning techniques can enhance analysis. The project aims to assess how these metrics impact job-slot performance, and delves into benchmarking computing resources with HEPscore23 and developing machine-learning models for metric validation, parameter-correlation analysis, and anomaly detection.
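One of the simplest baselines for this kind of anomaly detection is a robust outlier rule on a benchmark score: flag nodes far from the fleet median in units of the median absolute deviation. This toy sketch (with invented node names and scores) is only a baseline — the project's ML models go well beyond it:

```python
# Toy baseline for flagging anomalous compute nodes: mark any node
# whose benchmark score deviates from the fleet median by more than
# k median-absolute-deviations. Node names and scores are invented.
from statistics import median

def flag_outliers(scores: dict, k: float = 3.0) -> list:
    vals = list(scores.values())
    med = median(vals)
    mad = median(abs(v - med) for v in vals)   # robust spread estimate
    return [node for node, s in scores.items() if abs(s - med) > k * mad]

scores = {"node01": 17.1, "node02": 16.8, "node03": 17.3,
          "node04": 9.5,  "node05": 17.0}
print(flag_outliers(scores))
# → ['node04']
```

The median/MAD pair is preferred over mean/standard deviation here because a single badly degraded node would otherwise inflate the spread enough to mask itself.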
High-performance computing (HPC) storage systems are critical to the efficient execution of high-energy physics (HEP) analyses, which involve processing vast amounts of data generated by particle accelerators like the Large Hadron Collider (LHC). This project evaluates various HPC storage solutions in terms of their performance, scalability, and suitability for HEP data processing workflows. We examine traditional parallel file systems, emerging object storage technologies, and hybrid approaches to identify their strengths and limitations. Performance metrics such as I/O throughput, latency, and data integrity are assessed using real-world HEP analysis scenarios.
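As a back-of-the-envelope illustration of one of the metrics above, sequential write throughput can be measured with nothing but the standard library. A real storage evaluation would of course also cover read paths, latency percentiles, concurrency, and data integrity, typically with dedicated tools such as fio or IOR:

```python
# Minimal sketch: measure sequential write throughput to local storage.
# Only one of the metrics mentioned above; purely illustrative.
import os
import tempfile
import time

def write_throughput_mb_s(total_mb: int = 64, block_kb: int = 1024) -> float:
    block = os.urandom(block_kb * 1024)          # one write block
    blocks = total_mb * 1024 // block_kb
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                      # include flush-to-disk time
        elapsed = time.perf_counter() - start
    return total_mb / elapsed

print(f"{write_throughput_mb_s():.1f} MB/s")
```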
Koji is a system for building and tracking binary RPMs, Cloud images, and Docker images. This presentation will cover the improvement of the pipeline developed by the Linux team to automate and verify the most common Koji operations for every new release, ensuring smooth and reliable upgrades. It will also cover a second project: the OpenSearch dashboards, which were moved to Grafana due to index changes.
This presentation explores innovative solutions to enhance the data-processing capabilities of the CMS Level-1 Scouting System. With the High-Luminosity LHC upgrade and the increase in data volumes, advanced techniques for trigger-event selection and data processing are required. For this reason, we are evaluating Compute Express Link (CXL) enabled devices, focusing on latency and bandwidth measurements and addressing memory-management challenges. By combining CXL cache coherency and the Fabric-attached memory file system (Famfs), we can provide disaggregated shared files and multi-host memory management. The integration of these technologies is ongoing within the SCDAQ software for demonstration purposes.
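A loose single-host analogy for the Famfs idea is two independent mappings of the same file seeing each other's writes through shared memory rather than through read()/write() I/O; Famfs extends this to CXL fabric-attached memory shared between hosts. A sketch of the single-host case:

```python
# Single-host analogy only: two mmap views of one file share memory,
# so a write through one view is immediately visible through the other.
# Famfs generalizes this picture to memory shared across hosts over CXL.
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)                       # fixed-size backing "memory"

with open(path, "r+b") as f1, open(path, "r+b") as f2:
    view_w = mmap.mmap(f1.fileno(), 4096)    # writer's view
    view_r = mmap.mmap(f2.fileno(), 4096)    # reader's view
    view_w[:5] = b"event"                    # write through one mapping
    print(view_r[:5])                        # → b'event'
    view_w.close()
    view_r.close()

os.close(fd)
os.unlink(path)
```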
CERN's VRE (Virtual Research Environment) is an analysis facility that allows users to combine a distributed data-lake infrastructure (Rucio) with a distributed software service (CVMFS) to run their scripts and workflows on a reproducibility-oriented scientific analysis platform (REANA). Having the right middleware in place simplifies the use of these services, allowing scientific users to focus on their research instead of dealing with increasingly complex IT infrastructures. In this project, we will develop a JupyterLab extension that connects the REANA-VRE instance with the VRE Jupyter frontend through the REANA API. The extension will allow users to connect to their REANA account, display their workflows, and retrieve, interact with, and bring into the Jupyter frontend file system the data and results of the different REANA workflows.
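Under the hood, such an extension issues HTTP requests against the REANA REST API. In the sketch below, the endpoint path and token parameter are assumptions based on REANA's public API and should be verified against the deployed server version; the server URL and token are placeholders, and no network call is actually made:

```python
# Hypothetical sketch: build (but do not send) a request listing a
# user's workflows via the REANA REST API. Endpoint path and
# "access_token" parameter are assumptions to verify against the
# deployed REANA version; server URL and token are placeholders.
from urllib.parse import urlencode
from urllib.request import Request

def list_workflows_request(server: str, token: str) -> Request:
    query = urlencode({"access_token": token})
    return Request(f"{server}/api/workflows?{query}", method="GET")

req = list_workflows_request("https://reana.example.cern.ch", "MY_TOKEN")
print(req.full_url)
# → https://reana.example.cern.ch/api/workflows?access_token=MY_TOKEN
```

The extension would send such requests from the JupyterLab backend and render the returned workflow list in the frontend.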
This project aims to create a comprehensive digital twin of CERN's Large Hadron Collider (LHC) using NVIDIA's Omniverse platform. By leveraging advanced 3D modeling, real-time simulation, and collaborative tools, we will develop a highly accurate virtual representation of the world's largest and most powerful particle accelerator. This digital twin will enable researchers, engineers, and scientists to visualize, analyze, and interact with the LHC in a virtual environment, facilitating enhanced understanding, improved operational efficiency, and accelerated scientific discovery. The project will showcase the potential of cutting-edge visualization technology in advancing particle physics research and complex machine operations.
The LHCb collaboration is currently using a pioneering data-filtering approach in its trigger system, based on real-time particle reconstruction on Graphics Processing Units (GPUs). This corresponds to processing 5 TB/s of information and has required a huge amount of hardware and software development. In this context, power consumption and sustainability are imperative matters in view of the next high-luminosity era of the LHC collider, which will greatly increase the output data rate. In this talk, we show some of the proposals that can be considered to optimize energy usage, in terms of both the computing architectures and the efficiency of the algorithms running on them.
The extensive program of high-energy physics experiments relies on Monte Carlo simulation, which can be used to test hypotheses about the underlying distribution of the data. The need for fast and large-scale simulated samples for HEP experiments motivates the development of new simulation techniques, particularly those based on neural network models. Machine-learning models are studied mainly in the context of fast shower simulation, as calorimeters are typically the most time-consuming detectors to simulate. Both accuracy and performance need to be validated before their implementation in production software. The project will focus on improving the accuracy of the fast shower simulation, comparing different machine-learning solutions with a focus on low-energy particle showers.
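One elementary accuracy-validation step is to compare a binned observable from the fast simulation against the full-simulation reference with a chi-square statistic. The sketch below uses invented histogram counts and is only a minimal illustration — production validation uses far richer metrics:

```python
# Minimal sketch of one validation step: binned chi-square between a
# full-simulation reference histogram and a fast-simulation histogram
# (Poisson errors on the reference counts; empty bins skipped).
# Histogram contents below are invented, purely for illustration.

def chi2_per_bin(reference: list, fast: list) -> float:
    terms = [(f - r) ** 2 / r for r, f in zip(reference, fast) if r > 0]
    return sum(terms) / len(terms)

full_sim = [120.0, 340.0, 510.0, 330.0, 100.0]   # reference histogram
fast_sim = [118.0, 352.0, 495.0, 335.0, 104.0]   # ML fast-sim histogram
print(round(chi2_per_bin(full_sim, fast_sim), 3))
# → 0.227
```

Values near 1 indicate agreement compatible with the statistical errors; the low-energy bins flagged in the project are exactly where such per-bin comparisons tend to degrade.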
The study uses the Run-3 data scouting demonstrator to refine muon track reconstruction in the barrel region at the Level-1 trigger. By applying machine learning algorithms designed for FPGA deployment, the study improves online reconstruction of muon parameters using stub-only data.