28th Conference on Computing in High Energy and Nuclear Physics (CHEP 2026)
Chulalongkorn University

Updates:
| 5 April |
+ Given the current travel situation, we are happy to extend the early-bird rate until the end of the registration period. We hope this helps you save some costs.
+ Invitation and visa support letters have currently been issued up to registration ID 554. If you have not yet received your letter, or if you require any additional documents, please contact the LOC by email.
| 2 April |
+ The timetable is now available.
| 30 March |
+ Early-bird registration has been extended to 5 April, 23:59 (Bangkok time).
| 28 March |
+ To help you prepare your contribution, guidelines are available under Practical Information >> Prepare your contributions.
+ Share your plans before/during/after the conference under Information >> Share your plan.
+ If your payment has been made via TPP or bank transfer, please note that it may take at least one week (or longer) for us to receive confirmation. We currently request status updates from the bank twice a week. Please do not be concerned if your transfer has been completed but the status has not yet been updated. If any issue occurs, the funds will be returned to your account. We will also honor the early-bird rate should you need to retry the payment.
| 25 March |
+ Payment with card: status update for participants who paid by 12:14 AM on 25 March 2026 (Bangkok time). If your payment is successful but your registration is still pending, please contact the LOC.
+ Even if you (or your management team) make the payment directly via TPP or bank transfer, please ensure that you upload the payment slip or a screenshot in the payment portal. Selecting “Bank Transfer” will guide you to the page where you can upload the supporting document.
| 24 March |
+ Book your ticket with Thai Airways with special offers here.
| 23 March |
+ Invitation and visa support letters have been issued up to registration ID 248; the remaining letters will follow this week.
+ For CERN, you may pay with TPP (Third Party Payment) using
| 9 March |
+ All CHEP emails and contact names are available here (or under Contact in the left menu).
+ We are starting to send out invitation letters. For an invoice or any special document, please contact the LOC (chep2026-info[-AT-]chula.ac.th). Please note that if you are not listed on any abstract and do not use an institutional email address (university or research institute), the LOC cannot issue you an invitation letter.
+ Regarding the visa application letter, we will provide it only for participants whose nationalities are not eligible for visa exemption when entering Thailand. Please check the details with the Thai Embassy in your country or on the visa information page here.
+ The LOC is closely monitoring the situation in the Middle East. For participants traveling from EU countries, we recommend taking a direct flight from a major European city to Bangkok. For those traveling from the United States, you may choose to fly either over the Pacific (possibly with a connection in Japan, China, Taiwan, or Korea) or over the Atlantic (with a connection in a major European city).
The Conference on Computing in High Energy and Nuclear Physics (CHEP) is a long-standing conference series that began in 1985. It addresses computing, networking, and software challenges for the world’s leading data-intensive scientific experiments, which currently analyze hundreds of petabytes of data using globally distributed resources. CHEP offers a unique platform for particle and nuclear physics computing experts to come together, share experiences, and learn from one another. The conference typically attracts over 500 participants from around the world.
The focus of CHEP evolves over time to reflect advances in technology and the changing needs of scientific research. In 2026, CHEP will take place during the final week of Run 3 operations at the LHC. As the community prepares for the High-Luminosity LHC (HL-LHC), the conference will cover a wide range of emerging topics in both infrastructure and software. Reflecting broader trends, CHEP 2026 will also include sessions on sustainable computing, such as green data centers, energy consumption, and the role of AI/ML in operations.
Although the name CHEP refers to High Energy and Nuclear Physics, the conference welcomes contributions from a wide range of data-intensive scientific disciplines. It provides an excellent opportunity for interdisciplinary exchange, including, but not limited to, topics in big data applications for astronomy, biology, medicine, and quantum computing. Experiences and contributions from High Performance Computing (HPC) centers are also very welcome, especially where they intersect with challenges in large-scale data processing, simulation, and infrastructure design.
The conference program features plenary talks, parallel sessions, and poster presentations. Peer-reviewed proceedings are published following the event. The nine parallel session tracks are designed to encourage deep technical discussions and community engagement across specific domains:
- Track 1 – Data and Metadata Organization, Management, and Access
- Track 2 – Online and Real-time Computing
- Track 3 – Offline Data Processing
- Track 4 – Distributed Computing
- Track 5 – Event Generation and Simulation
- Track 6 – Software Environment and Maintainability
- Track 7 – Computing Infrastructure and Sustainability
- Track 8 – Analysis Infrastructure, Outreach, and Education
- Track 9 – Analysis Software and Workflows
The CHEP 2026 organizers are committed to fostering a supportive, inclusive, and diverse environment. We warmly welcome full participation from the entire community and strongly encourage students, early-career researchers, and underrepresented groups to attend and contribute.
CHEP 2026 will be hosted by Chulalongkorn University in Bangkok, Thailand, from 25 to 29 May 2026.
-
-
Opening Ceremony
-
1
Welcome
Speaker: Phat Srimanobhas (Chulalongkorn University (TH))
-
2
HEP and scientific computing future
The high-energy physics (HEP) community is preparing to address the computing challenges of the coming decade. The upgrade program of the Large Hadron Collider at CERN (HL-LHC) will generate an unprecedented volume and complexity of data, requiring advanced solutions for processing, analysis, archiving, and simulation. In parallel, other HEP experiments, such as DUNE, will enter their data-taking phase with novel workflows that demand substantial computing support. The recent update of the European Strategy for Particle Physics recommends a circular electron–positron collider (FCC-ee) at the TeV scale as the next flagship of the CERN scientific program following the HL-LHC. The community is assessing the computing landscape needed for the next generation of HEP projects such as FCC-ee. This contribution provides an overview of the current state of the art in HEP computing, outlines the steps required to meet the HL-LHC computing challenges, and offers a forward-looking perspective on the post–HL-LHC era. It also highlights common computing challenges shared with other scientific domains and explores potential synergies.
Speaker: Simone Campana (CERN) -
3
From quantum computing and quantum algorithms to high energy physics
Quantum hardware has made striking progress, and I will open with a brief theorist’s snapshot of where today’s devices stand: what current qubit platforms can do reliably and what the roadmaps of leading providers suggest for the next few years. The central theme of the talk, however, is the field’s biggest open challenge: finding compelling uses—problems where quantum devices can produce real scientific value with realistic resources. To organize that search, I will highlight a small set of directions that look most promising. These include simulating quantum systems in real time (with an eye toward lattice models and gauge theories), computing electronic structure for molecules and materials, improving measurement and sensing through quantum techniques, and a few carefully chosen machine-learning tasks where quantum methods might help with sampling or representation. A key message will be that progress depends not only on new algorithms, but also on “translation”: turning ideas into complete workflows that fit the constraints of actual devices, use data in practical ways, and can be tested fairly against the best classical approaches. I will close by connecting these themes to high-energy physics, outlining where quantum computing could complement existing tools and, just as importantly, how high-energy physics can help steer the field by providing hard benchmarks and realistic problem settings.
Speaker: Zoe Holmes (EPFL)
-
-
4
Your Survival Guide to CHEP 2026
-
10:40
Break
-
Plenary
-
5
Q/A for opening plenary session talks
-
6
From Global HPC Trends to National Impact: The Role of HPC in Thailand’s Research Ecosystem
High Performance Computing (HPC) has long been a cornerstone of large-scale scientific discovery. Today, its role is evolving beyond traditional simulation-driven workloads toward a broader paradigm that integrates data-intensive computing and also artificial intelligence, particularly large language models (LLMs). This transformation is reshaping how HPC systems are designed and deployed.
This talk offers a high-level overview of the current HPC landscape, focusing on system-level integration that brings together traditional HPC with modern technologies such as AI and quantum computing. While looking ahead to future directions, the talk also provides insights into the current landscape in Thailand, with particular focus on ThaiSC as a national supercomputing infrastructure provider, highlighting real-world examples and usage patterns.
Selected use cases within Thailand illustrate how HPC supports scientific research and emerging applications in AI and industry. These examples reflect a transition of HPC from a specialized research tool to a foundational platform for research and innovation. HPC is an essential infrastructure for the future of research across disciplines.
Speaker: Dr Krich Nasingkun (Thailand Supercomputer Center (ThaiSC)) -
7
Streaming Data Acquisition for Nuclear Physics Experiments: Standardization Activities of the SPADI Alliance
The rapid evolution of detector technologies and increasing beam intensities in nuclear physics experiments are driving a paradigm shift in data acquisition (DAQ) systems, from conventional trigger-based schemes to streaming-readout architectures. Challenges associated with trigger generation in complex detector systems, as well as the growing data throughput and trigger rates, are becoming increasingly common across nuclear physics facilities.
Streaming readout provides an effective solution to these challenges by enabling scalable, trigger-less data acquisition systems adaptable from small- to large-scale experiments.
The SPADI Alliance (Signal Processing and Data Acquisition Infrastructure Alliance) aims to develop a common streaming DAQ system, establish it as a de facto standard, and sustain a long-term development and maintenance framework. The development scope spans front-end electronics, precise time synchronization, data transfer protocols, DAQ software, data processing frameworks, accelerator-assisted fast processing, computing farms, and user interfaces.
The developed system has already been deployed in physics experiments at RCNP, as well as in beam tests at J-PARC and RARiS.
In this paper, we present the architecture and implementation of the streaming-readout system developed within the SPADI Alliance and discuss future development plans and perspectives.
Speaker: Shinsuke Ota (RCNP, Osaka University)
-
-
12:30
Lunch
-
Track 1 - Data and metadata organization, management and access: Tape storage, archival and long-term preservation
-
8
Fermilab's migration to CTA
In 2025, Fermilab transitioned from its legacy tape storage management software, Enstore, to CTA (CERN Tape Archive).
The replacement system was adapted to satisfy Fermilab use cases, including the ability to read existing data off Enstore-formatted tapes. The new system also includes the ability to read aggregated files from containers, which were managed by Enstore, to maintain good performance with smaller files (under 10 MB). We will detail the developments needed for reading the legacy tapes and small-file containers.
We will describe the testing and evaluation process that gave us confidence that CTA could replace Enstore. This test was also used to choose between EOS and dCache as the front-end file system. Our experience with CTA, including provisioning and performance metrics, will be detailed.
Speaker: Eric Vaandering (Fermi National Accelerator Lab. (US)) -
9
The Next Generation of ATLAS Data Carousel: Architecture, Performance, and Refactoring
In the current ATLAS Distributed Computing model, available disk capacity is insufficient to store even a single complete copy of all data actively in use. Consequently, tape systems serve not only as long-term backups but also as primary data sources. Efficient utilization of tapes at the ATLAS scale requires specialized orchestration mechanisms, as tape access is inherently slower and operationally more complex than disk access. Once data are staged from tape, they must be efficiently shared among all sites requiring them and, when likely to be reused, temporarily retained on disk to avoid redundant recalls. To address these challenges, the Data Carousel system was developed to coordinate large-scale tape staging across the distributed infrastructure. Its core functionality includes automated creation, sharing, retention, and deletion of staging rules based on dataset usage; dynamic staging profiles to balance tape load; dashboards and alert mechanisms for real-time monitoring; and both manual and automated recovery procedures for common tape issues and downtimes. In this paper, we describe the overall architecture of the Data Carousel, provide detailed usage statistics, and present a recent comprehensive refactoring of the system that significantly expands its scope. The refactored implementation integrates more closely with other Distributed Data Management activities, improves scalability and reliability, and prepares the system for future challenges of Run 4 and the HL-LHC era.
Speakers: Fernando Harald Barreiro Megino (University of Texas at Arlington), Misha Borodin (University of Texas at Arlington (US)) -
10
Analysis of Tape Archival Metadata Based on ATLAS Run3 Recall History
The High Luminosity upgrade to the LHC (HL-LHC) is expected to generate scientific data on the scale of multiple exabytes. To address this unprecedented data storage challenge, the ATLAS experiment launched the Data Carousel project in 2018, which entered production in 2020. In the Data Carousel workflow, jobs receive input data from tapes seamlessly for user payloads. It represents a fundamental shift from the traditional archival-only model toward a production system executing tens of thousands of tape recalls across multiple sites on a daily basis. A key challenge in the Data Carousel model is how to achieve high tape bandwidth utilization during recall operations, through sustained stream reads and optimized tape mounts. This requires intelligent grouping of files that are likely to be recalled together, so-called “smart writing”. To implement smart writing, sites depend on archival metadata provided by the experiment to supply grouping hints. In this paper, we present our recent analysis of tape archival metadata using the ATLAS Run3 recall history, highlighting patterns and correlations that can inform future data-placement and grouping strategies.
Speaker: Xin Zhao (Brookhaven National Laboratory (US)) -
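As an illustration of the kind of grouping hints such a recall-history analysis can produce, the minimal sketch below mines a hypothetical recall log for files that tend to be recalled together; the file name and column names ("dataset", "file", "recall_time") are assumptions for illustration, not the ATLAS schema.
```python
# Minimal sketch (not the ATLAS tooling): derive co-recall hints from a
# recall-history table with hypothetical columns "dataset", "file", "recall_time".
import pandas as pd

recalls = pd.read_csv("recall_history.csv", parse_dates=["recall_time"])

# Files of the same dataset recalled on the same day are candidates to be
# co-located on tape ("smart writing" grouping hint).
recalls["day"] = recalls["recall_time"].dt.floor("D")
groups = (
    recalls.groupby(["dataset", "day"])["file"]
    .apply(list)
    .reset_index(name="co_recalled_files")
)

# Rank datasets by how many files are recalled together, a crude proxy for
# the value of grouping them at archival time.
groups["group_size"] = groups["co_recalled_files"].apply(len)
hints = groups.sort_values("group_size", ascending=False)
print(hints.head())
```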
11
Intelligent Orchestration of Petabyte-Scale Data Staging for Physics Workflows
The ALICE detector at the CERN LHC generates petabyte-scale raw datasets during heavy-ion collision runs, which must undergo a multi-stage offline reconstruction cycle. EOSALICEO2 serves as the primary high-performance disk buffer for ALICE operations, both during data taking and data processing, providing the sustained throughput necessary for large-scale parallel reconstruction workflows. These workflows require an aggregate read throughput of approximately 200 GB/s to support ~100,000 parallel CPU cores, a performance level achievable only through the EOSALICEO2 disk buffer. However, existing tape-based archival systems, distributed across one T0 site (5 PB buffer) and six T1 sites (200–870 TB buffers each), lack sufficient individual buffer and throughput capacity for the amount of data necessary for one reconstruction cycle. This distributed architecture, where data is replicated across sites for redundancy, renders large-scale asynchronous reconstruction infeasible without dynamic staging capabilities to coordinate data recall across multiple custodial systems.
This contribution presents a centralized data staging service designed to address this critical bottleneck by automating the recall and transfer of raw data files from custodial tape storage to EOSALICEO2. The system features a web-based operator interface for submitting staging requests, intelligent batch sizing adapted to each tape system's buffer capacity, and comprehensive state management with multi-level retry mechanisms. Performance analysis demonstrates that the combined tape infrastructure provides an aggregate throughput of 6.54 GB/s. ALICE's largest datasets originate from Pb–Pb collision runs, with data-taking periods producing approximately 70 PB of raw data. With individual tape buffers requiring 11–20 days to fill and complete period staging spanning 5-6 months, the system implements a 1-month retry window for tape recalls to handle buffer contention and up to 10 attempts per file transfer to ensure reliable operations across this extended timeline. By enabling reliable petabyte-scale staging, this system makes large-scale asynchronous reconstruction workflows operationally feasible for the first time. Integration with existing data management tools supports a continuous workflow cycle where reconstructed data can be automatically removed from EOSALICEO2 to free buffer space for staging additional data. The system, however, is designed generically and can target alternative storage backends if those endpoints meet similar processing requirements.
Speaker: Alice-Florenta Suiu (National University of Science and Technology POLITEHNICA Bucharest (RO)) -
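To make the batch-sizing and retry behaviour described above concrete, here is a hedged, self-contained sketch of the general idea; the buffer sizes, fill fraction, and retry limits are illustrative parameters, and the code is not the ALICE staging service itself.
```python
# Illustrative sketch only (not the ALICE service): split a staging request into
# batches sized to a tape system's buffer, with bounded per-file retries.
from dataclasses import dataclass

@dataclass
class TapeEndpoint:
    name: str
    buffer_bytes: int  # e.g. ~5 PB at the T0, 200-870 TB at the T1s

def make_batches(files, sizes, endpoint, fill_fraction=0.8):
    """Group (file, size) pairs so each batch fits in a fraction of the buffer."""
    limit = int(endpoint.buffer_bytes * fill_fraction)
    batch, used, batches = [], 0, []
    for f, s in zip(files, sizes):
        if used + s > limit and batch:
            batches.append(batch)
            batch, used = [], 0
        batch.append(f)
        used += s
    if batch:
        batches.append(batch)
    return batches

def stage_with_retries(batch, stage_fn, max_attempts=10):
    """Retry each file up to max_attempts; return the files that still failed."""
    failed = []
    for f in batch:
        ok = any(stage_fn(f) for _ in range(max_attempts))  # stops at first success
        if not ok:
            failed.append(f)
    return failed
```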
12
Preserving the Web-Based Interfaces of the RHIC/STAR Experiment: A Low-Maintenance Archival Strategy
For 25 years, the STAR experiment at the Relativistic Heavy Ion Collider (RHIC) has accumulated a significant archive of metadata, supported by extensive web-based tools. As the collaboration transitions into a "long-term preservation phase," a key priority is ensuring sustained access to these critical web interfaces in a self-contained and maintenance-free format. Preserving essential metadata—such as the conditions database, run log, and associated configuration parameters—is vital for accurately tagging and contextualizing the petabytes of collected scientific data. This preservation effort is essential for enabling future physics analyses and simulations throughout the next decade.
To achieve this, we explored various web archival strategies, including the use of headless browsers and hybrid solutions, with the goal of effectively capturing the full functionality and appearance of these interfaces. This presentation will detail and compare the advantages and disadvantages of our exploratory methods. Furthermore, we will offer recommendations deemed crucial for ongoing or future experiments facing similar complexities in long-term data and metadata preservation.
Speaker: Ankush Reddy Kanuganti (Brookhaven National Laboratory)
-
-
Track 2 - Online and real-time computing
-
13
The ATLAS Trigger System
The ATLAS experiment in the LHC Run 3 uses a two-level trigger system to select events of interest, reducing the 40 MHz bunch crossing rate to a recorded rate of up to 3 kHz of fully-built physics events. The trigger system is composed of a hardware-based Level-1 trigger and a software-based High Level Trigger. The selection of events by the High Level Trigger is based on a wide variety of reconstructed objects, including leptons, photons, jets, b-jets, missing transverse energy, and B-hadrons, in order to cover the full range of the ATLAS physics programme.
We will present an overview of improvements in the reconstruction, calibration, and performance of the different trigger objects, as well as the computational performance of the High Level Trigger system.
Speaker: ATLAS Collaboration -
14
Development of Data Scouting program in Run 3 at the CMS experiment
Pioneered by CMS in Run 1, the “data scouting” technique has helped establish a now-widespread trend among the LHC experiments: the LHCb and ATLAS collaborations implemented their “turbo” and “trigger-level analysis” streams, respectively, during Run 2.
The “data scouting” technique overcomes the limitations of conventional data processing strategies through nonstandard uses of the trigger and data acquisition (DAQ) systems. For a constrained transfer bandwidth, reducing the event size allows the data recording rate to be increased. By exploiting reconstruction at the high level trigger (HLT), specialised scouting streams record only the high-level information of the collisions, significantly smaller than the complete detector readout of the conventional strategy. The increased rate can be allocated to lower trigger thresholds, giving access to phase space that is inaccessible to, or inefficiently covered by, the standard triggers. Many physics analyses, including low-mass dijet searches and the discovery of the rare decay η → 4μ, have been performed using scouting datasets collected during Run 2.
CMS has substantially expanded its scouting program in Run 3, with the heterogeneous GPU-CPU architecture of the HLT farm being the main enabler. The Run 3 data scouting streams reach a rate an order of magnitude higher than the standard strategy and cover a larger event content, providing a more complete picture of the collisions and opening new physics possibilities. This talk will focus on the latest developments of the CMS data scouting program throughout Run 3 and explore its physics potential.
Speaker: Patin Inkaew (Helsinki Institute of Physics (FI)) -
15
A Hyperparameter Optimization Framework for Preparing ML Models for Real-Time Trigger Deployment
Machine learning models used in real-time and resource-constrained environments, such as hardware triggers, online reconstruction pipelines, and FPGA/GPU inference systems, must satisfy strict latency, memory, and numerical precision requirements. Achieving these targets typically requires extensive tuning of training schedules, quantization settings, sparsity levels, and architectural parameters. In current workflows, this optimization process is often manual and difficult to reproduce, especially when multiple objectives (e.g., accuracy, latency, and on-chip footprint) must be balanced simultaneously.
To address this challenge, we introduce a new hyperparameter optimization (HPO) platform within the PQuantML library, developed as part of the Next-Generation Trigger (NGT) project. The platform provides an integrated framework for automated exploration of compression parameters and fine-tuning strategies, built on Optuna for adaptive sampling and MLflow for experiment tracking. Users define search spaces and evaluation metrics through configuration files, enabling large-scale optimization experiments without modifying model code. The system supports Bayesian/TPE sampling, early-pruning strategies, and multi-objective optimization, allowing the search to target both physics performance metrics and hardware-level constraints.
The module is designed for distributed execution on the NGT cluster, enabling hundreds of parallel trials to evaluate trade-offs between accuracy, sparsity, bit precision, and latency. We demonstrate the framework on representative convolutional and classifier models used in real-time ML studies, showing how automated optimization systematically identifies the best configurations that meet HL-LHC latency and resource budgets while maintaining high physics performance. This integrated HPO capability strengthens PQuantML as a toolchain for preparing deployable ML models and provides a reproducible workflow for tuning models destined for the NGT and online computing systems.
Speaker: Anastasiia Petrovych (CERN) -
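To make the kind of multi-objective search described above concrete, here is a hedged sketch of an Optuna study balancing accuracy against latency; the `train_and_profile` stand-in, the metric names, and the search space are illustrative assumptions, not the PQuantML API.
```python
# Hedged sketch of a multi-objective HPO loop with Optuna (illustrative only).
import optuna

def train_and_profile(bits, sparsity, lr):
    # Stand-in for the real training + profiling step (placeholder numbers).
    accuracy = 0.9 - 0.01 * (8 - bits) - 0.05 * sparsity
    latency_ns = 100 + 10 * bits * (1 - sparsity)
    return accuracy, latency_ns

def objective(trial):
    bits = trial.suggest_int("weight_bits", 2, 8)
    sparsity = trial.suggest_float("sparsity", 0.0, 0.9)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    return train_and_profile(bits, sparsity, lr)

# Multi-objective study: maximise accuracy while minimising latency.
study = optuna.create_study(directions=["maximize", "minimize"],
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=200)

for t in study.best_trials:  # Pareto-optimal configurations
    print(t.params, t.values)
```
In a real deployment the objective would wrap the actual training run, and each trial could additionally be logged to an experiment tracker such as MLflow.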
16
An End-to-End, Unified Workflow for Sub-Microsecond Inference on FPGAs
Real-time inference with sub-microsecond latency is critical for the Level-1 trigger systems at the High-Luminosity LHC. We present an end-to-end, open-source framework that spans model optimization, quantization, and FPGA deployment, enabling the translation of high-level neural network or generic dataflow models into resource-efficient FPGA implementations.
Within the workflow, we introduce High-Granularity Quantization (HGQ), a quantization framework that simultaneously optimizes the model's resource utilization and accuracy through quantization-aware training with differentiable bitwidths, all with native Keras-like training speeds. The framework supports both conventional matmul-based neural network architectures, ranging from classical dense operations to multi-head attention blocks, as well as fabric-native architectures that map efficiently to FPGA Look-Up Table (LUT) primitives. Users can freely use either architecture or combine both in a single model to achieve optimal trade-offs between accuracy, resource usage, and latency.
On the backend, we present da4ml, an HLS compiler that optimizes and converts unrolled static dataflow graphs, such as machine learning models for L1T, into RTL firmware in either Verilog or VHDL. Specifically, the framework can optimize constant-matrix-vector multiplication (CMVM) operations into efficient adder graphs, enabling DSP-free implementations for a wide range of models. The package also provides a compilation-free precise resource surrogate and bit-exact emulation of the compiled models via a C++ based interpreter, allowing for rapid design space exploration and model validation.
To facilitate adoption, the HGQ and da4ml packages are designed with user-friendly APIs that integrate seamlessly together. Furthermore, these packages can interface directly with hls4ml, allowing users to leverage the strengths of all three frameworks and utilize existing workflows without friction.
Speaker: Chang Sun (California Institute of Technology (US)) -
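The core idea behind quantization-aware training with differentiable bitwidths can be illustrated with a small, self-contained PyTorch module. This is a conceptual sketch only and is not the HGQ API (the actual framework works at the Keras level with much finer granularity); the class name and parameters are invented for illustration.
```python
# Conceptual sketch of fake quantization with a learnable bitwidth (not HGQ).
import torch
import torch.nn as nn

class LearnedBitwidthQuant(nn.Module):
    """Fake-quantise a tensor with a continuous, trainable bitwidth."""
    def __init__(self, init_bits=8.0, max_range=1.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))
        self.max_range = max_range

    def forward(self, x):
        levels = 2.0 ** self.bits               # differentiable in self.bits
        step = 2 * self.max_range / levels
        xc = torch.clamp(x, -self.max_range, self.max_range)
        n = xc / step
        n = n + (torch.round(n) - n).detach()   # straight-through estimator for rounding
        return n * step

# During training, a resource penalty such as `loss += lam * quant.bits` pushes
# the bitwidth down while the task loss pulls accuracy up.
layer = nn.Sequential(nn.Linear(16, 32), LearnedBitwidthQuant(), nn.ReLU())
out = layer(torch.randn(4, 16))
```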
17
AIE4ML: Leveraging Versal AI Engines to Enable More Expressive Real-Time ML Models for Next-Generation Trigger Systems
Modern particle-physics experiments increasingly rely on machine learning (ML) to perform real-time data reduction under the extreme conditions of the High-Luminosity LHC (HL-LHC). Hardware-trigger inference must satisfy microsecond-level latency, deterministic execution, and tight on-chip memory constraints. FPGA-based deployments can meet these requirements for small, highly parallelized models. However, scaling to deeper or wider architectures remains challenging due to resource limitations, manual design effort, and the lack of automated compilation flows. Frameworks such as hls4ml have enabled compact neural-network deployments in current trigger systems but also illustrate the challenge of supporting larger and more expressive models using conventional FPGA fabrics. In this work, we introduce AIE4ML, a compilation and optimization framework designed to seamlessly use the AI Engine arrays (AIE-ML/AIE-MLv2) of AMD Versal devices for low-latency ML inference. As part of ongoing Next-Generation Trigger (NGT) R&D efforts, AIE4ML extends the hls4ml ecosystem with support for the Versal AI Engines. These devices provide a particularly interesting architectural compromise between FPGA and GPU platforms as they offer a deterministic VLIW-SIMD architecture that allows compile-time (static) instruction scheduling and software-managed local memory. The AIE architecture is well matched to several classes of low-latency ML models of interest to HEP, including, but not limited to, those which can be expressed as structured collections of matrix–vector or matrix–matrix operations, including components of particle-flow networks, MLP-Mixers, and trigger-oriented classifiers. For demonstration, we evaluate quantized models imported from high-level frameworks, focusing on linear submodules (i.e., extracted from MLP-Mixer–style architectures), and we showcase throughput comparable to GPUs while respecting HL-LHC-scale latency budgets. Compared to FPGA implementations of similar models on large dense workloads, AIE4ML achieves order-of-magnitude performance gains, reaching up to a ~13× speed-up in some cases. This suggests that Versal AI Engines may enable more expressive and computationally intensive ML models in future real-time trigger systems.
Speaker: Dimitrios Danopoulos (CERN)
-
-
Track 3 - Offline data processing: Calorimeters and Particle ID
-
18
RICH ring reconstruction based on Graph Neural Networks for the CBM experiment
The Compressed Baryonic Matter experiment (CBM) at FAIR is designed to explore the QCD phase diagram at high baryon densities with interaction rates up to 10 MHz using triggerless free-streaming data acquisition. The CBM Ring Imaging Cherenkov detector (RICH) contributes to the overall PID by identification of electrons from the lowest momenta up to 6-8 GeV/c, with a pion suppression factor of more than 100. The RICH reconstruction combines a standalone (trackless) Cherenkov ring-finding with a ring-track matching of extrapolated tracks from the Silicon Tracking System (STS) by closest distance.
The ring reconstruction is particularly challenging due to regions of high ring multiplicity, smeared ring structures, and varying radii and numbers of hits per ring. Hence, an alternative pattern-aware ring-finding approach based on a graph neural network is investigated for the CBM RICH. The end-to-end pipeline performs ring instance reconstruction using 2+1 dimensional information of hits (2D position and time) as input. In addition to ring reconstruction, noise classification is included as an auxiliary downstream task.
Speaker: Martin Beyer -
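As a hedged illustration of how such 2+1 dimensional hit information might be turned into a graph for a GNN, the sketch below builds k-nearest-neighbour edges over (x, y, t) features with PyTorch Geometric; the value of k and the time scaling are arbitrary choices, not the CBM configuration.
```python
# Minimal sketch: build a hit graph from (x, y, t) features with k-NN edges.
import torch
from torch_geometric.data import Data
from torch_geometric.nn import knn_graph

def hits_to_graph(xy: torch.Tensor, t: torch.Tensor, k: int = 8, time_scale: float = 0.1):
    """xy: (N, 2) hit positions; t: (N,) hit times."""
    feats = torch.cat([xy, time_scale * t.unsqueeze(1)], dim=1)  # (N, 3) node features
    edge_index = knn_graph(feats, k=k, loop=False)               # (2, N*k) edge list
    return Data(x=feats, edge_index=edge_index)

# Toy usage with random hits; a GNN would then classify hits into ring instances.
graph = hits_to_graph(torch.rand(200, 2), torch.rand(200))
```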
19
TICL: The Iterative CLustering Framework for the CMS Phase-2 Event Reconstruction
The increase in luminosity and pileup at the High-Luminosity LHC (HL-LHC) will place unprecedented demands on the CMS experiment, requiring major advances in both detector technology and event reconstruction. Among the planned upgrades, the High-Granularity Calorimeter (HGCAL) will replace the current endcap calorimeters, providing fine spatial segmentation and precision timing. These features dramatically enhance physics capabilities but also introduce substantial reconstruction challenges due to the large number of hits per event and the stringent computing requirements. To address this, CMS is developing TICL (The Iterative CLustering), a modular and heterogeneous reconstruction framework integrated into the CMS software.
TICL is designed for parallelism and portability across heterogeneous architectures through the alpaka abstraction layer. It performs calorimetric pattern recognition through a sequence of clustering and linking stages that reduce several hundred thousand energy deposits to a compact set of high-quality particle candidates. The 2D clustering on each HGCAL layer and the 3D clustering across layers are heterogeneous, with the latter prioritizing purity and therefore producing fragmented shower components. To recover these fragments, TICL employs multiple linking algorithms tailored to different shower types: a superclustering plugin optimized for electromagnetic showers and a geometric and topological linking for hadronic ones. Recently, a graph neural network approach has been introduced to further improve hadronic shower linking, enhancing completeness in dense environments. The linking between charged-particle tracks and calorimetric showers has also been refined: the current approach is based on geometric compatibility, with requirements on time and energy, while the ongoing work targets a many-to-many linking strategy for a more coherent global event interpretation.
Beyond the endcaps, TICL is being extended to the barrel calorimeters, enabling a unified reconstruction approach across the entire CMS calorimeter system for the HL-LHC era.
This contribution will present the design principles of TICL and highlight recent developments and performance studies. The results demonstrate the scalability and robustness of the framework and its readiness to meet the challenges of the forthcoming HL-LHC operations.
Speaker: Wahid Redjeb (CERN) -
20
GNN-based end-to-end reconstruction in the CMS Phase 2 High-Granularity Calorimeter
We present the first application of a one-pass, machine learning based imaging calorimeter reconstruction approach to the latest full CMS High Granularity Calorimeter (HGCAL) simulation. The model is a Graph Neural Network that directly processes the hits in the HGCAL, one of the most important upgrades of the Compact Muon Solenoid detector in preparation for the High-Luminosity phase of the Large Hadron Collider planned to begin operations in 2030. The network is trained to group hits originating from the same incident particle by assigning them to a common cluster. The accuracy of the reconstruction is evaluated through physics-inspired metrics that quantify how accurately the properties of individual particles are measured. The algorithm is studied using simulations of different particle types in HGCAL and its performance is tested in single-particle environments.
Speaker: Jose Daniel Gaytan Villarreal (Carnegie-Mellon University (US)) -
21
dN/dx reconstruction with deep learning for high-granularity TPCs
Particle identification (PID) is essential for future particle physics experiments such as the Circular Electron-Positron Collider and the Future Circular Collider. A high-granularity Time Projection Chamber (TPC) not only provides precise tracking but also enables dN/dx measurements for PID. The dN/dx method estimates the number of primary ionization electrons, offering significant improvements in PID performance. However, accurate reconstruction remains a major challenge for this approach. In this presentation, we introduce a deep learning model, the Graph Point Transformer (GraphPT), for dN/dx reconstruction. In our approach, TPC data are represented as point clouds. The network backbone adopts a U-Net architecture built upon graph neural networks, incorporating an attention mechanism for node aggregation specifically optimized for point cloud processing. The proposed GraphPT model surpasses the traditional truncated mean method in PID performance. In particular, the $K/\pi$ separation power improves by approximately 10% to 20% in the momentum interval from 5 to 20 GeV/$c$. (arXiv:2510.10628)
Speaker: Dr Guang Zhao (Institute of High Energy Physics (CAS)) -
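For context, one commonly used definition of the K/π separation power quoted above for an ionisation estimator such as dN/dx is sketched below; the exact convention used in the paper may differ.
```latex
% A common convention for K/pi separation power (conventions vary between experiments):
S_{K\pi} = \frac{\left|\langle \mathrm{d}N/\mathrm{d}x \rangle_{K} - \langle \mathrm{d}N/\mathrm{d}x \rangle_{\pi}\right|}
                {\sqrt{\tfrac{1}{2}\left(\sigma_{K}^{2} + \sigma_{\pi}^{2}\right)}}
```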
22
Using Graph Neural Networks for the segmentation of overlapping objects in high granularity calorimeters
One of the major difficulties of particle reconstruction in calorimeters is the case of overlapping objects in the detector. This problem will become particularly concerning at the High-Luminosity LHC, where the increased luminosity will cause high levels of pile-up. High-granularity calorimeters, such as the future HGCal in the CMS endcap, allow us to perform Particle Flow (PF) reconstruction on particles with overlapping showers in the calorimeter. This task requires new algorithms that can adequately exploit the granular properties of future calorimeters.
We propose a Graph Neural Network architecture for a segmentation block that can split overlapping showers produced by two distinct electrons in a high-granularity electromagnetic calorimeter, and reconstruct the individual showers from each electron. In order to do so, it predicts, for each node, the fraction of its energy attributed to each individual shower. We introduce the optimisation work that was done on the model, in particular on the graph construction and convolution operations. We also show the separation efficiency of the model and demonstrate that it is on par with the state of the art, with significantly reduced resource consumption.
Speaker: Matthieu Martin Melennec (Centre National de la Recherche Scientifique (FR))
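A minimal sketch of the per-node fraction idea is given below, assuming a generic two-layer graph convolution; the architecture, feature dimension, and loss suggestion are illustrative assumptions, not the authors' model.
```python
# Sketch: a GNN head that predicts, for every calorimeter hit, the fraction of
# its energy belonging to each of two overlapping showers (illustrative only).
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class FractionSplitter(nn.Module):
    def __init__(self, in_dim=4, hidden=64, n_showers=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, n_showers)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        # Softmax so the predicted fractions for each hit sum to one.
        return torch.softmax(self.head(h), dim=-1)

# Training could minimise e.g. an MSE between predicted and true fractions,
# weighted by hit energy so that high-energy hits dominate the loss.
```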
-
-
Track 4 - Distributed computing
-
23
The Many Faces of Authentication and Authorisation at CERN
Identity and Access Management (IAM) in a large scale research collaboration typically serves both organisational and distributed community needs. CERN operates at this intersection, balancing local institutional requirements with those of a worldwide ecosystem of scientific partners.
This presentation will outline the evolution of CERN’s Single Sign-On platform (based on Keycloak) and the parallel development of dedicated token issuers (using INDIGO IAM) that support the WLCG token transition strategy. It will put the distinct systems in context and highlight observed trends in the adoption of token-based authentication and authorisation.
CERN’s inclusion as a foundational node in the European Open Science Cloud (EOSC) marks an important step towards deeper integration between Research Authentication and Authorisation Infrastructures (AAIs). Together with partners across Europe and beyond, we are contributing to the establishment of cross-AAI trust frameworks, breaking out of the current hierarchical model and spearheading the adoption of OpenID Federation.
This work builds on many years of collaboration through the Federated Identity Management for Research (FIM4R) initiative and the EC-funded AARC (Authentication and Authorisation for Research and Collaboration) projects. The shared investment in technical standards and policy frameworks is now reaching maturity: in 2026 we expect to see its true value as Research Collaborations begin to trust each other’s token issuers in practice.
This contribution is submitted on behalf of CERN’s Identity and Access Management (IT-PW-IAM) team in collaboration with the WLCG Authorisation Working Group, CERN’s EOSC Task Force and the AARC TREE project.
Speaker: Berk Balci (CERN) -
24
Deployment of site-focused security event detection capabilities
The risk of cyber attack against members of the research and education sector remains persistently high, with several recent high visibility incidents including a well-reported ransomware attack against the British Library. As reported previously, we must work collaboratively to defend our community against such attacks, notably through the active use of threat intelligence shared with trusted partners both within and beyond our sector.
We discuss the development of capabilities to defend sites across the WLCG and other research and education infrastructures, with a particular focus on sites other than Tier1s which may have fewer resources available to implement full-scale security operations processes. In particular, we discuss a pilot deployment of the Unicor software, an evolution of the previously reported pDNSSOC, which enables lightweight and flexible correlation of DNS logs with threat intelligence and subsequent contextual alerting. We also report on existing deployments of Unicor in other environments.
Defending as a community requires a strategy that brings people, processes and technology together. We suggest approaches to support organisations and their computing facilities to defend against a wide range of threat actors. While a robust technology stack plays a significant role, it must be guided and managed by processes that make their cybersecurity strategy fit their environment.
Speaker: Dr David Crooks (UKRI STFC) -
25
Hardening and Federating INDIGO IAM for Secure and Interoperable Research Infrastructures
INDIGO IAM is an Identity and Access Management service providing authentication and authorization across distributed research infrastructures. It is a Spring Boot application relying on OAuth/OpenID Connect (OIDC) technologies and is currently evolving to meet increasingly stringent requirements in terms of security, interoperability and observability.
A key aspect is the progressive hardening of the platform, including the migration from the no longer maintained MITREid Connect library to the modern Spring Authorization Server (SAS). This transition strengthens security and reliability while enabling improved scalability, modularity and tighter integration with the Spring ecosystem.
Further security measures include support for client-bound access tokens and stronger adoption of Multi-Factor Authentication (MFA), which can now be enforced via configuration. OAuth client secrets are never stored in clear text but are securely hashed before being persisted, reducing the impact of potential data breaches. In addition, access tokens are no longer stored in the database, reducing overhead and improving performance during authentication workflows.
Operational usability and observability are being enhanced through a new Web dashboard that simplifies service management and decouples user-facing functionality from core services. At the same time, a proof of concept based on OpenTelemetry is being developed to enable better monitoring, tracing and troubleshooting.
In parallel, INDIGO IAM is integrating OpenID Federation to strengthen interoperability and federated identity capabilities. This allows the dynamic establishment of trust relationships between Identity Providers and Relying Parties based on shared Trust Anchors, addressing the limitations of static onboarding in heterogeneous, multi-community research environments.
Speaker: Francesco Giacomini (INFN CNAF) -
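As a small illustration of the secret-handling measure mentioned above, the sketch below salts and hashes a client secret before persisting it; it uses Python's standard library for clarity and is not the Java/Spring code used by INDIGO IAM.
```python
# Illustrative only: store a salted hash of an OAuth client secret, never the
# clear-text value, and verify submitted secrets with a constant-time compare.
import hashlib
import hmac
import os

def hash_secret(secret: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.scrypt(secret.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest  # persist both; the clear-text secret is discarded

def verify_secret(secret: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.scrypt(secret.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, stored)
```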
26
SSH with federated identities
Traditional SSH key-based authentication presents significant scalability and security challenges in modern federated research environments, particularly regarding key distribution, lifecycle management, and access revocation. This paper presents ssh-oidc, a novel approach that integrates OpenID Connect (OIDC) authentication with SSH certificate-based access control for scientific computing infrastructures. The system replaces permanent SSH keys with time-limited certificates issued by an online Certificate Authority (CA) that validates OIDC tokens from federated identity providers.
Our implementation leverages three key components: motley-cue for identity mapping and user provisioning, oinit as an online CA for automated certificate issuance, and oidc-agent (or similar) for token management. The system enables fine-grained authorisation through OIDC claims including institutional affiliation, project entitlements, and identity assurance levels, allowing differentiated access policies for various user categories and security requirements.
Evaluation in research environments demonstrates significant administrative overhead reduction while maintaining security through centralised access control and automatic credential lifecycle management. The approach integrates seamlessly with existing federated identity infrastructures including eduGAIN and institutional identity providers, enabling cross-institutional collaboration without compromising security or requiring extensive infrastructure modifications. This solution addresses critical authentication challenges in contemporary distributed research computing environments.
Speaker: Dr Marcus Hardt (KIT) -
27
A workflow based approach to risk assessment and analysis arising from token based authentication within the WLCG
The migration away from using X.509 towards token-based authentication within the Worldwide LHC Computing Grid (WLCG) infrastructure has required many redesigns of the various workflows, ranging from data management through to job submission, and various activities in between. To compound the complexity of this transition, different user groups within WLCG have adopted different token use strategies within their workflows, resulting in a varied Token landscape across the grid.
Within such a diverse environment it is important to engage in a comprehensive and structured risk assessment process to identify potential risk vectors, their mitigations, and quantify their impact. This is crucial to be able to find and prioritise potential issues before they occur, to build and maintain an operationally secure Authentication and Authorisation Infrastructure.
The Token Trust and Traceability (TTT) Working group has been engaging in such risk assessment activities to identify, quantify and understand the threats inherent to the use of tokens within our distributed computing infrastructures. Given the different use cases of the various WLCG user groups and experiment communities, it became evident that threat considerations needed to be partitioned by workflow methodology, which formed the framework for our process.
We detail how the TTT constructed and performed the token workflow risk analysis, focusing on some of the key conclusions and recommendations that have been identified, and then present our plans to evolve the analysis into an ongoing process to continue to advise best practice for token use over the coming years - both within the WLCG, and to partner organisations and communities.
Speaker: Mr Tom Dack (STFC UKRI)
-
-
Track 5 - Event generation and simulation: Fast Simulation 1
-
28
Generative Muon Punch-Through with Flow-Matching in ATLAS
The ATLAS experiment at the Large Hadron Collider uses the Geant4 toolkit to simulate detailed Monte Carlo events spanning a broad range of physics processes. However, the full simulation is computationally expensive, with the main bottleneck originating from the modelling of particle showers in the calorimeter systems. To meet increasing demands, especially for the high-luminosity LHC era, ATLAS has deployed the AtlFast3 (AF3) suite as its current fast-simulation framework. A key component of AF3 is the modelling of Muon Punch-Through (MPT) particles: highly energetic hadrons that penetrate the calorimeters and produce activity in the muon spectrometer. The current MPT approach samples secondary particles via principal component analysis, with a neural network estimating punch-through probabilities. To handle the intricate correlations among the kinematic distributions and the increased number of secondary species, a new punch-through model is now being developed. This contribution presents an overview of the existing MPT simulation and introduces the generative approach. The model is based on a transformer network to capture correlations in the high-dimensional input, with a flow-matching method to fulfill the regression task for secondary particle kinematics. A Variational Bayesian layer is used as the linear layer to introduce uncertainty in the weights and model the stochasticity of the main kinematic distributions. The method is expected to improve accuracy and flexibility, with future plans to integrate it into the ATLAS software framework as part of AF3’s Geant4 fast-simulation model.
Speaker: Firdaus Soberi (The University of Edinburgh (GB)) -
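To make the flow-matching regression task concrete, here is a minimal, hedged sketch of a conditional flow-matching loss in PyTorch; the velocity network, feature dimensions, and conditioning variables are placeholders, not the ATLAS implementation.
```python
# Sketch of a conditional flow-matching training objective (illustrative only).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity network v(x_t, t, cond)."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1 + cond_dim, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, dim))

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, t, cond], dim=1))

def flow_matching_loss(v_net, x1, cond):
    x0 = torch.randn_like(x1)            # noise endpoint
    t = torch.rand(x1.shape[0], 1)       # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1          # straight-line path between noise and data
    target = x1 - x0                     # constant velocity along that path
    return torch.mean((v_net(x_t, t, cond) - target) ** 2)

# Toy usage: 6 kinematic variables conditioned on 3 incident-particle features.
v_net = VelocityNet(dim=6, cond_dim=3)
loss = flow_matching_loss(v_net, torch.randn(128, 6), torch.randn(128, 3))
```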
29
Simulating the CMS High-Granularity Calorimeter with Generative AI
In the upcoming High Luminosity LHC era, detector simulation will face computing resource constraints; at the same time CMS will be upgraded with the new High Granularity Calorimeter (HGCal), which is more intensive to simulate. This computing challenge motivates the use of generative machine learning models as surrogates to replace full physics-based simulation of particle showers in the HGCal. A large dataset of calorimeter showers in the CMS HGCal, simulated with GEANT4, has been prepared and will be used to train and assess the performance of multiple state-of-the-art generative AI models. Applying generative AI to the simulation of HGCal showers requires significantly higher granularity and more irregular geometry as compared to previous AI-based calorimeter simulation studies. We will discuss the various methods employed by the AI models to overcome these challenges. The quality of the showers produced by the various AI models will be assessed and compared across multiple performance metrics.
Speaker: Oz Amram (Fermi National Accelerator Lab. (US)) -
30
DiT-based fast simulation for the CEPC long-bar crystal electromagnetic calorimeter
The CEPC is a proposed high luminosity e+e− collider designed for precision measurements of the Higgs, W, and Z bosons. Its reference detector incorporates a long bar crystal ECAL, which employs long, narrow crystal bars arranged in orthogonal layers to deliver fine 3D shower imaging and excellent compatibility with Particle Flow reconstruction. [1]
For CEPC physics analyses, large volumes of simulated data are essential. Calorimeter simulation is by far the most CPU intensive component of the CEPC detector simulation, accounting for roughly 80% of the total simulation budget. Consequently, the development of fast simulation techniques is a critical R&D priority.
Our work is inspired by CERN’s CaloDiT-2 [2], which develops a fast simulation framework based on the Diffusion Transformer (DiT). Our implementation, named Voxel Diffusion Transformer for Calorimeter (VoDiT4CAL), is built using PyTorch [3] and Lightning [4]. Building on the design principles of CaloDiT-2, VoDiT4CAL introduces two key enhancements:
- Enhanced Local Spatial Modelling: VoDiT4CAL incorporates PixelDiT [5] layer to better capture local spatial correlations, which reduces the DiT depth and significantly lowers computational cost.
- Enhanced Energy Modelling: VoDiT4CAL adds energy prediction head and dynamically redistributes energy across voxels.
Testing shows that VoDiT4CAL accurately reproduces key photon shower distributions across incident energies from 0.25 GeV to 100 GeV, meeting CEPC physics precision requirements. This contribution also presents a detailed report on distillation (for accelerating inference [6]), its impact on physics performance, and the practical speedup achieved after integrating VoDiT4CAL into the official CEPC software framework.
[1] Souvik Priyam Adhya et al. “CEPC Technical Design Report - Reference Detector”. In: (Oct. 2025). arXiv: 2510.05260 [hep-ex].
[2] Piyush Raikwar et al. “A Generalisable Generative Model for Multi-Detector Calorimeter Simulation”. In: (Sept. 2025). arXiv: 2509.07700 [physics.ins-det].
[3] Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 8024–8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[4] William Falcon and the PyTorch Lightning team. PyTorch Lightning. 2024. DOI: 10.5281/zenodo.13254264. URL: https://doi.org/10.5281/zenodo.13254264.
[5] Yongsheng Yu et al. “PixelDiT: Pixel Diffusion Transformers for Image Generation”. In: arXiv preprint arXiv:2511.20645 (2025).
[6] Kaiwen Zheng et al. “Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency”. In: ArXiv abs/2510.08431 (2025). URL: https://api.semanticscholar.org/CorpusID:281950486.
Speaker: Zhihao Li (Institute of High Energy Physics, Chinese Academy of Sciences) -
31
Modern Generative Models for Fast Calorimeter Simulation in ATLAS
Accurate modelling of electromagnetic and hadronic showers is one of the most expensive components of the ATLAS detector simulation. To reduce CPU usage for Run 3, the collaboration introduced AtlFast3, a fast simulation tool which combines classical histogram-based parameterisations with GAN-based calorimeter models.
Following Run 3, a new optimisation of the voxelisation scheme used for model training, which groups energy deposits into small volumetric cells, was initiated. This new voxelisation effort has led to a more efficient shower description and a clear improvement in physics performance; its development and results will be presented in this contribution. Despite these gains, GANs still show well-known limitations in stability and accuracy.
ATLAS is therefore investigating newer generative approaches such as diffusion models, transformers, and continuous normalizing flows. Recent results show that these models can reproduce detailed shower features more reliably and are being evaluated as potential replacements for parts of the current parameterisation used in AtlFast3.
This contribution summarises the current machine learning based calorimeter simulation in ATLAS, presents results from the modern generative models under study, and discusses the main challenges on the path toward a more ML driven fast simulation for Run 4 and beyond.
Speaker: Florian Ernst -
32
Adapting PARNASSUS: A Fast Simulation Tool for the ATLAS Experiment
Detector simulation and reconstruction are significant computational bottlenecks in particle physics. A state-of-the-art GenAI-based paradigm, Particle-flow Neural Assisted Simulations (PARNASSUS), has shown great promise for fast simulation, aiming to minimize resource utilization and enable fast surrogate models. PARNASSUS takes as input a point cloud (particles impinging on a detector) and produces a point cloud (reconstructed particles). Therefore PARNASSUS is an end-to-end simulation approach that goes directly from generated events to reconstructed quantities. We demonstrate this approach using datasets from various physics processes from the ATLAS experiment. We show that PARNASSUS accurately mimics the existing P-Flow algorithm on the (statistically) same events it was trained on, and that it can generalize to jet momentum and particle types outside the training distribution. In addition to its physics performance, PARNASSUS offers substantial computational gains: by replacing traditional simulation and reconstruction chains with a single learned surrogate model, it reduces CPU time by orders of magnitude, facilitating scalable workflows.
Speaker: Hamza Hanif (Simon Fraser University (CA))
-
-
Track 6 - Software environment and maintainability: Performance and Heterogeneous Computing
Conveners: Arantza De Oyanguren Campos (Univ. of Valencia and CSIC (ES)), Ruslan Mashinistov (Brookhaven National Laboratory (US))
-
33
Automated tuning of GPU kernel parameters via RunTime Compilation for the ALICE Online reconstruction
The ALICE GPU TPC reconstruction is implemented by many GPU functions (kernels). Each kernel requires a block and a grid size to control GPU thread spawning, and may also need additional parameters like memory buffer sizes or pre-processing flags. Moreover, ALICE undertakes an aggressive GPU optimization by mapping grid and block sizes to launch bounds, optional compiler hints affecting hardware resource usage. By exploiting RunTime Compilation, this compile-time optimization is made available at runtime, unlocking a further level of performance tuning. Several kernels may run concurrently on the GPU, sharing resources and thus making these parameters interdependent. This makes manual tuning extremely time-consuming and complex. To effectively exploit this enhanced runtime optimization, an automated tool for tuning GPU kernels has been developed. The tuning is steered by Bayesian Optimization and, through the profiler, it transparently extracts the kernels' performance metrics. In this way, precise evaluations are possible for an ensemble of concurrent kernels, without modifying the target application code. The tuner has been successfully employed to optimize ALICE TPC reconstruction software, achieving up to $\approx11\%$ gain on production GPUs when reconstructing real Pb--Pb data. Since GPU TPC reconstruction dominates processing time during data taking, this results in significant time and energy savings. In addition, ALICE can dynamically load optimal parameters for different beam types, interaction rates, and GPU models. Finally, the tuner will be exploited by ALICE to fairly compare different GPU models and vendors for future hardware purchases. We believe that this kind of approach to GPU optimization, combining grid size and launch bounds tuning with profiler feedback, may inspire other experiments or applications with complex GPU workflows.
Speaker: Gabriele Cimador (CERN, Università and INFN Torino) -
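As a toy illustration of the runtime-compilation idea described above, the sketch below recompiles a CUDA kernel with different launch bounds and block sizes via CuPy and times each variant; the kernel, parameter values, and simple scan are illustrative assumptions, not the ALICE tuner (which drives the search with Bayesian optimisation and profiler metrics).
```python
# Toy sketch: runtime-compile kernel variants with different launch bounds
# and block sizes, then time them (illustrative only; requires a CUDA GPU).
import cupy as cp

KERNEL_TEMPLATE = r"""
extern "C" __global__ void __launch_bounds__({bound})
saxpy(const float* x, const float* y, float* out, float a, int n) {{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}}
"""

def time_variant(block, bound, n=1 << 22):
    kernel = cp.RawKernel(KERNEL_TEMPLATE.format(bound=bound), "saxpy")
    x = cp.random.rand(n, dtype=cp.float32)
    y = cp.random.rand(n, dtype=cp.float32)
    out = cp.empty_like(x)
    grid = (n + block - 1) // block
    args = (x, y, out, cp.float32(2.0), cp.int32(n))
    kernel((grid,), (block,), args)          # warm-up (also triggers compilation)
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    start.record()
    kernel((grid,), (block,), args)
    stop.record()
    stop.synchronize()
    return cp.cuda.get_elapsed_time(start, stop)   # milliseconds

# A real tuner would let Bayesian optimisation choose (block, bound) pairs and
# read profiler metrics; here we simply scan a few candidates.
for block in (64, 128, 256, 512):
    print(block, time_variant(block, bound=block))
```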
34
The alpaka C++ library for performance portability
The rapid evolution of computing architectures toward increasing heterogeneity — combining multi-core CPUs with accelerators from multiple vendors — poses major challenges for performance, portability, and long-term sustainability of high-energy physics (HEP) software. Maintaining separate implementations for each architecture is costly, error-prone, and difficult to scale as both hardware and software ecosystems evolve.
The alpaka (Abstraction Library for Parallel Kernel Acceleration) performance portability library addresses these challenges by providing a header-only C++ abstraction for explicit parallel programming across a broad range of back-ends. These include GPUs from NVIDIA, AMD and Intel, FPGAs from Altera, as well as multi-core CPUs leveraging OpenMP, oneTBB, and standard C++ threading. Alpaka exposes well-defined execution and memory hierarchies, enabling single-source implementations while preserving fine-grained control over performance-critical aspects such as execution configuration and memory access patterns, and achieving near native performance on multiple back-ends.
Alpaka is an open source project developed at HZDR and CERN, and is used in both production and research software across several scientific domains. At HZDR, it is used in the PIConGPU Particle-in-Cell simulation code and in the mallocMC memory allocation library. At HZB, alpaka is a core component of the RAYX software for the design and optimization of beamlines in synchrotron light source facilities. In HEP, alpaka is used in production within the CMS experiment’s CMSSW framework, where it supports shared CPU and GPU implementations of reconstruction and trigger workloads. At CERN, alpaka is also used in other software projects, including as one of several backends in the traccc experimental framework for ACTS track reconstruction and in SOFIE for machine learning inference.
This talk presents alpaka’s design principles and performance characteristics, reviews its evolution through recent 2.x releases, and discusses ongoing development and long-term plans toward alpaka 3.
Speaker: Dr Andrea Bocci (CERN) -
35
Module Map Graph: A High-Performance Software Library for GNN-Based Reconstruction Pipelines on Heterogeneous Architectures for the HL-LHC
In recent years, numerous Machine Learning–based algorithms have been developed within particle physics experiments to accelerate the reconstruction of complex detector objects, notably at CERN in the context of the HL-LHC and, for example, within the Belle II experiment. A significant fraction of these approaches relies on Deep Geometric Learning, and in particular on Graph Neural Networks (GNNs). These algorithms are typically integrated into end-to-end pipelines combining graph construction from detector data, GNN model inference, and dedicated post-processing steps for the final reconstruction of physics objects.
We present Module Map Graph (MMG), a generic high-performance library optimized for hybrid CPU and GPU architectures, providing a unified implementation of all stages of such reconstruction workflows. MMG delivers specialized algorithms implemented as dedicated kernels, with a strong focus on memory efficiency through fixed per-event pre-allocation strategies and the use of stride-based data structures. MMG also leverages state-of-the-art inference frameworks (e.g., NVIDIA TensorRT) for the GNN step. The library exploits large-scale parallelism and supports fully asynchronous execution through CPU multithreading and CUDA streams, enabling excellent scalability on heterogeneous architectures. MMG follows modern software quality standards and provides Python interfaces as well as integration with experiment-independent track reconstruction toolkit ACTS [1].
We present the performance of MMG measured on benchmarks deployed on a realistic, production-like hybrid architecture setup representative of HL-LHC computing environments, and report a standardized set of metrics including latency, peak memory usage, energy consumption, and scalability. While these developments are motivated by the reconstruction of data from LHC experiments, MMG can straightforwardly be extended to other applications of GNNs in high-energy physics.
[1] The Acts project: track reconstruction software for HL-LHC and beyond, url: https://doi.org/10.1051/epjconf/202024510003
Speaker: Sylvain Caillou (Centre National de la Recherche Scientifique (FR)) -
36
Evolution of Software Performance Optimization and Monitoring in the ATLAS experiment
With the HL-LHC approaching, we have extended software performance monitoring within Athena to better support multi-threaded and heterogeneous workloads. ATLAS uses this monitoring to guide software optimization during the upcoming shutdown, which is accompanied by significant detector and computing upgrades; this optimization is crucial for ATLAS to stay within its projected CPU resource constraints. The multi-threaded software performance monitoring system within Athena has been developed and in active use for several years, and in this contribution we will present its current capabilities and the future developments necessary to optimize ATLAS software.
We also report on new tooling to help ATLAS automate and track the performance of merge requests and releases, including Jenkins-based validation workflows and a Mattermost bot for automatic reporting and alerting.
Together, these developments aim to provide a consistent and reliable foundation for continuous performance monitoring and regression tracking in ATLAS' HL-LHC software.
Speaker: Tatiana Ovsiannikova (University of Washington (US)) -
37
Incorporating Heightened Scrutiny into a Large HEP Software Project
In 2023, DUNE began re-evaluating the requirements of its data-processing framework, which led to commissioning a new design that would better fit neutrino physics than the existing reconstruction frameworks designed for collider physics. Due to the radical changes expected, significant multi-institutional effort has been directed toward the creation of the Phlex framework. In addition, the tight timelines in which to implement such a framework have invited scrutiny from various parties, including DUNE itself, the host laboratory Fermilab, and the US Department of Energy.
After briefly discussing the unique needs of neutrino physics, we will recount how the Phlex development team has approached the design and development of a framework in the context of this heightened scrutiny. This process began with a systems-engineering approach to formally manage the requirements DUNE has of its framework. What followed was, for the first time in the HEP community, a review of the framework’s conceptual design by a panel of external framework experts before developers proceeded to the implementation. The implementation efforts have led to a series of prototypes that are being used to provide rapid feedback from Phlex users to framework developers.
We will discuss how each of these steps has resulted in a strong design that has the backing of the DUNE experiment, Fermilab, and the US Department of Energy.
Speaker: Philippe Canal (Fermi National Accelerator Lab. (US))
-
Track 7 - Computing infrastructure and sustainability
-
38
Standardizing HPC-resource adaptation for HEP workflows
HPC resources are already important for the LHC experiments, and they will become even more so as new CPU capacity increasingly resides in HPC machines and the High-Luminosity LHC (HL-LHC) era demands ever more computing. However, HPC resources can be challenging to adapt to HEP workflows, so an established method for enabling HPC usage with standard open-source tools, allowing unchanged applications to run, would be preferable.
Significant hurdles to HEP usage of HPC sites include the lack of a native CVMFS installation, non-optimal data transfer tools, and job submission available only via ssh. Current HEP workflows are typically run from nested containers distributed with CVMFS, which is often not supported by HPC sites. Here we present a solution in the form of a new tool, fapptainer, which leverages Apptainer/Singularity and CVMFS running without root privileges to run nested containers side by side and thus support standard, unchanged workflows. For job submission and management we use the Advanced Resource Connector (ARC) with ssh submission to an HPC, as ARC can cache and transfer the input and output files and manage the jobs. The fapptainer tool and the other tools that we use are generic, so they can be used by other experiments and communities that need to harness HPC systems for their computing needs.
We have used fapptainer and ARC on the LUMI, Puhti and Mahti HPC machines in Finland provided by CSC to run unmodified ATLAS and CMS workflows. We present the fapptainer tool and the results achieved with it as well as discuss the performance and scaling of our setup.
Speaker: Tomas Lindén (Helsinki Institute of Physics (FI)) -
39
Global Grid User Support (GGUS) for WLCG and EGI: From Legacy to Next-Generation Helpdesk
The EGI Helpdesk, also known as Global Grid User Support (GGUS), is operated within the EGI federation as a core support service for the Worldwide LHC Computing Grid (WLCG) and other distributed research infrastructures, providing coordinated incident handling and service support across hundreds of computing centres. To address growing scalability, interoperability, and sustainability requirements, GGUS has recently undergone a major modernisation, migrating to a new platform based on the open-source Zammad system.
This contribution presents the architecture, feature development, and roll-out of the new GGUS Helpdesk, with a particular focus on meeting the operational needs of both the WLCG and EGI communities. The migration preserved and extended long-established GGUS workflows, such as team, alarm and multisite handling, and maintained existing integrations with core infrastructure services including the EGI Check-in authentication and authorisation service, the GOCDB (Grid Configuration Database), the OSG (Open Science Grid) database, and the CERN ServiceNow system. At the same time, it introduced improved role management, a modern user interface, richer APIs and a range of new capabilities. Since entering production in January 2025, the system has supported hundreds of active supporters and several thousand tickets, demonstrating its readiness for large-scale distributed operations.
A key novelty of the upgraded helpdesk is the introduction of AI-assisted support features. We present ongoing developments integrating large language models for AI-assisted ticket handling, including intelligent routing to appropriate support units, semantic summarisation of ticket histories, and a context-aware writing assistant for support agents. These capabilities aim to reduce response times, improve consistency, and support first-level operators in handling the complex incidents and requests typical of WLCG and EGI infrastructures.
The new GGUS Helpdesk thus establishes a scalable, extensible foundation for future support services, aligning operational support with the evolving demands of Open Science and data-intensive research communities.
Speaker: Pavel Weber (Karlsruhe Institute of Technology (KIT)) -
40
Enhancing High Availability and Disaster Recovery for Kubernetes Workloads at CERN
Over the past few years, CERN has transitioned a significant portion of its IT services and workloads to cloud-native environments, hosted on the CERN Kubernetes Service. These workloads leverage affinity policies within the cluster to optimize availability by distributing replicas across multiple availability zones within a single data center.
This session will present recent advancements in our service aimed at further enhancing high availability and disaster recovery capabilities. Key developments include the introduction of Active/Passive and Active/Active deployment models for replicating workloads across multiple clusters and data centers. Additionally, we’ll highlight the improvements made to disaster recovery processes, focusing on automated backup and recovery for both cluster configurations and workload metadata.
Through a series of practical use cases, we will explore how technologies like Cilium and ClusterMesh enable improved business continuity. We will also showcase Velero as a key tool for streamlining advanced backup and recovery procedures, ensuring seamless data protection and service availability in the face of failures. As Grid services and multiple WLCG sites continue their transition to similar stacks, we will show how the same building blocks can help optimize deployments across all sites and infrastructure.
Speakers: Jack Charlie Munday, Ricardo Rocha (CERN) -
41
IHEP site cluster fine-grained scheduling optimization
Due to procurement at different stages, the computing infrastructure at the IHEP site is highly heterogeneous: the cluster contains multiple node models with varying capabilities, and the performance gap between nodes can be substantial. Traditional scheduling policies do not tightly couple hardware performance characteristics with job behavioral characteristics, which can lead to suboptimal placements—for example, I/O-intensive jobs occupying fast CPU nodes while CPU-sensitive jobs are dispatched to slower nodes. This mismatch results in avoidable waste of scarce computing resources.
To address this issue, our solution systematically inventories the site’s hardware resources and annotates each compute node with capability metrics—covering compute, storage, and network—via HTCondor ClassAds. Users can then declare required capability thresholds when requesting resources. In parallel, we perform large-scale job collection and classification across the cluster. For job types whose resource demand patterns are well understood, we preferentially schedule them onto the most suitable nodes, enabling precise job–node matching and improving overall resource utilization and cluster throughput.
Speaker: Chaoqi Guo -
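As a hypothetical illustration of this kind of capability-based matching (the attribute names below are invented, not IHEP's actual ClassAds), a job could declare its thresholds in a requirements expression via the HTCondor Python bindings:

```python
# Sketch of capability-threshold matching with HTCondor ClassAds (attribute names are hypothetical).
import htcondor

submit = htcondor.Submit({
    "executable": "run_analysis.sh",          # hypothetical job script
    "request_cpus": "8",
    "request_memory": "16GB",
    # Match only nodes whose advertised capability classes meet the job's needs,
    # e.g. an I/O-heavy job that does not require the fastest CPUs.
    "requirements": "(IoBandwidthClass >= 3) && (CpuSpeedClass >= 1)",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()
result = schedd.submit(submit)                # queue one job
print("submitted cluster", result.cluster())
```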
42
Extending HPC integration programme in CMS toward heterogeneous processors: The experience of Deucalion
The HPC systems integration programme of CMS is in continuous evolution, and the experience gained in the last few years has resulted in a toolkit of technical solutions that ease the process of incorporating the resources provided by an HPC centre into a thoroughly distributed computing system. However, such a process still represents a real barrier to effectively benefiting from the provided computing capacity. CMS is now investing along multiple dimensions: on the one side increasing the number of integrated machines, and on the other exploiting the heterogeneous hardware often provided by HPCs. Indeed, access to sizable amounts of heterogeneous architectures is also key for testing and validating the latest developments of the CMSSW software. In this talk we will describe the integration of Deucalion, the European High Performance Computing Joint Undertaking supercomputer located in Portugal and the first EuroHPC supercomputer based on ARM processors. CMS aims to use the allocation primarily to contribute to the systematic validation of recent CMSSW releases, which are already ARM-enabled, with the goal of reaching production usage in 2026. The presentation will detail aspects of the technical integration, which enhances a strategy formerly employed for the VEGA HPC in Slovenia, together with the status of CMS physics validation on ARM.
Speaker: CMS Collaboration
-
Track 7 - Computing infrastructure and sustainability
-
43
Getting to the top
CNAF is the national center of INFN (Italian Institute for Nuclear Physics) dedicated to Research and Development in Information and Communication Technologies. As the central computing facility of INFN, CNAF has been historically involved in the management and evolution of the most important information and data transmission services in Italy, supporting INFN activities at both national and international levels.
Recently, CNAF acquired an HPC cluster through RecoverEU funding, featuring high-core-count CPUs, H100 GPUs, and InfiniBand connectivity. We performed extensive system tuning with the goal of entering the Top500 list (https://top500.org). Achieving good performance was not an easy task: we struggled to find reliable and up-to-date documentation, but we ultimately reached our target. Our presentation walks through all the steps we took to optimize the cluster and successfully enter the Top500 list.
Similarly to what we did for the HPC cluster, we also tested our storage with the aim of entering the IO500 list.
The IO500 benchmark represents a paradigm shift in HPC storage evaluation. Traditional benchmarks often emphasized peak bandwidth, but modern scientific workflows, particularly those in high-energy physics and data-intensive computing, require excellent performance across multiple dimensions: sequential and random I/O, as well as metadata operations. IO500 addresses this need by providing a comprehensive suite of tests that better reflect real-world usage patterns. For facilities like CNAF, understanding performance across the full IO500 spectrum is essential for infrastructure planning and optimization.
We will show the procedure for setting up a 10-node client cluster, as per the specification of the standard IO500 benchmark, alongside the tuning of two distinct storage clusters to achieve the best performance. We successfully entered the IO500 list with two systems: one based on GPFS (a.k.a. Spectrum Scale), the other on CephFS.
Speaker: Mr Andrea Chierici (Universita e INFN, Bologna (IT)) -
44
STARS: A representative compute metric for SRCNet radio astronomy workloads
The SKA SRCNet project will provide a globally distributed network of compute resources to enable scientific analysis of the vast data volumes produced by the Square Kilometre Array. These resources are contributed by institutions across multiple countries and are therefore highly heterogeneous, creating challenges in defining consistent compute pledges, accounting, and fair resource usage across the network.
To address these issues, SRCNet requires a common compute metric and corresponding accounting solution. In addition, institutions procuring new systems need guidance on optimising cost-effectiveness and utilisation for SRCNet workloads.
Inspired by HEPScore, we are developing STARS, a benchmark suite designed to provide a representative compute metric for large-scale, heterogeneous radio astronomy computing environments.
Developing such a metric presents challenges, including the diversity of radio astronomy software and the lack of mature production workloads while the telescopes are still under construction. We present the current status of STARS, our approach to building a representative software suite, and the accompanying SRCNet Accounting API, which enables SRCNet-wide per-user accounting based on node-level benchmark results, together with lessons learned and our roadmap.
Speakers: Pablo Llopis Sanmillan (EPFL), Ms Rohini Joshi (FHNW) -
45
Benchmarking GPU Viability in Heterogeneous HEP Batch Computing
High Energy Physics (HEP) relies on efficient and sustainable computing infrastructures operating at a global scale. These infrastructures must support a broad range of workloads, including machine learning applications, large-scale production campaigns, and heterogeneous end-user analysis jobs. Ensuring that available computing resources can be effectively utilized across this spectrum is therefore of key interest to HEP as a whole.
In preparation for the High-Luminosity LHC phase, which is scheduled to start in the 2030s, computing centers are facing increasing demands for performance and energy efficiency. In this context, the CMS and ATLAS collaborations are currently evaluating how GPU resources could be incorporated into their computing models, motivated by their potential for high throughput and improved efficiency. However, the general suitability of these resources across the full HEP landscape remains an open question, as centralized benchmarking is still in development.
This contribution assesses the performance gains from GPUs for HEP computing in the context of a batch-processing environment using three representative benchmark scenarios. In addition, we explore the opportunities of GPU partitioning via prototype Multi-Process Service (MPS) and Multi-Instance GPU (MIG) HTCondor setups, demonstrating flexible and efficient integration into HEP batch systems.
Speaker: Tim Voigtlaender (KIT - Karlsruhe Institute of Technology (DE)) -
46
Classification and Modeling the Performance and Scaling of Scientific Workflows with Resource-efficient Workflow Mini-apps
Scientific workflows are increasingly important in driving scientific discoveries, and future supercomputers must be designed and tuned to execute them efficiently. However, evaluating the performance of emerging computing systems using production-scale workflows is costly and energy-inefficient, especially at extreme scales. Moreover, application-level mini-apps do not capture workflows' heterogeneity, sophistication, and end-to-end behavior. We propose workflow mini-apps, a modeling technique that faithfully reproduces the key performance characteristics of real workflows while remaining portable across systems and architectures and enabling low-cost, reproducible evaluation. In this work, we model the performance and scalability of several Simulation-with-ML workflows by generating workflow mini-apps from representative workflow instances and exposing adjustable parameters to emulate families of similar workflows. In addition to modeling individual workflows directly, we introduce a workflow classification that groups workflows into classes based on their component task characteristics and performance traits. Using this classification, we provide class-level performance prediction and analysis, enabling users to reason about scaling behavior and system suitability beyond single instances. This supports informed HPC infrastructure choices, targeted performance optimization, and clearer expectations of scaling outcomes.
Speakers: Ozgur Ozan Kilic (Brookhaven National Laboratory), Tianle Wang (Brookhaven National Lab) -
47
Hardware technology in the AI era
The world of data center technology is experiencing rapid and significant change, driven by an ever-increasing demand for hardware in the commercial AI sector, with profound implications for the HEP community. More than ever, the road to the HL-LHC requires the experiments to develop and implement radical changes in how they exploit computing and storage resources, to cope with much less favorable price-reduction predictions.
This contribution provides an update on the status and trends of hardware technology, with a strong focus on the elements that pose the greatest challenges for the HEP community, how they will impact the preparation for the HL-LHC, and possible mitigation measures.
Speaker: Dr Andrea Sciabà (CERN)
-
Track 8 - Analysis infrastructure, outreach and education: Open Data
-
48
From recommendations to action: your role in advancing data preservation and open science in high-energy physics
The International Committee for Future Accelerators (ICFA) has mandated a panel to address various aspects of the data lifecycle with a focus on open science and FAIR practices - FAIR standing for Findability, Accessibility, Interoperability and Reusability of digital assets. A key indicator of success in this context is the long-term usability of research data by members of experimental collaborations and the broader scientific community. Experience shows that achieving this requires providing a rich set of information in various forms, which can only be effectively collected and preserved during the period of active data use.
The Data Lifecycle panel brought together scientists and experts involved in high-energy physics data preservation to compile a concrete set of best-practice recommendations for data preservation and open science. The guiding principle in drafting these recommendations was to ensure that they are specific to the high-energy physics domain, relevant for achieving long-term data usability, and actionable for the reader.
This contribution outlines the importance of bringing these recommendations to the attention of the high-energy physics research community and highlights the role that every actor - from policymakers to individual analysts - plays in putting them into practice. It also discusses how progress in advancing data preservation and open science will be assessed across the community.
Speaker: Kati Lassila-Perini (Helsinki Institute of Physics (FI)) -
49
Rucio for Open Science: FAIR and Accessible Data at Scale from Day One
The open sharing and re-use of scientific data is ever more important, both to meet demands for transparency and reproducibility and to maximize the scientific return of large and small experiments. The FAIR principles (Findable, Accessible, Interoperable, Re-usable) require efficient data publication, discovery, and long-term preservation, which often means costly duplication of data across storage and publishing platforms. In this contribution, we discuss how Rucio, the widely adopted distributed data management system, is being extended with native support for Open Data to address these challenges.
We present the architecture and workflow of the new “Rucio Open Data” capabilities: after tagging data as “open,” Rucio automatically applies the appropriate data placement, replication, metadata tagging, and exposure mechanisms required for public release without requiring data duplication or separate export procedures. This mechanism empowers the transition from internal data workflows to public data sharing.
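The Rucio Open Data mechanism itself is new, but the general pattern can be sketched with the existing Rucio client API; the scope, dataset name, RSE expression, and the "open_data" metadata key below are purely illustrative assumptions.

```python
# Illustrative sketch only: tag a dataset and pin replicas with the Rucio client API.
# Scope, dataset name, RSE expression and the "open_data" key are hypothetical.
from rucio.client import Client

client = Client()
scope, name = "user.analysis", "my_open_dataset"

# Mark the dataset as open via metadata (the actual Rucio Open Data tagging may differ).
client.set_metadata(scope=scope, name=name, key="open_data", value="True")

# Ensure a replica exists on storage exposed to the public portal.
client.add_replication_rule(
    dids=[{"scope": scope, "name": name}],
    copies=1,
    rse_expression="OPENDATA_EXPORT",   # hypothetical RSE expression
    lifetime=None,                      # keep indefinitely
    comment="public release",
)
```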
We highlight how this approach simplifies compliance with FAIR principles and reduces operational overhead for experiments. We also discuss the integration with the existing CERN Open Data Portal (and similar open-data platforms), enabling experiments to publish data directly from their Rucio-managed storage to public repositories, with metadata and access policies managed consistently.
Finally, we showcase benefits for both new and legacy datasets: new experiments can plan open data sharing from day one and established collaborations can incrementally expose historical data without major re-engineering.
Speaker: Hugo Gonzalez Labrador (CERN) -
50
Providing Cold Storage in the CERN Open Data portal in production
The CERN Open Data portal provides open access to high-energy physics data collected by CERN experiments for research, education, and outreach. At present, more than 5 PB of data are accessible through it. To ensure the long-term preservation and sustainable management of large datasets, a cold storage system has been introduced. Cold storage enables the archiving of data that is rarely accessed, freeing up high-performance resources while maintaining the ability to restore datasets on demand.
Following a successful proof of concept, the system was deployed in production in June 2025. The implementation uses the CERN Tape Archive (CTA) to store the cold data, EOS Open Storage for the hot data, and the File Transfer Service (FTS) to manage data movements. One of the key challenges was to support data staging requests from non-authenticated users, ensuring both openness and reliability without compromising system integrity. The system allows users to request the staging of archived datasets directly from the web interface, with requests automatically queued, processed, and monitored by the backend. The workflow includes transfer request management, asynchronous execution of data movements between storage classes, and notification mechanisms for users when data becomes available.
This contribution will present the technical design and integration of the cold storage system, the challenges faced in supporting unauthenticated user interactions, and the performance results observed since deployment. It will also share the lessons learned after one year of production operation, and outline ongoing work to improve monitoring, automate lifecycle policies, and evaluate other transfer technologies.
Speaker: Pablo Saiz (CERN) -
51
Advancements in ATLAS Open Data for Research
The ATLAS Collaboration has for the first time released a large volume of event generator output in HepMC format for the benefit of the research community, allowing theorists and other experimentalists to profit from the efforts and resources of the collaboration. This release complements the existing proton and heavy ion collision data and MC simulation that were released for research use in 2024. To support the use of these datasets, ATLAS has developed significant supporting infrastructure and documentation, including the atlasopenmagic package, a one-stop shop for metadata, dataset searches, and file identification. Thanks to close collaboration with CERN IT, the infrastructure also allows monitoring of documentation and file accesses, so that the collaboration can gather user information without resorting to user surveys. This contribution will introduce the new flavours of ATLAS Open Data for Research and expand upon the infrastructure and monitoring developments that are now available.
Speaker: Giovanni Guerrieri (CERN) -
52
LHCb Ntupling Service: official public release with access to Run 2 open data
The LHCb collaboration is very excited to announce the official public release of the LHCb Ntupling Service: an application for the on-demand production and publishing of custom LHCb open data, giving users access to both Run 1 and, for the first time, Run 2 pp data collected by the LHCb experiment, amounting to roughly 7 fb$^{-1}$. A key feature of this implementation is that no knowledge of the LHCb software stack is required to use the application and perform the subsequent data analysis. In this talk, we will share some recent improvements to the application, describe the procedure for publishing output from the LHCb Ntupling Service to the CERN Open Data Portal, and highlight the vast potential these data have for research and educational purposes, with real examples of analysis workflows. The LHCb Ntupling Service is accessible directly through the CERN Open Data Portal, offering a dynamic and flexible option for exploring petabytes of LHCb open data.
Speaker: Piet Nogga (University of Bonn (DE))
-
Track 9 - Analysis software and workflows
-
53
New Developments in ROOT's RDataFrame
ROOT's RDataFrame is a declarative analysis interface for defining modern analysis workflows in C++ or Python, which are executed efficiently either locally using TBB or in a distributed manner using Dask or Spark. Its seamless integration with TTree and RNTuple makes it an ideal tool for performant and space-efficient data analysis in HEP. This contribution will highlight recent and upcoming features of RDataFrame, namely saving systematic variations to disk using snapshots, and seamless reading and writing of ROOT's RNTuple format. We will also discuss how to use RDataFrame to feed data directly to various machine-learning frameworks without the need to create specialised NTuples or intermediate copies, and touch on the prospects of extending RDataFrame to bulk operations.
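For readers unfamiliar with the interface, a minimal PyROOT sketch of the declarative style and of systematic variations is shown below; the tree name, file name, and column names are assumptions made for illustration.

```python
# Minimal RDataFrame sketch (hypothetical tree/file/column names).
import ROOT

df = ROOT.RDataFrame("Events", "input.root")

# Declarative chain: define a quantity, filter, vary it, book a histogram (lazily evaluated).
h_nominal = (
    df.Define("pt_gev", "pt / 1000.")
      .Filter("pt_gev > 20.")
      .Vary("pt_gev", "ROOT::RVecD{pt_gev*0.98, pt_gev*1.02}", ["down", "up"])
      .Histo1D(("h_pt", "p_{T};GeV;events", 50, 0., 200.), "pt_gev")
)

# Retrieve the systematically varied histograms alongside the nominal one.
variations = ROOT.RDF.Experimental.VariationsFor(h_nominal)
print(list(variations.GetKeys()))
```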
Speaker: Stephan Hageboeck (CERN) -
54
Coffea Framework: Current Status, Recent Updates, and Community Impact
The Coffea (Columnar Object Framework for Effective Analysis) framework continues to evolve as a cornerstone tool for high-energy physics data analysis, providing physicists with efficient, scalable solutions for processing complex event data. This talk presents the current status of Coffea, highlighting significant recent developments and their impact on the HEP analysis community.
A major milestone has been transitioning from Dask-Awkward to Awkward Array's new virtual arrays feature as the default backend. This shift improves Coffea’s user-friendliness, increases performance, and simplifies the execution model. The transition requires minimal to no code modifications for existing analyses, providing a seamless migration path with substantial improvements.
Recent enhancements include advanced workflow features such as checkpointing for robust, resumable analyses, improved branch preloading and caching for network-efficient data access, and workflow tracing to identify required data branches. These optimizations benefit both interactive and batch processing scenarios, allowing physicists to focus on physics rather than data management details.
This talk will also present usage statistics and community feedback demonstrating Coffea's growing adoption across CMS and other HEP experiments, along with performance metrics showcasing memory reduction and throughput improvements. Looking toward the High-Luminosity LHC era, Coffea's architecture positions it as a key enabling technology for handling unprecedented data volumes while maintaining intuitive, user-friendly interfaces for data analysis at scale.
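As context for readers new to Coffea, a minimal columnar-analysis sketch in the style of the coffea 0.7.x NanoEvents API is given below; the file name and the exact call signatures are assumptions, and recent releases (including the virtual-arrays backend discussed above) change the loading interface.

```python
# Columnar analysis sketch in the coffea 0.7.x style (file name is hypothetical;
# newer coffea releases use a different loading API).
import awkward as ak
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema

events = NanoEventsFactory.from_root(
    "nanoaod.root",
    treepath="Events",
    schemaclass=NanoAODSchema,
).events()

# Object selection expressed as array operations rather than an event loop.
muons = events.Muon
good_muons = muons[(muons.pt > 25) & (abs(muons.eta) < 2.4)]
dimuon_events = events[ak.num(good_muons) >= 2]
print("selected events:", len(dimuon_events))
```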
Speaker: Iason Krommydas (Rice University (US)) -
55
MC-Run: Scalable MC event generation and Rivet analysis
Efficient and reproducible analysis workflows are vital for large-scale Monte Carlo (MC) event studies in high-energy physics (HEP). We present MC-Run, a lightweight and scalable open-source tool designed to orchestrate complete MC production and analysis chains, from event generation to Rivet analyses and subsequent post-processing such as combination procedures and plotting. The framework is aimed at analysts and focuses on portability and ease of use: all generators and analysis modules are sourced directly from LCG stacks via a CVMFS mount, following MCnet standards and avoiding experiment-specific software layers.
Originally developed to derive non-perturbative correction factors for phenomenological studies, MC-Run has already been used in HEP publications, enabling large-scale campaigns totalling O(100k) CPU hours and handling up to 300 TB of distributed storage. MC-Run has since been extended to the derivation of electro-weak corrections. Its modular structure allows straightforward adaptation to new physics targets and additional analysis steps, such as future extensions for MC generator tuning efforts.
MC-Run seamlessly exploits distributed computing resources to scale analyses. Workflows are handled through the Luigi and law frameworks and executed on HTCondor, with support for Grid storage systems, e.g. via XRootD and WebDAV. This enables robust large-scale production campaigns without requiring users to manage Grid middleware directly.
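To make the orchestration concrete, here is a hedged sketch of a two-step Luigi task chain of the kind such a workflow manager runs (this is generic Luigi usage, not MC-Run's actual code; the generator command and file names are invented):

```python
# Generic Luigi sketch of a generation -> Rivet analysis chain (not MC-Run's actual tasks).
import subprocess
import luigi


class GenerateEvents(luigi.Task):
    """Run an MC generator and write a HepMC event file (command is hypothetical)."""
    run_id = luigi.IntParameter()

    def output(self):
        return luigi.LocalTarget(f"events_{self.run_id}.hepmc")

    def run(self):
        subprocess.run(["generate-events", "--seed", str(self.run_id),
                        "--output", self.output().path], check=True)


class RunRivet(luigi.Task):
    """Analyse the generated events with Rivet and produce a YODA file."""
    run_id = luigi.IntParameter()

    def requires(self):
        return GenerateEvents(run_id=self.run_id)

    def output(self):
        return luigi.LocalTarget(f"rivet_{self.run_id}.yoda")

    def run(self):
        subprocess.run(["rivet", "--analysis", "MC_JETS",
                        "--histo-file", self.output().path,
                        self.input().path], check=True)


if __name__ == "__main__":
    luigi.build([RunRivet(run_id=i) for i in range(4)], local_scheduler=True)
```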
This combination of reproducibility, portability, and seamless access to distributed resources makes MC-Run a practical solution for end-to-end MC analysis workflows, from generator-level production to physics-ready results.
Speaker: Cedric Verstege (KIT - Karlsruhe Institute of Technology (DE)) -
56
Declarative paradigms for analysis description and implementation - a demonstrator for High Energy Physics
The software toolbox used for big-data analysis has been changing rapidly in recent years. The adoption of software design approaches able to exploit new hardware architectures and increase code expressiveness plays a pivotal role in boosting both the development and the performance of sustainable data analysis.
The scientific collaborations in the field of High Energy Physics (e.g. the LHC experiments, the next-generation neutrino experiments, and many more) are devoting increasing resources to the development and implementation of bleeding-edge software technologies in order to cope effectively with ever-growing data samples, extending the reach of individual experiments and of the whole HEP community.
In this context, the adoption of declarative paradigms for analysis description and implementation is gaining momentum in the main collaborations. This approach can simplify and speed up the analysis description phase, support the portability of analyses among different datasets and experiments, and strengthen the preservation and reproducibility of results. Furthermore, by providing a deep decoupling between the analysis algorithm and the back-end implementation, this approach is a key element for present and future processing speed, potentially even with back-ends that do not exist today.
A framework characterised by a declarative paradigm for analysis description, able to operate on datasets from different experiments, is under development within the ICSC (Centro Nazionale di Ricerca in HPC, Big Data and Quantum Computing, Italy). The Python-based demonstrator provides a declarative interface for the implementation of HEP data analyses, with support for different input data formats and different processing back-ends. The status and main features of the demonstrator will be illustrated.
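Purely as an illustration of what a declarative analysis description can look like (this is not the ICSC demonstrator's actual interface; the specification format, file, and branch names are invented), an analysis might be expressed as data and interpreted by an interchangeable back-end:

```python
# Invented example of a declarative analysis description interpreted with uproot/numpy.
# The specification format is illustrative only, not the demonstrator's real schema.
import numpy as np
import uproot

analysis = {
    "input":  {"file": "events.root", "tree": "Events"},          # hypothetical input
    "select": [("lep_pt", ">", 25.0), ("lep_eta", "<", 2.4)],
    "histogram": {"variable": "lep_pt", "bins": 50, "range": (0.0, 200.0)},
}

def run(spec):
    # The declarative description is interpreted here; a different back-end
    # (distributed, GPU, ...) could interpret the same description unchanged.
    tree = uproot.open(spec["input"]["file"])[spec["input"]["tree"]]
    arrays = tree.arrays(library="np")
    ops = {">": np.greater, "<": np.less}
    mask = np.ones(len(arrays[spec["histogram"]["variable"]]), dtype=bool)
    for var, op, value in spec["select"]:
        mask &= ops[op](arrays[var], value)
    values = arrays[spec["histogram"]["variable"]][mask]
    return np.histogram(values, bins=spec["histogram"]["bins"],
                        range=spec["histogram"]["range"])

counts, edges = run(analysis)
```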
Speaker: Paolo Mastrandrea (Universita & INFN Pisa (IT)) -
57
SMOCS - JLab’s Streaming Monitoring Optimization Control System
Machine learning (ML) has proven to be incredibly useful in science and engineering; however, there is significant overhead in deploying and maintaining ML models in real-time operation. This is due to the many different custom interfaces each complex facility may have, the conversions required between non-standard data formats, and the ML infrastructure required for continuous adaptation of deployed models. To minimize this overhead, we present SMOCS (Streaming Monitoring Optimization and Control System), a Kafka-based containerized framework that enables dynamic deployment of specialized agents ranging from sensor monitoring and visualization to ML-driven diagnostics and controls. This platform-agnostic solution links live streamed data with custom AI/ML applications and aims to avoid rebuilding the equivalent infrastructure for each facility. SMOCS addresses the growing need for a dynamic streaming framework capable of agentic workflows in scientific facilities while maintaining the flexibility to adapt to diverse experimental requirements and the reliability required by these complex systems. SMOCS is publicly available as open-source software with documentation to facilitate adoption across the scientific community.
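To give a flavour of what a streaming monitoring agent in such a framework does (a generic kafka-python sketch, not SMOCS code; the topic name, broker address, and message layout are assumptions):

```python
# Generic sketch of a Kafka-based monitoring agent (not SMOCS itself);
# the topic name, broker address, and JSON payload layout are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "detector.sensors",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

THRESHOLD = 42.0                              # hypothetical alarm level

for message in consumer:
    reading = message.value                   # e.g. {"sensor": "temp_01", "value": 39.7}
    if reading["value"] > THRESHOLD:
        # A real agent might publish an alert to another topic or call an ML model here.
        print(f"ALERT: {reading['sensor']} = {reading['value']}")
```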
Speaker: Torri Jeske
-
Poster
-
Track 1 - Data and metadata organization, management and access: Rucio, data lakes and distributed data management
-
58
A new Rucio Service at CERN for Emerging and Established Experiments
In this contribution, we present a new Rucio-based service designed specifically to simplify data management for the Small and Medium experiments at CERN.
Rucio has become the de-facto data management solution for major experiments in high-energy physics and related scientific domains such as astrophysics, providing a scalable, policy-driven framework for distributed data placement, access, and lifecycle management. Small and medium-sized experiments have similar challenges to those of big experiments in data organization, reliability, and long-term preservation. These experiments seek a robust data management platform without the operational overhead associated with large-scale deployments.
Two years ago we had the idea of creating a new class of service for these experiments, and we can now report on this successful new service based on Rucio, EOS, CTA (CERN Tape Archive), and FTS (File Transfer Service).
The service offers a low barrier to adoption: experiments write data into a designated disk pool, and archiving, replication, and policy enforcement are handled automatically. This approach creates the illusion of a hierarchical storage manager, allowing data acquisition to write into a disk pool at which point Rucio takes over, managing scalable disk workflows and flexible tape-backed archival.
The service also acts as a blueprint for new experiments, such as SHiP, providing a ready-to-use, production-grade data management environment from the beginning. At the same time, existing experiments (AMS, NA62) benefit from a managed service, reduced operational effort, and the modernization of their data management workflows.
We describe the system’s architecture, the integration model that minimizes data migration, and the lessons learned while deploying the service for the AMS, SHiP, and NA62 experiments.
Speaker: Hugo Gonzalez Labrador (CERN) -
59
interTwin Digital Twin Engine's Data Lake
The interTwin project, funded by Horizon Europe, has developed a Digital Twin Engine (DTE), a platform for developing and running Digital Twins across multiple scientific domains. A central component of the DTE is the interTwin Data Lake, a federated storage layer that integrates HPC, HTC, and cloud-based datasets and provides unified access while preserving site-local policies and permissions. The interTwin Data Lake is based on Rucio and FTS.
The project identified access to existing storage as a barrier to data lake adoption. To tackle this, the project developed two new components: Teapot and ALISE. Together these enable scalable and secure access to the Data Lake in HPC and HTC environments. They achieve this by providing automated, bulk access to site storage while mapping each request to a site-local account, without requiring centrally managed accounts.
Teapot is a multi-tenant WebDAV service built on StoRM-WebDAV and provides integration of HPC and HTC storage into the federated Data Lake. Its architecture preserves file ownership and enforces native filesystem permissions, allowing sites to expose storage resources without altering local policies. Teapot has enabled CESGA and KBFI to join the Data Lake, with additional sites in progress. By enabling HPC/HTC storage integration into the Data Lake, it supports Digital Twin workflows that require bulk data staging and processing on HPC and HTC resources.
Teapot integrates with ALISE, a lightweight user enrolment service that supports linking external and site-local identities. ALISE makes this mapping information available to services, enabling a site to support OIDC-based authentication while retaining its local account and authorization models.
Speaker: Dijana Vrbanec -
60
A Rucio-Based Global Data Lake for the SKA Regional Centre Network
The Square Kilometre Array (SKA) telescopes, currently under construction in South Africa and Australia, are due to enter Science Verification at the end of 2026. From this point, these interferometers will generate an increasing volume of data, with the science data processors eventually producing of order 1 PB per day of science-ready data products. Managing this archive across the globally federated SKA Regional Centre Network (SRCNet) of data centres is a key challenge in enabling timely and reliable access to SKA data for the astronomy community.
To address this, the SRCNet data lake is built around Rucio and FTS as its core data management and transfer technologies, complemented by a suite of auxiliary services tailored to SKA-specific requirements. In this talk, we outline the SRCNet data lake use case and describe the key challenges encountered when adapting Rucio to this environment. We highlight the supporting services developed and summarise the full data lifecycle, from ingestion at SKA Observatory interfaces, through global replication and distribution, to staging for scientific processing at SRCNet sites.
In contrast to traditional High Energy Physics workflows, where data access is typically organised around predefined datasets, astronomy requires multi-mission discovery with tight integration between physical replica management and rich, standards-based metadata systems, alongside support for proprietary data embargoes. We discuss the mechanisms implemented to manage and expose science metadata within the data lake and to control access to embargoed data products, both increasingly important requirements for large-scale distributed astronomical archives. These experiences are expected to be relevant to the wider community considering Rucio-based data lakes with complex metadata and federation requirements.
Speaker: James Collinson (SKAO) -
61
Enhancing Rucio for Gravitational Wave Experiments with MADDEN
Large-scale scientific experiments, such as those in gravitational-wave (GW) science, produce extensive datasets that are often stored in isolated data lakes. The second-generation interferometers—LIGO, Virgo, and KAGRA—are part of an international scientific network, the International Gravitational-Wave Observatory Network (IGWN). A similar framework is envisaged for the third-generation interferometers: the Einstein Telescope (ET) in Europe and Cosmic Explorer (CE) in the United States. Data from the interferometers must be readily shared among scientists across all collaborations, enabling coincidence detection to distinguish genuine astrophysical signals from local noise and to achieve accurate sky localization. Data distribution, storage, and access should follow FAIR principles to streamline data analysis. Currently, both LIGO and Virgo use Rucio for data distribution, and ET is evaluating it as a Distributed Data Management (DDM) system.
In this context, two projects, MADDEN and ETAP, were funded by the first OSCARS (Open Science Cluster's Action for Research and Society) Open Call. MADDEN (Multi-RI Access and Discovery of Data for Experiment Networking) focuses on enhancing Rucio functionalities to better meet the requirements of the GW community. Within MADDEN, we will extend Rucio to support multi-RI (Research Infrastructure) data lakes, simplifying authentication and user management. We will present the design and implementation of a POSIX-like view of the Rucio catalogue in a multi-RI environment and provide support for advanced metadata queries. These capabilities will be showcased and integrated into ETAP (Einstein Telescope Analysis Portal), which will provide a complete environment for ET data analysis, to be used in the next ET Mock Data Challenges. In this contribution we will also report on the participation in the ESCAPE xRIDGE data challenge planned for early 2026.
Speaker: Federica Legger (Universita e INFN Torino (IT)) -
62
Integrating Globus as a Transfer Tool for Rucio: Bridging HEP and ASCR Data Management Infrastructure
Rucio, the scientific data management system developed by ATLAS at CERN, has become widely adopted across high-energy physics experiments for managing distributed datasets at exabyte scale. Traditionally, Rucio relies on the WLCG File Transfer Service (FTS) for data movement between storage elements. We present recent developments enabling Globus—the research cyberinfrastructure platform operated by the University of Chicago and used extensively by DOE-ASCR and NSF HPC facilities—as an alternative transfer tool within Rucio.
Building on initial work from 2019, we have updated and validated the Globus transfer tool integration with the latest Rucio releases. The implementation uses OAuth2 authentication with the Globus SDK, supports both bulk and single-file transfers, and works with Globus Connect Server endpoints as well as Globus Connect Personal endpoints. The integration required only approximately 100 lines of code modifications to Rucio's transfer tool components.
We report on functional testing of transfers and deletions using test RSEs, and describe ongoing scale testing between Argonne’s LCRC and NERSC at LBNL. This integration provides HEP experiments with access to Globus's 10,000+ connected storage systems across 2,600+ institutions, while enabling the ASCR community to leverage Rucio's sophisticated data management capabilities. We discuss plans for adoption by DUNE, ATLAS, Rubin Observatory, and other experiments, as well as future work on Globus Connect Personal support for end-user data access.
Speakers: Benjamin Gutierrez (Argonne National Laboratory), Doug Benjamin (Brookhaven National Laboratory (US)) -
63
SRCNet v0.1 and the Data Path to SKA Science
The SKA Regional Centre Network (SRCNet) is a globally federated infrastructure providing data distribution and science workflows for the Square Kilometre Array (SKA). The v0.1 test campaign delivered the first system-level validation across nine accredited nodes, integrating global services (Rucio, FTS, SKA-IAM, perfSONAR) with site services (storage, compute, science platforms) and executing progressive stress tests. We demonstrated sustained site-to-site throughput of up to ~20 Gb/s and achieved 24-hour data-movement challenges reaching ~150 TB and ~1M files, at scales expected for upcoming Science Verification commissioning periods. Rucio’s policy‑based replication strategies supporting long‑haul distribution across SRCNet are also presented.
In this work, we present the test campaign setup, including use of the Rucio Task Manager framework to orchestrate test plans, and provide results on reliability, and network and storage performance. We also report updated SRCNet characterisation results, including site-to-site performance and operational lessons from token‑only transfer workflows. Results from file‑concurrency studies, alongside per‑file transfer speeds and file‑size variation, are presented, which are important to inform large‑scale replication and data‑placement strategies for different formats of astronomy data.
Building on these baselines and leveraging the experience from WLCG-style Data Challenges, we present the validation plan to demonstrate readiness for SRCNet to accept, distribute, and serve the SKA observatory data flow. This provides the path to meet the ~700 PB/yr observatory ingest plus derived data products, as SKA reaches full operations, positioning SRCNet as a scalable federation, extending and augmenting proven HEP methodologies for next-generation radio astronomy.
Speaker: James William Walder (Science and Technology Facilities Council STFC (GB))
-
Track 2 - Online and real-time computing
-
64
Autoencoders for real-time event selection at the LHCb experiment
The LHCb experiment operates a fully software-based trigger that must reduce the 40 MHz collision rate to an output bandwidth of around 10 GB/s, making real-time event selection a central computing challenge. Current selections in the second-level trigger (HLT2) are largely based on hand-crafted cuts, which can be difficult to optimise in high-dimensional spaces and may lack robustness against unforeseen background sources, as well as on supervised-ML algorithms such as classification Boosted Decision Trees (BDTs).
In this work, a novel selection strategy is presented based on an unsupervised-ML model, an autoencoder, trained solely on simulated signal events. The network learns a compact representation of the signal of interest and uses the reconstruction error as a discriminating variable, allowing model-independent background rejection. For the rare decay Lambda_b -> p pi- mu+ mu-, the autoencoder achieves significantly improved performance compared to the existing HLT2 cut-based line: at the same rate, it increases the efficiency by 30%; at the same efficiency, it decreases the rate by 80%. For completeness, a supervised Boosted Decision Tree trained on the same feature set is also evaluated. The model was deployed in the LHCb trigger using the ONNX framework for real-time inference, and has been running since October 2025.
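A minimal sketch of the underlying idea, an autoencoder whose per-event reconstruction error serves as the selection score and which is then exported to ONNX for in-trigger inference, is shown below; the feature count, architecture, and inputs are illustrative assumptions, not the LHCb model.

```python
# Toy autoencoder with reconstruction error as an anomaly/selection score
# (feature count and architecture are illustrative, not the LHCb model).
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, n_features=20, latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, n_features))

    def forward(self, x):
        return self.dec(self.enc(x))

model = AE()
x = torch.randn(8, 20)                      # hypothetical candidate feature vectors
score = ((model(x) - x) ** 2).mean(dim=1)   # per-candidate reconstruction error
print("selection scores:", score)

# Export for real-time inference with an ONNX runtime in the trigger.
torch.onnx.export(model, x, "ae.onnx")
```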
This aims to showcase the potential of unsupervised-ML approaches for optimizing real-time event selection at the LHCb experiment.
Speaker: Paloma Laguarta González (University of Barcelona (ES)) -
65
Realtime updates for realtime ML
Machine Learning (ML) algorithms are becoming a key tool for fast decision making in high energy physics experiments, from event-level classifiers in FPGA-based triggers down to cluster identification on detector module ASICs. Operating so close to raw detector data exposes these models to evolving experimental conditions that can introduce distribution shifts and degrade their performance. Updating these algorithms on FPGAs is a time-consuming process and effectively impossible on ASICs.
We present realtime updates to ML algorithms in situ, using external control signals to the FPGA or ASIC. The Forest Processing Unit (FPU) is demonstrated as a fully updatable decision tree implementation in which the entire model can be updated externally. Updatable scaling layers for generic ML applications are also introduced, with which the inputs or outputs of a model can be recalibrated as the inputs to the model change. These methodologies offer a generic solution for maintaining on-chip ML model performance in changing detector conditions, which we demonstrate with examples from the CMS Phase-2 Level-1 Trigger upgrade.
Speaker: Christopher Edward Brown (CERN) -
66
Online searches for long-lived particles at LHCb: BuSca (Buffer Scanner)
The new fully software-based trigger of the LHCb experiment at CERN operates at a 30 MHz data rate, opening a search window into previously unexplored regions of the physics phase space. The BuSca (Buffer Scanner) project at LHCb acquires and analyses data in real time, prior to any trigger decision, extending sensitivity to new particle lifetimes and mass ranges.
Displaced tracks that originate downstream of the LHCb vertex detector are reconstructed and selected. BuSca identifies hotspots in the data indicative of potential new long-lived particle candidates in a model-independent manner, providing strategic guidance for developing new trigger lines. In this talk, we will present the current status and potential developments of this pioneering framework, along with the results from the analysis of the Run 3 data.
Speaker: Valerii Kholoimov (EPFL - Ecole Polytechnique Federale Lausanne (CH)) -
67
Accelerating Graph Neural Networks on FPGAs for Real-Time Level-0 Muon Triggering
The High-Luminosity LHC will generate unprecedented data rates, pushing real-time trigger systems to their limits. We present a novel approach deploying graph neural networks (GNNs) on FPGAs to achieve fast, sub-microsecond inference for Level-0 muon triggers. Exploiting the sparse, relational structure of detector hits, the method preserves key spatial correlations while enabling hardware-efficient, low-latency execution. We explore model compression, pipelined parallelism, and resource-aware design to optimise throughput under stringent real-time constraints. Preliminary results indicate that this approach can scale to high-rate environments, demonstrating the potential of FPGA-accelerated GNNs for AI-assisted event selection at the first step of the Level-0 muon trigger chain. Our work highlights strategies for integrating machine learning with FPGA-based triggers, offering a path toward real-time processing in next-generation high-energy physics experiments.
Speaker: ATLAS TDAQ collaboration -
68
Particle-Based Representation Learning for Anomaly Detection in the CMS High-Level Trigger
Anomaly detection at the LHC aims to identify events that deviate from dominant Standard Model (SM) processes while minimizing assumptions inherent to predefined trigger selections, enabling model-agnostic searches for new physics. The CMS experiment employs a two-stage trigger system that reduces the LHC bunch-crossing rate of up to 40 MHz to an output rate of approximately 9 kHz for offline processing in Run 3.
This work explores a proposed additional anomaly-detection layer at the High-Level Trigger (HLT), complementing the AXOL1TL system deployed at Level-1. The approach uses self-supervised representation learning to construct a physics-informed latent space in which the main SM processes populate well-separated regions, while anomalous or previously unmodeled event topologies tend to occupy distinct areas.
The model ingests the full set of reconstructed particles and their features, processes them with an attention-based architecture, and produces a compact fixed-size event representation. Preliminary results demonstrate the potential of this strategy to preferentially highlight anomalous events and to achieve rate reduction while improving sensitivity to a broad range of signal scenarios relative to dominant SM backgrounds.
Speaker: Mehrnoosh Moallemi (Science and Technology Facilities Council STFC (GB)) -
69
Advances in the Optimisation of the LHCb First Level Trigger
The first level of the LHCb experiment’s trigger system (HLT1) performs real-time reconstruction and selection of events at the LHC bunch crossing rate using GPUs. It must balance the diverse goals of the LHCb physics programme, which spans from kaon physics to the electroweak scale.
To maximise the physics output across the entirety of LHCb's physics programme, an automated bandwidth division has been deployed since 2024. This procedure uses adaptive moment estimation to determine the optimal selection criteria while satisfying constraints such as the total HLT1 output rate and thresholds shared between common trigger categories.
The bandwidth division is now widely available to the collaboration for the development and testing of new trigger lines via Continuous Integration (CI) on GitLab, providing information about exclusive and inclusive rates per trigger line, trigger efficiencies for each simulation sample included, and the inclusive rate overlap between trigger lines. A near-term goal is to extend the bandwidth-division CI to allow pseudo-real-time tunings, enabling end-users to study and monitor the effect of selection criteria on trigger rates and performance under realistic conditions.
This talk will present the bandwidth division algorithm, its place in the LHCb real-time analysis paradigm, and the development towards having a pseudo-real-time bandwidth division for LHC Run 4 and onwards using continuous integration and automation.
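A toy sketch of the bandwidth-division idea described above: per-line selection thresholds tuned with adaptive moment estimation (Adam) to maximise efficiency subject to a total output-rate constraint. The sigmoid-smoothed cut, the synthetic data, and the penalty weight are illustrative assumptions and do not reproduce the LHCb procedure.

```python
import torch

torch.manual_seed(0)
sig = torch.randn(5, 10000) + 2.0      # per-line discriminant, signal events
bkg = torch.randn(5, 10000)            # per-line discriminant, background
thresholds = torch.zeros(5, requires_grad=True)
opt = torch.optim.Adam([thresholds], lr=0.05)
rate_budget = 0.10                     # allowed total background acceptance

for step in range(500):
    opt.zero_grad()
    # Smooth step function so each line's cut is differentiable.
    eff = torch.sigmoid(10 * (sig - thresholds[:, None])).mean(dim=1)
    rate = torch.sigmoid(10 * (bkg - thresholds[:, None])).mean(dim=1).sum()
    # Maximise summed efficiency; penalise exceeding the shared rate budget.
    loss = -eff.sum() + 100.0 * torch.relu(rate - rate_budget) ** 2
    loss.backward()
    opt.step()

print(thresholds.detach(), float(rate))
```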
Speaker: Andreea-Irina Hedes (The University of Manchester (GB))
-
64
-
Track 3 - Offline data processing: Tracking 1
-
70
Using Line Segment Tracking to enable ML-based Track Reconstruction in CMS
High-pileup conditions in CMS during the HL-LHC era make charged-particle tracking increasingly challenging as detector occupancy and combinatorics grow. We present a hybrid approach that exploits Line Segment Tracking (LST) objects rather than individual hits to enable the first CMS ML-based track reconstruction algorithm. The LST segments are built according to geometry- and physics-driven criteria and carry richer local structure, geometrical cues, and physics quantities such as transverse momentum, which helps the model incorporate detector-level information from the start.
We apply novel ML architectures like Graph Neural Networks and Transformers to embed segments in a learned latent space and use object condensation to assemble full tracks in a single inference step. This segment-based representation reduces complexity in dense environments and makes the model less sensitive to ambiguities introduced by high pileup. Evaluated on CMS Phase-2 simulation, this approach achieves high reconstruction efficiency with low fake and duplicate rates, demonstrating the promise of advanced ML architectures for next-generation particle tracking.
Speaker: Aashay Arora (Univ. of California San Diego (US)) -
71
Hybrid Track Candidate Finder Combining the Hough Transform and Neural Networks for Fast Charged-Particle Reconstruction
Reconstructing charged-particle tracks in silicon detectors is one of the most computationally demanding tasks in high-energy physics. When applied in online event selection systems, additional latency constraints make the problem even more challenging. Within the reconstruction chain, the efficient and high-purity formation of track candidates plays a critical role in the overall performance.
Among the many approaches developed over the years, the Hough Transform (HT) has been widely studied as a fast, geometry-driven method for track finding. However, in high-occupancy environments such as those expected at the High-Luminosity LHC (HL-LHC), the HT tends to produce a large number of spurious candidates, leading to increased computational overhead in subsequent reconstruction stages.
In this work, we present a hybrid approach in which the HT serves as a first-stage data preparation step, providing its parameter-space image as input to a neural network trained to suppress false track candidates. The method combines the speed of the HT with the discriminative power of machine learning to achieve both efficiency and purity. In addition, no data transformations are needed when combining these steps, resulting in a simpler and more performant algorithm. Performance studies using the Open Data Detector simulated in the ACTS framework under realistic HL-LHC pileup conditions will be presented.
Speakers: Carlo Varni (AGH University of Krakow (PL)), Krzysztof Cieśla (AGH University of Krakow (PL)), Marcin Wolter (Polish Academy of Sciences (PL)), Tomasz Bold (AGH University of Krakow (PL)) -
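A minimal sketch of the first stage described above: fill a Hough-transform accumulator from hit coordinates and hand its parameter-space image to a classifier. The binning, the straight-line (rho, theta) parametrisation, and the random hits are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def hough_image(hits, n_theta=64, n_rho=64, rho_max=100.0):
    """hits: (N, 2) array of (x, y); returns the accumulator image."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho))
    for x, y in hits:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # one sinusoid per hit
        bins = ((rho + rho_max) / (2 * rho_max) * n_rho).astype(int)
        ok = (bins >= 0) & (bins < n_rho)
        acc[np.arange(n_theta)[ok], bins[ok]] += 1
    return acc

hits = np.random.uniform(-50, 50, size=(200, 2))
image = hough_image(hits)
# Each local maximum is a track candidate; the image itself is the
# neural-network input used to suppress the spurious ones.
print(image.shape, image.max())
```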
72
Boosting the ATLAS Event Reconstruction Efficiency via an Improved Track-Overlay
In response to the rising computational and storage demands of the High-Luminosity Large Hadron Collider (HL-LHC), efforts are underway to boost the processing efficiency of ATLAS Inner Detector (ID) event reconstruction. Our strategy to reduce the computational demands employs a Track-Overlay approach, which uses pre-reconstructed pile-up tracks (from separate minimum-bias simulations) and then runs the ID reconstruction procedure exclusively on hits corresponding to the simulated hard-scatter event. This method concentrates computing resources on events of interest, achieving a reduction in the required CPU time of up to 40% compared to standard reconstruction techniques.
A crucial component of this workflow is the incorporation of Machine Learning (ML), which guides the selection of events suitable for the Track-Overlay approach. High-density track environments, like those associated with high-pT jets, are intentionally directed toward the standard reconstruction method. This ensures a balanced approach that maximises resource efficiency while successfully maintaining high precision across all relevant physics processes.
This contribution details the construction of the ML model and the validation of the full workflow under the anticipated data-taking conditions of the HL-LHC's first run. Our validation studies include detailed comparisons of physics modeling for a diverse selection of processes, alongside robust CPU benchmark studies comparing the performance of standard reconstruction against our Track-Overlay approach. A core element of this effort is the rigorous evaluation of the ML model’s performance under the extreme conditions of the HL-LHC, which must contend with up to 200 pile-up vertices, far exceeding the 50–60 vertices encountered in current Run 3 data taking.
Speakers: Clara Nellist (Nikhef National institute for subatomic physics (NL)), Fang-Ying Tsai (Stony Brook University (US)) -
73
Graph Neural Network Based End-to-End Track Reconstruction with Drift Chamber and CGEM at BESIII
We present an end-to-end track reconstruction algorithm based on Graph Neural Networks (GNNs) for a 35-layer multilayer drift chamber (MDC) combined with a 3-layer cylindrical gas electron multiplier (CGEM) in the BESIII experiment at the BEPCII collider. The algorithm directly processes MDC wire measurements and CGEM clusters as input to simultaneously predict the number of track candidates and their kinematic properties in each event. The reconstruction efficiency achieves parity with or surpasses traditional methods, with marked improvement for low-momentum samples. In addition to the track parameters, detailed hit-level information, such as position, momentum, flight length, and left-right ambiguity, can also be predicted. Further improvements are anticipated as the research progresses.
Speakers: Yunhe Yang (Nankai University), Xinyu Zhuang -
74
Track Reconstruction for the COMET Drift Chamber
The COMET experiment is designed to search for charged lepton flavor violation (CLFV) through coherent muon-to-electron conversion, characterized by a 105 MeV electron signal. In Phase I, an all‑stereo‑layer Cylindrical Drift Chamber (CDC) is used as the main tracker for charged‑particle measurement. A key challenge is that all the signal tracks are curled and about one‑third of the tracks in the CDC are multi‑turn, which complicates both track finding and fitting. Furthermore, owing to the high luminosity of the experiment, tracking performance is challenged by high hit occupancy.
We present the track‑reconstruction pipeline developed for the COMET Phase‑I CDC. Our approach employs multiple track‑finding algorithms to identify track candidates. For track fitting, with an emphasis on multi‑turn trajectories, we describe an enhanced algorithm based on the Genfit toolkit. The performance of the full reconstruction chain will also be discussed.
Speaker: Zhaoke Zhang (Institute of High Energy Physics, Chinese Academy of Sciences) -
75
Efficient Graph Segmentation via Global Path Inference and Learned Edge Embeddings for Scalable GNN-based Tracking
Graph Neural Networks (GNNs) are a leading approach for particle track reconstruction, typically following a three-step pipeline: graph construction, edge classification, and graph segmentation. In edge-classification pipelines like ACORN, the segmentation step is often a trade-off between the speed of local algorithms (e.g., Junction Removal) and the accuracy of global algorithms (e.g., Walkthrough). While the latter achieves superior physics performance by evaluating global path features, its combinatorial complexity limits scalability.
In this talk, we introduce two novel methods to bridge this gap. First, we propose “D-WALK” (Dynamic programming-based WALKthrough), an algorithm that enables access to global path features without the exhaustive combinatorial search, significantly improving execution timing. Notably, D-WALK extends the capabilities of traditional walkthroughs by enabling efficient path comparisons in both directions, resolving ambiguities at junctions with both multiple incoming and multiple outgoing edges. Second, we introduce “Junction Resolution,” an extension to local segmentation that resolves graph ambiguities using learned edge embeddings from a modified GNN. We demonstrate that Junction Resolution achieves physics performance comparable to or better than global walkthroughs while maintaining the efficiency of local methods. Using the newly released ColliderML dataset, we show that these methods offer a scalable path forward for GNN-based tracking with substantial gains in both physics accuracy and computational throughput.
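A toy sketch of the dynamic-programming idea behind global path inference: on a directed acyclic graph of classified edges, the best-scoring path through every node can be found in a single topological sweep instead of an exhaustive walkthrough. The graph layout and edge scores are illustrative assumptions, not the D-WALK algorithm itself.

```python
import networkx as nx

g = nx.DiGraph()
g.add_weighted_edges_from([
    ("a", "b", 0.9), ("a", "c", 0.4),   # junction: two outgoing edges
    ("b", "d", 0.8), ("c", "d", 0.7),   # junction: two incoming edges
    ("d", "e", 0.95),
])

best = {}   # node -> (score of best incoming path, predecessor)
for node in nx.topological_sort(g):
    best.setdefault(node, (0.0, None))
    for succ in g.successors(node):
        score = best[node][0] + g[node][succ]["weight"]
        if score > best.get(succ, (-1.0, None))[0]:
            best[succ] = (score, node)

# Walk back from the end point to recover the resolved candidate.
path, node = [], "e"
while node is not None:
    path.append(node)
    node = best[node][1]
print(path[::-1])   # ['a', 'b', 'd', 'e']
```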
Speaker: Jay Chan (Lawrence Berkeley National Lab. (US))
-
70
-
Track 4 - Distributed computing
-
76
Multicore workload scheduling, then vs. now
This effort revisits the issue of scheduling multicore workloads on shared multipurpose, multi-user clusters. This issue was extensively studied and reported on for CHEP 2015. Since then, both the cluster-management technology and the typical grid-cluster workloads have evolved, with consequences for scheduling approaches.
The relevant developments will be discussed, and arguments made that the problem as originally described is currently less challenging, within some operational limits that will also be described (for example, whole-node scheduling).
Speaker: Jeff Templon -
77
Job scheduling optimization for heterogeneous resources in the ALICE Grid
Authors: Maria-Elena Mihăilescu (National University of Science and Technology Politehnica Bucharest, maria.mihailescu@upb.ro), Costin Grigoraș (CERN, costin.grigoras@cern.ch), Latchezar Betev (CERN, latchezar.betev@cern.ch), Mihai Carabaș (National University of Science and Technology Politehnica Bucharest, mihai.carabas@upb.ro)
on behalf of the ALICE Collaboration
JAliEn functions as the middleware backbone for the ALICE Grid, managing modules such as job scheduling, accounting, monitoring, and isolated execution environments. Currently, every payload is assigned a strict Time To Live (TTL), while execution hosts operate within fixed availability windows (typically 24 to 72 hours).
Due to the heterogeneous nature of Grid resources - varying in hardware, software, and system load - job TTLs are statically configured to accommodate the slowest hosts. This approach is sub-optimal - high-performance hosts are often unable to schedule payloads with these inflated TTLs because the remaining time in their open slots is shorter than the requested TTL. Conversely, when these jobs do run on faster hosts, they finish significantly earlier than the static TTL, resulting in underutilized slot time.
This contribution presents scheduling optimizations implemented in JAliEn to enhance Grid resource efficiency. We first analyze the current scheduling algorithm, demonstrating that job rejection is frequently caused by a mismatch between the requested TTL and the remaining slot time, rather than a lack of hardware resources (CPU/Disk). However, historical data indicates that most jobs complete well within their assigned TTL.
To address this, we propose two optimization strategies for Monte Carlo simulations and I/O-intensive payloads: historical prediction, which predicts job TTL based on the execution history of jobs with similar characteristics (production type, CPU model, and site configuration), and data scaling, which scales the TTL based on the ratio of assigned input data to the maximum possible data load.
Combined, these approaches maximize the utilization of batch queue slots and improve global resource usage across the ALICE Grid.
Speaker: Maria-Elena Mihailescu (National University of Science and Technology POLITEHNICA Bucharest (RO)) -
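A toy sketch of the two strategies above: predict a job's TTL from the runtime history of similar jobs (production type, CPU model, site), then scale it by the assigned-to-maximum input-data ratio. The quantile choice, safety margin, and history table are illustrative assumptions, not the JAliEn implementation.

```python
import numpy as np

history = {  # (production_type, cpu_model, site) -> past wall times in minutes
    ("MC", "EPYC-7742", "CERN"): np.array([310, 290, 345, 305, 330, 298]),
}

def predict_ttl(key, data_assigned, data_max, quantile=0.95, margin=1.2):
    runtimes = history[key]
    base = np.quantile(runtimes, quantile) * margin   # historical prediction
    return base * (data_assigned / data_max)          # data scaling

ttl = predict_ttl(("MC", "EPYC-7742", "CERN"), data_assigned=60, data_max=100)
print(f"requested TTL: {ttl:.0f} min")   # vs. a static, worst-case TTL
```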
78
Advanced support for heterogeneous resources for the future CMS Offline Computing
The resource landscape available to LHC experiments is evolving, driven by industry trends and funding-agency policies, from traditional WLCG sites dominated by x86 CPU resources towards larger consolidated facilities, with a growing fraction of supercomputing centers and a rising degree of hardware heterogeneity. The CMS experiment, which has already demonstrated substantial throughput gains in its High-Level Trigger through the use of coprocessors, is now preparing to profit from this rapidly changing scenario by extending heterogeneous computing to its offline workflows.
To support this evolution, the CMS Computing Model, and in particular the Submission Infrastructure (SI) system, must be proactively adapted and exercised for a future in which GPUs and other co-processors play a central role. The SI is responsible for provisioning compute resources and matching them to data processing, simulation, and analysis tasks according to CMS priorities and policies. Ensuring the most effective use of heterogeneous resources therefore requires a number of enhancements of the current SI.
Ongoing efforts include extending the description of remote resources and job requirements via HTCondor ClassAds to enable precise matchmaking in the context of heterogeneous hardware architectures, as well as providing physicists with an accurate catalogue of available capabilities. Policies for acquiring and allocating heterogeneous compute slots, including opportunistic resources, must evolve to maximize their impact on CMS processing needs. The execution of selected benchmarks will allow the rating of individual devices, enabling precise prediction of task execution times. In addition, the possibility of dynamically allocating a device to multiple payloads will result in more efficient utilization. Other developments are being considered, namely the configuration and execution of mixed CPU+GPU workflows, support for non-NVIDIA GPU architectures, and the access and use of heterogeneous slots at HPC facilities.
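A minimal sketch of heterogeneous matchmaking with HTCondor ClassAds, in the spirit described above: the job ad states the hardware it needs, and the negotiator matches it against machine ads advertising their devices. CUDACapability is a standard HTCondor machine-ad attribute, but the executable name, values, and policy here are illustrative assumptions, not CMS configuration.

```python
import htcondor

job = htcondor.Submit({
    "executable": "run_reco.sh",      # hypothetical payload wrapper
    "request_cpus": "4",
    "request_memory": "8GB",
    "request_gpus": "1",
    # Only match slots whose GPU is new enough for the workload.
    "requirements": "CUDACapability >= 7.0",
    "output": "reco.out",
    "error": "reco.err",
})

schedd = htcondor.Schedd()
result = schedd.submit(job)           # returns a SubmitResult
print("submitted cluster", result.cluster())
```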
This contribution will present the current status of these activities, along with the next steps for the SI towards fully supporting CMS offline computing on a future heterogeneous resource ecosystem.
Speaker: CMS Collaboration -
79
Fair-Share Versus Opportunism in Multi-VO Environments: The Complexity of Job Slot Allocation at the RAL Tier-1
Managing job-slot allocation in a multi-VO environment remains a persistent operational challenge for WLCG sites, particularly when each Virtual Organization (VO) employs distinct workload-management and scheduling behaviors. At the RAL Tier-1 (RAL-LCG2), more than a dozen VOs—including CMS, ATLAS, LHCb, and several smaller communities—compete for heterogeneous resources while relying on subtly different submission patterns, priority models, and pilot-job strategies. Ensuring that each VO receives its guaranteed share, while simultaneously enabling opportunistic exploitation of unused capacity, requires a balancing strategy that extends beyond static fair-share configurations.
This contribution examines the complexities associated with harmonising these competing requirements, focusing on the interactions between VO-specific schedulers and site-level controls. We discuss the divergent workload characteristics of the major LHC VOs—such as CMS’s high pilot turnover, ATLAS’s multi-queue depth behaviour, and LHCb’s latency-sensitive submission logic—and how these differences influence both resource contention and backfill opportunities.
We present operational experience from the RAL Tier-1 in tuning HTCondor and ARC-CE parameters to shape multi-VO throughput, including dynamic slot-partitioning strategies, negotiator policy refinements, and queue-level throttling designed to preserve fairness while maximising utilisation. Results illustrate that achieving balanced allocation is non-trivial: naïve configurations can lead to persistent starvation, inefficient backfill, or pathological pilot cycling. The study demonstrates that sustained high efficiency in a multi-VO environment requires continuous calibration of site-level scheduling knobs, coordinated with VO-specific workload characteristics, to deliver both fair-share compliance and robust opportunistic use of spare capacity.
Speaker: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory) -
80
Whole-node scheduling in ALICE: lessons learnt during five years of operation
ALICE Grid sites employ heterogeneous resource allocation policies, where each configuration is tailored to the specific conditions of the sites, their user communities, and local scheduling preferences. The design and implementation of JAliEn have been specifically developed to be flexible and adaptable to these varied configurations and execution systems, allowing it to utilize the allocated resources as efficiently as possible.
In the typical Grid scheduling scenario, sites have traditionally run with 8-core slots. However, a significant number of sites have begun migrating to larger allocations, which range from 16-core slots up to the use of whole nodes, where the latter case is typical in HPC resources. In the current operational scenario, half of the ALICE Grid CPU cores are still allocated in 8-core slots, while another third of the total CPU cores have already transitioned to whole-node scheduling. The remaining minority of resources are configured as 16, 64, or 96-core slots. Some recent nodes offer up to 640 cores in a single batch slot. Given its flexibility and potential to mix workloads with differing resource requirements and usage patterns, whole-node scheduling has become our preferred resource allocation scheme. It also proves highly beneficial for the efficient management of heterogeneous computing resources, such as GPUs. Consequently, we are focusing our major development efforts on improving resource management specifically within this scheduling model.
This contribution details how ALICE was one of the first LHC experiments to transition to whole-node scheduling and shares the experience we have accumulated during its five years of operation, from the initial implementation targeting High-Performance Computing (HPC) facilities to its current large-scale deployment. The key strengths of this scheduling model are presented, specifically highlighting the use cases where it has been essential for processing physics workloads that are critical to the experiment's scientific goals.
Speaker: Marta Bertran Ferrer (CERN) -
81
Advanced Scheduling Strategies for Improved CPU Efficiency in CMS Offline Computing
Efficient use of distributed computing resources is essential for sustaining the growing processing demands of the CMS experiment. Building on our previous work to assess and minimize unused CPU cycles, new advances in scheduling strategies that further improve resource utilization are being developed for the CMS Global Pool.
The CMS Submission Infrastructure team is deploying enhanced scheduling policies for our late-binding multicore pilot-job model, which is employed to aggregate and exploit large amounts of Grid and HPC compute resources. These policies include mechanisms for dynamically placing additional user payloads on pilot resources to maximize overall processing throughput, as well as the introduction of dedicated I/O-bound slots for jobs that are limited by data-access latency rather than CPU availability. Together, these strategies allow pilots to better balance job types and to optimize the utilization of compute capacity without affecting job success rates or remote site stability.
Results demonstrating consistent CPU efficiency gains across multiple resource providers will be presented, along with an analysis of their impact on workflow throughput. These developments highlight the effectiveness of advanced job-placement and scheduling techniques inside CMS pilots and represent an important step toward increasing the overall efficiency of the CMS Submission Infrastructure.
Speaker: CMS Collaboration
-
76
-
Track 5 - Event generation and simulation: Fast Simulation 2
-
82
Fast Hybrid Simulation for the LHCb Calorimeters
About 90% of the distributed computing resources available to the LHCb experiment are used for physics event simulation, and half of the corresponding CPU time is spent on the Geant4-based simulation of the calorimetric system.
This talk presents a hybrid fast-simulation approach, implemented in the LHCb Gauss Simulation Framework, that combines the established hit-library technique with machine-learning models to accelerate the Calorimeter simulation by three orders of magnitude compared to detailed Geant4 simulation, while maintaining excellent accuracy. The method is designed to simulate not only the response of the electromagnetic and hadronic calorimeters, but also the punch-through particles from hadronic showers that reach the muon system located downstream of the calorimeter.
The performance obtained for the main types of particles entering the calorimeter, including photons, electrons/positrons, and charged hadrons, will also be shown.
Speaker: Matteo Rama (INFN Pisa (IT)) -
83
End-to-End Fast Simulation of the ALICE Zero Degree Calorimeter using Generative Models
Davide Fuligno, on behalf of the ALICE Collaboration (Università di Pisa and INFN Trieste, Italy)
The ALICE experiment at the LHC faces unprecedented computing challenges in Run 3 and 4, necessitating innovative solutions to cope with the increased data-taking luminosity and the continuous readout. A critical bottleneck in the current simulation pipeline for Pb-Pb collisions is the Zero Degree Calorimeter (ZDC), which characterizes collision geometry by detecting spectator nucleons. The full Geant4 transport simulation of hadronic showers in the ZDC is currently so computationally expensive that it is omitted from standard Monte Carlo productions, resulting in the absence of a realistic modeling of forward energy, centrality, and spectator-nucleon multiplicity.
In this contribution, we present a novel deep-learning-based fast simulation framework designed to overcome this limitation. We employ a generative architecture combining an encoder with a neural network, trained on simulated samples of spectator protons and neutrons in Pb-Pb collisions. A distinguishing feature of our approach is its end-to-end capability. The model bypasses traditional hit generation and digitization steps by directly predicting the detector response in the form of digitized outputs. This allows the direct reconstruction of structures compatible with the ALICE Analysis Object Data format, thereby streamlining the entire simulation chain.
The inference engine is designed for seamless integration into the ALICE O2 software framework using the ONNX standard, ensuring portability across heterogeneous computing resources. Preliminary results indicate a computational speed-up of approximately six orders of magnitude compared to the full simulation. We will report on architecture optimization, hyperparameter tuning, and the comparative evaluation of generative models, including Normalizing Flows, Diffusion Models, and Conditional Flow Matching. These results, supported by validation studies, demonstrate the potential to enable high-statistics ZDC simulation in future ALICE production campaigns.
Speaker: Davide Fuligno (University of Pisa and INFN Trieste (IT)) -
84
Electromagnetic calorimeter shower simulation using machine learning techniques at the BESIII experiment
The detailed simulation of electromagnetic calorimeters (EMC) remains computationally intensive due to the simulation of millions of secondary particles. Machine learning offers a promising alternative by bypassing explicit shower simulation, though its accuracy must be rigorously validated.
In this work, we develop fast simulation models for the BESIII EMC using generative adversarial networks (GANs) and diffusion models. Initial experiments with a baseline conditional GAN show limited accuracy across a broad range of experimental conditions. To improve performance, we integrate a pre-trained generator designed to produce richer conditional inputs, providing more precise guidance for the generation and leading to a significant improvement. Additionally, we design a conditional diffusion model capable of efficiently simulating multiple track types within a single architecture by injecting a track-type condition. Both models achieve accuracy comparable to Geant4-based simulation, while accelerating the process by up to three orders of magnitude.
A benchmark dataset of single-track events simulated with Geant4, covering the full experimental condition space, is released to support this study and further research.
Speaker: Tong Liu -
85
Improving simulation of extreme EM calorimeter showers with deep neural networks
Calorimeter simulation is among the most resource-hungry components of modern collider experiments such as ATLAS and CMS, currently accounting for about half of the total CPU budget at the LHC, a share that will only increase in the future High-Luminosity phase. This exploding computing demand and the arrival of sizeable open datasets such as CaloChallenge have spurred the development of numerous alternatives to GEANT4 based on state-of-the-art deep learning architectures to accelerate the simulation process. Nevertheless, despite their impressive performance, neural network generators have found limited use in production, due mainly to their inaccuracy in simulating rare events.
In this contribution, we present a study to improve the performance of generative networks in modelling extreme electromagnetic calorimeter showers. We first define a number of metrics sensitive to out-of-distribution generated showers and evaluate the best models from the CaloChallenge competition. Using these metrics as our guide and a data-centric approach for training and fine-tuning, we retrain the models to focus on the distribution tail without impacting their performance in the core. We propose a post-processing method based on binary classifiers to "sculpt" the generated distribution into that of the GEANT4 truth. Our study is the first to address extreme shower modelling, establishing a viable procedure to improve precision in targeted regions of the phase space and easing the path to adopting neural-network simulators in HEP experiments.
Speaker: Minh-Tuan Pham (University of Wisconsin Madison (US)) -
86
Exploring Potential Pathways to Accelerate ePIC Detector Simulation
The ePIC Physics and Detector Simulations leverage the Geant4 and DD4hep software frameworks, with DD4hep serving as a single source of truth for detector description and ensuring consistent configuration across full (Geant4/DDG4) and accelerated simulation models. As simulation complexity scales, we employed a systematic profiling methodology using the DD4hep plugin mechanism to pinpoint computational bottlenecks. This analysis definitively showed that optical photon propagation in the particle-identifying Cherenkov detectors and electromagnetic shower physics consume the largest fraction of compute time, thus defining our acceleration R&D priorities.
We have achieved significant advancements in GPU-based acceleration for optical photon transport. The EIC-Opticks framework, utilizing the NVIDIA OptiX Ray Tracing Engine, demonstrated an order-of-magnitude speedup over multi-threaded Geant4 for a simplified Cherenkov detector geometry, validating a highly promising technique for low-to-moderate photon yield detectors. To ensure comprehensive performance evaluation, we established a parallel effort for comparative studies with other GPU transport solutions such as the Celeritas project, implementing a Celeritas-DD4hep integration plugin via the G4TrackingManager.
Finally, we are exploring the integration of AI/ML surrogate models to accelerate detector simulations. We have already developed a framework-agnostic ML training and inference system for reconstruction tasks that provides the foundation for deploying new models. The complex particle showers within calorimeters are ideal candidates for FastCaloSim-inspired ML surrogate models. We are now investigating the ddFastSim DD4hep-native framework as a concrete path to integrate and validate these fast simulation models, making them accessible to all ePIC calorimeters within the existing DD4hep framework.
Speaker: Sakib Rahman -
87
Evaluation of VecGeom Solids in the Context of ALICE O2 Simulation
VecGeom is a modern C++ geometry modeling library specifically designed to accelerate particle detector simulation by leveraging Single Instruction Multiple Data (SIMD) vectorization. It offers optimized geometric primitives, developed in collaboration with the USolids project. Since Geant4 10.5, users can replace native Geant4 geometry primitives with VecGeom solids. This feature has already been adopted by several LHC experiments, with CMS simulations reporting a significant 7–13% CPU speed improvement from code improvements alone.
The ALICE O2 (Online-Offline) computing system uses a simulation framework based on ROOT's TGeo for geometry definition and Geant4 for particle transport via the Virtual Monte Carlo (VMC) interface, employing G4Root for geometry navigation. This geometry stack is currently suboptimal for modern hardware and vectorization.
This paper presents the integration and performance evaluation of VecGeom solids within the ALICE O2 simulation framework. We detail the necessary integration effort within the core packages, Geant4 VMC and VGM, to enable the transparent use of VecGeom's accelerated primitives while preserving the existing TGeo-defined geometry and G4Root-based navigation. We report on the results of performance benchmarks using realistic ALICE O2 simulation workflows, comparing the baseline configuration against the new configuration utilizing VecGeom solids. The evaluation focuses on quantifying the resulting reduction in total CPU time for the full simulation. We expect to confirm a measurable efficiency gain, likely in the few-percent range, on the total simulation time. Integrating VecGeom into the core VMC packages will allow other VMC-based experiments to benefit from improved simulation speed with minimal further implementation effort.
Speaker: Ivana Hrivnacova (Université Paris-Saclay (FR))
-
82
-
Track 6 - Software environment and maintainability: Testing, QA, and Validation
Conveners: Arantza De Oyanguren Campos (Univ. of Valencia and CSIC (ES)), Ruslan Mashinistov (Brookhaven National Laboratory (US))
-
88
Tackling complexity in the development of HEP applications by introducing unit testing in the Gaudi framework
For over 20 years, the Gaudi framework has been used by major HEP experiments, including the LHCb and ATLAS experiments at the Large Hadron Collider (LHC), as well as in the Future Circular Collider (FCC) studies. Testing mechanisms have been present almost from the beginning of the framework, but the number of applications and the corresponding amount of code to validate have increased significantly since then. Moreover, the technologies currently in place support only integration tests, which are run in a continuous integration pipeline whenever code is pushed to a repository, as well as on a nightly basis. This limitation results in three major issues: increased complexity for developers seeking to validate their algorithms, redundant integration tests, and inefficient use of computational resources.
We define a three-phase approach to tackle these problems. This paper presents a new tool that aims to enable unit testing in the Gaudi framework and the applications that rely on it, while remaining compatible with the current integration-based testing paradigm. This tool allows users to split existing integration tests into independent sections of varying sizes, down to the unit level, ensuring they can run without any other algorithms and services. It also paves the way for a second phase, which will consist of an analysis of the level of redundancy in the current suite of tests in the LHCb experiment.
Speaker: Pol Muñoz Pastor (La Salle, Ramon Llull University (ES)) -
89
Hypothesis-awkward: Property-Based Testing Strategies for Awkward Array
Hypothesis-awkward is a collection of Hypothesis strategies for Awkward Array. Awkward Array can represent a wide variety of layouts of nested, variable-length, mixed-type data that are common in HEP and other fields. Many tools that process Awkward Array are widely used and actively developed. Unit test cases of these tools often explicitly list many input samples in attempting to cover edge cases. However, in practice, such manually enumerated test samples can cover only a small portion of the vast combinatorial space of valid Awkward Array instances. Hypothesis, a Python property-based testing library, strategically generates test data that can fail test cases and automatically explores edge cases; developers do not need to craft test data manually. Early versions of hypothesis-awkward include strategies that generate Awkward Arrays converted from NumPy arrays and nested Python lists. The collection extends toward comprehensive strategies that generate fully general Awkward Arrays with multiple options to control the layout, data types, missing values, masks, and other array attributes. These strategies can generate thousands or more test samples per test case, automatically searching for rare bugs. These strategies help close in on edge cases in tools that use Awkward Array and Awkward Array itself.
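A minimal sketch of property-based testing with Hypothesis and Awkward Array, in the spirit of the strategies described above. The strategy here is hand-rolled from stock Hypothesis primitives for illustration; it does not show hypothesis-awkward's actual API.

```python
import awkward as ak
from hypothesis import given, strategies as st

# Ragged lists of 32-bit floats wrapped as an Awkward Array -- one tiny
# corner of the space the real strategies cover.
ragged = st.lists(
    st.lists(st.floats(allow_nan=False, allow_infinity=False, width=32),
             max_size=5),
    min_size=1, max_size=10,
).map(ak.Array)

@given(ragged)
def test_flatten_preserves_length(array):
    # Property: flattening one level preserves the total number of entries.
    assert ak.sum(ak.num(array, axis=1)) == len(ak.flatten(array, axis=1))

test_flatten_preserves_length()   # Hypothesis generates and runs many cases
```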
Speaker: Tai Sakuma (Princeton University) -
90
A Record-and-Replay Workflow for Floating-Point Error Analysis of GPU Kernels
Reliable floating-point behavior is increasingly difficult to ensure as HEP applications adopt heterogeneous architectures, multiple GPU vendors, and aggressive compiler optimizations such as fast-math. We introduce a non-intrusive workflow that enables detailed floating-point error analysis of GPU kernels without modifying application code. The method records SYCL kernel executions on Intel GPUs using the OpenCL Intercept Layer, capturing SPIR-V kernels and their input/output buffers. These kernels are then replayed on CPUs through PyOpenCL and the PoCL runtime, where they are instrumented by Verificarlo to explore IEEE-compliant behavior, stochastic arithmetic, and reduced-precision formats.
By isolating kernel execution from the surrounding application, the workflow accommodates the additional overhead of floating-point instrumentation while preserving a realistic production workload. This targeted replay model supports the validation process without architecture-specific code paths or rebuilding the full application. Demonstrated on a gravity kernel from HACC (Hardware/Hybrid Accelerated Cosmology Code), used for extreme-scale cosmological simulations, and currently being evaluated on tracking kernels from the ACTS/traccc project, a performance-portable particle tracking framework, it delivers reproducible cross-runtime behavior, well-structured ULP-error distributions, and clear quantification of reduced-precision and fast-math effects.
The workflow offers a sustainable mechanism for HEP developers to assess numerical robustness as software evolves and hardware diversifies, supporting long-term maintainability and reproducibility.
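A small sketch of the kind of ULP-error measurement such a workflow produces: reinterpret two float64 result buffers as integers and take the difference in representation space. The data here are synthetic stand-ins for recorded and replayed kernel outputs.

```python
import numpy as np

def ulp_distance(a, b):
    """Element-wise distance in units in the last place between two float64
    arrays of identical sign (sufficient for this illustration)."""
    ia = a.view(np.int64)
    ib = b.view(np.int64)
    return np.abs(ia - ib)

reference = np.linspace(0.1, 1.0, 5)          # e.g. IEEE-compliant replay
perturbed = reference * (1.0 + 1e-15)         # e.g. fast-math result
print(ulp_distance(reference, perturbed))     # per-element ULP error
```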
Speaker: Esteban Rangel -
91
The ATLAS C++ static checker
For the development of its offline C++ software, ATLAS uses a custom static checker. This is implemented as a gcc plugin and is automatically enabled for all gcc compilations by the ATLAS build system. It was an important tool for the multithreaded migration of the ATLAS offline code, where it was used to flag constructs which are legal C++ but not thread-friendly. Besides thread-safety, the checker also enforces some ATLAS code-quality rules, such as naming conventions, use of proper base classes for Gaudi components, and proper use of Gaudi event context objects. This talk will discuss the capabilities and implementation of the checker and the experience of using it at ATLAS, as well as comment on its use with other code bases, such as Key4hep.
Speaker: Scott Snyder (Brookhaven National Laboratory (US)) -
92
Enabling end-to-end testing: Porting low-latency gravitational-wave search pipelines to Kubernetes
The LIGO, Virgo, and KAGRA gravitational-wave (GW) detectors exchange and analyse data at low latency to identify GW signals and rapidly issue alerts to the astronomy community. This low-latency computing workflow comprises multiple complementary search pipelines that continuously process streaming detector data, followed by an orchestration layer that produces an optimized GW event candidate for dissemination. While this system has evolved over many years, several components retain bespoke operational requirements and tightly coupled deployment models, making comprehensive end-to-end testing of the low-latency GW computing stack difficult.
To address these challenges, the University of Geneva (UniGe), building on work within the Virgo Collaboration, is developing a cloud-native testbed for end-to-end testing of low-latency GW computing based on Kubernetes. By porting containerized search pipelines to Kubernetes, the deployment model improves portability, reproducibility, and scalability. Given the continuous, streaming nature of GW data analysis, Kubernetes services provide a natural paradigm compared to the HTCondor deployments commonly used today. This contribution presents the design of the Kubernetes-based testbed and reports on experience gained and lessons learned from migrating several low-latency GW pipelines to a cloud-native environment.
Speaker: Paul James Laycock (Universite de Geneve (CH)) -
93
AllocMonitor: An Extensible Memory Monitor
A major constraint on CMS production jobs is the amount of memory they require, so CMS needs the ability to monitor and investigate memory usage in its software. General-purpose memory profilers often significantly slow down the monitored application and require substantial additional memory, and in many cases the detailed record of memory allocations and deallocations kept by a general-purpose profiler is not necessary.
The CMS AllocMonitor package provides a generalized facility for registering callbacks invoked when C++ allocates and deallocates memory through the standard C++ APIs. The callback system is fast and has low memory overhead. The work done by the callbacks primarily determines the overall time and memory overheads of the profiling, which can therefore be tuned for different use cases. For example, sometimes one only needs a very fast, thread-safe overview of the amount of memory used by a job; at other times, one might want to know the per-thread memory usage of a specific algorithm.
This contribution will discuss three main aspects of AllocMonitor. First, an overview of the mechanisms used to acquire the data from the standard C++ API is given. Second, details about how one can use and customize the gathered data for different use cases are presented. Third, real-world examples are shown of how these tools have been used to monitor memory usage in CMSSW Continuous Integration (CI) and to investigate the causes of excessive memory usage in CI and production jobs.
Speaker: CMS Collaboration
-
88
-
Track 7 - Computing infrastructure and sustainability
-
94
Energy-aware compute resource modulation at the WLCG PIC Tier-1 site: drainage strategies, CPU frequency scaling, and predictive control
The rapid growth in data centre energy demand poses significant challenges for the sustainability of large-scale scientific computing. In alignment with CERN and WLCG strategies on environmentally responsible computing, this work investigates methods to reduce energy consumption, electricity costs, and CO₂ emissions at the PIC WLCG Tier-1 site through energy-aware compute resource modulation.
Three complementary studies are presented. First, simulated natural job drainages were applied to real HTCondor utilisation data from 2023–2024 to assess the impact of temporarily halting job acceptance during periods of high electricity prices or carbon intensity. While this approach achieved limited economic and environmental gains, it resulted in disproportionate computational losses, primarily due to non-energy-aware scheduling, hardware heterogeneity, hyperthreading effects, and long job runtimes. These results highlight the limitations of naïve drainage strategies.
Second, dedicated experiments were conducted to evaluate the impact of dynamically adjusting CPU clock frequencies across PIC compute nodes. The study quantifies the relationship between CPU frequency, delivered compute performance, power consumption, and energy efficiency, demonstrating that frequency scaling can offer meaningful reductions in power draw and operational costs with controlled performance degradation. This enables finer-grained, node-level modulation of the compute farm compared to natural drainage strategies.
Finally, an XGBoost-based machine learning model was developed to predict CPU core availability following real-time drainage decisions using only information available at decision time. Trained on two years of site-specific HTCondor data, the model accurately forecasts core reductions, particularly in the 8–40 hour window after a drainage event, enabling proactive and informed resource management.
Together, these results provide actionable insights and practical tools for implementing energy-aware scheduling and control at PIC, and offer a scalable framework applicable to other WLCG sites pursuing sustainable computing operations.
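A toy sketch of the predictive step described above: an XGBoost regressor trained on features available at decision time to forecast how many CPU cores a drainage will free after a given horizon. The feature names and synthetic data are illustrative assumptions, not the PIC training set.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 24, n),        # hour of day the drainage starts
    rng.uniform(0, 72, n),         # mean remaining runtime of running jobs (h)
    rng.integers(8, 129, n),       # cores per node in the drained partition
    rng.uniform(8, 40, n),         # forecast horizon in hours
])
# Toy target: freed cores grow as the horizon outlasts remaining runtimes.
y = X[:, 2] * np.clip((X[:, 3] - 0.3 * X[:, 1]) / X[:, 3], 0, 1)

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X[:1600], y[:1600])
print("held-out R^2:", model.score(X[1600:], y[1600:]))
```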
Speaker: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES)) -
95
Estimating the Carbon Footprint of Computations in the LHCb Grid
A comprehensive assessment of the environmental impact of the LHCb distributed computing requires a detailed understanding of its carbon footprint sources. This involves moving beyond a simple comparison of regional carbon intensity, as the hardware executing the jobs exhibits significant variation in both energy efficiency and computational performance in HEP tasks.
In this work, we present the first estimates of the carbon footprint for individual LHCb jobs and entire computing sites. Our model integrates historical grid job data, hardware performance and power profiles, energy consumption models, and region-specific carbon intensity data. Key findings include a comparative performance analysis of over 100 CPU models in event generation and per-site emission rates. Based on these results, we provide recommendations for computing sites, the LHCb experiment, and the wider WLCG community to optimize for sustainability.
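A minimal sketch of the per-job accounting idea described above: a job's footprint is its estimated energy use times the carbon intensity of the site's grid region at execution time. The power figure, efficiency factor, and intensity value are illustrative assumptions, not the LHCb model.

```python
def job_footprint_gco2(wall_hours, cores, watts_per_core, cpu_eff,
                       intensity_gco2_per_kwh):
    """Energy (kWh) of one job scaled by the regional carbon intensity."""
    energy_kwh = wall_hours * cores * watts_per_core * cpu_eff / 1000.0
    return energy_kwh * intensity_gco2_per_kwh

# Example: a 12 h, 8-core job on a 10 W/core CPU at 85% CPU efficiency,
# running in a region at 250 gCO2e/kWh -> about 204 gCO2e.
print(job_footprint_gco2(12, 8, 10.0, 0.85, 250.0), "gCO2e")
```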
Speaker: Henryk Giemza (Warsaw University of Technology) -
96
The WLCG Sustainability Forum: Progress and Directions
The WLCG Sustainability Forum was set up in the summer of 2025 to build on the momentum generated by the WLCG Sustainability Workshop in December 2024 and the WLCG workshop plenary session on sustainability in April 2025. In this presentation we review the topics covered in the approximately monthly meetings and highlight community progress towards a better understanding of how to deliver LHC computing in the most sustainable manner. We consider progress in terms of the four Ms that underpin these efforts: Measure, Model, Monitor, and eventually Moderate carbon impact. The first of these refers to the ability to measure power usage from both the job and the facility sides and translate that information into a meaningful carbon footprint. With sufficiently granular data, carbon models can be developed to inform users and facilities, and longer-term monitoring can be implemented, which should lead to moderation of the carbon footprint. As power supplies become less carbon-intense, the impact of embodied carbon becomes increasingly important, leading to an evolving optimisation of hardware lifetime and utilisation. Finally, we note that storage is a significant part of the equation and must also be addressed. This presentation is intended as an overview of community work, setting the scene for more detailed presentations on individual topics, but will also consider what we have learnt so far and future directions.
Speaker: David Britton (University of Glasgow (GB)) -
97
Evaluating Performance and Power Efficiency of Ceph Storage Configurations for Large-Scale Scientific Computing
Large-scale scientific computing relies on cost-effective, high-capacity storage systems to support data-intensive workloads, such as those from the Worldwide LHC Computing Grid and future data-intensive sciences like the Square Kilometre Array Observatory. At STFC, we evaluated three Ceph-based storage configurations: 8 TB HDD, 22 TB HDD, and 15 TB TLC NVMe flash. Using low-level benchmarks across varied erasure-coding and replicated pool layouts, we measured performance and power efficiency under a range of workloads. Results show NVMe delivers superior small-I/O performance and better IOPS-per-watt, while dense HDD remains cost- and power-effective for capacity-driven workloads. However, idle power dominates overall energy consumption, making device density a key factor in reducing operational costs. We discuss trade-offs between cost, performance, and sustainability, highlighting the impact of erasure-coding layouts and the emerging viability of QLC flash. We also draw comparisons with the at-rest power consumption of our tape storage. These findings could guide procurement strategies for large-scale scientific infrastructures seeking to optimize performance and energy efficiency.
Speaker: Thomas Byrne -
98
Development of a Breathing Computing Center for HEP
Modern computing sites need to operate on state-of-the-art hardware to achieve efficiency in both economic and environmental terms. As a consequence, sites accumulate substantial amounts of legacy equipment that is no longer competitive for continuous operation. However, this equipment still provides meaningful compute capacity and becomes attractive again when electricity prices are low or even negative, or when local renewable energy production creates periods of energy surplus.
The Breathing Computing Center project addresses this challenge by enabling legacy hardware to be activated opportunistically and on demand during such low-cost, green-energy windows. When renewable sources (e.g., solar generation) produce excess power, the site can temporarily integrate older machines back into service without compromising sustainability targets.
To support this, the dynamic orchestration tool TARDIS is extended to include dynamic, transparent, and on-demand bare-metal resource provisioning through Red Hat Satellite or IPMI interfaces. Combined with the balancing framework COBalD, TARDIS can incorporate real-time energy availability into its decision logic and scale legacy hardware usage accordingly. This allows sites to tap into additional compute capacity when energy is sustainable and cheap, and withdraw it smoothly when conditions change.
Speaker: Lars Sowa (KIT - Karlsruhe Institute of Technology (DE)) -
99
The High-Low project: High-Performance Algorithms for Low Power Sustainable Hardware
Measurements of power consumption and sustainability are imperative in view of the LHC's coming high-luminosity era, which will greatly increase the output data rates required for physics analysis. In the context of the High-Low project at IFIC in Valencia, involving the ATLAS and LHCb experiments, several studies have been conducted to understand how to optimize energy usage in terms of the computing architectures and the efficiency of the algorithms running on them. These include track-reconstruction algorithms running in real time at 30 MHz, physics event generators, detector simulations, and prospects for quantum computing algorithms. Several architectures (CPUs, GPUs, and FPGAs) are evaluated to assess their performance, energy efficiency, cost, and potential for sustainable high-performance computing.
Speaker: Arantza De Oyanguren Campos (Univ. of Valencia and CSIC (ES))
-
94
-
Track 7 - Computing infrastructure and sustainability
-
100
Watt Matters: cluster-level frequency control for lower CO2 in HEP
We present a pragmatic study of energy-management strategies in a WLCG Tier-2 environment. Building on prior node-level benchmarking (HS23/Watt) and IPMI-based telemetry, we deployed coordinated CPU frequency modulation across the few hundred physical servers at ScotGrid Glasgow and measured cluster-level effects under controlled operating conditions.
Scaling CPU frequency to a mid-range value is a proven strategy to improve HS23/Watt in benchmarks, but it has not been widely tested in production. Our tests compare baseline operation with underclocked regimes and quantify the energy saved in the IT layer alongside impacts on job latency and throughput.
Using available real-time CO2-intensity data, we compute net CO2 savings at cluster scale for representative production workloads. We describe our methodology for data collection and validation, explain how cluster operation can be aligned with temporal variations in grid carbon intensity, and offer practical, actionable recommendations and a reusable measurement framework that other sites can adopt.
Speaker: Emanuele Simili -
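A minimal sketch of node-level frequency capping through the Linux cpufreq sysfs interface, the kind of knob coordinated across nodes in the study above. The 2.0 GHz cap is an illustrative mid-range value; a production deployment would drive it from the cluster-management layer and CO2-intensity data.

```python
import glob

CAP_KHZ = 2_000_000   # 2.0 GHz, expressed in kHz as cpufreq expects

def cap_frequency(cap_khz=CAP_KHZ):
    # Write the cap to every logical CPU's scaling_max_freq (requires root).
    for path in glob.glob(
            "/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(cap_khz))

if __name__ == "__main__":
    cap_frequency()
```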
101
CERN IT's Approach to Environmental Sustainability
Environmental sustainability of computing has garnered public attention, and CERN IT is taking an active role in minimising its environmental impact. This contribution will describe how CERN IT assesses its carbon footprint and reduces its impact through infrastructure improvements, conscious purchasing, and lifecycle management. It will also cover the impact of improving the efficiency of some of the HEP community's core software components, based on work conducted within the IT department. The contribution will include a comparison between the different activities and the overall impact of the LHC program, and will point out options for future sustainability-related activities and the limitations in this area.
Speakers: Eric Bonfillou (CERN), Markus Schulz (CERN), Wayne Salter (CERN) -
102
A study of heat reuse from a scientific computing facility
Reusing heat from computers has the potential of reducing the environmental impact of scientific computing in cold places with low-carbon electricity. In previous work, we performed a lifecycle analysis of carbon emissions from scientific computing [1], which included a simplistic model of how heat reuse in northern Sweden could affect the total carbon footprint of WLCG computing and scientific computing in general.
This is a detailed study of current practices and potential changes in how heat reuse from HPC2N's computing facility provides heating for the Umeå University campus. The current practice is based on a mix of heat pumps and district heating and cooling, and we study running this optimised either for carbon emissions or for minimal financial cost. Potential scenarios include hot-water cooling and other cases that involve significant changes in both computers and infrastructure.
We present this both in summary CO2e/HS23 ratios for lifecycle use, as in [1], and in detail regarding practical limits and concerns from the facility point of view for the entire campus, including reliability and cost.
[1] Wadenstein, M., Vanderbauwhede, W. Life cycle analysis for emissions of scientific computing centres. Eur. Phys. J. C 85, 913 (2025). https://doi.org/10.1140/epjc/s10052-025-14650-8
Speaker: Mattias Wadenstein (University of Umeå (SE)) -
103
Operational Experience of a Refurbished, Energy-Efficient Data Centre at Queen Mary University of London
Queen Mary University of London completed a long-planned [1] major refurbishment [2] of its data centre in Autumn 2024. The GridPP Tier-2 cluster is the main tenant of the data centre, which has been upgraded with heat-recovery technology to improve energy efficiency whilst also increasing rack capacity.
This contribution reports on the operational experience of the facility from initial commissioning in late 2024 through to mid-2026, focusing on stability, metrics, performance, key operational challenges, lessons learned, and infrastructure developments during sustained production use. The success of the system is evaluated through analysis of preliminary energy-use data and heat recovery data.
During this period, additional compute and storage resources have been deployed to support WLCG workloads, primarily for the ATLAS experiment, alongside preparations for increased demand from the High-Luminosity LHC and future astronomy projects such as LSST and SKA. The resulting increase in rack density and heat output has provided valuable insight into the behaviour of the enclosed hot-aisle, in-row water-cooled infrastructure and the water-to-water heat pump system under real operating conditions.
The presentation discusses capacity growth, day-to-day operations, and infrastructure issues, including cooling and power stability, monitoring, and recovery from incidents such as cooling outages, water leaks, and power events. Incremental improvements and mitigations introduced since commissioning are described. Updated measurements of energy use, heat recovery into the university district heating system, and carbon savings are presented, along with planned developments and anticipated challenges through May 2026 and beyond.
[1] https://doi.org/10.1051/epjconf/202429507012
[2] http://doi.org/10.1051/epjconf/202533701253
Speakers: Alex Owen (Queen Mary University of London), Dr Sudha Ahuja (Queen Mary University of London) -
104
European XFEL Scientific Data Infrastructure
At photon-science facilities such as the European XFEL, large data volumes are generated at multiple experiment stations and under frequently changing configurations. The experiments that produce these data typically last only a few days and are carried out by external user teams. In this environment, effective management of experimental data is essential for delivering timely, high-quality scientific results, ensuring that data produced at large-scale research facilities can be reliably captured, accessed, processed, and preserved.
We present the architecture and operation of the European XFEL data management infrastructure, built around a four-tier storage model tailored to the different phases of the data lifecycle. An online storage layer located close to the instruments is designed for high performance and exceptional reliability. It buffers the data produced by instruments at extreme rates, reaching up to 15 GB/s per individual detector. A high-performance storage layer, located in the DESY computing centre, supports both prompt processing during beam time and subsequent offline analysis. The data management infrastructure is connected to the European XFEL experiment hall's InfiniBand fabric via a 4.4 km, 1 Tb/s link. Mid-term access to data is provided by a mass storage layer, while a tape archive ensures reliable long-term preservation with a retention time of at least 10 years. Together, these systems support the handling and processing of up to 2 PB of newly recorded data per day and are tightly integrated with a shared compute cluster for near-online analysis, as well as supporting remote analysis by external users for several years after the experiment.
In the context of environmental sustainability, the continued and future operations of the European XFEL will require a review of resource consumption and usage policies. We therefore also discuss emerging sustainability measures, including per-user and per-job energy and emissions reporting, comprehensive power metering of data centre infrastructure, and dynamic resource provisioning linked to user demand and green-energy availability, developed in the context of projects such as RF2.0.
Speaker: Janusz Malka (European XFEL GmbH) -
105
Monitoring Sustainability at an ATLAS Tier 2 Computing Centre
Monitoring and improving the sustainability of large-scale computing infrastructures has become an increasingly important challenge in High Energy Physics. This work presents the design and implementation of a sustainability-oriented monitoring dashboard for an ATLAS Tier 2 computing centre. The dashboard integrates global site-level metrics and proposes a set of job-level metrics aimed at evaluating both computational efficiency and environmental impact. The dashboard is developed and deployed at the IFIC Tier 2 centre, where detailed operational and energy-related data are analysed to explore how sustainability indicators can be derived and linked to computing workloads. We present our approach to monitor per-job energy consumption, which is not trivial since computing nodes usually run multiple workloads and energy measurements are generally available only at the system level. We also explore ways to make the proposed model usable at other ATLAS Tier 2 sites, studying possible methods to access the necessary data for its adoption. The results provide a foundation for sustainability-aware monitoring within the ATLAS distributed computing infrastructure.
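A toy sketch of one way to attribute node-level energy to individual jobs, as discussed above: split each measurement interval's energy across the jobs on the node in proportion to their CPU time in that interval. The numbers and the proportional-share rule are illustrative assumptions, not the IFIC dashboard's method.

```python
def apportion_energy(node_energy_wh, cpu_seconds_by_job):
    """Share one interval's node energy among jobs by CPU-time fraction."""
    total = sum(cpu_seconds_by_job.values())
    if total == 0:
        return {job: 0.0 for job in cpu_seconds_by_job}
    return {job: node_energy_wh * s / total
            for job, s in cpu_seconds_by_job.items()}

# One 15-minute interval: the node drew 120 Wh while three jobs ran.
interval = {"job_a": 2400.0, "job_b": 1200.0, "job_c": 400.0}  # CPU seconds
print(apportion_energy(120.0, interval))
```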
Speaker: Miguel Villaplana (IFIC - Univ. of Valencia and CSIC (ES))
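A minimal sketch of one common way to derive per-job energy figures when power is only measured at the node level, as the abstract describes: apportion the node's measured energy to concurrent jobs by their share of consumed CPU time. The function and field names are illustrative, not the dashboard's actual schema.

def apportion_node_energy(node_energy_kwh, jobs):
    """Split a node-level energy measurement across concurrent jobs by CPU-time share."""
    total_cpu = sum(j["cpu_seconds"] for j in jobs) or 1.0
    return {j["id"]: node_energy_kwh * j["cpu_seconds"] / total_cpu for j in jobs}

# Example: a node drew 2.4 kWh over the interval while running three jobs.
jobs = [{"id": "job-a", "cpu_seconds": 7200},
        {"id": "job-b", "cpu_seconds": 3600},
        {"id": "job-c", "cpu_seconds": 1200}]
print(apportion_node_energy(2.4, jobs))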
-
100
-
Track 8 - Analysis infrastructure, outreach and education: Preservation & reproducibility
-
106
Comprehensive Data and Analysis Preservation at RHIC: Lessons Learned and Path Forward
The RHIC experiments at Brookhaven National Laboratory have developed a comprehensive Data and Analysis Preservation (DAP) plan, covering PHENIX, STAR, and sPHENIX. This multi-faceted effort addresses the critical challenge of ensuring long-term accessibility of large volumes of nuclear physics data and reproducibility of analyses developed over 25 years of the RHIC program as the community transitions toward the Electron-Ion Collider era.
The DAP strategy encompasses multiple complementary approaches, including infrastructure for long-term data storage, software and workflow preservation techniques, and knowledge management systems designed to bridge institutional memory gaps. Significant progress in demonstrating analysis reproducibility has been made not only by encouraging experiments to thoroughly document their workflows but also by enhancing the description of their metadata. We are in the process of establishing frameworks that enable not just reproduction of published results but also potential reanalysis with modified parameters.
This presentation will cover our phased implementation approach, the role of emerging technologies in improving data accessibility, and the importance of coordination across multiple experiments to leverage shared infrastructure and lessons learned. We will discuss practical challenges encountered in maintaining complex multi-decade experimental datasets and share recommendations for other experiments preparing for HL-LHC, the EIC and beyond, with a particular focus on ensuring that scientific outputs remain accessible to future researchers.
Speaker: Eric Lancon (Brookhaven National Laboratory (US)) -
107
Towards fully reproducible physics analyses: lessons learned and practical implementations at LHCb
Reproducibility has become a cornerstone of modern particle physics analysis, ensuring that scientific results can be validated, extended, and reinterpreted by the broader community. Building on previous work on analysis modularization and workflow management, this contribution presents practical experiences in achieving full reproducibility for physics analyses at the LHCb experiment. We discuss the implementation of containerized analysis environments, version-controlled workflows, and automated validation pipelines that together form a complete reproducibility chain from raw data to final results. The challenges encountered when applying FAIR principles to complex multi-stage analyses are examined, including dependency management, software environment preservation, and documentation standards. We demonstrate how these practices have been applied to analyses using LHCb Run 2 data, highlighting both successes and remaining obstacles. Furthermore, we explore the integration of analysis preservation tools with collaboration-wide frameworks and discuss strategies for long-term maintainability. The lessons learned provide guidance for achieving reproducibility at scale as the field prepares for the high-statistics era of Run 3 and beyond.
Speaker: Dr Mindaugas Sarpis (Vilnius University (LT)) -
108
Celebi: a Novel Architecture for Reproducible and Preserved Physics Analyses
Reproducibility and transparency are increasingly critical in high-energy physics, where analyses rely on complex, evolving workflows and heterogeneous software environments. While existing initiatives such as the CERN Analysis Preservation portal and REANA provide essential infrastructure, the day-to-day management and long-term maintainability of individual analyses remain fragmented and analyst-dependent.
We present Celebi, a new analysis-management architecture and toolkit designed to support the full lifecycle of a physics analysis, from active development to long-term preservation. Celebi combines a descriptive and distributed workflow specification with a structured file-organization model and an immutable “impression” mechanism that captures versioned analysis states. A dedicated middleware layer, Yuki, translates high-level analysis logic into executable workflows for containerized backends, enabling seamless execution on platforms such as REANA using Snakemake.
By integrating workflow description, provenance tracking, execution, and preservation into a coherent framework, Celebi enables reproducible execution, transparent re-analysis, and flexible analysis evolution with minimal disruption to existing analyst practices. This approach provides a practical and scalable foundation for collaborative development and long-term analysis preservation in modern high-energy physics experiments.
Speaker: Mingrui Zhao (Peking University (CN)) -
109
Building a local Virtual Research Environment for the Einstein Telescope project
We present the development of a Virtual Research Environment (VRE) for the Einstein Telescope (ET) project, implemented within the Bologna research unit to support collaborative, high-performance, and reproducible research across the ET community. The Einstein Telescope is a next-generation underground gravitational-wave observatory designed to explore the Universe throughout its cosmic history, and its scientific goals require advanced computational and data-analysis capabilities.
The ET Bologna VRE is built on the BETIF/DIFAET computing infrastructure, funded by the Italian NRRP, and adopts a modular, cloud-native architecture using open-source technologies including Docker, Kubernetes, Jupyter, and CERN’s Rucio/Reana frameworks. This design enables both interactive analysis and large-scale computations within an orchestrated environment. The platform is fully customizable, supporting multiple software stacks via CVMFS and providing seamless integration with external Rucio Storage Elements for distributed data management. Authentication and authorization are handled through Indigo-IAM, ensuring compliance with the ET federation.
The system supports heterogeneous resources, including CPU and GPU nodes, and allows dynamic scaling based on workload and user needs. Through its Python-friendly interface and integration with common scientific frameworks, the VRE lowers the entry barrier for analysis development while guaranteeing portability of collaboration workflows.
Beyond its immediate application to data analysis and algorithm prototyping, the ET Bologna VRE serves as a testbed for future computational strategies within the broader ET project. It demonstrates how local resources can be orchestrated into a flexible, cloud-native environment, paving the way for a distributed, sustainable, and collaborative data-analysis model essential for the next era of gravitational-wave astronomy.
Speaker: Tommaso Diotalevi (Universita e INFN, Bologna (IT)) -
110
Construction and Application of a Computing Platform for Diverse Data Analysis Scenarios at HEPS
The High Energy Photon Source (HEPS), located in Beijing, is an advanced public research facility designed to support multidisciplinary scientific innovation and high-technology development. HEPS is scheduled to complete construction and enter operation in 2026. It will deliver synchrotron radiation with high energy, high brilliance, and high coherence, achieving spatial, temporal, and energy resolutions at the nanometer, picosecond, and millielectronvolt levels, respectively, making it one of the brightest fourth-generation synchrotron light sources worldwide.
During operation, HEPS will generate massive, heterogeneous, and highly time-sensitive experimental data, posing significant challenges to computational resource scheduling, data processing efficiency, and the flexibility of analysis environments. To address the diversity of experimental data types, the heterogeneity of analysis workflows, and the dynamically evolving computational demands of users, this work integrates multiple data analysis scenarios to construct a unified computing platform for HEPS.
The platform is built on a cloud-native technology stack and integrates virtualization, containerization, and high-performance computing resources, forming a comprehensive computing service architecture that supports various application scenarios, including online monitoring, offline data reconstruction, large-scale batch data analysis, and interactive scientific computing. Through unified resource management and scheduling, elastic allocation and efficient utilization of computing resources are achieved. In addition, standardized data access mechanisms and encapsulated analysis environments lower the barrier for user adoption and improve the reproducibility and automation of data analysis workflows.
During the commissioning phase of HEPS, the platform has been deployed and validated across multiple beamlines and associated data processing tasks. The results demonstrate that the platform effectively supports the complex and diverse data analysis requirements of HEPS, significantly improving data processing efficiency and system stability. The design concepts and practical experience presented in this work provide valuable references for the construction of computing platforms for other large-scale scientific facilities.
Speaker: Qingbao Hu (IHEP) -
111
Status and future of the BaBar Long Term Data Preservation and Computing Infrastructure
BaBar stopped data taking in 2008, but its data is still analyzed by the collaboration. In 2021 a new computing system was developed outside of the SLAC National Accelerator Laboratory; major changes were needed to preserve the collaboration's ability to analyze the data while keeping all user-facing front ends unchanged. The new computing system has worked well since then, but the hardware is nearing its end of life. We will report on the current system and the challenges of keeping it running without funding for new hardware, as well as on plans to make the data, the analysis system, and the documentation public.
Speaker: Dr Marcus Ebert (University of Victoria)
-
106
-
Track 9 - Analysis software and workflows
-
112
Particle physics data analysis on the GPU: from IO to data reduction
The hardware landscape in today's data centers is rapidly evolving, with access to GPUs becoming the standard rather than the exception. Currently, physics data analysis using RDataFrame is still limited to execution on multi-core CPUs and distributed systems.
To reduce the time to results and enhance energy efficiency, we are investigating the feasibility of accelerating physics analysis with GPUs. Given that collision events are independent of one another, this workload is well-suited for leveraging the massive parallelism offered by GPUs.
However, we observe that the computational intensity of the analysis workflow is often quite low, rendering it memory-bound. To address the data movement bottleneck, we execute the entire analysis pipeline on the GPU. This pipeline includes data loading, decompression, filtering, deriving new values, and histogramming, utilizing capabilities such as NVIDIA's GPU Direct Storage. We assess the effectiveness of our solution using the ADL benchmark suite [1].
This work represents the first feasibility study of executing a full event-loop style analysis entirely on GPUs within the ROOT ecosystem. By clarifying the opportunities of analysis on the GPU, our study offers a foundation for informed analysis tools and computing infrastructure.
[1] https://github.com/iris-hep/adl-benchmarks-index
Speaker: Lukas Breitwieser (CERN) -
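A minimal CuPy sketch of the event-loop-style pattern described above (filter, derive a new quantity, histogram) once the columns are resident in GPU memory; it is illustrative only and does not use RDataFrame, the ADL benchmarks, or GPU Direct Storage.

from math import pi
import cupy as cp

# Illustrative placeholder columns; in the study these would be loaded and
# decompressed directly into GPU memory.
n = 1_000_000
pt = cp.random.exponential(30.0, n).astype(cp.float32)
eta = cp.random.uniform(-2.5, 2.5, n).astype(cp.float32)
phi = cp.random.uniform(-pi, pi, n).astype(cp.float32)

# Filter, derive a new quantity, and histogram without leaving the device.
mask = (pt > 25.0) & (cp.abs(eta) < 2.4)
px = pt[mask] * cp.cos(phi[mask])
counts, edges = cp.histogram(px, bins=100)
print(counts.get()[:5])  # only the small histogram is copied back to the host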
113
Bridging the Vendor Gap: Enabling AMD GPU Support for Awkward Array via ROCm/HIP for the HL-LHC Era
The computational demands of the High-Luminosity LHC (HL-LHC) necessitate a transition toward heterogeneous computing environments. While the Scikit-HEP ecosystem has historically leveraged NVIDIA GPUs through CUDA, the increasing deployment of AMD-based supercomputers requires a vendor-neutral approach to performance portability.
This contribution details the design and implementation of the initial ROCm/HIP backend for Awkward Array, the foundation for Pythonic HEP analysis. By implementing the library’s kernel infrastructure in HIP, we enable the execution of complex, nested, and irregular array operations on AMD hardware without sacrificing the library's high-level user interface.
We present a performance evaluation conducted on the Princeton Della Cluster, utilizing AMD Instinct MI210/MI250 (CDNA 2) GPUs. Our benchmarks focus on kernel execution throughput and memory management efficiency for jagged data structures, comparing these results against traditional CUDA-based execution. Furthermore, we discuss the forward-compatibility of this infrastructure with the latest CDNA 4 architecture (AMD Instinct MI350 series). We highlight how the 288 GB of HBM3E memory and advanced datatype support of the MI350 series can further alleviate the memory bottlenecks typical of HEP data processing. This work represents a critical step in providing the HEP community with a truly hardware-agnostic analysis stack, ensuring high performance across the full spectrum of modern HPC resources.
Speaker: Ianna Osborne (Princeton University) -
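A minimal sketch of the backend-switching pattern that the new HIP support extends, using the released "cuda" backend of Awkward Array (which requires CuPy and an NVIDIA GPU); the string naming the ROCm/HIP backend is not assumed here.

import awkward as ak

# Jagged (per-event, variable-length) structure typical of HEP data.
events = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

# Move the array to the GPU backend; the high-level interface stays the same,
# which is the property the ROCm/HIP backend described above preserves.
events_gpu = ak.to_backend(events, "cuda")

print(ak.num(events_gpu, axis=1))  # multiplicities per event
print(ak.sum(events_gpu, axis=1))  # per-event sums computed on the device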
114
GPU acceleration of end-user analyses at the LHC
The upcoming high-luminosity era at the LHC (HL-LHC) will produce exabyte-scale datasets that will significantly increase opportunities for new physics discoveries at the energy frontier. At the same time, future analyses will be increasingly computationally demanding. Larger datasets, increased analysis complexity, and the widespread adoption of machine learning techniques in HEP will drastically lengthen analysis runtimes - and correspondingly decrease the physics research throughput - if current computational strategies remain unchanged. In this presentation, we explore how GPU acceleration in the analysis pipeline can decrease analysis turnaround time; representative analysis computations are benchmarked, and the challenges and potential acceleration of executing a full LHC analysis on GPU are discussed.
Speaker: Ianna Osborne (Princeton University) -
115
Flare: an open source data workflow orchestration tool
The FCCee b2Luigi Automated Reconstruction And Event processing (FLARE) package is an open-source, Python-based data workflow orchestration tool powered by b2luigi. FLARE automates the workflow of Monte Carlo (MC) generators inside the Key4HEP stack, such as Whizard, MadGraph5_aMC@NLO, Pythia8 and Delphes. FLARE also automates the Future Circular Collider (FCC) Physics Analysis software workflow. These two workflows are naturally combined within FLARE, allowing a user to run a fully automated pipeline from MC production to final FCCanalysis histograms. With its many customizations and easy-to-use API, FLARE can simplify running FCCee analyses, especially those that require their own MC to be produced via the Key4HEP stack. FLARE also gives HEP researchers interested in the FCC project an easy way to begin FCCee analyses in an automated and controlled way. FLARE is available on PyPI as the hep-flare package.
Speaker: Cameron Harris -
116
b2luigi - bringing batch 2 luigi
Workflow Management Systems (WMSs) provide essential infrastructure for organizing arbitrary sequences of tasks in a transparent, maintainable, and reproducible manner. The widely used Python-based WMS luigi enables the construction of complex workflows, offering built-in task dependency resolution, basic workflow visualization, and convenient command-line integration.
b2luigi is an extension designed as a drop-in replacement for luigi. It introduces user-friendly input/output handling and seamless integration with batch systems such as HTCondor, LSF, Slurm, and the WLCG, thereby enabling heterogeneous tasks and computing environments to be combined within a single workflow. In addition, b2luigi provides dedicated interfaces for the Belle II analysis software framework and the Belle II distributed computing ecosystem.
The Belle II collaboration currently maintains and develops b2luigi. Recent enhancements include support for executing tasks in Apptainer containers, integration with the Slurm batch system, and the ability to define targets using the XRootD protocol. The documentation has been substantially expanded as well, now featuring a step-by-step tutorial covering all major functionalities. Today, the b2luigi package is an indispensable tool throughout the Belle II collaboration for orchestrating a wide range of complex workflows, including physics analyses, software release validation, data reprocessing, detector calibration, and the derivation of systematic corrections.
In this contribution, we present an overview of the current status of the b2luigi project, emphasizing recent developments, newly added capabilities, and its deployment within the Belle II collaboration. We also discuss its adoption beyond Belle II, demonstrating the versatility of b2luigi and its growing relevance within the broader high-energy physics community.
Speaker: Alexander Heidelbach -
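A minimal sketch of the luigi-style task pattern that b2luigi builds on, using its documented add_to_output/get_output_file_name helpers and the b2luigi.process entry point; the task content is a placeholder, and the exact batch-system configuration should be taken from the b2luigi documentation rather than from this sketch.

import os
import b2luigi

class Generate(b2luigi.Task):
    seed = b2luigi.IntParameter()

    def output(self):
        yield self.add_to_output("events.txt")

    def run(self):
        out = self.get_output_file_name("events.txt")
        os.makedirs(os.path.dirname(out) or ".", exist_ok=True)
        with open(out, "w") as f:
            f.write(f"stand-in for an MC generator call, seed {self.seed}\n")

class Histogram(b2luigi.Task):
    seed = b2luigi.IntParameter()

    def requires(self):
        yield Generate(seed=self.seed)

    def output(self):
        yield self.add_to_output("hists.txt")

    def run(self):
        out = self.get_output_file_name("hists.txt")
        os.makedirs(os.path.dirname(out) or ".", exist_ok=True)
        with open(out, "w") as f:
            f.write("stand-in for a histogramming step\n")

if __name__ == "__main__":
    # Dependencies are resolved automatically; a batch system (HTCondor, Slurm, ...)
    # can be selected via b2luigi settings instead of running locally.
    b2luigi.process([Histogram(seed=s) for s in range(3)], workers=3)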
117
Reproducible and Modular Analysis Workflows in High-Energy Physics: Concepts and Implementation of a Code-Centric Approach
Modern high-energy physics (HEP) analyses rely on complex, multi-stage workflows combining heterogeneous software and distributed data. While individual analysis tools are well developed, their orchestration is typically ad hoc, leading to duplicated effort, inconsistent configurations, and limited reproducibility. Existing workflow systems based on static dependency graphs struggle to capture the iterative and complex nature of physics analyses.
We present a lightweight, generic workflow management tool built on the Python-based Luigi and law frameworks that automates the full analysis lifecycle. Analysis steps are implemented as parameterised task classes defining dependencies, outputs, and runtime behaviour directly in code, enabling dynamic task generation and flexible dependency structures beyond simple DAGs. The framework is backend-agnostic and automates dependency management, scheduling, versioning, and provenance tracking, ensuring reproducible and portable analyses.
The approach is demonstrated in an ATLAS top-group use case integrating TopCPToolkit, FastFrames, and TRExFitter, but is readily extensible to other analyses and domains. By providing a unified configuration and execution model, the framework enables consistent, shareable, and reproducible workflows. This long-term sustainability allows for analyses to be synchronised and re-evaluated seamlessly, thus providing a foundation for global combinations.
Speaker: Maximilian Horzela (Georg August Universitaet Goettingen (DE))
-
112
-
-
-
Plenary
-
118
SRCNet — Vision, Progress, and Cross-Community Computing for the SKA Telescope
The SKA Regional Centre Network (SRCNet) is a cornerstone of the Square Kilometre Array Observatory’s distributed science computing model, federating regional centres into a coherent global infrastructure providing user access to data, processing, and analysis.
The SRCNet Project is an international project to deliver the SRC Network, pulling in expertise and data centres from across 15 SKA countries and from within the SKA Observatory itself. Here we will present the progress we have made in the past three years, with particular emphasis on how solutions developed in the high-energy physics community are being evaluated, adopted and extended to meet SRCNet requirements.
We review the current state of the SRCNet global software stack and the local node deployments that comprise the SRC "Network", outlining the key technical and operational challenges at SKA scale. We conclude with an outlook on upcoming developments as SRCNet moves toward production readiness and early science operations, in line with the SKAO's construction and move to operations.
Speaker: Rosie Bolton (SKA Observatory) -
119
From Petabytes to Discovery: The Computing Ecosystem Powering Rubin Observatory's LSST
The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) is set to revolutionize our understanding of the Universe with groundbreaking images and scientific results. In this plenary I will highlight early discoveries and iconic images, focusing on the data management system that enables Rubin science at scale. Rubin has drawn on synergies between the high-energy physics and astronomy communities, leveraging experience from LHC-scale distributed computing. CERN-developed tools such as Rucio, PanDA, and CernVM-FS have been deployed at SLAC, the lead Rubin data facility and a Tier-2 site for ATLAS and CMS, as well as at the French and UK Rubin data facilities, both long-standing LHC Tier-1 centres, to enable large-scale data distribution and workflow management for Rubin science. I will describe the end-to-end Rubin computing ecosystem and highlight lessons from cross-disciplinary collaboration for future data-intensive experiments.
Speaker: Leanne Guy -
120
WLCG Technical Evolution: Preparing for HL-LHC with a Community-Driven Roadmap
With the end of Run 3 of the LHC approaching, the Worldwide LHC Computing Grid (WLCG) is entering an important transition toward the HL-LHC era. To meet the substantial increase in data volume, computational requirements, and resource heterogeneity, while preserving reliability, sustainability, and community cohesion, we have launched the development of the WLCG Technical Roadmap 2026-2030. The Roadmap is a consensus-driven, community-owned actionable plan that identifies key gaps and defines concrete milestones across the infrastructure and services landscape. It is organized into nine chapters, each outlining a major area of evolution. Plans for WLCG facilities evolution, data management, networking, workflow management, authorization and authentication (tokens), security, WLCG services, and heterogeneous resources and new infrastructures (for example, leveraging GPUs) are presented, taking into account that experimental requirements are still evolving. By presenting this roadmap at CHEP, we seek to engage the broader HEP community to gather feedback, foster collaboration, and build collective awareness. As one of the pillars of LHC physics and a cornerstone for data-intensive science even beyond the LHC, WLCG’s future relies on coordinated innovation and active community participation.
Speaker: WLCG Technical Coordination Board
-
118
-
10:30
Break
-
Plenary
-
121
HEP Benchmarking: Journey from CPU HEPScore23 to GPU Benchmarks in the HL-LHC Era
The transition to the HL-LHC era brings unprecedented computing demands and a rapidly shifting hardware landscape. Since its successful deployment in 2023, HEPScore has become the standard CPU benchmark for WLCG sites, replacing the legacy HEP-SPEC06. The journey to HEPScore was a major collaborative effort, involving software developers, data analysts, site personnel, and the WLCG Deployment Task Force, to create a representative benchmark based entirely on real-world HEP applications.
Today, as GPU applications and GPU availability over the Grid rise, the community urgently requires a comparable standard to measure raw performance and assess the cost-benefit ratio of GPU resources. Building on the proven methodology of HEPScore23, the HEPiX Benchmarking Working Group is intensifying the development of a GPU equivalent, again utilizing representative HEP GPU workloads as payloads.
This next phase introduces complex new challenges, including heterogeneous architectures, strong workload-specific optimizations, and a fast-evolving hardware and software ecosystem. This contribution reviews the collaborative journey and technological solutions that made CPU HEPScore23 a reality, analyzes the novel challenges of GPU standardization, and presents our strategy and current development status for establishing a robust GPU benchmark for the WLCG.
Speaker: Robin Hofsaess -
122
The end of the x86 dominance - orchestrating the heterogeneous Grid
For decades, the x86 architecture has been the bedrock of Grid computing. That era of uniformity is over. Driven by the specialized demands of next-generation applications, we have entered a Resource Renaissance - a period defined by the rapid proliferation of ARM and RISC-V CPUs and various types and generations of GPUs across Grid computing centers.
However, this hardware abundance carries a significant "complexity tax." As the computing landscape fragments, traditional resource allocation methods are hitting a ceiling in both efficiency and scalability. The "one-size-fits-all" approach to scheduling is no longer viable in an environment where the gap between hardware capabilities and software requirements is widening.
This talk explores the shifting paradigms in resource brokering, matching, encapsulation and monitoring. We will examine the architectural friction inherent in modern workflows and analyze the strategies ALICE has employed to mitigate these challenges. Specifically, we will discuss how ALICE provides a universal platform that abstracts hardware diversity, effectively shielding users from the underlying complexity of the modern Grid while maximizing the potential of specialized hardware.
Speaker: Maksim Melnik Storetvedt (Western Norway University of Applied Sciences (NO)) -
123
Archiving 60 PB/month to tape — lessons learned and a look forward to Run-4
During the last year of LHC Run-3, several new records were set by the CERN Tape Archive (CTA) service at WLCG Tier-0: the rate of data archival to tape peaked at over 60 PB/month and the total volume of data grew to more than 1 Exabyte.
The CTA service was able to scale up to meet these demands thanks to architectural choices made prior to Run-3 as well as responses to specific operational problems encountered during the run. The architecture of CTA was a deliberate departure from the Hierarchical Storage Management (HSM) model of previous runs, but certain practices inherited from previous runs were addressed only when they had an operational impact.
This talk will explain both the architectural choices for CTA and the operational experience gained during Run-3. Topics covered will include hardware and systems architecture, I/O throughput planning, disk buffer management and specific operational mitigations. The talk is targeted at Tier-1 sites that will have to achieve similar tape archival performance during Run-4.
Speaker: Mr Julien Leduc (CERN)
-
121
-
12:30
Lunch
-
Track 1 - Data and metadata organization, management and access: Data transfers, federations and infrastructure planning
-
124
CMS transfer rate estimates in Run-4
For Phase-II of the Large Hadron Collider program, a dramatic increase in data quantity is expected due to increased pileup, higher experiment logging rates and a larger number of channels in the upgraded detector components. For Run-4, beginning around 2030, and using the current computing model without software improvements, CMS estimates growth of an order of magnitude in computing resource needs compared with Run 3. We anticipate the demand to double again for Run-5, when the luminosity reaches its ultimate value. This presentation discusses the construction of expected data transfer rates and the varied assumptions that contribute to the modeling. It further indicates where bottlenecks in our existing system may lie. The modeling considers the significant network traffic flows between major sites, as well as the expected peak read and write rates at tape endpoints. The results of these modeling exercises form the early planning stage for the upcoming WLCG “Data Challenge 2027”, scheduled for spring 2027. During this exercise the WLCG community will attempt to demonstrate the capability to handle typical Run-4 flows to the level of 50% of the estimated capacity.
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB)) -
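A back-of-envelope sketch of the kind of rate estimate described above; every input value is an illustrative placeholder, not a CMS projection.

def required_rate_gbps(volume_pb_per_year, replication_factor, live_fraction, peak_to_average):
    """Peak network rate (Gb/s) needed to move a yearly data volume."""
    bits = volume_pb_per_year * 1e15 * 8 * replication_factor
    average_gbps = bits / (365 * 86400 * live_fraction) / 1e9
    return average_gbps * peak_to_average

# Placeholders: 350 PB/year exported, 2 replicas, transfers spread over 70% of the year, 2x peak factor.
print(round(required_rate_gbps(350, 2, 0.7, 2.0), 1), "Gb/s")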
125
Advancing Global Scientific Infrastructure: Ongoing Mini-Challenges and Roadmap to WLCG Data Challenge 2027 (DC27)
The WLCG Data Challenge 2027 (DC27) represents a critical milestone in preparing our global distributed computing and networking infrastructure for the demands of HL-LHC and next-generation data-intensive experiments. Building on the successes and lessons learned from previous challenges, the DC27 program is driven by a coordinated series of mini-capability and mini-capacity challenges. These targeted tests address key aspects such as site network validation, end-to-end monitoring, protocol innovations, and performance scaling across Tier-1 and Tier-2 sites worldwide.
This presentation summarizes the work carried out in preparation for the challenge, including results from recent mini-challenges, improvements to monitoring through upgrades of the perfSONAR infrastructure, and the implementation of flow and packet marking, packet pacing, and SDN (Software Defined Networking) integration. Notable achievements include improved network validation across major WLCG sites, successful deployment of advanced monitoring and accounting tools, and collaborative stress tests involving multiple science domains. We highlight the strategic value of incremental "unit tests" and cross-domain mini-challenges in isolating issues and accelerating readiness.
Looking forward, we present the roadmap toward DC27, detailing planned milestones for further hardware and software upgrades, broader adoption of IPv6 and SDN technologies, and expanded collaborative testing with emerging experiments such as DUNE and SKA. We also outline our approach to capacity scaling, site readiness evaluation, and systematic benchmarking against production workloads.
DC27 will serve as a showcase for best practices in orchestrating complex, multi-site data movement, robust monitoring, and agile adaptation to evolving research needs. This talk provides a compelling overview for the CHEP community, demonstrating the collective progress, challenges overcome, and the vision for sustaining leadership in distributed scientific computing.
Speaker: Alessandra Forti (The University of Manchester (GB)) -
126
FTS4: a first encounter with the experiments
Version 4 of the File Transfer Service (FTS4) is currently under active development within the CERN IT Storage group. This project aims to address issues which have prevented version 3 from being proposed as a candidate for automating bulk file-transfers during LHC Physics Run 4.
FTS4 has taken an incremental rather than big-bang approach to its development. It started with the FTS3 code base and then replaced target areas in a piecemeal fashion, keeping a fully functional system available at all times. This availability has deliberately allowed FTS4 to be field-tested with the LHC experiments at a relatively early stage in its development cycle, which was a critical goal in project planning: we must ensure FTS4 addresses experiment requirements and does not drift away from their core expectations.
The early field testing of FTS4 started with the ATLAS experiment towards the end of 2025 and finished with CMS and other interested parties by the end of the first quarter of 2026. This paper describes the goals of FTS4 and its developments, reports on the findings of the first field tests with the experiments and concludes with a description of the next steps.
Speaker: Nicola Pace -
127
ATLAS journey integrating into the SENSE/Rucio Priority Data Transfer Service
Given the increased amount of data expected during the HL-LHC and the escalation of data transfers that this implies, it becomes of paramount importance to have control over the available network bandwidth and the ability to allocate this bandwidth for high-priority and time-sensitive data flows.
The Rucio/SENSE integration project intends to provide Rucio with Software Defined Networking capabilities to create bandwidth-guaranteed network circuits on demand for desired data flows. CMS was the initial use-case focus, and this has now expanded to the ATLAS workflow. In this work we describe the experience of the first two ATLAS sites that have joined this effort. Because the two sites have different security constraints, we adopted a different approach for each deployment. At NET2/UMass we have full access to the network infrastructure, which has allowed us to experiment with different configurations and focus on maximizing throughput. At UChicago, on the other hand, we faced a more restrictive scenario with no network access, giving us the opportunity to try out our software-router solution.
Speaker: Aashay Arora (Univ. of California San Diego (US)) -
128
Building a European federation for sync and share with CERNBox
CERNBox is a leading participant in the emerging European sync-and-share federation effort, promoting interoperable, standards-based collaboration across scientific communities. As an active contributor to European E-Infrastructures, it plays a key role in shaping open, federated data services. This contribution will present recent work on integrating CERNBox into the current sync-and-share ecosystem and review upcoming changes to the service.
Over the past year, CERNBox has improved interoperability through enhanced support for the Open Cloud Mesh (OCM) standard, strengthening collaboration within the EOSC Federation and opening new opportunities across the HEP community and beyond.
In parallel, significant technical advancements include improved performance and reliability of the EOS interface and the commissioning of a new hybrid CephFS driver that leverages kernel-mount performance, simplifying deployments across diverse storage backends.
Looking ahead, CERNBox is evolving toward the new Spaces model, introducing heterogeneous Spaces designed to support different privacy and collaboration requirements.
Together, these developments reinforce CERNBox’s role as an open, interoperable, and scalable collaboration hub for (large-scale) scientific communities.
Speaker: Diogo Castro (CERN)
-
124
-
Track 2 - Online and real-time computing
-
129
Advancing the CMS Level-1 Trigger: Jet Tagging and pT regression with Deep Sets at the HL-LHC
At the Phase-2 Upgrade of the CMS Level-1 Trigger (L1T), particles will be reconstructed by linking charged particle tracks with clusters in the calorimeters and muon tracks from the muon station. The 200 pileup interactions will be mitigated using primary vertex reconstruction for charged particles and a weighting for neutral particles based on the distribution of energy in a small area. Jets will be reconstructed from these pileup-subtracted particles using a fast cone algorithm. For the first time at the CMS L1T, the particle constituents of jets will be available for jet tagging. In this work we present a new multi-class jet tagging neural network (NN). Targeting the L1T, the NN is a small DeepSets architecture, trained with Quantization Aware Training. The model predicts the classes: light jet (uds), gluon, b, c, $\tau_h^+$, $\tau_h^-$, electron, muon. The model additionally predicts the $p_{T}$, using a new method compared to the previously introduced version of our model. For each jet constituent a weight and an offset are derived to correct the constituent $p_{T}$ and, in turn, the jet $p_{T}$, by summing over the corrected constituents. The new model enhances the selection power of the L1T for various important processes for CMS at the High Luminosity LHC such as di-Higgs and Higgs production via Vector Boson Fusion. Furthermore, it outperforms the previous strategy used to derive jet $p_T$ corrections, resulting in improved efficiencies for jet $p_T$-based selections. We present the model, including its performance at object tagging and its deployment into the L1T FPGA processors, and showcase the improved trigger capabilities enabled by the new tagger.
Speaker: Stella Felice Schaefer (Hamburg University (DE)) -
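A minimal sketch of the per-constituent correction described above, in which the network predicts a weight and an offset for every constituent and the jet pT is the sum of the corrected constituent pT values; all numbers are illustrative.

import numpy as np

def corrected_jet_pt(constituent_pt, weights, offsets):
    """Jet pT from per-constituent corrections: sum_i (w_i * pT_i + o_i)."""
    return float(np.sum(weights * constituent_pt + offsets))

pt = np.array([45.0, 20.0, 8.0, 3.0])         # constituent pT in GeV (illustrative)
weights = np.array([1.05, 1.10, 0.95, 0.80])  # per-constituent weights predicted by the NN
offsets = np.array([0.4, 0.1, 0.0, -0.2])     # per-constituent offsets predicted by the NN
print(corrected_jet_pt(pt, weights, offsets), "GeV")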
130
Modular Neural Network Deployment in hls4ml for Next-Generation LHC Trigger Systems
Machine-learning algorithms are becoming central to real-time event selection at the LHC, where future trigger systems must process substantially more complex detector information at fixed, sub-microsecond latencies. These constraints create a growing need for flexible workflows that can map large neural networks onto heterogeneous trigger hardware while preserving strict timing budgets. We present recent developments in hls4ml aimed at supporting these physics-driven requirements for future trigger systems, achieved as part of the ongoing Next-Generation Triggers (NGT) project.
The new Multi-Graph feature enables large neural networks to be decomposed into multiple subgraphs at chosen layer boundaries, allowing experiments to explore modular deployment strategies such as distributing subgraphs across FPGA regions, balancing latency paths, and performing step-wise optimisation on models that exceed standard HLS tool limits. As an additional benefit, the subgraphs created by the Multi-Graph feature can be synthesized in parallel, resulting in a great reduction in synthesis time (up to 3.5×) and enhanced debugging flexibility.
Complementing this trigger-motivated modularisation workflow, we introduce a plugin-based backend system that allows support for additional hardware targets to be developed externally. As an example, we highlight the aie4ml plugin, which brings support for AMD AI Engines and demonstrates how hls4ml’s parsing and quantisation infrastructure can be reused for non-HLS toolflows. While this work is independent of the Multi-Graph development, the plugin system provides the foundation for future studies in which subgraphs may be targeted to different accelerator technologies. In this way, these features illustrate how hls4ml is evolving into an extensible ecosystem capable of accommodating heterogeneous hardware relevant to long-term trigger R&D. These developments position hls4ml to better support the modular and scalable algorithm-design workflows required for trigger systems at the HL-LHC and beyond.
Speaker: Enrico Lupi (CERN) -
131
Optimizing GNNs for the Wild: PyTorch-to-ONNX Acceleration of GarNet on CPUs and GPUs
Graph-based reconstruction methods are well-suited to the sparse and irregular geometry of modern calorimeters, but their deployment often depends on achieving low and predictable inference latency across heterogeneous computing environments. We evaluate GarNet, a lightweight Graph Neural Network (GNN) for calorimeter energy reconstruction, focusing on its cross-backend performance using PyTorch and ONNX Runtime on both multi-core CPUs and NVIDIA A100 GPUs. For small graph inputs representative of PicoCal deposits, ONNX Runtime delivers up to 5× faster CPU and nearly 2× faster GPU inference than native PyTorch execution while maintaining FP32 numerical agreement at the 10⁻⁷ level.
This work is motivated by real-time deployment within the HLT1 reconstruction stage of the LHCb trigger, where the fully GPU-based Allen framework performs low-latency event processing to select interesting collisions at the LHC bunch-crossing rate. Integrating fast, portable ONNX-accelerated GarNet inference into this environment would enable graph-based calorimeter reconstruction to contribute directly to real-time decision-making in future high-rate running.
Speakers: Ronald Caravaca-Mora (Consejo Nacional de Rectores (CONARE) (CR)/Universidad de Costa Rica (UCR) (CR)), Cilicia Uzziel Perez (La Salle, Ramon Llull University (ES)) -
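A minimal sketch of the PyTorch-to-ONNX export and ONNX Runtime inference path evaluated above, with a small stand-in network in place of GarNet; provider selection simply uses whatever the installed ONNX Runtime build offers.

import numpy as np
import torch
import onnxruntime as ort

# Stand-in model; the study itself exports GarNet, a graph network for calorimeter hits.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)).eval()
dummy = torch.randn(1, 16)

torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["features"], output_names=["energy"],
                  dynamic_axes={"features": {0: "batch"}})

# Use whichever execution providers are available (CUDA on a GPU build, otherwise CPU).
sess = ort.InferenceSession("model.onnx", providers=ort.get_available_providers())
out = sess.run(None, {"features": np.random.randn(8, 16).astype(np.float32)})[0]
print(out.shape)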
132
It’s not a FAD: first demonstration of Flows for unsupervised Anomaly Detection at 40 MHz for use at the Large Hadron Collider
We present the first implementation of a Continuous Normalizing Flow (CNF) model for unsupervised anomaly detection within the realistic, high-rate environment of the Large Hadron Collider's L1 trigger systems. While CNFs typically define an anomaly score via a probabilistic likelihood, calculating this score requires solving an Ordinary Differential Equation, a procedure too complex for FPGA deployment. To overcome this, we propose a novel, hardware-friendly anomaly score defined as the squared norm of the model's vector field output. This score is based on the intuition that anomalous events require a larger transformation by the flow, and it is shown to be physically interpretable as the norm of the input features for our specific training choice. Our model, trained via Flow Matching on Standard Model data, is synthesized for an FPGA using the hls4ml and da4ml libraries. We demonstrate that our approach effectively identifies a variety of beyond-the-Standard-Model signatures with performance comparable to existing machine learning-based triggers. The algorithm achieves a latency of a few hundred nanoseconds, or even less when using advanced quantization techniques, and requires minimal FPGA resources, establishing CNFs as a viable new tool for real-time, data-driven discovery at 40 MHz.
See also the published work at https://iopscience.iop.org/article/10.1088/2632-2153/ae51dd
Speaker: Dimitrios Danopoulos (CERN) -
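A minimal PyTorch sketch of the hardware-friendly anomaly score described above, the squared norm of the flow's vector-field output, with an untrained stand-in MLP in place of the flow-matching model and time conditioning omitted; the input dimensionality is illustrative.

import torch

# Stand-in for the trained vector field v_theta(x); the real model is trained with flow matching.
vector_field = torch.nn.Sequential(torch.nn.Linear(57, 64), torch.nn.ReLU(), torch.nn.Linear(64, 57))

def anomaly_score(x):
    """Score = ||v_theta(x)||^2, avoiding any ODE solve at inference time."""
    v = vector_field(x)
    return v.pow(2).sum(dim=-1)

events = torch.randn(4, 57)  # four illustrative events with 57 input features each
print(anomaly_score(events))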
133
COLLIDE-2V - 750 Million Dual-View LHC Event Dataset for Low-Latency ML
Modern foundation models (FMs) have pushed the frontiers of language, vision, and multi-modal tasks by training ever-larger neural networks (NN) on unprecedented volumes of data. The use of FMs has yet to be established in collider physics, which lacks both a comparably sized, general-purpose dataset on which to pre-train universal event representations and a clear, demonstrable need. Real-time event identification presents one possible need, given the requirement to quickly classify and select among all possible collisions at the LHC. As a result, we construct a dual-view LHC collision dataset (COLLIDE-2V), a 50 TB public dataset comprising ~750 million proton-proton events generated with MadGraph + Pythia + Delphes under High-Luminosity LHC conditions (<μ> = 200). Spanning everything from minimum-bias and γ+jets to top, Higgs, di-boson, multi-boson, exotic long-lived signatures and dark showers, the sample covers 50+ distinct processes and >99% of the CMS Run-3 trigger menu in a single coherent format. To allow for effective real-time event interpretation, each event is provided twice, as Parquet files which retain physics-critical content:
- Offline: a full CMS-like reconstruction emulated by a tuned Delphes card
- L1T: a low-latency, lower-resolution view obtained via a custom Level-1 Trigger (L1T) parameterisation (degraded vertex, track and calorimeter performance, altered puppi, |η| ≤ 2.5 tracking, pT thresholds, etc.)
As a proof-of-concept, COLLIDE-2V supports a wide spectrum of research applications ranging from few-shot transfer learning, fine-tuning, pileup mitigation, detector-level generative modelling, cross-experiment benchmarking, to fast simulation surrogates and real-time trigger inference, and entirely novel anomaly-detection - thereby accelerating the shift from handcrafted topology cuts to data-driven decision making throughout the HL-LHC program.
Speaker: Eric Anton Moreno (Massachusetts Institute of Technology (US))
-
129
-
Track 3 - Offline data processing: New Approaches
-
134
Probe-aware Self-supervised Holographic Reconstruction Network
X-ray phase contrast imaging based on propagation is a crucial technique for achieving non-destructive detection at micro and nano scales. However, the recovery of phase information from intensity measurements presents a typical ill-posed inverse problem. Traditional iterative algorithms often necessitate multiple distance measurements, which increases both the complexity and time cost of experiments. Although deep learning introduces a novel paradigm for phase retrieval, supervised methods depend on paired data that are challenging to obtain. Furthermore, existing approaches frequently ignore the wavefront distortion of the illumination probe, leading to significant background artifacts in reconstruction results. To address these challenges, we propose a probe-aware self-supervised holographic reconstruction network (PASNet), designed to simultaneously recover high-quality complex amplitudes of both the object and probe from single-distance holograms. PASNet employs a compacted U-Net architecture that takes as joint input holograms representing "object + probe" and "probe only." It utilizes a physically driven loss function to integrate the Fresnel diffraction into the network optimization process, enabling self-supervised learning without requiring labeled data. Evaluation on simulated and experimental datasets demonstrates that PASNet effectively decouples object and probe wavefronts while eliminating background noise. With just a single exposure, our proposed algorithm achieves superior reconstruction quality compared to traditional iterative algorithms reliant on multi-range data. Additionally, robustness analysis indicates that our method maintains strong performance across varying propagation distances and noise levels.
Speaker: Jiarui Hu (IHEP) -
135
Exercising the novel and promising Mojo language in HEP frameworks
High Energy Physics uses C++ for performance-critical, large-scale (50 million lines of code) libraries. Python is used for analysis. C++ is complex and getting more so, with industry creating a very competitive market for developers. Python is very slow but very common. Is there any way out? As part of the R&D done in the Next Generation Triggers project we are looking at novel languages that could replace C++ at some point in the future - out of curiosity. One such very promising language is Mojo: compiled, statically typed, yet "feels like Python" and Python-compatible. We demonstrate the simplicity of the language, proven by students porting parts of the standalone CMS pixel track reconstruction from C++ to Mojo. We show surprisingly excellent performance results. We demonstrate initial benchmarks of Mojo in multi-threaded code and with GPU kernels written in Mojo. All of this is perhaps less surprising once you know that the creator of Mojo also created LLVM/Clang and Swift. Should HEP move to Mojo? Not for many years to come. Is it worth having an early look, and can we learn something from it? Absolutely, as this contribution will show.
Speaker: Axel Naumann (CERN) -
136
Machine-Learning Methods for Detector Optimization in HIBEAM/NNBAR
Machine-learning techniques are becoming an increasingly important part of the design and physics reach of the proposed HIBEAM/NNBAR program at the European Spallation Source. Building on our previously published ML studies for particle identification and event reconstruction, we are developing a broader suite of ML tools to support detector optimization, vertex and event reconstruction, and signal–background discrimination for future neutron–antineutron searches and related rare-process measurements. As one component of this program we use graph-based deep learning to study vertex reconstruction in the TPC concept under study, using simulated datasets with controlled detector smearing to evaluate robustness across detector configurations. The work outlines current progress and illustrates how modern ML methods can be integrated into the HIBEAM/NNBAR analysis chain to improve reconstruction performance and inform detector design choices.
Speaker: Lucas Astrand -
137
CMS Tracker Data Quality Certification enhanced with Machine Learning tools
The CMS Pixel Detector in Run 3, with about 1400 silicon modules, is a central part of the Tracker, providing precise tracking and vertex reconstruction. Ensuring high-quality data requires continuous monitoring, as modules can degrade or suffer operational issues. Traditionally, experts relied on a GUI that displayed histograms integrated over entire runs, making it difficult to spot short-lived or localized anomalies. This monitoring involved visually inspecting hundreds of histograms, a process that is slow and prone to human error. To address this, the DIALS platform was deployed in 2024; it provides histograms at the Lumisection (LS) level - each LS representing roughly 23 seconds of data. This finer granularity enabled Machine Learning (ML) models, such as Non-negative Matrix Factorization (NMF), to identify brief anomalies, sometimes lasting only a minute, that previously went unnoticed. Removing the affected LS improves overall data quality while minimally impacting integrated luminosity. Integrating ML into the DQM workflow has thus improved anomaly detection, making Run 3 data certification faster and more reliable.
Speaker: Richa Sharma (University of Puerto Rico (US)) -
138
Quantum Reservoir Computing for Sustainable Forecasting of Cosmic-Ray Neutron Monitor Time Series
Reliable short- to medium-horizon forecasts of cosmic-ray/neutron monitor count rates support detector operations, data-quality monitoring, and space-weather analyses, but modern deep sequence models can be costly to train and tune across stations and solar conditions. We present a practical Quantum Reservoir Computing (QRC) pipeline for sustainable time-series forecasting on neutron monitor data, focusing on long-running stations including Lomnický štít and complementary monitors as well as temporal cosmic-ray datasets. Our approach uses a small, fixed parameterized quantum circuit as a nonlinear dynamical reservoir driven by lagged count-rate inputs; only a lightweight linear readout is trained, enabling rapid model updates and low training energy. We evaluate point and probabilistic forecasts under diurnal/seasonal variability and solar transient periods using rolling-origin backtesting, and compare against persistence, ARIMA, classical echo-state networks, and compact deep baselines (e.g., temporal CNN/LSTM). Across stations, QRC achieves competitive accuracy while substantially reducing trainable parameters and retraining time, and it remains robust under moderate concept drift via fast readout re-fitting. We provide an end-to-end, reproducible workflow (data ingestion, normalization, feature lagging, hyperparameter sweeps, and metrics) suitable for deployment in monitoring pipelines and for extension to multivariate inputs (e.g., pressure corrections, geomagnetic indices). This work demonstrates QRC as a feasible near-term quantum/hybrid tool for HEP-adjacent time-series workloads emphasizing efficiency and maintainability.
Speaker: Krishna Bhatia
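A minimal classical sketch of the train-only-the-readout idea behind reservoir computing, with a fixed random nonlinear feature map standing in for the parameterized quantum circuit and a ridge-regression readout; the data and all hyperparameters are illustrative.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy count-rate series and lagged input windows.
series = np.sin(np.linspace(0, 60, 2000)) + 0.1 * rng.standard_normal(2000)
lags = 24
X = np.stack([series[i:i + lags] for i in range(len(series) - lags - 1)])
y = series[lags + 1:]

# Fixed random "reservoir": only the linear readout below is trained.
W = rng.standard_normal((lags, 200)) / np.sqrt(lags)
features = np.tanh(X @ W)

readout = Ridge(alpha=1e-2).fit(features[:1500], y[:1500])
pred = readout.predict(features[1500:])
print("test RMSE:", float(np.sqrt(np.mean((pred - y[1500:]) ** 2))))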
-
134
-
Track 3 - Offline data processing: Reconstruction 1
-
139
Reprocessing All IceCube Data - It Should be Easy, Right?
Re-processing data with improved detector understanding, new data processing methods, etc. is natural for any particle physics experiment over the course of its life. The IceCube Neutrino Observatory last re-processed its data nearly a decade ago. Now we are processing the data for the third time, which we call Pass3. Since the previous reprocessing, we have recorded three times as much data, and we need to apply new algorithmic improvements that increase the run time by 25%. This processing is being run exclusively at the Texas Advanced Computing Center’s Vista HPC system because the data is co-located at the facility. We will share our experience with retrieving all data from tape, reprocessing it, and uploading the new version to our main data storage at UW-Madison.
Speaker: David Schultz (University of Wisconsin-Madison) -
140
Advancing IACT Data Analysis: A Status Update on the CTLearn Framework
The Cherenkov Telescope Array Observatory (CTAO) represents the next generation of ground-based gamma-ray telescopes, designed to probe the very-high-energy (VHE) sky above 20 GeV with unprecedented sensitivity. With the first Large-Sized Telescope (LST-1) prototype already taking data on La Palma, robust software is required to accurately reconstruct the properties of primary particles (type, energy, and arrival direction) from the stereoscopic records of extensive air showers. In this contribution, we present a status update on CTLearn, a deep-learning-driven framework for event reconstruction in imaging atmospheric Cherenkov telescopes that is compatible with ctapipe, the standard low-level data processing library for CTAO. We highlight a substantial architectural expansion of the framework: while originally built exclusively for TensorFlow, CTLearn has been updated to support both TensorFlow and PyTorch backends while maintaining a unified interface for users. CTLearn utilizes convolutional neural networks to infer event properties directly from pixel-wise camera data, exploiting both integrated charge and temporal waveform information to capture the full evolution of the shower. We report on recent activities validating CTLearn against standard Random Forest methods using LST-1 observations of the Crab Nebula. We discuss training strategies to handle varying observational conditions, specifically comparing the performance of single generalized models versus altitude-dependent ensembles. We also present novel efforts to optimize model performance using transfer learning to adapt networks across telescope configurations, and model compression techniques such as pruning. These optimizations aim to significantly reduce computational resource consumption and inference time while maintaining the high sensitivity required for CTAO science goals.
Speaker: Prof. Daniel Nieto (IPARCOS-UCM) -
141
End-to-End Reconstruction with Transformers
Modern collider detector experiments comprise multiple detector subsystems, each of which requires dedicated reconstruction algorithms. Manually tuning these algorithms such that they work optimally not only in isolation, but also when combined together to form a full reconstruction chain, is a time-consuming task that poses technical and organisational challenges. We demonstrate how mask transformer models, originally developed for image segmentation tasks, can be used to perform end-to-end particle reconstruction and event classification, performing pixel and calorimeter clustering, track finding, track fitting, track-cluster matching, and calibration in a single step. Using FCC-ee CLD detector full simulation data as a testbed, we are able to achieve large gains in tracking efficiency, energy resolution, and particle identification over existing hand-crafted tracking and particle flow algorithms, at a reduced inference time. With nominal reconstructability requirements, we are able to achieve ~30% and ~15% increases in integrated charged hadron efficiency and purity respectively, with gains increasing as reconstructability requirements are loosened. This demonstrates how the combination of low-level information from different sub-detectors and end-to-end differentiability allows us to maximally exploit our data at reduced person-power and compute cost.
Speaker: Max Hart (University College London (GB)) -
142
Novel technique to improve anti-neutron reconstruction at Belle II
The MANTRA (Measuring Anti-Neutron: Tagging and Reconstruction Algorithm for frontier experiments) is a PRIN 2022 Italian project which proposes a new method to measure the energy of anti-neutrons produced in high-energy physics experiments. Anti-neutrons cannot be reconstructed by the tracking systems; however, they can produce so-called annihilation stars in electromagnetic calorimeters, which can form self-contained hadronic showers consisting mainly of pions and other hadrons. The signature from anti-neutron annihilations can be distinguished from the signatures of photons and other neutral particles due to the characteristic shape of the shower. However, only a part of the energy is deposited in the calorimeters, leading to a momentum uncertainty of approximately 50%.
To overcome this limitation, we propose combining calorimetric information with precise timing measurements from fast detectors placed upstream of the calorimeter. These timing detectors can detect either the annihilation of the anti-neutron or the calorimetric shower back-splash in order to extract the momentum with a high precision.
We will show the proof-of-concept results, obtained by applying this method to the Belle II experiment. We will demonstrate the improved anti-neutron identification and momentum resolution by combining information from the time-of-propagation Cherenkov detector and the electromagnetic calorimeter.
Speaker: Stefano Spataro (Torino University and INFN) -
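A minimal sketch of the kinematics behind combining a timing measurement with the calorimeter: a measured flight time over a known path length gives beta, and hence the momentum via p = m * beta * gamma; the path length and time below are illustrative.

import math

M_NBAR_GEV = 0.9396       # antineutron mass in GeV/c^2
C_M_PER_NS = 0.299792458  # speed of light in m/ns

def momentum_from_tof(path_m, tof_ns, mass_gev=M_NBAR_GEV):
    """Momentum (GeV/c) from time of flight over a known path length."""
    beta = path_m / (tof_ns * C_M_PER_NS)
    if beta >= 1.0:
        raise ValueError("unphysical beta >= 1; check the timing and path length")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return mass_gev * beta * gamma

# Example: 1.2 m flight path and a 5.0 ns measured time of flight.
print(round(momentum_from_tof(1.2, 5.0), 3), "GeV/c")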
143
Key techniques for performance optimization of astronomical satellite data processing
Astronomical satellites serve as critical infrastructure in the field of astrophysics, and data processing is one of the most essential processes for conducting scientific research on cosmic evolution, celestial activities, and dark matter. Recent advancements in satellite sensor resolution and sensitivity have led to petabyte (PB)-scale data volumes, characterized by unprecedented scale and complexity, posing significant challenges to data processing. However, traditional data processing methods face issues including intricate interdependencies among multi-level data products (e.g., Level 0 to Level 2), limited memory resources, and high memory occupancy rates, which collectively affect data processing efficiency. To address these issues, this study proposes a performance optimization framework for astronomical data processing. Firstly, an adaptive data chunking model is established, which partitions data dynamically based on real-time memory availability and computational load. Secondly, a multi-level memory management method is presented, which optimizes memory utilization by caching frequently accessed data in memory and building a priority-based queuing mechanism. Finally, a parallel data processing interface is introduced, transforming the algorithm from single-threaded serial execution to parallel processing. Experiments are conducted to verify the availability and practicality of the proposed framework, and the results show that data processing efficiency has been improved by 13%, effectively addressing the deficiencies of traditional methods. The research outcomes will be implemented in the data processing tasks of the enhanced X-ray Timing and Polarimetry (eXTP) satellite, while also providing guidance for the data processing workflows of other astronomical satellites.
Speaker: Shuang Wang (IHEP)
Track 4 - Distributed computing
144
Advancing Workflow Validation at Scale: A Modern and Containerized HammerCloud Architecture for the WLCG
Distributed computing infrastructures that support modern large-scale scientific experiments must remain reliable, scalable, and flexible. HammerCloud (HC) provides an automated framework for continuous testing, benchmarking, and commissioning of services within the Worldwide LHC Computing Grid (WLCG), using realistic full-chain experiment workflows.
As the technical computing environment continues to evolve, maintaining long-term software sustainability and adaptability has become a key challenge. HammerCloud has undergone a major modernization to address these needs, focusing on security compliance, maintainability, and integration with contemporary technologies. In this paper, we describe the redesigned HC architecture, which leverages industry-standard solutions such as containerization to support agile development and validation. We evaluate the impact of these improvements on reliability, deployment efficiency, and introduction of new tests, comparing the resulting system with related testing frameworks within the WLCG ecosystem.
The results show improvements in resource efficiency and operational simplicity, and lay the foundation for future developments, including a gradual transition toward Kubernetes-based orchestration.
Speaker: Lorenzo Valentini (CERN) -
145
Evolving PanDA: Toward Sustainable, Intelligent, and Heterogeneous Workload Management in ATLAS and beyond
The ATLAS experiment at the CERN Large Hadron Collider relies on a worldwide distributed computing infrastructure to process millions of production and analysis jobs daily across grid, cloud, and HPC resources. The ATLAS Distributed Computing (ADC) system integrates workload, data, and resource management services to ensure efficient use of heterogeneous environments. Within ADC, the PanDA Workload Management System (WMS) provides large-scale job brokerage, pilot submission, and monitoring. Recent development extends PanDA to address sustainability, hardware diversity, and intelligent automation. A new module estimates per-job CO₂-equivalent emissions, combining runtime metadata with regional carbon-intensity data. The resulting gCO₂ values are stored in the PanDA database and visualized through monitoring dashboards to raise awareness of computing-related emissions. The PanDA brokerage has been extended to support GPU-based scheduling, using a redesigned JSON resource description that encodes CUDA version, GPU model, memory, and benchmark data to enable precise resource matching. The Worker Node Map collects detailed CPU and GPU specifications reported by pilots and correlates them with HEPiX benchmark results to improve CPU-time normalization and provide operational insight into site homogeneity and hardware aging. Ask PanDA introduces an AI-driven assistant that orchestrates multiple specialized clients in a coordinated workflow and employs retrieval-augmented generation to provide contextual answers on PanDA operations. These developments represent a significant step toward a more transparent, efficient, and sustainable computing ecosystem for ATLAS and future large-scale scientific workflows.
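As a hedged illustration of the kind of per-job carbon estimate described above, the sketch below combines core count, wall-clock time, an assumed per-core power draw, and a regional grid intensity; the function and constants are invented and do not reflect the actual PanDA schema or coefficients:

```python
def job_gco2(core_count, wallclock_s, watts_per_core, pue, grid_gco2_per_kwh):
    """gCO2e ~ energy drawn by the job (kWh) times regional carbon intensity."""
    kwh = core_count * wallclock_s / 3600.0 * watts_per_core / 1000.0 * pue
    return kwh * grid_gco2_per_kwh

# 8 cores, a 6-hour job, 10 W per core, data-centre PUE of 1.5, grid at 300 gCO2/kWh
print(round(job_gco2(8, 6 * 3600, 10.0, 1.5, 300.0), 1), "gCO2e")
```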
Speaker: Fernando Harald Barreiro Megino (University of Texas at Arlington) -
146
Scaling ePIC Simulation Production: Distributed Workflow and Data Management
The ePIC experiment at the upcoming Electron-Ion Collider (EIC) continues to expand its simulation production capabilities on the Open Science Grid (OSG) infrastructure. We report on three significant developments since our previous work: the integration of background processes into simulation production, comprehensive testing of the PanDA workload management system, and progress in Rucio adoption for distributed data management.
Background embedding has enabled more realistic detector and physics studies while substantially increasing computational demands. As physics studies intensify to verify detector requirements and define the science program for the first years of operation, our simulation volumes continue to grow, necessitating improved workflow orchestration. We present results from PanDA integration tests and readiness assessment for production deployment. We also report on Rucio adoption status for data cataloging and replication across available computing resources.
These developments position ePIC to efficiently scale simulation production for EIC detector design and physics studies.
Speaker: Sakib Rahman -
147
DiracX in action
DiracX is the next incarnation of DIRAC: a modern, cloud‑native platform for managing distributed computing across multiple research infrastructures for one or more virtual organizations. Leveraging two decades of DIRAC experience, DiracX delivers a faster, more capable, and user‑friendly environment for scientists, administrators, and developers alike.
In this contribution we build on our previous CHEP contribution (CHEP 2024), which outlined the motivations behind DiracX, its core technical capabilities, and the architectural choices that underpin it. Eighteen months later, DiracX has matured into a production‑grade service. We describe the practical experience of operating DiracX in parallel with the legacy DIRAC stack, share early adopter impressions, and highlight the novel technical innovations that distinguish DiracX from its predecessor.
Speaker: Alexandre Franck Boyer (CERN) -
148
Toward a Sustainable Workload Management Architecture for CMS at HL-LHC
The Compact Muon Solenoid (CMS) experiment is reassessing its Workload Management (WM) stack to meet HL-LHC scale, heterogeneity, and a 20–25-year sustainability horizon. Over the past year, we surveyed multiple pathways (including reuse of external WM systems, hybrid approaches, and a ground-up redesign) and developed a blueprint that emphasizes architectural principles of the HL-LHC WM project.
The blueprint centers on the separation of concerns between request intake/policy and execution, unique workflow specification portable across heterogeneous provisioners, data-locality-aware placement integrated with the data management and caching layers, modern security practices, modular architecture with clear APIs, strong observability, and automated operations. Scalability targets include hundreds of thousands of workflows per day, horizontal elasticity, priority-aware queuing, and unified retry semantics.
We present the evaluation framework used (functionality parity, operability, cost of change, ecosystem maturity), results from early prototypes, and a phased migration plan covering central production and user analysis. The goal is to converge on an implementation that adheres to these principles, reduces bespoke adapters, and remains adaptable as sites and technologies evolve through Run-4 and Run-5.
Speaker: Andrea Piccinelli (University of Notre Dame (US))
Track 5 - Event generation and simulation: Fast Simulation 3
149
CC4SF - Calibrated classifiers for scale factors (FastSim)
As the LHC moves into its high-luminosity phase, the CMS experiment must handle increasingly complex data collected at much higher rates. To complement real data, simulated samples must also scale in volume and complexity while meeting the growing demands of the CMS physics program. Increased use of the CMS fast Monte Carlo production framework (FastSim) can help meet these demands, particularly if its accuracy is improved. We introduce calibrated classifiers for scale factors (CC4SF), a machine-learning approach that computes scale factors (weights applied to simulated objects to account for biases in reconstruction and selection efficiency) in a differential way. Using a multi-dimensional input feature space, we showcase CC4SF for the example of FastSim electron, muon, and tau lepton scale factors, which serve to correct FastSim's efficiency modeling so that it closely matches the more accurate but resource-intensive FullSim.
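For context, the sketch below shows a generic way to turn a calibrated classifier into per-object weights via the probability ratio p/(1-p); it is a toy illustration of the density-ratio idea on synthetic features, not the CC4SF implementation or its training data:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
# Toy per-object features (e.g. pT, eta, isolation) for "FullSim" (label 1) and "FastSim" (label 0)
x_full = rng.normal(loc=[30.0, 0.0, 0.10], scale=[10.0, 1.5, 0.05], size=(20_000, 3))
x_fast = rng.normal(loc=[32.0, 0.0, 0.12], scale=[11.0, 1.5, 0.06], size=(20_000, 3))
X = np.vstack([x_full, x_fast])
y = np.concatenate([np.ones(len(x_full)), np.zeros(len(x_fast))])

# Calibration ensures the classifier output can be read as a probability
clf = CalibratedClassifierCV(HistGradientBoostingClassifier(), method="isotonic", cv=3)
clf.fit(X, y)

def scale_factor(features):
    """SF(x) = p(FullSim | x) / p(FastSim | x), applied as a weight to FastSim objects."""
    p = clf.predict_proba(np.atleast_2d(features))[:, 1]
    return p / np.clip(1.0 - p, 1e-6, None)

print(scale_factor([31.0, 0.3, 0.11]))
```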
Speaker: Samuel Louis Bein (Northeastern University (US)) -
150
Accelerating ALICE background simulations: A WGAN Approach to slow-neutron induced TPC loopers
The simulation of background processes in high-energy physics can be computationally expensive and time-consuming. To provide the most realistic data description at the ALICE experiment using Monte Carlo simulations, we investigated alternative solutions to generate the products of electromagnetic interactions initiated by slow neutrons in the Time Projection Chamber (TPC). Specifically, electrons and positrons from these processes can spiral repeatedly within the magnetic field in the detector (so-called “loopers”) and tracking them via full transport frameworks, such as GEANT4, requires extensive computing resources.
A fast simulation framework based on a Wasserstein Generative Adversarial Network (WGAN) was developed to speed up the simulation process. Our solution is designed to bypass the significant computational cost associated with transporting neutrons to low energies by directly generating the kinematic properties (e.g., momentum, production vertex) of the resulting electron-positron pairs (from neutron capture) and electrons from Compton scattering.
The generative model successfully reproduces the complex multidimensional distributions of the background tracks with high fidelity, maintaining agreement with the full GEANT4 simulation. By replacing the full transport of these specific background components with the WGAN inference, a tenfold speedup is achieved, making the simulation of loopers a negligible component of the total processing time. Moreover, the fast simulation framework is modular, allowing for the future inclusion of other background sources, such as nuclear spallation products.
Speaker: Marco Giacalone (CERN) -
151
Quantum based Generative Models for Fast Calorimeter Simulation in ATLAS and Future Colliders
The increasing demands on simulation statistics for HL-LHC analyses challenge the scalability of traditional calorimeter simulation across all LHC collaborations. While machine learning based fast simulation techniques have demonstrated strong performance, future collider experiments will require generative models that are not only accurate and fast, but also scalable and interpretable in regimes of unprecedented detector granularity and event rates. In this context, quantum based generative models offer a novel direction.
In this work, we investigate the integration of Quantum Neural Network (QNN) architectures into the ATLAS fast calorimeter simulation framework (fastCaloSim), with the dual goal of improving generative performance and generalisation while maintaining practical inference times. We present a comparative study of two complementary approaches. The first adopts a hybrid quantum/classical strategy, where a quantum Generative Adversarial Network (qGAN) is used during training to learn the latent space of calorimeter shower representations, followed by classical neural networks for event generation. The second approach introduces a Quantum Invertible Neural Network (qINN), a generative architecture still largely unexplored especially in practical implementation, which provides a bijective mapping between input kinematic parameters and calorimeter shower observables, enabling explicit likelihood evaluation and enhanced interpretability.
Both approaches are implemented and benchmarked within fastCaloSim using ATLAS open-data samples, and evaluated in terms of generative fidelity and compatibility with ATLAS production workflows. Preliminary results demonstrate the feasibility of quantum based generative models for fast detector simulation and highlight the potential of qINN-based approaches as a novel paradigm for future event simulation at next generation collider experiments.
Speaker: Matteo Franchini (University of Bologna and INFN (IT)) -
152
Machine Learning for Faster Simulations at Belle II
One of the major goals of the Belle II Experiment is the search for rare decay processes, which manifest as tiny signals over large background contributions. Measuring such delicate signals with the highest possible precision requires not only large datasets from the actual experiment, but typically even larger simulated datasets for the development of such analyses.
Since running the analysis software over these entire datasets for every analysis is computationally wasteful, centrally produced, preselected subsets of collision events (so-called skims) are essential for efficient data access. However, the generation of skimmed simulated datasets itself is computationally inefficient, because the entire simulation chain, including expensive detector simulation and reconstruction algorithms, must be run even for events that will be discarded by skims later on.
To remedy this issue, we present a method that uses machine learning algorithms to predict, before the expensive steps of the simulation, whether an event will be selected by a skim, so that wasteful computation can be skipped for discarded events. In particular, a transformer-based neural network architecture is employed in conjunction with importance sampling to avoid biases in the data selection.
This contribution will highlight the development and validation of our approach, its incorporation into the Belle II production software and its future potential in the face of ever growing data challenges.
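A minimal sketch of the sampling idea, assuming a per-event skim score is available before detector simulation (the function names and the floor value are invented, not Belle II code), could look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

def keep_probability(skim_score, floor=0.02):
    """Probability of running the full simulation; the floor keeps every event reachable."""
    return np.clip(skim_score, floor, 1.0)

def sample_events(skim_scores):
    p_keep = keep_probability(np.asarray(skim_scores))
    kept = rng.random(len(p_keep)) < p_keep
    weights = np.where(kept, 1.0 / p_keep, 0.0)    # importance weight compensates the selection
    return kept, weights

scores = rng.uniform(0.0, 1.0, size=10)            # stand-in for the network's skim scores
kept, w = sample_events(scores)
print(kept, np.round(w, 2))
```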
Speaker: David Giesegh (Belle II Experiment) -
153
step2point: Enhancing data preparation for point-cloud-based fast simulation
Fast calorimeter shower simulation is an active field of study, with numerous models having been explored. Recently, several models have explored a point cloud representation of energy deposits, as opposed to the more common image-like voxelisation of a shower. However, direct use of the output from the detailed Geant4 simulation as an input to these machine learning models is computationally prohibitive. Creating a point cloud which features a sufficiently reduced number of points while still preserving key physics observables has not been widely explored and currently requires handcrafted detector-dependent procedures. To address this, we present the step2point library, a lightweight and configurable tool designed to preprocess electromagnetic and hadronic showers into optimally compressed point-cloud representations, while preserving physically relevant shower characteristics. The step2point workflow reduces the density of raw simulation hits by merging deposits according to tunable spatial, energy, and topological criteria. These parameters are exposed to users to accommodate the specific granularity requirements of different detector concepts, enabling a flexible balance between fidelity and computational efficiency.
We demonstrate the core functionality of the step2point library using the Open Data Detector together with a publicly available step2point dataset. Benchmark studies show that the approach significantly decreases data volume while maintaining accurate simulation-level observables. The step2point library provides a common tool for the community to accelerate fast simulation R&D and facilitates integration of point-cloud–based shower models.
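As an illustration of the kind of reduction step2point performs, the toy sketch below merges raw deposits into energy-weighted cell centroids; the binning choice, parameter names, and thresholds are assumptions and not the actual step2point API:

```python
import numpy as np

def merge_deposits(xyz, energy, cell_size=5.0, e_min=1e-4):
    """Merge hits falling into the same cubic cell of side cell_size (mm)."""
    keys = np.floor(xyz / cell_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    e_sum = np.bincount(inverse, weights=energy)
    centroids = np.stack(
        [np.bincount(inverse, weights=energy * xyz[:, i]) / e_sum for i in range(3)],
        axis=1,
    )
    keep = e_sum > e_min                             # drop cells below an energy threshold
    return centroids[keep], e_sum[keep]

rng = np.random.default_rng(1)
hits = rng.normal(scale=50.0, size=(10_000, 3))      # toy hit positions (mm)
edep = rng.exponential(scale=0.01, size=10_000)      # toy deposited energies (GeV)
points, energies = merge_deposits(hits, edep)
print(len(points), "points from", len(hits), "hits")
```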
Speaker: Anna Zaborowska (CERN)
Track 6 - Software environment and maintainability: Software Systems, Frameworks, and Integration
Conveners: Arantza De Oyanguren Campos (Univ. of Valencia and CSIC (ES)), Gaia Grosso (IAIFI, MIT)
154
A Feedback-Driven Evolution of the Belle II Distributed Computing System
As the Belle II dataset grows towards a high luminosity scenario, the requirements for the distributed computing framework have grown in complexity and scale. To ensure long-term software maintainability, the Belle II Distributed Computing team is implementing a feedback-driven development model. This approach bridges the gap between the end-user experience and system evolution, aiming for more frequent engagement and rapid development cycles supported with continuous integration pipelines. Central to this strategy is a computing users forum, which acts as a dynamic feedback loop where users can request assistance and contribute ideas for new features. A recent comprehensive user survey revealed a high baseline of satisfaction with both current operations and the responsiveness of the team.
We present an overview of our most recent feedback survey, including a sample of the most interesting requests from users, and a summary of our development work to address their evolving needs. In response to the continuous feedback, we have prioritized targeted developments that enhance user autonomy, such as diagnostic tools that automatically capture errors and tracebacks and provide AI-powered smart troubleshooting. By presenting these results, we demonstrate how a user-centric feedback loop reduces maintenance overhead and fosters a resilient distributed computing environment for the future of Belle II.
Speaker: Quinn Campagna -
155
Building an AI Assistant for ATLAS Computing Operations and User Support
We present the development of an AI Assistant designed to support ATLAS computing operations and users at the UChicago/MWT2 facilities. A significant portion of effort in distributed computing is spent helping users, debugging systems, optimizing workflows, and maintaining a diverse ecosystem of tools and services. Modern large language models offer a practical opportunity to reduce this effort. Our system integrates OpenAI Assistant APIs, multiple agent frameworks, and a growing set of Model Context Protocol (MCP) services deployed on our Kubernetes-based UC Analysis Facility. By combining general LLM capabilities with facility- and experiment-specific knowledge, the assistant can help users at all experience levels - from basic environment setup and system-readiness checks to advanced coding, dataset navigation, and physics-analysis reasoning. The assistant also targets operational intelligence across services such as Rucio, HTCondor, PanDA, FTS, and caching layers, enabling automated diagnostics and system-level reasoning. We summarize the architecture, current capabilities, and challenges, focusing on achieving robust, production-ready performance. This work demonstrates that domain-aware AI agents can significantly enhance user support and operational efficiency within ATLAS computing.
Speaker: Ilija Vukotic (University of Chicago (US)) -
156
Exploring AI-Assisted Coding for Storage Systems: Practical Examples and Preliminary Evaluation
Advances in AI-assisted code generation are changing how complex software systems are designed, built, and improved over time. In storage software development for scientific computing, we explore how AI-based code synthesis and refinement workflows can speed up prototyping, strengthen maintainability, and clearly express architectural intent.
We present several practical examples developed in this framework:
cern-nfs, a user-space NFS 4 server in C++, demonstrating how AI-assisted generation can support the implementation of protocol layers and asynchronous I/O logic.
cern-hpack, an archive-friendly backup tool for POSIX file systems, illustrating the use of AI-driven scaffolding to model metadata handling, integrity verification, and incremental versioning.
htd, a high-performance transfer daemon, where AI-based refinement was applied to optimize concurrency patterns and I/O pipelines
Refactoring and feature development in EOS Open Storage.
Each example illustrates how collaboration between humans and AI enables rapid prototyping of reliable, high-performance components while maintaining developer control over design and correctness. Beyond code synthesis, we discuss our experience in the iterative refinement process in which AI-generated code is validated, profiled, and gradually optimized through continuous feedback loops and low-level debugging. These case studies demonstrate how AI-assisted software development can increase developer productivity and reduce implementation time for complex systems such as distributed storage, data management, and high-throughput I/O frameworks, opening up new opportunities for innovation in the HEP software ecosystem.
Speaker: Andreas Joachim Peters (CERN) -
157
RLABC: A Sustainable Reinforcement Learning Software Framework for Accelerator Beamline Optimization
We present RLABC, an open and extensible software framework for applying reinforcement learning (RL) to particle accelerator beamline optimization. The framework is designed to bridge modern RL libraries with established accelerator simulation tools, enabling reproducible and maintainable development of learning-based control solutions. RLABC integrates Python-based RL workflows with the Elegant beam dynamics simulation code through a modular wrapper architecture, abstracting simulation control, data handling, and environment generation from algorithmic development.
A key software contribution is the automatic transformation of arbitrary Elegant beamline descriptions into Markov decision process–compliant RL environments via structured preprocessing and diagnostic watch point insertion. The framework provides standardized state representations, continuous action interfaces corresponding to realistic magnet controls, and configurable reward functions, allowing users to rapidly prototype and benchmark RL algorithms. RLABC is fully compatible with the Stable-Baselines3 ecosystem and supports custom agent implementations, staged training, and transfer learning across beamline configurations.
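To illustrate the general shape of such an auto-generated environment, here is a minimal Gymnasium environment compatible with Stable-Baselines3; the dynamics are a toy stand-in rather than the RLABC wrapper around Elegant:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyBeamlineEnv(gym.Env):
    """State: readings at diagnostic watch points; action: corrector magnet strengths."""

    def __init__(self, n_correctors=4):
        super().__init__()
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_correctors,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_correctors,), dtype=np.float32)
        self._offsets = np.zeros(n_correctors, dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._offsets = self.np_random.normal(scale=0.5, size=self._offsets.shape).astype(np.float32)
        return self._observe(np.zeros_like(self._offsets)), {}

    def step(self, action):
        obs = self._observe(np.asarray(action, dtype=np.float32))
        reward = float(np.exp(-np.sum(obs ** 2)))   # toy stand-in for transmission efficiency
        return obs, reward, False, False, {}        # never terminates in this sketch

    def _observe(self, action):
        # A real wrapper would run the beamline simulation here and read back watch-point data
        return self._offsets + action

# Usage with Stable-Baselines3, e.g.:
# from stable_baselines3 import PPO; PPO("MlpPolicy", ToyBeamlineEnv()).learn(5_000)
```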
The software is validated on the VEPP-5 injection complex, where reinforcement learning agents achieve particle transmission efficiencies comparable to classical optimization methods while demonstrating improved reusability and scalability. RLABC is intended as a sustainable research and application platform for accelerator physicists and machine learning researchers, lowering the barrier to adopting RL-based control in accelerator environments.
Speaker: Anwar Ibrahim -
158
The CMS Interactive Job Tracer
CMS applications are generally complex: they can contain many thousands of components, scheduled by the CMSSW framework and run on tens of threads. Understanding the timing characteristics of such complex applications is difficult, especially when correlations between the components need to be understood. To aid in understanding the runtime behavior of the applications, CMS has developed a web-browser-based interactive data visualization tool. The tool is composed of two parts: one for gathering the data from the framework in a compact format and a second for transforming the data into the Chrome Trace Event JSON format. These JSON files can be read by widely available web tools such as the Chrome browser or Perfetto, or converted to HTML pages. This contribution will describe both parts and how they work with each other. For the first part, we will describe the generic hooks the CMSSW framework provides that enable real-time tracing of the activities of the framework. For the second part, we will outline how the data is represented in the JSON format in order to support different use cases such as identifying long-running framework activities, finding algorithm scheduling conflicts, and studying how each algorithm uses memory during a job.
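For readers unfamiliar with the target format, the snippet below writes a handful of hand-made complete ("ph": "X") events in the Chrome Trace Event JSON format; the module names and timings are invented, and this is not the CMSSW converter itself:

```python
import json

events = [
    # "X" = complete event: name, category, start timestamp (us), duration (us);
    # pid and tid control which row the slice is drawn on in the viewer.
    {"name": "TrackProducer",   "cat": "module", "ph": "X", "ts": 0,    "dur": 1500, "pid": 1, "tid": 3},
    {"name": "CaloClusterizer", "cat": "module", "ph": "X", "ts": 200,  "dur": 900,  "pid": 1, "tid": 5},
    {"name": "VertexFitter",    "cat": "module", "ph": "X", "ts": 1600, "dur": 400,  "pid": 1, "tid": 3},
]

with open("trace.json", "w") as f:
    json.dump({"traceEvents": events, "displayTimeUnit": "ms"}, f, indent=1)
# The resulting file can be opened in Perfetto (ui.perfetto.dev) or chrome://tracing.
```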
Speaker: CMS Collaboration
Track 7 - Computing infrastructure and sustainability
159
Evaluating the scalability of CERN’s HTCondor batch system towards the High-Luminosity LHC
The CERN Tier-0, representing around 25% of WLCG’s total CPU capacity, currently handles 125 thousand concurrent jobs. For HL-LHC at full luminosity, we expect this number to increase by a factor between 4 and 7. Therefore, CERN's HTCondor batch system will need to manage a much larger pool of resources and many more computing tasks. This will have an impact on HTCondor's central components, Collector and Negotiator, which need to gather information about available computing resources and idle tasks awaiting execution, in order to match and schedule them. This work analyzes the factors impacting the scalability of these central components, and discusses the results of our large-scale stress tests, performed in conditions similar to those expected during the High-Luminosity era. The lessons learnt from this testing will help us dimension CERN's batch system service to ensure readiness for the HL-LHC challenge.
Speaker: Antonio Delgado Peris (CERN) -
160
Expanding the INFN Cloud Service Portfolio through the Federation of Kubernetes Clusters
The National Institute for Nuclear Physics (INFN) manages the INFN Cloud, a federated cloud platform providing a customizable portfolio of IaaS, PaaS, and SaaS services to meet the needs of the scientific communities it serves. PaaS services are implemented using an Infrastructure as Code approach, employing TOSCA templates, Ansible, Docker, and Helm technologies.
The federation middleware is based on the INDIGO PaaS Orchestration system, which integrates multiple open-source microservices. The INDIGO PaaS Orchestrator manages high-level deployment requests and coordinates provisioning across the federated IaaS platforms, which are currently all based on OpenStack. Recent development efforts focused on replacing legacy components with modular services to extend functionality and mitigate security vulnerabilities, with Python adopted as the primary programming language to support long-term maintainability.
This contribution focuses on the federation of Kubernetes-based cloud providers within the INFN Cloud and the development of TOSCA templates to automate the deployment of services across these providers. The supported Kubernetes clusters are deployed both on bare-metal hardware and on virtual machines. A key requirement for the federated providers is the use of Capsule to manage users’ resource quotas, ensuring fair and efficient distribution of computational resources among multiple scientific communities.
This work is particularly significant as Kubernetes is increasingly becoming a central technology for the research communities supported by INFN, offering scalable, portable, and resilient environments for diverse scientific workloads. This achievement further extends the INFN Cloud portfolio, making it more valuable to its users.
Speaker: Luca Giommi (INFN) -
161
Performance optimisations for a cloud-based grid site
AU-Melbourne is the first grid computing site to be implemented entirely in the cloud. Virtual machines are managed with OpenStack and Cloud Scheduler v2 (CSv2) while an S3 object store functions as the storage backend. The site has been in operation for over 12 months, providing Compute Element (CE) and Storage Element (SE) resources for the ATLAS and Belle II experiments. Monitoring of the site over this period highlighted several performance deficiencies which needed to be addressed, particularly in the SE component. This presentation will discuss the issues identified, how they were investigated, solutions adopted to resolve the underlying problems, and the optimisation of the grid site's performance.
Speaker: Dr Jonathan Woithe (Adelaide University (AU)) -
162
Strengthening Vulnerability and SBOM Management in the CERN Container Registry
The CERN Container Registry is built on Harbor, a graduated CNCF project capable of managing a wide range of OCI artifacts. It serves use cases at CERN as well as workloads and services across WLCG, and acts as a central registry for Harbor instances running in other WLCG sites. Today, it hosts container images, Helm charts, machine-learning models, SBOMs, and numerous other artifact types. Beyond storage, versioning, and distribution, the service provides automated replication between registries, acts as a pull-through cache for external sources, and supports advanced quota and policy-driven tag-retention management. This session will give an overview of the various registry instances deployed across CERN, spanning both the general public network and the fully automated replication into the air-gapped Technical Network (TN), with particular emphasis on accelerator-sector use cases. We will then discuss how a centralized registry architecture enhances the tracking of vulnerabilities and Software Bills of Materials (SBOMs), supported by automatic scanning and on-publish generation of security metadata. Finally, we will present how runtime-level visibility was introduced to ensure timely responses to newly disclosed CVEs, even after initial artifact deployment. The talk will showcase the dashboards available in the service’s Security Hub portal and highlight the upstream contributions CERN has made to further strengthen Harbor’s capabilities.
Speakers: Jack Charlie Munday, Ricardo Rocha (CERN) -
163
Building a Hybrid Cloud HPC System
HPC services are increasingly constrained by fixed on-premises capacity, long procurement cycles, and data centre infrastructure limitations. At the University of Cambridge, these pressures are amplified by rapidly evolving AI workloads, where researchers benefit from access to diverse compute resources, both CPU and GPU, often on short timescales. This work presents our approach to extending the University’s on-premises CSD3 HPC service with an on-demand, burstable compute capability in AWS to meet the requirements of these users and UKAEA as one of our major collaborators. The cloud extension is designed as a premium option for researchers when local capacity is saturated, while maintaining a consistent operational model and user experience. Beyond capacity bursting, the same platform enables “test-before-buy” benchmarking, allowing research groups and service owners to evaluate application performance on candidate hardware configurations prior to committing to new on-premises procurements.
We describe the pilot architecture and operational model, including identity integration using existing LDAP-based credentials, POSIX-compatible storage access via FSx with S3-backed data staging, and a Slurm-based workflow aligned with established allocation and accounting practices. Using representative UKAEA workload examples, we outline early findings on portability, performance, and cost, and discuss lessons for operating a hybrid HPC service sustainably and securely.
Speakers: Deepak Aggrawal (University of Cambridge), Shaun de Witt
Track 8 - Analysis infrastructure, outreach and education: Machine Learning
164
On-Premises Machine Learning Challenge Framework for High Energy Physics
Machine learning challenges have proven to be powerful tools for collaboration, benchmarking and algorithmic innovation in scientific communities. Global platforms such as Kaggle enable researchers to publish datasets, submit solutions and compare performance through structured competitions. However, they assume that participants can use public datasets and external computing resources, which limits their applicability when data is internal, experimental tooling is specific, and access to resources is more controlled.
To address these constraints, we present an on-premises infrastructure for running ML challenges within CERN’s computing environment. Developed as part of the Next Generation Triggers (NGT) project, the platform builds on Codabench and integrates with the NGT computing stack to enable secure, reproducible and scalable execution of participant submissions. The system supports the full lifecycle of a competition, including dataset management, scoring functions, evaluation pipelines, resource orchestration and automated leaderboard management.
We will discuss the architecture, operational model and interaction with existing batch and accelerator resources, as well as the mechanisms for defining new competitions and managing submissions. Finally, we showcase initial challenges currently in use to illustrate the potential for internal benchmarking, community engagement and accelerated ML development in the HEP domain.
Speakers: Hannes Jakob Hansen, Paulo Guilherme Pinheiro Pereira (Universidade de Sao Paulo (USP) (BR)) -
165
INFN Hackathons: Five Years of Teaching AI for High Energy Physics and Beyond
We summarize five years of experience organizing educational Hackathons within the Italian research landscape of Artificial Intelligence (AI) and High Energy Physics (HEP). These events were part of the INFN AI and ML projects, which aimed to provision GPU and other hardware accelerators via an interactive JupyterLab-based platform providing an easy and highly customizable development environment. Over this period, we leveraged the platform to organize seven Hackathons sharing state-of-the-art AI methods and their applications to physics.
More than 300 participants - including PhD candidates, master students, early-stage and senior researchers - joined as students, while over 50 experts contributed as tutors and teachers. The core activities consisted of guided hands-on exercises, held during dedicated long sessions under expert supervision. Topics spanned multiple domains, including HEP, Medical Physics, Theoretical Physics, and Astrophysics, fostering interdisciplinary learning.
These Hackathons stood out for several reasons: a high tutor-to-student ratio, informal interactions with AI experts, and exposure to diverse applications and difficulty levels of AI in physics. Moreover, the exercises can be modulated according to each student's expertise, from machine learning beginner to advanced data scientist level. Combined with the innovative JupyterLab platform, deployed at Italian HPC centers (CNAF, ReCaS, or both), and the provisioning of a hardware-accelerator instance to each participant, the hackathons foster a unique environment that encourages participants to adopt and apply cutting-edge methodologies in their own research fields.
Speaker: Francesca Lizzi -
166
Open-Source Tools for Effective Machine Learning Education in HEP
The rapid growth of machine learning has left an overwhelming abundance of teaching resources in its wake that often makes it hard for students to know where to start, how to progress, or what sources to trust. Simultaneously, LLM-based coding assistants enable students to produce working models almost immediately — often before they understand the underlying principles or common pitfalls. With the rapid developments, instructors also frequently replicate efforts, recreating teaching materials with minor variations. To address this, we present a suite of open-source tools. The project comprises three complementary components. First, a curated ML education website that aggregates vetted resources and provides code examples, learning paths and teaching slides. Second, a GitHub repository serving as a bootstrap framework for student projects, offering reusable workflows, example models, and best practices adaptable to research contexts. Third, a Jupyter-based extension enabling instructors to monitor student progress and provide real-time feedback during teaching sessions or coursework. Together, these tools aim to lower the barrier to ML-driven research and teaching while ensuring that students build genuine understanding rather than superficial fluency.
Speaker: Liv Helen Vage (Princeton University (US)) -
167
Machine Learning Training Facility at Vanderbilt University
MLTF (Machine Learning Training Facility) comprises hardware and software deployed at Vanderbilt University with a focus on portability, reproducibility and ease of exploiting hardware features like RDMA. The software integrates MLflow as an end-to-end ML solution for its capabilities as a user-friendly job submission interface; as a custom-built tracking server for model and run details, arbitrary metrics logging, and system diagnostics logging; and as an inference server.
MLTF integrates the OpenID Connect (OIDC) protocol to facilitate federated, multi-institutional collaboration. Furthermore, by leveraging OIDC with Role-Based Access Control (RBAC) for system-level permissions and Identity-Based Access Control (IBAC) for user-level permissions, MLTF ensures experiments and metadata are strictly isolated in its custom-built tracking server, mitigating the native security flaws in MLflow's tracking server. This granular control extends to the data layer, where model outputs and training data are abstracted through an S3-compatible endpoint, ensuring that users interact with a secure, private storage environment regardless of the underlying infrastructure.
This presentation provides an overview of MLTF's technical architecture, focusing on key developments like the REST-based client/server submission infrastructure, a southbound API for runtime configuration of cluster-provided resources, and a runtime hook interface which allows site-specific customizations to be run during training jobs. We conclude with a discussion of recent user experiences and future development plans.
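As a hedged sketch of the kind of MLflow tracking calls a training job submitted to such a facility might issue (the tracking URI, experiment name, and metrics are placeholders, not MLTF values):

```python
import mlflow

mlflow.set_tracking_uri("https://mltf.example.edu")        # hypothetical tracking endpoint
mlflow.set_experiment("demo-classifier")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_params({"lr": 1e-3, "batch_size": 256, "epochs": 10})
    for epoch in range(10):
        train_loss = 1.0 / (epoch + 1)                     # stand-in for real training
        mlflow.log_metric("train_loss", train_loss, step=epoch)
    mlflow.log_metric("gpu_mem_gb", 11.2)                  # example system diagnostic
    # mlflow.pytorch.log_model(model, "model")             # register the trained model
```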
Speaker: Andrew Malone Melo (Vanderbilt University (US)) -
168
Scalable HTC-based Neural Network Training Workflow for Neural Simulation-Based Inference
The development of Neural Simulation-Based Inference (NSBI) algorithms requires training a large ensemble of neural networks, on the order of one thousand, which makes a serial single-node approach impractical. To address this, we are developing a scalable high-throughput training workflow built around Snakemake [1] and deployed on an HTCondor-based GPU facility. Each neural network training task is treated as an independent job within a well-defined directed acyclic graph, enabling efficient parallel execution while preserving reproducibility and fault tolerance.
The workflow is designed to automatically map job-level resource requirements, such as GPU and CPU requests, to the underlying cluster, allowing the system to fully utilize available hardware without manual intervention. All training runs are isolated and fully documented, with model weights, performance metrics, and resolved configuration files stored as explicit artifacts. This structure naturally supports a scatter-gather pattern, in which large numbers of models are trained independently and their outputs are later aggregated for downstream analysis. This approach provides a robust and reproducible foundation for large-scale neural network training campaigns critical for applications like NSBI. The workflow is being developed as part of the IRIS-HEP ecosystem of tools for NSBI analysis at the LHC [2].
[1] https://snakemake.github.io/
[2] https://github.com/iris-hep/NSBI-workflow-tutorial/tree/main
Speaker: Jay Ajitbhai Sandesara (University of Wisconsin Madison (US))
Track 9 - Analysis software and workflows
169
Enabling distributed analysis for ALICE in Run 3
Commissioned in 2022, the organised analysis system Hyperloop has been the primary platform for analysis within ALICE. The system was developed to meet the demands of the upgraded ALICE detector for Run 3, where the data-taking rate capability was increased by two orders of magnitude. To support analysis on such large datasets, the ALICE distributed computing infrastructure was revised and complemented by two key tools: the O$^2$ analysis framework, built on a multi-process architecture with a flat data format exchanged through shared memory, implemented in C++, and the Hyperloop train system for distributed analysis on the Grid and on dedicated analysis facilities, implemented in Java, React and PostgreSQL. 90% of all ALICE analyses use Hyperloop today. The system is operated 24/5 by four institutes spanning different time zones. In 2025, the system submitted over 130 million jobs, corresponding to 25 thousand CPU years and 1.9 EB of processed data. Efficient and reliable operation is crucial at this scale. This contribution summarises lessons learned from three years of Hyperloop operation and the effort that went into optimising CPU efficiency, storage locality and high job success rate. Finally, usage statistics and directions for the future will be presented.
Speaker: Nicolas Poffley (CERN) -
170
The Full Event Interpretation Algorithm at Belle II: Current Status and Developments
The Full Event Interpretation (FEI) algorithm is a central component of the Belle II analysis framework, designed for the efficient and flexible reconstruction of exclusive B-meson decays. It performs a hierarchical reconstruction of hadronic and semileptonic final states, using multivariate classification techniques to tag one of the two B mesons produced in electron–positron collisions. The FEI is particularly important for studies of signal-side decays involving missing energy. By reconstructing one B meson with the FEI, the kinematics of the other, signal-side B meson are tightly constrained, leading to improved background rejection.
In this contribution, we present the design, implementation, and recent developments of the FEI. We outline the reconstruction strategy, candidate selection, and multistage multivariate training based on gradient-boosted decision trees, together with the handling of training and inference in a distributed computing environment. Recent improvements to the FEI workflow, including improvements in the simulated data used for training, reconstruction of intermediate particles, and improved robustness of the multivariate inputs, are discussed in detail.
Beyond algorithmic updates, we also discuss newly developed techniques for the calibration and performance evaluation of the FEI. These include refined procedures to calibrate the efficiency, check the performance across decay channels and data-taking conditions, and application of reweighting techniques to diagnose and mitigate discrepancies between simulation and data. The impact of these methods on B tagging is demonstrated using representative Belle II Monte Carlo samples as well as real data.
Finally, we briefly discuss ongoing and future developments aimed at further improving reconstruction performance and computational efficiency.
Speaker: Dr Rahul Tiwary (Toshiko Yuasa Laboratory (TYL), KEK) -
171
Evaluating ROOT RNTuple for Physics Analysis Workflows in ATLAS
ATLAS has developed a ROOT RNTuple prototype within its Athena software, enabling read/write support for event data and in-file metadata. Using this implementation, ATLAS converted the publicly available Open Data, comprising multiple tens of terabytes of 2015–2016 proton–proton collisions and associated Monte Carlo samples, from ROOT TTree to RNTuple in the official DAOD PHYSLITE format. The conversion achieved roughly a 50% reduction in storage footprint while preserving full physics content and compatibility with the ATLAS Event Data Model.
ATLAS is now extending RNTuple support to lightweight analysis frameworks used outside Athena to evaluate its performance in realistic physics analysis workflows. The plan is to perform a full physics analysis on Open Data, systematically studying throughput, memory and disk usage, and overall usability. This presentation evaluates the performance and usability of ROOT RNTuple for ATLAS analysis workflows, highlighting key results from integration studies and discussing implications for Run 4 analysis software.
Speaker: Alaettin Serhan Mete (Argonne National Laboratory (US)) -
172
CMS Analysis Frameworks
One of the main challenges currently facing high energy particle physicists analyzing data from the Large Hadron Collider (LHC) at CERN is the unprecedented volume of both real data and simulated data that must be processed. This challenge is expected to intensify as the LHC enters its high luminosity phase, during which it is projected to deliver up to ten times more data than before. At the same time, calibrations and analysis techniques have become increasingly fine-grained and sophisticated, further increasing the complexity of data processing. These developments have driven the evolution of both data formats and analysis frameworks to accommodate growing complexity and scale. In response, the CMS Collaboration at CERN introduced the NanoAOD data format, designed to be a more compact and standardized format that meets the needs of a major fraction of the physics analyses. To keep pace with these advancements, analysis frameworks must now be capable of handling both the vast data volume and the complexity of the analyses effectively. Efficient and scalable analysis tools are therefore essential to ensure timely and accurate physics results in this evolving landscape. This talk will summarize a typical analysis workflow, and how it is integrated within analysis frameworks, highlighting their features.
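A typical columnar step in such a workflow, reading a few standard NanoAOD branches with uproot and awkward, might look like the following sketch (the file name is a placeholder and the selection is purely illustrative):

```python
import numpy as np
import uproot
import awkward as ak

events = uproot.open("nanoaod_sample.root")["Events"]             # hypothetical input file
muons = events.arrays(["Muon_pt", "Muon_eta", "Muon_tightId"])    # jagged (per-event) arrays

# Object-level selection, then an event-level requirement of at least two good muons
good = (muons["Muon_pt"] > 25.0) & (np.abs(muons["Muon_eta"]) < 2.4) & muons["Muon_tightId"]
n_good = ak.sum(good, axis=1)
selected_pt = muons["Muon_pt"][good][n_good >= 2]
print("selected events:", len(selected_pt))
```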
Speaker: Khawla Jaffel (National Institute of Chemical Physics and Biophysics (EE)) -
173
FCCAnalyses: A Core Component of the Emerging Analysis Ecosystem for FCC
The Future Circular Collider (FCC) project requires an analysis infrastructure capable of handling large simulated datasets while providing the flexibility needed for rapid detector optimization. We present FCCAnalyses, the flagship analysis framework for the FCC collaboration. Integrated within the Key4hep software stack, FCCAnalyses leverages ROOT’s RDataFrame to provide a declarative, high-performance interface for processing EDM4hep data.
This contribution introduces the architectural design of the framework, which combines the performance of C++ kernels with the agility of a Python-based user interface. We highlight the "Library of Analyzers" concept, which promotes code sharing and consistency across different physics working groups. A major focus of this presentation is the extension of FCCAnalyses' capabilities to support the detector performance studies required in the pre-TDR phase.
We describe the introduction of distributed computing capabilities into FCCAnalyses and its role as a key component within the developing Computing Model for the FCC. Given the growing number of FCC efforts, it is increasingly necessary to develop and adopt a diverse suite of software tools. We discuss the ongoing work to integrate these tools into a seamless software ecosystem, ensuring that FCCAnalyses becomes a highly interoperable and efficient part of the broader FCC computing landscape.
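To give a flavour of the declarative RDataFrame style that FCCAnalyses builds on, here is a generic PyROOT chain; the tree name, input file, and column names are placeholders rather than actual FCCAnalyses or EDM4hep definitions:

```python
import ROOT

# Hypothetical input: an "events" tree with a jagged float column "rp_energy"
df = ROOT.RDataFrame("events", "fcc_sample.root")
h = (
    df.Filter("rp_energy.size() > 0", "at least one reconstructed particle")
      .Define("leading_e", "ROOT::VecOps::Max(rp_energy)")
      .Histo1D(("leading_e", "Leading energy;E [GeV];events", 100, 0.0, 250.0), "leading_e")
)
print("entries:", h.GetEntries())   # triggers the lazy event loop
```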
Speaker: Juraj Smiesko (CERN)
Poster
Track 1 - Data and metadata organization, management and access: XRootD ecosystem and data access
174
Designing a Unified XRootD Monitoring Framework for WLCG and experiment Operations
In preparation for Run-4 and the HL-LHC era, WLCG has initiated the redesign of its XRootD monitoring to provide a coherent and scalable view of data-access activity across distributed sites and experiments. Developed in close collaboration with CMS, the new architecture aims to serve both WLCG-level needs for global observability (such as assessing traffic patterns and validating large-scale data challenges) and CMS operational needs for fine-grained visibility into data-access performance and failure modes. At present, only ALICE provides validated XRootD monitoring by using experiment-specific site-level components. While CERN could theoretically aggregate all data centrally, doing so would require operating and coordinating multiple collectors at scale. In practice, the centrally received streams from other experiments remain incomplete and unverified, underscoring that a central-only model is operationally complex and fragile. Assigning lightweight validation responsibilities to sites offers a more scalable and reliable approach. Building on a systematic assessment of existing deployments, the proposed framework introduces standardised message schemas, resilient transport and buffering semantics, and a clearer separation of responsibilities between site-level and central services. This contribution presents the design, early validation results, and the roadmap towards a unified WLCG–CMS XRootD monitoring infrastructure ready for HL-LHC operations.
Speakers: Borja Garrido Bear (CERN), Panos Paparrigopoulos (CERN) -
175
Enhancing Redirection in the CMS Data Federation
The XRootD redirector plays a key role in the CMS experiment's global data access infrastructure, determining where clients are sent to retrieve data across a heterogeneous, worldwide set of storage endpoints. The redirector has traditionally emphasised simplicity and performance; its decisions tend to be opaque and based on limited inputs. This can lead to erroneous redirections, such as sending clients to distant sites even when nearby replicas are available. Operators also lack sufficient observability to understand and diagnose redirector behaviour.
We present a set of enhancements to improve both the transparency and the effectiveness of redirector decisions for CMS. First, we introduce a new framework for redirector performance metrics, tracing and decision-making metadata. This instrumentation provides operators and clients with clear insights into how and why a particular redirection was chosen. Further, we investigate mechanisms to increase the reliability of redirections. Having made redirector decisions more reliable and reduced the need for client retries, we establish a foundation for incorporating more intelligent redirection logic that is both configurable and potentially pluggable. Examples include implementing custom plugins that use GeoIP information and other metrics to guide clients toward topologically favourable data sources.
Together, these improvements represent an important step in enhancing the robustness of CMS data access and analysis, and we hope to address long-standing pain points experienced by CMS physicists.
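As an illustration of the decision logic such a GeoIP-based plugin could apply (actual XRootD plugins are written in C++; the site list, coordinates, and database path below are made up), the client could simply be matched to the geographically closest known endpoint:

```python
import math
import geoip2.database   # pip install geoip2; requires a GeoLite2-City database file

SITES = {"T2_US_Wisconsin": (43.07, -89.40), "T1_DE_KIT": (49.10, 8.43)}   # made-up entries

def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def closest_site(client_ip, mmdb_path="GeoLite2-City.mmdb"):
    """Return the site nearest to the client's GeoIP location."""
    with geoip2.database.Reader(mmdb_path) as reader:
        loc = reader.city(client_ip).location
    return min(SITES, key=lambda s: haversine_km(loc.latitude, loc.longitude, *SITES[s]))
```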
Speaker: Rahul Chauhan (University of Wisconsin Madison (US)) -
176
Global Profiling of Open Science Data Federation Services using the Pelican Platform
Contemporary research relies heavily on computational resources and storage, with data sharing serving as a critical element. Data access remains a central challenge. The Open Science Data Federation (OSDF) project aims to establish a global scientific data distribution network by leveraging the Pelican Platform and the National Research Platform (NRP). OSDF is based on the XRootD and Pelican projects. Characterizing the performance boundaries of the Pelican Platform under various configurations, such as transfer rate limits, buffer settings, and geographic distribution, is essential. To this end, OSDF access was systematically profiled globally. The evaluation considered multiple file sizes, parallel data streams, and diverse client-server geographic relationships. This approach facilitates comprehensive monitoring of XRootD and Pelican performance across a range of scenarios. Results indicate that OSDF can deliver transfer speeds approaching the maximum capabilities of client systems.
Speaker: Fabio Andrijauskas (Univ. of California San Diego (US)) -
177
DataHarbor: A Secure Web Portal for Efficient Large-Scale Data Access via XRootD
DataHarbor is a modern web application designed to provide researchers with secure, intuitive access to large-scale data stored on distributed storage systems through the XRootD protocol. The system provides a web-based file browser that enables seamless directory navigation, metadata inspection, and on-demand file downloads. Files are streamed directly from XRootD storage to the user's browser using chunked transfer encoding with adaptive buffering, eliminating intermediate storage and enabling efficient and secure downloads of multi-gigabyte datasets over WAN connections.
The architecture implements comprehensive security at multiple layers using the Backend-For-Frontend (BFF) pattern. User authentication is managed through OpenID Connect (OIDC) with enterprise-grade security: all OAuth tokens are stored exclusively server-side, with sessions carried in Secure, SameSite cookies, preventing XSS and CSRF attacks. The frontend Vue.js application never accesses authentication tokens, while the Go backend handles all security-critical operations including session management, token validation, and automatic refresh token rotation.
A key technical contribution is the integration and extension of the native Go XRootD client from the go-hep.org/x/hep HEP project. While the original library provided basic XRootD functionality, we significantly extended it to support Zero-Trust Networking (ZTN) protocol authentication combined with TLS encryption for the native xrd:// protocol. This enhancement enables secure, token-based authentication on native XRootD connections using OAuth tokens from OIDC-compliant identity providers such as Keycloak, previously only possible via HTTP. The ZTN implementation validates tokens server-side through SciTokens, maps authenticated users to Unix credentials via the multiuser plugin, and ensures TLS-encrypted data transport while maintaining native protocol performance.
Speaker: Anar Manafov (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)) -
178
Chasing the Bottleneck: Optimizing XRootD data servers for Columnar Analysis workloads, from I/O Wait to CPU Load
The Notre Dame CMS XRootD storage element, originally designed to handle traditional CMSSW workloads, suffered heavy I/O-wait saturation when serving new data analysis workloads based on columnar analysis frameworks. These new workloads, using tools such as Uproot (to load data into structures such as Awkward Arrays), have fundamentally changed the I/O profile. This presentation starts by showing how this initial bottleneck was addressed with the help of Linux kernel-level tuning and a major revamp of the XRootD scheduler configuration, by changing its behavior from the default asynchronous mode (designed for a balance of interactive and batch jobs) to a more dynamic threaded model, better optimized for this new high-throughput, high-concurrency workload type.
While these optimizations successfully addressed the I/O wait bottleneck and dramatically improved performance, they brought a new kind of bottleneck to the table as transfer requests increased again: high CPU saturation.
The second part of this work presents the analysis of XRootD logs showing the correlation of this CPU load with the specifics of the new columnar analysis based workloads (a shift from sequential ofs_read operations to one dominated by a significant number of vectorized read (readV) requests). The analysis in this work shows a direct link between the number of readVs per TCP connection and server CPU load, identifying this as the new performance-limiting factor. We will outline our multi-stage tuning approach, discuss the analysis of these intensive bottlenecks, and show our capacity planning model to help deal with these intensive columnar analysis workloads.
Speaker: Kenyi Paolo Hurtado Anampa (University of Notre Dame (US)) -
179
Transfer Learning based Resource Usage Prediction in In-network Caching
The rapid growth of data volumes in high-energy physics (HEP) collaborations, such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), has necessitated the adoption of regional in-network caching strategies to mitigate data access latency. However, these caches often exhibit varying efficiencies across locations due to differing access patterns and storage policies. Improving resource utilization could significantly increase the performance of scientific computing infrastructure — yet exploring what-if scenarios for capacity planning has remained challenging.
This study investigates cache utilization patterns across three regional caches supporting the CMS experiment, situated in Southern California, Chicago, and Boston. We have developed two complementary prediction methodologies to forecast cache hit rates under hypothetical storage capacities: an LSTM-based model employing transfer learning, and a simpler analytical approach leveraging the footprint of active files for estimating cache hits. The transfer learning methodology utilizes observed modifications in storage capacity at the Southern California site to inform predictions for the Chicago and Boston caches, which have maintained their original capacities. A central contribution of this work is the application of these two distinct prediction techniques to cross-validate the results, thereby enhancing confidence in the what-if scenario analyses.
Our findings demonstrate that a two-fold increase in the storage capacity of the Chicago cache could potentially elevate its cache hit rates from 50% to 80%, significantly improving resource utilization. The integration of machine learning and analytical techniques presented herein offers a validated framework for optimizing cache efficiency, informing resource allocation, and guiding future cache deployments and resource management strategies within large-scale scientific collaborations.
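A toy version of the analytical what-if estimate, replaying a synthetic access trace through an LRU cache of hypothetical capacity, is sketched below; the trace, file sizes, and capacities are invented and not CMS monitoring data:

```python
from collections import OrderedDict
import random

def lru_hit_rate(trace, capacity_bytes):
    """Replay (file_name, size) accesses through an LRU cache and report the hit rate."""
    cache, used, hits = OrderedDict(), 0, 0
    for name, size in trace:
        if name in cache:
            hits += 1
            cache.move_to_end(name)
            continue
        cache[name] = size
        used += size
        while used > capacity_bytes:              # evict least-recently-used files
            _, evicted = cache.popitem(last=False)
            used -= evicted
    return hits / len(trace)

random.seed(0)
files = [(f"file{i}", random.randint(1, 8) * 10**9) for i in range(2_000)]   # 1-8 GB files
trace = [random.choice(files) for _ in range(20_000)]
for tb in (50, 100, 200):
    print(f"{tb} TB -> hit rate {lru_hit_rate(trace, tb * 10**12):.2f}")
```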
Speaker: Chin Guok (ESnet)
-
174
-
Track 2 - Online and real-time computing
-
180
Workflow setup and configuration for the ALICE online processing
The ALICE experiment at CERN continuously reads out and records data at interaction rates of up to 50 kHz of Pb-Pb collisions. Online processing and reconstruction play a vital role for handling the enormous amounts of data, compressing about 3.5 TB/s of detector raw data down to 160 GB/s of compressed input data for offline reconstruction. The online processing is performed on dedicated Event Processing Nodes (EPNs). As the processing software is in continuous development, regular updates of the software on the EPNs are desirable but, at the same time, they need to be independent of ongoing online operations. In addition, the software needs to be validated on a decoupled production-like setup before being deployed on the production farm and the online workflow should be easily reproducible on a standalone server or laptop. Therefore, the topology of the global workflow graph and each individual process can be configured with a variety of common and unique options. We present how the topology is configured, taking into account the run parameters of the current data taking, input requirements of downstream components in the graph and configuration parameters for all individual tasks. In particular, we allow simple application of parameter overrides at run startup for tests as well as for mitigation of immediate issues during the data taking. The benefits of such an adaptable system will also be demonstrated by real-world examples.
Speaker: Ernst Hellbär (CERN) -
181
Towards Collaborative, Web-Native Monitoring Dashboards for Online and Archived Histograms in ATLAS using Grafana and JSROOT
In LHC Run 3, several hundred thousand histograms are continuously updated during data taking and used by automated algorithms for data quality assessment. A subset of these histograms is also presented to experts. The current online histogram display, based on a standalone C++ application using ROOT and Qt, provides reliable functionality but offers limited integration with modern web technologies, making it less convenient for use within the globally distributed ATLAS collaboration.
To address these limitations, for the ATLAS Phase-II Upgrade we are introducing a new monitoring visualization framework based on Grafana dashboards. We have developed two custom Grafana plugins to provide seamless access to ATLAS monitoring data: a datasource plugin that retrieves histograms from the online Information Service (IS) during data taking and from the Monitoring Data Archive (MDA) for retrospective analysis, and a panel plugin that renders histograms using JSROOT, enabling interactive ROOT-style visualization directly in the web browser. The panel plugin supports editing and persistent storage of visualization parameters within the dashboard configuration, which are automatically reapplied when histograms are refreshed, ensuring consistent display behavior across sessions.
This new approach provides a unified, fully web-based monitoring environment that offers access to both online and historical monitoring data and supports seamless integration with operational and performance metrics. It simplifies deployment and maintenance by removing the need for standalone client applications and aligns ATLAS with widely adopted open-source observability tools. In this contribution, we present the system architecture, implementation details, performance characteristics, and plans for full deployment during the Phase-II Upgrade.
Speaker: ATLAS TDAQ collaboration -
182
Hydra: A scalable, open-source vision system for continual data quality monitoring
Maintaining high data quality in modern Nuclear and High Energy Physics experiments increasingly requires scalable, automated solutions as data rates and detector complexity continue to grow. Traditionally, humans monitored data quality with varying skill sets and expertise, while any automation was typically overly bespoke, covering only specific detector systems or processes. These human-driven methods do not scale well as experimentation scales. To solve this, Jefferson Lab has developed Hydra, a scalable open-source framework for training and managing Artificial Intelligence (AI) models for near real-time monitoring. Hydra enables the training, validation, and management of vision models that operate directly on detector monitoring images, providing consistent and scalable assessment of data quality across all of Jefferson Lab's experimental halls. Hydra is integrated into a web-based interface built using a React front end and Flask backend and is deployed lab-wide to support continuous experimental operations. This allows shift crews and experts to rapidly interpret model outputs, validate findings, and focus attention on anomalous conditions rather than routine inspection. Hydra enhances operational efficiency, improves consistency in data quality assessment, and provides quantitative insight into detector and human performance. This talk will highlight the software, scalability, and proven reliability in 24/7 operations, along with its extensibility toward future vision and multi-modal based monitoring applications.
Speaker: Thomas Britton -
183
ClickHouse-Based ATLAS Operational Monitoring Data Archiving Service Prototype
Since the beginning of LHC Run 2, the Trigger and Data Acquisition system of the ATLAS experiment at the Large Hadron Collider (LHC) at CERN has provided an operational monitoring data archiving service used by thousands of online clients. During data-taking periods, this system publishes various operational monitoring data to continuously monitor the status of hardware and software components at an average update rate of 600 kHz.
The service implementation used during LHC Runs 2 and 3 was based on a custom-built time-series database. However, it was decided to explore the possibility of replacing this system with a more recent, solid time-series database technology. After evaluating several options, the ClickHouse database was selected for the prototype.
This paper details the testing of ClickHouse using a subset of ATLAS operational monitoring data archived during LHC Run 3. It includes performance tests conducted on the testbed and compares the results with the current system.
Speaker: ATLAS TDAQ collaboration -
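For illustration only, the snippet below sketches how time-series operational monitoring points could be stored and queried with ClickHouse from Python using the clickhouse-connect client; the table layout, column names, and query are assumptions and not the ATLAS prototype schema.

```python
from datetime import datetime
import clickhouse_connect   # pip install clickhouse-connect

client = clickhouse_connect.get_client(host="localhost")

# Hypothetical table for operational monitoring points (not the ATLAS schema).
client.command("""
CREATE TABLE IF NOT EXISTS opmon (
    ts        DateTime64(3),
    partition String,
    attribute String,
    value     Float64
) ENGINE = MergeTree
ORDER BY (partition, attribute, ts)
""")

# Insert one dummy monitoring point.
client.insert(
    "opmon",
    [[datetime(2026, 5, 1, 12, 0, 0), "DAQ", "buffer_occupancy", 0.42]],
    column_names=["ts", "partition", "attribute", "value"],
)

# Typical retrospective query: average value per attribute over the last hour.
result = client.query(
    "SELECT attribute, avg(value) FROM opmon "
    "WHERE ts > now() - INTERVAL 1 HOUR GROUP BY attribute"
)
print(result.result_rows)
```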
184
Optimising Memory Usage in the LHCb Online Mover with a Rust-Based Architecture
The LHCb Online Mover is a critical component of the LHCb online computing stack, responsible for streaming data accepted by the High Level Trigger 2 (HLT2) from online storage to long-term offline infrastructure. During data-taking, data is produced at sustained rates of up to 20 GB/s, with bursts reaching 50 GB/s. For efficient long-term storage, the data must be compressed and packed into files. Sustaining these rates without interfering with trigger operations imposes strict CPU and memory constraints. Approximately 120 PB of data is processed per year and subsequently exported to EOS and registered in DIRAC for offline processing and analysis.
Zstd compression dominates CPU load, while DIRAC requires files of 5–10 GB. These files can only be transferred via XRootD at approximately 200 MB/s. To meet the overall throughput requirements, several such files must therefore be transferred in parallel, requiring temporary buffering while new files are produced concurrently. SSDs cannot withstand the required write endurance and HDDs are too slow, making RAM the only viable buffering medium. Minimising the memory footprint is therefore essential for scalable operation.
In this work, we present a newly redesigned Online Mover architecture based on Rust. The new design unifies data handling and compression pipelines, employs efficient multithreading, and introduces asynchronous I/O for network transfers. Several design variants were benchmarked in terms of throughput and memory usage. The optimal solution achieved a 90 % reduction in RAM consumption while increasing total throughput by 15 %, enabling a more scalable and cost-efficient Online Mover service.
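To make the buffering constraint concrete, the short calculation below uses only the figures quoted above (20 GB/s sustained, roughly 200 MB/s per XRootD transfer, files of up to 10 GB); it is a rough back-of-the-envelope estimate, not an accounting taken from the contribution.

```python
# Back-of-the-envelope buffering estimate from the figures quoted above.
sustained_rate = 20e9        # bytes/s produced by HLT2 (sustained)
per_file_rate  = 200e6       # bytes/s per XRootD transfer
file_size      = 10e9        # bytes, upper end of the 5-10 GB range

parallel_transfers = sustained_rate / per_file_rate      # ~100 concurrent streams
transfer_time      = file_size / per_file_rate           # ~50 s per file
buffered_volume    = parallel_transfers * file_size      # ~1 TB in flight

print(f"{parallel_transfers:.0f} parallel transfers, "
      f"{transfer_time:.0f} s per file, "
      f"~{buffered_volume / 1e12:.1f} TB buffered at any given time")
```

With on the order of a terabyte of data in flight at all times, the choice of RAM as the buffering medium and the emphasis on minimising the memory footprint follow directly.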
Speaker: Robert Laszlo Gulyas (CERN) -
185
Reinforcement Learning-Based Control of Polarized Target and Goniometer Systems at Jefferson Lab
Jefferson Lab is developing autonomous control systems for polarized cryogenic targets and linearly polarized photon beams, enabling stable, high-performance operation over extended experiment run periods. Historically, maintaining optimal polarization of these critical systems required manual tuning by expert operators. This process is sensitive to experience and prone to human error, and keeps operators focused on low-level system adjustments rather than high-level oversight. Within the AI-Optimized Polarization (AIOP) project, these control systems leverage uncertainty-aware surrogate models and high-fidelity simulation environments to enable reinforcement learning agents to optimize control policies while respecting experimental and operational constraints. For cryogenic targets, surrogate models trained on historical data predict the polarization as a function of microwave frequency, accumulated radiation dose, and electron beam current. For photon beams produced with diamond radiators, simulation environments model the dependence of the photon spectrum and polarization on beam and radiator settings, providing testbeds for optimization strategies. We will present the surrogate- and simulation-based environments used for training, show the improved performance for polarized cryotargets, implementation strategies for both use cases, and a road map toward operations in which autonomous systems control routine shift operations while humans focus on oversight, safety, and scientific objectives.
Speaker: Torri Jeske
-
180
-
Track 3 - Offline data processing: Core Software and Frameworks 1
-
186
Evaluating Error-Bounded Lossy Compression on HEP Data with LossBench
The High-Luminosity Large Hadron Collider (HL-LHC) is expected to produce data at the exabyte scale, motivating the exploration of new methods for reducing data volumes. Error-bounded lossy compression has been adopted in many scientific domains as an effective strategy for reducing storage and I/O costs without compromising the quality of downstream analyses.
However, selecting an appropriate compressor for a dataset is not straightforward. Error-bounded lossy compression encompasses a diverse set of techniques. The performance of these techniques may depend both on the statistical properties of the target data and on tunable parameters.
We present LossBench, a command-line utility for benchmarking error-bounded lossy compressors, with support for ROOT TTrees. LossBench takes a compressor configuration and benchmarks it on data from a specified ROOT file, writing results in the JSONL format. Recorded metrics include compression ratio, compression and decompression throughput, and several common distortion measures. Optionally, decompressed data can be written to a new TTree to assess the impact of compression on physics analyses. LossBench's framework enables the integration of new compressors and metrics to allow rapid experimentation with novel/custom compression techniques.
We demonstrate the use of LossBench through a case study applying SZ3, a state-of-the-art error-bounded lossy compressor, to ATLAS open data. Across a range of SZ3 configurations, we observe that certain parameter choices achieve higher compression ratios than lossless methods while maintaining low distortion. These results offer insight into which lossy compression strategies may be appropriate for HEP datasets and highlight key considerations when evaluating lossy compression for particle physics.
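A small sketch of how benchmark results in the JSONL format described above could be consumed is given below; the record field names are hypothetical, since the actual LossBench output schema is not spelled out here.

```python
import json

def best_configs(jsonl_path, max_nrmse=1e-3):
    """Read hypothetical LossBench records (one JSON object per line) and keep
    the configurations whose distortion stays below a tolerance, sorted by
    compression ratio."""
    kept = []
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            if record["nrmse"] <= max_nrmse:    # field names are assumptions
                kept.append(record)
    return sorted(kept, key=lambda r: r["compression_ratio"], reverse=True)

for record in best_configs("lossbench_results.jsonl")[:5]:
    print(record["compressor"], record["error_bound"], record["compression_ratio"])
```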
Speaker: Amy Byrnes -
187
Exploring lossy storage for analysis data with ROOT's RNTuple
With the data deluge that is expected to come with the High-Luminosity LHC and limited storage resources, the need to reduce the on-disk file size of High-Energy Physics (HEP) data becomes even more pressing. Lossless compression algorithms and encodings are already extensively used across all experiments' data tiers, leading to often significant reductions of the total on-disk data volume for the collaboration. However, the aforementioned future storage challenges naturally lead to the question of whether more could be done. One potential next step to reduce data volumes even further is the use of lossy encoding schemes to store physics analysis data. The challenge with this approach, however, is the inherent loss in precision and the (perceived) lack of predictability of its effects. In this contribution, we explore the impact of lossy compression on HEP data stored in ROOT's new RNTuple data format, which offers fine-grained mechanisms for low-precision data storage. We do this by evaluating different lossy encodings applied to a selection of particle quantities, and mapping out their effects on an open-data-based analysis. With this evaluation, we aim to help the community in making informed decisions on the use of lossy compression for their use case.
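As a standalone illustration of one common low-precision encoding (not the RNTuple API itself), the NumPy snippet below truncates float32 mantissa bits, the kind of transformation whose analysis-level impact such a study maps out.

```python
import numpy as np

def truncate_mantissa(values, keep_bits):
    """Zero out the lowest (23 - keep_bits) mantissa bits of float32 values,
    emulating a reduced-precision on-disk encoding."""
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    drop = 23 - keep_bits
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

pt = np.array([10.237, 54.981, 123.456], dtype=np.float32)
print(truncate_mantissa(pt, keep_bits=10))   # coarser, more compressible values
```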
Speaker: Florine Willemijn de Geus (CERN/University of Twente (NL)) -
188
Efficient Data Layouts and heterogeneous data handling in CMSSW
The High-Luminosity LHC will vastly increase both the volume and complexity of data to be processed within the CMS software framework (CMSSW), pushing computational throughput to its limits. Efficient use of accelerator hardware, especially GPUs, will be central to sustaining reconstruction and analysis performance under these conditions. Among the most impactful design choices for GPU-accelerated workloads is data layout, as memory-access patterns strongly influence the achievable level of coalesced reads and overall hardware utilization. Structure-of-Arrays (SoA) layouts naturally align with these requirements thanks to their contiguous, field-wise organization.
In this work, we present a generic and extensible SoA backend based on the Boost Preprocessor library, enabling highly portable and strongly typed data representations. The new system introduces MultiView, a mechanism that groups multiple SoA collections with identical schemas into a single logical entity. This abstraction removes the need for costly data reshaping, streamlines inter-module communication, and simplifies the design of downstream algorithms. A key outcome of this design is seamless interoperability with ML frameworks like PyTorch and SOFIE: SoA structures can be directly exposed as Tensors without transformation or memory copies, enabling fast heterogeneous inference workflows where machine-learning models operate natively on CMSSW event data.
Beyond in-memory layout optimization, we also investigate integration of NVIDIA GPUDirect Storage (GDS) to establish a direct, high-bandwidth I/O path between GPU memory and local or remote storage. By relieving the CPU of data-movement responsibilities, GDS has the potential to reduce latency and improve performance in I/O-bound workflows, an increasingly relevant challenge as CMS moves toward HL-LHC data rates.
Bibliography:
[1] M. Holzer, L. Beltrame, A. Bocci, F. Pantaleo, and S. Balducci, "User Story: Integration of ROOT RNTuple to CMSSW's SoA data structures," Nov. 2025.
[2] L. Beltrame, F. Pantaleo, A. Bocci, and E. Cano, "Evolution of Data Structures for Heterogeneous Reconstruction in CMSSW," 2025. doi: 10.17181/kd13h-42e08.
Speaker: Felice Pantaleo (CERN) -
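To illustrate the zero-copy interoperability idea in isolation, the snippet below uses plain NumPy and PyTorch rather than the CMSSW SoA backend or the MultiView mechanism; the field names and sizes are illustrative only.

```python
import numpy as np
import torch

n = 1024
# Structure-of-Arrays layout: one contiguous buffer per field, instead of an
# array of structs interleaving pt/eta/phi in memory.
soa = {
    "pt":  np.random.rand(n).astype(np.float32),
    "eta": np.random.rand(n).astype(np.float32),
    "phi": np.random.rand(n).astype(np.float32),
}

# torch.from_numpy wraps a contiguous column without copying: the tensor shares
# the underlying memory, so an ML model can consume the data with no reshaping.
pt_tensor = torch.from_numpy(soa["pt"])
soa["pt"][0] = 99.0
assert pt_tensor[0].item() == 99.0   # the change is visible through the tensor
```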
189
Modernizing the ATLAS Persistency Framework for the HL-LHC
The ATLAS experiment has surpassed 1 exabyte of stored data, much of it managed through the Athena POOL Replacement (APR) persistency framework. Derived from the original LCG POOL project, APR has long provided a technology-independent abstraction layer that enabled seamless support for multiple backends, including ROOT TTree, TKey, and more recently RNTuple. While APR has proven remarkably durable, its architecture reflects early-2000s design patterns, such as deep layering and outdated C++ constructs. Furthermore, because it originated as an external third-party software package, it provided only limited integration with the Gaudi/Athena model. These factors, along with redundant and unused components, make maintenance and new development efforts increasingly difficult. ATLAS is now exploring a modernized core I/O framework that preserves APR’s core strength, backend independence, while achieving closer integration with Athena and leveraging modern C++ features. The redesigned architecture focuses on maintainability, performance, and extensibility for future backends, building on lessons from earlier, highly flexible designs that served well during the initial few-PB data-taking phase but whose level of generality is no longer necessary now that years of operational experience have clarified the experiment’s precise needs at Exabyte scale. This presentation outlines the proposed design direction and highlights preliminary results from early prototyping studies toward a next-generation ATLAS persistency layer.
Speaker: Marcin Nowak (Brookhaven National Laboratory (US)) -
190
Lossy data compression for simulated CMS pileup datasets
The High-Luminosity upgrade of the LHC (HL-LHC) will present an unprecedented computational challenge for the CMS experiment, with the average number of simultaneous proton-proton interactions (pileup) expected to reach 200 per bunch crossing. Accurately modeling this background environment requires the production of massive, high-fidelity simulated event datasets. Currently, CMS employs a premixing strategy, where libraries of pileup events are simulated, digitized and stored for later reuse. However, the large volume of these datasets is becoming a significant storage bottleneck.
In this work we present a high-performance lossy data compression method designed to reduce the storage footprint of simulated CMS pileup datasets, with an initial focus on pixel detector data. The core of this approach utilizes Vector Quantization (VQ) techniques to map high-dimensional pixel hit information into compact codebook indices. We explore several VQ methods and discuss the trade-off between compression ratios, reconstruction fidelity and performance. Our results demonstrate that this method achieves substantial data compression rates while maintaining control over the reconstruction error. Finally, we discuss the integration of this compression layer into the CMSSW framework and its potential to alleviate storage pressures during the HL-LHC era.
Speaker: Tomas Raila (Vilnius University (LT)) -
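The toy vector-quantization sketch below (NumPy and scikit-learn k-means on made-up hit features) shows how high-dimensional hit vectors map to compact codebook indices and back; the VQ variants and pixel-hit representation studied in the contribution are more elaborate.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hits = rng.normal(size=(10_000, 8)).astype(np.float32)   # toy pixel-hit features

# Build a 256-entry codebook: each hit is then stored as a single uint8 index.
codebook = KMeans(n_clusters=256, random_state=0).fit(hits)
indices = codebook.predict(hits).astype(np.uint8)         # compressed form
reconstructed = codebook.cluster_centers_[indices]        # lossy decode

mse = float(np.mean((hits - reconstructed) ** 2))
ratio = hits.nbytes / indices.nbytes                      # 8 float32 -> 1 byte
print(f"compression ratio ~{ratio:.0f}x, reconstruction MSE {mse:.3f}")
```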
191
Usage of GPUs for ALICE Run 3 Offline Reconstruction in the GRID
ALICE is the dedicated heavy-ion experiment at the LHC at CERN, recording lead-lead collisions at interaction rates of up to 50 kHz.
ALICE was the first LHC experiment to leverage GPUs for online data processing in LHC Runs 1 and 2, and its Run 3 online data processing scheme today is fully based on GPUs with more than 90% of the compute load offloaded to the accelerator.
In order to use its online processing server farm also for offline processing in an efficient way while the LHC is not operating, ALICE has been running the offline TPC tracking on GPUs since 2023.
Since then, ALICE has been conducting an ongoing effort to offload more offline compute steps to GPUs, and to use the GPUs at other GRID sites besides the ALICE online computing farm for offline reconstruction.
The talk will give an overview of the current status and the commissioning of GPUs for offline processing and outline the future plans.
This includes in particular running GRID jobs on the NVIDIA GPUs of the NERSC Perlmutter cluster, the first time an LHC experiment has used GRID GPUs for offline reconstruction.
The performance, as well as GPU, CPU, and memory utilization, will be shown when offloading further steps beyond TPC tracking to the GPU, in particular ITS GPU tracking.
Speaker: David Rohr (CERN)
-
186
-
Track 3 - Offline data processing: Reconstruction 2
-
192
Optimal use of timing measurement in vertex reconstruction at CMS
The upgrade of the CMS apparatus for the HL-LHC will provide unprecedented timing measurement capabilities, in particular for charged particles through the MIP Timing Detector (MTD). One of the main goals of this upgrade is to compensate for the deterioration of primary vertex reconstruction induced by the increased pileup of proton-proton collisions by separating clusters of tracks not only in space but also in time.
This contribution discusses the latest algorithmic developments to optimally exploit such new information. Modern machine-learning-based techniques are explored as a possible alternative to traditional approaches: graph neural network architectures are studied to simultaneously cluster particles and assign them the correct mass hypotheses, which is needed to correctly determine the time at the vertex.
Speaker: Prabhat Solanki (Universita & INFN Pisa (IT)) -
193
Minimising Event Size, Maximising Physics: Inclusive Particle Isolation for LHCb's Run 3
To achieve higher physics precision, the LHCb experiment is operating at an increased instantaneous luminosity in Run 3, leading to an unprecedented challenge in total data volume. A single proton-proton collision generates hundreds of tracks, yet the target signals involve only a few; this imbalance severely inflates the event data size. To efficiently reduce the event size while retaining the physics information required for targeted analyses, a suite of inclusive isolation tools has been developed, featuring both the classical methods and a novel Inclusive Multivariate Isolation (IMI) algorithm. The IMI tool is designed to robustly distinguish the signal tracks from high pile-up background tracks, adapting the strengths of traditional isolation techniques while handling the diverse topologies and kinematics across various decay chains. These tools are currently deployed within the Run 3 LHCb software framework for both the High-Level Trigger and the offline reconstruction chain. By achieving a 45% reduction in total data size, the IMI tool preserves full physics performance, demonstrating a high selection efficiency of over 99% for target signal particles. Crucially, its robustness and stability have been validated under real data-taking conditions. Looking forward, the IMI methodology shows great potential as a fast, lightweight approach to support more compute-intensive selection strategies in the high-multiplicity environment of the High-Luminosity LHC.
(This work is based on a paper submitted to the journal Computing and Software for Big Science (CSBS).)
Speaker: Ching-Hua Li (Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France) -
194
Towards more precise data analysis with Machine-Learning-based particle identification with missing data
Identifying products of ultrarelativistic collisions delivered by the LHC and RHIC colliders is one of the crucial objectives of experiments such as ALICE and STAR, which are specifically designed for this task. They allow for a precise Particle Identification (PID) over a broad momentum range.
Traditionally, PID methods rely on hand-crafted selections, which compare the recorded signal of a given particle to the expected value for a given particle species (e.g., for the Time Projection Chamber detector, the number of standard deviations in the dE/dx distribution, the so-called "nσ" method). To improve the performance, novel approaches use Machine Learning models that learn the proper assignment in a classification task.
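In the nσ method, the discriminant for a species hypothesis $i$ is simply the deviation of the measured signal from its expectation, in units of the detector resolution, $n_\sigma^{i} = (S_{\mathrm{meas}} - \langle S \rangle_i)/\sigma_{S,i}$, and candidates are selected by cutting on this quantity.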
However, because of the various detection techniques used by different subdetectors (energy loss, time-of-flight, Cherenkov radiation, etc.), as well as the limited detector efficiency and acceptance, particles do not always yield signals in all subdetectors. This results in experimental data which include "missing values". Out-of-the-box ML solutions cannot be trained with such examples without either modifying the training dataset or re-designing the model architecture. Standard approaches to this problem used, e.g., in image processing involve value imputation or deletion, which may alter the experimental data sample.
In the presented work, we propose a novel and advanced method for PID that addresses the problem of missing data and can be trained with all of the available data examples, including incomplete ones, without any assumptions about their values [1,2]. The solution is based on components used in Natural Language Processing Tools and is inspired by AMI-Net, an ML approach proposed for medical diagnosis with missing data in patient records.
The ALICE experiment was used as an R&D and testing environment; however, the proposed solution is general enough for other experiments with good PID capabilities (such as STAR at RHIC and others). Our approach improves the F1 score, a balanced measure of the PID purity and efficiency of the selected sample, for all investigated particle species (pions, kaons, protons).
[1] M. Kasak, K. Deja, M. Karwowska, M. Jakubowska, Ł. Graczykowski & M. Janik, Eur.Phys.J.C 84 (2024) 7, 691
[2] M. Karwowska, Ł. Graczykowski, K. Deja, M. Kasak, and M. Janik, JINST 19 (2024) 07, C07013
Speaker: Lukasz Graczykowski (Warsaw University of Technology (PL)) -
195
Machine Learning–Based Offline Search for Long-Lived Particles in the LHCb Muon System
Long-lived particles (LLPs) are present in many Standard Model extensions and could provide solutions to long-standing problems in modern physics. In this work, machine-learning based techniques are developed to probe for the presence of such particles, specifically Heavy Neutral Leptons (HNLs) and Axion-Like Particles (ALPs), decaying in the LHCb muon detector. Their decays will produce electromagnetic or hadronic showers, which can be reconstructed by effectively turning the muon system into a sampling calorimeter.
The algorithms are designed for offline analysis and make use of offline saved events in which all raw detector hits are stored. To overcome discrepancies between simulation and reality, a hybrid data strategy is employed, combining real-data and simulation datasets for training purposes. The approach integrates these machine-learning methods with standard techniques, such as hit clustering, for efficient offline processing of large datasets. This enables the use of raw information from the detector while maintaining computational efficiency.
Speaker: Valerii Kholoimov (EPFL - Ecole Polytechnique Federale Lausanne (CH)) -
196
Intelligent Primary Vertex Reconstruction in ATLAS Using Deep Neural Networks
The non-linear time scaling of traditional primary vertex reconstruction algorithms with increasing pile-up presents a major challenge for future high-luminosity operations, particularly in the HL-LHC era where hundreds of simultaneous proton-proton collisions are expected per event. This motivates the development of fast, scalable algorithms whose performance remains robust as pile-up grows. We present a deep-learning-based primary vertex finding approach that leverages reconstructed track parameters to directly predict vertex positions and track-to-vertex associations. Our architecture combines convolutional neural networks (CNNs) with graph neural networks (GNNs) in a sequential hybrid design: the CNN extracts high-level features from track parameter distributions to estimate primary vertex locations, while the GNN captures geometric relationships among tracks to determine track-to-vertex associations. We will demonstrate the algorithm's performance using ATLAS Run-3 data and evaluate its scalability and reconstruction accuracy under HL-LHC conditions. These results highlight the potential of deep learning to provide a high-throughput, non-scaling alternative to classical primary vertex reconstruction techniques.
Speaker: Rocky Bala Garg (Stanford University (US)) -
197
Exploiting precise timing information for improving the event reconstruction at the CMS experiment and at future colliders
The extreme pileup conditions expected at the High-Luminosity LHC (HL-LHC) require new technologies to cope with the higher occupancy. One of the strategies adopted to address this challenge is the usage of precise timing information in event reconstruction. The CMS experiment will introduce two new subdetectors with timing capabilities: the MIP Timing Detector (MTD), covering both barrel and endcap regions, and the High-Granularity Calorimeter (HGCAL) in the endcaps only. Together they will provide a resolution of the order of tens of picoseconds, enabling the reconstruction of events in 5 dimensions (x, y, z, E, t).
Time information is being integrated into TICL, the framework developed for the HGCAL reconstruction and integrated in the CMS software. In HGCAL, each sensor with an energy deposit above a certain threshold will have a time measurement, which is used to compute the time of the 2D and 3D clusters. MTD instead assigns a time to the tracks of charged candidates. During the linking between energy deposits in the calorimeters and tracks from the tracker, the times from HGCAL and MTD are required to be compatible within a threshold to allow the linking. In the end, the time of the final charged candidates is obtained by combining the HGCAL and MTD times, improving the candidate time resolution.
Time information is also being exploited at future colliders, where detectors will feature trackers with timing capabilities at each layer and HGCAL-like detectors with time information for calorimeter hits as well. This opens the path towards 4D clustering, where spatial and temporal hit information are jointly used for pattern recognition, and towards particle-flow algorithms that are time-aware.
This contribution will present the current usage of timing in high energy physics event reconstruction, highlighting the algorithmic strategies enabled by HGCAL and MTD, and outline the evolution toward 4D event reconstruction at CMS and at future colliders.
Speaker: Aurora Perego (Universita & INFN, Milano-Bicocca (IT))
-
192
-
Track 4 - Distributed computing
-
198
Dr.Sai: A Pioneering LLM-Based Autonomous Agent for Physics Analysis at BESIII
We present Dr.Sai, a large language model (LLM)-powered multi-agent system designed to autonomously execute physics analysis at the BESIII experiment. It interprets a physicist's natural language request, decomposes it into tasks (e.g., data skimming, fitting), calls the appropriate scientific tools, and executes the workflow end-to-end. A demonstration will show Dr.Sai completing multiple simple analysis chains from a simple query, achieving an overall success rate of over 90%. This system marks a move toward scalable, AI-augmented discovery. Dr.Sai enhances productivity by automating complex workflows while ensuring full traceability, offering a practical blueprint for an "AI Scientist" assistant applicable to data-intensive sciences.
Speaker: Zhengde Zhang (中国科学院高能物理研究所) -
199
Artificial Intelligence for Operations at CMS
A2rchi (AI Augmented Research Chat Intelligence) is an open-source, end-to-end framework for building AI agents to automate research and operational workflows. Various groups have already applied the system to their use case; the most advanced is the Computing Operations (CompOps) team at the Compact Muon Solenoid (CMS) experiment at CERN. CompOps has a private, constantly evolving, and scattered knowledge base, with scarce personnel on short term contracts. A2rchi puts together state-of-the-art, open-source tools like LangChain, knowledge graphs, and Model Context Protocol, and combines documentation, code, tickets, and live diagnostics to accurately retrieve relevant information, assisting operators in daily tasks, improving operator efficiency, and lessening the load on experts. Further work is being undertaken to develop fully autonomous agents to perform non-trivial operations, which is reliant on highly accurate retrieval and expert knowledge. Other groups at CMS deploying A2rchi for their use case include the Data Quality Monitoring (DQM) team and a group focusing on retrieval of the vast analysis code and documentation across the CMS landscape.
Speaker: Pietro Lugato (Massachusetts Inst. of Technology (US)) -
200
An On-Grid deployment of ML Inference as a service at a Tier-2
Recent developments demonstrate that HEP software can run effectively on GPUs, while advances in ML models have shown predictable scaling laws for compute, data, and model size, consistent with trends across the wider AI community. As a result, there is growing demand within HEP for inference using larger models that have already delivered significant physics gains, such as b-tagging in ATLAS with the GN2 transformer-based neural network.
At present, ML inference in HEP is largely performed on CPUs using translation libraries such as ONNX. However, a sharp rise in RAM costs, driven by supply constraints and strong demand for HBM2 high-bandwidth memory, makes it increasingly unlikely that WLCG sites will move far beyond the 2 GB per-job memory limit. In response, both the ATLAS and CMS collaborations have proposed inference-as-a-service solutions to simplify model deployment while addressing memory constraints and rapidly growing model sizes.
One possible implementation is an on-Grid inference-as-a-service deployment that uses site-local GPUs with the NVIDIA Triton inference server and standard Grid tools, including ARC-CE, HTCondor, CVMFS, and XCache. We describe progress on this approach at the Glasgow Tier-2 WLCG site, along with tests involving the submission of Grid jobs. Reusing underutilised GPU resources already available at Grid sites could offer a pragmatic way to meet the increasing demand for this type of service.
Speaker: Albert Gyorgy Borbely (University of Glasgow (GB)) -
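A minimal client-side sketch of the inference-as-a-service pattern using NVIDIA Triton's Python HTTP client is shown below; the model name, tensor names, input shape, and server URL are placeholders rather than details of the Glasgow deployment.

```python
import numpy as np
import tritonclient.http as httpclient   # pip install tritonclient[http]

# Placeholders: the actual model, tensor names and endpoint are site-specific.
client = httpclient.InferenceServerClient(url="triton.example.org:8000")

batch = np.random.rand(64, 20).astype(np.float32)    # dummy per-jet features
inp = httpclient.InferInput("features", batch.shape, "FP32")
inp.set_data_from_numpy(batch)
out = httpclient.InferRequestedOutput("scores")

# The Grid job sends data to the site-local GPU server instead of loading the
# model (and its memory footprint) inside the 2 GB per-job CPU slot.
result = client.infer(model_name="btag_model", inputs=[inp], outputs=[out])
print(result.as_numpy("scores").shape)
```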
201
"ChatGPT run my workflow": Exploring agentic AI interfaces to workflow management
For distributed High Throughput Computing (dHTC), the original -- and potentially still most popular -- interface for workflow management is the command line interface (CLI). Decades of researchers have been trained on the CLI and knowledgeable users can effectively integrate it into larger scripts with little friction. As the ecosystem has grown and matured, new interfaces have appeared such as Application Programming Interfaces (APIs), targeting automated systems that interact with the dHTC layer; RESTful Interfaces, targeting remote interactions over the internet; or web user interfaces, targeting individuals accessing via the browser.
In 2025, a new approach surfaced: an "AI interface" which allows Large Language Model (LLM)-based agents to invoke and interact with tools as part of an agentic AI driven workflow. In this work, we present new interfaces, based on the popular "Model Context Protocol" (MCP), to two common dHTC software packages, the HTCondor Software Suite and the Pelican Platform. These MCPs provide building blocks: from a user platform like VS Code, agents can submit jobs, check status, or transfer objects between storage. How much can the AI agent close the gap between these "building blocks" and "running science"? Can a user leverage these tools to work with complex cyberinfrastructure with minimal expertise? Can an agent effectively monitor and fix problematic workflows? This work explores not just the immediate functionality but how well the setup works across sample problems encountered in the dHTC ecosystem.
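As a hypothetical sketch of such a building block (not the interfaces presented in this contribution), the snippet below exposes an HTCondor queue query as an MCP tool using the Python MCP SDK's FastMCP helper and the HTCondor Python bindings; the server name, tool name, and returned fields are illustrative assumptions.

```python
import htcondor                              # HTCondor Python bindings
from mcp.server.fastmcp import FastMCP       # Python MCP SDK (assumed installed)

mcp = FastMCP("htcondor-tools")              # illustrative server name

@mcp.tool()
def list_jobs(owner: str) -> list[dict]:
    """Return ClusterId, ProcId and JobStatus of the given user's queued jobs."""
    schedd = htcondor.Schedd()
    ads = schedd.query(
        constraint=f'Owner == "{owner}"',
        projection=["ClusterId", "ProcId", "JobStatus"],
    )
    return [
        {"ClusterId": int(ad["ClusterId"]),
         "ProcId": int(ad["ProcId"]),
         "JobStatus": int(ad["JobStatus"])}
        for ad in ads
    ]

if __name__ == "__main__":
    mcp.run()    # an LLM agent can now discover and call list_jobs()
```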
Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US)) -
202
AI in the EGI e-infrastructure - Past experiences and future strategies
The EGI Federation, which emerged from WLCG in 2010, has been a cornerstone of European and global digital science for over 15 years, providing a federated e-infrastructure for 150,000+ researchers across all scientific disciplines. The recently approved “EGI Federation Strategy 2026–2030” sets out an ambitious plan for the next 5 years to ensure that EGI remains an accelerator for science.
One of the centrepieces in the new strategy is strengthening our value proposition in AI through 3 activities: (1) Expanding EGI’s sovereign Compute-Data Federation with AI-ready compute resources (GPUs, HPC sites), and with AI-ready scientific datasets; (2) Eliminating and working around policy and funding barriers that limit cross-border access and delivery of compute, data and application services; (3) Strengthening EGI as an AI R&I ecosystem where innovations persist beyond project lifecycles.
Recent EGI flagship projects, such as iMagine (AI-based imaging data and services for aquatic science) and InterTwin (Co-designing and prototyping an AI-driven interdisciplinary Digital Twin Engine) demonstrated the use of AI frameworks, models and data across shared, national facilities, often overlapping with WLCG sites. Current EGI flagship initiatives further strengthen the AI trajectory: (1) EOSC Data Commons establishes a trusted Research Commons with AI tools that provide seamless access and integration of research outputs, applications and services. (2) RI-SCALE develops and deploys ‘Data Exploitation Platforms’ that enable peer-partnerships between Research Infrastructure data holdings and HPC/cloud providers to jointly offer scalable, AI-driven analysis and processing environments for scientists. (3) SCIANCE establishes the foundations for the ‘Resource for AI Science in Europe’ (RAISE) programme that will pool European compute resources, specialized datasets, and expert talent, to accelerate breakthroughs in and with AI in European science and strategic sectors.
This presentation will provide an overview of recent AI-experiences in EGI, and will detail EGI’s future strategy concerning AI. The contribution will facilitate knowledge sharing between the WLCG and EGI communities, with the ultimate goal to enable coordinated and collaborative uptake of AI within our intertwined infrastructures.
Speaker: Sergio Andreozzi -
203
Towards Autonomous Computing Operations with AI Assistance for Belle II Experiment.
The Belle II experiment at KEK, Japan, operates with a data volume exceeding 30 petabytes, with datasets distributed and processed worldwide using DIRAC and Rucio. With this globally distributed computing infrastructure, and expecting an order of magnitude larger data volume, we face operational challenges for both computing experts and end-users. The end-users frequently struggle with recurring issues (e.g., problems with job submission or locating relevant documentation), generating load on the experts who provide support.
This contribution reports on ongoing research and development of an intelligent, automated assistance system. The proposed system is designed to optimize experiment workflows, diagnose common failures, and provide continuous 24/7 monitoring to reduce service downtime and accelerate incident response. Our work leverages recent advances in open-source Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG) to incorporate experiment-specific documentation such as software guides, troubleshooting resources, and FAQs for authoritative, context-aware assistance. In parallel, we explore AI-Agents for automated analysis of grid job logs, failure classification, and root-cause suggestion.
This research proposes a local LLM infrastructure for enhanced privacy, security, and sustainability by keeping sensitive data internal. The self-contained deployment allows for task-specific fine-tuning, integration with Model Context Protocol (MCP) tools, and long-term cost control. The contribution details the prototype architecture, preliminary evaluation, and a roadmap to improve Belle II Experiment operations and user experience.
Speaker: Mr Dhiraj Kalita (KEK (High Energy Accelerator Research Organization))
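The snippet below is a deliberately simplified stand-in for the retrieval step of such a RAG pipeline, using TF-IDF similarity from scikit-learn instead of an embedding model; the documentation snippets are placeholders, and in the proposed system the retrieved context would be supplied to a locally hosted LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder knowledge base; in practice these would be chunks of the
# experiment's software guides, FAQs and troubleshooting pages.
docs = [
    "Failed grid jobs can be rescheduled from the job management tools.",
    "Dataset replicas and their locations can be listed with the data catalogue client.",
    "Proxy errors usually mean the grid proxy has expired and must be renewed.",
]

vectorizer = TfidfVectorizer().fit(docs)
doc_vectors = vectorizer.transform(docs)

def retrieve(question, k=2):
    """Return the k most similar documentation chunks for a user question."""
    sims = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
    ranked = sims.argsort()[::-1][:k]
    return [docs[i] for i in ranked]

# The retrieved context would then be prepended to the prompt sent to the LLM.
print(retrieve("Why does my job submission fail with a proxy error?"))
```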
-
198
-
Track 5 - Event generation and simulation: Fast Simulation 4
-
204
Data Overlay for Underlying Event modeling in Heavy Ion collisions in ATLAS
Accurate modeling of the underlying event (UE) in heavy-ion collisions poses a significant challenge, particularly for analyses involving hard probes. No existing Monte Carlo (MC) simulation can reproduce the complex underlying physics. To address this, the ATLAS Collaboration developed an innovative technique that overlays simulated signal events onto real minimum-bias data recorded by the detector (Data Overlay). This approach ensures that both the UE and detector response are represented with high fidelity in the reconstructed events. This strategy reduces the computational cost of MC production by limiting full detector simulations to relatively simple signals.
For the Data Overlay method, a dedicated dataset is collected at a rate exceeding 1,000 events per period during which detector conditions remain stable. The data are preprocessed such that detector readouts and reconstructed vertex positions are available on an event-by-event basis. In the Data Overlay procedure, the simulated signal vertex is aligned with the corresponding vertex in data, and the combined event is processed through the standard ATLAS reconstruction chain.
This technique has been successfully applied in Run 2 ATLAS heavy-ion analyses but has now been modernized for use with the heavy-ion Run 3 dataset. An extensive validation program has been carried out, and the results of this study will be presented together with the technical details of this approach. The first application to light-ion data taken in the summer of 2025 is envisaged.
Speaker: Tadej Novak (Jozef Stefan Institute (SI)) -
205
CaloTrilogy: Toward a Breakthrough in One Step, End-to-End, Physics-Guided Shower Generation for Modern Calorimeters
High-precision calorimeter simulation at current and future colliders puts growing demands on computing resources, motivating ML-based alternatives to traditional Monte Carlo tools such as Geant4. In practice, generative models based on flow matching and diffusion have become de facto standards for high-dimensional fast calorimeter simulation, thanks to their excellent fidelity and strong track record across ML research and industry applications. However, this typically comes with longer generation time, since high precision shower modeling needs more function evaluations during inference. In addition, many current proposals introduce separate networks to correct or constrain high-level observables, such that the overall pipeline is no longer strictly end-to-end.
In this study, we introduce three ingredients that together form a unified framework aimed at a better balance between generation speed, shower quality, and physics fidelity. First, the method employs an average velocity field integrator that allows sampling in one or a few steps, instead of the many evaluations required by conventional solvers. Second, a learned generative prior based on an isotropic Gaussian in shower space is constructed from the shower rather than from a random noise distribution, which leads to faster convergence and improved shower quality. Third, physics-guided loss terms provide useful inductive bias to constrain key observables during training. These elements act purely as training-time regularizers, so inference remains strictly end-to-end with no additional cost. With only one or a few evaluations, we achieve shower quality competitive with state-of-the-art flow and diffusion models that use O(100) solver steps, demonstrated on several public high granularity calorimeter datasets. The resulting model captures detailed inter-layer shower structure consistent with the underlying physics, which has been challenging for many previous approaches, providing a strong candidate for future fast simulation workflows.
Speaker: Cheng Jiang (The University of Edinburgh (GB)) -
206
The electronics simulation software in the JUNO experiment
The Jiangmen Underground Neutrino Observatory (JUNO) is a large-scale neutrino experiment using a 20-kt liquid scintillator Central Detector surrounded by a 35-kt water Cherenkov Veto Detector, and an almost 1000 m² plastic scintillator Top Tracker. Following the completion of detector commissioning, JUNO began physics data taking on August 26, 2025.
The electronics simulation (ElecSim) is the key component of JUNO’s offline software. It takes the photoelectron information from Geant4-based detector simulation and simulates the response of PhotoMultiplier Tube (PMT), the trigger logic and the readout electronics for all sub-detectors. A “pull” workflow is employed to load and process events only when a readout is required, thereby avoiding excessive memory usage.
Following extensive commissioning efforts, ElecSim has been updated to more accurately reflect JUNO electronics. Realistic PMT response models and sub-detector-specific trigger configurations have been updated and validated using real data. PMT parameters, which are known to change during data taking, are stored as conditions data in the database. The software framework has also been optimized to facilitate integration of additional sub-detector components. Finally, we present the performance of a fast-simulation mode for high-energy events and report computing speed on various data samples.
Speaker: Ze Chen -
207
Updates on ML-based fast and flash simulations at LHCb
Experiments in high energy physics rely heavily on simulations to interpret data, optimise detector design, and test theoretical models. Traditionally, simulations involve Monte Carlo event generators and detailed particle interactions with detectors. For the LHCb experiment, 90 % of computing resources are used for simulations, with the calorimeter simulation being the most computationally intensive part. In order to address this challenge, fast and flash simulations based on Generative AI have been implemented. In particular, CaloML is the ML-based, CaloChallenge-compatible fast simulation applied to the electromagnetic calorimeter at the detector transport level. Lamarr is an in-house flash simulation framework that reduces the CPU time of the whole simulation phase by bypassing the entire simulation and reconstruction steps with AI-generated output. In this talk, we will present the most recent updates on these fast simulation options. In particular, we will show how the computing efficiency, the precision of the reconstruction steps, and the general performance of CaloML have improved by incorporating more advanced architectural designs. We will also show preliminary benchmarks of CaloML on the Upgrade 2 geometry.
Speaker: Michał Mazurek (National Centre for Nuclear Research (PL)) -
208
CMS FlashSim: how an end-to-end ML approach speeds up simulation in CMS
Detailed event simulation at the LHC is taking a large fraction of the computing budget. CMS has developed an end-to-end ML-based simulation framework, called FlashSim, that can speed up the production of analysis samples by several orders of magnitude with a limited loss of accuracy. We show how this approach achieves a high degree of accuracy, not just on basic kinematics but on the complex and highly correlated physical and tagging variables included in the CMS common analysis-level format, the NANOAOD. We prove that this approach can generalize to processes not seen during training. Furthermore, we discuss and propose solutions to address the simulation of objects coming from multiple physical sources or originating from pileup. Finally, we present a comparison with full simulation samples for some simplified analysis benchmarks, as well as how we can use the CMS Remote Analysis Builder (CRAB) to submit simulation of large samples to the LHC Computing Grid. The simulation takes as input relevant generator-level information, e.g. from PYTHIA, while outputs are directly produced in the NANOAOD format. The underlying models being used are state-of-the-art continuous flows, trained through flow matching.
With this work, we aim to demonstrate that this end-to-end approach to simulation is capable of meeting experimental demands, both in the short term and in view of the HL-LHC, and to update the LHC community on recent developments.
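For readers unfamiliar with the training objective, the sketch below shows the standard conditional flow-matching (rectified-flow) loss with a toy PyTorch network; the actual FlashSim models are far larger and are conditioned on generator-level quantities, so this is only a schematic of the technique, not the authors' code.

```python
import torch
import torch.nn as nn

dim = 16   # toy feature dimension; NANOAOD objects have many more variables
v_theta = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
opt = torch.optim.Adam(v_theta.parameters(), lr=1e-3)

def flow_matching_step(x1):
    """One optimisation step of conditional flow matching: interpolate between
    noise x0 and data x1 and regress the constant velocity x1 - x0."""
    x0 = torch.randn_like(x1)                 # noise sample
    t = torch.rand(x1.size(0), 1)             # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the straight path
    target = x1 - x0                          # velocity along that path
    pred = v_theta(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(flow_matching_step(torch.randn(256, dim)))   # dummy training batch
```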
Speaker: Filippo Cattafesta (Scuola Normale Superiore & INFN Pisa (IT)) -
209
PARSIFAL: a parametrized simulation for µ-RWELL detectors
PARSIFAL (PARametrized SImulation) is a software tool designed to reproduce the complete response of gaseous detectors. It models the physical processes involved through simple parametrizations, thus achieving fast processing times. Existing software, such as GARFIELD++, while robust and reliable, is highly CPU time-consuming. The development of PARSIFAL is motivated by the need to significantly reduce processing time without compromising the precision of a full simulation. A set of parameters, extracted from GARFIELD++ simulations, is used as input, allowing PARSIFAL to run independently. By sampling from parametrized distributions, PARSIFAL can rapidly simulate high-statistics samples, integrating various steps including ionization, diffusion, multiplication, signal induction, and electronics. For the µ-RWELL, the effect of the resistive layer on the charge spread on the readout has been incorporated, following the treatment proposed by M.S. Dixit and A. Rankin.
PARSIFAL is used to simulate MPGDs, specifically triple-GEM and µ-RWELL chambers. Initially, the results were tuned to match experimental data from test beams using the APV-25 frontend readout by the SRS system, which was simulated within the code. The same procedure was later applied to more modern electronics, namely the TIGER ASIC and the GEMROC system. This new electronics package was implemented in PARSIFAL, and a tuning of the simulated data to real experimental results was performed. This contribution will present the full code, focusing on its latest implementations and a comparison with experimental data obtained from the µ-RWELL. The tuned simulations are now being used for the optimization of new readout electronics. This involves simulating and analyzing the impact of key electronic parameters, such as the dynamic range, noise level, readout segmentation, and the FFT analysis of the waveform.
The final application of the µ-RWELL detector is the muon system for the IDEA experiment proposed for FCC-ee. Optimizing the detector performance in conjunction with the electronics is a key task for the future experiment's design. The main challenge for the muon system is to ensure a good spatial resolution of the order of 200 µm and to achieve high granularity to minimize the channel count.
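As a toy illustration of the parametrized-sampling idea (not PARSIFAL itself), the snippet below draws the number of ionization clusters, per-cluster avalanche gain, and transverse diffusion from simple distributions; the numerical parameters are made up, whereas in PARSIFAL they would be extracted from GARFIELD++ simulations.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy parameters; in PARSIFAL these would come from GARFIELD++ extractions.
CLUSTERS_PER_MM = 3.5        # mean primary ionization clusters per mm of gas
MEAN_GAIN = 2.0e4            # mean avalanche gain
DIFFUSION_UM_SQRT_CM = 120.0 # transverse diffusion coefficient [um / sqrt(cm)]

def simulate_crossing(length_mm, drift_cm):
    """Sample one track crossing: cluster count, per-cluster gain and the
    transverse displacement of each avalanche at the readout plane."""
    n_clusters = rng.poisson(CLUSTERS_PER_MM * length_mm)
    gains = rng.exponential(MEAN_GAIN, size=n_clusters)       # simple gain model
    sigma_um = DIFFUSION_UM_SQRT_CM * np.sqrt(drift_cm)
    offsets_um = rng.normal(0.0, sigma_um, size=n_clusters)   # diffusion spread
    return n_clusters, gains, offsets_um

n, gains, offsets = simulate_crossing(length_mm=5.0, drift_cm=0.5)
print(n, gains.mean(), offsets.std())
```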
Speaker: Riccardo Farinelli (INFN Bologna (IT))
-
204
-
Track 6 - Software environment and maintainability: Programming Models and Software Design
Conveners: Gaia Grosso (IAIFI, MIT), Ruslan Mashinistov (Brookhaven National Laboratory (US))
-
210
Compile-Time Metaprogramming with C++26 and beyond
C++ compile-time metaprogramming techniques - commonly known as “templates” - are extensively used in HEP code to write reusable code and perform optimisations at compile-time. In 2026, the new C++26 standard will be released, including major new compile-time programming features such as reflection, template for, constexpr if, constexpr allocations, consteval, etc. Reflection could make compile-time metaprogramming significantly less complex, but it involves a learning curve as it is vastly different from existing techniques such as preprocessor macros and templates. Reflection could also play a role for data serialisation (for instance in the context of ROOT I/O), for optimising memory layouts, and for creating Python bindings.
This presentation will showcase compile-time metaprogramming using C++26 reflection, providing examples of how and where it is useful to guide the writing of maintainable C++ code in HEP software. Additionally, we preview some exciting new code generation features proposed for future standards.
Speaker: Jolly Chen (CERN & University of Twente (NL)) -
211
Standardization for sustainable and reusable software: std::simd in C++26
For HEP software, longevity is a core requirement: code often outlives several hardware generations. Using a standardized solution for data‑parallelism is therefore the most direct path to sustainable, reusable optimizations. As the lead author of std::simd in the C++ standard and the libstdc++ implementation, I will show how C++26’s std::simd provides a concrete, standards-based illustration of that principle and offers a practical migration pathway.
std::simd replaces fragmented, architecture-specific implementations (intrinsics, brittle auto-vectorized loops) with a single, maintainable code path. Standardization directly tackles HEP’s sustainability challenge by eliminating the maintenance-risk penalty that makes many SIMD projects untenable today.
A standardized solution transforms vectorization from a liability into a sustainable asset:
- std::simd isn't merely portable: it enables ecosystem cohesion by introducing new vocabulary types. A stable abstraction for data-parallelism eliminates integration barriers between libraries, allowing composition of vectorized components.
- It future-proofs multi-decade projects by decoupling algorithms from hardware; new architectures (RISC-V, GPUs) require only compiler updates, not rewrites.
- CI/CD efficiency: correctness can be tested on one architecture, simplifying pipelines.
- Knowledge transfer: new developers learn one portable pattern instead of architecture-specific intrinsics, accelerating onboarding.
- Reduction of technical debt: refactoring to std::simd eliminates architecture-specific, non-portable SIMD code.
Speaker: Matthias Kretz (GSI Helmholtzzentrum für Schwerionenforschung) -
212
Performance and integration challenges of using Julia language for trigger and reconstruction
Julia has gained attention in high-energy physics (HEP) as a programming language that combines high-level expressiveness with competitive performance. This work explores its potential as a replacement for C++ in HEP applications, in particular in the context of trigger and reconstruction. The studies reported here include ahead-of-time compilation of jet reconstruction packages, a scheduling demonstrator for an event-processing framework, and a Julia port of the CMS Patatrack standalone pixel tracking.
While Julia offers attractive ergonomics, the studies revealed multiple limitations. Ahead-of-time compilation is still slower and produces significantly larger binaries than C++ alternatives, and residual just-in-time (JIT) compilation may persist in compiled binaries, limiting their suitability for latency-sensitive applications. Although Julia delivers good single-threaded performance for isolated tasks, significant scaling limitations in multi-threaded throughput were observed under frequent memory allocation due to its stop-the-world garbage collector. These issues arise from fundamental language design decisions rather than ecosystem immaturity and may pose significant challenges for adopting Julia in large-scale HEP code-bases or in latency-sensitive, highly parallel, high-throughput applications.
Speaker: Mateusz Jakub Fila (CERN) -
213
Arbitrary Python Execution in C++ with Dynamic Bindings: How Far Can We Go?
High Energy Physics (HEP) software environments make extensive use of blended C++ and Python workflows, combining performance and simple interfaces. In this context, a C++ compiler stack comprising technologies such as Clang, Cling, and cppyy provides generic dynamic Python-C++ bindings and powers many of the Python interfaces used in the field, including those of the ROOT software framework.
Several techniques exist to bridge C++ and Python in mixed-language applications including just-in-time compilation of Python functions to native code, such as Numba-based approaches, as well as dynamic bindings relying on automatic type conversion mechanisms provided by cppyy. Both approaches are effective in many use cases, but constrained by the need to operate on a restricted subset of Python.
In this contribution, we introduce an experimental execution model that leverages cppyy’s dynamic converters to invoke pure Python code directly from within the C++ runtime. This approach enables the execution of Python constructs and libraries that were previously incompatible with existing integration techniques.
We explore how far this alternative model can be pushed: what types of Python functions, objects, and external packages can be used inside a C++ execution environment? What are the performance and usability implications? We then evaluate the limitations and potential of this model using the generic high level interface provided by ROOT, RDataFrame, as a notable example of the wide range of applications enabled by this approach.
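For orientation only, the fragment below shows the long-established, coarse-grained way of driving Python from a C++/ROOT process through the TPython helper; the execution model described in this contribution goes further, routing arbitrary Python objects through cppyy's dynamic converters, which is not reproduced here.

```cpp
// Minimal illustration of running Python from C++ inside a ROOT-linked program
// using TPython (part of ROOT's PyROOT bindings). This is the pre-existing,
// statement-level mechanism, not the dynamic-converter model described above.
#include "TPython.h"
#include <iostream>

int main() {
    // Execute an arbitrary Python statement in the embedded interpreter.
    TPython::Exec("import math; x = math.sqrt(2.0)");

    // Evaluate a Python expression and convert the result to a C++ double.
    const double y = static_cast<double>(TPython::Eval("2 * x"));
    std::cout << "2*sqrt(2) = " << y << std::endl;
    return 0;
}
```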
Speaker: Silia Taider (CERN) -
214
A modernized interface to ROOT files
The ROOT file is the most widely used format for storing data in HEP. ROOT's TFile, alongside its ancillary
classes, is the main interface to ROOT files, featuring a large number of functionalities both basic and advanced. TFile was designed in the 1990s and has evolved organically over the past three decades; it is still one of the pillars of any interaction with ROOT. However, 30 years of backward-compatible evolution naturally resulted in a large and somewhat intimidating interface, often featuring outdated programming practices and sometimes unintuitive behavior, especially for new users. RFile is ROOT's new experimental interface to ROOT files, designed to take advantage of the evolution of both C++ and general programming practices to expose a more robust and succinct API that covers almost all use cases of TFile while foregoing implicit object ownership.
Speaker: Giacomo Parolini (CERN) -
215
Transparent integration of RNTuple into FairRoot
FairRoot integrates RNTuple and allows users a seamless transition to ROOT’s novel I/O backend, resulting in significant performance gains and reduced file sizes.
This contribution details the incorporation of RNTuple into the FairRoot framework. RNTuple's novel columnar data storage is applied in turn to experiment simulation and data reconstruction, allowing a comparison between the legacy TTree-based and the new RNTuple-based data storage and retrieval. The adoption of RNTuple within FairRoot particularly enhances multithreaded Geant4 simulations: previously, parallel runs generated per-thread output files that had to be merged in post-processing, whereas RNTuple supports concurrent writing from multiple threads to a single output file, removing that merging overhead.
The integration leaves the FairRoot workflow unmodified, giving users immediate benefits in data processing speed and storage footprint with minimal code modifications.
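As context for readers unfamiliar with the new backend, the hedged sketch below shows the basic single-threaded RNTuple write pattern used in the ROOT ntuple tutorials; the classes have lived in the ROOT::Experimental namespace in recent releases and are being promoted to ROOT proper, so the exact namespace and headers depend on the ROOT version. The multi-threaded path used for Geant4 worker threads relies on a parallel writer and is not shown.

```cpp
// Hedged sketch of a minimal RNTuple write loop (single-threaded), following
// the pattern of the ROOT ntuple tutorials; namespace/header details vary
// with the ROOT release.
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleWriter.hxx>
#include <utility>

void write_example() {
    using namespace ROOT::Experimental;   // newer releases expose these under ROOT::

    auto model = RNTupleModel::Create();
    auto pt = model->MakeField<float>("pt");   // shared_ptr<float> bound to the model

    auto writer = RNTupleWriter::Recreate(std::move(model), "events", "output.root");
    for (int i = 0; i < 100; ++i) {
        *pt = 0.5f * i;      // set the field value for this entry
        writer->Fill();      // commit one entry
    }
}                            // the writer's destructor flushes and closes the file
```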
Speaker: Radoslaw Karabowicz (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
-
210
-
Track 7 - Computing infrastructure and sustainability
-
216
Exploiting LHCb trigger CPU and GPU resources as Analysis Facilities
The LHCb collaboration relies on powerful GPU and CPU clusters for real-time data processing, but these resources can be idle outside data-taking periods. While the trigger CPU farm has already been used for offline processing, specifically for Monte Carlo production, these resources had not previously been repurposed for physics analysis, including ML training and inference.
Through a collaboration between the CERN IT SWAN team and LHCb, we have adapted the LHCb trigger resources to be exploitable as Analysis Facilities, extending computational capacity for LHCb analysts and the SWAN service.
This presentation describes the preparation and execution of a one-year pilot project demonstrating the technical feasibility of this approach. In particular, it highlights how LHCb resources were integrated to extend SWAN’s Kubernetes cluster, providing users with a seamless and efficient analysis environment. The pilot serves as a prototype for a future production-ready Analysis Facility, paving the way for more flexible, scalable, and effective use of LHCb computing resources.
Speakers: Diogo Castro (CERN), Eduardo Rodrigues (University of Liverpool (GB)) -
217
Auto-Scaling GPU Inference as a Service for HEP on HPC at NERSC
Machine learning (ML) models are increasingly central to High Energy Physics (HEP) workflows, spanning simulation, reconstruction, and analysis. In parallel, large language models (LLMs) are being adopted for documentation, software development, and workflow orchestration. While training typically relies on institution-specific resources, production deployment of these models poses a growing challenge, as most HEP experiments lack sufficient local access to high-end GPUs.
This presentation describes a scalable inference service deployed at the National Energy Research Scientific Computing Center (NERSC) to address this gap. The system provides a shared inference infrastructure that decouples model deployment from experiment-local resources while enabling on-demand GPU provisioning on HPC systems. It consists of a network entry point for client requests, a load balancer, and a monitoring framework deployed on the NERSC SPIN platform, with GPU-backed inference services running on NERSC GPU nodes using the NVIDIA Triton Inference Server.
To bridge the separation between SPIN and GPU resources, we introduce a resource management service that dynamically allocates GPU nodes, reports resource availability to the load balancer, and enables automatic scaling based on monitoring metrics. We demonstrate that this architecture supports sustainable, scalable inference for HEP ML workloads by dynamically matching GPU allocation to demand, including both experiment-specific models and open-weight LLMs, enabling efficient reuse of shared HPC resources.
Speaker: Xiangyang Ju (Lawrence Berkeley National Lab. (US)) -
218
Operating a Large GPU HPC Farm With a Small Team: Lessons from the ALICE EPN Project
The ALICE Event Processing Nodes (EPN) farm is a high-density GPU HPC system designed primarily for real-time reconstruction of 50 kHz Pb-Pb collisions during LHC Run 3. It is the largest computer farm at CERN in terms of compute capacity. Comprising 350 nodes and 2800 GPUs, with a peak performance of ~42 PFLOP/s single precision, the HPC infrastructure has been operated throughout Run 3 by a dedicated team of 2 to 3 individuals at a time. This contribution presents the organisational, technical, and architectural choices that enabled this 24/7-supported, high-reliability, low-maintenance operational model.
The team operates the full stack of the HPC environment: electrical and cooling systems, networking, servers and GPUs, firmware management, and orchestration layers. Automation of provisioning, configuration management, monitoring, and recovery procedures ensures reliable operations with minimal manual intervention. The software stack (OS and driver management, InfiniBand-based high-throughput data distribution, orchestration tools, and integration with experiment and detector control systems) provides the basis for reliable operations, for the separation between online and asynchronous processing modes, and for dedicated maintenance and testing environments. These design principles enable sustainable operation by reducing manpower requirements and optimizing hardware utilization.
The talk summarizes lessons learned from several years of continuous operation of a physics-critical, high-throughput, GPU-accelerated HPC facility and highlights principles applicable to other large-scale scientific computing facilities aiming for sustainability and low operational overhead.
Speaker: Dr Lubos Krcal (CERN) -
219
Dynamic provision of GPUs at HPC centers for HEP
The upcoming High-Luminosity Large Hadron Collider (HL-LHC) era will present significant computational challenges, demanding a substantial increase in data processing for the WLCG experiments at CERN. To meet these needs, the WLCG is exploring strategies for resource optimization. This includes a paradigm shift towards heterogeneous hardware, recognizing that GPUs are superior to CPUs for certain applications, especially highly parallelizable ones. A key strategy is the opportunistic, on-demand integration of High-Performance Computing (HPC) centers, which host vast pools of GPUs. This approach allows for dynamic, on-demand access to powerful resources, avoiding the cost of hosting and maintaining the GPU infrastructure as well as inefficient utilization during periods of low demand.
A recent development in this direction is the successful integration of GPU resources from the HoreKa HPC center at KIT, part of Germany's National High Performance Computing Alliance (NHR). Using orchestration tools like COBalD and TARDIS, which now support both CPU and GPU allocation, we have demonstrated the capability to run ATLAS and CMS workloads on HoreKa's powerful GPU partitions. Initial tests, including centrally submitted workflows and user analysis jobs, have confirmed the technical feasibility of this approach.
Speaker: Nikita Shadskiy (KIT - Karlsruhe Institute of Technology (DE)) -
220
From CPU-Centric to Accelerator-Aware: Operational Deployment of MIG and vGPU Partitioning in WLCG
The traditional WLCG computing model has been optimised for high-throughput processing of large numbers of small, independent pp-collision event workloads. This CPU-centric paradigm matched naturally with homogeneous multi-core nodes, where resources could be presented as uniform job slots to Grid middleware. As WLCG sites increasingly deploy modern GPUs, and HEP generator, simulation, and reconstruction teams develop GPU-enabled workflows, long-standing assumptions about resource structure, scheduling, and workload behaviour no longer hold.
GPU architectures introduce fundamentally different concurrency patterns, memory hierarchies, and sharing limitations compared with CPUs, raising operational challenges for distributed resource provisioning. To explore these issues in practice, we report on the integration of GPUs supporting Multi-Instance GPU (MIG) technology at RAL-LCG2. MIG allows a single physical GPU to be partitioned into multiple isolated compute instances, and similar vGPU mechanisms in new GPU classes provide comparable capabilities. We describe the commissioning steps, the exposure of MIG instances to batch and Grid interfaces, and early validation using ATLAS and other exploratory HEP workflows.
In this talk, we highlight the core operational issues encountered when introducing GPUs into an ecosystem historically optimised for CPU-only workloads. We then present concrete pathways for improving GPU integration across WLCG, including clearer resource-advertising conventions, accelerator-aware scheduling policies, and workflow adaptations designed to exploit MIG/vGPU-partitioned resources efficiently. These insights aim to support the community as it transitions toward heterogeneous compute infrastructures while maintaining compatibility with established WLCG operational practices.
Speaker: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory) -
221
A user-level network overlay to enable offloading of scientific payloads from cloud-native interfaces
The development of ecosystems for high energy physics analysis is experiencing a strong push towards the exploration of cloud-native frameworks, especially for the most interactive, plotting-based “last-mile”. Together with the increasing adoption of, and R&D around, ML-based algorithms, this creates a demand for ways to extend a Kubernetes cluster over a range of existing resources that are, in fact, remote and most likely managed by a batch system of some sort (SLURM, HTCondor, etc.). One of the main limitations of the models that try to address this issue is that many frameworks rely on the Kubernetes-internal pod network to orchestrate and manage workflows. When executed remotely (especially on a supercomputer), those containers share the host network namespace by default under the Singularity/Apptainer runtime and are only allowed to perform user-level operations, preventing the creation of any sort of network interface.
In this presentation, we will show our experience with offloading pod execution via InterLink from production-level Kubernetes clusters to a EuroHPC centre such as Leonardo at CINECA, all while preserving the pod overlay network for workflow coordination tasks by leveraging Linux kernel network namespaces. We will show how some of the adopted frameworks (such as Ray, Kubeflow, and Argo Workflows) can leverage this solution, expanding the range of possibilities for distributed exploitation of non-Kubernetes resources through Kubernetes orchestration.
Speakers: Daniele Spiga, Diego Ciangottini (INFN, Perugia (IT)), Giulio Bianchini (Universita e INFN, Perugia (IT)), Lucio Anderlini (Universita e INFN, Firenze (IT)), Massimo Sgaravatto (Universita e INFN, Padova (IT)), Mauro Gattari (INFN (National Institute for Nuclear Physics)), Mirko Mariotti (Universita e INFN, Perugia (IT)), Rosa Petrini (Universita e INFN, Firenze (IT))
-
216
-
Track 8 - Analysis infrastructure, outreach and education: Analysis Facilities
-
222
The CERN Analysis Facility: Consolidation, Evolution and Strategy
CERN IT started providing the capability of an Analysis Facility (AF) in late 2023, initially as a pilot. The AF supports columnar workloads through RDataFrame and Coffea within SWAN, CERN’s web-based analysis environment. Dask provides the computing backend, managing concurrent resources from the CERN batch farm.
Since then, the AF has evolved beyond the pilot phase. The latest developments focus on integrating data management and access tools, such as Rucio and XCache. Significant effort is now devoted to developing a unified deployment model for Jupyter-based facilities, like SWAN and the ESCAPE VRE; this includes monitoring and storage solutions, and is carried out in partnership with other European research infrastructures, such as the Einstein Telescope. Work also continues on enabling access to established CERN platforms and resources through the AF, for example REANA, the machine learning ecosystem at CERN (ml.cern.ch), and studies on the use of HPC centres, which provide an increasing fraction of WLCG resources. This contribution will outline the current status, collaborations, and next steps in the evolution of the AF service.
Speaker: Giovanni Guerrieri (CERN) -
223
Lessons learned running Integration Challenge at CMS Coffea-Casa facility
As part of the IRIS-HEP software institute effort and U.S. CMS activities, the Coffea-Casa analysis facility team has executed an Integration Challenge. One goal of this challenge was to demonstrate a full CMS analysis running on the facility and to integrate the IRIS-HEP software stack into a production environment. We describe the solutions deployed at the facility to support and execute the challenge tasks.
The Nebraska facility provides more than 2,000 cores for fast-turnaround, low-latency analysis for analysts. To achieve the highest event-processing rates, multiple scaling backends were evaluated, including HTCondor and Kubernetes resources, using both Dask and TaskVine schedulers. This setup also enabled a comparison of two Dask-cluster management services—Dask LabExtension and Dask Gateway—under demanding conditions.
A robust set of XCache servers with a redirector, previously deployed and tested during the 200 Gbps Challenge, was used to cache CMS Integration Challenge datasets and reduce wide-area network traffic. In addition, the Integration Challenge explored several approaches for delivering skimmed physics data, including the use of different analysis frameworks and data formats. This required enabling multiple storage solutions at the facility, such as S3, and evaluating ServiceX for data skimming—a data-delivery system for high-energy physics designed to provide fast access to large datasets stored in ROOT and other HEP-specific formats.
Speaker: Oksana Shadura (University of Nebraska Lincoln (US)) -
224
Purdue Analysis Facility: An Interactive Platform for HL-LHC Era Analyses at CMS
The Purdue Analysis Facility (Purdue AF) is an interactive, Kubernetes-based computational platform that provides CMS researchers with a comprehensive set of tools and services for end-to-end development and execution of physics analyses. It serves both as a primary development environment for ongoing CMS Run 3 analyses and as a sandbox for testing novel software and data infrastructure solutions under realistic conditions. Purdue AF is also used as a testbed for HL-LHC benchmarking campaigns such as the IRIS-HEP Integration Challenge.
In this presentation, we will describe the key design decisions that make Purdue AF a robust, scalable, and observable platform from both user and admin perspectives. We will discuss the cloud-native technologies used to deploy Purdue AF on Kubernetes, analysis software distribution using Pixi, and in-depth monitoring using the Grafana observability suite. We will also demonstrate how Purdue AF integrates with LLM-enabled IDEs, leveraging generative models and agentic tools to assist with code development and debugging.
Speaker: Norbert Neumeister (Purdue University (US)) -
225
Offloading CMS detector performance analysis with RNTuple and RDataFrame on an Analysis Facility
The upcoming High-Luminosity phase of the LHC will significantly increase the computational demands of CMS detector performance studies, particularly for workflows that process multi-year datasets and explore high pile-up conditions. In this context, modern data formats and scalable analysis paradigms are essential. This contribution presents an upgrade of a representative CMS detector performance workflow - a Tag-and-Probe study for the Drift Tubes (DT) muon system - leveraging the state-of-the-art RNTuple I/O backend together with distributed execution through RDataFrame, on a distributed Analysis Facility. The work is integrated within the computing ecosystem of the Italian “National Center for High-Performance, Big Data and Quantum Computing” (ICSC). Offloading the workflow is made possible through interLink, a technology that enables heterogeneous resources to be integrated as virtual nodes within the Kubernetes facility orchestration. The combination of RNTuple’s improved columnar data access and RDataFrame’s high-level programming model enables the use of multi-year datasets and significantly accelerates time-to-analysis. The resulting trends of DT segment hit multiplicity versus integrated and instantaneous luminosities can be used to study detector aging and to extrapolate detector performance to extreme pile-up conditions. Preliminary results show meaningful gains in execution time and scalability, demonstrating a viable approach for future CMS performance analyses in the HL-LHC era.
Speaker: CMS Collaboration -
226
ServiceX Update
ServiceX is an experiment-agnostic service that extracts columnar data from HEP datasets at scale. Its Python SDK enables researchers to efficiently access complex experimental data by implementing best practices for large-scale dataset processing. Users submit requests using high-level query languages, which generate code that executes within experiment-approved container images, with horizontal scaling provided by Kubernetes.
This presentation will provide an overview of ServiceX’s technical architecture and focus on key developments since CHEP 2024 in Kraków. Major advances include: significant improvements in reliability and performance at scales exceeding hundreds of terabytes; native support for experiment-specific processing frameworks; RDataFrame integration; and enhanced features for production deployments. We will present Purdue’s deployment as a case study, demonstrating how ServiceX preprocesses large datasets into efficient caches for typical analyses, and how those can be shared and stored via EOS for longer term use by analysis teams. Finally, we will discuss recent efforts to lower barriers to entry through improved documentation and onboarding workflows.
Speaker: Benjamin Galewsky (Univ. Illinois at Urbana Champaign (US)) -
227
Point, Click, Analyze: Enabling Zero-Install Data Exploration with Browser-Based Tools and GitLab CI at LHCb
Modern HEP analysis workflows are becoming increasingly complex and challenging, raising the barriers to quick data exploration. For LHCb, with its expanded Run 3 data volumes and growing analysis user base, reducing these barriers has become essential for efficient physics output. More recently, LHCb has moved to a declarative system, known as "Analysis Productions", that allows analysts to filter datasets on WLCG resources for further analysis. This system enables researchers to use Continuous Integration workflows to develop and validate analysis pipelines at scale. However, quick inspection and validation of the resulting filtered datasets traditionally required users to download data locally or maintain complex grid authentication, creating friction in the analysis cycle.
This talk presents LHCb's deployment of browser-based analysis tools that eliminate installation requirements while maintaining security and performance. We describe our implementation of JSROOT for interactive ROOT file visualisation directly in web browsers, enabling users to quickly make histograms from Analysis Productions outputs. For more complex checks, a JupyterLite deployment provides a full Python analysis environment with popular HEP libraries running entirely client-side via WebAssembly, allowing quick initial data exploration without local environment setup or dedicated infrastructure.
Central to these capabilities is our EOS token-based authentication system, which generates time-limited, scope-restricted URLs that provide secure data access without exposing long-lived credentials. This same mechanism provides GitLab CI pipelines with limited data access for automated validation, testing, and continuous analysis workflows without credential sprawl or security compromises.
The deployment has successfully reduced friction in Analysis Productions workflows, with browser-based inspection replacing local downloads for initial validation tasks. CI integration enables automated checks without credential management overhead. Performance is sufficient for exploratory analysis, though large-scale work continues on traditional compute infrastructure.
This demonstrates that token-based authentication combined with browser-native tools can substantially improve analysis accessibility. The approach requires minimal infrastructure—static file hosting for JupyterLite/JSROOT and token-capable storage—making it practical for other experiments to adopt.
Speaker: James Connaughton (University of Warwick (GB))
-
222
-
Track 9 - Analysis software and workflows
-
228
Optimizations and New Strategies for Native ROOT Data Loading for ML
Machine learning (ML) techniques are increasingly adopted in the High Energy Physics (HEP) field from large-scale production workflows to end-user data analysis. As such, we see datasets growing in size and complexity, making data loading a significant performance bottleneck, particularly when training workloads access large, distributed datasets with sparse ML reading patterns.
In HEP, such datasets are commonly stored in the ROOT data format. ROOT provides a native data loading tool designed to integrate seamlessly with ML training workflows, exposing data in batches suitable for model training. However, the data access patterns typical of ML applications can interact suboptimally with ROOT's storage layouts which are optimized for compression and throughput. Additionally, practical training scenarios often involve imbalanced datasets which can bias models toward majority classes if not properly handled, a bias that can be mitigated through several resampling techniques.
In this contribution, we investigate I/O optimization opportunities and new strategies for ROOT’s native data loading tool. We evaluate the impact of these optimizations through performance benchmarks on representative open-data use cases.
Speaker: Silia Taider (CERN) -
229
ML-based Flavour Tagging for LHCb Run 3
Flavour tagging (FT) is essential in heavy-flavour physics for determining the production flavour of neutral B mesons in time-dependent CP-violation and mixing parameter measurements, where it significantly impacts the sensitivity. For Run 3 of the LHC, the LHCb experiment has redesigned its FT strategy, exploiting recent advances in algorithm methodology and machine learning, including modern deep neural-network architectures. This contribution presents the status of the Run 3 FT developments.
Speaker: Borja Sevilla Sanjuan (La Salle, Ramon Llull University (ES)) -
230
Measurement of Quantum Correlations in $Z \to \tau^+\tau^-$ at DELPHI Using Machine Learning
Precision studies of $\tau^+\tau^-$ production in $e^+e^-$ collisions at LEP provide a clean environment for investigating spin correlations and quantum information observables. In the DELPHI experiment, the process $e^+e^- \to Z \to \tau^+\tau^-$ is well measured, but reconstruction of the $\tau^+\tau^-$ rest frame is challenged by the presence of multiple neutrinos in the final state. This limits the precision of spin-dependent measurements and quantum correlation studies.
We present a diffusion-based generative approach for reconstructing the $Z \to \tau^+\tau^-$ rest frame from detector-level inputs. The method performs conditional generation of neutrino momenta using visible objects and kinematic constraints, producing event-level kinematic hypotheses that can be used for further analysis. We demonstrate that this approach enhances the resolution of $\tau$-pair kinematics, enabling the accurate reconstruction of spin-sensitive observables. This work demonstrates a scalable strategy for multi-neutrino reconstruction at colliders and provides a computational foundation for quantum information studies using archived LEP data.
Speaker: Ting-Hsiang Hsu (National Taiwan University (TW)) -
231
Foundation Model for Physics Analysis at BESIII
While Foundation Models have revolutionized natural language processing and computer vision, their potential in high-energy physics remains underutilized. In this work, we introduce Bes3T, a Transformer-based Foundation Model tailored for BESIII data analysis, and present a publicly released benchmark Monte Carlo dataset comprising 100 distinct $\mathrm{J}/\psi$ decay channels. Bes3T employs a hybrid self-supervised pre-training strategy: masked modeling is used to capture fine-grained inter-particle correlations, while contrastive learning is integrated to enforce robust global event-level representations, thereby significantly enhancing accuracy in downstream classification tasks. We demonstrate that Bes3T achieves competitive performance on multi-class classification, matching the fidelity of conventional physics-motivated analysis techniques without relying on explicit physics priors or labor-intensive event selection cuts. Furthermore, the learned representations exhibit strong versatility, extending effectively to other downstream tasks such as clustering, anomaly detection and event reconstruction, as well as to other high-energy physics experiments beyond BESIII.
Speaker: Jingde Chen (Institute of High Energy Physics) -
232
Re-discovery of $Z_c(3900)$ at BESIII Based on Quantum Support Vector Machine (QSVM)
Quantum Machine Learning (QML) is an advanced data analysis technique that can detect data structures and build models for data prediction, classification, or simulation with less human intervention. However, for data analysis in high-energy physics (HEP) experiments, the practical viability of QML remains a topic of debate, and more examples of real data analysis on quantum hardware are needed to verify it. Against this background, our research focuses on the application of QML to the re-discovery of $Z_c(3900)$, which was first observed by the BESIII collaboration in 2013 while analyzing the decay process of $Y(4260)$. The dataset is high-quality and supported by a mature analysis framework, making it well-suited for testing new approaches such as QML.
Using the same $525 \ \mathrm{pb}^{-1}$ of data collected at $\sqrt{s} = 4.26 \ \mathrm{GeV}$, this study applies the Quantum Support Vector Machine (QSVM) method to the event selection, using classical cut-based and ML-based analysis strategies as references. To evaluate the impact of a realistic hardware environment, the analysis will also be performed on real quantum hardware via cloud services. A 1-D fit will be applied to the selected dataset to extract the parameters of $Z_c(3900)$ and to evaluate the selection efficiency. The invariant mass distribution will then be plotted and compared with the results of the traditional analysis.
Speaker: Siyang Wu (Shandong University) -
233
Future-Ready Restoration: A Case Study on AI RAG-Enhanced Agentic Revival of a Run-2 2016 Λb → Λγ Analysis
We present a prototype Retrieval-Augmented Generation (RAG) and agentic LLM tool designed to accelerate and support high-energy physics analyses. As a case study, we applied the system to the published 2016 Λb → Λγ Run 2 analysis. Reproducing legacy workflows is often slow and error-prone due to fragmented code, dispersed documentation, personnel turnover, and software evolution over multiple data-taking periods. Our tool integrates semantic retrieval, code-understanding capabilities, and autonomous agent actions to navigate these challenges.
The system ingests heterogeneous knowledge sources—including archived analysis scripts, notes, published documentation, and the modern LHCb software stack—and uses a vector database to build a unified retrieval layer. Built on general-purpose LLMs (e.g., Claude, GPT-x), the assistant can propose analysis strategies, generate or repair scripts, and autonomously perform several reconstruction and analysis tasks. In internal tests, the prototype demonstrated the potential to compress workflows that normally require months or years of manual effort.
We discuss the tool’s architecture, agentic capabilities, and safeguards, as well as its current limitations. Verification is ongoing to evaluate reliability, detect errors, and benchmark performance against non-RAG LLM baselines. This early case study highlights the promise of RAG-augmented, agent-driven systems in restoring, verifying, and accelerating complex HEP analyses, pointing toward future end-to-end AI-assisted analysis pipelines.
Speaker: Cilicia Uzziel Perez (La Salle, Ramon Llull University (ES))
-
228
-
19:00
Welcome Reception
-
-
-
Plenary
-
234
Beyond Code Generation: Building an LLM Ecosystem for Physics Analysis
Large Language Models (LLMs) are increasingly used in particle physics as coding agents, but their role is expanding from software assistance to building the scientific analysis workflow itself. This talk examines how LLMs can function as connective elements across the stages of a modern high-energy physics analysis, from dataset discovery and metadata retrieval to analysis specification, plotting, and possibly workflow orchestration. We survey emerging applications beyond code generation, briefly highlighting operational and experiment-support use cases, and then focus on analysis as a structured, multi-step process with distinct opportunities for automation. Using open-source agent interfaces and composable Model Context Protocol (MCP) tools, we discuss how reusable “skills” can expose experiment services in a controlled and deterministic way, enabling LLMs to generate validated plotting and analysis workflows rather than ad hoc snippets. We discuss open questions around termination criteria, validation, and the boundary between framework and agent, and outline the requirements for a field-wide shared ecosystem—common interfaces, skill libraries, and evaluation practices—that supports rigorous, reproducible deployment of LLMs for physics in the HL-LHC era.
Speaker: Gordon Watts (University of Washington (US)) -
235
Towards AI scientists: Human-AI Collaboration in the Future of HEP
The integration of AI, especially Large Language Models (LLMs) and autonomous agents, is reshaping the way data-intensive research is conducted in HEP. This talk presents a vision of this transformation through Dr.Sai, a pioneering LLM-powered multi-agent system developed at BESIII. Dr.Sai interprets physicists' natural language queries, autonomously decomposes them into subtasks (e.g. data skimming, fitting), orchestrates scientific tools, and executes full physics analysis workflows with high reliability, full traceability, and reproducibility. It provides a practical and scalable blueprint for what an "AI scientist" could look like in real experimental environments.
Building on this experience, we will present IHEP's strategic roadmap for "AI+HEP", including a new AI-centric computing platform and the establishment of an open collaboration aimed at generalizing this paradigm across multiple HEP facilities. Finally, we will discuss how AI can evolve from a productivity tool into an active partner in scientific reasoning, outlining a human-AI collaboration to accelerate discovery in HEP and beyond.
Speaker: Ke LI -
236
Good AI for Physics Needs AI Ethics and Good AI Ethics Needs Physics
Artificial intelligence is rapidly becoming inextricable from physics research, with growing attention now turning to the semi-autonomous roles AI agents might play in scientific discovery. Yet many of the measurements and evaluations underpinning AI research lack the rigor and reliability typically expected to support knowledge production in physics. In this talk, I will explore the risks that flawed system and model design, mismeasurement, and misaligned metrics pose to the epistemic foundations of physics and to broader society. I will also present recent research and initiatives aimed at bringing physics and scientific methodology to bear on AI development, public technical literacy, and sociotechnical systems design.
Speaker: Savannah Thais (Hunter College)
-
234
-
10:30
Break
-
Plenary
-
237
AI and Cybersecurity: Power, Risk, and Responsibility
This talk explores the security implications of adopting AI in operational environments, with a strong focus on the often-underestimated risks of data leakage, model misuse, and over-trust in automated outputs. It emphasizes why human review and human-in-the-loop controls remain essential, especially when AI systems are integrated into security-critical workflows. The talk also introduces how AI is reshaping the threat landscape, from attacker-driven automation and social engineering at scale to the impact of aggressive scraping on data ownership and model behavior. AI can be a broad force multiplier, including in security, when guided by human judgment and used appropriately.
Speaker: Jose Carlos Luna Duran (CERN) -
238
Panel Discussion: Opportunities and Responsibilities for Using AI in Particle Physics
This panel discussion will allow CHEP 2026 attendees to hear different perspectives on how Artificial Intelligence technologies and tools can impact researchers in high energy and nuclear physics. As will be discussed in the plenary and parallel program of the conference, AI is already impacting the way HEP researchers write code; formulate analysis workflows; make hardware purchasing decisions; operate systems; and disseminate results. This panel aims to discuss open questions across these areas and issues that researchers should consider, given the rapid growth of AI in science and industry.
Speaker: Harris Tzovanakis (CERN)
-
237
-
12:30
Lunch
-
Track 1 - Data and metadata organization, management and access: Data integrity, storage reliability and evaluation
-
239
ODGDFS: A High-Performance User-Space Object Storage Engine for HEP Data Challenges
Currently, High Energy Physics (HEP) faces increasingly severe data storage challenges. Next-generation particle collider experiments are expected to generate unprecedented data volumes and acquisition rates, demanding continuous I/O capabilities with sub-millisecond latencies and PB/s-level throughput. Traditional kernel-based file systems, burdened by context switching, interrupt handling, and heavy metadata overheads, struggle to fully unleash the performance potential of emerging NVMe SSD hardware, and have become a critical bottleneck in experimental data processing and analysis pipelines.
To address this, we present ODGDFS, a user-space object storage engine optimized for HEP data access patterns, originating from the JwanFS project at IHEP. Built upon the SPDK Blobstore, the system minimizes software stack overhead by completely bypassing the OS kernel and employing lock-less polling I/O with a lightweight metadata architecture. Its core innovations include:
- Flat Metadata Organization: Designed a custom superblock-backed metadata scheme that utilizes in-memory hash indexing to achieve O(1) complexity for object localization, effectively eliminating the overhead of multi-level directory lookups found in traditional file systems;
- Zero-Copy Tail Cache: Proposed a tail cache aggregation mechanism to optimize small-scale asynchronous write patterns common in HEP experiments, significantly reducing write amplification while boosting sequential write throughput;
- Stream Decoupling & Lazy Loading: Implemented the logical decoupling of data and index streams alongside a lazy-loading architecture, maintaining efficient memory and CPU utilization while supporting thousands of concurrent data volumes.
Preliminary benchmarks based on rigorous stress testing have confirmed the system's stability and correctness under high-concurrency simulated workloads. It is foreseeable that, when handling typical HEP workloads, ODGDFS will demonstrate significant improvements in I/O throughput and stable low-latency performance, providing a scalable and efficient storage solution for managing massive datasets in future large-scale experimental data centers.
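To illustrate the flat metadata organization described above (and only as a hypothetical sketch, not the ODGDFS implementation), an in-memory hash index that maps an object name directly to its on-device extent gives average O(1) lookups with no multi-level directory walk:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical illustration only: names and layout are not the ODGDFS API.
struct ObjectExtent {
    std::uint64_t offset;   // position of the object on the blob/device
    std::uint64_t length;   // object size in bytes
};

class FlatIndex {
public:
    void put(const std::string& name, ObjectExtent extent) {
        index_[name] = extent;                      // single hash insertion
    }
    std::optional<ObjectExtent> find(const std::string& name) const {
        const auto it = index_.find(name);          // average O(1), no directory walk
        if (it == index_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::unordered_map<std::string, ObjectExtent> index_;   // in-memory hash index
};
```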
Speaker: zhuo meng (Institute of High Energy Physics) -
240
Storage tests for the German Tier-2 WLCG contribution with the NHR HPC cluster Emmy: ATLAS use case
German computing sites play a vital role in the Large Hadron Collider (LHC) job processing and data storage as part of the Worldwide LHC Computing Grid (WLCG). The storage and computing contributions of university-based Tier-2 centres in Germany are transitioning to the Helmholtz Centres and National High Performance Computing (NHR) sites, respectively, to meet the growing data and computational demands of the High-Luminosity LHC (HL-LHC). This transition requires scalable strategies for data storage, data distribution, and their integration with computing centres. Research towards this transition is conducted in Goettingen, where both the Tier-2 WLCG centre GoeGrid and the NHR HPC cluster Emmy are located. The GoeGrid batch system is extended with containers, allowing Emmy nodes to act as virtual worker nodes for LHC job processing. Since large local mass storage is not planned at NHR centres for WLCG operations, alternative storage solutions are being evaluated. Two promising approaches are being tested: pre-caching with a small storage instance of the order of 100 terabytes at GoeGrid, and direct data access via WAN to the Helmholtz Centres DESY and GridKa. Their performance has been benchmarked on Emmy using the current WLCG workflow management system. While results are promising, ongoing studies are exploring potential bottlenecks to enable scalability and applicability at other computing centres without large local storage. The concepts, performance results, and limitations of these approaches are presented, offering insights into future LHC computing.
Speaker: Inga Katarzyna Lakomiec (Georg August Universitaet Goettingen (DE)) -
241
Enhanced Data Integrity for Reliable WLCG Third-Party Copy Transfers
Large-scale scientific collaborations such as WLCG need reliable and secure data transfers that optimize the available bandwidth and resources of the grid. HTTP-based third-party copy (TPC) transfers follow a de-facto community standard for moving files directly between storage endpoints (peer-to-peer). Here we report on an extension to that standard promoting improved data integrity through implementation of the IETF RFC 3230 (Instance Digest) standard for end-to-end checksum verification. The protocol is designed to integrate transparently with existing TPC workflows, enabling automatic digest negotiation and validation without affecting current operations.
Early implementations of this updated protocol include storage backends such as EOS and CERN Tape Archive (CTA) and transfer orchestration via FTS. Adopting this new standard introduces its own challenges, such as computing checksums on the fly during large-scale transfers and ensuring consistent validation across heterogeneous storage systems. At the same time, it creates opportunities, including stronger token-based security, improved verification mechanisms, and better reproducibility of distributed workflows.
Finally, in this work we explore the lifecycle of this new development, from the original idea through ongoing work to future plans to establish it as the default data integrity mechanism for LHC Run 4.
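One of the challenges mentioned above, computing checksums on the fly, amounts to updating a digest incrementally as data streams through the transfer so that the final value can be compared with the digest advertised by the remote endpoint. A minimal sketch, assuming the ADLER32 algorithm widely used by WLCG storage and the adler32() helper from zlib:

```cpp
#include <zlib.h>     // provides adler32(); link with -lz
#include <cstddef>
#include <cstdint>

// Incrementally updated ADLER32 digest: feed each buffer as it is transferred,
// then compare value() with the checksum reported by the remote endpoint
// (e.g. via an RFC 3230 "Digest" header). Illustration only, not FTS/EOS code.
class StreamingAdler32 {
public:
    StreamingAdler32() : value_(adler32(0L, Z_NULL, 0)) {}   // canonical initial value

    void update(const unsigned char* buf, std::size_t len) {
        value_ = adler32(value_, buf, static_cast<uInt>(len));
    }

    std::uint32_t value() const { return static_cast<std::uint32_t>(value_); }

private:
    uLong value_;
};
```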
Speaker: Hugo Gonzalez Labrador (CERN) -
242
Data Integrity and Recovery System for Distributed Storage in ALICE
Author: Andreea Prigoreanu (University Politechnica Bucharest)
on behalf of the ALICE collaboration
The processing of ALICE experiment data relies on high-quality and reliable storage. The central file catalogue serves as the database that tracks over 2.6 billion files and their locations across more than 50 storage elements on the ALICE Grid. It is essential that the physical storage contents remain consistent with the catalogue to prevent errors when users or jobs access data and to avoid data loss. One of the solutions ALICE currently uses to monitor the consistency is a distributed file crawler that periodically evaluates samples of files from each storage element to gather statistics about the number of corrupted or inaccessible files. While effective for detecting inconsistencies, the method is constrained by its sampling-based nature, and addressing the issues requires manual investigation and intervention on the storage itself.
EOS is the most widely deployed storage system across the ALICE disk-based storage sites, managing approximately 275 PB of the total 360 PB of ALICE disk-based data. By leveraging the powerful internal checking tools provided by EOS, the ALICE-wide EOS Integrity and Recovery System is a new solution to address data integrity for files stored on EOS instances throughout the ALICE Grid. It complements the existing strategy by collecting and analyzing error information from EOS FSCK reports, accessed through the HTTP interface available in recent EOS versions. This approach enhances data monitoring across ALICE storage systems, providing a comprehensive view of storage health for EOS deployments. In addition to identifying issues and generating file consistency reports, the Integrity and Recovery System automates the recovery process by invoking the experiment's recovery procedures to reconcile the contents of the storages with the central file catalogue.
I will present the system's key aspects: storage configuration requirements, automated FSCK report retrieval and analysis, the decision algorithm for flagging files requiring recovery, and the integration with the recovery procedures.
Speaker: Andreea Prigoreanu (IT-SD) -
243
Design and Implementation of an AI-Driven Disk Failure Prediction System for Large-Scale HEP Storage Clusters
In high energy physics (HEP) experiments, large-scale storage clusters typically comprise tens of thousands of disks, and their reliability is essential for continuous data acquisition, processing, and long-term preservation. Traditional rule-based disk failure detection approaches are increasingly insufficient for such environments due to heterogeneous device types, complex workload patterns, and dynamically changing operational behaviors. To address these challenges, this paper presents the design and implementation of an AI-driven disk failure prediction system tailored for large-scale HEP storage clusters.
The system leverages SMART low-level telemetry and introduces a quasi-online feature selection mechanism capable of automatically identifying key indicators strongly correlated with disk failures, thereby enabling lightweight and scalable online feature updates. By combining historical failure statistics with real-time operational metrics through a multidimensional threshold fusion strategy, we develop an intelligent prediction model capable of forecasting disk failures on a daily basis. Deployment in a production environment with over ten thousand disks demonstrates that the system significantly improves early failure detection and reduces unplanned storage downtime.
The system adopts a microservice architecture with standardized RESTful APIs, supports elastic deployment via Docker and Kubernetes, and integrates seamlessly with Prometheus-based monitoring stacks. These capabilities enable automated inference, anomaly alerting, and system state visualization. Experimental results show that the proposed AI-based disk failure prediction system achieves strong prediction accuracy, real-time responsiveness, and scalability, providing a practical and effective solution for enhancing reliability in future exabyte-scale HEP storage infrastructure.
Speaker: LI Haibo
-
239
-
Track 2 - Online and real-time computing
-
244
ATLAS Event Filter muon reconstruction for the HL-LHC
The High Luminosity Large Hadron Collider (HL-LHC) is scheduled to begin operation in 2030 and will increase the number of proton-proton collisions per bunch-crossing from around 60 to 200. The upgraded trigger system of the ATLAS experiment will record around 10 kHz of collisions to disk for physics analysis; this reduction is achieved with an L0 trigger that will feed the Event Filter (EF) at a rate of up to 1 MHz. An important signature in the EF is high transverse momentum muons. The HL-LHC conditions require significant improvements to the existing muon reconstruction algorithms used in the EF. In this talk we will present developments for the ATLAS EF muon reconstruction, including the use of Machine Learning algorithms. The performance of the updated software is tested on simulated samples and on ATLAS Run 3 data. Significant improvements in the speed of the algorithms are seen, while maintaining high muon-selection efficiency.
Speaker: ATLAS Collaboration -
245
Filtering hits for speeding up track reconstruction at hadron colliders
Trigger systems make it possible to quickly inspect the reconstructed physical quantities obtained from collisions at hadron colliders, in order to decide whether to save the corresponding detector data for offline analysis. The processing of the data coming from pixel detectors is a crucial challenge for the experiments running at the Large Hadron Collider (LHC) at CERN, because of the large number of secondary collisions per bunch crossing, so-called pile-up vertices, which give rise to extremely high hit occupancies. Track reconstruction is a combinatorial problem for which the processing time strongly depends on the average pile-up per event; considering the future accelerator-complex upgrade to the High-Luminosity LHC, the computational cost of the current trigger strategies is expected to exceed the available computing resources. To address this issue, a new approach to assist track reconstruction by filtering out unnecessary pile-up hits is presented and characterized. The algorithm is based on a convolutional neural network (CNN) architecture, which can be easily deployed on accelerator cards, and is designed to receive as input a 2D representation of overlaid signal and pile-up hits and return as output an image containing only signal hits. Training and testing the algorithm on an independently generated synthetic dataset with signal tracks in the range of 20 to 50 GeV, we show a background rejection factor of order 1500 at the 99% efficiency working point, proving the potential of this approach in terms of both physics performance and computational gain.
Speaker: Alessandro Zaio (INFN e Universita Genova (IT)) -
246
Real-Time event reconstruction for Nuclear Physics Experiments using Artificial Intelligence
Charged-particle track reconstruction is a central component of nuclear physics experiments, providing the foundation for identifying and analyzing particles produced in high-energy interactions. While traditional techniques—such as pattern-recognition algorithms and Kalman-filter–based tracking—have long been the standard, modern machine learning (ML) methods are increasingly addressing the challenges posed by complex detector geometries, high occupancies, and significant noise. Neural networks, graph neural networks (GNNs), and recurrent architectures have demonstrated improved accuracy, robustness, and scalability by learning directly from simulated and experimental data. These models can classify and select track candidates, resolve ambiguities from overlapping or missing hits, and predict full particle trajectories, all with the potential to operate in near-real-time. As computational capabilities advance, ML-driven tracking is becoming a transformative component of large-scale experiments, from the LHC to Jefferson Lab.
In this talk, we present recent progress in AI-enhanced charged-track identification within the CLAS12 detector, where machine-learning methods deliver significant gains in usable statistics over conventional reconstruction. We also demonstrate real-time event-reconstruction capabilities, including fast inference of particle momentum, direction, and species identification at data-acquisition speeds. These developments enable physics observables to be extracted directly from the experiment in real time, opening new paths toward high-precision and high-throughput nuclear science.
Speaker: Gagik Gavalian (Jefferson National Lab) -
247
Development of streaming data reconstruction for ePIC experiment at EIC
The Electron-Ion Collider (EIC) will introduce new paradigms in large-scale nuclear physics experiments. With luminosities reaching up to 10³⁴ cm⁻²s⁻¹, the ePIC experiment must process extremely large data volumes and therefore adopts a flexible, scalable, and efficient streaming data acquisition model. This system replaces custom level-1 trigger electronics, enables the use of commercial computing resources, achieves virtually deadtime-free operation, and allows detailed, software-based event selection.
In the streaming readout scheme, continuous detector signals are digitized and filtered by FPGAs from O(100 Tbps) to O(10 Tbps) and assigned precise time stamps (Echelon-0).
These signals are aggregated into “time frames,” representing the full detector signal view within a given time window. These time frames are transferred to online computing farms for prompt reconstruction and software event filtering (Echelon-1). In Echelon-1, data reduction from O(10 Tbps) to O(10 Gbps) is required. EICrecon is the software framework used to achieve this.
Filtered data are subsequently streamed to distributed computing facilities worldwide (Echelon-2) for full reconstruction.
We present the development of streaming reconstruction for the ePIC experiment, focusing on event-selection algorithms implemented in the EICrecon framework. Performance studies on efficiency, background rejection, purity, tracking, and vertexing will be discussed.
Speaker: Takuya Kumaoka (University of Tsukuba (JP)) -
248
Regional reconstruction of tracks for the ATLAS Event Filter using GPU accelerators
The upcoming high-luminosity phase of the LHC (HL-LHC) presents several challenges for the ATLAS experiment's Trigger and Data Acquisition system, necessitating a
full upgrade of the system. A key challenge for the Event Filter, where high-level event reconstruction and final event selection will run at 1 MHz, lies in the computational demand for online track reconstruction within the Inner Tracker in selected regions
of interest at the full trigger rate and of the full tracker acceptance at 150 kHz. Over the past few years, extensive research has been conducted into utilising hardware accelerators in the ATLAS Event Filter system to improve tracking throughput and reduce
full-system power consumption. Various end-to-end track reconstruction pipelines have been developed using GPUs and FPGAs. These pipelines demonstrate their capabilities by offloading different amounts of the computing load to the accelerators.
This contribution focuses on developments and optimizations for GPU-based track reconstruction in regions of interest. The scaling of throughput and latency with the size and occupancy of regions of interest has been studied for GPU-based tracking pipelines
originally designed for efficiently reconstructing full tracker acceptance simultaneously. Different approaches for improving the utilization of the GPU resources for the smaller regions are presented and compared to the full acceptance algorithms as well
as the CPU counterparts.
Speaker: ATLAS Collaboration
-
244
-
Track 3 - Offline data processing: Tracking 2
-
249
Memory Layouts in the ALICE Track Reconstruction
In high performance computing, we strive for algorithms on large arrays to be as performant as possible. However, the performance of such an algorithm is also affected by the memory layout of these arrays. The most natural memory layout is Array-of-Structures (AoS), which performs well for strided access patterns and for large classes. On the other hand, Structure-of-Arrays (SoA) allows for efficient vectorization upon sequential access.
Switching between different memory layouts usually requires significant changes to the surrounding code. Thus, as part of CERN’s “Next Generation Triggers” project, we have implemented two lightweight C++ libraries to abstract the memory layout from the algorithms operating on it. The first approach uses C++17 template metaprogramming while the second approach uses “reflection”, a new C++26 metaprogramming feature.
The goal of both approaches is to incur zero runtime overhead. Thus, we present benchmarks comparing these approaches to hard-coded memory layouts. Moreover, we use the second approach to show the impact of memory layouts on the performance of the ALICE O2 TPC track reconstruction on CPU and GPU.
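As a hedged illustration of the trade-off discussed above (not the Next Generation Triggers libraries themselves), the same reduction written against an AoS and an SoA container shows why switching layouts normally forces changes to the surrounding code:

```cpp
#include <cstddef>
#include <vector>

// Array-of-Structures: natural to write, whole objects stay together.
struct HitAoS { float x, y, z; int label; };
using HitsAoS = std::vector<HitAoS>;

// Structure-of-Arrays: each field is contiguous, which favours vectorization
// when an algorithm only touches a subset of the fields.
struct HitsSoA {
    std::vector<float> x, y, z;
    std::vector<int>   label;
};

// The same kernel written twice -- the code change a layout switch would force.
float sum_x_aos(const HitsAoS& hits) {
    float s = 0.f;
    for (const auto& h : hits) s += h.x;   // x is interleaved with y, z, label
    return s;
}

float sum_x_soa(const HitsSoA& hits) {
    float s = 0.f;
    for (std::size_t i = 0; i < hits.x.size(); ++i) s += hits.x[i];  // contiguous reads
    return s;
}
```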
Speaker: Dr Oliver Gregor Rietmann (CERN) -
250
Forward decay chain reconstruction with displaced vertices inside intense magnetic fields
Precise reconstruction of particle decay chains is an essential tool for a wide range of analyses in particle physics experiments, particularly those focused on flavour dynamics and CP violation. We present a novel decay tree reconstruction framework designed to handle complex topologies with deeply constrained particle decays, trajectory extrapolations over long distances inside regions with intense and inhomogeneous magnetic fields, and limited momentum resolution. Implemented in C++, the framework is modular, extensible, and optimised for both configurability and performance. It has been integrated into the LHCb software environment with the primary purpose of reconstructing decay chains featuring very displaced vertices located between 2.5 and 8 metres from the LHC primary proton-proton interaction point, but it is flexible enough to handle any type of decay chain. While developed with LHCb in mind, it is suitable for any experiment with a forward geometry. We benchmark its performance using simulated data, demonstrating large improvements in reconstruction efficiency, invariant mass resolution, and vertex position resolution compared to currently existing tools. This framework opens up additional possibilities for precision measurements and searches in present and future accelerator experiments.
Speaker: Izaac Sanderswood (Univ. of Valencia and CSIC (ES)) -
251
ACTS Integration for ATLAS Phase-II Track Reconstruction
The ATLAS experiment is undertaking a major modernisation of its reconstruction software to meet the demanding conditions of High-Luminosity LHC (HL-LHC) operations. A key element of this effort is the use of the experiment-independent ACTS toolkit for track reconstruction, which requires a major redesign of several parts of the current ATLAS software. This contribution will describe the ACTS integration work, with a focus on the inner tracker and muon reconstruction, and will present the expected track-reconstruction performance, highlighting the improvements gained by adopting the ACTS toolkit.
Speaker: Noemi Calace (CERN) -
252
First functional prototype for the reconstruction of charged particle tracks using machine learning in the ATLAS experiment at the HL-LHC
The High-Luminosity LHC (HL-LHC) will bring large increases in collision rate and pile-up. This represents a significant surge in both data quantity and complexity. In addition to excellent physics performance, a high computational efficiency is critical to fully exploit the HL-LHC
datasets. In response, substantial R&D efforts in machine learning (ML) have been initiated by the ATLAS collaboration to develop faster and more efficient algorithms capable of managing this deluge of data.
Charged particle tracking is the most computationally costly aspect of the reconstruction of data from the ATLAS detector. We present the first functional prototype of an ML-based track reconstruction algorithm for the ATLAS experiment at the HL-LHC. It is fully integrated into the
software stack of the ATLAS collaboration (“athena”), and can be run on heterogeneous GPU clusters via a technique we call “tracking-as-a-service”. Charged particle reconstruction is performed using a graph neural network, combined with custom algorithms for high-throughput graph generation and graph segmentation. The functional prototype that deploys this pipeline is the result of a sustained and coordinated R&D
effort over the past seven years. After a brief summary of the physics performance, we report a standardized suite of metrics.
Speaker: Jan Stark (Laboratoire des 2 Infinis - Toulouse, CNRS / Univ. Paul Sabatier (FR)) -
253
Self-distillation of Reusable Sensor-level Representations for High Energy Physics
Liquid argon time projection chambers (LArTPCs) provide dense, high-fidelity 3D measurements of particle interactions and underpin many current and future neutrino and rare-event experiments. Event reconstruction typically relies on complex detector-specific pipelines that use tens of hand-engineered pattern recognition algorithms or cascades of task-specific neural networks that require extensive, labeled simulation.
We introduce Panda, a training paradigm that learns reusable sensor-level representations directly from raw unlabeled TPC data. Panda couples a hierarchical sparse 3D encoder with a multi-view, prototype-based self-distillation objective. On a simulated dataset, we show that Panda substantially improves label efficiency and reconstruction quality, beating the previous state-of-the-art semantic segmentation model with 1,000$\times$ fewer labels. We also show that a single set-prediction head, 5% the size of the backbone and with no physical priors, trained on frozen outputs from our pre-trained network can deliver particle identification comparable with state-of-the-art reconstruction tools.
We introduce this model and training method as a step towards general purpose sensor-level foundation models for high energy and nuclear physics.
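For readers unfamiliar with multi-view, prototype-based self-distillation, the following is a schematic sketch of such an objective in the DINO style; all shapes, temperatures, and names are illustrative assumptions, not Panda's actual implementation:

```python
# Schematic of a multi-view, prototype-based self-distillation objective
# (DINO-style); hyperparameters and shapes are illustrative, not Panda's.
import torch
import torch.nn.functional as F

K, D, B = 1024, 256, 8            # prototypes, feature dim, batch size
prototypes = torch.randn(K, D)    # shared prototype bank

def assign(features, temp):
    # Similarity of features to prototypes, softened into a distribution.
    logits = F.normalize(features, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return F.softmax(logits / temp, dim=-1)

student_view1 = torch.randn(B, D)            # encoder output, view 1
teacher_view2 = torch.randn(B, D)            # EMA-teacher output, view 2

p_student = assign(student_view1, temp=0.1)
with torch.no_grad():
    p_teacher = assign(teacher_view2, temp=0.04)  # sharper target

# Cross-entropy between teacher and student assignments across views.
loss = -(p_teacher * torch.log(p_student + 1e-8)).sum(dim=-1).mean()
print(loss)
```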
Speaker: Sam Young
-
249
-
Track 4 - Distributed computing
-
254
A generalized Workflow Manager for Continuous Gravitational-wave search
We describe a set of tools developed to ease the execution of large computing campaigns over multiple and different computing resource providers. The tool suite has been adopted to perform the All-sky Continuous GW search on the data of the fourth LIGO-Virgo-KAGRA Observation cycle (O4), running CPU payloads on the IGWN Grid, INFN-CNAF, and the ICSC Grid (based on HTCondor, with different behaviors/capabilities) and GPU jobs on HPC resources (CINECA/Leonardo HPC, based on Slurm). Several features have been implemented to enable a uniform job submission pattern: input data retrieval from every possible source, checkpointing support where available, resilient output file delivery, automatic resubmission of failed jobs, and support for all needed authentication/authorization methods such as X509, voms-proxy, scitokens and IAM tokens. Furthermore, a secure method to provide jobs with fresh credentials when needed has been implemented. An internal accounting system has also been put in place, which allows for custom metric collection; these data enable an accurate profiling of how efficiently the different computing resources are used, which in turn enables further configuration tuning. The tool proved to be robust, reliable, and general enough to perform unattended, with only minimal routine checks for maintenance. It has been adopted to run several other searches from within the Virgo Rome CW Group by simply modifying a few configuration files. We describe the main components of the tool, how these have been configured to run an All-sky search, how the different resource providers performed, and what possible improvements are foreseen.
Speaker: Stefano Dal Pra (INFN) -
255
Aligning DIRAC Workflows with CWL: A Unified and Reproducible Workflow Model for Grid-Scale Computing
Delivering reproducible computational workflows across heterogeneous and distributed computing infrastructures remains a significant challenge for many scientific communities. Workflow standards such as the Common Workflow Language (CWL) offer a portable and declarative means to describe complex pipelines, but their integration into large-scale, data-driven workload management systems remains an open and evolving area.
DIRAC is a workload and workflow management system used by scientific collaborations to operate distributed computing resources spanning grids, clouds, and high-performance computing systems. While DIRAC provides mature mechanisms for job scheduling, data management, and large-scale productions, it has historically relied on a combination of DIRAC-specific workflow descriptions expressed through Python APIs, XML payloads, and Job Description Language (JDL) files. This fragmentation complicates interoperability with external workflow tools and limits the reuse of workflows outside the DIRAC ecosystem.
In this paper, we present the current state of the integration of CWL into DIRAC as a unified workflow specification. Rather than using CWL as a simple submission or translation layer, we progressively align the DIRAC workflow model with CWL semantics. CWL is used to express workflow structure, execution steps, resource requirements, and containerized execution environments, while DIRAC retains responsibility for data handling and large-scale execution on distributed resources. This work is conducted in the context of DiracX, the next-generation evolution of DIRAC, and builds on early technical exchanges with CWL maintainers.
We report on the implementation status and initial feedback from scientific communities experimenting with CWL-based workflows within DIRAC. These early results highlight both the benefits and the remaining challenges of operating CWL workflows at grid scale. They also illustrate how adopting a standard workflow language can improve portability, interoperability, and reproducibility across distributed computing environments.
Speaker: Ryunosuke O'Neil (CERN) -
256
IceCube Takes Flight with Pelican - A First Experience
After a long delay and false starts, the IceCube Neutrino Observatory has removed GridFTP and x509 certificate authentication. We have migrated to using the Pelican Platform, the Open Science Data Federation, and WLCG Tokens. While this is a common solution, we required several customizations to work with our existing data warehouse structure and make it easier for scientists to use. We wrote a custom WLCG token issuer to support our POSIX filesystem with custom user and group permissions across the entire filesystem. We also made several modifications to HTCondor to ease job submission using tokens, including a custom credmon. After an initially bumpy transition due to several now-resolved Pelican issues, the Pelican-based system has already proven superior to GridFTP in several ways.
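As a hedged sketch of the authorization idea behind such a token issuer: the WLCG common JWT profile expresses storage permissions as scopes of the form storage.read:/path, which a service can map onto POSIX-style path checks. The claim string, paths, and helper below are invented for illustration and are not IceCube's actual issuer code:

```python
# Sketch of mapping a WLCG token's scope claim onto POSIX path checks.
# Scope strings follow the WLCG JWT profile pattern "storage.<op>:<path>";
# the claim and paths are illustrative.
def authorized(scope_claim: str, action: str, path: str) -> bool:
    # Return True if any scope grants `action` on `path` or a parent directory.
    for scope in scope_claim.split():
        if ":" not in scope:
            continue
        op, prefix = scope.split(":", 1)
        if op == f"storage.{action}" and (
            path == prefix or path.startswith(prefix.rstrip("/") + "/")
        ):
            return True
    return False

claim = "storage.read:/data/exp storage.modify:/scratch/users/alice"
print(authorized(claim, "read", "/data/exp/2026/run001.h5"))    # True
print(authorized(claim, "modify", "/data/exp/2026/run001.h5"))  # False
```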
Speaker: David Schultz (University of Wisconsin-Madison) -
257
Evolving CRIC for the HL-LHC Era: Supporting WLCG and Experiment Operations
The Computing Resource Information Catalogue (CRIC) is a central element of the WLCG information ecosystem and a key operational tool for ATLAS Distributed Computing, providing authoritative, experiment-oriented views of sites, services, data-management endpoints and configuration parameters across distributed infrastructures. In preparation for HL-LHC, CRIC has undergone a major evolution: a full migration from a legacy stack to a modern Python 3 implementation, with the creation of FWKWEB — a lightweight web framework that now serves as the foundation for CRIC and future web services. This evolution enabled a cleaner modular structure around the CORE-CRIC components and VO-specific plugins, significantly improving maintainability, performance, and flexibility. New developments include enhanced interfaces for pledge and accounting data, Functional Test–driven automation in ATLAS-CRIC, extended topology and network metadata in WLCG-CRIC, and a prototype for unified FTS configuration. This contribution outlines the evolution of CRIC architecture, its growing role as a shared backbone for WLCG and experiment operations, and the roadmap ensuring its scalability and sustainability towards the HL-LHC era.
Speaker: Panos Paparrigopoulos (CERN) -
258
Workflow Architecture and Implementation for Simulation Data Production in HERD
The High Energy cosmic Radiation Detection facility (HERD) is a long-term space-based high-energy physics experiment onboard the China Space Station, expected to produce large and heterogeneous datasets, including flight data, simulation data, and multi-version reconstructed data. To efficiently support large-scale computing and long-term physics analysis, a unified data management and workflow system, HERD DOM (HERD Dataflow Management and Operation Monitoring), is under development.
This contribution presents the workflow-driven automation of simulation data production in HERD. A visual DAG-based workflow engine is employed to orchestrate simulation tasks, enabling an end-to-end automated pipeline covering parameter configuration, job submission, distributed resource scheduling, data validation, distributed storage registration, and metadata cataloguing. The workflow system is tightly integrated with local cluster environments, and simulation data are managed through Rucio, ensuring full traceability, reproducibility, and scalability of the data production process.
A prototype of the simulation workflow system has been deployed and is operating stably, supporting large-scale automated simulation production, job monitoring, and data management. The workflows for flight data processing, calibration data management, and integrated operational monitoring have been fully designed and will be progressively validated with real data in the next development stages. This work provides a practical solution for large-scale data processing in long-running space-based high-energy physics experiments.
Speaker: Qi Luo (Computing Center, Institute of High Energy Physics, CAS)
-
254
-
Track 5 - Event generation and simulation: Event Generation 1
-
259
Hardware Acceleration of NLO Event Generation with MadGraph4GPU and PEPPER
The High-Luminosity LHC will reach unprecedented precision in the measurements of key observables in proton-proton collisions. To accurately predict the rates of such collision events, the simulation of the hard scattering event must include higher-order corrections, in particular Next-to-Leading Order (NLO) terms in the perturbative expansion of the cross section.
The computational complexity of evaluating these terms contributes significantly to the overall CPU hours spent by the ATLAS and CMS experiments, and these costs are projected to surpass the available budgets in the near future. This calls for speeding up NLO event generation through more efficient use of available, and new, hardware, such as datacentre GPUs and vector CPUs.
MadGraph5_aMC@NLO and SHERPA are event generation frameworks heavily used by ATLAS and CMS. Recently, MadGraph4GPU and PEPPER were introduced, which offload the key bottlenecks of LO calculations in the two frameworks, respectively, to hardware accelerators, achieving considerable speed-ups. Both tools are being validated for deployment on the Worldwide LHC Computing Grid and on HPC resources.
In this contribution we will present the latest results and plans concerning the hardware acceleration of NLO event generation in MadGraph4GPU and PEPPER.
Speaker: Daniele Massaro (CERN) -
260
New GPU developments in the Madgraph CUDACPP plugin: kernel splitting, helicity streams, cuBLAS color sums
The first production release of the CUDACPP plugin for the Madgraph5_aMC@NLO generator, which speeds up matrix element (ME) calculations for leading-order (LO) processes using a data parallel approach on vector CPUs and GPUs, was delivered in October 2024. This was described at CHEP2024 and in other previous publications by the team behind that effort. In this CHEP2026 contribution, I present my work on some additional developments and optimizations of CUDACPP, mainly but not exclusively for GPUs. A detailed description of this work is available in https://arxiv.org/abs/2510.05392. The new approach, which represents a major restructuring of the CUDACPP computational engine, primarily consists in splitting the ME calculation, previously performed using a single large GPU kernel, into many smaller kernels. A first batch of changes, involving the move to separate “helicity streams” and the optional offloading of QCD color sums to BLAS, has already been merged into a new CUDACPP release, in collaboration with my colleagues. Since then, I have completed a second batch of changes, involving the possibility to split the calculation into groups of Feynman diagrams in separate source code files, which has also been submitted as a pull request. This new feature makes it possible to compute QCD matrix elements for physics processes with a larger number of final state gluons: in particular, I present the first performance results from CUDACPP for the $2\!\rightarrow\!6$ process $gg\!\rightarrow\!t\bar{t}gggg$ on CPUs and GPUs and the $2\!\rightarrow\!7$ process $gg\!\rightarrow\!t\bar{t}ggggg$ on CPUs, which involve over 15k and 230k Feynman diagrams, respectively.
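To see why QCD color sums map naturally onto BLAS, note that for each event the squared matrix element is a quadratic form over the color-flow amplitudes, $|M|^2 = \sum_{ij} J_i^* C_{ij} J_j$; batched over events this becomes a matrix multiplication plus a row-wise contraction. A minimal NumPy sketch with illustrative sizes (not the CUDACPP code itself):

```python
# Batched color sum as a GEMM: jamp holds per-event color-flow amplitudes,
# C is the color matrix (real and symmetric in practice). Sizes are
# illustrative, not those of a real process.
import numpy as np

n_events, n_color = 16384, 24
rng = np.random.default_rng(0)

C = rng.standard_normal((n_color, n_color))
C = 0.5 * (C + C.T)                              # symmetrize
jamp = (rng.standard_normal((n_events, n_color))
        + 1j * rng.standard_normal((n_events, n_color)))

# BLAS-friendly form: one (n_events, n_color) x (n_color, n_color) product,
# then a row-wise contraction with the conjugate amplitudes.
me2_blas = np.einsum("ei,ij,ej->e", jamp.conj(), C, jamp).real

# Equivalent explicit per-event quadratic form (what a fused kernel does).
me2_loop = np.array([(j.conj() @ C @ j).real for j in jamp[:4]])
assert np.allclose(me2_loop, me2_blas[:4])
print(me2_blas[:4])
```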
Speaker: Andrea Valassi (CERN) -
261
Event Generation Acceleration on AI Engine Cores: A Case Study
The generation of hard-scattering events in high-energy physics is one of the computational bottlenecks in collider phenomenology. MadGraph provides a flexible framework to evaluate these matrix elements, but the sheer scale of Monte Carlo event production required at the LHC drives both execution time and power consumption to critical levels. In this work, we explore the use of Adaptive Compute Acceleration Platforms (ACAPs) and, in particular, their AI Engine (AIE) cores to accelerate the evaluation of matrix elements for different processes. We design and map the helicity-amplitude and color-summation structure of the computation onto clusters of AIE cores, exploiting both vectorized arithmetic and dataflow pipelining across tiles. Preliminary results indicate that the AIE-based implementation can significantly reduce latency while offering superior power efficiency compared to CPU and GPU architectures. While the complexity of multi-leg processes presents challenges for full FPGA acceleration, our study demonstrates the viability of AIE-based event generation as a scalable approach for next-generation Monte Carlo simulations at the LHC.
Speaker: Pelayo Leguina (Universidad de Oviedo (ES)) -
262
Generative AI for hadron physics
Modern accelerator facilities operating at the intensity frontier—such as CERN, Jefferson Lab, and the forthcoming EIC—produce petabyte-scale datasets that probe the structure of visible matter at the femtometer scale. Fully exploiting and preserving this information requires new AI-driven strategies for data analysis and modeling. We present a program to develop Machine-Learning-based Physics Event Generators (MLEGs) using state-of-the-art Generative Adversarial Networks (GANs) and Diffusion Models trained directly on scattering data.
The goals of this work are threefold: (i) reproduce reaction dynamics through data-driven generative modeling; (ii) unfold detector effects to recover underlying truth-level distributions; and (iii) enable model-independent analyses that yield new insights into the mechanisms of elementary scattering processes. A rigorous uncertainty-quantification framework is incorporated to characterize data, model, and generative uncertainties. This international collaboration between experimental and theoretical physicists, as well as computer scientists, aims to demonstrate how generative AI—specifically, GANs and Diffusion Models—can transform data analysis and long-term data preservation in nuclear and particle physics.
In this contribution, I will present the general framework and its validation in inclusive and semi-inclusive deep-inelastic electron–proton scattering and in exclusive hadron photoproduction.
Speaker: Marco Andrea Battaglieri (INFN e Universita Genova (IT)) -
263
Real-Time Dynamics in a (2+1)-D Gauge Theory: The Stringy Nature on a Superconducting Quantum Simulator
Understanding the confinement mechanism in gauge theories and the universality of effective string-like descriptions of gauge flux tubes remains a fundamental challenge in modern physics. We probe string modes of motion with dynamical matter in a digital quantum simulation of a (2+1) dimensional gauge theory using a superconducting quantum processor with up to 144 qubits, stretching the hardware capabilities with quantum-circuit depths comprising up to 192 two-qubit layers. We realize the Z2-Higgs model (Z2HM) through an optimized embedding into a heavy-hex superconducting qubit architecture, directly mapping matter and gauge fields to vertex and link superconducting qubits, respectively. Using the structure of local gauge symmetries, we implement a comprehensive suite of error suppression, mitigation, and correction strategies to enable real-time observation and manipulation of electric strings connecting dynamical charges. Our results resolve a dynamical hierarchy of longitudinal oscillations and transverse bending at the end points of the string, which are precursors to hadronization and rotational spectra of mesons. We further explore multi-string processes, observing the fragmentation and recombination of strings. The experimental design supports 300,000 measurement shots per circuit, totaling 600,000 shots per time step, enabling high-fidelity statistics. We employ extensive tensor network simulations using the basis update and Galerkin method to predict large-scale real-time dynamics and validate our error-aware protocols. This work establishes a milestone for probing non-perturbative gauge dynamics via superconducting quantum simulation and elucidates the real-time behavior of confining strings.
Speaker: Enrique Rico Ortega (CERN)
-
259
-
Track 5 - Event generation and simulation: Full Simulation 1
-
264
Geant4 electromagnetic physics for future experiments at colliders
In this presentation we review recent updates in the Geant4 electromagnetic (EM) physics sub-libraries in view of Run 4 and other collider experiments. The EM sub-libraries are evolving to make the code more robust, more compact, and compatible with the requirements of Run 4 detectors at the LHC and of other future collider experiments. A significant role in this respect is played by the G4HepEm sub-library, which is external to Geant4 but fully compatible with it and may be used as an alternative to the default EM configuration. Results obtained with this library are in statistical agreement with the default EM physics to high accuracy. A new development on the simulation of EM physics in bending crystals will be discussed, as well as the application of these methods to various components of particle accelerators. Additionally, the EM physics group is developing new approaches for simulating X-ray production processes and X-ray interactions with machine elements and their surfaces. A new design for the Cerenkov and scintillation processes is introduced in the recent Geant4 version 11.4. A new method for combining condensed-history standard processes with track-structure code for the Geant4-DNA project will also be presented.
Speaker: Prof. Vladimir Ivantchenko (CERN) -
265
Full-chain Simulation and Data-driven Refinement of the BESIII CGEM Detector Response
To meet the requirements of enhanced radiation tolerance and sustained tracking performance, the BESIII inner tracker has been upgraded to a Cylindrical Gas Electron Multiplier (CGEM). We have developed a comprehensive simulation framework for the CGEM response, featuring a realistic digitization model refined with experimental data. The framework simulates the full signal-formation chain (ionization, drift, multiplication, induction, and readout) and explicitly incorporates detector structural effects such as GEM foil sectorization and support grids. Key model parameters have been tuned against both cosmic-ray and collision data, and computational efficiency has been optimized. The tuned model shows good agreement with measurements, providing a reliable basis for detector calibration, alignment, tracking studies, and subsequent physics analyses.
Speaker: Xinnan Wang (IHEP) -
266
New developments in the experiment-independent Gaussino simulation software and its use in LHCb
Gaussino is an experiment-independent HEP simulation code built on top of the Gaudi software framework. It provides generic components and interfaces for event generation, detector simulation, geometry, monitoring and output. In this talk we give an overview of recent developments in Gaussino, and some examples of their adoption in the LHCb Simulation since our previous report at CHEP2024. In the generation phase, a recent restructuring of components and interfaces allows an easier integration of Matrix Element generators like Powheg and MadGraph and is being ported to all other generators, notably including Pythia. The move to the new Gaudi infrastructure for user configuration, which is more flexible and robust, is being completed. Progress is also being made towards the integration and testing of the AdePT component, which offloads part of the detector simulation to GPUs. More generally, the separation of the experiment-independent and LHCb-specific components is being finalised, to ease the adoption of Gaussino by other HEP experiments.
Speaker: Wojciech Krupa (CERN) -
267
Muon simulation in JUNO
The Jiangmen Underground Neutrino Observatory (JUNO) is a 20-kt liquid-scintillator neutrino detector in China, ~53 km from two nuclear power plant complexes. It aims to determine the neutrino mass ordering and precisely measure neutrino oscillation parameters, while enabling studies on solar, atmospheric, geoneutrino, and supernova neutrino physics. The detector construction was completed, and physics data-taking began in August 2025.
Cosmic-ray muons are the dominant source of muon-induced backgrounds in JUNO, producing MeV-scale secondaries and radioactive spallation products, including neutrons, in the low-energy region. The surrounding PMT-instrumented water-Cherenkov veto (water pool veto) provides efficient muon tagging, enabling the suppression of these backgrounds and the characterization of detector performance. With limited high-energy calibration sources, accurate muon simulation is also essential for developing and validating high-energy event reconstruction, including direction and energy reconstruction in atmospheric neutrino analyses.
In this contribution, we present the current status and recent developments of the muon simulation in JUNO. We summarize the full workflow starting
from the mountain overburden model and the MUSIC-based propagation of cosmic muons through rock, followed by muon tracking within the detector volume using a customized Geant4-based detector simulation. We further cover downstream steps
relevant to water pool detector response studies, including the simulation of optical photon production and transport, trigger and electronics effects, and the reconstruction and analysis of muon events in the water pool.
Speaker: Jing Chen (Sun Yat-sen University) -
268
Integration and Performance of the ATLAS FastChain Workflow towards Run 4
FastChain is a key component of ATLAS preparations for Run 4, providing a unified, configurable framework that integrates simulation, reconstruction, and downstream data reduction into a single end-to-end workflow. By eliminating intermediate data formats and enabling tight coupling between workflow stages, FastChain improves resource utilization efficiency and reduces disk I/O.
To improve scalability, the unified FastChain workflow incorporates and benchmarks advanced fast-simulation and reconstruction components, such as Fast Track Simulation (Fatras) and FastCaloSim. Ongoing R&D focuses on improving fast-simulation accuracy and robustness under Run-4 conditions using the Geant4 fast-simulation interface and simplified detector descriptions.
We report on ongoing efforts to integrate FastChain into the ATLAS production system, focused on establishing an optimal single end-to-end processing configuration to replace the traditional multi-step production chain. Current studies focus on benchmarking and characterizing CPU, memory, and I/O usage across both Grid and HPC environments, supported by emerging tools for workflow-level monitoring and performance analysis of the different combinations of fast-simulation and reconstruction techniques.
Together, these developments demonstrate how FastChain serves not only as a workflow consolidation effort but also as a flexible platform for integrating and validating fast-simulation technologies at scale. This approach is expected to play a central role in enabling sustainable and performant ATLAS production workflows for Run 4 and beyond.
Speakers: Martina Javurkova (University of Massachusetts (US)), Rui Wang (Argonne National Laboratory (US))
-
264
-
Track 6 - Software environment and maintainability: Ecosystems, Collaboration, and Workflows
Conveners: Arantza De Oyanguren Campos (Univ. of Valencia and CSIC (ES)), Gaia Grosso (IAIFI, MIT)
-
269
Building International Research Software Collaborations in Physics
Due to their scale, complexity and cost, large physics/astrophysics projects are very often international “team-science” endeavors. Over decades, these scientific communities have learned how to build collaborations upon regional capabilities and interests, iterating with each new generation of large scientific facilities required to advance their scientific knowledge.
While much of this effort has naturally focused on collaborations for the construction of hardware and instrumentation, software is now a critical element to design and maximize the physics discovery potential of large data intensive science projects. To fully realize their discovery potential a new generation of software algorithms and approaches is required. Building these research software collaborations is challenging and inherently international, matching the international nature of the experimental undertakings themselves.
We present the work of the HSF-India project to implement new and impactful research software collaborations between India, Europe and the U.S. The experimental scope of this project is relatively broad, aiming to bring together researchers across facilities with common problems in research. Beyond pursuits of general interest, HSF-India has initiatives specifically for the LHC experiments, DUNE, EIC/ePIC, Belle-II and LIGO. By exploiting national capabilities and strengths, the HSF-India international collaboration has fostered mutual benefits through hackathons, experiment-specific workshops and a training network. Together these have enabled early-career researchers to pursue impactful research software initiatives in ways that advance their careers in experimental data-intensive science. In this presentation, we will describe the scope of the HSF-India initiative, its mechanisms for fostering new collaborations, ways for interested research groups to get involved, and project accomplishments to date.
Speaker: Peter Elmer (Princeton University (US)) -
270
The European Virtual Institute for Research Software Excellence (EVERSE)
The EU-funded EVERSE project aims to establish a framework for research software and code excellence, collaboratively designed and championed by five European research communities, including physics and astronomy. EVERSE’s ultimate ambition is to contribute towards a cultural change where research software is recognized as a first-class citizen of the scientific process and the people who contribute to it are credited for their efforts.
This contribution will outline the network's goals and present its achievements so far, specifically the foundation of a European Network of Research Software Quality and the steps toward a future Virtual Institute for Research Software Excellence, including the creation of the EVERSE Network.
A major project output is the Research Software Quality Toolkit (RSQKit - https://everse.software/RSQKit/), which contains curated best practices, supporting tools, and resources for improving research software quality. RSQKit is designed for researchers, research software engineers, and those involved in research infrastructure or policy. Its practices are grounded in software excellence and quality within a research context, with a focus on FAIR software, Open Research, and effective engineering practices across different tiers of research software (analysis scripts, prototype tools, and infrastructure).
Finally, the contribution will detail how the EVERSE project interacts with the European Open Science Clusters (ENVRI-FAIR for environmental sciences, Life Sciences RI, ESCAPE for particle physics and astrophysics, PaNOSC for photon and neutron science, and SSHOC for social sciences and humanities) through real-world software use cases. These use cases, which draw best practices from various communities, are where the various elements of the software excellence framework are tested and implemented prior to their release to the wider community. Specifically, we will describe the three ESCAPE use cases for High Energy Physics, highlight the field’s contributions to EVERSE, and outline the expected pathway for improvements by the project's conclusion in 2027.
Speaker: Caterina Doglioni (The University of Manchester (GB)) -
271
HEP Packaging Coordination: Distributing the HEP software ecosystem on conda-forge
The packaging of high energy physics software with robust, yet flexible, distribution methods is a complicated problem that has been met with multiple approaches by the community. The HEP Packaging Coordination community project expands packaging of the HEP software ecosystem through building and distributing language-agnostic conda packages on the conda-forge package index. Through use of the conda-forge community build cyberinfrastructure, platform-specific optimized builds of packages can be created for selections of Linux, macOS, and Windows across the x86-64, AArch64/ARM64, and ppc64le architectures. In addition to supporting builds of ROOT, this work provides multi-platform packaging of a wide array of low-level-language phenomenology tools, the broader simulation stack, end-user-analysis tools and statistical frameworks, and the reinterpretation ecosystem. Ongoing work is also supporting builds of LHCb experiment software and distributions of community software with experiment-specific patches applied for use in LHC physics analyses.
This process significantly lowers technical barriers across tool development by providing automatic packaging systems with source code, distribution through secure and transparent build cyberinfrastructure, and enables use through multi-platform optimized binary builds. When combined with next generation scientific package management and manifest tools, the creation of fully specified, portable, and trivially reproducible multi-language software environments becomes easy and fast, even with the use of development platforms for hardware accelerators (e.g. CUDA on NVIDIA GPUs). This talk provides an overview of the work, gives practical recommendations for adoption and best practices for both software maintainers and end-user analysts, and demonstrates examples of new distribution methods that are complementary to existing community technologies, such as CernVM-FS.
Speaker: Matthew Feickert (University of Wisconsin Madison (US)) -
272
Franklin - Your passport to running custom code on the grid
The LHCb Analysis Productions system provides a large scale, centralised, and reproducible framework for executing analysis workflows on the grid using officially released LHCb software. However, some analyses require prototyping or development of custom modifications to core packages, which cannot easily be deployed within the standard release cycle. It is therefore desirable to enable customised software stacks to be executed within the Analysis Productions framework while retaining reproducibility and fair usage.
Franklin addresses this need by leveraging the lb-dev suite of tools to allow analysts to make modifications across LHCb software projects and build them in a consistent, reproducible environment. Through GitLab merge requests tightly integrated with Analysis Productions, the testing and release of these custom packages is managed systematically: CI pipelines first verify that the modified projects build successfully, after which Analysis Productions can be run on the resulting test artifacts to validate physics output and workflow behaviour. Once these checks pass, the custom build is merged and deployed to CVMFS under a versioned Franklin namespace, enabling seamless use in full scale productions.
Full reproducibility is ensured through GitLab releases that record the exact project versions, modifications and metadata used for each build, mirroring what is done for official LHCb software. Automated daily tests further guarantee that Franklin remains compatible with the latest releases as core software evolves.
Franklin enables rapid development of customised software, providing clearly tagged and reproducible releases, and supports robust output validation and execution with Analysis Productions.
Speaker: George Hallett (University of Warwick (GB)) -
273
NOvA Software and Computing: Evolution, Lessons, and Impact; Supporting a Decade of Physics
The NOvA experiment has delivered world-leading neutrino physics results over ten years, enabled by an evolving software and computing infrastructure that has adapted to major technical transitions while maintaining operational stability. This talk discusses how NOvA has integrated modern AI/ML workflows into traditional HEP pipelines and balanced innovation against the demands of continuous physics production.
NOvA's computing evolution demonstrates the importance of flexible architectures that accommodate new methodologies, from traditional reconstruction algorithms to machine learning-based event classification, without disrupting ongoing physics programs. The experiment's experience highlights strategies for planning infrastructure when future requirements cannot be fully anticipated, managing technical debt while pursuing innovation, and maintaining continuity across framework transitions.
This talk will present representative examples of NOvA's computing evolution and discuss lessons applicable to other long-duration experiments navigating similar challenges in rapidly changing computing environments.
Speaker: Dr Gavin Davies (University Of Mississippi)
-
269
-
Track 7 - Computing infrastructure and sustainability
-
274
Toward an IPv6-native, cloud-native ATLAS site: Scalable production-ready grid storage on Kubernetes
The increasing computational scale and complexity of frontier scientific experiments, such as the ATLAS experiment at the Large Hadron Collider, continues to motivate a drive toward operational models that are resilient, automated, reproducible, and scalable. The University of Victoria (UVic) remains at the forefront of advancing cloud-native deployment patterns to address these challenges. Previous work established a cloud-native architecture for a complete ATLAS Tier 2 site on Kubernetes, including a functional prototype EOS storage element, but relied on a basic IPv4-only network design for the Kubernetes cluster. To overcome scalability and performance limitations associated with load balancing and software-defined routing in OpenStack, and to satisfy ATLAS inter-site connectivity requirements, we designed a new cluster network architecture using direct-attached IPv6 addresses. We also improved performance, scalability, observability and robustness in the container network plane, and streamlined service routing, by switching to eBPF-based technology. Moreover, we migrated to an advanced load balancer capable of locality-aware address assignment, reducing latency and eliminating redundant lateral traffic flows within the cluster. Following these enhancements, we assess bandwidth scalability through benchmarks and demonstrate a significant performance optimization using the EOS shared filesystem redirection feature for direct CephFS access. Finally, we describe additional improvements to the EOS Helm chart, and the operational benefits of a fully containerized cloud-native deployment based on production experience.
Speaker: Ryan Taylor (University of Victoria (CA)) -
275
The successful removal of IPv4 from WLCG wide-area network links
The use of the networking protocol IPv6 for storage on the Worldwide LHC Computing Grid (WLCG) has been very successful and has been presented at earlier CHEP conferences. The campaign to deploy IPv6 on CPU services and worker nodes is going well. Dual-stack IPv6/IPv4 is not, however, a viable long-term solution; the ultimate goals include allowing WLCG sites to move completely to IPv6. Several WLCG sites have stated their wish to move soon to IPv6-only. We are close to being able to allow this, with the agreement of the experiments they support. We also continue to aim for all WLCG WAN traffic to use IPv6.
This paper reports on work since CHEP2024. Firstly, we report on the deployment of IPv6 on CPU services and worker nodes. Then, we present further work to identify and correct the use of IPv4 between two dual-stack endpoints. We then describe our plans and proposed timescales for moving WLCG wide-area network links on the LHCOPN and LHCONE networks to “IPv6-only”. Since September 2025, the Tier1 centres in the USA, followed by some other Tier1 centres, have successfully removed IPv4 from their LHCOPN links to CERN with no major problems observed so far. We present plans for the remaining LHCOPN links, including the aim to complete work on all links by the end of 2026, in time for the WLCG 2027 data challenge (DC27). Longer term, the aim is to cease use of IPv4 on WLCG data transfers over LHCONE before the start of HL-LHC Run 4. We present the steps required to make this possible.
Speaker: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES)) -
276
A phased phase-out of IPv4 at a WLCG Tier-1
The driver for phasing out IPv4 at the Nordic Tier-1 site (NT1, aka NDGF-T1) sooner rather than later is that we foresee a significant risk of running out of IPv4 addresses when scaling storage servers horizontally in order to handle the High Luminosity LHC (HL-LHC) data rates. We expect data rates 10-20 times higher than today when the HL-LHC comes online in 2030, and the most cost-effective way to serve them is with a larger number of storage servers than today. And in order to prove that we are ready for HL-LHC data taking in 2030, it would be good to finish the bulk of the IPv4 phase-out by Data Challenge 2027.
This move comes with many constraints, though. Since only "almost all" services understand IPv6 today, we cannot completely shut IPv4 down without considering how the legacy systems can access data. There might also be unknown dependencies on IPv4 in the access or management of services that we will only detect in testing or production. Individual scientists might want to access the data outside of the grid, for instance from their own laptops, which might not have IPv6 yet. The physics experiments might even have reasons to run legacy software for reproducibility, some of it too old for IPv6 support.
Together this indicates a phased approach, and this presentation will cover the current status and remaining plans, the trade-offs we have had to make, and the progress towards the end goal of IPv6 only.
Speaker: Mattias Wadenstein (University of Umeå (SE)) -
277
Workflow-Aware Traffic Classification for HEP Data Movement
High-energy physics experiments routinely perform petabyte-scale file transfers across distributed grid sites while simultaneously streaming data for interactive analysis, making traffic type differentiation critical for network orchestration, bandwidth forecasting, and responsiveness to operational demands. We present a machine learning–based traffic classification system that requires no payload inspection and operates directly on raw packet headers. At its core is the Workflow Identification Window (WIW) abstraction, which groups packets from multiple flows into short temporal sequences, preserving timing gaps and directionality. These sequences are fed into deep neural models such as CNN and LSTM, removing the need for manual feature engineering and allowing automatic discovery of discriminative patterns. Using traffic collected between Fermilab and U.S. storage sites as our use case, our system achieves over 94% accuracy in controlled tests and maintains 84% performance on previously unseen traces, demonstrating that tightly spaced bursts in early flow phases provide stable classification signals.
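As an illustration of the WIW abstraction described above, the sketch below groups header-level packet records into short fixed-duration windows while preserving inter-arrival gaps and direction; the field layout and window length are assumptions for illustration, not the paper's exact preprocessing:

```python
# Sketch of the Workflow Identification Window (WIW) idea: group
# header-only packet records from concurrent flows into short windows,
# keeping inter-arrival gaps, direction, and size as model inputs.
import numpy as np

WINDOW_S = 0.5  # window duration in seconds (assumed, for illustration)

# Columns: (timestamp, direction: +1 outbound / -1 inbound, size in bytes)
packets = np.array([
    (0.01, +1, 1500), (0.02, +1, 1500), (0.30, -1, 64),
    (0.55, +1, 1500), (0.61, -1, 64),   (0.95, +1, 1500),
])

def windows(pkts, width):
    # Yield per-window sequences of (inter-arrival gap, direction, size).
    pkts = pkts[np.argsort(pkts[:, 0])]
    for w in np.unique(pkts[:, 0] // width):
        chunk = pkts[pkts[:, 0] // width == w]
        gaps = np.diff(chunk[:, 0], prepend=chunk[0, 0])
        yield np.column_stack([gaps, chunk[:, 1], chunk[:, 2]])

for seq in windows(packets, WINDOW_S):
    print(seq)  # one (length, 3) sequence per window, ready for a CNN/LSTM
```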
For HEP operations, this capability brings multiple benefits: first, it enables predictive bandwidth allocation; second, it supports early traffic shaping for bulk transfers; third, it ensures responsive prioritization of streaming sessions; and fourth, it lays the foundation for self-driving network services where workflows dynamically trigger adaptive QoS and routing. We are using the CMS experiment at the LHC as a use case, concentrating on U.S. sites that are connected via ESnet. ESnet's High-Touch service provides the essential packet-level visibility for capturing the fine-grained timing and flow patterns that enable our classification approach. For our CMS use case across U.S. ESnet-connected sites, this classification capability is the first step toward more efficient data distribution and analysis responsiveness. But the approach potentially has wide applicability to other HEP projects and beyond. In the long term, the approach aims at improving HEP-wide data movement performance by reducing transfer delays, balancing bandwidth utilization across distributed sites, and providing real-time workflow visibility to guide experiment planning and resource optimization.
Speaker: Anna Giannakou -
278
Scitags at Scale: Terabit Packet Marking and WLCG Data Challenge Readiness
Research and Education Networks (RENs) transport vast amounts of scientific data, but gaining granular visibility into this traffic is difficult. Understanding the composition of this traffic is essential for enabling efficient network use, traffic steering, future provisioning, and capacity planning. Traditional network flow data offers only limited insight into the specific activities driving bandwidth. The Scitags (Scientific Network Tags) initiative addresses this by providing an open platform to identify the "owner" and "purpose" of science flows. Scitags uses a public registry of standardized identifiers and implements generic packet and flow marking to enable precise traffic accounting and visibility for overlay networks like LHCOPN and LHCONE.
In this presentation, we focus on the deployment status and technical evolution of the Scitags ecosystem, specifically regarding readiness for the upcoming WLCG Data Challenge. We will detail the current support within data management systems (Rucio, DIRAC, ALICE), storage infrastructures (XRootD, EOS, StoRM, dCache, Echo, Pelican), as well as the readiness of REN collectors and dashboards. A key highlight of the talk is the introduction of flowd-go, a newly developed packet marking service designed for high-performance environments. We present an overview of the packet marking approach, the details of the 1.1 Tbps WAN Network Research Exhibition (NRE) demonstration at the ACM/IEEE International Conference on High Performance Computing, Networking, Storage, and Analytics (SC25), and the prospects for implementing and using packet marking on production networks.
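To make the packet-marking idea concrete, the sketch below packs registered owner and activity identifiers into the 20-bit IPv6 flow label. The 10/10-bit split and the example identifiers are purely illustrative; the actual Scitags specification defines its own field widths and reserved bits:

```python
# Illustration of flow-label packet marking: encode an "owner" (experiment)
# and an "activity" identifier into the 20-bit IPv6 flow label so that
# network collectors can account traffic per science workflow.
OWNER_BITS, ACTIVITY_BITS = 10, 10  # illustrative split, not the real spec

def flow_label(owner_id: int, activity_id: int) -> int:
    assert 0 <= owner_id < 2**OWNER_BITS
    assert 0 <= activity_id < 2**ACTIVITY_BITS
    return (owner_id << ACTIVITY_BITS) | activity_id

def decode(label: int) -> tuple[int, int]:
    return label >> ACTIVITY_BITS, label & (2**ACTIVITY_BITS - 1)

label = flow_label(owner_id=17, activity_id=3)  # hypothetical registry IDs
print(hex(label), decode(label))                # collectors read this back
```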
Speakers: Marian Babik (CERN), Tristan Sullivan (University of Victoria (CA))
-
274
-
Track 9 - Analysis software and workflows
-
279
A study of CMS analysis pipelines through the Integration Challenge
The upcoming High-Luminosity Large Hadron Collider (HL-LHC) at CERN will deliver an unprecedented volume of data for High Energy Physics (HEP). This wealth of information offers significant opportunities for scientific discovery, but its scale challenges traditional analysis workflows. In this talk, we present CMS analysis pipelines being developed to meet HL-LHC demands. These pipelines build on the broader scientific Python ecosystem, complemented by solutions specifically designed for HEP.
A central focus of the talk is the Integration Challenge, an IRIS-HEP led effort aimed at assessing the readiness of the developed software stack for use in real-world physics analysis and at improving the readiness of analysis facilities for the HL-LHC era. The Integration Challenge acts as an end-to-end integration test: by implementing a complete physics analysis pipeline, it evaluates tool interoperability and the overall user experience for analysts. The current pipeline includes columnar data processing, machine learning, statistical inference, and visualization tasks covering a variety of CMS analysis scenarios.
In addition, the Integration Challenge explores efficient strategies for delivering skimmed data using diverse tools and data formats, as well as evaluating the ServiceX data-delivery system for HEP analyses. Throughout the testing phase, we also investigated several prototype services—such as histogram-as-a-service capabilities—along with other emerging services that may support future HL-LHC analysis workflows.
Speakers: Mohamed Aly (Princeton University (US)), Oksana Shadura (University of Nebraska Lincoln (US)) -
280
The ATLAS Integration Challenge: studying physics analyses towards the HL-LHC
The last few years have seen a wide range of developments towards scalable solutions for end-user physics analysis to meet the upcoming HL-LHC computing challenges. The IRIS-HEP software institute has created projects in a “Challenge” format to checkpoint the progress. The “Analysis Grand Challenge” probes analysis workflows and interfaces with a limited dataset size, while the “200 Gbps Challenge” focuses on throughput at large scale. A new Challenge has recently been created to complement these, combining aspects from both: the “ATLAS Integration Challenge”. It defines a physics analysis task that captures the scale of available ATLAS data and the complexity of ATLAS analysis needs.
This contribution provides an overview of the analysis task in the Integration Challenge and of the pipeline developed for it. The implementation features two stages. Starting from datasets in a lightweight format for ATLAS physics analysis called PHYSLITE, NTuples are produced on the WLCG using ATLAS CP algorithms. These NTuples are then further processed into histograms for statistical inference at the University of Chicago Analysis Facility using the Scikit-HEP ecosystem of libraries. The Challenge focuses on three aspects in particular, which will be discussed: ensuring feature completeness for the physics analysis task, quantifying computational needs and performance to identify and address bottlenecks, and providing solutions for rapid turnaround for physics analysis development.
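As a minimal sketch of the ntuple-to-histogram stage with Scikit-HEP tools (uproot and hist), where the file, tree, and branch names are hypothetical stand-ins rather than the Challenge's actual inputs:

```python
# Read one column from a flat ntuple and fill a histogram destined for
# statistical inference. File name, tree "nominal", and branch "m_jj"
# are hypothetical.
import uproot
import hist

events = uproot.open("ntuple.root")["nominal"].arrays(["m_jj"])

h = hist.Hist.new.Reg(50, 0, 500, name="m_jj", label="m_jj [GeV]").Double()
h.fill(m_jj=events["m_jj"])

print(h.sum())
```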
Speakers: Alexander Held (University of Wisconsin Madison (US)), Artur Cordeiro Oudot Choi (University of Washington (US)) -
281
ServiceX in the ATLAS Integration Challenge: Data Delivery for HL-LHC Analyses
As the HL-LHC prepares to produce increasingly large volumes of data, the need for efficient data extraction and access services is growing. To address this challenge, the ServiceX toolset was developed to connect user-level analysis workflows to remotely stored datasets. ServiceX functions as a query-based sample delivery system, where client requests trigger Kubernetes-distributed workloads running at facilities with high-bandwidth connectivity to the WLCG. Additionally, ServiceX provides a simple user interface that leverages declarative syntax to define data extraction queries, enabled by a server-side architecture that includes code-generation and data-finder services. Modern analysis frameworks can use ServiceX as the first step in event selection, efficiently reducing file sizes and accelerating data access with minimal boilerplate. This talk presents the ServiceX toolset and demonstrates its use within a modern, full-scale ATLAS analysis pipeline from the IRIS-HEP Integration Challenge.
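The declarative query pattern can be illustrated with a self-contained mock; the Query class, dataset identifier, and method names below are hypothetical stand-ins, not the actual ServiceX client API, which builds a comparable description client-side and ships it to the server for code generation and distributed execution:

```python
# Mock of a declarative, server-evaluated query: the client only records
# the requested selection; evaluation happens remotely against the dataset.
class Query:
    def __init__(self, dataset):
        self.dataset, self.steps = dataset, []

    def where(self, expr):          # record a filter; do not execute locally
        self.steps.append(("where", expr))
        return self

    def select(self, expr):         # record the slimmed columns to return
        self.steps.append(("select", expr))
        return self

q = (Query("rucio://mc23_13p6TeV:DAOD_PHYSLITE.12345")   # hypothetical DID
     .where(lambda e: e.met > 100.0)
     .select(lambda e: {"met": e.met, "njet": e.njet}))

# A real client would now submit q and receive reduced files back.
print(q.dataset, len(q.steps))
```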
Speaker: Artur Cordeiro Oudot Choi (University of Washington (US)) -
282
Data Processing Challenges and Framework Solutions at the High Energy Photon Source
The High Energy Photon Source (HEPS) is a fourth-generation, high-energy synchrotron radiation facility scheduled to enter its early operational and commissioning phases by the end of 2025. With its significantly enhanced photon brightness and detector performance, HEPS is expected to generate over 200 petabytes (PB) of experimental data annually across 14 beamlines in Phase I, with data volumes rapidly approaching the exabyte scale. HEPS supports a wide variety of experimental techniques, including imaging, diffraction, scattering and spectroscopy, which produce data with highly diverse characteristics in terms of throughput, volume and latency. The increasing complexity of experimental methods introduces unprecedented challenges for large-scale data processing.
To address the future EB-scale experimental data processing demands of HEPS, we have developed DAISY (Data Analysis Integrated Software System), a general scientific data processing software framework. DAISY is designed to enhance the integration, standardization, and performance of experimental data processing at HEPS. It provides key capabilities, including high-throughput data I/O, multimodal data parsing, and multi-source data access. It supports elastic and distributed heterogeneous computing to accommodate different scales, throughput levels, and low-latency data processing requirements. It also offers a general workflow orchestration system to flexibly adapt to various experimental data processing modes. Additionally, it provides user software integration interfaces and a development environment to facilitate the standardization and integration of methodological algorithms and software across multiple disciplines.
Based on the DAISY framework, we have developed multiple domain-specific scientific applications, covering imaging, diffraction, scattering and spectroscopy, while continuously expanding to more scientific domains. Furthermore, we have optimized key software components and algorithms to significantly improve data processing efficiency. At present, several DAISY-based scientific applications have already been successfully deployed on HEPS beamlines, supporting online data processing for users. The remaining applications are scheduled for deployment according to the project plan, further strengthening the data analysis capabilities of HEPS.
Speaker: Dr Yu Hu (IHEP, CAS) -
283
Framework for vectorized computations for the Final Daya Bay data release
The Daya Bay Reactor Neutrino experiment has released its full dataset of neutrino interactions with the final-state neutron captured on gadolinium, collected during 9 years of operation. The dataset was complemented by a model of the experiment in Python and a few analysis examples, reproducing the final measurement of neutrino oscillation parameters sin²2θ₁₃ and Δm²₃₂ with any of the four released file formats.
The model and a light version of the data are available as PyPI packages. A dedicated framework, dag-modelling, is developed to provide sufficient numerical performance for a highly involved pythonic model. The framework implements lazily evaluated directed acyclic graphs with vectorized data and persistent caching. The nodes of the graph are boosted by using numba. Key highlights include scalable graph initialization with broadcasting based on multidimensional indexing, implemented with nested dictionaries; detailed graph annotation, separated from the initialization code; and seamless support of multiple input data formats. dag-modelling is an early-stage successor of the GNA framework (C++), used by one of the Daya Bay groups internally.
The talk covers the Daya Bay data release and the modelling requirements, the design of the framework, the implementation of the Daya Bay model, and performance indicators. The future of its support and development is discussed.
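A toy sketch of the lazily evaluated, cached DAG pattern that dag-modelling implements (greatly simplified, with invented class and method names; not the framework's actual API):

```python
# Lazy DAG with vectorized data and caching: nodes evaluate on demand,
# cache their result, and can be invalidated when inputs change.
import numpy as np

class Node:
    def __init__(self, func, *parents):
        self.func, self.parents = func, parents
        self._cache = None          # persists until the node is tainted

    def touch(self):
        # Invalidate this node's cache (a real framework would also
        # taint all descendant nodes).
        self._cache = None

    def data(self):
        if self._cache is None:     # evaluate only on demand
            inputs = [p.data() for p in self.parents]
            self._cache = self.func(*inputs)
        return self._cache

flux     = Node(lambda: np.linspace(1.0, 2.0, 5))    # source node
xsec     = Node(lambda: np.full(5, 0.5))             # source node
observed = Node(lambda f, s: f * s, flux, xsec)      # vectorized product

print(observed.data())  # triggers evaluation of the whole chain once
print(observed.data())  # second call is served from the cache
```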
Speaker: Dr Maxim Gonchar (Joint Institute for Nuclear Research)
-
279
-
Poster
-
Track 1 - Data and metadata organization, management and access: FAIR data, metadata standards and preservation
-
284
The Paradox of persistence: Strategies for metadata versioning and FAIR publication in NAPMIX
The NAPMIX project aims to establish a cross-domain FAIR-compliant metadata schema for the Nuclear, Astro, and Particle (NAP) physics communities. A core challenge is reconciling the evolving nature of experimental metadata, enriched progressively from proposal through analysis, with the immutability required by Persistent Identifiers (DOIs) for findability and interoperability. This contribution presents NAPMIX's architectural strategy for resolving this “Paradox of Persistence”: the tension between dynamic metadata and immutable DOIs.
Inspired by GANIL's DOI workflow, we evaluate two integration patterns widely discussed in the FAIR community: (A) Serialization, embedding NAPMIX metadata into standard DataCite fields; and (B) Decoupling, issuing separate DOIs for the metadata objects. To address their limitations, NAPMIX implements a "Live Metadata Layer" via a Django-based service.
This system distinguishes Static Infrastructure Metadata (e.g. accelerator configurations) from Dynamic Experimental Parameters (e.g., reaction kinematics), enabling granular versioning. Dataset DOIs reference immutable metadata snapshots, while the NAPMIX backend maintains an evolving canonical record tracked via stable Property IDs.
Finally, we outline the integration path with community catalogues such as OpenNP and SciCat, illustrating how versioned metadata objects can interconnect Digital Research Products across repositories. This approach ensures that, while data streams remain immutable, their descriptive metadata evolves in a controlled, FAIR-compliant manner, benefiting researchers and institutions through cross-repository discoverability.
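A conceptual sketch of the "Live Metadata Layer" described above: a mutable canonical record keyed by stable Property IDs, from which immutable snapshots are frozen for DOI registration. Class, field, and property names are illustrative; the actual NAPMIX service is Django-based:

```python
# Mutable canonical record with frozen, DOI-referenceable snapshots.
from dataclasses import dataclass, field
from types import MappingProxyType

@dataclass
class LiveRecord:
    property_values: dict[str, str] = field(default_factory=dict)  # keyed by stable Property ID
    version: int = 0

    def update(self, prop_id: str, value: str):
        self.property_values[prop_id] = value   # metadata keeps evolving
        self.version += 1

    def snapshot(self):
        # Freeze the current state; this immutable view is what a DOI cites.
        return (self.version, MappingProxyType(dict(self.property_values)))

rec = LiveRecord()
rec.update("napmix:beam_energy", "30 MeV/u")      # hypothetical property
doi_snapshot = rec.snapshot()                     # immutable, DOI-referenced
rec.update("napmix:beam_energy", "32 MeV/u")      # live record moves on
print(doi_snapshot[1]["napmix:beam_energy"])      # still "30 MeV/u"
```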
Speaker: Mr Ivan Knezevic (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)) -
285
Multi-Layer Metadata Architecture for Cross-Community Metadata Management in PUNCH4NFDI
The PUNCH4NFDI consortium (Particles, Universe, NuClei and Hadrons for the German National Research Data Infrastructure) comprises the astro-, astroparticle, particle and nuclear physics communities, which historically carry out computationally intensive research on big data. Their data life cycles are characterized by well-established data curation practices; by highly diverse metadata embedded in custom file headers, storage hierarchies, or analysis frameworks; and by a lack of explicit machine-actionable representations of data-related domain knowledge. This limits cross-community data discoverability and reuse. Achieving these goals while preserving compatibility with existing community standards necessitates innovative solutions for the management of digital research outputs at the level of the PUNCH4NFDI Science Data Platform (SDP), as well as support for the full cycle of FAIR (Findable, Accessible, Interoperable, Reusable) metadata curation.
The basic units to capture research outcomes in an interoperable and reproducible way are Digital Research Products (DRPs). By bundling digital resources with rich contextual and provenance metadata, DRPs are designed to enable reproducible and modifiable analyses across the federated PUNCH4NFDI infrastructure and support blind discovery, live analysis, and live peer review.
DRP metadata constitutes the central layer of the PUNCH4NFDI multi-layer metadata architecture. It builds on the core DataCite-based layer and extends DRP terms through a layer of discipline-specific metadata. Schema alignment activities define which concepts belong to the cross-community core and how they map to provider-specific conventions. The resulting hierarchical model enables consistent metadata reuse and full traceability across the research process.
We present the PUNCH4NFDI SDP multi-layer metadata architecture and its underlying services, covering the integration of diverse use cases and outlining ongoing efforts toward FAIR data in fundamental physics.
Speaker: Dr Victoria Tokareva (Karlsruhe Institute of Technology) -
286
Evolution of the ATLAS EventIndex towards HL-LHC
The ATLAS EventIndex is the global catalogue of all real and simulated data produced and processed by ATLAS. The current implementation, developed and deployed for LHC Run 3 (2022-2026), has to evolve in order to ingest, store and serve the much larger data volumes that will be produced during the High-Luminosity LHC operation years, starting in 2030. The modular architecture of the EventIndex system allows the progressive replacement of components as needed, but the replacement of the core storage system can be seen as a phase transition for the whole environment. By design, the EventIndex components are based on open-source software developed and used for BigData projects, and this foundation will remain for the future implementation too. Currently the data are stored in HBase tables with a Phoenix interface that allows SQL queries in addition to the native HBase commands; studies are in progress to evaluate the scalability of this solution to HL-LHC rates (three to five times the current rates of data ingestion and query), as well as the possibility of moving to alternative data storage solutions. In addition to ingestion and query performance, a significant factor is the long-term sustainability of the chosen storage solution, as the system designed in the next couple of years and deployed in 2029 will have to remain operational until after the end of LHC Run 4 (2035).
Speaker: Dario Barberis (University of California Berkeley (US)) -
287
Making Astroparticle Physics Data Reusable beyond Its Original Context
High-energy, nuclear and astroparticle physics operate at comparable scales of data volume and complexity and face closely related challenges in data preservation, metadata management, and long-term reuse. While these communities have developed robust experiment-specific data curation practices, metadata remains highly specific and heterogeneous, tightly coupled to custom formats, frameworks and execution environments. This reduces interoperability and the reusability of valuable datasets beyond their original analysis context. In particular, open data on cosmic rays collected by astroparticle physics experiments are valuable for addressing interdisciplinary research questions, such as studies of cosmic-ray-induced backgrounds for collider experiments or the validation and tuning of Monte Carlo simulations (GEANT4, FLUKA).
KASCADE (KArlsruhe Shower-Core and Array DEtector) was one of the first astroparticle physics experiments to provide open and full public access to its reconstructed datasets and digital resources, through the KASCADE Cosmic-Ray Data Centre (KCDC). Today KCDC is a long-running open-data archive that provides access to a variety of digital research outputs from a number of high-energy astroparticle physics experiments and continually works to improve its data curation procedures.
In this contribution, we present recent developments in FAIR-oriented metadata management for KCDC digital resources. We focus on enriching metadata records to improve domain knowledge representation and metadata-driven discovery, integration, and reuse in federated computing environments. The approach is aligned with infrastructure-level initiatives such as PUNCH4NFDI (Particles, Universe, NuClei and Hadrons for German National Research Data Infrastructure) and NAPMIX (Nuclear, Astro, and Particle Metadata Integration for eXperiments) and demonstrates how metadata solutions developed in astroparticle physics can support scalable, interoperable data reuse in the broader high-energy and nuclear physics computing ecosystem.
Speaker: Victoria Tokareva -
288
Unified Cloud-Native Metadata Interface with Trino, Superset, and Ibis in the LZ Dark Matter Experiment
LUX-ZEPLIN (LZ) is the world’s most sensitive WIMP dark matter direct-detection experiment, acquiring petabytes of data per year using a dual-phase xenon time projection chamber (TPC) with a seven-tonne active mass. User-facing metadata related to TPC conditions and data-processing environments are stored in six different SQL and NoSQL databases, which historically were accessed through five bespoke programmatic and graphical interfaces, each connecting to a unique subset of these databases. These fragmented interfaces were difficult to maintain and confusing for end users.
This work details how we used Helm to deploy Trino and Superset, providing a unified SQL interface and a unified graphical interface, respectively, for accessing the LZ metadata databases. This allowed us to immediately deprecate one graphical interface we maintained and to improve response times by up to a factor of 100 for common queries. We also discuss our adoption of Ibis to complement Trino and Superset through its ability to create composable SQL queries with a dataframe-like Python API. With minimal effort, Ibis has superseded a number of our other programmatic database interfaces, improving query expressiveness with a smaller code footprint.
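As a hedged illustration of the dataframe-style access pattern described above (the connection URL, table and column names are invented and do not reflect the LZ schema), an Ibis expression that Trino would ultimately execute might look like:

```python
import ibis

# Hypothetical Trino connection and table/column names, for illustration only.
con = ibis.connect("trino://analyst@trino.example.org:8080/lz/metadata")

runs = con.table("tpc_run_conditions")

# Composable, dataframe-like expression: filtered, aggregated, then compiled to SQL by Ibis.
expr = (
    runs.filter(runs.run_type == "physics")
        .group_by(runs.detector_state)
        .aggregate(
            n_runs=runs.run_id.count(),
            mean_hv_kv=runs.cathode_hv_kv.mean(),
        )
        .order_by(ibis.desc("n_runs"))
)

print(ibis.to_sql(expr))   # inspect the SQL that Trino will run
df = expr.execute()        # execute on the backend, return a pandas DataFrame
```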
Speaker: Eli Mizrachi (SLAC National Accelerator Laboratory) -
289
ILDG 2.0: FAIR data management in Lattice QCD and beyond
In this contribution we report on the re-factoring and re-configuration of main components of the International Lattice Data Grid (ILDG) in order to realize a modern data management framework which is fully FAIR-compliant and has a completely token-based access control.
ILDG started 20 years ago as an effort of the Lattice QCD community to organize and enable the worldwide sharing of large data sets from expensive numerical simulations. At that time, it leveraged grid technologies, like those used for the LHC grid. Two crucial aspects of the modernization of ILDG during the past three years were the setup of a new global user management through a dedicated INDIGO IAM instance, and the re-implementation of the metadata and file catalogs.
The new IAM was pivotal to the transition to a (capability-) token-based and fine-grained access control for all catalog and storage services. This completely eliminates the use of grid certificates, facilitates the use of data stores based on cloud technologies, and enables collaboration-internal data sharing with rigorous handling of embargo restrictions.
The metadata catalog supports multiple and freely configurable metadata schemata. Together with file catalogs this provides the essential building blocks for a modular, flexible, and FAIR-compliant data management framework.
With an updated revision of the rich QCDml metadata schema, ILDG 2.0 is now fully operational and FAIR-compliant. Moreover, an ILDG-like setup is considered a favorable solution for use cases beyond Lattice QCD, e.g. for axion experiments or radio astronomy.
Speaker: Hubert Simma (DESY)
-
284
-
Track 1 - Data and metadata organization, management and access: Storage systems and file system protocols
-
290
Towards a standardized language-neutral StoRM Architecture
The StoRM system provides storage services for scientific communities relying on distributed computing infrastructures through multiple loosely coupled components developed in different programming languages at INFN-CNAF, including StoRM WebDAV and StoRM Tape. StoRM WebDAV provides HTTP/WebDAV access to distributed storage systems, while StoRM Tape is an implementation of the WLCG Tape REST API, which allows users to recall files stored on tape libraries.
Although beneficial for service-specific optimizations, this heterogeneity of StoRM components introduces challenges in observability, authorization and architectural consistency. This contribution describes an ongoing effort towards a standardized, language-neutral StoRM architecture based on the adoption of widely used technologies. Shared functionalities are externalized from the core services by introducing NGINX as a reverse proxy for request handling, OpenTelemetry for unified observability and Open Policy Agent (OPA) as a centralized authorization engine.
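For illustration only, an externalized authorization decision of the kind described above can be obtained by posting the request context to OPA's data API; the policy package path, endpoint and token claims below are assumptions, not the actual StoRM policies.

```python
import requests

# Hypothetical request context forwarded by the reverse proxy to the policy engine.
decision_input = {
    "input": {
        "method": "GET",
        "path": "/webdav/atlas/data/file.root",
        "token_claims": {"scope": "storage.read:/atlas", "iss": "https://iam.example.org/"},
    }
}

# OPA exposes decisions under /v1/data/<policy package path>; the package name is an assumption.
resp = requests.post(
    "http://localhost:8181/v1/data/storm/authz/allow",
    json=decision_input,
    timeout=2,
)
resp.raise_for_status()
allowed = resp.json().get("result", False)
print("request allowed:", allowed)
```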
The introduction of OpenTelemetry has significantly improved the ability to identify and analyze performance bottlenecks, helping to optimize critical operations. At the same time, the combination of NGINX and OPA has allowed authorization logic to be moved out of the core StoRM services, simplifying their internal design and improving request handling efficiency. By relying on a single policy engine, policy definition and enforcement are unified, avoiding duplication across services.
Speaker: Francesco Giacomini (INFN CNAF) -
291
dCache project status and updates
The dCache project provides an open-source, highly scalable distributed storage system deployed at numerous laboratories worldwide. Its modular architecture supports high-rate data ingestion, WAN data distribution, efficient HPC access, and long-term archival storage. Although initially developed for high-energy physics, dCache now serves a broad range of scientific communities with diverse performance and consistency requirements.
This contribution presents recent technical developments in dCache, including on-demand hot file replication to mitigate hotspots and increase I/O throughput, as well as support for multi-namespace deployments that enable sharing of data-serving resources across independent filesystem hierarchies. Additional advances comprise deeper integration with the CERN Tape Archive (CTA), enhanced metadata handling, support for token-based authorization, the bulk QoS transition API, and a REST interface for fine-grained tape interaction. We conclude with an outlook on upcoming developments relevant for HL-LHC and other data-intensive scientific workflows.
Speakers: Dmitry Litvintsev (Fermi National Accelerator Lab. (US)), Marina Sahakyan, Mr Tigran Mkrtchyan (DESY) -
292
Toward HPC with Recent Improvements in the dCache NFSv4.1 Interface
POSIX access remains the de facto dominant access mechanism in HPC environments, defining how applications and workflows interact with large-scale storage systems. With its NFSv4.1/pNFS protocol implementation, dCache provides native integration into HPC environments, supporting a large number of scientific applications.
Recent development efforts in dCache have concentrated on strengthening the NFSv4.1/pNFS protocol implementation to better meet HPC-workload expectations.
This contribution presents the latest advancements in dCache’s NFS stack, highlighting two major improvements. The first is NFS open delegation, which significantly reduces the number of network round-trips for some applications, thereby reducing file-opening latency. The second is zero-copy reads, which allow data to flow directly from the backend storage device to the network stack without CPU-bound copying, thereby significantly reducing CPU consumption and increasing sustained read bandwidth.
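The dCache implementation lives inside its own NFS server stack; purely as an analogy for the zero-copy idea, the sketch below uses Python's os.sendfile, which asks the kernel to move bytes from a file descriptor to a socket without passing them through user-space buffers.

```python
import os
import socket

def serve_file_zero_copy(path: str, conn: socket.socket) -> int:
    """Send a whole file over an established TCP connection without user-space copies.

    os.sendfile() hands the transfer to the kernel, so the data never enters a
    Python buffer; this only illustrates the concept behind zero-copy reads.
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            sent = os.sendfile(conn.fileno(), f.fileno(), sent_total, size - sent_total)
            if sent == 0:          # peer closed the connection
                break
            sent_total += sent
    return sent_total
```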
Together, open delegation and zero-copy reads are a substantial step toward making dCache more efficient, scalable, and HPC-oriented. This presentation will detail the design decisions, implementation experience, performance results, and deployment considerations that demonstrate how these protocol-level enhancements strengthen dCache as a robust, standards-compliant storage platform for the global scientific and HPC communities.
Speaker: Mr Tigran Mkrtchyan (DESY) -
293
CERN Storage technology explorations: Adding NFS 4.2 as a Strategic Protocol for EOS
EOS, CERN’s large-scale storage system, is continuously evolving to support increasingly diverse and performance-critical scientific workflows. As part of this evolution, we are considering NFS 4.2 as a strategic new protocol for EOS in order to extend its interoperability, leverage kernel-level client performance, and open a path for community collaboration based on open standards.
cern-nfs is an R&D project implementing a user-space NFS server in C++. By integrating user-space NFS 4.2 with EOS through a dedicated cern-nfs Virtual File System (VFS) plug-in, we can even leverage a performant kernel client that natively supports erasure-coded (EC) back-ends via the EOS gateway FST I/O layer. The work builds upon the current cern-nfs NFS v4.0 → v4.1 → v4.2 evolution and ongoing performance optimization and measurement campaigns.
Planned extensions include proposing a session-based authentication mechanism as a potential contribution to the Linux NFS kernel, and the development of an asynchronous, parallel-socket POSIX C++ client to overcome I/O pattern inefficiencies arising from VFS I/O page alignment. The architecture also allows a cern-nfs federator layer to interconnect NFS-based storage systems, supporting broader federations and collaborative deployments.
This effort not only strengthens EOS’s protocol stack but also creates a framework for external contributions and cross-community engagement. In the long term, NFS 4.2 could evolve into a drop-in replacement for the eosxd FUSE filesystem (FUSE-over-XRootD) in local-area environments, while wide-area scenarios will have to be evaluated given the protocol’s chatty nature.
Speaker: Andreas Joachim Peters (CERN) -
294
A Scalable Architecture for Metadata-Intensive Distributed NFS
As part of the CERN Storage Group’s technology investigations, we are exploring future-proof, scalable interactive service architectures that meet demanding requirements for performance and maintainability.
To achieve this, we are focusing on storage solutions that provide Linux-native filesystem access using open, standards-compliant technologies capable of securely supporting tens of thousands of clients. One possible new design combines NFS 4.1 with a CephFS backend, offering a flexible and robust storage solution that meets the diverse requirements of modern HEP computing environments. The service architecture is based on four fundamental concepts:
Scalability & Resilience: A fully scalable architecture that leverages kernel nfsd and CephFS to ensure high throughput, redundancy, and fault tolerance.
Security & Compliance: End-to-end protection through Kerberos (krb5/krb5p), encryption, and certificate-based authentication to meet CERN’s community security standards.
Automation & Efficiency: Puppet-based provisioning and configuration management, integration of high availability and automated backup mechanisms for continuous service operation.
Interoperability: Complete solutions for both Linux and macOS clients, to ensure seamless access across heterogeneous computing environments.
This initiative seeks to identify a storage platform that provides transparent failover and strong support for both current and future metadata-intensive filesystem-based workflows, including batch processing, interactive analysis, and containerized (Kubernetes) workloads.
The outcome of this technology investigation will be a critical step toward building a secure, maintainable, and scalable filesystem ecosystem that supports the next generation of data-intensive HEP computing applications for Run-4 and beyond.
Speaker: Andreas Joachim Peters (CERN) -
295
JWanFS: A WAN-Oriented Distributed File System for Multi-Data Center Collaboration in High Energy Physics
With the continuous advancement of HEP detectors and online reconstruction capabilities, the scale of experimental data is growing rapidly. The data pattern is increasingly characterized by "massive small files distributed across multiple data centers." On one hand, the surge in small files creates bottlenecks in metadata and directory operations; on the other hand, cross-data center access often relies on complex cross-domain operational strategies, making it difficult to balance performance with scalability.
To address these issues, this paper proposes JWanFS, a distributed file system designed for HEP experiments. It provides a unified namespace and nearest-access capabilities for multi-site users, with optimizations specifically for Wide Area Network (WAN) environments.
The key designs of JWanFS include:
- Storage & Interface Optimization: Enhances small file organization and access strategies based on SeaweedFS, and supports multi-protocol access (NFS, S3, XRootD) via a gateway layer to seamlessly integrate with data analysis and AI training workflows.
- Metadata Synchronization: Utilizes MongoDB's asynchronous replication (Oplog) mechanism for efficient cross-site metadata distribution and minimizes directory traversal overhead through range query optimization (a minimal sketch of the oplog-tailing idea follows this list).
- Access Acceleration: Combines a "nearest data center" policy with client-side multi-level caching to significantly reduce WAN Round-Trip Time (RTT) and cross-domain jitter.
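A minimal sketch of the oplog-based synchronization idea from the list above, assuming a MongoDB replica set and an invented namespace prefix rather than the actual JWanFS code:

```python
from pymongo import MongoClient, CursorType

client = MongoClient("mongodb://meta-primary.example.org:27017")  # hypothetical endpoint
oplog = client.local["oplog.rs"]

# Start from the newest oplog entry; a real service would persist the last applied timestamp.
last = oplog.find().sort("$natural", -1).limit(1).next()
ts = last["ts"]

cursor = oplog.find(
    {"ts": {"$gt": ts}, "ns": {"$regex": "^jwanfs\\."}},   # only the metadata namespaces of interest
    cursor_type=CursorType.TAILABLE_AWAIT,
    oplog_replay=True,
)

while cursor.alive:
    for op in cursor:
        ts = op["ts"]
        # Replay the insert/update/delete against the remote site's metadata store.
        print(op["op"], op["ns"], op.get("o", {}))
```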
JWanFS demonstrates stable and efficient throughput and scalability under typical small-file workloads and cross-domain access scenarios. We plan to deploy and iterate the system in further HEP experiments (such as LHAASO and JUNO) to provide reliable and efficient cross-domain storage infrastructure support for the next generation of high energy physics experiments.
Speaker: 隗立畅 weilc (IHEP)
-
290
-
Track 2 - Online and real-time computing
-
296
The backward realtime tracking strategy in the LHCb experiment
In Run 3 data taking, the LHCb experiment at CERN operates with a fully software-based first level trigger (HLT1) on GPUs that processes 30 million collision events per second with a data throughput of 4 TB/s. Realtime track reconstruction is essential for HLT1 because most trigger decisions rely on reconstructed tracks or on higher level objects built from them, such as secondary vertices.
The baseline HLT1 tracking algorithm, known as Forward Tracking, reconstructs tracks by starting from the tracking detector closest to the collision point (the VELO) and extending forward to the last tracking detector (the SciFi). Alongside this approach, an alternative strategy has been developed and is currently operating in LHCb. This strategy begins in the SciFi detector and reconstructs tracks backward toward the VELO.
This talk will focus on this backward realtime tracking strategy. It consists of a set of tracking algorithms designed to reconstruct different categories of tracks, including the Hybridseeding algorithm for SciFi standalone tracks, the Matching algorithm for Long tracks, and the Downstream algorithm for downstream tracks.
Speaker: Jiahui Zhuo (Univ. of Valencia and CSIC (ES)) -
297
Studies of FPGA accelerated track reconstruction for the ATLAS Event Filter
The upcoming high-luminosity phase of the LHC (HL-LHC) presents several challenges for the ATLAS experiment's Trigger and Data Acquisition system, necessitating a full upgrade of the system. A key challenge for the Event Filter, where high-level event reconstruction and final event selection will run at 1 MHz, lies in the computational demand for online track reconstruction within the Inner Tracker. Over the past few years, extensive research has been conducted into utilising hardware accelerators in the ATLAS Event Filter system to improve tracking throughput and reduce full-system power consumption. Various end-to-end track reconstruction pipelines have been developed using GPUs and FPGAs. These pipelines demonstrate their capabilities by offloading different amounts of the computing load to the accelerators.
This contribution focuses on developments in FPGA-based track reconstruction pipelines integrated into the ATLAS software framework, Athena. A high-throughput FPGA accelerator for hit clustering and data preparation has been implemented in hardware, and various algorithmic extensions have been studied. The results will be compared with those of the CPU and GPU counterparts.
Speaker: ATLAS Collaboration -
298
Real-time 40 MHz Track Reconstruction: The Readout and Processing Chain of the CMS Outer Tracker for HL-LHC
The High Luminosity LHC (HL-LHC) presents an unprecedented computing challenge, characterized by a pile-up of up to 200 interactions per bunch crossing and extreme data rates. To cope with these conditions, the CMS experiment is replacing its tracking system with a novel Outer Tracker capable of contributing to the Level-1 (L1) Trigger. This upgrade introduces a paradigm shift in data processing, moving high-momentum particle selection to the detector front-end to enable real-time tracking at 40 MHz.
This contribution presents the comprehensive readout and reconstruction architecture of the CMS Outer Tracker. We detail the "pT-module" concept, which utilizes correlation logic between closely spaced sensor layers to perform on-detector data reduction, rejecting hits from low-momentum tracks and reducing the data volume by an order of magnitude. We describe the resulting dual-stream data path: a high-bandwidth, continuous 40 MHz stream of "stubs" utilized for track finding, and a standard triggered readout stream where full event data is extracted from front-end buffers only upon Level-1 acceptance.
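To make the pT-module concept concrete, the toy sketch below pairs hits from the two closely spaced sensors and keeps only stub candidates whose bend falls inside a programmable window; the geometry, units and window value are illustrative assumptions, not CMS parameters.

```python
def make_stubs(bottom_hits, top_hits, bend_window=3.0):
    """Toy stub finder: pair each bottom-sensor hit with nearby top-sensor hits.

    bottom_hits, top_hits: local strip coordinates (in strip-pitch units) on the two
    closely spaced sensors of a pT module. A small |bend| means a stiff (high-pT)
    track; wide bends are rejected on the detector front end to reduce data volume.
    """
    stubs = []
    for x_bot in bottom_hits:
        for x_top in top_hits:
            bend = x_top - x_bot            # displacement between the two layers
            if abs(bend) <= bend_window:    # programmable pT-selection window
                stubs.append((x_bot, bend))
    return stubs

# Example: soft tracks produce large bends and are dropped.
print(make_stubs(bottom_hits=[10.0, 42.0], top_hits=[11.5, 55.0]))
# -> [(10.0, 1.5)]  only the high-pT candidate survives
```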
Special focus is placed on the downstream real-time reconstruction chain for the 40 MHz stream. We discuss the back-end processing performed by the Data, Trigger, and Control (DTC) boards and the subsequent Track Finding system based on the "Tracklet" algorithm. We present the implementation of this algorithm on high-performance FPGAs (Apollo boards), featuring a massive pipelined architecture of memory and processing modules designed to execute track seeding, projection, matching, and fitting within a strictly bounded latency of 4 µs. Finally, we report on the status of the firmware development, validation via software emulation, and integration tests with detector prototypes.
Speaker: Marco Riggirello (Scuola Normale Superiore & INFN Pisa (IT)) -
299
Neural network cluster finding for the ALICE TPC online computing
The ALICE time projection chamber (TPC) is the main tracking and particle identification device used in the ALICE experiment at CERN. With a 900 GB/s data rate and a fully GPU-based online reconstruction, the online processing is capable of handling even the densest environments of central Pb--Pb interactions at 50 kHz nominal interaction rate (Run 3) and creates an ideal environment for the application of parallelizable machine learning algorithms.
The work to be presented concerns cluster finding, with the first-ever application of neural networks in ALICE online processing. Both a classification network for noise removal and a regression network for cluster property inference are presented. A 3D charge input to the cluster-finding step marks a new approach to this challenge. The tuning of this algorithm for physics and computing performance is a major part of the optimizations put in place to make deployment feasible. On the technical side, design optimizations of the network architecture, floating-point quantization, a custom CUDA-streamed implementation, efficient utilization of the ONNX Runtime framework, and metrics from the first commissioning runs mark the cornerstones of this project. The achieved performance is a reduction of up to 18% in the total number of clusters with maintained or improved physics performance, demonstrated on both simulated and real data. Extensions of this work demonstrate the feasibility of extracting a track direction vector before the tracking stage, using only the local 3D charge information.
Speaker: Christian Sonnabend (CERN, Heidelberg University (DE)) -
300
A heterogeneous and vectorized sequence for the HL-LHC full tracking reconstruction of the CMS experiment
This talk presents the new baseline strategy for the Phase-2 tracking of the CMS experiment for online event reconstruction, and for the main iteration of offline tracking. This tracking sequence takes advantage of the combination of cutting-edge tracking algorithms that are either optimized for parallel execution on GPUs (Patatrack and LST), or are vectorized for efficient CPU performance (mkFit). Such a combined approach offers an effective solution to deal with the unprecedented computational challenges caused by the large number of simultaneous collisions per bunch crossing at the High Luminosity Large Hadron Collider (HL-LHC). The proposed combination not only reduces the computational resource requirements but also enhances the physics reach by incorporating displaced tracking and increasingly leveraging machine learning techniques.
Speaker: Manos Vourliotis (Univ. of California San Diego (US)) -
301
Studies of track reconstruction performance in the ATLAS Event Filter for the HL-LHC
The instantaneous luminosity at the High-Luminosity LHC (HL-LHC) will reach unprecedented levels, boosting the physics reach at the LHC. To cope with the resulting challenging pile-up conditions and fully exploit the new high-granularity Inner Tracker (ITk), a major upgrade of the ATLAS Trigger and Data Acquisition (TDAQ) system is ongoing, with track reconstruction in the Event Filter being a critical component. Achieving an online tracking performance close to that of offline algorithms is essential to ensure a successful physics program at HL-LHC, providing the required trigger efficiency while maintaining sustainable trigger rates. Over the past years, an extensive R&D effort has been carried out to design a heterogeneous computing system, exploring possible integrations of CPU cores with GPU or FPGA accelerators at different stages of the tracking workflow, to identify the technology with the highest potential in terms of throughput, power consumption, cost, and tracking performance. This contribution will focus on the remarkable tracking performance achieved across the different technologies, demonstrating the strong potential of tracking at the Event Filter level.
Speaker: ATLAS Collaboration
-
296
-
Track 3 - Offline data processing: Core Software and Frameworks 2
-
302
The development and upgrades of BESIII Offline Software
The BESIII experiment has been operating since 2009 to study tau-charm physics at the BEPCII accelerator, and both the BEPCII accelerator and the BESIII detector have been upgraded several times over these years. The BESIII offline software system, developed on top of the Gaudi framework, provides the fundamental basis for physics analysis.
This talk focuses on the development and upgrades of the offline software, explaining how it meets challenges such as the long lifetime of BESIII, the rich physics program, the various upgrades to both detector and accelerator, and the continuous need to remain compatible with new operating systems and external libraries.
Speaker: Prof. Ziyan Deng -
303
Offline Data Processing for JUNO's first-year commissioning and physics data taking
The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment designed to determine the neutrino mass ordering and to achieve high-precision measurements of neutrino oscillation parameters. Construction of the JUNO detector was completed at the end of 2024, followed by commissioning of the water phase and the subsequent liquid scintillator filling phase. Physics data taking began on 26 August 2025, and JUNO released its first oscillation results based on the initial 59.1 days of physics data.
This contribution reports on the design, deployment, and operational experience of the offline data processing system supporting JUNO’s first-year commissioning and physics data taking. Detector data produced at 40 GB/s are reduced online to approximately 90 MB/s of byte-stream RAW data and transferred to the Tier-0 site via a dedicated high-bandwidth network. At Tier-0, an automated processing pipeline converts RAW data into a ROOT-based data model (RTRAW) and performs prompt event reconstruction. All data products are subsequently distributed to Tier-1 sites for large-scale reprocessing and analysis. RTRAW data are reprocessed with refined calibration constants and reconstruction algorithms to produce the final datasets used for physics analyses.
Despite extensive data-challenge campaigns prior to data taking, several nontrivial challenges emerged during commissioning and early physics data taking. These include event sorting within the automated pipeline to support time-correlation analyses, dynamic reconstruction steering to accommodate multiple event types and algorithms, and significant file-system pressure caused by large-scale concurrent analysis jobs. To address these issues, optimized workflow orchestration, flexible reconstruction control, and compact analysis-oriented data formats retaining access to hit-level information were developed and deployed.
Finally, plans for near-term evolution of the offline system are presented, including increased data aggregation to reduce file counts and the deployment of multi-threaded reconstruction algorithms to improve processing efficiency in production.
Speaker: Tao Lin (Chinese Academy of Sciences (CN)) -
304
RNTuple Integration in the Flexible Object Read/write Model for DUNE
The Deep Underground Neutrino Experiment (DUNE) will deploy four 10 kt fiducial mass liquid argon-based tracking calorimeters to study neutrino oscillation properties, supernova neutrinos, and beyond the standard model physics. To accomplish its diverse physics program, DUNE must read out over 1000 time-samples of waveforms for each of its nearly 400,000 channels. Therefore, a DUNE data record for a single readout contains several GB of data: orders of magnitude more than the MB event sizes for which collider experiments' reconstruction frameworks are optimized.
DUNE is developing the Phlex framework to facilitate processing detector data in a more flexible manner than traditional event-based computing models. The Flexible Object Read/write Model (FORM) provides the I/O infrastructure for Phlex with support for on-demand reading and eager, flexible-grained writing of DUNE's data products. The modular design of FORM allows transparent support of I/O for ROOT TTree and RNTuple, HDF5, and evolution to more formats over the lifetime of DUNE.
This presentation will specifically focus on FORM's utilization of RNTuple to address the I/O needs of DUNE. RNTuple's performance when splitting one readout record into multiple entries and multiple RNTuples in the same file is of particular interest to DUNE. The architecture of FORM's RNTuple implementation will be discussed, and benchmarks comparing RNTuple and TTree operation for realistic data samples will be presented.
Speaker: Andrew Paul Olivier (Argonne National Laboratory) -
305
L'Arlesienne de ROOT: How to lift a 30-year-old limit in the heart of ROOT I/O.
Over many years, ROOT users have repeatedly stumbled over—and loudly rediscovered—the infamous 1 GB limit on individual I/O operations, a constraint that somehow survived long past the era when anyone thought a gigabyte was “a lot.” As experiments embraced ever-larger objects and collections, this limit became an increasingly unavoidable rite of passage. This contribution recounts the sustained, multi-year quest by ROOT I/O developers to finally retire this relic, navigating a maze of legacy APIs, memory-management assumptions, and integer boundaries that seemed determined to preserve the status quo. We describe how internal interfaces were carefully modernized to introduce fully 64-bit–capable code paths without breaking the mountains of existing user code that would definitely have noticed. With the limit now lifted, ROOT can finally handle multi-gigabyte objects in a single read or write operation, even when splitting them into an RNTuple is not an option (we’re looking at you, large RooWorkspaces and giant histograms), liberating users from yet another “fun” debugging adventure and clearing the way for the massive analyses of the HL-LHC and beyond.
Speaker: Philippe Canal (Fermi National Accelerator Lab. (US)) -
306
Status of the ROOT Project: not only LHC, not only HENP!
In this contribution we discuss the status of the ROOT project right before the LHC Long Shut Down 3.
We highlight the usage of ROOT by non-LHC communities, for example gravitational-wave physics, nuclear physics, neutrino physics, as well as experiments at electron colliders. In addition, the usage of ROOT in contexts such as market regulation will be discussed.
The processes by which the Project engages with its user communities will be reviewed, highlighting how the ROOT Project pursues its main overarching goal: making its users even more successful.
Moreover, the most relevant highlights are presented of the features integrated in the last 18 months, and of those planned for the next few years.
Speaker: Danilo Piparo (CERN) -
307
Recent developments in Key4hep
In this contribution, we highlight several recent developments within Key4hep, the turnkey software stack for future collider studies. These developments cover a variety of topics, most importantly a first stable release of the common event data model format, EDM4hep, and related developments. We have also significantly enhanced the integration with external software packages such as ACTS for tracking and Pandora for Particle Flow reconstruction. We will also report on the evolving needs in usability, performance and feature set stemming from the ever-growing user community of essentially all currently studied future projects, including FCC, EIC, ILC, CLIC, MuonCollider, and CEPC.
Speaker: Juan Miguel Carceller (CERN)
-
302
-
Track 4 - Distributed computing
-
308
Gravitational-wave computing for O4 and beyond
The LIGO–Virgo–KAGRA (LVK) Collaboration closed its fourth observation period (O4) in November 2025, its longest and richest to date. During O4, the detectors observed roughly 250 gravitational-wave candidate signals in real time, and more are being extracted from the data by offline analysis. Outstanding results include, for example, the first detection of “second generation” black holes, in which the primary objects are likely themselves the result of previous mergers, and the observation of the most massive black hole merger to date.
The LIGO, Virgo and KAGRA interferometers are now preparing for a new phase of technological upgrades and testing over the next few years, which will likely be implemented in stages interleaved with periods of science data taking: a new observation campaign is currently being planned to run for six months in 2026-27.
Over the years, the LVK computing model evolved to cope with higher event rates from more advanced detectors and a growing community of researchers, moving from custom-made tools running on a few dedicated clusters to distributed computing based on widely used technologies.
We provide an updated overview of the cyberinfrastructure that enabled O4 operations, spanning data management and distribution, low-latency analyses, alert management, offline data processing and support for Open Science activities. Also, operational experience from O4 is discussed, together with performance metrics and lessons learnt.
Finally, plans are presented for a further evolution of the infrastructure in view of continued growth in data volume, analysis complexity, and collaboration size.
Speaker: Dr Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino) -
309
The Einstein Telescope Computing Model
The Einstein Telescope (ET) will be the next-generation European underground Gravitational Wave (GW) observatory, designed to open a new observational window on the Universe starting in the mid to late 2030s. Building upon the experience of current GW detectors such as LIGO and Virgo, ET will achieve a significant increase in sensitivity, enabling the detection of a much larger number of GW sources and the exploration of new astrophysical and cosmological regimes. This ambitious goal entails the generation, processing, and long-term preservation of data volumes far exceeding those produced by current GW facilities. More challenging, though, will be the computing power required to process the data, which scales with the rate of detections; ET will detect ~1000 times more GW events per year than LIGO and Virgo. Multi-messenger science requires that ET data, together with data from any other GW detector, be processed with low latency, i.e. within seconds of being recorded.
ET computing and data requirements, based primarily on scaling the computing requirements of LIGO and Virgo, indicate that ET will require computing resources similar in scale to those of the LHC experiments. The latest understanding of these requirements, informed by Mock Data Challenges, will be presented. Based on these requirements, the ET Computing Model concept has been defined as an evolution of the computing model used by LIGO and Virgo, incorporating best practices and solutions from industry, WLCG and the HSF. Significant work lies ahead to realise the ET computing model, which also represents an opportunity for collaboration. In addition, the case is made for professionalising the software and computing workforce for ET, recognising the 50-year lifespan of ET and the continually evolving, heterogeneous computing and software solutions it will rely on.
Speaker: Paul James Laycock (Universite de Geneve (CH)) -
310
ATLAS Distributed Computing towards LHC Run4
ATLAS Distributed Computing (ADC) is the set of infrastructure, software stack and experts that handles up to 1 million computing slots and over 1 EB of stored data in order to serve the computing needs of the ATLAS experiment at the LHC. After a short description of the ADC structure and operational performance, this contribution focuses on the latest ADC innovations as well as future plans to meet the challenges that the luminosity increase expected from the LHC for Run 4 represents for distributed computing.
Speaker: Ivan Glushkov (Brookhaven National Laboratory (US)) -
311
The ePIC Streaming Computing Model
The ePIC collaboration is developing a highly integrated, multi-purpose detector for the upcoming Electron-Ion Collider (EIC). A co-design approach between the detector and the computing enables a seamless data flow from detector readout to physics analysis, using streaming readout and AI. This system is aimed at accelerating scientific discovery and improving measurement precision through enhanced control of systematic uncertainties.
The ePIC computing model is designed to operate across a globally distributed network of resources, spanning the host laboratories, domestic institutions, and international partners. To organize this infrastructure effectively, the ePIC computing model introduces the concept of “Echelons,” which is informed by, but distinct from, the Tier model used at the LHC.
A defining feature of the model is its dual-host architecture, in which raw data is simultaneously streamed to two geographically distinct sites, forming symmetric and complete replicas of the full dataset in near real time. This “butterfly model” ensures that either site is technically capable of performing full downstream processing independently or in coordination, depending on the needs of the collaboration.
The nature of streaming data presents distinct challenges that shape the design and implementation of workflow and workload management. The system must be capable of operating under time-critical constraints, consuming fine-grained, quasi-continuous data streams. It must also respond flexibly to changing data-taking conditions, shifting availability of computing resources, and the possibility of faults across the distributed system. The result is an adaptive and highly automated computing model that emphasizes robustness and efficiency in a distributed environment.
With the design phase now substantially complete, the effort is transitioning to implementation, supported by active testbeds and functioning prototypes. This presentation will describe the computing model’s conceptual foundations, its unique features, and the progress made in realizing it.
Speaker: Holly Szumila-Vance (Florida International University) -
312
JUNO distributed computing production experience for first-year data taking
The Jiangmen Underground Neutrino Observatory (JUNO) commenced physics data taking in August 2025, marking the transition from commissioning to full-scale operation of its Distributed Computing Infrastructure (DCI) system for real physics data. This contribution presents the Monte Carlo production and physics production experience accumulated during the first year of data taking.
We provide an overview of the production workflow, detailing the integration of the DCI system with the JUNO offline processing pipelines. We report on production features and status, and describe the system’s operational mechanisms. We also discuss how the computing model was adjusted to meet the needs of physics analysis during this period, and highlight the challenges and bottlenecks encountered during reconstruction, including unforeseen network constraints and higher-than-expected I/O demands, together with the targeted mitigations put in place.
These improvements have significantly boosted system reliability and throughput. The lessons learned offer critical guidance for sustaining long-term JUNO computing operations and optimizing service deployment strategies.
Speaker: Xiaomei Zhang (Chinese Academy of Sciences (CN)) -
313
Distributed computing system for the SPD experiment
The SPD (Spin Physics Detector) facility is currently under construction as part of the NICA complex at JINR. In parallel with the physical infrastructure, the experiment’s software ecosystem is being developed to meet the growing need for large-scale simulation of physical processes.
As an international collaboration, SPD leverages the distributed computing resources contributed by its member institutes for data processing. To unify these geographically dispersed resources, SPD has implemented a suite of systems and services. These enable the organized storage and processing of experimental data across both JINR and partner computing facilities, forming a coherent distributed computing environment for the experiment.
This environment is already operational, supporting full-scale production simulations requested by the physics working groups. Throughout 2025, the system has simulated over 1 billion physics events and produced several hundred terabytes of data. This talk presents an overview of the SPD distributed offline data processing framework.
Speaker: Artem Petrosyan (Joint Institute for Nuclear Research (RU))
-
308
-
Track 5 - Event generation and simulation: Full Simulation 2
-
314
Energy efficiency of GPU-based Monte-Carlo simulation using AdePT
The use of heterogeneous CPU–GPU architectures is becoming an increasingly important consideration for LHC experiments in view of the growing computing demands of the HL-LHC era. WLCG sites and LHC experiments must make decisions in the short to medium term on the deployment and integration of GPUs, in order for these resources to be available and effectively exploited for HL-LHC operations. A key factor in this decision process is the cost-effectiveness of GPUs when running HEP software.
The AdePT project has demonstrated that full Monte-Carlo simulations can be efficiently adapted to GPUs and has been integrated into the ATLAS, CMS and LHCb experiments software frameworks. This integration provides simulation capabilities that are close to production use, with excellent physics agreement with established CPU-based workflows. The energy efficiency of GPU-accelerated simulation has been evaluated in realistic, production-like environments using modern hardware, enabling quantitative comparisons of cost and energy consumption relative to traditional CPU-based simulation. In addition, mitigation strategies such as power and frequency capping have been investigated to further optimize physics throughput per Watt.
Speaker: Juan Gonzalez Caminero (CERN) -
315
Enabling LHC experiments' detector simulations on GPU with AdePT
The drastic increase in detailed detector simulation costs for the LHC experiments caused by the high-luminosity upgrades presents a significant challenge. At the same time, high-performance computing is shifting toward heterogeneous architectures that employ accelerators such as GPUs, offering substantial gains in computational power and energy efficiency. Enabling full Monte Carlo simulations on GPUs could therefore help meet the demands of upcoming simulation campaigns. For GPU-based simulations to be adopted in production workflows, seamless integration with the LHC experiment software frameworks is essential.
In this work, we report how AdePT — a Geant4 plugin that offloads the transport of electrons, positrons, and gammas to GPUs — integrates into the software frameworks of ATLAS, CMS, and LHCb. By decoupling particle transport on the GPU from experiment-specific code executed on the CPU, AdePT accelerates simulations with minimal modifications to the existing simulation software. For the first time, we demonstrate excellent physics agreement between simulations using AdePT on GPUs and the baseline CPU simulations within the LHC experiment frameworks. This represents an important milestone toward deploying GPU-accelerated production simulations for the LHC experiments.
Speaker: Severin Diederichs (CERN) -
316
Full Simulation of the CMS experiment progress: from Run3 to Run4
During CERN LHC Run 3 data taking, the CMS Geant4-based full simulation was upgraded several times. The Geant4 version was changed from 10.7.2 to 11.2.2. Other libraries used for the Monte Carlo simulation of CMS (CLHEP, DD4hep, VecGeom) were also updated. A new library, G4HepEm, was adopted for the CMS simulation, improving CPU performance both for Run 3 and Run 4. In this work, we discuss physics validation results, CPU performance, and memory consumption of the CMS full simulation, obtained using test-beam data, detector simulation, and CMS data. During Run 3, some subdetectors of CMS were added or mechanically modified. We describe these modifications and the methods used for geometry adoption and maintenance. The CMS geometry for Run 4 has not yet been finalized. We discuss software development for the geometry evolution and the status of the CMS Run 4 geometry description. For the Run 4 CMS full simulation, several R&D projects are ongoing, including AdePT and Celeritas GPU-based simulation of electromagnetic particle transport. We will present progress and validation results for these projects.
Speaker: CMS Collaboration -
317
Geometry tracking in Celeritas for EM and optical physics
Computational geometry for high energy physics detector simulation is notoriously complex, and indeed it is the primary performance bottleneck in the GPU Monte Carlo codes Celeritas and AdePT.
Detector descriptions contain millions of distinct physical parts with length scales spanning over five orders of magnitude.
Electromagnetic physics simulations must contend with curved particle trajectories that may have abrupt direction changes near boundaries and local displacement due to multiple scattering.
Optical photon simulation has a different set of difficulties at boundary crossings, which must account for microfaceted surface roughness treatments and multi-material coatings approximated as infinitesimal layers. The Celeritas detector simulation code implements optical and EM physics, on CPU and GPU, with multiple geometry tracking engines.
This presentation will describe the challenges inherent to general and edge cases, demonstrate the Celeritas implementations of EM field propagation and optical surface crossings, and analyze performance characteristics and limitations of the methods.
Addressing these challenges, thereby improving the accuracy and performance of geometry navigation, is essential to unlocking the power of Celeritas for HEP experiments and beyond.
Speaker: Seth Johnson (Oak Ridge National Laboratory (US)) -
318
Decoupled Interface for Concurrent Gaudi and Geant4 Event Loops in ATLAS Athena
Gaudi, the event processing framework underlying the ATLAS software framework “Athena”, manages the event loop through a thread pool and a dataflow-driven scheduling system. Each event is processed by a sequence of user-defined algorithms that declare explicit data dependencies. Geant4, the toolkit used for detector simulation, likewise manages its own event loop and thread pool, exposing a well-defined API for user customization. Integrating these two libraries is challenging, as both require control over the event loop and employ distinct concurrency models.
The current Athena implementation addresses this by replacing core Geant4 components with custom ATLAS extensions to run Geant4 within the Gaudi event loop. While functional, this approach tightly couples the two projects, complicates maintenance, and hinders upgrades to newer Geant4 versions.
In preparation for Run4 Simulation, ATLAS is moving toward a decoupled integration model that allows Gaudi and Geant4 to run concurrently in independent thread pools. The two frameworks will then synchronize only at event boundaries, exchanging data through the Geant4 user interface. This design separates the control flow of both frameworks, simplifies maintenance, enables straightforward Geant4 upgrades, and improves long-term sustainability of simulation within Athena.
Speaker: Julien Esseiva (Lawrence Berkeley National Lab. (US)) -
319
Impact of Siloed Geometry Data on LHC Experiments. The ATLAS Experiment Case Study.
LHC experiments rely on highly complex detector geometries that support multiple phases of the experiment's lifecycle, including engineering design, manufacturing, installation, physics analyses, and outreach. Although the underlying detector components are the same across these tasks, the requirements differ significantly. For example, engineering integration typically needs only the external boundary surfaces, without detailed material information, whereas physics analyses require full internal descriptions along with accurate representations of overall mass, volumes, and materials. Consequently, the methods, techniques, and tools used to create these geometries also differ, resulting in a wide variety of geometry descriptions. Moreover, because LHC experiments involve enormous international collaborations, different partners often adopt different approaches and tools. This diversity has led to the proliferation of numerous heterogeneous geometry descriptions, so-called silo geometries. These siloed geometries introduce several challenges:
1. Complex and error-prone geometry migration between platforms
2. Difficulty implementing upgrades
3. Data-versus-Monte-Carlo discrepancies in physics analyses
4. Reduced clarity and interpretability in outreach visualizations
This study is based on the ATLAS experiment. It examines the impact of isolated geometry descriptions on engineering design, simulation accuracy, data synchronization, version control, and cross-platform interoperability.
Speaker: Roger Jones (Lancaster University (GB))
-
314
-
Track 6 - Software environment and maintainability: AI for Software and Operations
Conveners: Gaia Grosso (IAIFI, MIT), Ruslan Mashinistov (Brookhaven National Laboratory (US))
-
320
Advancing AccGPT: Agentic RAG for Scientific Knowledge Retrieval
Efficiently retrieving knowledge from particle physics research and documentation within CERN presents significant challenges due to specialized terminology and complex structural dependencies. This work presents the evolution of AccGPT, a CERN internal knowledge retrieval system, moving beyond baseline Retrieval-Augmented Generation (RAG) to address these limitations. We introduce a composite architecture that integrates GraphRAG for capturing structural relationships, Agentic workflows for multi-step reasoning, and the Model Context Protocol (MCP) for accessing real-time external data sources. Additionally, we explore domain-specific embedding fine-tuning to better handle scientific vocabulary.
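As a schematic illustration of the retrieval step underlying such a RAG pipeline (the embedding model, corpus snippets and prompt are placeholders, and none of this is the AccGPT production code):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder corpus of documentation snippets; the real system indexes CERN documents.
corpus = [
    "Linac4 accelerates H- ions to 160 MeV before injection into the PS Booster.",
    "Access to the accelerator tunnel requires a valid dosimeter and safety training.",
    "Beam dumps are interlocked with the machine protection system.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")      # illustrative embedding model
doc_vecs = model.encode(corpus, normalize_embeddings=True)

def retrieve(question: str, k: int = 2):
    """Return the k corpus snippets most similar to the question."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                            # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

question = "What do I need before entering the tunnel?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then be sent to the LLM; GraphRAG, agentic workflows and MCP extend this basic loop.
```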
We evaluate these pipelines using a framework that combines traditional retrieval metrics with head-to-head LLM-judge comparisons. This work discusses the trade-offs between system complexity, latency, and answer quality, offering practical insights for deploying advanced AI assistants in large-scale scientific environments.
Speakers: Dr Florian Rehm (CERN), Mr Luke Jason van Leijenhorst (CERN) -
321
SciBot: A Secure, High-Performance AI Assistant for Long-Term Preservation of RHIC Knowledge and beyond
Large-scale nuclear and particle physics experiments face a dual preservation challenge: maintaining long-term access to vast data volumes and the tacit scientific knowledge embedded in internal, often private or restricted, collaboration records. Public large language models (LLMs) cannot address this need for private data. To solve this, we developed SciBot, a locally deployed, domain-specific AI assistant within the RHIC Data and Analysis Preservation Program (DAPP), which provides secure natural-language access to preserved RHIC knowledge.
SciBot uses a Retrieval-Augmented Generation (RAG) architecture and the Model Context Protocol (MCP) to integrate local and remote LLMs, ensuring provenance and access control. Crucially, private documentation and communications are processed by locally hosted LLMs, guaranteeing the data remains strictly local and secure even when integrating external LLMs. Production-readiness efforts include performance and scalability testing of inference engines (vLLM, LlamaCpp, Ollama) on various GPUs (A6000, A100, H100) to optimize deployment cost. A key change is the migration from ChromaDB to Qdrant for the vector database, enabling scalable metadata filtering and strict segregation by collaboration level. Federated authentication (OIDC via CILogon) enforces experiment isolation while supporting shared knowledge.
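For illustration, the collaboration-level segregation mentioned above maps naturally onto Qdrant's payload filtering; the collection name, payload keys and embedding below are placeholders rather than the SciBot schema.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url="http://localhost:6333")   # hypothetical local deployment
query_embedding = [0.1] * 384                        # placeholder vector from the local embedder

# Restrict retrieval to documents the authenticated user's collaboration may see.
acl_filter = Filter(
    must=[
        FieldCondition(key="collaboration", match=MatchValue(value="STAR")),
        FieldCondition(key="access_level", match=MatchValue(value="internal")),
    ]
)

hits = client.search(
    collection_name="rhic_preservation_docs",        # placeholder collection name
    query_vector=query_embedding,
    query_filter=acl_filter,
    limit=5,
)
for h in hits:
    print(h.score, h.payload.get("title"))
```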
These advances establish SciBot as a secure, performant, and extensible AI-assisted knowledge preservation service, offering a practical blueprint for facilities like the Electron–Ion Collider.
Speaker: Dr Jerome LAURET (Brookhaven National Laboratory) -
322
Root-cause Analysis of Data Discrepancies in the ATLAS Software Stack with CelloAI
In the ATLAS experiment, physics reconstruction and validation workflows produce large collections of histograms that must be compared across software versions to detect unexpected changes. Tracing these discrepancies back to their origins in complex codebases like Athena is time consuming and error prone. We present an approach to automate this root-cause analysis by combining vision-enabled LLMs with CelloAI, a locally hosted assistant for scientific software development. CelloAI integrates codebase-wide callgraphs into a RAG pipeline, enabling LLMs to reason about function interactions within broader software architecture while providing Doxygen-style comment generation, file-level summaries, and an interactive chatbot. We extend CelloAI with vision capabilities, exploiting reasoning-optimized LLMs to identify statistical outliers across histogram collections produced in ATLAS workflows. We also use CelloAI to add Doxygen-style comments identifying inputs and outputs of each routine in Athena and embed them in a searchable RAG database. Detected deviations, including both unexpected outliers and expected changes from deliberate algorithm modifications, are then correlated with this database to potentially identify their likely origins within the software stack. We will present our ongoing work demonstrating how combining callgraph-aware retrieval systems and multi-modal reasoning can significantly improve developer productivity and enable AI-powered root-cause analysis.
Speaker: FNU Mohammad Atif (Brookhaven National Laboratory) -
323
Adaptive Fault Management at CERN using Large Language Models
CERN’s compute farm must sustain 24/7 operation across thousands of worker nodes, a scale that will further expand for LHC Run 4 and beyond. Faults are frequent, both hardware- and software-related, and while some downtime is acceptable, extended recovery periods lead to measurable loss of throughput and operational efficiency. The existing automation system, based on hard-coded decision logic, efficiently addresses common issues but struggles with the growing number of edge cases observed at scale. Updating this system has become increasingly complex, especially when recovery actions depend on historical context rather than static state-transition logic.
To address these limitations, we investigate the use of large language models (LLMs) as an adaptive decision-making layer within the automation framework. In this design, existing components continue to perform information gathering and provide a bounded set of available actions, while the LLM interprets system states and selects appropriate tools for remediation. We present the integration architecture, mechanisms for control and observability, as well as early results from live deployment. We hope to demonstrate that this system can more efficiently identify and mitigate pathological cases, such as repeated reboot cycles, that tend to cause prolonged downtime.
Speaker: Panagiotis Gkonis -
324
INSPIREHEP Search and Discovery with AI-driven Retrieval and MCP Server
INSPIREHEP is evolving toward a new search and discovery platform that combines AI-assisted retrieval with a unified service for metadata and content processing. This contribution presents the design and planned deployment of two core components. The first is an AI-based retrieval pipeline that enriches records with embeddings, improves ranking behaviour, and supports natural language queries. The second is the MCP server, a central service for metadata transformation, content normalization, and cross-service coordination within the INSPIREHEP ecosystem.
The architecture introduces a single service that centralises the RAG pipeline and the MCP functions that connect records, metadata, and tooling across the platform. It improves discoverability, enables natural language queries, and provides a consistent entry point for INSPIREHEP services and collaborators to access the same capabilities with minimal integration effort.
This work strengthens the long-term maintainability of the INSPIRE software ecosystem and prepares the platform for upcoming AI-driven developments.
Speaker: Harris Tzovanakis (CERN) -
325
From Clicks to Conversation: A Dialogic and Collaborative Software Interaction Paradigm
At large-scale scientific facilities such as the High Energy Photon Source (HEPS), diverse experimental techniques and detection methods have led to a proliferation of highly specialized data processing software. These tools often feature heterogeneous interfaces and complex parameters, imposing significant cognitive and operational burdens on users, software developers, and technical support staff across various disciplines.
To address these challenges, we propose a Dialogic and Collaborative Software Interaction Paradigm, aiming to transform traditional graphical user interface (GUI)-centric workflows into a new experience driven by natural language and characterized by real-time human-machine collaboration. The core architecture of this paradigm consists of three key modules: (1) a Dynamic Context-Awareness Module, responsible for capturing software state, operation history, and data semantics in real time; (2) a Structured Tool-Calling Module, which accurately maps user instructions in natural language into executable sequences of software functions; and (3) an Interface Synchronization and Execution Module, which drives the native graphical interface to respond step-by-step, creating a transparent interactive loop of "what you say is what you see."
To validate this paradigm, we have implemented the "Data Conversion Smart Assistant" software. This assistant allows users to express data processing intents—such as format conversion and batch filtering—in natural language. It then automatically parses the instructions, invokes the corresponding functional modules, and executes each step synchronously and visually within the software's main interface. This enables users to observe, confirm, and participate in the entire workflow in real time.
By shifting the user's focus from "how to operate" to "how to discover", this work provides both a practical tool for reducing interaction barriers and a conceptual framework for designing more human-centric scientific software.
Speaker: FU Shiyuan
-
Track 7 - Computing infrastructure and sustainability
-
326
SCOPE: A Sustainable Computing Prototype for the Einstein Telescope
The rapidly growing energy demand of large-scale scientific computing infrastructures could significantly impact the environmental footprint of future experiments. For the Einstein Telescope (ET), sustainability is therefore a key design criterion from an early stage. The SCOPE project (Sustainable Computing Prototype for the Einstein Telescope) addresses this challenge by developing and operating a prototype computing center designed for tight integration with renewable energy systems.
The central goal of SCOPE is to explore an integrated architecture for sustainable scientific computing. Renewable energy production and storage are designed together with the computing infrastructure as a single coherent system. The prototype will be designed for operation with a fully renewable energy supply and will explicitly account for the inherent variability of renewable energy sources. Rather than developing new computing paradigms, SCOPE builds on energy-aware workload management concepts currently being developed in other projects to reduce the required energy storage capacity.
This talk presents the motivation, architectural design, and key goals of the SCOPE prototype. The project serves as a practical testbed for sustainable computing infrastructures and aims to provide guidance for the design of future computing facilities for large-scale scientific experiments such as the Einstein Telescope.
Speaker: Stefan Krischer (RWTH Aachen University) -
327
How CERN openlab is supporting the experiments at the HL-LHC by unlocking new sustainable technologies through partnerships with industry
The High-Luminosity LHC (HL-LHC) era will confront particle physics experiments with unprecedented challenges in data volume, computational complexity, and real-time decision making. Preparing for this paradigm shift requires innovation across the full computing and triggering stack. Within this context, CERN openlab plays a central role in exploring and validating emerging technologies in close collaboration with industry partners.
This talk presents recent CERN openlab activities aimed at integrating AI techniques into both high-performance computing (HPC) workflows and low-latency trigger systems. On the offline side, we discuss the deployment of machine learning for large-scale simulation, reconstruction, and analysis, including studies of heterogeneous architectures, accelerators, and novel computing models to efficiently process exabyte-scale datasets. On the online side, we highlight R&D efforts to bring AI into real-time environments, enabling inference within strict latency and determinism constraints imposed by the trigger requirements at the HL-LHC.
We review prototyping results, performance studies, and lessons learned from deploying AI across diverse hardware platforms, from GPUs and CPUs to FPGAs and emerging accelerators. Finally, we outline how CERN openlab’s collaborative approach is helping to bridge the gap between cutting-edge research and production-ready solutions, laying the groundwork for scalable, intelligent computing and triggering systems for the HL-LHC and beyond.
Speaker: Thomas Owen James (CERN) -
328
SRCNet Distributed Computing: Architecture, Progress, and Lessons Learned
The Square Kilometre Array (SKA) telescopes, currently under construction in South Africa and Australia, are due to enter Science Verification at the end of 2026. The SKA Regional Centre Network (SRCNet) is federating distributed, heterogeneous regional centres into a coherent global infrastructure to store and process SKAO data. This contribution presents the distributed computing challenges of the SRCNet project, introducing the software stack for distributed, federated computing, focusing on the deployment and operation of PanDA as the workload management system.
We describe the role of the Global Execution API (GE API) as the SRCNet abstraction layer between user workflows and site-level execution backends, and outline the core stack comprising PanDA, the GE API, and federated identity and access management (IAM). We also describe SRCNet’s use of the Site Capabilities service as a lightweight alternative to centrally curated resource catalogues such as CRIC.
We report on deployment progress, lessons learned, current limitations, and next steps (particularly, integration with federated data management) in the context of evolving toward production-ready distributed execution for early SKA science.
Speaker: Rohini Joshi -
329
SPECTRUM: A Strategic Framework and Technical Blueprint for European Exascale Research Data and Compute Infrastructure
The SPECTRUM project (https://spectrumproject.eu/), funded under Horizon Europe, presents its final deliverables: the Strategic Research, Innovation and Deployment Agenda (SRIDA) and the Technical Blueprint for a European compute and data continuum serving data-intensive science communities.
The SRIDA is structured around four pillars encompassing 13 strategic priorities spanning technical enablement, scientific operations, and strategic governance. Each priority includes implementation pathways with short-, medium-, and long-term milestones aligned with European research infrastructure strategies.
The Technical Blueprint presents a capability map defining eight areas for the compute and data continuum (compute resources, data resources, software distribution and execution, orchestration and workflows, AI/ML and HPC applications, resource federation, monitoring and observability, and security and trust) and identifies key technical challenges with recommended actions to address them.
Together, the SRIDA and Technical Blueprint provide a coherent strategic and technical foundation for coordinated infrastructure development across European research communities.
Both documents respond to the unprecedented data processing demands facing High Energy Physics and Radio Astronomy as they enter the Exascale era with next-generation instruments including HL-LHC upgrades and SKA. The contribution discusses how these strategic and technical frameworks can guide European infrastructure investments and inform future funding programmes targeting the 2030s research landscape.
Speaker: Sergio Andreozzi -
330
An Integrated Management System for Data Processing of the AliCPT Primordial Gravitational Wave Telescope
The efficient and stable operation of the data processing pipeline is fundamental to the success of primordial gravitational wave telescopes like the Ali CMB Polarization Telescope (AliCPT). However, the management of its heterogeneous computing and hardware ecosystem (servers, virtual machines, storage systems, and the remote observatory environment at the high-altitude site in Tibet) poses a significant challenge. Traditional approaches lack a unified platform for integrated resource provisioning, real-time status visualization, coordinated environmental monitoring, and remote control, leading to operational inefficiencies and potential risks to data integrity. To address this, we have designed and implemented an integrated management system specifically for AliCPT. The system is built upon a Proxmox-based virtualization layer that abstracts physical resources. Its core comprises three management modules: (1) a self-service portal with approval workflows for user resource application and administrative governance; (2) an integrated data panel that visualizes performance indicators of virtual machines, physical machines, and environmental sensor data, and implements outlier grading alarms based on intelligent rules to achieve centralized monitoring and management of multi-source heterogeneous state data; and (3) a unified control interface that integrates a precise commanding interface for issuing control signals to physical facilities, and an interface for the full lifecycle management of virtualized resources. Currently under active deployment and testing on the AliCPT platform, this system is poised to become a central tool for enhancing the operational coherence, reliability, and management efficiency of the entire data processing infrastructure.
Speaker: Siqi Hou -
331
CMS Tier-0 Performance in Run-3 under Increased Luminosity and Throughput
The CMS Tier-0 system is responsible for the prompt processing and distribution of data collected by the CMS experiment. During Run 3, the LHC delivered almost twice the luminosity of Run 2, while the CMS physics program intensified and diversified year by year, resulting in an average data rate of up to 12 GB/s and a total RAW data volume of 110 PB so far. Higher load places increased pressure on the Tier-0 system, making changes necessary to cope with the higher throughput. The sustained high data rates require increased use of CPU resources up to 150 k cores, using the worldwide computing resources available to CMS in order to promptly reconstruct the data. Furthermore, the higher data volumes require increased utilization of disk and tape resources, as well as improvements to the workload management system. In this work, we demonstrate the CMS Tier-0 performance for proton-proton, light-ion, and heavy-ion collisions during Run 3. We show how Tier-0 scaled up to cope with increased pressure by utilizing record-breaking CPU resources, expanding its disk and tape storage capacities, and adapting its workload management system accordingly.
Speaker: Antonio Linares (CERN)
-
Track 7 - Computing infrastructure and sustainability
-
332
Can We Fill a 400 Gbps WAN?
The Czech WLCG Tier-2 reliably delivers computing and storage pledges to the LHC experiments through a geographically distributed infrastructure. CZ-Tier-2 resources are deployed across three sites and interconnected by high-capacity links provided by the Czech NREN, CESNET. In addition, significant CPU capacity from the Czech national supercomputing center IT4I is integrated into WLCG operations via the main CZ-Tier-2 hub at FZU, which acts as the central distribution and data-exchange point.
To meet the increasing wide-area network (WAN) demands expected for HL-LHC, the external connectivity of the FZU site has undergone a major upgrade. During the WLCG Data Challenge in February 2024, the site operated with 100 Gbps connectivity to LHCONE complemented by a 40 Gbps generic Internet link. This has since evolved into a 400 Gbps LHCONE connection plus an additional 100 Gbps external link, enabling significantly higher throughput for data movement and remote access workflows.
In this contribution, we present a series of performance and stress tests of the LHCONE connection carried out shortly after the upgrade in 2025. The initial measurements were performed using available storage servers equipped with 2×25 Gbps network interfaces (or slower). We later complemented these tests with dedicated measurements from a single perfSONAR host equipped with a 400 Gbps NIC.
We demonstrate that CZ-Tier-2 can effectively exploit the upgraded connectivity. The results indicate that the site is well prepared for the increased WAN traffic and data distribution patterns anticipated in the HL-LHC era.
Speaker: Jiri Chudoba (Czech Academy of Sciences (CZ)) -
333
Research on Network Performance Anomaly Detection Technology for Raw Data Transmission in High-Energy Physics
High-energy physics experiments generate massive volumes of raw data every day. This data needs to be transmitted to offline data centers for analysis and processing, and the efficiency of data transmission directly affects the output of scientific results. However, data transmission efficiency depends not only on the data transmission system itself but also on multiple subsystems, including the DAQ system, file storage system, network communication system between the data transmission system and DAQ storage, and dedicated network lines between the experimental site and offline data centers. The JUNO experiment officially started data taking at the end of 2024, producing approximately 8TB of raw data per day, all of which needs to be transmitted to the offline computing platform in Beijing in real time. Nevertheless, in actual data transmission processes, unstable data transmission performance may occur, and current monitoring methods cannot quickly locate the root causes. Therefore, there is an urgent need to develop a system capable of rapidly identifying the issues affecting raw data transmission performance. This report proposes a network performance anomaly detection method for raw data transmission in high-energy physics. Through network session collection, collation, and analysis, an anomaly detection model based on ensemble attention and temporal memory identifies abnormal sessions. Additionally, combined with the iterative in-depth analysis method and IHEP’s OpenDrSai, an intelligent agent for network anomaly classification is implemented. Experiments show that this method can achieve accurate identification and diagnosis of network performance problems in JUNO’s raw data transmission.
Speaker: Shan Zeng -
334
Distributed Performance Testing of High-Speed Scientific Networks in Preparation for Exabyte Scale Workflows
The next generation of scientific experiments, particularly those found in high energy and nuclear physics, will produce unprecedented data volumes which will push scientific computing infrastructures to rely on terabit-scale networks for rapid, reliable data movement between globally distributed facilities. In parallel, advances in artificial intelligence continue to significantly increase data transfer requirements between sites. Evaluating whether these networks can meet future demands requires performance measurements that go beyond single-host tests. While existing tools effectively measure maximum throughput between individual servers, they cannot assess end-to-end performance across entire computing centers, which depends on coordinated, parallel traffic from multiple nodes.
To address this limitation, we present a distributed load-generation method designed for large-scale, site-level network evaluation. Our approach dynamically deploys lightweight software containers across multiple computing nodes, each capable of generating, coordinating, and monitoring high volumes of artificially generated network traffic based on user-specified parameters. This enables realistic, scalable testing that reflects the distributed data movement patterns expected across a wide range of upcoming scientific projects.
We outline the system architecture, orchestration workflow, and mechanisms that allow the framework to scale to large numbers of parallel traffic sources. We demonstrate that this method provides a more accurate characterization of inter-site network performance compared to traditional single-host tools. Preliminary results using USCMS computing sites as a test case highlight its effectiveness in revealing bandwidth limitations, parallel flow behavior, and other wide-area network characteristics that are not observable with existing measurement approaches.
Speaker: Lael Verace (University of Wisconsin-Madison (US)) -
335
Investigating Routing Anomalies and Performance Degradation in WLCG Networks (Case Studies)
We present a series of case studies analyzing real-world network incidents within the WLCG infrastructure using traceroute and performance data from perfSONAR. Our methodology combines path-based anomaly detection with latency and throughput monitoring to identify routing disruptions, topological changes, and their correlation with performance degradation. The approach highlights common patterns such as persistent path inflation, detours via non-optimal transit networks, and silent degradations observable only through structural path analysis.
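To illustrate the path-based part of this methodology, the sketch below flags routing changes by comparing the ordered hop lists of consecutive traceroute measurements for the same source/destination pair; the record layout is an assumption for illustration, not the perfSONAR schema.

```python
# Illustrative sketch: detect route changes from ordered traceroute hop lists.
def path_signature(hops):
    """Reduce a traceroute to a hashable signature of its responding hop IPs."""
    return tuple(h for h in hops if h is not None)   # drop non-responding hops

def detect_path_changes(measurements):
    """Yield (timestamp, old_path, new_path) whenever the observed route changes.

    `measurements` is an iterable of (timestamp, hop_list) pairs for one
    source/destination pair (an assumed, simplified input format).
    """
    last = None
    for ts, hops in sorted(measurements):
        sig = path_signature(hops)
        if last is not None and sig != last:
            yield ts, last, sig
        last = sig
```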
All case studies are linked to operator-confirmed events, demonstrating how integrated data analytics can support incident diagnosis and monitoring. We also introduce a community-maintained log of known or suspected incidents to foster collaborative validation. This work underscores the operational benefits of proactive, data-driven approaches to network reliability in large-scale distributed infrastructures like WLCG.
Speaker: Petya Vasileva (University of Michigan (US)) -
336
WLCG Mini-Capability Challenge: Host Tuning to Improve WAN Data Transfers
Efficient wide-area data transfers are vital for LHC and multi-site scientific workflows, but host-level configuration, encompassing network, storage, and CPU/memory resources, often constrains end-to-end performance. We present the results of a WLCG mini-capability challenge focused on host optimization using modern systems (RHEL 9, 25+ Gbps NICs, NVMe/SSD storage) across eight ATLAS and CMS production sites: FNAL, UCSD, UNL, BNL, AGLT2, MWT2, NET2, and Vanderbilt. Our approach integrates ESnet Fasterdata best practices for network tuning (TCP buffers, packet pacing, NIC offloads, ring buffers), storage optimization (I/O scheduler, queue depth, NUMA affinity), and automated state management using a new script fasterdata-tuning.sh (JSON-based save/diff/restore).
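The JSON-based save/diff idea mentioned above can be illustrated with a small Python sketch; the actual fasterdata-tuning.sh is a shell script, and the parameter list below is only a small subset of ESnet Fasterdata tunables chosen for illustration.

```python
# Minimal sketch (not the fasterdata-tuning.sh implementation) of saving a
# snapshot of host tuning parameters to JSON and diffing the live values.
import json
import pathlib

PARAMS = ["net.core.rmem_max", "net.core.wmem_max",
          "net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"]   # illustrative subset

def read_sysctl(name):
    """Read a sysctl value directly from /proc/sys."""
    return pathlib.Path("/proc/sys", name.replace(".", "/")).read_text().strip()

def save(snapshot_file="tuning_state.json"):
    """Snapshot the current values so they can be restored or diffed later."""
    state = {p: read_sysctl(p) for p in PARAMS}
    json.dump(state, open(snapshot_file, "w"), indent=2)

def diff(snapshot_file="tuning_state.json"):
    """Return {param: (saved, current)} for every parameter that changed."""
    saved = json.load(open(snapshot_file))
    return {p: (saved[p], read_sysctl(p))
            for p in PARAMS if read_sysctl(p) != saved[p]}
```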
We conduct controlled baseline and tuned experiments, including synchronized global configuration sweeps, using representative transfer protocols (XRoot, HTTPS), diagnostic tools (perfSONAR, iperf3), and storage benchmarks (fio). Key metrics include transfer throughput, completion time, host CPU utilization, %iowait, and error/retransmit rates, analyzed with statistical confidence. The study prioritizes reproducibility and evaluates operational compatibility with dCache, Xrootd, EOS, and ongoing production workloads.
Outcomes will provide concrete, site-level tuning recommendations and an objective assessment of host optimization as a potential WLCG best practice. Results, methodology, site experiences, and deployment advice will be presented at CHEP 2026.
Speaker: Shawn Mc Kee (University of Michigan (US)) -
337
Entanglement Distribution with Quantum–White Rabbit Coexistence over Metropolitan Distances
Demonstrating the distribution of entangled photon pairs is a key step toward large-scale quantum networks, which could interconnect future quantum computers and form the foundation of a quantum internet. A major challenge in long-distance quantum communication is coping with varying conditions in deployed optical fibers. When a classical signal co-propagates with single photons in the same fiber, it experiences identical transmission conditions and can therefore serve as a real-time probe of the link.
Within CERN’s Quantum Technology Initiative, we extend the role of the White Rabbit Precision Time Protocol beyond sub-nanosecond synchronization, which is critical for coordinating distributed physics experiments and for linking future quantum computers. By using wavelength division multiplexing, classical White Rabbit signals are co-propagated with single photons and used to monitor the fiber conditions such as polarization drifts and timing fluctuations.
Here, we demonstrate the distribution of polarization-entangled photon pairs in the telecom O-band coexisting with a White Rabbit signal in the C-band over a 30 km internal CERN fiber.
In collaboration with partners from the Geneva Quantum Network (UniGe, HEPIA, ID Quantique, and Rolex), this work will be extended to a deployed metropolitan fiber link between CERN and Geneva, exploring multiple White Rabbit wavelengths across the C-band. These results show that precise time synchronization and quantum entanglement distribution can coexist in the same optical fiber, supporting the integration of quantum communication with existing telecommunication infrastructure and enabling the interconnection of future quantum computing nodes.
Speaker: Marian Babik (CERN)
-
Track 9 - Analysis software and workflows
-
338
ROOT’s New Histograms
Many HEP analyses rely on histograms for statistical interpretation of the experimental data, and use them not only for visualization but also as data structures that can be computed with. ROOT’s histogram package was developed in the 1990s and has been widely used during the past 30 years. Despite its success, the design is starting to show limitations for modern analyses, and the classes lack some capabilities expected from modern histograms, such as user-defined bin content types and efficient concurrent filling. In this contribution, we will discuss ROOT’s new histograms and compare their features to existing packages. We will present the current status and highlight the key concepts. We will also discuss performance results and the ongoing integration into RDataFrame and experiment frameworks. Finally, we will comment on release timelines and future plans.
Speaker: Jonas Hahnfeld (CERN & Goethe University Frankfurt) -
339
Histogramming as a Service
The community's adoption of Hist and boost-histogram, both part of the Scikit-HEP software stack, leads to increasingly frequent work with dense, high-dimensional histograms. These histograms become a memory bottleneck in modern large-scale high-energy physics (HEP) analyses because they grow exceedingly large due to the Cartesian product of all axes.
To solve this problem, we propose “Histogramming as a Service” for large-scale HEP analyses. The core idea is to offload the filling of histograms from each worker in a distributed environment (e.g., batch systems) to a single dedicated server. This significantly reduces the overall memory requirements, as not every worker needs to maintain a copy of a histogram; instead, a central histogram is stored on the server.
Histogramming as a Service offers other advantages besides reducing memory usage: Filling the server-side histogram remotely can be a non-blocking operation, allowing the rest of the HEP analysis to continue while the remote histogram is being filled. In Dask workflows, this also eliminates the need to reduce each worker's output histogram, which otherwise could lead to unexpected memory spikes during accumulation. Finally, alternative histogram implementations can be served that, for example, enable direct filling on-disk, thereby effectively eliminating scaling limitations on histogram sizes.
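A toy sketch of this pattern is shown below: a single server-side histogram receives fill batches from workers. A thread and an in-process queue stand in for the network endpoint, and the axis layout is purely illustrative; this is not the proposed service API.

```python
# Conceptual sketch only: one central histogram, workers send fill batches.
import queue
import threading
import numpy as np
import hist

fills = queue.Queue()
h = hist.Hist.new.Reg(50, 0, 500, name="pt").Reg(30, -3, 3, name="eta").Double()

def server():
    """Drain fill requests and accumulate them into the single central histogram."""
    while True:
        batch = fills.get()
        if batch is None:          # sentinel: no more fills
            break
        h.fill(pt=batch["pt"], eta=batch["eta"])

def worker(rng):
    """From the worker's point of view the fill is non-blocking: enqueue and move on."""
    fills.put({"pt": rng.exponential(50, 10_000),
               "eta": rng.normal(0, 1, 10_000)})

t = threading.Thread(target=server)
t.start()
for seed in range(4):
    worker(np.random.default_rng(seed))
fills.put(None)
t.join()
```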
We will introduce the concept of Histogramming as a Service, discuss its implementation design, and present large-scale benchmarks measured at the coffea-casa computing infrastructure.
Speaker: Manfred Peter Fackeldey (Princeton University (US)) -
340
Harnessing JAX for a Differentiable Analysis Setup
In high energy physics (HEP), the measurement of physical quantities often involves intricate data analysis workflows that include the application of kinematic cuts, event categorization, machine learning techniques, and data binning, followed by the setup of a statistical model. Each step in this process requires careful selection of parameters to optimize the outcome for statistical interpretation.
This presentation introduces a differentiable approach to the data analysis workflow utilizing the python package evermore for statistical model building. Built on top of JAX, the models created in evermore benefit from automatic differentiation. By leveraging this feature alongside neural networks, we can apply optimization across all stages of the analysis. This method allows for a more systematic selection of parameter values while also ensuring that the optimization process accounts for systematic uncertainties included in the analysis. We apply this approach to a CMS analysis targeting the production of a Higgs boson in association with one or two top quarks and demonstrate how each individual step can be implemented in a differentiable manner. A setup for a differentiable analysis workflow is presented.
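The following is a minimal, generic JAX sketch (deliberately not using the evermore API) of why relaxing a hard selection makes cut thresholds optimizable by gradient descent; all names and numbers are illustrative.

```python
# Generic JAX sketch: a sigmoid-relaxed event selection whose threshold can be
# tuned by gradient descent on an approximate-significance objective.
import jax
import jax.numpy as jnp

def soft_yields(threshold, sig_x, bkg_x, slope=10.0):
    """Replace the hard cut x > threshold by a per-event sigmoid weight."""
    w_sig = jax.nn.sigmoid(slope * (sig_x - threshold))
    w_bkg = jax.nn.sigmoid(slope * (bkg_x - threshold))
    return w_sig.sum(), w_bkg.sum()

def loss(threshold, sig_x, bkg_x):
    s, b = soft_yields(threshold, sig_x, bkg_x)
    return -s / jnp.sqrt(b + 1e-3)     # negative approximate significance

# Toy data: signal peaks at x ~ 1, background at x ~ 0.
sig_x = 1.0 + jax.random.normal(jax.random.PRNGKey(0), (1000,))
bkg_x = jax.random.normal(jax.random.PRNGKey(1), (5000,))

grad_fn = jax.grad(loss)
thr = 0.0
for _ in range(100):
    thr -= 0.05 * grad_fn(thr, sig_x, bkg_x)   # simple gradient descent
```

The same mechanism generalizes to categorization boundaries, bin edges, and neural-network weights once every step of the chain is expressed in JAX.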
Speaker: Felix Philipp Zinn (Rheinisch Westfaelische Tech. Hoch. (DE)) -
341
Doppio: Differentiable Optimization for Pair Peak Identification
Weakly-supervised methods in the CWoLa (Classification Without Labels) family enable anomaly searches without truth labels by training classifiers on proxy objectives in data. However, these approaches require high-purity control regions which place assumptions on the signal and in practice are difficult to obtain. In addition, many include a number of disjoint steps, making it difficult to reach optimal performance or remove intrusive biases. Rather than optimizing region-label classification, we present Doppio (Differentiable Optimization for Pair Peak Identification), an approach where, leveraging only the assumption of a signal shape, the classifier's training objective is derived directly from the physics goal: evidence for a signal bump. Leveraging automatic differentiation in JAX, Doppio constructs differentiable histograms in classifier-defined pass/fail regions and fits parametric signal+background models within the training loop. The loss is the difference in fit quality (χ²) between background-only and signal+background hypotheses—directly measuring "how much does adding a bump improve the fit?" This approach extends to all forms of hypothesis tests and can also be used to mitigate systematic uncertainties in supervised measurements, achieving near-optimal performance.
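As a hedged sketch of the two ingredients named above (a differentiable histogram and a fit-quality difference used as the loss), the code below uses sigmoid-softened bin edges and a closed-form least-squares template fit; it is an illustration under simplifying assumptions, not the Doppio implementation.

```python
# Sketch: differentiable histogram + delta-chi2 objective in JAX.
import jax
import jax.numpy as jnp

def soft_hist(mass, pass_prob, edges, width=2.0):
    """Differentiable histogram of the classifier's 'pass' region: each event
    contributes to each bin with a smooth weight instead of a hard 0/1."""
    lo, hi = edges[:-1], edges[1:]
    in_bin = jax.nn.sigmoid((mass[:, None] - lo) / width) * \
             jax.nn.sigmoid((hi - mass[:, None]) / width)
    return (pass_prob[:, None] * in_bin).sum(axis=0)

def chi2(counts, templates):
    """Least-squares template fit with closed-form amplitudes (differentiable)."""
    err = jnp.sqrt(counts + 1.0)
    A, b = templates / err[:, None], counts / err
    coeff = jnp.linalg.solve(A.T @ A, A.T @ b)
    return jnp.sum((A @ coeff - b) ** 2)

def loss(counts, bkg_shape, sig_shape):
    # Minimizing this maximizes how much adding the signal bump improves the fit.
    return -(chi2(counts, bkg_shape[:, None])
             - chi2(counts, jnp.stack([bkg_shape, sig_shape], axis=1)))
```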
Speaker: Andrzej Novak (Massachusetts Inst. of Technology (US)) -
342
GRAEP: A framework for smart gradient-based optimization of HEP analyses
We present GRAEP (Gradient-based End-to-End Physics Analysis), a JAX-based framework for building modular, end-to-end differentiable analysis pipelines in high-energy physics. The framework integrates tooling from the Scikit-HEP ecosystem and enables gradient-based optimisation across HEP analysis workflows. We demonstrate an end-to-end differentiable analysis applied to CMS Open Data, covering event selection, observable construction, differentiable histogramming, and likelihood-based inference in a signal extraction setup. The example reflects a realistic CMS-like analysis setup, using structured analysis code and binned statistical models rather than simplified toy problems. We discuss the treatment of non-differentiable operations commonly encountered in HEP analyses, comparing different strategies for making discrete operations differentiable, including continuous relaxations and probabilistic formulations, along with their trade-offs in terms of stability, faithfulness, and computational cost. We also consider the treatment of systematic uncertainties in the workflow, including both their incorporation in the statistical model and uncertainties arising from the gradient-based optimisation procedure. This work provides a concrete reference for end-to-end differentiable analyses in HEP and illustrates how gradient-based methods can complement traditional analysis workflows.
Speakers: Lino Oscar Gerlach (Princeton University (US)), Mohamed Aly (Princeton University (US)) -
343
The latest developments of the Combine tool
The Combine tool [1] is a statistical analysis software package developed by the CMS Collaboration for performing measurements and searches in high-energy physics. Originally created for Higgs boson searches and their statistical combination, it has evolved into a comprehensive framework used in the majority of CMS analyses. Built on ROOT and RooFit [2], Combine provides a command-line interface and uses human-readable configuration files called "datacards" to construct complex statistical models for likelihood-based inference.
To address the computational challenges posed by increasingly large datasets and sophisticated analyses in future LHC runs, recent development efforts in Combine have focused on several key aspects including performance optimization, interoperability and ease of use. Performance improvements center on integrating RooFit's automatic differentiation (AD) capabilities through Clad, a source-transformation-based AD tool that automatically generates derivative code for C++ functions [3]. Complementing this, ongoing work extends Combine classes to provide enhanced code generation support. Regarding interoperability, effort is directed toward implementing support for HS3 (HEP Statistics Serialization Standard) [4], an initiative that defines standards for statistical procedures and results in HEP using human- and machine-readable JSON representations, aiming to enable framework-independent analysis and facilitate likelihood publication. On the ease-of-use front, improvements target the API design, expanded test coverage, and particularly installation and distribution workflows, including the recent availability of Combine through conda-forge, thereby simplifying deployment and accessibility for the broader HEP community.
This contribution describes in detail the above-mentioned ongoing developments that make Combine the main statistical tool for the present and future of the CMS collaboration.
[1] https://link.springer.com/article/10.1007/s41781-024-00121-4
[2] https://root.cern/
[3] https://arxiv.org/abs/2304.02650
[4] https://github.com/hep-statistics-serialization-standard/hep-statistics-serialization-standard
Speaker: Tom Runting (Imperial College (GB))
-
Plenary
-
344
Next-Generation Triggers: Rethinking Trigger Tracking for the HL-LHC in ATLAS and CMS
The High-Luminosity LHC (HL-LHC) will impose unprecedented demands on event reconstruction, driven by extreme pile-up conditions, increased detector granularity, and stringent latency constraints. In this environment, track reconstruction stands out as one of the most critical and computationally challenging components of future trigger systems, directly impacting physics performance and the efficiency of trigger-level event selection.
To address these challenges, the CERN Next Generation Triggers project is devoting significant effort to rethinking trigger-level track reconstruction algorithms, software architectures, and computing models, with the goal of ensuring scalability, robustness, and sustained physics performance throughout the HL-LHC era. A key aspect is the increasingly tight coupling between online and offline track reconstruction, which blurs the traditional boundaries between them and has broader implications for WLCG preparations for the HL-LHC.
This presentation highlights selected, concrete developments in trigger-level track reconstruction within the ATLAS and CMS experiments, focusing on innovative solutions tailored to HL-LHC conditions. Topics include the evolution of tracking strategies, algorithmic simplifications and refactorings driven by compute resources constraints, the adoption of heterogeneous computing architectures such as GPUs and FPGAs, as well as new approaches based on machine learning techniques.
Speaker: Noemi Calace (CERN) -
345
Status and Prospects of the HEP Statistical Inference Ecosystem
Statistical inference is a crucial part of HEP analyses. Historically based on RooFit and RooStats, the statistical tools used by the experiments are now facing unprecedented challenges, such as the rapidly growing complexity of statistical models - involving hundreds of parameters of interest and thousands of nuisance parameters - the need for scalable performance in large likelihood minimizations, and the demand for interoperability across an increasingly diverse ecosystem of tools, computational hardware, and frameworks and libraries, especially the ones developed and used within the machine learning world.
This talk summarizes the status and future plans for the statistical tools used by some of the main LHC experiments (CMS, ATLAS), with a focus on improvements coming from the ROOT world (RooFit automatic differentiation), interoperability with modern libraries (JAX), and communication across frameworks (HS3).
Speaker: Massimiliano Galli (Princeton University (US)) -
346
Analysis Productions: an exascale analysis data processing and management service for LHCb
Analysis Productions is a declarative n-tupling service which has processed over 1 exabyte of LHCb data since 2024 with the DIRAC Transformation System. It is the primary method for producing LHCb ntuples for analysis and has produced approximately 50M files.
Since the start of Run 3, the demand for n-tuples has increased dramatically, with 22k samples created in 2025 alone, leading to a significantly increased workload on Grid resources, notably storage, while intensifying operational requirements. Meeting this scaling challenge has led to increased automation with sensible checks and balances to ensure resources continue to be used efficiently and responsibly.
Analysis Productions rises to the challenge with a comprehensive suite of capabilities. A feature-rich web interface provides a user-friendly platform for browsing continuous integration test results and managing LHCb's extensive collection of n-tuples, substantially reducing prototyping overhead for analysts.
Beyond standard LHCb n-tupling workflows, the service supports configuring and running custom n-tuple filtering and transformation steps at massive scale with ROOT RDataFrame, among countless other possibilities.
To ensure responsible storage utilisation, sample lifecycle management features provide accountability for working groups and sample owners, supporting timely cleanup of obsolete n-tuples, with automated role and permission assignment through LbFence (Glance) integration. Finally, preservation by default ensures permanent metadata retention with automatic archival to tape storage for n-tuples linked to a publication, safeguarding long-term reproducibility.
This talk demonstrates how Analysis Productions functions as a production-ready exascale analysis facility, minimising operational burden while providing an intuitive, continuously evolving interface for analysts and working groups to seamlessly produce, monitor, access, and manage n-tuples for their analysis.
Speaker: Chris Burr (CERN)
-
10:30
Break
-
Plenary
-
347
Production-quality, high-throughput detector simulation with GPUs
The computational cost of full Monte Carlo simulation in high-energy physics is rapidly increasing, particularly in view of the high-luminosity LHC upgrades. At the same time, modern high-performance computing systems are increasingly based on heterogeneous architectures, motivating efforts to enable full detector simulations on GPUs. The AdePT and Celeritas projects have now accelerated production-quality simulations on GPU, achieving substantial runtime reductions while preserving physics accuracy, and indicating that the usage of GPUs can be more energy-efficient. However, the transition to GPU-based parallel tracking introduces new constraints on user applications, particularly in areas such as Monte Carlo truth handling. We present the latest results, discuss the remaining challenges, and outline the path toward production deployment.
Speaker: Severin Diederichs (CERN) -
348
DUNE Computing & Prompt Processing
The Deep Underground Neutrino Experiment (DUNE) will produce a very large amount of raw data from its Far Detector (FD) as it turns on at the start of the next decade: roughly 30 PB/yr from the first two FD modules. DUNE’s current processing paradigm would necessitate a large amount of both disk space and compute time to process this raw data up to the point at which high-level event reconstruction can be applied. We propose a new “Prompt Processing” paradigm, in which we leverage a mix of High Performance Computing and heterogeneous (CPU/GPU) infrastructure to perform efficient signal processing of DUNE’s raw readout. This signal processing estimates the amount of ionization charge reaching the detectors’ readout elements at a given point in time, and is thus a necessary first step to processing FD readout. A part of this signal processing is the identification of signal Regions-of-Interest (ROIs) within the estimated signal waveform, which would provide on the order of 10-times data reduction. Thus, running this signal processing promptly (within one hour of a file’s lifetime on disk) and using GPU-accelerated algorithms allows us to treat the raw data as “Write Once, Read Never”. In other words, we can write the raw data to tape almost immediately (and leave it there until planned reprocessing), providing DUNE with a reduction both in compute time and disk utilization. This presentation will describe the overall design of the Prompt Processing workflow, our use of GPU-accelerated algorithms to parallelize our signal processing, and a fully Machine-Learning-based ROI finding algorithm which extends the current state-of-the-art in ROI finding.
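The DUNE ROI finder described above is machine-learning based; purely for illustration of the kind of data reduction an ROI selection provides, the sketch below shows a simple threshold-based ROI search on a single waveform (all parameters are illustrative).

```python
# Illustrative threshold-based ROI finder on a signal-processed waveform.
import numpy as np

def find_rois(waveform, threshold, pad=10):
    """Return (start, stop) sample ranges around above-threshold activity,
    padded by `pad` samples and merged when gaps are small."""
    above = np.flatnonzero(waveform > threshold)
    if above.size == 0:
        return []
    rois, start, prev = [], above[0], above[0]
    for idx in above[1:]:
        if idx - prev > 2 * pad:           # a large gap closes the current ROI
            rois.append((max(start - pad, 0), min(prev + pad, waveform.size)))
            start = idx
        prev = idx
    rois.append((max(start - pad, 0), min(prev + pad, waveform.size)))
    return rois
```

Only the samples inside the returned ranges would need to stay on disk for prompt use, which is where the roughly tenfold reduction quoted above comes from.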
Speakers: Amit Bashyal (Brookhaven National Laboratory), Jacob Calcutt (Brookhaven National Laboratory (US)) -
349
From quarks to quasars: unifying the universe through scalable computing
Large-scale, data-intensive research is no longer exclusive to high energy physics. Astronomy, gravitational wave physics, nuclear physics, and many more scientific fields now face comparable challenges in data management, distributed computing, and virtual research environments. With the technological landscape increasingly evolving towards centralised and specialised facilities, communities that rely on a distributed computing model must build bridges that enable sharing technical solutions while preserving their domain-specific needs.
Driven by the challenges that research infrastructures will increasingly face, and inspired by the Worldwide LHC Computing Grid Data Challenges, more than five research infrastructures (including ET, SKAO, CTAO, and KM3NeT, with support from ATLAS and CMS) joined forces to create a forum for collective learning and shared experience. ESCAPE xRIDGE (European Science Cluster of Astronomy & Particle Physics ESFRI Cross-RI Distributed Grand Exercise) is a coordinated series of data and analysis challenges designed around the scientific and technical priorities of each participating community.
xRIDGE aims to enable infrastructures at different stages of maturity to advance their own goals while leveraging a partly shared infrastructure that provides a common operational environment and fosters a cross-community dialogue, presenting a unified front in interactions with computing centres.
This contribution presents the xRIDGE framework and initial results from the first exercise, carried out at a scale consistent with the data and computing ambitions of the HL-LHC era. The focus is specifically on the concrete progress across participating infrastructures, and on how this collaborative process may establish the foundations of a cross-disciplinary competence centre.
Speaker: Giovanni Guerrieri (CERN)
-
12:30
Lunch
-
Track 1 - Data and metadata organization, management and access: Compression and I/O optimization
-
350
Characterizing Event I/O Access Patterns in HEP Workflows Using Darshan
Efficient data access is becoming increasingly important for high-energy physics (HEP) workflows on HPC systems. Large datasets, a greater degree of concurrency (multi-process and multithreading), and complex event formats can lead to hidden performance issues. The HEP-CCE/SOP group used the Darshan I/O characterization tool to identify data re-operations in representative HEP workflows, using ATLAS and CMS production workflows as case studies, to quantify access locality and measure the impact of repeated I/O on job walltime. Complementary instrumentation of ROOT-based file formats (TTree and RNTuple) enables variable and event range level analysis, revealing access patterns that inform content slimming and restructuring strategies.
To detect regressions over time, we extend the ATLAS release-level performance monitoring with continuous re-operation metrics, exposing inefficiencies introduced by software changes and configuration defaults. Initial studies across multiple HPC platforms demonstrate observable correlations between access entropy, cluster granularity, and end-to-end runtime.
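One simple re-operation metric of the kind discussed above can be computed from per-file read records (offset, length): compare the total bytes read with the unique byte footprint actually touched. The sketch below is a hedged illustration; extracting the records from Darshan logs (for example with darshan-parser or the PyDarshan package) is omitted.

```python
# Sketch: re-read factor = total bytes read / unique bytes touched.
def reread_factor(read_records):
    """read_records: iterable of (offset, length) tuples for one file."""
    total = sum(length for _, length in read_records)
    intervals = sorted((off, off + length) for off, length in read_records)
    unique, cur_start, cur_end = 0, None, None
    for start, end in intervals:            # merge overlapping ranges
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                unique += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        unique += cur_end - cur_start
    return total / unique if unique else 0.0
```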
This work provides a scalable methodology for detecting and diagnosing I/O bottlenecks, guiding workflow optimization, and improving resource utilization of HEP experiments as data volumes and HPC concurrency continue to grow in the exascale era and beyond.
Speaker: Wesley Patrick Kwiecinski (University of Illinois Chicago) -
351
Real-Time I/O Traffic Shaping in EOS
As the scale and complexity of high-energy physics computing grow, storage systems are being pushed to serve radically diverse workloads at once, often with significant performance consequences. To ensure EOS can meet these evolving demands, we introduce a real-time I/O traffic-shaping framework that monitors ongoing I/O patterns and dynamically adjusts and balances read/write flows to maintain stable, efficient throughput, ensuring critical workflows continue to be served reliably even under heavy contention.
The concept introduces a lightweight, distributed I/O monitoring layer that collects detailed metrics directly from each storage node. These local measurements are periodically aggregated by the MGM namespace service, where they are analysed to compute bandwidth usage, IOPS distribution, and emerging congestion patterns. From this global view, shaping parameters - such as rate limits per application or per user, and prioritization hints - can be dynamically computed and propagated back to the nodes.
This architecture forms the basis for a closed loop in EOS: monitoring, aggregation, and shaping work together to regulate throughput, improve fairness, and protect interactive responsiveness under load. The implementation uses open standards and efficient serialization formats (e.g., Protocol Buffers) to enable easy communication and future integration with machine learning or policy-driven optimization frameworks.
Ultimately, the real-time I/O shaping framework aims to transform EOS from a reactive storage system into an adaptive, self-regulating storage service capable of maintaining performance and predictability across diverse and fluctuating workloads.
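A common way to enforce a per-application or per-user rate limit of the kind mentioned above is a token bucket; the sketch below is a generic illustration of that mechanism, not the EOS implementation.

```python
# Generic token-bucket sketch: admit an I/O only if enough "byte tokens"
# have accumulated; tokens refill at the configured rate up to a burst cap.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate, self.capacity = rate_bytes_per_s, burst_bytes
        self.tokens, self.last = burst_bytes, time.monotonic()

    def admit(self, nbytes):
        """Return True if an I/O of nbytes may proceed now, else False."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```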
Speaker: Gianmaria Del Monte (CERN) -
352
Efficient data movement for Machine Learning inference in heterogeneous CMS software
Efficient data processing using machine learning relies on heterogeneous computing approaches, but optimizing input and output data movements remains a challenge. In GPU-based workflows, data already resides in GPU memory, but machine learning models require the input and output data to be provided in a specific tensor format, often requiring unnecessary copies out of GPU device memory and additional conversion steps. To address this, we present an interface that allows seamless conversion of Structure of Arrays (SoA) data into lists of PyTorch tensors without explicit data movement. Our approach computes the necessary strides for various data types, including scalars and rows of vectors and matrices, allowing PyTorch tensors to directly access the data in GPU memory. The introduced metadata structure provides a flexible mechanism for defining the columns to be used and specifying the order of the resulting tensor list. This user-friendly interface minimizes the amount of code required, allowing direct integration with machine learning models. Implemented within the CMS computing framework and using the Alpaka library for heterogeneous applications, this solution significantly improves GPU efficiency. By avoiding unnecessary CPU-GPU transfers, it accelerates model execution while maintaining flexibility and ease of use.
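The underlying idea of exposing SoA columns as tensor views via sizes, strides and offsets can be illustrated in Python with torch.as_strided; the actual CMS interface is C++/Alpaka, and the buffer layout below is an assumption for illustration.

```python
# Hedged illustration: wrap columns of a flat SoA GPU buffer as PyTorch
# tensors without copying, by pointing views into the same storage.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n, pitch = 1024, 1024                      # column length and padded column pitch
buf = torch.arange(3 * pitch, dtype=torch.float32, device=device)  # stand-in SoA

# Column i starts at i * pitch inside the same storage; no data is moved.
pt  = torch.as_strided(buf, (n,), (1,), storage_offset=0)
eta = torch.as_strided(buf, (n,), (1,), storage_offset=pitch)
phi = torch.as_strided(buf, (n,), (1,), storage_offset=2 * pitch)

tensor_list = [pt, eta, phi]               # zero-copy views handed to the model
```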
Speaker: Felice Pantaleo (CERN) -
353
Boa Constrictor: A Mamba-based Lossless Compressor for High Energy Physics data
The petabyte-scale data generated annually by High Energy Physics (HEP) experiments like those at the Large Hadron Collider present a significant data storage challenge. Whilst traditional algorithms like LZMA and ZLIB are widely used, they often fail to exploit the deep structure inherent in scientific data. We investigate the application of modern state space models (SSMs) to this problem, which have shown promise for capturing long-range dependencies in sequences. We present the Bytewise Online Autoregressive (BOA) Constrictor, a novel, streaming-capable lossless compressor built upon the Mamba architecture. BOA combines an autoregressive Mamba model for next-byte prediction with a parallelised streaming range coder. We evaluate our method on three distinct structured datasets in HEP, demonstrating state-of-the-art compression ratios, improving upon LZMA-9 across all datasets. These improvements range from 2.21$\times$ (vs. 1.69$\times$) on the ATLAS dataset to a substantial 44.14$\times$ (vs. 27.14$\times$) on the highly-structured CMS dataset, with a modest $\sim 4.5$MB model size. However, this gain in compression ratio comes with a trade-off in throughput; the Storage-Saving Rate ($\sigma_{SSR}$) of our prototype currently lags behind highly-optimised CPU-based algorithms like ZLIB. We conclude that while this Mamba-based approach is a highly promising proof-of-principle, significant future work on performance optimisation and hardware portability is required to develop it into a production-ready tool for the HEP community.
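For context on why the quality of the next-byte predictor drives the compression ratio: an entropy or range coder (as used in BOA) approaches the model's cross-entropy bound. The sketch below only illustrates that bound with a toy stand-in model; it is not the BOA pipeline.

```python
# Toy sketch: ideal code length for an autoregressive next-byte predictor is
# the sum of -log2 p(true byte); the compression ratio follows from that.
import numpy as np

def ideal_compressed_bits(probs_of_true_byte):
    return float(-np.sum(np.log2(probs_of_true_byte)))

# Stand-in: a stream where the model assigns probability 0.9 to each true byte.
probs = np.full(1_000_000, 0.9)
ratio = (8 * probs.size) / ideal_compressed_bits(probs)
print(f"ideal compression ratio ~ {ratio:.1f}x")
```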
Speaker: Mr Akshat Gupta -
354
Impact of Compression Algorithms on I/O performance for ATLAS MP Derivation Workflows in new ROOT data formats for analysis (RNTuple)
The ATLAS experiment at the CERN Large Hadron Collider (LHC) records and processes large amounts of data from proton-proton collisions. With the upcoming High-Luminosity LHC (HL-LHC), the data volume is expected to increase by more than an order of magnitude, posing new challenges for storage, data throughput, and analysis scalability.
Currently, all major production output formats support RNTuple. Performance studies using ATLAS data have already demonstrated substantial improvements in space usage and I/O performance. The main goal of this work is to explore the potential benefits of switching from the default LZMA compression to alternative compression algorithms such as ZSTD or LZ4 and to study their impact on both file size and I/O throughput for AOD data. In this study, we focus specifically on AOD reading during the processing into derived, smaller data formats. This process is I/O-intensive and most frequently executed in multiprocessing mode workflows in ATLAS production. Our priority is to determine whether using different compression algorithms for the AOD stored in the RNTuple layout can provide measurable improvements in input throughput, and how these throughput changes compare to the corresponding differences in file size.
This work contributes to the ongoing effort to prepare ATLAS computing for the data-intensive HL-LHC and will be critical for supporting large-scale data analysis.
Speaker: Bralyne Matoukam (University of the Witwatersrand)
-
Track 2 - Online and real-time computing
-
355
Mitigating Deadtime in Distributed Optical Arrays: A Liveness-Aware Trigger Approach for High-Energy Neutrino Detection
Large-scale neutrino observatories operate under unavoidable detector deadtime arising from photomultiplier saturation, digitizer limits, and front-end readout constraints. Conventional coincidence-based trigger logic implicitly assumes continuous sensor availability and therefore suffers systematic efficiency loss when channels become temporarily non-live. This work presents the design of a liveness-aware trigger architecture targeting FPGA deployment in distributed optical arrays. We introduce a recursive Infinite Impulse Response (IIR) update law designed as a low-latency synthesizable pipeline that constructs a continuity-preserving effective observable at each sensor node. Rather than collapsing during non-liveness intervals, the observable decays smoothly, retaining phase and amplitude information relevant for network-level coherence estimation. The trigger architecture explicitly separates continuous measurement construction from discrete decision logic, enabling graceful degradation under partial non-liveness. Performance is evaluated within a hybrid validation framework that combines event topologies from IceCube Open Data with a hardware-accurate parametric signal model that explicitly captures PMT saturation and front-end digitizer limits. Simulation results demonstrate that the proposed trigger sustains $>90\%$ event recovery efficiency at $20\%$ deadtime probability, a regime where conventional coincidence logic degrades to below $40\%$. Furthermore, the continuity-preserving observable yields a two-order-of-magnitude improvement in effective Signal-to-Noise Ratio (SNR), enabling robust detection even under severe saturation. This method provides a robust foundation for next-generation firmware-level trigger strategies.
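The continuity-preserving behaviour described above can be pictured with a first-order recursive filter: while a channel is live the observable tracks new charge, and during deadtime it decays smoothly instead of dropping to zero. The sketch below is a hedged numpy illustration; the actual coefficients and synthesizable FPGA pipeline are not reproduced here.

```python
# Hedged sketch of a liveness-aware first-order IIR observable.
import numpy as np

def liveness_aware_observable(charge, live, alpha=0.9):
    """charge: per-sample charge; live: per-sample boolean liveness flag."""
    y = np.zeros(len(charge), dtype=float)
    prev = 0.0
    for i, (q, ok) in enumerate(zip(charge, live)):
        # Live sample: blend in the new charge; dead sample: decay smoothly.
        prev = alpha * prev + (1.0 - alpha) * q if ok else alpha * prev
        y[i] = prev
    return y
```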
Speaker: Mr Thammarat Yawisit (King Mongkut's Institute of Technology Ladkrabang) -
356
Real-Time Uncertainty Quantification for Jet Tagging Models on the Level-1 Trigger
Current deep-learning-based models at the LHC produce deterministic point estimates without any accompanying measure of epistemic uncertainty. Without this information, the system cannot determine when its predictions may be unreliable, particularly in rare or weakly sampled regions of feature space. This work introduces a high-performance Bayesian Neural Network architecture for the Level-1 Trigger that replaces fixed weights with learned probability distributions, enabling real-time uncertainty quantification alongside standard classification. The resulting predictive variance provides an online indicator of model reliability, improving score calibration by reducing expected calibration error by over 70%. To assess the computational feasibility of real-time inference, we provide an FPGA implementation and show that the Bayesian components add only a modest ~15% latency overhead while maintaining a total inference time of under 100 nanoseconds. The design therefore remains fully compatible with Level-1 trigger constraints while delivering reliable uncertainty estimates in real time.
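The core mechanism, replacing fixed weights with learned weight distributions and reading the spread of repeated stochastic forward passes as epistemic uncertainty, can be sketched generically in PyTorch as below; this is a minimal illustration, not the deployed trigger model or its FPGA mapping.

```python
# Minimal generic sketch: a Bayesian linear layer with Gaussian weight
# posteriors; repeated sampled forward passes give predictive mean/variance.
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.w_rho = nn.Parameter(torch.full((n_out, n_in), -3.0))  # softplus -> sigma

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.w_rho)
        w = self.w_mu + sigma * torch.randn_like(sigma)   # reparameterization trick
        return x @ w.t()

layer = BayesianLinear(16, 1)
x = torch.randn(8, 16)
samples = torch.stack([torch.sigmoid(layer(x)) for _ in range(32)])  # 32 MC passes
mean, var = samples.mean(0), samples.var(0)   # classifier score and its uncertainty
```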
Speaker: Tarik Ourida -
357
Recursive Manifold Coherence: A Geometric Framework for Deadtime Recovery in Distributed Trigger Systems
Large-scale neutrino observatories operate under unavoidable detector deadtime and signal pile-up, leading to systematic inefficiencies in conventional coincidence-based trigger systems. Such triggers typically rely on binary temporal windows and assume continuous sensor availability, causing partial or complete loss of correlated signal information during non-live intervals. We introduce Recursive Manifold Coherence (RMC), a geometric framework that reformulates distributed trigger logic as a continuous state estimation problem in a low-dimensional information space defined by correlated charge and timing observables. Instead of applying hard vetoes during deadtime, the proposed method employs a recursive update rule that propagates a coherence state across sensor nodes, allowing partially obscured signals to be retained and evaluated consistently. Using simulation studies representative of large optical detector arrays, we demonstrate that RMC successfully recovers event-level coherence for high-multiplicity topologies even when direct coincidence chains are broken. By treating the detector response as a smooth manifold rather than discrete hits, the framework achieves superior robustness against data fragmentation compared to standard binary logic. The framework is detector-agnostic and compatible with software-defined trigger pipelines, providing a flexible foundation for deadtime-aware analysis and triggering strategies in future distributed detector systems.
Speaker: Mr Thammarat Yawisit (King Mongkut's Institute of Technology Ladkrabang) -
358
New trigger algorithms for detecting Long-Lived particle decays inside the LHCb magnet region
The reconstruction of particle decays inside LHCb’s dipole magnet region enables novel measurements of hyperon decays and sensitive searches for long-lived particles with lifetimes above 100 ps, relevant both to the Standard Model and to many of its extensions. Reconstructing such displaced vertices using only track segments in LHCb’s outermost tracker (SciFi) is challenging due to limited momentum resolution, short lever arms, and the need to extrapolate tracks through a strong, inhomogeneous magnetic field around the decay vertex. A new fast-extrapolation strategy has been introduced in the second level of LHCb’s full software trigger to detect these decays, in which track states are precomputed and cached at multiple positions along the beam line, with trajectories described by cubic-spline interpolation between these anchors. This avoids expensive Runge-Kutta calculations and significantly improves displaced-vertex finding performance, increasing signal yields and purity without impacting throughput. In the first-level trigger, a new high-quality extrapolation is introduced. This approach uses GPU texture memory to overcome memory bottlenecks caused by random access to the magnetic-field map. The spatially local caching and hardware-accelerated interpolation substantially reduce extrapolation time, enabling the detection of displaced vertices up to 8 m from the interaction point. Together, these developments enhance the real-time reconstruction and selection of highly displaced vertices within both trigger stages, opening new opportunities for long-lived particle searches at LHCb.
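The caching idea of precomputed anchor states with cubic-spline interpolation in between can be sketched as below; this is a hedged, one-dimensional illustration with made-up numbers, not the LHCb trigger code.

```python
# Sketch: interpolate a cached track trajectory between precomputed z anchors
# instead of re-running a Runge-Kutta propagation through the field map.
import numpy as np
from scipy.interpolate import CubicSpline

z_anchors = np.linspace(2000.0, 8000.0, 13)                       # mm, illustrative
x_at_anchors = 0.05 * z_anchors + 1e-6 * (z_anchors - 5000.0) ** 2  # stand-in trajectory
spline = CubicSpline(z_anchors, x_at_anchors)

z_query = 4321.0
x_interp = spline(z_query)       # interpolated position at the query plane
tx_interp = spline(z_query, 1)   # local slope dx/dz from the spline derivative
```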
Speakers: Izaac Sanderswood (Univ. of Valencia and CSIC (ES)), Volodymyr Svintozelskyi (Univ. of Valencia and CSIC (ES)) -
359
The phase-1 upgrade of the ATLAS level-1 calorimeter trigger
The ATLAS level-1 calorimeter trigger is a custom-built hardware system that identifies events containing calorimeter-based physics objects, including electrons, photons, taus, jets, and total and missing transverse energy. In Run 3, L1Calo has been upgraded to process higher granularity input data. The new trigger comprises several FPGA-based feature extractor modules, which process the new digital information from the calorimeters and execute more sophisticated trigger algorithms. The design of the system will be presented along with an analysis of the improved performance for identifying interesting proton-proton collisions in the increasingly challenging Run-3 LHC pile-up environment, as well as in heavy ion collisions where timing and noise effects are particularly challenging.
Speaker: ATLAS Collaboration
-
355
-
Track 2 - Online and real-time computing
-
360
A Unified Interface to ML Runtimes for Inference across Heterogeneous Architectures
Machine learning approaches have been widely adopted across several areas of high-energy physics research, including simulations, anomaly detection, and trigger systems. Deploying machine learning in trigger systems requires inference approaches capable of processing data at enormous rates, often on the order of 10–100 thousand events per second while making real-time decisions about which events to retain. This, in turn, demands inference pipelines that are both extremely low-latency and highly reliable. While ONNX Runtime provides a portable solution for inference, several applications require specialized libraries. Libraries such as NVIDIA’s TensorRT and AMD ROCm’s MIGraphX provide highly optimized inference stacks; however, integrating them into existing workflows often requires significant effort to convert input data, manage data formats, and handle library-specific configurations. Although specialized libraries serve specific purposes for particular applications, their integration and long-term maintenance remain challenging.
Code-generation approaches such as SOFIE can also enable efficient inference, but their integration into workflows can be cumbersome, particularly when model architectures or input formats change.
We present a unified interface that enables inference across multiple target libraries without copying data or requiring configuration changes by the user. Acting as a wrapper, the interface allows input data to be referenced directly, avoiding data movement and additional user configuration. The interface also allows using the C++ code generated by SOFIE for inference in a load-and-use style. This design simplifies maintenance and helps ensure consistent inference performance as models evolve.
In its first iteration, the interface supports processing alpaka buffers, leveraging their abstraction for heterogeneous computing environments. Support for additional input formats can be added seamlessly based on user requirements. The interface consists of two main components: a processor, which runs offline to convert trained models into optimized plans tailored to the target inference library (e.g., converting an ONNX model into a .plan file for TensorRT, .mrt for MIGraphX, or .hxx in SOFIE), and a runtime interface, which loads these optimized plans and executes inference efficiently using the selected backend.
Finally, we present benchmarking results for several models executed through our interface across different inference libraries, highlighting their respective performance characteristics. By providing a single unified interface, our approach reduces workflow complexity, minimizes data movement, and ensures that updates to model architectures or input formats can be accommodated with minimal overhead. This enables developers to deploy machine learning models across heterogeneous hardware and software stacks more efficiently and with lower maintenance effort.
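A hedged, Python-level sketch of the dispatch pattern is given below (the class and method names are invented; the interface described above is written in C++ and operates on alpaka buffers, and only the ONNX Runtime calls shown are real APIs):

```python
# Hedged sketch of a "one front end, many backends" inference wrapper.
import numpy as np

class OnnxRuntimeBackend:
    def load(self, plan_path):
        import onnxruntime as ort          # imported only if this backend is selected
        self.session = ort.InferenceSession(plan_path)

    def run(self, inputs):
        name = self.session.get_inputs()[0].name
        return self.session.run(None, {name: inputs})

class UnifiedInference:
    """Load a pre-optimised plan once, then run inference through a single call."""
    _backends = {"onnxruntime": OnnxRuntimeBackend}   # TensorRT/MIGraphX/SOFIE analogous

    def __init__(self, backend, plan_path):
        self.impl = self._backends[backend]()
        self.impl.load(plan_path)

    def __call__(self, inputs: np.ndarray):
        return self.impl.run(inputs)       # the input array is referenced, not copied

# usage (requires a model file): UnifiedInference("onnxruntime", "model.onnx")(batch)
```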
Speaker: Sanjiban Sengupta (CERN, University of Manchester) -
361
When Less is More: Towards Lightweight and Distilled Graph Neural Networks for Efficient Particle Reconstruction in LHCb’s Next-Generation Calorimeter
Graph Neural Networks (GNNs) excel at modeling the complex, irregular geometry of modern calorimeters, but their computational cost poses challenges for real-time or resource-constrained environments. We present lightweight, attention-enhanced GNNs built on node-centric GarNet layers, which eliminate costly edge message passing and provide learnable, permutation-invariant aggregation optimized for fast inference and firmware deployment. Tailored for particle reconstruction in the proposed PicoCal for the LHCb Upgrade II, these architectures achieve up to 8× faster inference than traditional message-passing GNNs while maintaining superior energy-resolution performance compared to conventional reconstruction algorithms.
To further reduce latency, we evaluate two compressed variants: a compact GarNet student with ~40% fewer parameters that preserves the teacher’s performance, and a knowledge-distilled MLP trained on GarNet’s latent graph embeddings—a Graph(GarNet)-to-MLP approach—that provides an additional 2–6× speedup and even surpasses the GarNet teacher in energy resolution despite a ~95% reduction in model size. Together with ongoing firmware-level integration for real-time filtering in the LHCb trigger system, this work demonstrates a practical and scalable pathway for deploying high-performance, graph-based calorimeter reconstruction in future high-rate particle-detection pipelines.
Speakers: Cilicia Uzziel Perez (La Salle, Ramon Llull University (ES)), Irvin Jadurier Umana Chacon (Consejo Nacional de Rectores (CONARE) (CR)) -
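A generic sketch of a graph-to-MLP distillation step follows (the toy networks, shapes, and loss weighting stand in for GarNet and the student MLP; this is not the LHCb training code):

```python
# Generic distillation step: the MLP student is trained to reproduce the frozen
# teacher's latent embeddings as well as the reconstruction target.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 16)).eval()  # stands in for GarNet
student = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16))
head = nn.Linear(16, 1)                                                           # e.g. energy regression

opt = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=1e-3)
x, target = torch.randn(256, 10), torch.randn(256, 1)

with torch.no_grad():
    teacher_emb = teacher(x)                                    # frozen teacher embeddings
student_emb = student(x)
loss = nn.functional.mse_loss(student_emb, teacher_emb) \
     + nn.functional.mse_loss(head(student_emb), target)
loss.backward()
opt.step()
```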
362
Conceptual Design and Operation of the Calibration Loop for the Next Generation Triggers in the CMS Experiment
The Next Generation Triggers (NGT) initiative in CMS aims to enable the processing of all Level-1 Trigger accepted collisions for the HL-LHC. Central to this effort is the expansion of the High-Level Trigger (HLT) data scouting strategy, where events are reconstructed and stored in an analysis-ready format. This necessitates an in situ processing loop to derive high-quality calibration constants during data taking, as offline reprocessing is not possible. In order to showcase the feasibility of this plan, an NGT Demonstrator has been commissioned during LHC Run-3. We present the software implementation of a novel calibration loop for the NGT Demonstrator, and the design choices motivated by offline calibration infrastructure studies. The system architecture comprises three interacting Finite State Machines (FSMs) that automate the data flow through three intertwined processing cycles which follow CMS’s current Prompt Calibration Loop: incremental physics event reconstruction, aggregation of calibration statistics, and derivation of calibration constants. The final calibration constants are stored within a dedicated record in CMS’s conditions database to be consumed by the NGT Demonstrator. This enables a delayed re-HLT processing step with optimal calibrations after 8 hours of data buffering. Finally, by extending the existing infrastructure for online data quality monitoring in CMS for the HLT data scouting stream, we evaluate the effect of the rederived calibrations in the online reconstruction, demonstrating the improvement in the physics performance of the HLT event selection.
Speaker: Jessica Prendi (ETH Zurich (CH)) -
363
Towards AI-Driven Automation for Data Quality Monitoring in ALICE
ALICE (A Large Ion Collider Experiment) is a general-purpose heavy-ion detector at the CERN Large Hadron Collider (LHC) that operates at interaction rates producing raw data streams of O(TB/s). Due to these data volumes, an online reconstruction is performed to achieve a compressed representation of the continuous data stream. Given the lossy nature of this process, early assessment of data quality and processing is critical. To address this challenge, the online Data Quality Monitoring (DQM) serves as a first line of defense against detector malfunctions and data corruption. This is performed through a combination of rule-based checks and continuous 24/7 visual inspection of monitoring objects, such as histograms and trends, by operators. This approach is subject to several limitations, such as the operational cost associated with continuous human shift coverage, inherent human-factor constraints, and the increasing difficulty of defining checks under evolving detector configurations and running conditions. In this work, we explore semi-supervised and unsupervised machine learning techniques, including representation learning and embedding-based approaches, for anomaly detection in online DQM data from key ALICE sub-detectors. The goal is to reduce anomaly detection latency and enable automation of routine quality control tasks currently performed by operators. Preliminary results indicate that automated anomaly detection can be achieved while maintaining a low false discovery rate, demonstrating the potential of these approaches to support and enhance DQM.
Speaker: Zeta Sourpi (Universite de Geneve (CH)) -
364
PQuantML: A Tool for End-to-End Hardware-aware Model Compression
We present PQuantML, an open-source library for end-to-end hardware-aware model compression that enables the training and deployment of compact, high-performance neural networks on resource-constrained hardware in physics and beyond. PQuantML abstracts away the low-level details of compression by letting users compress models with a simple configuration file and an API call. It enables the use of pruning and quantization methods and supports layer-wise customization of compression parameters, such as the number of quantization bits used for data, weights, or biases, the granularity of quantization, the pruning method to use, and whether pruning is disabled for a particular layer. It utilizes a global switch to enable or disable pruning or quantization, allowing the user to experiment with both, either jointly or individually. We demonstrate PQuantML on tasks such as jet substructure classification (JSC) at the LHC using the hls4ml jet-tagging dataset, achieving substantial parameter and bit-width reduction while maintaining accuracy.
Speaker: Roope Oskari Niemi
-
360
-
Track 3 - Offline data processing: Tracking 3
-
365
Offline quality track reconstruction on heterogeneous hardware with ACTS/traccc
During the last ten years, the detector-agnostic, open-source track reconstruction toolkit ACTS has matured to production-level quality; it is used in offline data processing by ATLAS, sPHENIX, and FASER, and is part of many upgrade and feasibility studies within the community at large. For ATLAS, the ACTS-based track reconstruction has surpassed the legacy setup for the predicted Phase-2 performance in computational efficiency, while retaining the same or partly improving on the physics performance. Underpinned by the ATLAS initiative for a technology choice of the high-level trigger farm for the HL-LHC, a dedicated R&D effort has taken place to reimplement the same conceptual track reconstruction for massively parallel hardware, particularly targeting GPGPUs. This development followed the design principle of strictly avoiding compromises (algorithmic and numerical) in physics performance and has led to the creation of several individual leaf libraries that are tightly connected to the corresponding ACTS equivalents. The detray library allows for a GPU-friendly geometry description of generic detectors that can be automatically generated from an ACTS tracking geometry description without loss of precision in describing the geometry and material content of the detector. The covfie library enables a generic description of covariant vector fields and can be used to accurately describe and access the magnetic field needed for track propagation on host and device; it has also been adopted by the AdePT project, which aims at a Geant4-based full simulation on GPGPUs. The algorithmic code for clustering, seeding, track finding, and fitting, together with the necessary components of track propagation, has been implemented in detray and traccc to match the algorithmic performance of the corresponding ACTS algorithms, while achieving a remarkable relative speedup with respect to fully CPU-based processing. We present the physics and computational performance of both ACTS and traccc using the ColliderML dataset, which has been simulated with Geant4 and the OpenDataDetector. Furthermore, we demonstrate the level of compatibility of results in this heterogeneous setup and augment this with a discussion of numerical stability and prospects for long-term maintenance. Where possible, we translate these results to applications of ACTS clients, most prominently the ATLAS Phase-2 upgrade ITk.
Speaker: Fabrice Le Goff (University of Oregon (US)) -
366
ColliderML: the High-Luminosity benchmark dataset
We present the first full release of ColliderML, a large-scale, fully simulated benchmark dataset for algorithm R&D as well as machine-learning applications.
It is built on top of the OpenDataDetector (ODD) under high-luminosity collider conditions (ColliderML). ODD comprises a set of subsystems that are representative of future collider experiments such as those at the High-Luminosity Large Hadron Collider (HL-LHC) and the Future Circular Collider (FCC). The dataset contains O(1 million) high-pileup proton-proton collisions, realistically simulated and digitized, distributed across O(10) important Standard Model (SM) and Beyond Standard Model (BSM) physics channels. Single-particle samples are also included, targeting track fitting and calorimeter response studies. The object content includes detailed energy depositions from tracker and calorimeter sensors, as well as reconstructed physics objects such as particle tracks and jets.
The simulation pipeline leverages tooling from the ACTS and Key4HEP projects, with digitization procedures adopting best-practices from experimental collaborations. Baseline reconstruction performance using ACTS track finding, fitting and vertexing establishes benchmarks for comparing alternative approaches. Beyond conventional reconstruction, the dataset enables diverse machine learning studies, including distinguishing physics channels from multi-scale detector information, harnessing multiple detector regions for integrated reconstruction tasks like particle flow, and assessing generalizability to unseen SM and BSM conditions. The realistic detector effects from full simulation provide crucial tests for symmetry-preserving architectures, while the multi-scale data structure supports exploration of particle physics foundation models that consume low-level readout and adapt to various downstream tasks. An intuitive software library accompanies the dataset to facilitate widespread adoption and efficient data processing.
Speaker: Anna Zaborowska (CERN) -
367
3+1D GPU reconstruction of the Inner Tracking System (ITS2) for ALICE Run 3
In ALICE, LHC Run 3 marks a major step toward GPU-centric data processing.
During the synchronous (online) phase, GPUs are fully dedicated to Time Projection Chamber reconstruction and compression. During the asynchronous (offline) phase, additional reconstruction tasks can be offloaded to GPUs to improve overall computing efficiency and throughput. We report the porting of the ITS2 reconstruction chain to AMD and NVIDIA GPUs and its integration into the mainline ALICE GPU reconstruction framework.
The GPU algorithms are currently being commissioned for the upgraded Inner Tracking System (ITS2); we present integration details, performance characterization, validation against the CPU baseline and the outstanding challenges.
Results for representative physics datasets demonstrate an overall speed-up in full asynchronous production of $>26\%$, highlighting the benefit of heterogeneous acceleration for Run 3. Finally, we address a time-dependent detector effect in the continuous ITS2 readout: due to the finite rise time of the ALPIDE chip, charge deposits created near a readout frame boundary can be time-shifted so that their clusters appear entirely in the subsequent frame.
This migration leads to missing clusters in the readout frame and, consequently, a loss of tracks in events close to frame borders.
We describe the procedure implemented in the reconstruction chain to compensate for this effect and demonstrate that it recovers the affected tracks, thereby improving the overall reconstruction performance and extending ALICE’s physics reach in Run 3.
Speaker: Felix Schlepper (CERN, Heidelberg University (DE)) -
368
Multi-Modal Graph Neural Network Tracking for Belle II with an ONNX-based Integration
High levels of beam-induced detector noise and detector aging degrade track-finding performance in the Belle II central drift chamber, resulting in losses of both track finding efficiency and purity. This motivates the development of reconstruction approaches capable of maintaining robust performance under deteriorating detector conditions. Building on our earlier work on an end-to-end multi-track reconstruction method for Belle II at the SuperKEKB collider (arXiv:2411.13596), we have expanded the algorithm to utilise information from both the drift chamber and the silicon vertex detector simultaneously, creating a multi-modal network. Graph neural networks are used to accommodate the irregular detector geometry, while object condensation enables reconstruction in the presence of an unknown and variable number of charged particles per event. The resulting model reconstructs all tracks in an event simultaneously and estimates their corresponding parameters.
We demonstrate the algorithm's effectiveness using a realistic full detector simulation, which incorporates beam-induced backgrounds and noise modeled from actual collision data. The simultaneous reconstruction of the information from the two detectors significantly improves the track purity while maintaining comparable efficiency. We provide a detailed comparison of its track-finding performance against the current Belle II baseline across various event topologies. Finally, we address the practical implementation by detailing the network's integration into the Belle II analysis software framework via ONNX, discussing critical challenges like model conversion, inference speed, memory usage, and ensuring compatibility with existing reconstruction workflows.
Speaker: Giacomo De Pietro (Karlsruhe Institute of Technology) -
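A minimal sketch of the ONNX hand-over pattern mentioned above (the toy network and tensor shapes are placeholders, not the Belle II tracking model):

```python
# Minimal ONNX hand-over: export a trained PyTorch model once, then run it through
# ONNX Runtime, which is also how a C++ framework can consume it.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 5)).eval()
torch.onnx.export(model, torch.randn(1, 12), "tracking_stub.onnx",
                  input_names=["hits"], output_names=["params"],
                  dynamic_axes={"hits": {0: "n_hits"}})        # variable-length events

session = ort.InferenceSession("tracking_stub.onnx")
out = session.run(None, {"hits": np.random.randn(32, 12).astype(np.float32)})
print(out[0].shape)
```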
369
Track Reconstruction with a Heterogeneous and Geometry-Agnostic Framework for HEP Experiments
The upcoming upgrades to the Large Hadron Collider for the HL-LHC era will progressively increase the nominal luminosity, aiming to reach a peak value of $5\times10^{34}$ cm$^{-2}$ s$^{-1}$ for the ATLAS and CMS experiments. Higher luminosity will naturally lead to a larger number of proton–proton interactions occurring in the same bunch crossing, with pileup levels that may reach up to 200, creating a significantly more complex environment for track reconstruction.
To cope with these conditions, several experiments have begun redesigning parts of their track reconstruction software so that it can run efficiently on heterogeneous computing architectures. Although these initiatives have yielded promising results, they have generally remained internal to each individual experiment.
In this work, we present the capabilities of a standalone framework designed to operate across multiple backends including CPUs, NVIDIA GPUs and AMD GPUs, and to reconstruct tracks in cylindrical tracker detectors used by different high-energy physics experiments. We evaluate its physics performance as well as its computational performance for a variety of detectors.
This effort constitutes a first step toward a unified and experiment-independent reconstruction tool for HL-LHC–like detectors defined only in terms of their fundamental components (a silicon tracking system, at least one calorimeter, and a muon subsystem), capable of taking advantage of heterogeneous computing resources.
Speaker: Adriano Di Florio (CC-IN2P3)
-
365
-
Track 4 - Distributed computing
-
370
Systematic studies of RDataFrame-based CMS analysis distributed over an HPC-Bubble
For the past few years, INFN has been investing effort in exploring technologies to seamlessly integrate distributed resources and effectively enable high-rate data analysis patterns, supporting interactive and/or quasi-interactive analysis of sizable amounts of data. One of the main drivers for this initiative is to contribute to the R&D activities for the evolution of the analysis computing model for the CMS experiment. Recent results obtained by integrating an “HPC-Bubble” hosted at INFN-Padova (a cluster with specialized hardware) demonstrated the technical feasibility of the proposed model. This first work motivated us to extend the R&D program by performing a series of systematic studies aiming to define a cost/benefit matrix of various hardware configurations and setups.
As a first contribution, we provide an overview of the recently enhanced integration strategy, which leverages Dask and interLink technologies to transparently offload interactive payloads to remote distributed resources. We then elaborate on systematic studies carried out as a series of CMS analysis runs implemented with the ROOT RDataFrame high-level interface. The studies compare multiple configurations of the analysis and the specialized cluster, including different data formats, namely the TTree data format already in production and the future RNTuple data format, as well as different hardware and storage system setups. The resulting evaluation is based on several metrics, from event throughput to resource efficiency.
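A hedged sketch of the offloading pattern (the scheduler address, file, tree, and column names are placeholders, and the location of the distributed RDataFrame class may vary between ROOT versions):

```python
# Hedged sketch: the same RDataFrame analysis code is steered from a notebook and
# executed on remote resources through a Dask client (e.g. exposed via interLink).
from dask.distributed import Client
import ROOT

client = Client("tcp://dask-scheduler.example:8786")
RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame

df = RDataFrame("Events", ["root://eos.example//store/data/sample.root"],
                daskclient=client)
h = (df.Filter("nMuon >= 2")
       .Histo1D(("mass", "dimuon mass", 100, 0.0, 200.0), "Dimuon_mass"))
h.Draw()   # triggers the distributed event loop on the remote cluster
```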
Speaker: CMS Collaboration -
371
Extending PanDA/Harvester to US HPC Through Facility and Edge Service APIs
High energy physics (HEP) workflows are approaching the throughput limits of traditional grid/HTC computing, as LHC and DUNE are driving O(10–100)× data growth and increased GPU demand. This motivates a practical path to routinely use leadership-class HPC resources remotely. One of the challenges is the varied authentication, authorization, and job submission mechanisms at different HPC centers. In this work, we extend the PanDA/Harvester workflow management system with an edge service model that keeps Harvester as a centralized control plane while deploying small, site-resident clients to interface with facility services and batch schedulers. We evaluate three complementary edge mechanisms: (1) Globus Compute (GC) via a Multi-user Endpoint for site-agnostic submission and control; (2) the NERSC Superfacility API (SFAPI); and (3) the OLCF Secure Scientific Service Mesh (S3M) for facility-native batch job control. Starting from Perlmutter and OLCF, we run PanDA pilots with CVMFS-based runtime delivery and demonstrate practical resource acquisition, robust environment setup, and a clean association between Slurm allocations, Harvester workers, and pilot execution units. To simplify operations for remote and multi-site deployments, we add a remote credential manager that supports controlled issuance, renewal, and isolation of credentials needed for edge-side execution. We also strengthen launch and control paths for parallel workloads by providing scheduler-aware wrappers that support three levels of parallelism (multiple pilots per node, multi-node allocations per job, and multiple concurrent jobs/allocations), and we improve observability with a monitoring plugin that cross-checks edge service state against native scheduler queries for accurate task lifecycle tracking and failure diagnosis. Our results highlight the expected trade-off between portability and depth of integration: GC offers a uniform interface across sites, while SFAPI/S3M enable tighter coupling with NERSC/OLCF capabilities. This indicates a clear path to port from Grid to HPC within the PanDA/Harvester ecosystem. This work is also aligned with Project Genesis as a representative use case for API-driven, multi-facility workflow automation.
Speaker: Tianle Wang (Brookhaven National Lab) -
372
Cross-Facility Workflow Portability for HEP Experiments: Integrating JustIN with the NERSC Superfacility API using DUNE 2x2
As HEP experiments increasingly rely on diverse computing resources across multiple facilities, sustainable workflow orchestration that bridges experiment-native tools with facility-native interfaces becomes critical. This work develops and evaluates a generalizable approach to cross-facility workflow integration, using the DUNE 2×2 Near Detector simulation as a challenging demonstrator case.
We present a study of integration between JustIN, the DUNE workflow management system, and the NERSC Superfacility API—NERSC's programmatic interface for HPC job submission and storage systems—demonstrating a new approach to executing HEP simulation campaigns on leadership-class HPC facilities. This work extends previous portability efforts—which brought the DUNE 2×2 Near Detector simulation chain to HPCs like Perlmutter, Polaris, and Frontier—by addressing what we term the "workflow-management dimension" of portability: the challenge of bridging experiment-native orchestration tools with facility-native job submission interfaces.
The DUNE 2×2 ND simulation chain stress-tests workflow portability by coupling CPU- and GPU-intensive stages, depending on CVMFS-distributed software, and requiring access to external databases and metadata services. JustIN orchestrates data-driven, multi-stage campaigns using standard HEP ecosystem components for data management (Rucio), metadata handling (MetaCat), and distributed job execution (HTCondor/GlideinWMS), while the Superfacility API provides programmatic access to job submission, data movement, and monitoring on Perlmutter.
Our study executes JustIN workflow stages on Perlmutter through Superfacility-driven wrapper jobs. We characterize challenges specific to this integrated setting—including token and identity propagation across trust boundaries, duplicated monitoring and bookkeeping, and mismatches between pilot-based and facility-native job models—and present mitigation strategies using thin adapter layers and container overlays. We outline a design for extending this approach to other facilities adopting Integrated Research Infrastructure (IRI) API patterns. The resulting lessons offer a reusable template for HEP experiments seeking sustainable, cross-facility workflow orchestration without abandoning their existing toolchains.
Speaker: Ozgur Ozan Kilic (Brookhaven National Laboratory) -
373
CernVM-FS Filebundles for low-latency data distribution in interactive usecases
The CernVM-Filesystem (CVMFS) is a global, read-only, on-demand filesystem optimized for software distribution. Its on-demand nature is well adapted and extremely efficient for distributed batch computing, but can lead to noticeable latency in interactive use, especially when working with applications such as Python that load a large number of small files on startup.
In this contribution we present CVMFS file bundles, a new feature that allows repository owners to improve startup performance by defining lists of files that are known to be loaded together and can therefore be requested asynchronously by the client. We also benchmark the startup performance of common applications.
Speaker: Valentin Volkl (CERN) -
374
Real-time supervision and control of Grid workflows
Effective tools for monitoring Grid workflow executions are crucial for the prompt identification of issues, which in turn facilitates the design and deployment of appropriate solutions. The ALICE Grid middleware JAliEn utilizes the MonALISA framework to monitor all its Grid components, which collectively generate an enormous amount of data - about 200,000 monitored parameters per second across the entire Grid. The contemporary Grid environment is characterized by execution nodes featuring an increasing CPU core count and larger batch queue slot sizes, sometimes encompassing whole nodes with hundreds of cores. In such an environment, the efficient extraction of monitoring parameters becomes a critical operation, as a single monitoring agent must fetch and transmit all the monitoring data for tens of thousands of concurrently executing jobs.
To address this challenge, we have achieved significant performance improvements by leveraging cgroups v2. These are used to set boundaries on resource utilization, and their accounting metrics are exploited to monitor all the middleware components and executing payloads. This new methodology has dramatically reduced the time required for monitoring JAliEn agents' resource utilization from the order of tens of seconds to the order of milliseconds on large Grid whole nodes and complex process trees.
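A simplified illustration of the cgroups-v2 accounting idea (the cgroup path and returned fields are placeholders, not the JAliEn implementation):

```python
# Simplified illustration: rather than walking a large process tree, resource usage
# for an entire job slot is read from two kernel-maintained cgroup-v2 files.
from pathlib import Path

def read_cgroup_usage(cgroup="/sys/fs/cgroup/alien_job_1234"):
    cg = Path(cgroup)
    cpu = dict(line.split() for line in (cg / "cpu.stat").read_text().splitlines())
    return {
        "cpu_seconds": int(cpu["usage_usec"]) / 1e6,            # summed over all processes
        "memory_bytes": int((cg / "memory.current").read_text()),
    }

print(read_cgroup_usage())   # requires a cgroup-v2 hierarchy containing that path
```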
Complementing this monitoring enhancement is the remote logging system in JAliEn. This system sends logs generated by the Grid middleware in real time directly to the JAliEn Central Services. This capability enables live supervision of executions at various Grid sites, proving to be an exceptionally effective tool for debugging issues and identifying areas for potential improvements. Furthermore, the system includes a crucial feature for severe cases: it ensures offline persistence of logs if nodes become inaccessible for any reason. Considering that our agents generate logs at an average frequency of 15kHz Grid-wide, running the system in all instances would result in a substantial increase in traffic to the Central Services. Therefore, the remote logging tool has been designed to allow users to selectively "cherry-pick" the desired logs based on criteria such as site, host, and JAliEn version. This selective logging has been particularly valuable for debugging and reacting to major issues effectively.
The combined utilization of JAliEn remote logging and the detailed Grid monitoring data provides a broader, real-time understanding of the complex workflows executing across our heterogeneous sites. Having all this customizable data readily available is a powerful resource for implementing a more robust and adaptable middleware framework.
Speaker: Marta Bertran Ferrer (CERN)
-
370
-
Track 5 - Event generation and simulation: Event Generation 2
-
375
On numerical validation within MadGraph5 for performance and efficiency enhancement
At the HL-LHC, computing demands, particularly for event generation, will reach an unprecedented volume for which simple scaling of current resources will be insufficient, requiring new algorithmic and architectural strategies to sustain performance within economic and energy constraints.
A particularly promising approach is to identify parts of the simulation workflow that can be safely executed in single precision (FP32) without compromising physics accuracy. Even partial replacement of FP64 with FP32 can provide substantial improvements in both throughput and energy efficiency, with energy consumption commonly approximated to scale with the square of the number of significant bits. This consideration becomes even more critical as modern GPU architectures dedicate an increasing portion of their silicon to ultra-low-precision units originally designed for machine learning—such as FP16, FP8, and FP4—while FP64 performance improvements have largely plateaued.
In this study, we perform a detailed numerical validation of the MadGraph5 Monte Carlo event generator as an example simulation software package using stochastic arithmetic. By employing the CADNA and PROMISE frameworks, we automatically determine the minimally required precision across the code, rigorously identifying which sections require double precision (FP64) and which remain accurate in lower numerical formats.
The workflow presented is fully code-agnostic for any application written in C++, providing a general methodology for mixed-precision deployment. Our results outline a principled path toward exploiting future GPU architectures efficiently while preserving numerical reliability.
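A toy illustration of the underlying question (this is not CADNA or PROMISE; it only shows why some operations tolerate FP32 while others do not):

```python
# Toy precision check: a well-conditioned reduction survives demotion to FP32,
# while a cancellation-dominated difference does not.
import numpy as np

x64 = np.random.default_rng(0).uniform(0.1, 1.0, 10_000)
x32 = x64.astype(np.float32)
print(abs(x32.sum() - x64.sum()) / x64.sum())   # relative error ~1e-7: FP32 is adequate

a, b = 1.0 + 1e-9, 1.0
print(a - b, np.float32(a) - np.float32(b))     # 1e-9 vs 0.0: this step needs FP64
```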
Speaker: František Stloukal (CERN) -
376
Focused Angular N-Body Event Generator (FANG)
We introduce FANG (Focused Angular $N$-body event Generator), a new Monte Carlo tool for efficient event generation in restricted Lorentz-invariant phase space (LIPS). Unlike conventional approaches that uniformly sample the full $4\pi$ solid angle, FANG directly generates events in which selected final-state particles are constrained to fixed directions or finite angular regions in the laboratory frame. Because of the way the generator is constructed, angular constraints can be imposed directly in the laboratory frame while maintaining the correct LIPS structure, enabling differential and total cross sections or decay rates to be computed with high efficiency. The method is validated against analytic results and existing event generators, showing excellent agreement. By reducing computational cost by several orders of magnitude for angular observables, FANG provides a robust and versatile framework for applications in particle, nuclear, and detector physics.
Speaker: Itay Horin -
377
New techniques for reducing negative-weight events in MC@NLO-type simulations
Physics event generators are essential components of the simulation software chain of HEP experiments, providing theoretical predictions against which experimental data are compared. In the LHC experiments, the simulation of QCD physics processes at the Next-to-Leading-Order (NLO) or beyond is essential to reach the level of accuracy required. However, a distinctive feature of QCD NLO generators such as Madgraph5_aMC@NLO (MG5aMC) is that some events are generated with negative weights: this is a problem because the number of MC events that must be generated and processed rapidly increases with the fraction of negative weights. In this presentation, we report on new techniques which we are developing to reduce the fraction of negative-weight NLO events in MG5aMC, notably by using Machine Learning approaches. After presenting the method, we discuss some results based on toy models and proof-of-concept tests in MG5aMC, as well as the prospects for implementing this new approach for production use by the experiments.
Speaker: Andrea Valassi (CERN) -
378
Cell Reweighting Algorithms for Pathological Weight Mitigation in LHC Simulations using Optimal Transport
As the accuracy of experimental results in high energy physics increases, so too must the precision of Monte Carlo simulations. Currently, event generation at next-to-leading-order (NLO) accuracy in QCD and beyond results in the production of negatively-weighted events. The presence of these weights increases the strain on computational resources by degrading the statistical power of MC samples, and can be pathological in the context of machine learning. We have developed a post hoc ‘cell reweighting’ scheme by applying an IRC-safe metric in the multidimensional metric space of events, so that nearby events in this space are reweighted together. This metric is implemented using Optimal Transport techniques, borrowing from the field of computer vision to solve a longstanding problem in computational particle physics. We compare the performance of the algorithm with different choices of metric, and explicitly demonstrate the performance of the algorithm by implementing the reweighting scheme on simulated events with a Z boson and two jets produced at NLO accuracy.
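A deliberately simplified sketch of the cell idea (plain k-means on toy per-event features stands in for the IRC-safe optimal-transport metric used in the actual method):

```python
# Simplified cell-reweighting sketch: nearby events share a single resummed weight,
# which suppresses the fraction of negative weights.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(5000, 3))                                 # toy event observables
weights = np.where(rng.uniform(size=5000) < 0.15, -1.0, 1.0)          # ~15% negative weights

cells = KMeans(n_clusters=200, n_init=10, random_state=0).fit_predict(features)
net = np.bincount(cells, weights=weights)                             # net weight per cell
counts = np.bincount(cells)
new_weights = (net / counts)[cells]                                   # shared within each cell

print((weights < 0).mean(), (new_weights < 0).mean())                 # negative fraction drops
```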
Speakers: Lauren Meryl Hay (SUNY Buffalo), Rishabh Jain (Brown University (US)) -
379
Revisiting the Fermi Break Up model for the Geant4 library
The Geant4 toolkit is widely used for modelling light-nuclei beam fragmentation in human tissue and other radiological studies (see, for example, [1]). Precise and fast modelling of secondary fragments resulting from beam fragmentation in tissue is vital for studying the radiobiological effects of heavy ion therapy [1]. Short $^{16}$O–$^{16}$O and $^{20}$Ne–$^{20}$Ne runs have been conducted at the LHC to study small systems [2]. Spectator fragments with a Z/A ratio similar to $^{16}$O can be transported in the LHC alongside the initial nuclei. The fragmentation of $^{16}$O, particularly the alpha yields, should be modelled to evaluate these effects [3].
A new model for the fragmentation of light excited nuclei was developed. It can be used both as a fragmentation model in Geant4 and as a separate code to model fragmentation in light-nuclei collisions. The model's physics performance was validated against the FORTRAN77 reference implementation by A. Botvina [4]. The implementation employs modern C++ features and a systematic caching strategy: precomputing valid decay channels indexed by mass ($A$) and charge ($Z$), storing light nuclear properties in a perfect-hashing table, and minimizing dynamic memory allocations through move semantics and lightweight programming patterns. This design ensures thread-safety and adherence to Geant4 coding guidelines while maintaining a memory footprint of under 25 MB. The optimized model achieves significant speed-up along with optimal memory usage compared to the previous version from Geant4 v9.2. The revised code was then validated to demonstrate its physics performance. The code has been included in the Geant4 v11.4 release.
References
[1] I. Pshenichnov et al., Nucl. Instr. and Meth. B, 268, 604 (2010)
[2] https://home.cern/news/news/accelerators/first-ever-collisions-oxygen-lhc
[3] A. Svetlichnyi et al., Physics 5, 381 (2023)
[4] J.P. Bondorf et al., Physics Reports, 257, 133 (1995)
Speaker: Aleksandr Svetlichnyi (INR RAS, MIPT(NRU))
-
375
-
Track 7 - Computing infrastructure and sustainability
-
380
Exabyte-Scale Automation, Alarms and Monitoring at CERN
Over the past 70 years, CERN’s pioneering work in particle physics and more than a decade of operations at the Large Hadron Collider (LHC) have driven a dramatic transformation in data storage. With each new experimental run, the scale and complexity of data handling continue to grow. As we approach the next Long Shutdown (LS3) and the High-Luminosity LHC (HL-LHC) era, storage infrastructure demands are expected to rise exponentially, bringing significant challenges and opportunities.
Today at CERN, we operate over 800 storage nodes across eight independent EOS instances, forming the backbone of data storage for experiments, services and users. Managing this infrastructure at the Exabyte scale requires robust monitoring, smart alerting systems and a deep understanding of system performance and operational behavior.
In this talk, we will take a behind-the-scenes look at the daily operations of CERN’s storage systems, exploring what it takes to keep EOS running reliably under extreme conditions. We will highlight the evolution of our operational tools/practices and how we are preparing for future requirements in scalability, performance and reliability. Key topics will include improvements in observability, automation, fault detection and incident response, essential components to support EOS as it scales to meet the demands of HL-LHC data workflows.
Speaker: Octavian-Mihai Matei (CERN) -
381
Anomaly Detection in the LHCb Computing Infrastructure
Data centers play a key role in High Energy Physics (HEP) experiments, as they must collect, process, and store large quantities of data. Given the scale and complexity of those computing infrastructures, it is not trivial to spot failures of any nature. Traditional rule-based monitoring systems work well, but they might struggle in large, heterogeneous, and dynamic environments. It is important to identify issues as quickly as possible in order to react and minimize downtime, which can have a negative impact on the data acquisition. In this work, we present an Artificial Intelligence based technique for an accurate and context-aware anomaly detection system in the LHCb computing farm. We show the methodologies, design choices, and tools adopted and present results, highlighting the benefits and limitations of this approach.
Speaker: Pierfrancesco Cifra (CERN) -
382
Tag based Resource Monitoring for CMS Sites
Over the last few years, the landscape of distributed resources used by the CMS experiment has changed significantly. In the past, dedicated compute resources were essentially based on (pledged) x86 CPUs installed at classical Grid sites. Nowadays, other CPU architectures such as ARM and accelerators like GPUs have become common resources, also thanks to non-Grid opportunistic centres such as HPCs or public clouds being continuously integrated with the CMS computing infrastructure. The ability to distinguish pledged from opportunistic resources, with fine-grained information on the hardware type, has become increasingly important, particularly for understanding efficient resource utilization and for WLCG central accounting purposes. To address the need for gathering monitoring information in a flexible manner, CMS introduced a tagging scheme that is managed via the site configuration, which is usually maintained jointly by local site administrators and central CMS operators in a Gitlab repository. Jobs executed at the various compute resources parse the site configuration and can discover a number of agreed tags, which are further propagated via HTCondor mechanisms of the CMS Global Pool to the central monitoring infrastructure at CERN, MONIT. There, more complex queries regarding resource usage, e.g. with respect to HPCs or a certain architecture, can be executed. This contribution will present the end-to-end workflow as well as the results obtained during the pilot integration with a growing number of integrated computing centres.
Speaker: CMS Collaboration -
383
JobLens: A Lightweight Job Observability Collector for High-Throughput HEP Computing
With the escalating processing demands of modern high-energy physics experiments, traditional monitoring tools are faltering under the dual pressures of cumbersome deployment and coarse-grained observability in high-throughput production environments. JobLens is a lightweight, one-click-deployable data collector designed to deliver fine-grained, job-level observability for HEP workloads. Its architecture centers on three core innovations: (1) eBPF-based kernel instrumentation enabling near-zero-overhead, dynamic tracing of process lifecycles and system calls without kernel modifications; (2) a highly configurable plugin architecture featuring asynchronous double-buffered pipelines that seamlessly export metrics to diverse backends (Elasticsearch, Prometheus, Kafka) while keeping average CPU overhead under 5%; and (3) a Lua-scripted rule engine that dynamically registers monitoring policies to autonomously detect and track specific job categories in HTCondor-managed HEP clusters. This script-driven automation eliminates manual configuration, empowering operators to define custom matching rules (by experiment, user group, or resource template) that are evaluated at runtime to instantiate per-job collectors. Design analysis and preliminary benchmarks demonstrate support for over 200 concurrent jobs on a single worker node, targeting sub-second 99th-percentile collection latency. Comprehensive validation at production scale across HEP experiment workflows is currently underway.
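A minimal BCC-based illustration of the kernel-instrumentation idea (this is not JobLens itself; it only traces process exec events, which a rule engine could then match against job categories):

```python
# Minimal BCC illustration: attach to the sched_process_exec tracepoint and stream
# exec events to user space. Requires root privileges and the bcc toolkit.
from bcc import BPF

prog = r"""
TRACEPOINT_PROBE(sched, sched_process_exec) {
    bpf_trace_printk("exec pid=%d\n", args->pid);
    return 0;
}
"""
b = BPF(text=prog)
print("tracing exec() calls, Ctrl-C to stop")
while True:
    try:
        _task, _pid, _cpu, _flags, _ts, msg = b.trace_fields()
        print(msg.decode())
    except KeyboardInterrupt:
        break
```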
Speaker: Mr Zhenyuan Wang (Computing center, Institute of High Energy Physics, CAS, China) -
384
Towards a Unified Electric Energy Monitoring for the Worldwide LHC Computing Grid
The Worldwide LHC Computing Grid (WLCG) provides the distributed infrastructure necessary to support both LHC and non-LHC experiments; however, the corresponding rise in energy usage presents new challenges, in particular with the upcoming HL-LHC era, where computing requirements will continue to expand significantly.
Therefore, monitoring power consumption has become increasingly important, due to rising energy costs and the growing computational demand of scientific research. Collecting accurate power consumption data across thousands of servers in diverse data centers is not trivial, and establishing a unified monitoring solution often requires additional infrastructure and maintenance. This work proposes a lightweight approach to obtain information on power consumption from computing centers by leveraging existing infrastructure, requiring minimal effort from site administrators.
The proposed method enables large-scale and continuous data collection, allowing us to construct a comprehensive model of power usage across the entire grid. Currently, no reliable metric collector exists to quantify the overall power efficiency of the WLCG. By increasing site adoption and integrating this solution, we can finally obtain these missing insights. With sufficient statistics, the model will also enable predictive capabilities, such as estimating the expected power efficiency of new or evolving sites. This approach will create a shared knowledge base that supports data-driven decisions, capacity planning, and long-term energy-efficiency strategies, while also improving overall resource accounting. Previous studies and isolated tests demonstrate that the proposed approach is both technically feasible and highly valuable for the HEP community.
Speaker: Natalia Diana Szczepanek (CERN)
-
380
-
Track 8 - Analysis infrastructure, outreach and education: Training
-
385
HEP Software Training with IRIS-HEP/HSF
The IRIS-HEP training program and the HEP Software Foundation (HSF) collaborate to co-organize software training events for the high-energy physics community. These activities include hands-on workshops and schools that focus on modern software, computing, and analysis tools. The program addresses the need for both general computational skills and the domain-specific knowledge required to effectively contribute to HEP software.
In this presentation, we describe the work involved in planning, coordinating, and delivering these training events for a large and international audience. We will discuss approaches to motivating learners, accommodating participants with varied backgrounds, and sustaining a training framework that supports the development of future researchers and software practitioners, in HEP in particular.
Speaker: Richa Sharma (University of Puerto Rico (US)) -
386
heptraining.cern.ch – A catalog for HEP training resources and events
The HEP Training platform is a new online registry designed to facilitate the discovery and dissemination of HEP-related training materials and events across high energy physics experiments, labs and universities. Students, researchers, and educators can have access to a list of curated resources – such as tutorials, guidelines, workshops and training events. These resources are links to contributions in our growing network of content providers.
Beyond improving educational resource discovery, our platform fosters HEP community engagement by bridging training and recognition, connecting the development of high-quality educational materials with mechanisms that acknowledge and credit their authors.
HEP Training contributes to and links together various projects, which in turn allows improvements to the platform itself, ranging from content exchange with other training catalogs to raising recognition for trainers, scientists, and engineers in the realm of research software engineering.
heptraining.cern.ch is currently deployed, maintained and curated at CERN.
In this talk, we will explain the setup and status of the HEP Training platform, the mTeSS-X and EVERSE projects and their interplay, as well as their future developments such as the recognition framework, ensuring appropriate acknowledgment of trainers, whose work is often underrated – yet fundamental – for academic and scientific work.
Speaker: Kenneth Rioja (CERN) -
387
Teaching ROOT
Since 2024, the ROOT team has been running a modernisation campaign of the ROOT software trainings as well as of the dedicated ROOT tutorials available online on our website. Collectively, we have trained more than 700 people, including newcomers and experienced users wanting to dive into the newest features. We taught in person at CERN and at the Users Workshop in Valencia, and online during the HSF/IRIS-HEP Python for Analysis training series. We recorded and edited the videos [1], [2], so that the trainings are also available online and can be followed along with the associated repositories [3], [4]; we have reached around 4000 views in total across the two videos produced so far. We are also planning an Advanced ROOT course focused on data analysis, which will take place in March 2026. In this talk we will share the ideas behind the trainings and summarize the new advanced training from March. We will describe the tutorials modernisation campaign and share our plan for expanding the trainings and engaging an even larger part of the community, opening the floor to feedback and discussion afterwards.
[1] ROOT Summer Student Course 2025 - video, available online at: https://videos.cern.ch/record/2301866
[2] ROOT Summer Student Course 2024 - video, available online at: https://videos.cern.ch/record/2300516
[3] ROOT Summer Student Course 2025 - repository, available online at: https://github.com/root-project/student-course/tree/25.07
[4] ROOT Summer Student Course 2024 - repository, available online at: https://github.com/root-project/student-course/tree/24.09
Speaker: Danilo Piparo (CERN) -
388
ePIC User Learning Training and Documentation Strategies
The ePIC experiment at the future Electron-Ion Collider relies on a rapidly evolving software ecosystem for simulation, reconstruction, physics analysis and detector support. As the collaboration grows, enabling users to efficiently discover, learn, and develop software tools has become increasingly important. The ePIC User Learning working group addresses this challenge by developing training and onboarding strategies that lower barriers to entry while supporting sustainable software development.
We describe our approach to improving software discoverability through regular, engaging software tutorials and a centralized landing page that serves as an entry point for users. The landing page provides curated links to repositories, documentation, tutorials, and recommended workflows, helping users to quickly identify relevant tools and understand their use cases. These efforts are complemented by recorded tutorials and task-oriented learning modules that support both new and experienced collaborators. While developed for ePIC, these strategies are broadly applicable to other large-scale scientific software projects.
Speaker: Alexandr Prozorov -
389
A Hosted BinderHub Service as a Scalable Training Platform for HEP
We present the development and user experience of a hosted BinderHub service that delivers a scalable, uniform, and reproducible computing environment for training sessions and workshops. The IRIS-HEP Scalable Systems Laboratory operates an enhanced, Kubernetes-based BinderHub platform for HEP training and analysis, extending the upstream project with GPU support, guaranteed CPU and memory resources, image pre-pulling, multi-cluster federation, and Dask Gateway integration. By providing browser-based interactive environments, this implementation lowers the barrier to entry for workshop participants and requires no local software installation.
We present the system architecture, custom spawner implementation, CI/CD workflow, and operational experience used to support large events (25 to 100 participants), including HSF-India workshops, CODAS-HEP summer schools, and PyHEP workshops. These events necessitate expanding the resource pool to external infrastructures such as the National Research Platform. We analyze operational, workshop leader, and participant experiences, and describe improvements that enhance user experience and platform robustness. Our results indicate that a hosted BinderHub system is a robust and scalable training platform for the HEP community.
Speaker: Fengping Hu (University of Chicago (US))
-
385
-
Track 9 - Analysis software and workflows
-
390
Leveraging Neural Simulation-Based Inference for an EFT Analysis in CMS
Neural Simulation-Based Inference (NSBI) is an analysis technique that leverages the output of trained deep neural networks (DNNs) to construct a surrogate likelihood ratio, which can then be used for a binned or unbinned likelihood scan. These techniques have shown some success when applied to analyses involving effective field theory (EFT) approaches, where it can be difficult to achieve sensitivity using a hand-engineered variable to infer the likelihood. In this talk, we will report recent progress in applying NSBI techniques to CMS data analyses involving top pair production. In particular, we will explore the challenges involved in implementing NSBI in analyses with many parameters of interest, such as the Wilson coefficients in an EFT analysis. We will also report on the various approaches used to incorporate systematic uncertainties in NSBI analyses.
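A toy version of the classifier-based likelihood-ratio trick that underlies NSBI (one-dimensional toy data, not the CMS EFT analysis):

```python
# Toy likelihood-ratio estimation: a network trained to separate samples generated
# at two parameter points gives r(x) = s(x) / (1 - s(x)), a per-event surrogate
# likelihood ratio usable in a binned or unbinned scan.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, (20_000, 1))          # reference hypothesis
x1 = rng.normal(0.3, 1.0, (20_000, 1))          # alternative parameter point
x = torch.tensor(np.vstack([x0, x1]), dtype=torch.float32)
y = torch.tensor(np.r_[np.zeros(len(x0)), np.ones(len(x1))], dtype=torch.float32)

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    nn.functional.binary_cross_entropy_with_logits(net(x).squeeze(1), y).backward()
    opt.step()

s = torch.sigmoid(net(torch.tensor([[0.5]])))
print(float(s / (1.0 - s)))                     # estimated likelihood ratio at x = 0.5
```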
Speaker: Eddie Mcgrady (University of Notre Dame (US)) -
391
Neural Simulation-Based Inference at the LHC
Neural Simulation-Based Inference (NSBI) is a family of emerging techniques that allow statistical inference using high-dimensional data, even when the exact likelihoods are analytically intractable. The techniques rely on leveraging deep learning to directly build likelihood-based or posterior-based inference models using high-dimensional information. By not relying on hand-crafted, low-dimensional summary observables, NSBI can improve sensitivity for precision measurements and searches — as has been demonstrated across several scientific domains.
We review recent NSBI applications in ATLAS and CMS and focus on a key practical challenge for application of NSBI to full-scale LHC analyses: scalable treatment of $O(10^2)$ nuisance parameters encoding systematic uncertainties. Building on HistFactory-style interpolations, Gaussian-process surrogates and information-geometric approximations, we explore the development of NSBI models and fitting strategies that retain sensitivity while keeping training and inference tractable at the scale of LHC experiments.
Finally, to support adoption of these techniques in real analyses, we introduce nsbi-common-utils [1], an open-source Python toolkit providing a reproducible, modular end-to-end workflow for NSBI: data preparation, model training, calibration, and statistical inference, steered by configuration files. The talk will go into the details of this library, its scope, and its practical usage in high-energy physics analysis.
Speaker: Jay Ajitbhai Sandesara (University of Wisconsin Madison (US)) -
392
Simulation-Based Inference (SBI) in Precision Physics
Recent anomalies in flavour observables have motivated renewed interest in precision measurements of semileptonic $B$-meson decays as a probe of possible physics beyond the Standard Model. Extracting such effects often requires fitting complex, high-dimensional datasets in which traditional likelihood-based methods become computationally challenging or intractable. Simulation-based inference (SBI) offers a promising alternative: by leveraging the full power of modern machine learning, SBI can exploit detailed event-level information while remaining agnostic to analytical likelihood forms.
In this work, we investigate artificial intelligence methods for SBI of the deviation of the Wilson coefficient $C_9$ from its Standard Model value, denoted $\delta C_9$, using $B \to K^\ast \ell^+ \ell^-$ events simulated within the Belle II software and environment. We compare three neural-network–based approaches to this high-dimensional fitting problem. The first method maps each dataset onto a three-dimensional grid and infers $\delta C_9$ using computer-vision techniques. The second employs a Deep Sets architecture that predicts $\delta C_9$ while explicitly enforcing event-level permutation invariance. The third trains a per-event classification model to produce a binned probability distribution over $\delta C_9$; predictions from individual events are then aggregated to obtain the full posterior distribution for an entire dataset. We train and evaluate all models on simulated samples with and without detector effects, and additionally assess performance on datasets that include simulated background events taken from the beam-constrained-mass ($M_{\mathrm{bc}}$) sideband.
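A generic Deep Sets sketch for the second approach above (feature dimensions and network sizes are illustrative, not the Belle II model):

```python
# Generic Deep Sets regressor: a shared "phi" network embeds each element of an event
# set, a sum pool enforces permutation invariance, and a "rho" network regresses the
# parameter of interest (a stand-in for delta C9).
import torch
import torch.nn as nn

class DeepSetsRegressor(nn.Module):
    def __init__(self, n_feat=6, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(n_feat, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, events):                   # events: (batch, n_candidates, n_feat)
        pooled = self.phi(events).sum(dim=1)     # order of candidates does not matter
        return self.rho(pooled)

model = DeepSetsRegressor()
print(model(torch.randn(4, 100, 6)).shape)       # one estimate per input set
```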
Our results highlight both the potential and the challenges of applying SBI to flavour-physics analyses. They demonstrate that machine-learning models can recover meaningful information on $\delta C_9$ even in the presence of detector smearing and background contamination, suggesting that SBI may become a valuable tool for future precision measurements at Belle II and other flavour experiments.
Speaker: Ethan Lee -
393
Differentiable Simulation-Based Inference in RooFit with Neural Surrogates
Neural Simulation-Based Inference (NSBI) enables efficient use of complex generative models in statistical analyses, outperforming template histogram methods in particular for high-dimensional problems. When augmented with gradient information, NSBI can both maximise sensitivity to new physics and reduce the required amount of simulation.
The integration of NSBI into established likelihood-based frameworks remains an active area of development. In this contribution, we demonstrate end-to-end SBI workflows within the RooFit statistical modeling framework, combining traditional likelihood components with neural network surrogates.
As analyses for the HL-LHC are expected to involve increasing model complexity and parameter counts, scalable likelihood minimization becomes essential. Automatic Differentiation (AD) is one essential ingredient for that. We demonstrate AD of RooFit likelihoods that include neural network inference in their computation, targeted for NSBI applications. Neural surrogates are deployed in RooFit via C++ code generation using TMVA-SOFIE, and differentiated using Clad, a source-to-source AD tool for C++, yielding exact gradients of the full likelihood with respect to model parameters.
To the best of our knowledge, this is among the first demonstrations in particle physics of an NSBI pipeline that supports automatic differentiation through neural network inference in a fully compiled C++ likelihood. We present example use cases, performance benchmarks, and discuss implications for future precision analyses at scale.
Speaker: Jonas Rembser (CERN) -
394
NEEDLE: A Columnar Workflow Orchestrator for Large-Scale Neural Simulation-Based Inference in HEP
Neural Simulation Based Inference (NSBI) has emerged as a powerful statistical inference methodology for large datasets with high-dimensional representations. NSBI methods rely on neural networks to estimate the underlying, multi-dimensional likelihood distributions of the data at a per-event level. This approach significantly improves the inference performance over classical binned approaches by circumventing the need for summary statistics. In practice, NSBI tools remain computationally expensive due to the per-event statistical inference step and the training of many large neural networks in order to prevent biases. Implementations of NSBI in High Energy Physics (HEP) analyses therefore require reliable orchestration on heterogeneous resources, high-throughput ingestion, and processing of variable-length event data.
The NEEDLE project aims to meet these demands by providing a flexible framework for distributed training on computing infrastructure alongside a toolbox of powerful NSBI methods. The framework reduces the operational cost associated with NSBI by implementing core features such as orchestration, data ingestion, and experiment tracking. First, orchestration is performed with a directed acyclic graph (DAG) workflow manager covering the machine-learning training, evaluation, versioning, and experimentation life cycle. Second, models and datasets are tracked with PyTorch Lightning, allowing for flexible and reproducible experiments. Finally, common HEP storage formats such as ROOT and Parquet are read dynamically using dask-based libraries for optimal memory management.
In this contribution, we present the design principles, software architecture, and performance characteristics of the NEEDLE framework. This will be demonstrated for Neural Likelihood Ratio approaches on HEP open datasets.
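As an illustration of the ingredients named above, the sketch below combines lazy, dask-based reading of a ROOT file via uproot with a minimal PyTorch Lightning module; the file name, branch names, and model are placeholders and are not part of NEEDLE itself:

```python
import uproot
import torch
import pytorch_lightning as pl

# Lazily open a (placeholder) ROOT file as dask-awkward arrays; data are only
# materialized when .compute() is called, keeping memory usage under control.
events = uproot.dask("events.root:Events")
pt = events["pt"]              # still lazy at this point

class RatioEstimator(pl.LightningModule):
    """Minimal per-event classifier whose training is tracked by Lightning."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.net(x).squeeze(-1)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
        self.log("train_loss", loss)   # hook used for experiment tracking
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# A DAG workflow manager would then run steps such as:
# pl.Trainer(max_epochs=5).fit(RatioEstimator(), train_dataloaders=loader)
```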
Speaker: Kylian Schmidt (KIT - Karlsruhe Institute of Technology (DE))
-
390
-
Poster
-
Track 1 - Data and metadata organization, management and access: Databases for experiment data and operations
-
395
Scaling the HSF Conditions Database Across Experiments
The HSF Conditions Database (CDB) is a community-driven solution for managing conditions data - non-event data required for event processing - which present common challenges across HENP and astro-particle experiments. In the three years of production operation for sPHENIX at BNL, where the HSF CDB supports over 70,000 concurrent jobs on a farm running 132,000 logical cores, it has evolved into a robust and scalable service shaped by continuous real-world feedback. It is also being adopted by the Belle II experiment at KEK, which runs up to about 36,000 concurrent jobs across a distributed grid of about 38 computing sites worldwide. We will present the insights from the migration, including the challenges of data migration, as well as the developments on both the server and client software sides required to adopt and integrate the new system.
These experiences highlight practical considerations for experiment-wide deployment and provide guidance for future adopters of the HSF CDB. Recent developments further enhance the system, including caching techniques, experiment-specific authentication plugins, database replication for high availability, and a deployment framework based on Helm charts and OpenShift. An IntelligentLogging pipeline, developed through the HSF GSoC programme, provides central log aggregation, storage, and monitoring, and uses DeepLog-based anomaly detection. Beyond sPHENIX and Belle II, DUNE is preparing for adoption after initial prototype deployments demonstrated sufficient performance and Einstein Telescope is integrating it into an Open Science solution, demonstrating the system’s versatility across diverse experimental environments.
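As a rough illustration of how a client-side workflow might talk to such a conditions service, the sketch below resolves a payload for a given global tag and interval of validity over REST; the base URL, endpoint, parameter names, and response fields are hypothetical and do not reflect the actual HSF CDB API:

```python
import requests

# Purely illustrative client for a conditions-database REST service; everything
# below the base URL is a made-up stand-in, not the real HSF CDB interface.
BASE_URL = "https://cdb.example.org/api"

def get_payload_url(global_tag: str, payload_type: str, iov: int) -> str:
    """Resolve the payload for a given global tag, type, and interval of validity."""
    response = requests.get(
        f"{BASE_URL}/payloadiovs",
        params={"gtName": global_tag, "payloadType": payload_type, "iov": iov},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["payload_url"]

# Typical client-side pattern: resolve once per job, then fetch and cache the
# (usually much larger) payload file itself from bulk storage.
# url = get_payload_url("sPHENIX_2025_prod", "tpc_gain", iov=123456)
```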
Speaker: Ruslan Mashinistov (Brookhaven National Laboratory (US)) -
396
CREST Conditions Data Model for HL-LHC in ATLAS
The ATLAS experiment is redesigning its Conditions database infrastructure in preparation for Run 4. The new system (CREST - Conditions REST) adopts a multi-tier architecture in which interactions with all databases, including the Trigger physics configuration database, are mediated through a web-based server layer using a REST API. Data caching is provided via Varnish HTTP proxies. We present the current status of the associated software and infrastructure developments, illustrating progress through concrete workflow examples ranging from High-Level Trigger (HLT) usage to Monte Carlo Upgrade simulations for Run 4.
To validate performance and functionality, a dedicated demonstrator has been developed for a typical HLT workflow. This workflow poses unique challenges compared to later data-processing stages, as it requires both efficient data caching and rapid updates of frequently changing conditions, such as luminosity and beamspot parameters, during raw data processing. We describe the demonstrator setup along with results from performance and scalability tests.
Finally, we outline the migration strategy from the current COOL conditions database to the CREST system, which enables large-scale validation using HLT reprocessing, data reprocessing, and Monte Carlo simulation workflows. This represents a key milestone toward certifying a grid-ready Conditions database service and ensuring full compatibility across the experiment’s software ecosystem for HL-LHC operations.
Speaker: Andrea Formica (Université Paris-Saclay (FR)) -
397
Migration of Conditions data caching from Squid to Varnish in the ATLAS experiment
Efficient access to Conditions data is critical for data processing in the ATLAS experiment at the LHC. For more than a decade, Squid HTTP proxies deployed across distributed computing sites have provided low-latency access, reduced WAN bandwidth consumption, and protected origin servers from excessive load. Conditions data traffic is characterized by exceptionally high request rates - often exceeding 20,000 requests per minute at large Tier-2 sites - with cache hit rates above 99.9%. To meet growing performance and operational demands, ATLAS has modernized this infrastructure by adopting Varnish HTTP reverse proxies with RAM-only storage, selected for their high throughput, robustness, and strong community ecosystem. A streamlined deployment strategy was implemented: a small number of high-capacity regional proxies were installed at major sites and placed behind location-aware Cloudflare DNS load balancers, providing seamless global access. Additional local proxies were introduced only where required to meet performance or special needs. The origin server layer - the Frontier launchpads - was re-architected as a high-availability Kubernetes-based service. The new design integrates Nginx load balancers, Varnish proxies, and Tomcat-based Frontier servlets into resilient launchpad clusters. The modernization effort also introduced a comprehensive monitoring and alerting framework, including an AI-driven anomaly-detection agent capable of identifying irregular traffic patterns and issuing proactive alerts. This upgraded architecture significantly improves scalability, resilience, and operational efficiency, ensuring robust delivery of Conditions data for the medium term and providing a model for similar services in support of future LHC runs.
Speaker: Ilija Vukotic (University of Chicago (US)) -
398
Enhancing the use of the Calibration and Conditions Database in ALICE Grid jobs
Authors:
- Martin Øines Eide, Western Norway University of Applied Sciences, University of Bergen, Bergen, Norway, and European Organization for Nuclear Research (CERN), Geneva, Switzerland
- Costin Grigoras, European Organization for Nuclear Research (CERN), Geneva, Switzerland
on behalf of the ALICE collaboration
The ALICE experiment at CERN relies on a central service known as the Calibration and Conditions Database (CCDB). This service acts as a single, uniform source of data essential for online and offline reconstruction, analysis, and other crucial tasks within the experiment. Currently, the CCDB is fully operational and has successfully managed a heavy workload, serving thousands of requests per second across the online and the distributed offline Grid environment. Due to the centralized nature of the CCDB service combined with the distributed execution of ALICE Grid jobs, connectivity is a significant concern - jobs occasionally encounter connectivity issues when attempting to access the CCDB. Furthermore, the practice of redundant lookups, where multiple jobs or even the same job repeatedly request identical pieces of calibration or conditions data, imposes an unnecessary load on the central service. To mitigate these operational challenges, the ALICE team is actively investigating and implementing a caching solution.
This work details the specific technical improvements made to the CCDB usage tracking and analysis mechanisms, which were necessary to properly characterize the service's workload and optimize the caching strategy. In particular, to maintain the reliability and responsiveness of the CCDB in the face of immense Grid job traffic, rigorous connection monitoring was implemented, tracking key network and database metrics such as the latency for establishing a connection, the duration of open connections, and the frequency of connection timeouts or failures experienced by distributed Grid jobs. By closely monitoring these parameters, the system can identify and flag specific regions or job types prone to connectivity issues, allowing for targeted network or service adjustments. This detailed monitoring directly informs the assessment of database performance and the choice of the caching solution itself, along with its architecture and successful integration into the ALICE Grid middleware, JAliEn.
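A toy example of the kind of client-side caching and connection-metric collection discussed above is sketched below; the URL, path scheme, and metric names are placeholders and do not correspond to the CCDB or JAliEn interfaces:

```python
import time
from functools import lru_cache

import requests

CCDB_URL = "https://ccdb.example.org"   # placeholder, not the real endpoint
metrics = {"requests": 0, "cache_hits": 0, "latencies": []}

@lru_cache(maxsize=1024)
def _fetch(path: str, timestamp: int) -> bytes:
    """Fetch a conditions object once per (path, validity timestamp)."""
    start = time.perf_counter()
    response = requests.get(f"{CCDB_URL}/{path}/{timestamp}", timeout=5)
    response.raise_for_status()
    metrics["latencies"].append(time.perf_counter() - start)
    return response.content

def get_object(path: str, timestamp: int) -> bytes:
    """Cached lookup that also records hit rates for monitoring."""
    metrics["requests"] += 1
    hits_before = _fetch.cache_info().hits
    payload = _fetch(path, timestamp)
    if _fetch.cache_info().hits > hits_before:
        metrics["cache_hits"] += 1
    return payload
```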
Speaker: Martin Øines Eide (Western Norway University of Applied Sciences (NO)) -
399
A Distributed Event Sourced Configuration Database for the ATLAS ITk Pixel Detector
In its high-luminosity phase, the Large Hadron Collider (LHC) will achieve unprecedented levels of instantaneous luminosity of up to $7.5\times10^{34}$ cm$^{-2}$s$^{-1}$, which exposes the ITk (Inner Tracker) Pixel detector of the ATLAS experiment to extraordinary levels of radiation. A maximum fluence of $9.2\times10^{15}$ cm$^{-2}$ 1 MeV $n_{eq}$ is expected in the harshest radiation region at the innermost detector layers. To keep occupancy low in these conditions, the number of pixels will increase from ~92 million in the current ATLAS Pixel Detector to ~5 billion. Each pixel holds configuration data which has to be constantly reprogrammed and monitored while the detector is taking data, and the configuration of all these pixels has to remain rapidly available at all times throughout operation. Because of these unprecedented levels of radiation exposure, easy and quick monitoring of the configuration evolution must be possible in order to understand and potentially mitigate radiation damage to the detector system. Based on this monitoring, a re-calibration of the detector can be derived, generating a new set of configuration data which eventually has to be applied to the pixels. Updates of the configuration between accelerator fills may also be necessary.
To meet these challenging requirements, a distributed event-sourced configuration database has been proposed and is currently under development. An event-sourced database stores data as small updates, so-called domain events, which restore the state of the system when played back into so-called aggregates. In our case, the aggregates are configuration object blueprints. Once restored, the configuration objects are passed down to the data acquisition (DAQ) software. By storing only differential updates of the configuration data, great savings in required storage and great increases in speed are expected compared to a solution that stores the full configuration data for each version.
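Purely to illustrate the event-sourcing pattern just described (this is not the proposed implementation; event types, register names, and values are invented), a minimal Python sketch of replaying domain events into a configuration aggregate could look as follows:

```python
from dataclasses import dataclass, field

@dataclass
class PixelConfigAggregate:
    """Aggregate rebuilt by replaying domain events (illustrative only)."""
    registers: dict = field(default_factory=dict)
    version: int = 0

    def apply(self, event: dict) -> None:
        # Each domain event carries only the differential update.
        if event["type"] == "RegisterUpdated":
            self.registers[event["register"]] = event["value"]
        elif event["type"] == "RegisterRemoved":
            self.registers.pop(event["register"], None)
        self.version += 1

def replay(events: list[dict]) -> PixelConfigAggregate:
    """Restore the current configuration state from the event log."""
    aggregate = PixelConfigAggregate()
    for event in events:
        aggregate.apply(event)
    return aggregate

# Hypothetical event log: only differences are stored, not full configurations.
log = [
    {"type": "RegisterUpdated", "register": "threshold_dac", "value": 1200},
    {"type": "RegisterUpdated", "register": "trim_bits", "value": 0b1010},
    {"type": "RegisterUpdated", "register": "threshold_dac", "value": 1180},
]
print(replay(log).registers)  # {'threshold_dac': 1180, 'trim_bits': 10}
```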
The database adheres to the CQRS principle in order to decouple writes and reads. This allows a fan-out topology of the database on the read side, where O(100) DAQ hosts each keep the subset of the total configuration data destined for their connected front-end detector hardware. The optimal architecture to distribute the configuration data from the central servers to the DAQ hosts is currently under study; for example, to achieve strict consistency of configuration updates, an event bus fully integrated with the database is under investigation to provide feedback about the state of the system. Full-scale modelling of the database is underway on a SLURM cluster in Wuppertal. The implementation of the database in Python is expected to provide ready access to the vast Python ecosystem for data analysis.
Speaker: Gerhard Immanuel Brandt (Bergische Universitaet Wuppertal (DE)) -
400
Intelligent Database of ZTF and LSST Alerts
This contribution presents the architecture and implementation of an intelligent database system for astronomical alerts produced by the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). The system is designed to support efficient exploration of large-scale alert streams through both traditional query mechanisms and advanced similarity-based navigation.
Beyond standard attribute-based searches, the database enables fuzzy, proximity, and vector-based searches, allowing users to identify alerts with similar characteristics and to navigate the alert space in a recommendation-style manner. This is achieved by combining vector database techniques with additional domain-specific intelligence, providing flexible and extensible support for heterogeneous alert representations and evolving data formats.
The presentation will describe the overall system architecture, its theoretical foundations, and its implementation as a multi-language, multi-database solution that integrates multiple database technologies, including relational (SQL), document-oriented and key–value (NoSQL), and graph databases, each used according to its strengths and access patterns, and optimized for high-volume, high-velocity alert data. Particular attention will be given to data format flexibility, interoperability, and scalability. An intuitive, interactive web-service-based user interface allows both exploratory and programmatic access to the data.
Several advanced usage patterns, including similarity search, alert clustering, and exploratory classification workflows, will be demonstrated. The described system is an integral component of the Fink alert broker and contributes to efficient real-time and offline analysis of time-domain astronomical events.
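As a minimal, self-contained illustration of the vector-based similarity search mentioned above (not taken from the Fink implementation; alert IDs and embeddings are invented placeholders), the sketch below ranks alerts by cosine similarity of their embedding vectors:

```python
import numpy as np

# Hypothetical alert embeddings: each row is a fixed-length vector representing
# one alert (e.g. derived from light-curve features); IDs are placeholders.
alert_ids = ["ZTF25aaaaaaa", "ZTF25aaaaaab", "LSST-000123", "LSST-000124"]
embeddings = np.random.default_rng(2).normal(size=(len(alert_ids), 16))

def most_similar(query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
    """Return the k alerts closest to the query vector by cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    scores = embeddings @ query / norms
    order = np.argsort(scores)[::-1][:k]
    return [(alert_ids[i], float(scores[i])) for i in order]

# "Recommendation-style" navigation: start from one alert and hop to neighbours.
print(most_similar(embeddings[0]))
```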
Speaker: Julius Hrivnac (Université Paris-Saclay (FR))
-
395
-
Track 2 - Online and real-time computing
-
401
New DAQ software for the NEXT-100 experiment
The Neutrino Experiment with a Xenon TPC (NEXT) investigates neutrinoless double-beta decay (0νββ) in xenon using high-pressure xenon time projection chambers. This approach enables excellent energy resolution and allows for 3D reconstruction of the event track, improving the sensitivity through the use of topological information.
Previous prototypes of the NEXT experimental programme used DATE as the online data-taking software. This software was used by ALICE, but was discontinued a few years ago. After evaluating alternatives, the NEXT collaboration decided to develop a new DAQ software suite adapted to the experiment's requirements.
The DAQ software for the NEXT-100 experiment is written in Go, leveraging go-routines for efficient data stream handling. Local Data Concentrators (LDCs) collect data via UDP and forward subevents to a Global Data Concentrator (GDC) over TCP for event building, with results stored in HDF5 files containing full waveforms. A web interface built with Vue.js, supported by a Go REST API, manages the LDC and GDC servers. The system uses gRPC for backend communication and websockets (via Centrifugo) for real-time status and event-rate monitoring. It also integrates with Grafana and Prometheus for monitoring and alerting to operators and administrators.
Speaker: Jose Maria Benlloch Rodriguez (Donostia International Physics Center (DIPC) (ES)) -
402
Scalable DAQ Performance for Supernova Neutrino Observations in DUNE
The Deep Underground Neutrino Experiment (DUNE) is an international next-generation project that will use a powerful neutrino beam produced at Fermilab and two detectors: a near detector at Fermilab and a far detector ~1300 kilometers away, at the Sanford Underground Research Facility in South Dakota. DUNE features a high-throughput, modular data acquisition system (DAQ) specifically designed to capture intense physics events, including Supernova Neutrino Bursts (SNBs). Within the first 10 seconds of such a burst, approximately 10^57 neutrinos are emitted, with around 60 expected to interact in the far detectors. Given DUNE’s ambitious scientific goals and the rarity of supernova bursts (roughly twice a century), the DAQ is required to meet stringent performance criteria: the ability to run continuously for extended periods with a 99% up-time requirement, the functionality to record both beam neutrinos and low-energy neutrinos, data throughputs of up to 1.8 TB/s for the far detectors, and a total storage capacity of 30 PB per year. The system’s modular design enables this workload to be shared evenly across 150 identical detector units, distributed among high-performance commercial off-the-shelf servers where one readout server manages four detector units. These servers interface directly with the detector electronics, receiving data over Ethernet. The data are then buffered and processed to extract “trigger primitives” used for data selection. In this talk we present the results of recent performance tests conducted using the protoDUNE horizontal drift time projection chamber at the Neutrino Platform at CERN, including the ability to record a 100 s-long data capture that will be used for SNB readout. We show that the DUNE DAQ readout system is reliable and scalable for capturing SNB neutrino interactions in DUNE's far detector modules.
Speaker: Mr Robert-Mihai Amarinei (University of Toronto (CA)) -
403
The online event classification software in the JUNO experiment
The Jiangmen Underground Neutrino Observatory (JUNO) is a large-scale neutrino experiment with multiple physics goals. After many years of dedicated effort, the construction of the JUNO detector has been successfully completed, and physics data-taking officially commenced on August 26, 2025.
The detector readout system produces waveform data at a rate of approximately 40 GB/s at a 1 kHz trigger rate, making it impractical to store all the raw data. To address this challenge, the Online Event Classification (OEC) software is employed to reduce the data rate by more than two orders of magnitude. The OEC system saves reconstructed time/charge (T/Q) information for all events and selectively stores waveforms for events of interest, which will be used in subsequent offline precision reconstruction.
This contribution presents the software of the OEC system, including the multithreaded Low-Level Event Classification (LEC) module and the single-threaded High-Level Event Classification (HEC) module. In addition, a middleware layer has been developed to support the integration of offline algorithms into the online environment. Finally, we report on the computing performance observed during data-taking operations.
Speaker: Wenxing Fang -
404
The Effelsberg Direct Digitization Radio Astronomy Backend
The Effelsberg Direct Digitization (EDD) backend is a multi-science computing system for real-time processing of data from radio telescopes on commercial-off-the-shelf computing hardware. While originally developed for the Effelsberg 100-m telescope, it has been generalized into an open-source framework that currently drives data recording at four independent telescopes, including single dishes and arrays, running at scales ranging from 2 to 36 HPC processing nodes at the individual telescopes. The framework supports all common observation modes in radio astronomy. Additional modes for specialized science cases can be easily integrated into the system due to its plugin-focused design. Data transport between the processing components is achieved via multicast Ethernet, while orchestration, monitoring, and lifecycle management are handled through containerisation, CI/CD pipelines, and automated deployment with Ansible. The system allows telescope operators to rapidly switch between observation modes, with computing resources surplus to a given mode being repurposed for offline data analysis, for example through the use of HTC Condor clusters. In this talk we will outline the architecture of the EDD system and discuss its role in next-generation radio observatories.
Speaker: Tobias Winchen (Max Planck Institute for Radio Astronomy) -
405
The SPD Online Filter
The Spin Physics Detector (SPD) is currently under construction at the second interaction point of the NICA collider at JINR. Its primary physics goal is to test fundamental aspects of Quantum Chromodynamics by studying the polarized structure of the nucleon and investigating spin-dependent phenomena in collisions of longitudinally and transversely polarized protons and deuterons. These collisions will reach a center-of-mass energy of up to 27 GeV and a luminosity of up to 10^32 cm^−2 s^−1.
Due to the use of a free-running data acquisition (DAQ) system, the experimental facility will produce a continuous data stream at an estimated rate of up to 20 GB/s (approximately 200 PB/year). Unlike conventional setups, hardware-level event selection is not feasible for SPD, as trigger decisions depend on reconstructed tracking information—such as momentum and vertex position—which cannot be determined at the earliest stage of data taking. Consequently, the raw data stream requires real-time processing to unscramble and assemble individual physics events.
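A toy sketch of time-window event building from a free-running, timestamped stream is shown below; the channel names, timestamps, and coincidence window are invented and merely illustrate the kind of unscrambling step described above, not the SPD implementation:

```python
# Hypothetical raw stream: (detector_channel, timestamp_ns, payload) tuples
# arriving unsorted from a free-running DAQ.
stream = [
    ("straw_12", 1004, b"..."), ("ecal_03", 1001, b"..."),
    ("straw_44", 5002, b"..."), ("ecal_07", 1003, b"..."),
]

WINDOW_NS = 100  # assumed coincidence window for grouping hits into events

def build_events(hits):
    """Group time-ordered hits into candidate events by timestamp proximity."""
    events, current, last_t = [], [], None
    for channel, t, payload in sorted(hits, key=lambda h: h[1]):
        if last_t is not None and t - last_t > WINDOW_NS:
            events.append(current)
            current = []
        current.append((channel, t, payload))
        last_t = t
    if current:
        events.append(current)
    return events

print([len(e) for e in build_events(stream)])  # e.g. [3, 1]
```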
To meet the requirements for primary data processing, a dedicated computing system—the SPD Online Filter—is being developed. This facility will perform data processing on the full data throughput from the DAQ system, executing multistep workflows both for real-time event selection and for other needs such as data-quality monitoring and partial calibration. This presentation will describe the conceptual design, system architecture, and current implementation status of the SPD Online Filter.
Speaker: Dr Danila Oleynik (Joint Institute for Nuclear Research (RU)) -
406
Service-Based DUNE Run Control Architecture for High-Availability Data Acquisition
DUNE is a long-baseline neutrino oscillation experiment utilizing several detectors at both the Near Detector (ND) and Far Detector (FD) facilities. The design and architecture of the FD control and data acquisition (DAQ) system have progressed with the successful operation of the ProtoDUNE-II FD prototypes at CERN. The control system architecture has evolved from a single monolithic structure to a service-based approach. This new design was successfully commissioned and operated on the vertical drift technology in the NP02 cryostat during the summer of 2025.
This system effectively orchestrated the DAQ applications on bare metal, interfacing with industry-standard services such as Grafana, alongside custom microservices. These custom components include a run number generator, a run registry for configuration archiving, and middleware responsible for persisting message data from Kafka brokers to various databases. This contribution reports on the operational experience gained from this infrastructure and details the required new services and modifications needed to deploy a fault-tolerant production system capable of meeting DUNE’s high-uptime requirements.
Speaker: Pawel Maciej Plesniak (Imperial College (GB))
-
401
-
Track 2 - Online and real-time computing
-
407
The upgrade of the ATLAS Trigger and Data Acquisition system for the High Luminosity LHC
The ATLAS experiment at CERN is constructing an upgraded system for the "High Luminosity LHC", with collisions due to start in 2030. In order to deliver an order of magnitude more data than previous LHC runs, 14 TeV protons will collide with an instantaneous luminosity of up to 7.5 x 10^34 cm^-2 s^-1, resulting in much higher pileup and data rates than the current experiment was designed to handle. While this is essential to realise the physics programme, it presents a huge challenge for the detector, trigger, data acquisition and computing. The detector upgrades themselves also present new requirements and opportunities for the trigger and data acquisition system.
The design of the TDAQ upgrade comprises: a hardware-based, low-latency, real-time Trigger operating at 40 MHz; Data Acquisition which combines custom readout with commodity hardware and networking to deal with 4.6 TB/s of input; and an Event Filter running at 1 MHz which combines offline-like algorithms on a large commodity compute service with the potential to be augmented by commercial accelerators. Commodity servers and networks are used as far as possible, with custom ATCA boards, high-speed links and powerful FPGAs deployed in the low-latency parts of the system. Offline-style clustering and jet-finding in FPGAs, and accelerated track reconstruction, are designed to combat pileup in the Trigger and Event Filter respectively.
This contribution will report recent progress on the design, technology and construction of the system. The physics motivation and expected performance will be shown for key physics processes.
Speaker: ATLAS Collaboration -
408
The data handling library of the DUNE Data Acquisition System
The Deep Underground Neutrino Experiment (DUNE) is a long-baseline neutrino physics experiment with detectors located 1.5 km underground at the Sanford Underground Research Facility. The Data Acquisition (DAQ) system interfaces with multiple front-end electronics, each producing data with distinct rates and formats, and handles the reception, transportation, and preparation of this data for later reconstruction and analysis. The DAQ must sustain high-rate input, perform buffering, process data, and assemble data fragments for requested time-windows.
We developed a type-generic data handling library within the DUNE DAQ framework to perform data reception, data validation, data buffering, and additional post-buffer-insertion processing. Each functionality allows for multiple implementations and further specializations in order to handle a wide variety of streaming data and processing patterns, including fixed- or variable-rate reception, ordered or unordered data, and pipeline-staged or asynchronous event-driven processing. Data insertion, lookup, and range extraction are implemented with attention to time complexity and lock-free techniques to ensure reliable performance under high incoming rates.
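The library itself is implemented in C++ with attention to lock-free data structures; purely as an illustration of the insert/lookup/range-extraction interface described above, a simplified Python toy (class and method names invented) could look as follows:

```python
import bisect
from collections import deque

class LatencyBuffer:
    """Toy time-ordered buffer supporting insertion and time-window extraction."""
    def __init__(self, capacity: int = 100_000):
        self._timestamps = deque(maxlen=capacity)
        self._fragments = deque(maxlen=capacity)

    def insert(self, timestamp: int, fragment: bytes) -> None:
        # Assumes fragments arrive (approximately) time-ordered per stream.
        self._timestamps.append(timestamp)
        self._fragments.append(fragment)

    def extract(self, t_begin: int, t_end: int) -> list[bytes]:
        """Return all fragments whose timestamps fall in [t_begin, t_end)."""
        ts = list(self._timestamps)
        lo = bisect.bisect_left(ts, t_begin)
        hi = bisect.bisect_left(ts, t_end)
        return list(self._fragments)[lo:hi]

buf = LatencyBuffer()
for t in range(0, 1000, 10):
    buf.insert(t, b"frag")
print(len(buf.extract(100, 200)))  # 10 fragments in the requested window
```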
Our contribution presents the data handling library, its current use-cases with both detector readout data and DAQ-produced trigger data, and discusses further planned developments.
Speaker: Deniz Tuana Ergonul -
409
Wall-time-driven data aggregation for the CBM FLES input interface
The CBM First-Level Event Selector (FLES) serves as the central data processing and event selection system for the upcoming CBM experiment at FAIR. Designed as a scalable high-performance computing cluster, it facilitates online event reconstruction and selection of unfiltered physics data at rates surpassing 1 TByte/s. The FLES input data originates from approximately 5000 detector links, each delivering time-stamped messages in a free-streaming data acquisition mode. For efficient data handling, these detector data streams are time-partitioned into context-free packages called microslices, which are subsequently aggregated into larger processing intervals known as timeslices.
We present a new design for the FLES input processing chain that introduces subtimeslices as a new data structure enabling local aggregation of timeslice components on entry nodes. The main characteristic of the new design is a shift from data-driven to wall-time-driven operation with dynamic timeslice component building.
Exploiting the time information inherent in the data streams, the content of each timeslice component is dynamically determined by evaluating the timestamps of available microslices against the defined time boundaries of the timeslice component. Subtimeslices are assembled opportunistically based on wall time, ensuring that subtimeslices are formed even when some contributions are delayed or missing. As an additional benefit, this approach enables automatic synchronization of channels into the data stream without complex extra logic, as channels automatically participate once new microslices become available.
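A simplified sketch of selecting microslices into a timeslice component by timestamp is given below; the timeslice length, data layout, and completeness rule are invented simplifications (assuming in-order arrival per link), not the FLES implementation:

```python
# Toy illustration of wall-time-driven timeslice-component building: microslices
# are assigned to a component purely by their timestamps, and the component is
# closed once the wall-time deadline expires, even if contributions are missing.
TS_LENGTH_NS = 1_000_000          # assumed timeslice-component length

def build_component(microslices, ts_index, deadline_passed):
    """Select the microslices that fall inside timeslice component ts_index."""
    t_begin = ts_index * TS_LENGTH_NS
    t_end = t_begin + TS_LENGTH_NS
    selected = [m for m in microslices if t_begin <= m["t_start"] < t_end]
    pending = [m for m in microslices if m["t_start"] >= t_end]
    # Simplification: assuming in-order delivery per link, data beyond t_end
    # implies this component can no longer grow; otherwise wait for the deadline.
    complete = deadline_passed or bool(pending)
    return selected, complete

microslices = [{"t_start": t, "data": b""} for t in range(0, 3_000_000, 250_000)]
component, complete = build_component(microslices, ts_index=1, deadline_passed=False)
print(len(component), complete)   # 4 microslices, component considered complete
```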
These developments significantly improve resilience against detector malfunctions and overload conditions, reduce networking overhead and buffer fragmentation, and enable modular system startup. The redesigned system is currently operational in our development setup, with deployment to production planned for the upcoming beam time campaigns.
This work is supported by BMFTR (05P24RF3).
Speaker: Dirk Hutter (Goethe University Frankfurt (DE)) -
410
Networking in KM3NeT
The KM3NeT neutrino detectors, currently under construction in the Mediterranean Sea, are designed to measure high-energy cosmic neutrinos and their properties. To exploit the Cherenkov effect as the detection technique, the ARCA and ORCA detectors are deployed at two abyssal sites, off the coasts of southern Italy and France, respectively. Operating in such an extreme deep-sea environment, the detectors rely on a robust, high-performance networking architecture that ensures negligible data loss and precise time synchronisation across thousands of distributed detector modules.
Given the impossibility of routine maintenance operations for such submarine installations, the electronics integrated in each detection module are deliberately kept as simple and robust as possible to ensure decades-long operational stability. Consequently, no data filtering is performed underwater, and all acquired data are continuously streamed to shore. This "all-data-to-shore" architecture shifts event selection and data processing to onshore facilities, imposing stringent requirements on network throughput, latency, and synchronisation accuracy.
This contribution outlines the primary design constraints of the KM3NeT networking system and discusses the rationale behind the adopted architecture and technologies. The offshore communication and timing systems are described in detail, while the main features of the onshore infrastructure are also presented, providing a complete overview of the end-to-end data transmission chain of the detector.
Speaker: Emidio Maria Giorgio (INFN LNS) -
411
Performance evaluation and optimization of ethernet-based data acquisition servers in DUNE
The Data Acquisition (DAQ) system of the Deep Underground Neutrino Experiment (DUNE) at the Sanford Underground Research Facility must receive detector data aggregated over multiple 100 Gbps Ethernet streams from the Far Detector modules' front-end electronics. This contribution outlines the performance tuning and evaluation of high-performance COTS (Commercial Off-The-Shelf) readout servers, which interface with heterogeneous front-end electronics and forward detector payloads for buffering and quasi real-time processing.
We present measurements from targeted throughput studies and baselining testing campaigns that examine the readout servers and their direct interfaces under both synthetic and operational conditions. These studies aim to quantify achievable throughput and resource utilization, and identify where network, CPU, and software implementation begin to limit end-to-end performance on a single COTS server. This work is essential for defining minimum system requirements and the most power-efficient technical specifications for DUNE DAQ readout servers.
Particular focus is placed on achieving deterministic behaviour and on providing configurable parameters for threading, buffering, and processing scheduling to support varying operational requirements. The results inform ongoing DAQ software refinements and guide the technical specifications for DUNE DAQ readout servers and system topology to support the baseline 400 Gbps capable configuration.
Speakers: Deniz Tuana Ergonul, Shyam Bhuller (University of Oxford (GB)) -
412
Development of a common platform for streaming readout DAQ systems within the SPADI Alliance
The SPADI Alliance in Japan is developing a common, trigger-less streaming data acquisition (DAQ) platform to address the increasing demands of modern nuclear and particle physics experiments. The Alliance integrates R&D efforts from front-end electronics to computing and networking, promoting open collaboration across laboratories.
At the hardware level, the platform is developing a family of streaming front-end electronics. Its first implementation, the FPGA-based time-to-digital converter (TDC) module, AMANEQ, achieves sub–30 ps timing resolution and continuous readout. Additional front-end modules, such as waveform digitizers currently under development, extend applicability to diverse detector systems. A lightweight and deterministic clock synchronization scheme, MIKUMARI, together with the LACCP protocol, enables precise clock frequency and timestamp alignment without reliance on accelerator timing signals, allowing plug-and-play operation and flexible system scaling.
On the software side, the platform employs NestDAQ, a modular streaming DAQ framework that supports distributed data transport, semi-automatic topology configuration, load balancing, and software-based triggering. Ongoing developments also focus on observability for system monitoring, error analysis and anomaly detection. For online processing and monitoring, Artemis, a ROOT-based analysis framework, is employed.
The SPADI standard system has been deployed at multiple facilities including RCNP, J-PARC, RARiS, and HIMAC, demonstrating online coincidence identification and event reconstruction in practical beam experiments. This contribution describes the concept and architecture of the Alliance and its common DAQ platform, recent progress in hardware and software development, and initial experimental implementations.
Speaker: Tomonori Takahashi (RCNP, University of Osaka)
-
407
-
Track 3 - Offline data processing: Core Software and Frameworks 3
-
413
Accelerating ML Inference on heterogeneous architectures using SOFIE and alpaka
Deploying machine learning models in environments with high-throughput, low-latency, and strict memory constraints is challenging, especially when these environments evolve rapidly and require simplified user-control, dependency management, and long-term maintainability. In high-energy physics, and particularly within the Trigger Systems of major LHC experiments, similar requirements arise for real-time data processing. While ML models offer significant opportunities, their inference phase continues to be challenging.
SOFIE (System for Optimized Fast Inference code Emit) translates trained ML models and generates low-latency, high-performance C++ code for inference while depending only on BLAS for matrix operations. Recent developments include an improved inference on CPUs that surpasses state-of-the-art ONNX Runtime performance for several LHC models. Through optimized kernels for common ML operations, enhanced memory-reuse mechanisms and improved dynamic tensor support, SOFIE delivers high CPU performance while remaining extremely lightweight.
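For orientation, generating standalone inference code with SOFIE from Python typically follows the pattern sketched below; the model file name is a placeholder, a ROOT build with the SOFIE ONNX parser enabled is assumed, and the exact class and method signatures should be checked against the ROOT documentation for the release in use:

```python
import ROOT

# Parse a trained ONNX model and emit standalone C++ inference code.
# (Class and method names follow the SOFIE interface in recent ROOT releases;
# "classifier.onnx" is a placeholder model file.)
parser = ROOT.TMVA.Experimental.SOFIE.RModelParser_ONNX()
model = parser.Parse("classifier.onnx")
model.Generate()                          # build the internal representation
model.OutputGenerated("classifier.hxx")   # write header-only C++ inference code

# The generated header exposes an inference function that depends only on BLAS
# and can be compiled into a trigger or offline application.
```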
The upcoming High-Luminosity LHC era highlights the growing need for co-processors to accelerate workloads that benefit from massive parallelism, such as ML inference. However, heterogeneous devices introduce complications like non-uniform memory formats, diverse inference configurations, and costly data movement between host and device.
To support ML inference across heterogeneous architectures, SOFIE can generate C++ code that uses alpaka data buffers for its memory management. By leveraging alpaka's abstraction of heterogeneous programming, the generated code can run on multiple backends with minimal modification. Because the generated code is architecture-agnostic, integration into existing HEP workflows becomes easier.
While SOFIE relies on BLAS libraries for optimized matrix operations, heterogeneous architectures typically provide vendor-specific BLAS implementations with limited portability. This complicates the generation of fully architecture-agnostic code. To address this, we introduce sofieBLAS, a lightweight abstraction layer that exposes a unified BLAS interface and dynamically selects the appropriate backend at runtime. This preserves portability while allowing efficient use of vendor-optimized BLAS implementations.
With ML models in HEP research getting more complex and sophisticated every day, compression techniques are becoming increasingly useful. In order to address this need, SOFIE is now integrated with PQuant, a library for End-to-End Hardware-Aware Model Compression using Pruning and Quantization techniques. This allows us to generate C++ code for inference from quantized ML models.
We present the recent developments in SOFIE, including the optimizations in CPU implementation and the new heterogeneous inference capabilities. We also show benchmarking results for both CPU and GPUs using SOFIE’s generated code, comparing performance against other ML inference libraries, including PyTorch and ONNX Runtime on common models used in HEP research such as ATLAS GN2, CMS ParticleNet, Diffusion models for Fast Simulations, etc.
Speaker: Sanjiban Sengupta (CERN, University of Manchester) -
414
Leveraging Inference as a Service technology for executing ML models by the Derived AOD production applications of the ATLAS experiment
To address this challenge and prepare for the transition to large, resource-intensive ML models, we propose leveraging AthenaTriton for DAOD production, where these ML models are executed on dedicated computing resources. AthenaTriton is a tool for running ML inference as a service in Athena using the NVIDIA Triton server software.
We discuss different deployment strategies for Triton servers across heterogeneous computing platforms, including WLCG sites and High Performance Computing centers. We present the results of measurements of various performance metrics, including network transfer rate and latency, as well as event processing throughput. Finally, we evaluate the scalability of the AthenaTriton approach as a function of computing resources, enabling data-driven optimization of future DAOD workflows and ensuring sustainable, efficient large-scale ML inference across the evolving ATLAS computing infrastructure, which will increasingly rely on shared computing resources like those provided by the American Science Cloud.
Speaker: Vakho Tsulaia (Lawrence Berkeley National Lab. (US)) -
415
Accelerating GNN Inference for the GNN4ITk Pipeline in ATLAS
The High-Luminosity LHC (HL-LHC) will impose unprecedented pile-up and throughput demands on the ATLAS offline tracking reconstruction, making computational efficiency an essential requirement alongside physics performance. We present a comprehensive study of the ATLAS GNN4ITk offline track-reconstruction pipeline, spanning graph construction, Graph Neural Network (GNN) inference, and track building, with a focus on scalable deployment on GPU-accelerated computing platforms.
The pipeline, which includes a CUDA kernel graph construction step followed by the downstream GNN architecture, maintains high tracking efficiency while improving edge-level purity. Model-level optimizations include mixed-precision inference, CUDA Graphs, and torch.compile, complemented by quantization-aware training and structured pruning to reduce model size and improve throughput. For offline reconstruction integration, models are exported to ONNX and optimized using Torch-TensorRT for execution within the ACTS-based tracking workflow. The final graph segmentation step is implemented as a dedicated CUDA kernel, improving both physics robustness and end-to-end reconstruction performance.
To assess scalability in realistic offline processing scenarios, we benchmark a tracking-as-a-service deployment using NVIDIA Triton Inference Server on HPC GPU clusters, measuring end-to-end event throughput under concurrent workloads and representative I/O conditions. We report a set of relevant performance metrics, including per-event latency, throughput per GPU, peak memory usage, and energy per event, and support these with ablation studies that isolate the impact of compute, memory, and data loading optimizations.
These results demonstrate that the GPU-accelerated GNN4ITk pipeline improves offline tracking throughput and resource efficiency while preserving physics performance, providing a viable and scalable path toward HL-LHC offline reconstruction.
Speaker: Alina Lazar (Youngstown State University (US)) -
416
Redesigning the ATLAS In-File Metadata and Navigational Infrastructure for HL-LHC
Efficient and maintainable in-file metadata is crucial for large-scale event processing. The ATLAS experiment's Athena event-processing framework relies on complex navigational and metadata infrastructure to manage event processing across diverse workflows. As experimental demands grow, inefficiencies and redundancies in the current metadata infrastructure have constrained storage efficiency, affected reliability, and increased maintenance challenges.
We describe a comprehensive redesign of the metadata and navigation infrastructure, which handles data organization and retrieval, aimed at simplifying data relationships and reducing architectural and code duplication. By consolidating metadata handling into a more streamlined software design, our prototype improves both storage efficiency and maintainability.
We also survey metadata usage across workflows to normalize and deduplicate analysis metadata, creating a more robust and transparent system. Finally, we explore the potential of modern storage technologies, particularly RNTuple attributes, to support the evolution of in-file metadata structures. These developments complement I/O framework modernization by providing a coherent and efficient metadata layer atop the next-generation persistence backend.
Together, these improvements demonstrate a path toward more maintainable, efficient, and scalable metadata frameworks applicable to large-scale HEP software beyond ATLAS.
Speaker: Nathan Jihoon Kang (Argonne National Laboratory (US)) -
417
Robust by Design: A Meta-Algorithm for Stable Deep Learning
The reliability and reproducibility of machine learning models are critically important for their use in automated systems. In the field of HEP, this may include detector optimization, use in blind analysis, and situations where estimates of model uncertainties are required. Building upon our previous research on developing robust model selection algorithms, we propose and comprehensively test an empirical approach to defining and automatically selecting robust deep learning models. In our study, a robust model is one that produces similar losses regardless of the choice of training sample from the population and of the weight initialization. We previously implemented this approach in a regression task to reconstruct photon energy and position using GEANT4 simulation data from a Shashlik-type electromagnetic calorimeter. We investigated the impact of several factors on model robustness, including the size and heterogeneity of the training sample, the weight initialization method, and the inclusion of inductive biases (e.g., total cluster energy or its barycenter). In the present study, we demonstrate the universality of the proposed approach by extending its functionality to the classical, widely adopted computer vision task of classifying images from the CIFAR-10 dataset. The proposed method is a meta-algorithm with two key components: 1) a robustness assessment procedure that uses statistical analysis to evaluate loss variance across multiple, independently trained instances of the same model architecture and 2) a selection algorithm that sequentially filters less robust models out of a broad initial candidate pool by accumulating statistical evidence. Additionally, and of particular importance for HEP, the method allows one to extract the model's systematic uncertainties. Results confirm that the algorithm retains its efficiency, achieving convergence to a competitive model with a significantly lower total computational cost than an exhaustive search.
Speaker: Dr Alexey Boldyrev -
418
CUDA Acceleration of Awkward Array Using Python CCCL
Awkward Array is a widely used library in high-energy physics (HEP) for representing and manipulating nested, variable-length data in Python. Previous CHEP contributions have explored GPU acceleration for Awkward Array, demonstrating the feasibility and performance benefits of a CUDA-based backend while also identifying limitations related to irregular data access, fine-grained kernel launches, and composability of operations. In this contribution, we present recent developments that build directly on these earlier efforts by introducing a CUDA execution model for Awkward Array based on the Python CUDA Core Compute Libraries (CCCL).
Using CCCL, we eliminate the need for custom CUDA kernels and can instead use a high-level Python interface. The CCCL-based approach also enables fusion of multiple Awkward operations into a reduced number of CUDA kernels, addressing kernel launch overhead observed in earlier GPU implementations. Lazy execution allows expression graphs to be constructed and optimized prior to kernel generation, improving performance for analysis workflows involving jagged arrays, combinatorial operations, and reductions. In contrast to earlier approaches, this design also emphasizes extensibility, allowing user-defined Python code to be incorporated into GPU execution paths with minimal boilerplate and without breaking existing analysis semantics.
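For context, the sketch below shows the existing eager GPU backend interface of Awkward Array on a small jagged example; the lazy, CCCL-fused execution described in this contribution is not shown, and a CUDA-capable environment with cupy installed is assumed:

```python
import awkward as ak

# Jagged (variable-length) event structure typical of HEP data.
events = ak.Array([
    {"jets_pt": [56.2, 33.1, 12.7]},
    {"jets_pt": []},
    {"jets_pt": [78.0, 41.5]},
])

# Move the array to the GPU backend; subsequent operations dispatch to CUDA
# kernels instead of CPU code.
events_gpu = ak.to_backend(events, "cuda")

# Typical analysis pattern: per-event selection and reduction on jagged data.
leading_pt = ak.max(events_gpu["jets_pt"], axis=1)
n_selected = ak.sum(ak.num(events_gpu["jets_pt"][events_gpu["jets_pt"] > 30.0]))
print(ak.to_backend(leading_pt, "cpu"), n_selected)
```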
We present performance studies that demonstrate improvements over previously reported eager GPU execution strategies for representative HEP analysis patterns. These developments extend the GPU capabilities of Awkward Array toward a more composable and sustainable backend, aligned with the needs of Python-based analysis at the HL-LHC and beyond.
We thank the NVIDIA team for support and collaboration in integrating CCCL into Awkward, developing CUDA kernels and providing guidance on GPU optimization strategies. Their contributions are gratefully acknowledged.
Speaker: Maksym Naumchyk
-
413
-
Track 4 - Distributed computing
-
419
CMS Monitoring infrastructure beyond Run 3
During Run 3 of the Large Hadron Collider (LHC), the CMS experiment generates large amounts of data that have to be processed and stored efficiently. The complex distributed computing infrastructure used for these purposes has to be highly available, and a reliable and comprehensive monitoring setup is essential to achieve this. The CMS monitoring team is responsible for providing the necessary monitoring services.
CMS monitoring services are partially based on open-source solutions provided by the CERN IT MONIT infrastructure, and partially on custom applications mainly devoted to data mining deployed on Kubernetes clusters at CERN. We report on recent improvements that increase the productivity and efficiency of the services offered by the CMS Monitoring team, with a strong focus on data popularity monitoring, HTCondor job monitoring and Infrastructure-as-Code integration.
Data popularity is one of the key metrics for CMS, due to the distributed nature of the storage infrastructure. Being able to keep a close eye on which datasets have not been accessed recently, or which ones get the most accesses over time is essential for decision making on data center maintenance or on choosing where popular datasets should be hosted, for example.
HTCondor is a central piece of software for processing and analysing data coming from the CMS experiment, and most applications and users that interact directly with such data do so through “HTCondor jobs”. Due to the large number of HTCondor jobs running at all CMS sites at the same time, we are completely refactoring the architecture of our current HTCondor job monitoring application in favor of a more scalable and flexible solution by using different Kubernetes resources such as NATS (Neural Autonomic Transport System) as a message queue and KEDA (Kubernetes Event-Driven Autoscaling) for horizontal autoscaling.
To improve the work efficiency of operators in the team we are migrating the CMS monitoring infrastructure to use OpenTofu as an Infrastructure-as-Code solution. This will enable better automation with more complex integrations with CI/CD pipelines, as well as easier maintenance of separate environments for the different stages of development.
This contribution will go through these projects, covering the challenges and adopted solutions, which could serve as examples for similar issues faced in different HEP experiments or in the broader physics community.
Speaker: Carlos Borrajo Gomez (CERN) -
420
Next-Generation Accounting Architecture for WLCG and EGI
The WLCG infrastructure is evolving to support the HL-LHC, requiring greater capacity and increasingly diverse resource types, which challenges the existing accounting system to become more flexible in handling heterogeneous resources such as GPUs and in incorporating new metrics, including environmental and sustainability indicators. The current system relies on outdated and overly complex technologies, motivating a full redesign. Other communities relying on the system through EGI face similar challenges, and the enhanced accounting system is therefore intended to serve a broad set of distributed research infrastructures beyond the HL-LHC. The new architecture is based on tools such as the WLCG Accounting Utility (WAU) and the Accounting Data Handling Toolbox for Opportunistic Resources (AUDITOR), introducing a modular data model, modern scalable repositories, and streamlined data flows supporting both push- and pull-based publication. Core features include end-to-end validation, cross-checks with experiment accounting systems, and flexible enrichment using WLCG topology. Developed collaboratively by CERN, STFC, and the University of Freiburg, the redesigned system will support both WLCG and non-WLCG communities.
Speaker: Panos Paparrigopoulos (CERN) -
421
A look inside the ALICE Grid: visualisation tools to better understand how the system operates
The ALICE Grid incorporates a large volume of heterogeneous resources, including systems with a diverse range of CPU and GPU resources, various operating system versions, and differing hardware architectures. The Central Grid Operation team lacks direct access to the individual clusters and nodes that compose the Grid, which presents numerous challenges to fully understanding and optimizing the middleware workflow. Consequently, having tools that help streamline the debugging of issues within such a complex environment is extremely valuable for the Grid managers, site administrators, and users alike. This capability allows a faster response to problems, thereby fostering a more collaborative and efficient environment.
This contribution focuses on advanced dashboards of Grid parameters that have been instrumental in identifying relevant issues and areas for improvement. The first of these is the job-to-core allocation, which graphically illustrates the distribution of running jobs across the CPU resources of the Grid nodes. This visualization depicts currently running and recently executed jobs (with a history retention of five days), showing the lifetime from the batch queue slot down to the allocated CPU resources on any given Grid node. To understand why some of the resources remain underused, we have developed specialized views which analyze sampled job match requests to identify the specific conditions that are preventing jobs from matching the advertised resources.
Additionally, statistical visualizations are presented that illustrate the success rates and the reasons for failure of jobs executed under different conditions, such as those that are oversubscribed or those optimized for Time-To-Live (TTL). We demonstrate how these visualizations have aided in diagnosing various problems and how they have directly led to the optimization of our middleware workflows.
Speaker: Marta Bertran Ferrer (CERN) -
422
Enabling monitoring of GPU accelerators in the ALICE Grid
The ALICE Collaboration actively relies on accelerators, such as GPUs, to handle increasingly complex workflows and data rates. Such resources have rapidly risen in importance across a number of use cases, and this is reflected in their growing availability in the WLCG. Through broader vendor support, as well as improved matching techniques, the ALICE Grid middleware may allocate and use these resources as any other. Yet unlike traditional CPU workloads, the utilisation of GPU resources cannot be trivially tracked solely through the kernel, and generally requires interacting with various drivers and kernel modules. These not only vary between vendors, but also between architectures and driver versions. This poses challenges to providing both accurate resource accounting and monitoring for GPU workloads across the Grid.
This contribution outlines an updated middleware stack for ALICE, capable of not only allocating individual GPUs, but also providing a monitoring interface that works across GPU resources. Specifically, it describes how it allows exposing these resources in a unified manner that is agnostic to both vendor and driver versions, avoiding having to tailor to multiple vendor-specific APIs. Furthermore, it will examine how the resulting monitoring data can be exposed to the MonALISA monitoring infrastructure of ALICE, in turn allowing the tracking of both GPU load and virtual memory across the ALICE Grid, just as for any other resource.
Speaker: Maksim Melnik Storetvedt (Western Norway University of Applied Sciences (NO)) -
423
MONIT: Evolution towards new observability ecosystems
We present the evolution of the CERN IT Monitoring (MONIT) architecture for the CERN Data Centres and WLCG Infrastructure monitoring use cases, and how it has been updated to improve scalability, interoperability, and observability. Prometheus has been introduced as the core metrics collection and aggregation system and the previous Collectd-based framework is being replaced by Prometheus exporters. Grafana Mimir provides a scalable long-term storage backend for our time-series data, implementing multi-tenancy for resource isolation. In addition, the adoption of the OpenTelemetry protocol for the transport layer enables unified collection of metrics, logs, and traces across diverse services. This contribution outlines the updated monitoring architecture, the rationale behind these technology transitions, and the operational experience gained in integrating Prometheus, Mimir, and OpenTelemetry into the MONIT ecosystem.
Speaker: Borja Garrido Bear (CERN) -
424
The AUDITOR Ecosystem: High‑Performance, Low‑Memory Accounting with Integrated CO₂ Utilization Reporting for Distributed HEP Computing
Distributed computing infrastructures are shared by multiple research communities, particularly within High Energy Physics (HEP), where precise and transparent resource accounting is critical. To meet these demands, we developed AUDITOR (AccoUnting DatahandlIng Toolbox for Opportunistic Resources), a flexible, modular, and extensible accounting ecosystem designed for heterogeneous computing clusters. AUDITOR is deployed at major HEP sites such as CERN and KIT, where it processes over 13 million jobs per month. This figure is expected to increase by a factor of 7-10 in the forthcoming High-Luminosity LHC era, and AUDITOR is equipped to handle this growth in job counts.
AUDITOR captures, processes, and analyses usage metrics through specialized collectors for HTCondor, Kubernetes, and Slurm batch systems, storing all data in a central PostgreSQL database. AUDITOR is built in Rust, providing high performance and a minimal memory footprint. Its plugin-based architecture allows integration with external tools via a REST API, with Rust and Python clients. Existing plugins include the APEL plugin, which publishes accounting data to the European Grid Initiative (EGI), and the priority plugin, which sets the priorities of different groups on a batch system.
Recent developments include a Role-Based Access Control (RBAC) system for fine-grained data access and an archival subsystem that periodically exports historical data to Parquet files, optimizing database performance and enabling efficient long-term storage. The new utilization report plugin provides summaries of job counts, HEPScore performance, power consumption, and estimated CO₂ footprint. Looking ahead, our evolving accounting system is being designed to integrate environmental attributes directly into job-level analytics, enabling a deeper understanding of the energy efficiency and carbon impact of computing workloads. In this talk, we will present the architecture and capabilities of our new utilization reporting tool and highlight how these features are preparing us to meet the present and future challenges of accounting in high-performance and sustainable computing.
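For orientation, the snippet below sketches what pushing a single accounting record to a REST-style service could look like from Python; the endpoint path and field names are hypothetical and do not reproduce the schema of the actual AUDITOR clients.

# Illustrative-only sketch of posting an accounting record to a REST accounting service.
import requests

record = {
    "record_id": "slurm-job-1234567",          # hypothetical schema, for illustration only
    "site": "FREIBURG",
    "start_time": "2026-05-11T08:00:00Z",
    "stop_time": "2026-05-11T10:30:00Z",
    "components": [
        {"name": "Cores", "amount": 8,
         "scores": [{"name": "HEPscore23", "value": 14.2}]},
    ],
}

resp = requests.post("http://auditor.example.org:8000/record", json=record, timeout=10)
resp.raise_for_status()
print("stored record", record["record_id"])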
Speaker: Raghuvar Vijayakumar (University of Freiburg (DE))
-
419
-
Track 5 - Event generation and simulation: Simulation Tuning, Calibration & Validation
-
425
AI-Assisted Detector Design and Optimization Environment for large-scale nuclear and particle physics experiments
Artificial Intelligence (AI) is poised to play a central role in the design and optimization of complex, large-scale detectors, such as the future ePIC experiment at the Electron-Ion Collider (EIC), an international next-generation QCD facility in the United States.
The ePIC experiment consists of an integrated detector comprising a central apparatus complemented by forward and backward subsystems, designed to support a broad physics program while meeting stringent performance requirements within cost, mechanical, and geometric constraints. Addressing these competing demands requires scalable and reproducible optimization strategies operating over a multidimensional, multi-objective design space. This contribution presents recent developments of AID$^2$E, a scalable and distributed AI-assisted detector design and optimization environment motivated by the future EIC program but broadly applicable beyond it. While developed in an experiment-agnostic manner, AID$^2$E has been deployed using the official ePIC software stack and its Geant4-based simulations, combining transparent detector parameterization with modern multi-objective optimization techniques to enable systematic exploration of high-dimensional design spaces. The workflow employs the PanDA and iDDS workload-management systems (successfully deployed in experiments such as ATLAS and the Rubin Observatory) to orchestrate large-scale simulation and optimization campaigns.
Recent enhancements include expanded PanDA-based workflow support, extended compatibility with PanDA, Slurm and local execution modes, and a Function-as-a-Task (FaaT) system that translates detector-design and optimization workflows into large-scale distributed processing pipelines across heterogeneous computing environments.
Ongoing work focuses on advanced optimization strategies, improved data-analysis tools for navigating design trade-offs, and the planned integration of large language models to enable enhanced workflow orchestration and control with human-in-the-loop oversight.
These advances position AID$^2$E as an extensible platform for large-scale detector design and optimization, with applications transferable to future nuclear and particle physics experiments and to detector-optimization tasks in ongoing Jefferson Lab experiments, where AID$^2$E has been successfully integrated for calibration and alignment.
AID$^2$E thus exemplifies the transformative role of AI in automating and scaling complex scientific workflows.
Speaker: Cristiano Fanelli (William & Mary) -
426
Accelerated Calibration: Unbinned, High-Dimensional Tag-and-Probe Scale Factors with Normalizing Flows
Scale factors derived from Tag-and-Probe measurements are essential for correcting detector effects in CMS simulation. However, traditional binned methods fail to capture continuous kinematic evolution and require time-consuming manual tuning that becomes unmanageable as dimensions increase. To address this, we present a novel unbinned, multivariate Tag-and-Probe strategy implemented in PyTorch. By replacing histograms with Normalizing Flows, we model the signal and background resonance shapes conditional on the probe kinematics. This allows for a simultaneous unbinned likelihood fit where the efficiency is treated as a continuous, learned parameter of the dataset. We demonstrate that maximizing this unbinned likelihood allows for the extraction of scale factors in high-dimensional phase spaces where binning would be statistically prohibitive. Furthermore, the method offers significant computational advantages. By leveraging GPU acceleration and the unbinned nature of Normalizing Flows, we achieve a multidimensional measurement orders of magnitude faster than classical workflows, eliminating the need for repetitive manual fitting in different bins. We discuss the implementation and performance of this approach, showcasing its potential to improve the accuracy of the detector modeling from simulation while drastically streamlining calibration campaigns at CMS.
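The toy sketch below conveys the structure of such a simultaneous unbinned fit with a learnable, kinematics-dependent efficiency; it uses fixed Gaussian and uniform shapes where the actual method employs conditional normalizing flows, and all numbers are invented.

# Toy unbinned Tag-and-Probe likelihood with a learnable efficiency eps(x) (PyTorch).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: probe mass m, probe pT x, pass/fail flag.
n = 20000
x = torch.rand(n, 1) * 80 + 20                      # probe pT in [20, 100] GeV
true_eff = torch.sigmoid((x.squeeze() - 40) / 10)   # efficiency rising with pT
is_sig = torch.rand(n) < 0.7
m = torch.where(is_sig, 91 + 2.5 * torch.randn(n), 60 + 60 * torch.rand(n))
passed = torch.where(is_sig, torch.rand(n) < true_eff, torch.rand(n) < 0.3)

# Model: fixed resonance/background shapes, learnable signal fraction, background
# pass fraction, and a small network playing the role of the continuous efficiency.
eps_net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
f_sig = nn.Parameter(torch.tensor(0.5))
a_bkg = nn.Parameter(torch.tensor(0.0))

def gauss(v, mu=91.0, sigma=2.5):
    return torch.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * (2 * torch.pi) ** 0.5)

def flat(v, lo=60.0, hi=120.0):
    return torch.full_like(v, 1.0 / (hi - lo))

opt = torch.optim.Adam(list(eps_net.parameters()) + [f_sig, a_bkg], lr=1e-2)
for _ in range(300):
    eps = eps_net(x).squeeze()
    fs, ab = torch.sigmoid(f_sig), torch.sigmoid(a_bkg)
    dens_pass = fs * eps * gauss(m) + (1 - fs) * ab * flat(m)
    dens_fail = fs * (1 - eps) * gauss(m) + (1 - fs) * (1 - ab) * flat(m)
    nll = -torch.log(torch.where(passed, dens_pass, dens_fail) + 1e-12).mean()
    opt.zero_grad(); nll.backward(); opt.step()

print("learned efficiency at pT = 50 GeV:", float(eps_net(torch.tensor([[50.0]]))))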
Speaker: Davide Valsecchi (ETH Zurich (CH)) -
427
Reweight Me No More: Transformer-Based Mitigation of Multi-Dimensional Mismodelling in HEP Simulations
Accurate Monte Carlo (MC) modelling of high-energy physics (HEP) data remains a central challenge, especially when simulated distributions fail to reproduce observations. Traditional remedies rely on reweighting individual observables to data, an approach that is effective when only one or two dimensions exhibit discrepancies. However, for N correlated observables with N > 2, conventional reweighting becomes impractical due to the exponential growth of required statistics.
In this work, we investigate a transformer-based approach designed to learn corrections directly from data without relying on manual reweighting procedures. By training the network to reproduce multiple target distributions simultaneously, the method yields a generative model that captures complex correlations and mitigates mismodelling in many dimensions. We present studies demonstrating the viability of this technique on representative HEP datasets, discuss its robustness and systematic behaviour, and compare its performance to standard reweighting-based workflows. This approach offers a scalable path toward improved MC description in modern analyses, where high-dimensional observable spaces are increasingly the norm.
Speaker: Mrs Lucie Flek (University of Bonn) -
428
Simulation-based Inference for Precision Neutrino Physics through Neural Monte Carlo Tuning
The Jiangmen Underground Neutrino Observatory (JUNO) is a next-generation neutrino experiment located in China. To achieve its main objectives, the experiment demands highly accurate Monte Carlo (MC) simulations. These simulations must describe the complex response of the 20-kton liquid scintillator target within a 35.4 m diameter acrylic sphere, which is monitored by thousands of photomultiplier tubes. Tuning the effective parameters of these simulations to match experimental data is crucial to characterize the complex detector response and to understand detector-related systematics, but traditional iterative methods are computationally prohibitive for modern, large-scale experiments like JUNO.
This contribution presents a novel solution using simulation-based inference (specifically, neural likelihood estimation) to perform precise and accurate MC tuning [1]. We achieve this by creating fast surrogate models that efficiently approximate otherwise intractable likelihoods, incorporating detector response. We developed two complementary neural likelihood estimators: (i) a transformer encoder-based density estimator for binned analysis and (ii) a normalizing flows-based density estimator suitable for both binned and unbinned analyses. Using the JUNO detector as a case study, we train these models on sets of simulated energy spectra from five distinct calibration sources, with each set generated for a specific configuration of detector response parameters. The models learn the complex, non-linear relationship between three key energy response parameters (the Birks' coefficient, the Cherenkov light yield factor, and the absolute light yield) and the resulting energy spectra, accurately approximating the conditional probability density function of the spectra for any combination of the parameters.
Parameter inference is performed by integrating these learned likelihoods with a Bayesian nested sampling algorithm. Our results show that this approach successfully recovers the true parameter values with near-zero systematic bias and uncertainties limited purely by the statistics of the input data. The applicability of this method using real calibration data will be demonstrated. The proposed framework establishes a promising and generalizable template for parameter inference in modern physics experiments where a comprehensive detector response is computationally expensive to evaluate.
[1] A. Gavrikov et al. "Simulation-based inference for Precision Neutrino Physics through Neural Monte Carlo tuning", arXiv:2507.23297.
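To illustrate only the final inference step, the sketch below runs nested sampling over a stand-in surrogate log-likelihood; in the real workflow the surrogate would be the trained neural density estimator evaluated on the calibration spectra, and dynesty is used here merely as one available nested-sampling package, with toy parameter ranges.

# Sketch: Bayesian nested sampling over a (stand-in) learned likelihood.
import numpy as np
from dynesty import NestedSampler
from dynesty.utils import resample_equal

TRUE = np.array([6.5e-3, 0.5, 1.0])      # toy values: Birks' kB, Cherenkov factor, light yield
LOW = np.array([3.0e-3, 0.0, 0.8])
HIGH = np.array([1.2e-2, 1.0, 1.2])

def surrogate_loglike(theta):
    # Placeholder for log p(spectra | theta) from the trained flow/transformer.
    return -0.5 * np.sum(((theta - TRUE) / (0.05 * (HIGH - LOW))) ** 2)

def prior_transform(u):
    # Map the unit cube to flat priors on the three response parameters.
    return LOW + u * (HIGH - LOW)

sampler = NestedSampler(surrogate_loglike, prior_transform, ndim=3, nlive=250)
sampler.run_nested(dlogz=0.5, print_progress=False)
res = sampler.results
posterior = resample_equal(res.samples, np.exp(res.logwt - res.logz[-1]))
print("posterior mean:", posterior.mean(axis=0))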
Speaker: Dr Arsenii Gavrikov -
429
Differentiable particle simulation for detector optimization
Applying automatic differentiation (AD) to particle simulations such as Geant4 opens the possibility of gradient-based optimization for detector design and parameter tuning in high-energy physics. In this talk, we extend our previous work on differentiable Geant4 simulations by incorporating multiple Coulomb scattering into the physics model, moving closer to realistic detector modeling. The inclusion of multiple scattering introduces substantial challenges for differentiation, due to increased stochasticity. We study these effects in detail and demonstrate stable derivatives of a Geant simulation. As a concrete application, we perform gradient-based optimization of a realistic sampling calorimeter using full electromagnetic physics. These results represent an important step toward practical, large-scale detector optimization with complete Geant4 electromagnetic simulations.
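The toy below shows, in a few lines, why stochastic scattering complicates gradients and how a reparameterized sample keeps them well defined; it is a schematic stand-in with invented numbers, not the Geant4-based implementation discussed in the talk.

# Toy reparameterization of a Gaussian multiple-scattering step so that gradients
# with respect to a material parameter flow through the scattering width.
import torch

torch.manual_seed(0)
material_budget = torch.tensor(0.05, requires_grad=True)   # toy X/X0 of one layer

def simulate_displacements(n_tracks=4096):
    sigma = 0.0136 * torch.sqrt(material_budget)   # Highland-like width (toy constant)
    z = torch.randn(n_tracks)                      # fixed noise: reparameterization trick
    theta = sigma * z                              # per-track scattering angle
    lever_arm = 1.0                                # metres to the next sensitive layer
    return theta * lever_arm

loss = simulate_displacements().var()              # e.g. penalise resolution degradation
loss.backward()
print("d(spread)/d(material budget):", material_budget.grad.item())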
Speaker: Jeffrey Krupa (SLAC) -
430
Machine Learning for Retrieving Published Experimental Data to Validate the Physics Content of Monte Carlo Transport Codes
Validation testing of the physics content of Monte Carlo particle transport systems—used extensively in high-energy, astroparticle, and nuclear physics—requires extensive retrieval of pertinent experimental measurements from the scientific literature. This process often entails examining thousands of papers published over several decades. The rapidly growing volume of literature poses a significant challenge to efficiently identifying relevant publications for this purpose.
Manual review of titles, abstracts, and full texts is increasingly cumbersome due to the huge volume of literature to be examined, and it is prone to subjective judgment. For each physics item (e.g., cross sections, angular distributions, or model parameters), the validation process typically involves screening thousands of candidate papers, of which only a small fraction represents a suitable source of experimental data.
This study proposes an automated framework that leverages machine learning and natural language processing (NLP) techniques to identify papers aligned with targeted research topics relevant to the validation of physics models implemented in Monte Carlo transport codes such as Geant4, MCNP, Penelope, EGS, and FLUKA. We evaluated different combinations of AI models, including feature extraction methods (TF-IDF and Sentence-BERT) and classification models (Random Forest and neural networks), to develop an effective literature screening approach.
The models are trained using a labeled dataset derived from references selected in the “Validation of Shell Ionization Cross Sections for Monte Carlo Electron Transport” study. Furthermore, the models are evaluated on a new dataset containing likely relevant references. The methodology is assessed through a concrete validation test use case involving ionization cross sections relevant to Geant4-based simulations.
Results demonstrate that the proposed approach effectively identifies relevant theoretical and experimental publications, significantly reducing the manual effort required to acquire pertinent experimental data. This work highlights the potential of machine learning–driven methodologies to support data-informed literature discovery, streamline validation workflows, and accelerate knowledge acquisition in particle physics.
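As a concrete (toy) instance of one such model combination, the snippet below wires TF-IDF features to a Random Forest classifier with scikit-learn; the training snippets and labels are invented placeholders rather than the labelled corpus used in the study.

# Toy TF-IDF + Random Forest screening pipeline for relevant/irrelevant abstracts.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

abstracts = [
    "Measurement of K-shell ionization cross sections by electron impact",
    "Review of conference logistics and travel arrangements",
    "Angular distributions of bremsstrahlung from thin targets",
    "New cafeteria menu announced for the spring semester",
]
labels = [1, 0, 1, 0]   # 1 = candidate source of validation data (invented labels)

screener = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
screener.fit(abstracts, labels)

new_paper = ["Electron-impact ionization cross sections for medium-Z elements"]
print("relevance score:", screener.predict_proba(new_paper)[0, 1])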
Speaker: Elisabetta Ronchieri
-
425
-
Track 7 - Computing infrastructure and sustainability
-
431
A Next Generation (Triggers) Computing Platform for HEP
High Energy Physics (HEP) computing at CERN has long relied on interactive SSH environments, shared software stacks and large-scale batch systems. As workloads increasingly adopt containerized and accelerator-driven execution models, a key requirement is to provide a consistent user interface while enabling modern orchestration platforms.
This contribution presents the computing platform developed for the Next Generation Triggers (NGT) project, which unifies traditional HEP workflows with a centralized pool of accelerator resources across on-premises and external infrastructures. The platform is built around a large centralized Kubernetes environment hosting GPUs and other accelerator technologies from multiple vendors, with heterogeneous interconnects including InfiniBand and RoCEv2. Users can access these resources interactively through SSH, notebooks, VSCode and standard Kubernetes interfaces, or through batch-style scheduling and quotas with support for MPI.
We also introduce the MLOps stack developed for NGT, including automated model training and inference pipelines, integration with GitLab CI and GitHub Actions, and a comprehensive monitoring system for workload-level observability, resource utilization and energy reporting. The platform demonstrates how cloud-native tooling can sustain familiar HEP development practices while enabling scalable and accelerator-efficient computing for future trigger and analysis applications.
Speakers: Raulian-Ionut Chiorescu, Ricardo Rocha (CERN) -
432
The GRID Computing Facility at VECC: past, present and future
The GRID Computing Facility at VECC has been operational for the last two decades. It comprises the "Kolkata Tier-2 for ALICE" and a "grid-peer Tier-3 cluster" for the Indian collaborating institutes, and is the only Tier-2 computing centre in India for the ALICE experiment at CERN under the WLCG umbrella. In this article we describe how the GRID Computing Facility at VECC has evolved and been developed piece by piece since its inception in 2003-04. We also highlight how we designed and implemented a cold-aisle containment architecture that reduced the Power Usage Effectiveness from 2 to 1.47 and halved the energy required for cooling the data centre despite the extreme ambient conditions.
Apart from the auxiliary infrastructure, the entire facility is built on open-source software and Grid middleware, and it is kept up to date with the evolution of the various computing and storage software stacks. Changing the underlying batch scheduler or upgrading to a new major release of the OS is not a trivial task, as it is equivalent to rebuilding the computing cluster from scratch. We will present how the facility performs consistently with the present infrastructure, providing uninterrupted computing support to the ALICE experiment with an average availability above 90% over the last 15 years. As coordinators of the GRID India project, we will also describe the future roadmap for the facility and explore next-generation heterogeneous resources. Beyond managing the computing resources, we plan to organize workshops and schools on High Energy Physics computing in India.
Speaker: Dr Vikas Singhal (Department of Atomic Energy (IN)) -
433
A Computing Cluster for Technology Tracking in Einstein Telescope.
The Einstein Telescope (ET), the third-generation ground-based interferometer for gravitational-wave detection, will observe a sky volume one thousand times larger than the second-generation interferometers, which will be reflected in a higher observation rate. The physics information contained in the "strain" time series will increase, while on the machine side the size of the raw data from the instrument will scale with the number and complexity of the detectors, which will be either four or six depending on the chosen geometrical configuration. To meet ET-specific computing needs, an adequate choice of the technologies, tools, and framework to handle the collected data, share them among the interested users, and enable their offline analysis is mandatory. On the computing side, since ET is not expected to begin data taking for at least ten years, it is crucial to keep up with continually improving technology and to test new architectures as they gradually become available. At INFN Torino, we are setting up a computing cluster dedicated to Technology Tracking, where machines with heterogeneous architectures are made available to the ET community to develop analysis algorithms and test them on advanced hardware. The cluster is orchestrated via Kubernetes, authentication is provided via INDIGO IAM, and the cluster is equipped with a custom resource-booking tool written in Play. At INFN Torino, the Rucio server and one of the storage elements for data distribution are under test, available for Mock Data Challenges, and connected to the Technology Tracking cluster. In this talk, a description of the computing cluster deployment will be given.
Speaker: Lia Lavezzi (INFN Torino (IT)) -
434
Upgrading the RO-03-UPB site to a Tier 1 facility in 2025-2026
Hosted by the National University of Science and Technology POLITEHNICA Bucharest, the RO-03-UPB site has been an active member of the WLCG computing Grid since 2017 and a member of the ALICE Grid since 2005. Over the course of this collaboration, the site has evolved significantly: originally deployed as a Tier-2 facility, it has grown into a major contributor to the ALICE Grid, currently providing 7,200 CPU cores and 9.6 PB of storage.
To address the ALICE experiment's increased capacity requirements for asynchronous data reconstruction during Long Shutdown 3 and beyond, RO-03-UPB is upgrading to a Tier-1 facility. This contribution outlines the cost-effective strategies employed to achieve this transition.
We discuss the implementation of a disk-based custodial storage solution using EOS in a Redundant Array of Independent Nodes (RAIN) configuration. This approach mitigates the risks associated with disk-based storage while offering distinct advantages over traditional tape-based solutions—specifically, the elimination of complex data staging mechanisms. Consequently, the site can provide a low-latency, high-throughput interface ideal for testing and running asynchronous reconstruction jobs.
Furthermore, to manage the high-volume transfer of raw data between RO-03-UPB and CERN, the site will join the LHCOPN via a 100 Gbps Dense Wavelength Division Multiplexing (DWDM) link. The contribution evaluates this network upgrade alongside alternative solutions. Finally, we address the rigorous availability standards required of a Tier-1 site, detailing the deployment of open-source, industry-standard software for the monitoring, alerting, and backup of Grid services.
Speaker: Sergiu Weisz (National University of Science and Technology POLITEHNICA Bucharest (RO)) -
435
Design and Cooling Strategy of the LHCb Data Acquisition System for Run 4
During Long Shutdown 3 (LS3), the LHCb experiment will undergo a major upgrade, requiring a new data centre to cope with the 32 Tb/s of data produced by the detector. Part of the data-acquisition infrastructure, mostly composed of Commercial Off-The-Shelf (COTS) data centre hardware, must be installed close to the detector, which introduces several challenges, including limited underground space, sustainability constraints, and a rapidly evolving market where computing densities are rising and direct-liquid cooling (DLC) is becoming a new standard. The project aims to build a new 2 MW data centre distributed across two floors next to the cavern where the LHCb detector is located, using DLC as the primary cooling technology for CPUs, GPUs, and readout boards, combined with in-row air coolers to remove residual heat. We present the design plans, cooling strategy, and integration approach for this next-generation facility, which will support LHCb data acquisition during the HL-LHC era.
Speaker: Pierfrancesco Cifra (CERN) -
436
Offloading AI/ML inference as-a-service on “any” remote HPC center
The acceleration of machine learning and domain algorithm inference is increasing in importance as the LHC and other domains seek to improve reconstruction and analysis performance in extreme environments. At the same time, the geographically distributed computing infrastructure model is increasing in complexity, with the introduction of heterogeneous resources (HPC, HTC, cloud). There is corresponding tension between the demand for improved performance and the need for infrastructure flexibility. We will show the results of an initiative that aims to address such a dilemma by integrating two technologies in a real-world deployment. On one side, SONIC (services for optimized network inference on coprocessors) implements an efficient and cloud-friendly framework for GPU-accelerated inference as a service in scientific workflows. On the other side, interLink is a cloud-native solution to allow a Kubernetes cluster to seamlessly orchestrate workloads across supercomputers, HTC grid jobs, and cloud-hosted GPU VMs, through a minimal set of lightweight components. The result is the SONIC setup at Purdue University, where the orchestration of inference workloads is managed by a cloud-native stack deployed on a Kubernetes cluster, while the GPU-enabled inference servers are hosted at a different, Slurm-based cluster and made accessible to Kubernetes via interLink virtual nodes. We will highlight our experience operating such a system, the inference performance for a CMS experiment workflow, and the benefits of the technology stack, along with the current roadmap to address the main points of improvement.
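From the client side, inference as a service boils down to a remote call like the sketch below, which uses the NVIDIA Triton HTTP client commonly paired with SONIC-style deployments; the server URL, model name, and tensor names are placeholders.

# Sketch of a remote inference call to a Triton server (placeholder model and tensors).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="triton.example.org:8000")

batch = np.random.rand(1, 10).astype(np.float32)     # stand-in for reconstructed features
inp = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = httpclient.InferRequestedOutput("OUTPUT__0")

result = client.infer(model_name="toy_model", inputs=[inp], outputs=[out])
print("scores:", result.as_numpy("OUTPUT__0"))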
Speaker: Diego Ciangottini (INFN, Perugia (IT))
-
431
-
Track 8 - Analysis infrastructure, outreach and education: Outreach & Visualization
-
437
Revamping the ATLAS Open Data for Outreach and Education
The ATLAS Open Data for Outreach and Education were transformed in 2025, with an entirely new release featuring new (public) ntuple-making infrastructure, and myriad new notebook examples demonstrating everything from fundamental HEP concepts to complex analyses. The focus of the overhaul has been on simplifying the user experience: with just a few clicks, anyone can make a plot from the Open Data. Newly developed web-based applications are designed to be accessible to a wide range of audiences, including, for the first time, non-English speaking audiences. To introduce the user community to the wide range of tools and features, ATLAS has held its first Open Data Tutorial, providing instruction for outreach, education, and research audiences. The ATLAS Open Data initiative continues to widen its reach, with worldwide usage, including in particular new implementations of educational material. This contribution focuses on the new ATLAS Open Data for Outreach and Education release, the new user-facing material that has been developed, and the outcomes from the first Open Data tutorial.
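In the spirit of the "few clicks to a plot" goal, the sketch below shows the kind of minimal notebook cell such a release enables with uproot and matplotlib; the file URL, tree, and branch names are placeholders, not the actual Open Data layout.

# Minimal "open a file, make a plot" sketch with uproot (placeholder names).
import awkward as ak
import matplotlib.pyplot as plt
import uproot

tree = uproot.open("https://opendata.example.org/atlas/sample_2to4lep.root:analysis")
lep_pt = tree["lep_pt"].array()                      # assumed jagged: several leptons per event

plt.hist(ak.to_numpy(ak.ravel(lep_pt)) / 1000.0, bins=50, range=(0, 200), histtype="step")
plt.xlabel("Lepton pT [GeV]")
plt.ylabel("Entries")
plt.savefig("lep_pt.png")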
Speaker: Dr Eirik Gramstad (University of Oslo (NO)) -
438
Pathways to Particle Physics: A Scalable Model for High School and Undergraduate Engagement with Particle Physics Computing
This contribution presents a scalable and replicable model to engage high-school and undergraduate students with real-world high energy physics (HEP) computing and analysis. At Washington College, we have integrated hands-on analysis of LHC data into both curricular and co-curricular settings. With support from the NSF LEAPS-MPS program, we organize annual workshops for high school students from rural Maryland, introducing them to HEP concepts, detector design, and data analysis using uproot and Jupyter notebooks. Select workshop participants are subsequently invited to enroll in a semester-long, 1-credit dual-enrollment course, in which they carry out supervised ATLAS-inspired analysis projects. Concurrently, undergraduates in PHY252 (Scientific Modeling and Data Analysis) explore real ATLAS datasets as part of their core curriculum, applying machine learning and statistical modeling to understand particle kinematics and event classification. This initiative demonstrates how modern HEP computing tools can be adapted to educational settings with minimal infrastructure, providing meaningful exposure to scientific research. It also lays the groundwork for a broader effort, Pathways to Particle Physics, which will expand public engagement, develop curriculum-aligned teaching resources, and create structured research pipelines for students at primarily undergraduate institutions and high schools. By embedding HEP computing in early-stage education, this program builds STEM pipelines, enhances data fluency, and fosters inclusive participation in frontier science.
Speaker: Suyog Shrestha (Washington College (US)) -
439
A Comprehensive Pedagogical Pipeline for the H to WW Search Using CMS Open Data
The discovery of the Higgs boson by the ATLAS & CMS collaborations at the Large Hadron Collider (LHC) stands as a monumental achievement in particle physics. While the theoretical underpinnings of the Higgs mechanism are widely taught at the university level and substantial data sets have been made publicly available, the practical complexities of experimental data analysis, ranging from object reconstruction to statistical interpretation, often remain inaccessible. Comprehensive end-to-end tutorials that mirror real experimental workflows are notably scarce.
We present an example of a complete, education-focused data analysis based on CMS Open Data and analysis tools available in the Python ecosystem. This project reconstructs the workflow of a search for the Higgs boson decaying into a pair of W bosons ($H \to WW$), targeting the electron–muon final state produced via gluon–gluon fusion ($ggH$). Using the CMS experiment architecture as a case study, the analysis guides users through the critical stages of modern high-energy physics research. The pipeline includes the implementation of triggers and object reconstruction, focusing on lepton identification and the role of missing transverse energy in neutrino inference. This is followed by kinematic and topological cuts to isolate the high-purity signal from the overwhelming background. The pipeline illustrates both data-driven and Monte Carlo techniques to model the dominant backgrounds, specifically top-quark pair production and the Drell–Yan process, culminating in statistical inference through signal significance estimation and uncertainty evaluation. We discuss how corrections were derived and which analysis aspects cannot easily be included without information that is not publicly available.
This pipeline aims to serve as an illustrative example of an experimental analysis possible using LHC Open Data. Beyond its pedagogical value, the project highlights the educational potential of open data initiatives and provides a modular template adaptable to other physics processes and experiments.
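To give a flavour of the selection stage, the sketch below applies an opposite-flavour dilepton preselection with uproot and awkward arrays; branch names, cut values, and the file URL are illustrative and not those of the actual tutorial.

# Sketch of an e-mu preselection for a ggH -> WW analysis (placeholder branches and cuts).
import awkward as ak
import uproot

events = uproot.open("https://opendata.example.org/cms/hww_emu.root:Events").arrays(
    ["Electron_pt", "Muon_pt", "MET_pt"]
)

mask = (
    (ak.num(events["Electron_pt"][events["Electron_pt"] > 25]) >= 1)   # one good electron
    & (ak.num(events["Muon_pt"][events["Muon_pt"] > 20]) >= 1)         # one good muon
    & (events["MET_pt"] > 20)                                          # genuine missing energy
)
selected = events[mask]
print(f"selected {len(selected)} of {len(events)} events")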
Speaker: Anuj Raghav (University of Delhi (IN)) -
440
ALICE Event Display - lessons learned and future enhancements
After two years of continuous development and operation, several lessons have been learned that have led to substantial improvements in the ALICE event visualization system. The current solution allows users across the CERN network to connect to the visualization server and perform visualizations locally on live data as it is being recorded. Importantly, this does not interfere with the displays shown in the ALICE Control Room.
The transition to a web-based technology preserved the overall functionality of the previous ROOT-based solution, which facilitated the adoption of the new system. However, there is now a need for more advanced visual effects: screenshots produced today look very similar to those from previous years. At present, all visualizations are rendered in real time using basic OpenGL technology provided by the Three.js library. This limitation was anticipated during the preparation of the web-based approach by dividing the visualization client into separate modules.
The current solution was designed to be fast and efficient, the crucial requirements for online visualization. However, event visualizations are also used for publication and outreach purposes, where more advanced rendering techniques can produce better-looking images at the cost of somewhat longer display times. Replacing the rendering modules makes it possible to obtain such enhanced visualization effects.
The improved design also makes it easier for developers to create custom visualizations by leveraging the existing architecture and implementing only visualization plugins to display clusters and tracks in different ways. This flexibility benefits not only publication-quality screenshots but also additional features, such as simple animations.
Finally, the visualization architecture has been extended with optional communication based on sockets instead of files, providing further flexibility in data handling and integration.
Speaker: Julian Myrcha (Warsaw University of Technology (PL)) -
441
CMS Open Data Visualization with FireworksWeb
FireworksWeb is a web-based event display utilizing a C++ ROOT/EVE backend with SAPUI5 frontend for interactive 3D visualization of particle physics events directly in the browser. Building upon ROOT/EVE7 and RenderCore, it eliminates local software installation while maintaining professional-grade event display capabilities. FireworksWeb is currently deployed for live event monitoring in the CMS control room, demonstrating its reliability for real-time data visualization in production environments.
The CMS experiment has made extensive collision datasets publicly available through the CERN Open Data portal, providing valuable resources for education, outreach, and research. We have extended FireworksWeb to enable visualization of these open datasets, making particle physics data accessible to diverse audiences without requiring local software installation.
Our prototype implementation includes customizable camera perspectives, event filters, configurable collection settings, and projection controls with fish-eye distortion for cylindrical geometries. The web architecture communicating via CGI enables dataset queries and filtered event selection based on physics criteria.
We will demonstrate the prototype with CMS Open Data events, discuss the technical architecture, and outline applications for education, public outreach, and exploratory physics analysis.
Speaker: Yuxiao Wang (Tsinghua University (CN)) -
442
Zero-Download Visualization: Accelerating Remote ROOT Analysis via Server-Side Range Slicing and Data Reduction
Analyzing ROOT files stored in remote Data Lakes (S3) presents a significant bottleneck: traditional workflows requiring full file downloads incur high latency, while pure client-side solutions (e.g., JSROOT) frequently cause browser memory exhaustion (OOM) when parsing gigabyte-scale binaries.
To resolve this, we developed a lightweight, hybrid visualization microservice that decouples data retrieval from rendering. The system’s core contribution is the integration of intelligent HTTP Range Slicing with Server-Side Data Reduction. By leveraging Uproot and fsspec, the backend issues precise byte-range requests to S3, retrieving only the minimal headers and baskets required—eliminating the need for full downloads. Subsequently, the system employs a hybrid processing strategy: binary objects are processed server-side and reduced to essential drawing primitives. Benchmarks demonstrate that even high-statistics TH1 histograms (e.g., aggregating over 11,000 entries) are serialized into compact JSON payloads of merely 1.0 KB to 1.7 KB (corresponding to standard binning), independent of the underlying statistical scale. This extreme reduction allows for instantaneous "Time-to-First-Plot" and enables high-speed transmission using ORJSON without the CPU overhead of additional compression. Furthermore, a transparent proxy fallback ensures 100% compatibility for complex objects, providing a robust, cloud-native solution for HEP data analysis.
Speaker: 隗立畅 weilc (IHEP)
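The essential trick, issuing ranged reads so that only the needed headers and baskets leave the object store, already works from a notebook with uproot's remote-file support; the sketch below uses a placeholder URL and histogram name and condenses the server-side reduction to "ship bins, not bytes".

# Sketch: ranged reads of a remote ROOT file and reduction of a histogram to a small payload.
import json
import uproot

with uproot.open("https://datalake.example.org/run2026/histos.root") as f:
    h = f["h_pt"]                        # only the required byte ranges are fetched
    counts, edges = h.to_numpy()         # reduce to drawing primitives

payload = json.dumps({"edges": edges.tolist(), "counts": counts.tolist()})
print(len(payload), "bytes of JSON payload instead of the full file")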
-
437
-
Track 9 - Analysis software and workflows
-
443
ROOT 7: Getting Ready for HL-LHC
Two years before the start of the High-Luminosity LHC, the ROOT project will evolve to its 7th release cycle. This contribution will explain ROOT's release schedule, and discuss new features being developed for ROOT 7 such as RFile or a high-performance histogram package to support concurrent filling. ROOT 7 is also planned to introduce a change in ROOT's object ownership model, allowing for switching off implicit ownership of objects such as histograms. Finally, we will show how users can adopt a ROOT-7 way of working already today, profiting from modern interfaces such as RDataFrame, RNTuple or RFile, and making the transition to ROOT 7 a minimal change.
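For readers wanting to start the transition now, the snippet below is a minimal RDataFrame example of the kind the authors recommend adopting today; file, tree, and column names are placeholders, and the snippet assumes a recent ROOT release.

# Declarative RDataFrame analysis from Python (placeholder file and columns).
import ROOT

df = ROOT.RDataFrame("Events", "input_example.root")
hist = (
    df.Filter("nMuon >= 2", "at least two muons")
      .Define("leading_pt", "Muon_pt[0]")
      .Histo1D(("h_lead_pt", "Leading muon pT;pT [GeV];Events", 50, 0.0, 200.0),
               "leading_pt")
)

canvas = ROOT.TCanvas()
hist.Draw()
canvas.SaveAs("leading_pt.png")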
Speaker: Stephan Hageboeck (CERN) -
444
ROOT's new Python Interfaces based on CppInterOp: Performance, Correctness, and Future-Proof Design
The ROOT Python interfaces are a cornerstone of HENP analysis workflows, enabling rapid development while retaining access to high-performance C++ code. In this contribution, we present a major upcoming update to the backend powering the dynamic C++ bindings generation, based on the new CppInterOp library.
For ROOT users, this migration translates directly into a better experience: faster bindings creation in several cases, reduced memory consumption, and improved correctness when using advanced C++ language features. The new backend relies more directly on the Clang API to understand C++ semantics directly, eliminating a number of long-standing issues caused by heuristic approaches and string-based type manipulation. This will also make it easier to support upcoming C++ language features and implement sophisticated Pythonizations.
Beyond immediate gains, this work prepares ROOT's Python interfaces for future evolution. By supporting both Cling and Clang-REPL via CppInterOp, the new Python interface backend aligns with ongoing efforts towards a transparent migration from Cling to Clang-REPL in ROOT, while continuing to support existing user code.
Furthermore, our work showcases the latest developments in CppInterOp, enabling cutting-edge R&D in the domain of language bindings by providing better encapsulation of C++ reflection via Clang. We describe the motivation for migrating the ROOT Python bindings, highlight user-visible improvements in performance and memory usage, and demonstrate how the new backend strengthens ROOT as a future-proof framework for HENP data analysis in Python.
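As a reminder of what the bindings layer must support, the snippet below JIT-compiles a small C++ function and calls it from Python through today's public PyROOT interface; the migration described above is intended to keep exactly this kind of user code working unchanged.

# Dynamic C++/Python interoperability: declare a C++ function and call it from Python.
import ROOT

ROOT.gInterpreter.Declare("""
#include <vector>
double sum_pt(const std::vector<double>& pts) {
    double s = 0.0;
    for (auto pt : pts) s += pt;
    return s;
}
""")

pts = ROOT.std.vector("double")([10.0, 25.5, 40.2])
print("scalar sum of pT:", ROOT.sum_pt(pts))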
Speaker: Aaron Jomy (CERN) -
445
Enhancing LLM for HEP code generation
In High Energy Physics (HEP), high-quality and efficient code is essential for data processing and analysis. However, Large Language Models (LLMs), while proficient in general programming, exhibit significant inaccuracies when generating specialized HEP code, reflected in a high failure rate. At the same time, a more complex offline software system will be necessary to adapt to future experiments. To address these challenges, this work develops a CodeGraph model tailored to complex C++ HEP software to enhance LLM-assisted code generation. Using the ROOT framework as an initial dataset (comprising about 21k files), we constructed a comprehensive CodeGraph with over 50,000 nodes and 160,000 edges by employing the Tree-sitter parser and regex-based methods. Built upon this graph, our CodeGraphRAG framework is designed to support HEP analysis agents. It works by retrieving the minimal query-relevant code subgraph and using that subgraph as a prompt for the LLM to generate new code. Preliminary tests indicate that CodeGraphRAG provides tangible assistance in code generation. In future work, we will construct a benchmark dataset for HEP code generation to evaluate the performance more accurately and further optimize the overall system.
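The toy sketch below captures only the retrieval step of such a pipeline: given a small code graph, take the entities matched by a query, expand to their neighbourhood, and serialise the subgraph into prompt context; the node names, edge labels, and matching rule are invented for illustration.

# Toy subgraph retrieval from a code graph for prompt construction (invented graph).
import networkx as nx

g = nx.DiGraph()
g.add_edge("RDataFrame::Filter", "RDataFrame::Histo1D", relation="chained_with")
g.add_edge("RDataFrame::Histo1D", "TH1::Fill", relation="calls")
g.add_edge("TH1::Fill", "TH1::GetBinContent", relation="same_class")

def retrieve_subgraph(graph, query, hops=1):
    seeds = [n for n in graph if query.lower() in n.lower()]
    keep = set(seeds)
    for seed in seeds:
        keep |= set(nx.ego_graph(graph.to_undirected(), seed, radius=hops).nodes)
    return graph.subgraph(keep)

sub = retrieve_subgraph(g, "Histo1D")
prompt_context = "\n".join(f"{u} --{d['relation']}--> {v}" for u, v, d in sub.edges(data=True))
print(prompt_context)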
Speaker: Yue Sun (The Institute of High Energy Physics of the Chinese Academy of Science) -
446
From Query to Plot: Implementing a Tool-Based LLM Framework for ATLAS Analysis
Large Language Models (LLMs) can serve as connective elements within ATLAS analysis workflows, linking data-discovery utilities, columnar data-delivery systems, and analysis-level plotting frameworks. Building on earlier exploratory studies of LLM-generated plotting code, we now focus on an implementable architecture suitable for real use. The system is decomposed into reusable Model Context Protocol (MCP) tools that handle key tasks: ATLAS dataset and metadata lookup, luminosity and auxiliary data retrieval, and orchestration of ServiceX with both Pythonic Awkward Array–based analysis and ROOT RDataFrame workflows. A user supplies a high-level request—such as a variable to plot from a given dataset—and the toolset resolves dataset identifiers, fetches required metadata, generates an analysis snippet consistent with ATLAS conventions, and produces a complete plotting workflow. We describe the design of this modular tool layer, the improvements in robustness and determinism over earlier prototypes, and the path toward a lightweight, practical ATLAS plot-generation assistant that can be embedded in broader systems.
Speaker: Gordon Watts (University of Washington (US)) -
447
Towards a Complete Search for New Physics: Active Learning in the 19-dimensional pMSSM
Despite decades of searching for the true nature of dark matter, no compelling evidence of its particle nature has been found. Without this evidence, the targets of searches for new physics must be carefully re-evaluated in terms of their theoretical completeness and experimental relevance. Exploring high-dimensional parameter spaces, such as the 19-dimensional phenomenological Minimal Supersymmetric Standard Model (pMSSM), is vital for identifying models that can explain all experimental observations. However, efficiently exploring such high-dimensional parameter spaces remains extremely challenging. In this work, we explore novel approaches to efficient parameter-space exploration, such as machine learning-guided exploration and Gaussian process-based active learning, and compare them to traditional sampling methods. In particular, we present results demonstrating the scalability of active learning in the context of the 19-dimensional pMSSM. We discuss how these approaches can help to identify gaps in the sensitivity of collider experiments to new physics, thereby guiding the search for new physics towards completeness.
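The toy loop below illustrates the active-learning pattern in two dimensions with scikit-learn: fit a Gaussian process classifier on the points labelled so far, then query the candidate closest to the decision boundary; the "oracle" stands in for the full pMSSM evaluation chain and all settings are invented.

# Toy Gaussian-process active learning: always label the most uncertain candidate point.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def oracle(points):
    # Stand-in for spectrum generation plus experimental constraints.
    return (points[:, 0] ** 2 + points[:, 1] ** 2 < 1.0).astype(int)

pool = rng.uniform(-2, 2, size=(4000, 2))                       # candidate model points
labelled = list(rng.choice(len(pool), size=40, replace=False))  # random seed set

for _ in range(30):
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
    gp.fit(pool[labelled], oracle(pool[labelled]))
    proba = gp.predict_proba(pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)          # smallest value = closest to the boundary
    uncertainty[labelled] = np.inf             # never re-query a labelled point
    labelled.append(int(np.argmin(uncertainty)))

print("points labelled by the oracle:", len(labelled))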
Speaker: Jonas Wurzinger (Technische Universitat Munchen (DE)) -
448
Comparative study of Tabular Foundation Models for particle physics with the FAIR Universe HiggsML Dataset
In collider-based particle physics experiments, independent events are commonly represented as tabular datasets of high-level variables, an approach widely used in multivariate and machine learning analyses. Inspired by the success of foundation models in language and vision, recent developments have introduced tabular foundation models such as TabNet (Google), TabTransformer (Amazon), TABERT (Facebook AI), and tabPFN.
We explore the use of pretrained tabular models in collider physics analyses, focusing on performance in data-limited regimes. PFNs are benchmarked against boosted decision trees and neural networks across classification and regression tasks using the FAIR Universe dataset. We discuss performance, data efficiency, and computational trade-offs, and assess the potential role of pretrained tabular models in HEP analysis workflows.
Speaker: Ragansu Chakkappai (IJCLab-Orsay)
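A minimal harness for such a comparison is sketched below with synthetic data standing in for the FAIR Universe HiggsML table; any model exposing a scikit-learn-style fit/predict_proba interface can be slotted in, and the commented line indicates where a pretrained tabular model (for example a TabPFN wrapper, assumed here) would enter.

# Toy benchmarking harness in the data-limited regime (synthetic stand-in dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=28, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=200, random_state=0)

models = {
    "BDT": HistGradientBoostingClassifier(random_state=0),
    "NN": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
    # "TabPFN": TabPFNClassifier(),   # pretrained tabular model, same interface (assumed)
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")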
-
443
-
19:00
Conference Dinner
-
Wrap up
-
10:30
Break
-
Wrap up
-
Closing Ceremony
-
12:30
Lunch
-