Eric Vaandering (Fermi National Accelerator Lab. (US)) | 25/05/2026, 13:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In 2025, Fermilab transitioned from its legacy tape storage management software, Enstore, to CTA (CERN Tape Archive).
The replacement system was adapted to satisfy Fermilab use cases, including the ability to read existing data from Enstore-formatted tapes. The new system also includes the ability to read aggregated files from containers that were managed by Enstore, to maintain good...
Fernando Harald Barreiro Megino (University of Texas at Arlington), Mr Mikhail Borodin (CERN), Misha Borodin (University of Texas at Arlington (US)) | 25/05/2026, 14:03 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In the current ATLAS Distributed Computing model, available disk capacity is insufficient to store even a single complete copy of all data actively in use. Consequently, tape systems serve not only as long-term backups but also as primary data sources. Efficient utilization of tapes at the ATLAS scale requires specialized orchestration mechanisms, as tape access is inherently slower and...
Xin Zhao (Brookhaven National Laboratory (US)) | 25/05/2026, 14:21 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The High Luminosity upgrade to the LHC (HL-LHC) is expected to generate scientific data on the scale of multiple exabytes. To address this unprecedented data storage challenge, the ATLAS experiment launched the Data Carousel project in 2018, which entered production in 2020. In the Data Carousel workflow, jobs seamlessly receive input data from tape for user payloads. It represents a...
Alice-Florenta Suiu (National University of Science and Technology POLITEHNICA Bucharest (RO)) | 25/05/2026, 14:39 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The ALICE detector at the CERN LHC generates petabyte-scale raw datasets during heavy-ion collision runs, which must undergo a multi-stage offline reconstruction cycle. EOSALICEO2 serves as the primary high-performance disk buffer for ALICE operations, both during data taking and data processing, providing the sustained throughput necessary for large-scale parallel reconstruction workflows....
Ankush Reddy Kanuganti (Brookhaven National Laboratory) | 25/05/2026, 14:57 | Track 1 - Data and metadata organization, management and access | Oral Presentation
For 25 years, the STAR experiment at the Relativistic Heavy Ion Collider (RHIC) has accumulated a significant archive of metadata, supported by extensive web-based tools. As the collaboration transitions into a "long-term preservation phase," a key priority is ensuring sustained access to these critical web interfaces in a self-contained and maintenance-free format. Preserving essential...
Hugo Gonzalez Labrador (CERN) | 25/05/2026, 16:15 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In this contribution, we present a new Rucio-based service designed specifically to simplify data management for the Small and Medium experiments at CERN.
Rucio has become the de-facto data management solution for major experiments in high-energy physics and related scientific domains such as astrophysics, providing a scalable, policy-driven framework for distributed data placement,...
Dijana Vrbanec | 25/05/2026, 16:33 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The interTwin project, funded by Horizon Europe, developed a Digital Twin Engine (DTE), a platform for the development and running of Digital Twins across multiple scientific domains. A central component of the DTE is the interTwin Data Lake, a federated storage layer that integrates HPC, HTC, and cloud-based datasets and provides unified access while preserving site-local policies and...
James Collinson (SKAO) | 25/05/2026, 16:51 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The Square Kilometre Array (SKA) telescopes, currently under construction in South Africa and Australia, are due to enter Science Verification at the end of 2026. From this point, these interferometers will generate an increasing volume of data, with the science data processors eventually producing of order 1 PB per day of science-ready data products. Managing this archive across the globally...
Federica Legger (Universita e INFN Torino (IT)) | 25/05/2026, 17:09 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Large-scale scientific experiments, such as those in gravitational-wave (GW) science, produce extensive datasets that are often stored in isolated data lakes. The second-generation interferometers (LIGO, Virgo, and KAGRA) are part of an international scientific network, the International Gravitational-Wave Observatory Network (IGWN). A similar framework is envisaged for the third-generation...
Benjamin Gutierrez (Argonne National Laboratory), Doug Benjamin (Brookhaven National Laboratory (US)), Douglas Benjamin | 25/05/2026, 17:27 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Rucio, the scientific data management system developed by ATLAS at CERN, has become widely adopted across high-energy physics experiments for managing distributed datasets at exabyte scale. Traditionally, Rucio relies on the WLCG File Transfer Service (FTS) for data movement between storage elements. We present recent developments enabling Globus, the research cyberinfrastructure platform...
James William Walder (Science and Technology Facilities Council STFC (GB)) | 25/05/2026, 17:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The SKA Regional Centre Network (SRCNet) is a globally federated infrastructure providing data distribution and science workflows for the Square Kilometre Array (SKA). The v0.1 test campaign delivered the first system-level validation across nine accredited nodes, integrating global services (Rucio, FTS, SKA-IAM, perfSONAR) with site services (storage, compute, science platforms) and executing...
Katy Ellis (Science and Technology Facilities Council STFC (GB)) | 26/05/2026, 13:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
For Phase-II of the Large Hadron Collider program, a dramatic increase in data quantity is expected due to increased pileup, higher experiment logging rates and a larger number of channels in the upgraded detector components. For Run-4, beginning in around 2030, and using the current computing model without software improvements, CMS estimates growth of an order of magnitude in computing...
Alessandra Forti (The University of Manchester (GB)) | 26/05/2026, 14:03 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The WLCG Data Challenge 2027 (DC27) represents a critical milestone in preparing our global distributed computing and networking infrastructure for the demands of HL-LHC and next-generation data-intensive experiments. Building on the successes and lessons learned from previous challenges, the DC27 program is driven by a coordinated series of mini-capability and mini-capacity challenges. These...
Nicola Pace | 26/05/2026, 14:21 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Version 4 of the File Transfer Service (FTS4) is currently under active development within the CERN IT Storage group. This project aims to address issues which have prevented version 3 from being proposed as a candidate for automating bulk file-transfers during LHC Physics Run 4.
FTS4 has taken an incremental rather than big-bang approach to its development. FTS4 started with the FTS3...
Aashay Arora (Univ. of California San Diego (US)) | 26/05/2026, 14:39 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Given the increased amount of data expected during the HL-LHC and the escalation of data transfers that this implies, it becomes of paramount importance to have control over the available network bandwidth and the ability to allocate this bandwidth for high-priority and time-sensitive data flows.
The Rucio/SENSE integration project intends to provide Rucio with Software Defined Networking...
Diogo Castro (CERN) | 26/05/2026, 14:57 | Track 1 - Data and metadata organization, management and access | Oral Presentation
CERNBox is a leading participant in the emerging European sync-and-share federation effort, promoting interoperable, standards-based collaboration across scientific communities. As an active contributor to European E-Infrastructures, it plays a key role in shaping open, federated data services. This contribution will present recent work on integrating CERNBox into the current sync-and-share...
Borja Garrido Bear (CERN), Panos Paparrigopoulos (CERN) | 26/05/2026, 16:15 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In preparation for Run-4 and the HL-LHC era, WLCG has initiated the redesign of its XRootD monitoring to provide a coherent and scalable view of data-access activity across distributed sites and experiments. Developed in close collaboration with CMS, the new architecture aims to serve both WLCG-level needs for global observability (such as assessing traffic patterns and validating large-scale...
Rahul Chauhan (University of Wisconsin Madison (US)) | 26/05/2026, 16:33 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The XRootD redirector plays a key role in the CMS experiment's global data access infrastructure, determining where clients are sent to retrieve data across a heterogeneous, worldwide set of storage endpoints. The redirector has traditionally emphasised simplicity and performance; its decisions tend to be opaque and based on limited inputs. This can lead to erroneous redirections, such as sending...
James Letts (UCSD) | 26/05/2026, 16:51 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Contemporary research relies heavily on computational resources and storage, with data sharing serving as a critical element. Data access remains a central challenge. The Open Science Data Federation (OSDF) project aims to establish a global scientific data distribution network by leveraging the Pelican Platform and the National Research Platform (NRP). OSDF is based on the XRootD and Pelican...
Florian Uhlig (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)) | 26/05/2026, 17:09 | Track 1 - Data and metadata organization, management and access | Oral Presentation
DataHarbor is a modern web application designed to provide researchers with secure, intuitive access to large-scale data stored on distributed storage systems through the XRootD protocol. The system provides a web-based file browser that enables seamless directory navigation, metadata inspection, and on-demand file downloads. Files are streamed directly from XRootD storage to the user's...
Andrea Piccinelli (University of Notre Dame (US)) | 26/05/2026, 17:27 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The Notre Dame CMS XRootD storage element, originally designed to handle traditional CMSSW workloads, underwent heavy I/O wait saturation when dealing with new data analysis workloads based on columnar analysis frameworks. These new workloads, using tools such as Uproot (to load data into structures such as Awkward Arrays), have revolutionized the I/O profile. This presentation starts by...
Chin Guok (ESnet) | 26/05/2026, 17:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The rapid growth of data volumes in high-energy physics (HEP) collaborations, such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), has necessitated the adoption of regional in-network caching strategies to mitigate data access latency. However, these caches often exhibit varying efficiencies across locations due to differing access patterns and storage...
Zhuo Meng (Institute of High Energy Physics) | 27/05/2026, 13:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Currently, High Energy Physics (HEP) faces increasingly severe data storage challenges. Next-generation particle collider experiments are expected to generate unprecedented data volumes and acquisition rates, demanding continuous I/O capabilities with sub-millisecond latency and PB/s-level throughput. Traditional kernel-based file systems, burdened by context switching, interrupt handling, and heavy...
Inga Katarzyna Lakomiec (Georg August Universitaet Goettingen (DE)) | 27/05/2026, 14:03 | Track 1 - Data and metadata organization, management and access | Oral Presentation
German computing sites play a vital role in the Large Hadron Collider (LHC) job processing and data storage as part of the Worldwide LHC Computing Grid (WLCG). The storage and computing contributions of university-based Tier-2 centres in Germany are transitioning to the Helmholtz Centres and National High Performance Computing (NHR) sites, respectively, to meet the growing data and...
Hugo Gonzalez Labrador (CERN) | 27/05/2026, 14:21 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Large-scale scientific collaborations such as WLCG need reliable and secure data transfers that optimize the available bandwidth and resources of the grid. HTTP-based third-party copy (TPC) transfers follow a de-facto community standard for moving files directly between storage endpoints (peer-to-peer). Here we report on an extension to that standard promoting improved data integrity through...
Andreea Prigoreanu (IT-SD) | 27/05/2026, 14:39 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Author: Andreea Prigoreanu (University Politehnica Bucharest), on behalf of the ALICE collaboration
The processing of ALICE experiment data relies on high-quality and reliable storage. The central file catalogue serves as the database that tracks over 2.6 billion files and their locations across more than 50 storage elements on the ALICE Grid. It is essential that the physical storage...
LI Haibo | 27/05/2026, 14:57 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In high energy physics (HEP) experiments, large-scale storage clusters typically comprise tens of thousands of disks, and their reliability is essential for continuous data acquisition, processing, and long-term preservation. Traditional rule-based disk failure detection approaches are increasingly insufficient for such environments due to heterogeneous device types, complex workload patterns,...
Mr Ivan Knezevic (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)) | 27/05/2026, 16:15 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The NAPMIX project aims to establish a cross-domain FAIR-compliant metadata schema for the Nuclear, Astro, and Particle (NAP) physics communities. A core challenge is reconciling the evolving nature of experimental metadata, enriched progressively from proposal through analysis, with the immutability required by Persistent Identifiers (DOIs) for findability and interoperability. This...
Francesco Giacomini (INFN CNAF) | 27/05/2026, 16:15 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The StoRM system provides storage services for scientific communities relying on distributed computing infrastructures through multiple loosely-coupled components developed in different programming languages at INFN-CNAF, including StoRM WebDAV and StoRM Tape. StoRM WebDAV is a StoRM component which provides HTTP/WebDAV access to distributed storage systems, while StoRM Tape is an...
Dmitry Litvintsev (Fermi National Accelerator Lab. (US)), Marina Sahakyan, Mr Tigran Mkrtchyan (DESY) | 27/05/2026, 16:33 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The dCache project provides an open-source, highly scalable distributed storage system deployed at numerous laboratories worldwide. Its modular architecture supports high-rate data ingestion, WAN data distribution, efficient HPC access, and long-term archival storage. Although initially developed for high-energy physics, dCache now serves a broad range of scientific communities with diverse...
Dr Victoria Tokareva (Karlsruhe Institute of Technology) | 27/05/2026, 16:33 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The PUNCH4NFDI consortium (Particles, Universe, NuClei and Hadrons for German National Research Data Infrastructure) comprises astro-, astroparticle, particle and nuclear physics: communities historically employing computationally intensive research on big data. The data life cycles are characterized by well-established data curation practices; highly diverse metadata, being embedded in custom file...
Dario Barberis (University of California Berkeley (US)) | 27/05/2026, 16:51 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The ATLAS EventIndex is the global catalogue of all real and simulated data produced and processed by ATLAS. The current implementation, developed and deployed for LHC Run 3 (2022-2026), has to evolve in order to be able to ingest, store and serve the much larger amount of data that will be produced during the High-Luminosity LHC operation years, starting in 2030. The modular architecture of...
Mr Tigran Mkrtchyan (DESY) | 27/05/2026, 16:51 | Track 1 - Data and metadata organization, management and access | Oral Presentation
POSIX access remains the de facto dominant access mechanism in HPC environments, defining how applications and workflows interact with large-scale storage systems. With its NFSv4.1/pNFS protocol implementation, dCache provides native integration into HPC environments, supporting a large number of scientific applications.
The recent development efforts in dCache have concentrated on...
Andreas Joachim Peters (CERN) | 27/05/2026, 17:09 | Track 1 - Data and metadata organization, management and access | Oral Presentation
EOS, CERN's large-scale storage system, is continuously evolving to support increasingly diverse and performance-critical scientific workflows. As part of this evolution, we are considering NFS 4.2 as a strategic new protocol for EOS in order to extend its interoperability, leverage kernel-level client performance, and open a path for community collaboration based on open...
Victoria Tokareva | 27/05/2026, 17:09 | Track 1 - Data and metadata organization, management and access | Oral Presentation
High-energy, nuclear and astroparticle physics operate at comparable scales of data volume and complexity and face closely related challenges in data preservation, metadata management, and long-term reuse. While these communities have developed robust experiment-specific data curation practices, metadata remains highly specific and heterogeneous, tightly coupled to custom formats, frameworks...
Andreas Joachim Peters (CERN) | 27/05/2026, 17:27 | Track 1 - Data and metadata organization, management and access | Oral Presentation
As part of the CERN Storage Group's technology investigations, we are exploring future-proof, scalable interactive service architectures that meet demanding requirements for performance and maintainability.
To achieve this, we are focusing on storage solutions that provide Linux-native filesystem access using open, standards-compliant technologies capable of securely supporting tens of...
Eli Mizrachi (SLAC National Accelerator Laboratory) | 27/05/2026, 17:27 | Track 1 - Data and metadata organization, management and access | Oral Presentation
LUX-ZEPLIN (LZ) is the world's most sensitive WIMP dark matter direct-detection experiment, acquiring petabytes of data per year using a dual-phase xenon time projection chamber (TPC) with a seven tonne active mass. User-facing metadata related to TPC conditions and data processing environments are stored in six different SQL and NoSQL databases, which historically were accessed by five...
Hubert Simma (DESY) | 27/05/2026, 17:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In this contribution we report on the refactoring and reconfiguration of the main components of the International Lattice Data Grid (ILDG) in order to realize a modern data management framework which is fully FAIR-compliant and has fully token-based access control.
ILDG started 20 years ago as an effort of the Lattice QCD community to organize and enable the worldwide sharing of large...
weilc (IHEP) | 27/05/2026, 17:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
With the continuous advancement of HEP detectors and online reconstruction capabilities, the scale of experimental data is growing rapidly. The data pattern is increasingly characterized by "massive small files distributed across multiple data centers." On one hand, the surge in small files creates bottlenecks in metadata and directory operations; on the other hand, cross-data center access...
Wesley Patrick Kwiecinski (University of Illinois Chicago) | 28/05/2026, 13:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Efficient data access is becoming increasingly important for high-energy physics (HEP) workflows on HPC systems. Large datasets, a greater degree of concurrency (multi-process and multithreading), and complex event formats can lead to hidden performance issues. The HEP-CCE/SOP group used the Darshan I/O characterization tool to identify data re-operations in representative HEP workflows, using...
Gianmaria Del Monte (CERN) | 28/05/2026, 14:03 | Track 1 - Data and metadata organization, management and access | Oral Presentation
As the scale and complexity of high-energy physics computing grows, storage systems are being pushed to serve radically diverse workloads at once, often with significant performance consequences. To ensure EOS can meet these evolving demands, we introduce a real-time I/O traffic-shaping framework that monitors ongoing I/O patterns and dynamically adjusts and balances read/write flows to...
Felice Pantaleo (CERN) | 28/05/2026, 14:21 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Efficient data processing using machine learning relies on heterogeneous computing approaches, but optimizing input and output data movements remains a challenge. In GPU-based workflows the data already resides in GPU memory, but machine learning models require the input and output data to be provided in a specific tensor format, often requiring unnecessary copying outside of the GPU device and...
Mr Akshat Gupta | 28/05/2026, 14:39 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The petabyte-scale data generated annually by High Energy Physics (HEP) experiments like those at the Large Hadron Collider presents a significant data storage challenge. Whilst traditional algorithms like LZMA and ZLIB are widely used, they often fail to exploit the deep structure inherent in scientific data. We investigate the application of modern state space models (SSMs) to this problem,...
Bralyne Matoukam (University of the Witwatersrand) | 28/05/2026, 14:57 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The ATLAS experiment at the CERN Large Hadron Collider (LHC) records and processes large amounts of data from proton-proton collisions. With the upcoming High-Luminosity LHC (HL-LHC), the data volume is expected to increase by more than an order of magnitude, posing new challenges for storage, data throughput, and analysis scalability.
Currently, all major production output formats support...
Ruslan Mashinistov (Brookhaven National Laboratory (US)) | 28/05/2026, 16:15 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The HSF Conditions Database (CDB) is a community-driven solution for managing conditions data - non-event data required for event processing - which present common challenges across HENP and astro-particle experiments. In the three years of production operation for sPHENIX at BNL, where the HSF CDB supports over 70,000 concurrent jobs on a farm running 132,000 logical cores, it has evolved...
Andrea Formica (Université Paris-Saclay (FR)) | 28/05/2026, 16:33 | Track 1 - Data and metadata organization, management and access | Oral Presentation
The ATLAS experiment is redesigning its Conditions database infrastructure in preparation for Run 4. The new system (CREST - Conditions REST) adopts a multi-tier architecture in which interactions with all databases, including the Trigger physics configuration database, are mediated through a web-based server layer using a REST API. Data caching is provided via Varnish HTTP proxies. We...
Ilija Vukotic (University of Chicago (US)) | 28/05/2026, 16:51 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Efficient access to Conditions data is critical for data processing in the ATLAS experiment at the LHC. For more than a decade, Squid HTTP proxies deployed across distributed computing sites have provided low-latency access, reduced WAN bandwidth consumption, and protected origin servers from excessive load. Conditions data traffic is characterized by exceptionally high request rates - often...
Martin Øines Eide (Western Norway University of Applied Sciences (NO)) | 28/05/2026, 17:09 | Track 1 - Data and metadata organization, management and access | Oral Presentation
Authors:
- Martin Øines Eide, Western Norway University of Applied Sciences,
University of Bergen, Bergen, Norway and European Organization for
Nuclear Research (CERN), Geneva, Switzerland
- Costin Grigoras, European Organization for Nuclear Research (CERN), Geneva,
Switzerland
on behalf of the ALICE collaboration
The ALICE experiment at CERN relies on a...
Gerhard Immanuel Brandt (Bergische Universitaet Wuppertal (DE)) | 28/05/2026, 17:27 | Track 1 - Data and metadata organization, management and access | Oral Presentation
In its high luminosity phase, the Large Hadron Collider (LHC) will achieve unprecedented levels of instantaneous luminosity of up to $7.5\times10^{34}$ cm$^{-2}$s$^{-1}$, which exposes the ITk (Inner Tracker) Pixel detector of the ATLAS experiment to extraordinary levels of radiation. A maximum fluence of $9.2\times10^{15}$ cm$^{-2}$ 1 MeV $n_{eq}$ in the harshest radiation region at the innermost...
Julius Hrivnac (Université Paris-Saclay (FR)) | 28/05/2026, 17:45 | Track 1 - Data and metadata organization, management and access | Oral Presentation
This contribution presents the architecture and implementation of an intelligent database system for astronomical alerts produced by the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). The system is designed to support efficient exploration of large-scale alert streams through both traditional query mechanisms and advanced...