Conference on Computing in High Energy and Nuclear Physics

Name: Conference on Computing in High Energy and Nuclear Physics
Start: 2024-10-19T08:00:00+02:00
End: 2024-10-25T18:30:00+02:00
Location: No location set

19–25 Oct 2024

Europe/Zurich timezone

Contact Program Chairs

chep2024-pc@cern.ch

Session

Parallel (Track 1)

21 Oct 2024, 16:15

Conveners

Parallel (Track 1): Data and Metadata Organization, Management and Access

Ruslan Mashinistov (Brookhaven National Laboratory (US))
Lucia Morganti

Tigran Mkrtchyan (DESY)
Lucia Morganti

Samuel Cadellin Skipsey
Ruslan Mashinistov (Brookhaven National Laboratory (US))

Tigran Mkrtchyan (DESY)
Lucia Morganti

Samuel Cadellin Skipsey
Lucia Morganti

Samuel Cadellin Skipsey
Tigran Mkrtchyan (DESY)

Ruslan Mashinistov (Brookhaven National Laboratory (US))
Tigran Mkrtchyan (DESY)


192. ATLAS WLCG Data Challenge 2024 planning and implementation

Alessandra Forti (University of Manchester (GB))

21/10/2024, 16:15

Track 1 - Data and Metadata Organization, Management and Access

Talk

ATLAS is participating in the WLCG Data Challenges, a bi-yearly program established in 2021 to prepare for the data rates of the High-Luminosity LHC (HL-LHC). In each challenge, transfer rates are increased to ensure preparedness for the full rates by 2029. The goal of the 2024 Data Challenge (DC24) was to reach 25% of the HL-LHC expected transfer rates, with each experiment deciding how to execute...

35. Data Challenge 2024 - CMS activities

Christoph Wissing (Deutsches Elektronen-Synchrotron (DE))

21/10/2024, 16:33

Track 1 - Data and Metadata Organization, Management and Access

Talk

To verify the readiness of the data distribution infrastructure for the HL-LHC, which is planned to start in 2029, WLCG is organizing a series of data challenges with increasing throughput and complexity. This presentation addresses the contribution of CMS to Data Challenge 2024, which aims to reach 25% of the expected network throughput of the HL-LHC. During the challenge CMS tested various...

64. Next-Gen Storage Infrastructure for ALICE: Paving the Road Toward Hi-Luminosity LHC

Andreas Joachim Peters (CERN), Elvin Alin Sindrilaru (CERN)

21/10/2024, 16:51

Track 1 - Data and Metadata Organization, Management and Access

Talk

ALICE introduced ground-breaking advances in data processing and storage requirements and presented the CERN IT data centre with new challenges with the highest data recording requirement of all experiments. For these reasons, the EOS O2 storage system was designed to be cost-efficient, highly redundant and maximise data resilience to keep data accessible even in the event of unexpected...

89. Scitags: A Standardized Framework for Traffic Identification and Network Visibility in Data-Intensive Research Infrastructures

Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Marian Babik (CERN), Tristan Sullivan (University of Victoria)

21/10/2024, 17:09

Track 1 - Data and Metadata Organization, Management and Access

Talk

High-Energy Physics (HEP) experiments rely on complex, global networks to interconnect collaborating sites, data centers, and scientific instruments. Managing these networks for data-intensive scientific projects presents significant challenges because of the ever-increasing volume of data transferred, diverse project requirements with varying quality of service needs, multi-domain...

379. Achieving 100Gb/s data rates with XRootD - Preparing for HL-LHC and SKA

James William Walder (Science and Technology Facilities Council STFC (GB))

21/10/2024, 17:27

Track 1 - Data and Metadata Organization, Management and Access

Talk

To address the needs of forthcoming projects such as the Square Kilometre Array (SKA) and the HL-LHC, there is a critical demand for data transfer nodes (DTNs) to realise O(100)Gb/s of data movement. This high throughput can be attained through combinations of increased concurrency of transfers and improvements in the speed of individual transfers. At the Rutherford Appleton Laboratory...

348. Enhancing XRootD Load Balancing for High-Throughput transfers

Thomas Byrne, Thomas Jyothish (STFC)

21/10/2024, 17:45

Track 1 - Data and Metadata Organization, Management and Access

Talk

To address the need for high transfer throughput for projects such as the LHC experiments, including the upcoming HL-LHC, it is important to make optimal and sustainable use of our available capacity. Load balancing algorithms play a crucial role in distributing incoming network traffic across multiple servers, ensuring optimal resource utilization, preventing server overload, and enhancing...

101. Evolution of the CERN Tape Archive Scheduling System

Dr Jaroslav Guenther (CERN)

22/10/2024, 13:30

Track 1 - Data and Metadata Organization, Management and Access

Talk

The CERN Tape Archive (CTA) scheduling system implements the workflow and lifecycle of Archive, Retrieve and Repack requests. The transient metadata for queued requests is stored in the Scheduler backend store (Scheduler DB). In our previous work, we presented the CTA Scheduler together with an objectstore-based implementation of the Scheduler DB. Now with four years of experience in...

102. Challenges of repack in the era of the high-capacity tape cartridge

Joao Afonso (CERN)

22/10/2024, 13:48

Track 1 - Data and Metadata Organization, Management and Access

Talk

The latest tape hardware technologies (LTO-9, IBM TS1170) impose new constraints on the management of data archived to tape. In the past, new drives could read the previous one or even two generations of media, but this is no longer the case. This means that repacking older media to new media must be carried out on a more aggressive schedule than in the past. An additional challenge is the...

507. New GridKa Tape Storage System – from design to production deployment

Mr Dorin-Daniel Lobontu

22/10/2024, 14:06

Track 1 - Data and Metadata Organization, Management and Access

Talk

Storing the ever-increasing amount of data generated by LHC experiments is still inconceivable without making use of the cost-effective, though inherently complex, tape technology. The GridKa tape storage system used to rely on IBM Spectrum Protect (SP). Due to a variety of limitations and to meet the even higher requirements of the HL-LHC project, GridKa decided to switch from SP to High Performance...

80. ATLAS High-Luminosity LHC demonstrators with Data Carousel: Data-on-Demand and Tape Smart Writing

Xin Zhao (Brookhaven National Laboratory (US))

22/10/2024, 14:24

Track 1 - Data and Metadata Organization, Management and Access

Talk

The High Luminosity upgrade to the LHC (HL-LHC) is expected to generate scientific data on the scale of multiple exabytes. To tackle this unprecedented data storage challenge, the ATLAS experiment initiated the Data Carousel project in 2018. Data Carousel is a tape-driven workflow in which bulk production campaigns with input data resident on tape are executed by staging and promptly...

294. A Tape RSE for Extremely Large Data Collection Backups

Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US))

22/10/2024, 14:42

Track 1 - Data and Metadata Organization, Management and Access

Talk

The Vera Rubin Observatory is a very ambitious project. Using the world’s largest ground-based telescope, it will take two panoramic sweeps of the visible sky every three nights using a 3.2-gigapixel camera. The observation products will generate 15 PB of new data each year for 10 years. Accounting for reprocessing and related data products, the total amount of critical data will reach several...

214. Archive Metadata for efficient data colocation on tape

Julien Leduc (CERN)

22/10/2024, 15:00

Track 1 - Data and Metadata Organization, Management and Access

Talk

Due to the increasing volume of physics data being produced, the LHC experiments are making more active use of archival storage. Constraints on available disk storage have motivated the evolution towards the "data carousel" and similar models. Datasets on tape are recalled multiple times for reprocessing and analysis, and this trend is expected to accelerate during the Hi-Lumi era (LHC Run-4...

22. dCache project status & update

Dmitry Litvintsev (Fermi National Accelerator Lab. (US)), Mr Tigran Mkrtchyan (DESY)

22/10/2024, 16:15

Track 1 - Data and Metadata Organization, Management and Access

Talk

The dCache project provides open-source software deployed internationally to satisfy ever-more demanding storage requirements. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high throughput data ingest, data sharing over wide area networks, efficient access from HPC clusters, and long term data persistence on tertiary...

355. Evolving StoRM WebDAV: delegation of file transfers to NGINX and support for SciTags

Luca Bassi

22/10/2024, 16:33

Track 1 - Data and Metadata Organization, Management and Access

Talk

After the deprecation of the open-source Globus Toolkit used for GridFTP transfers, the WLCG community has shifted its focus to the HTTP protocol. The WebDAV protocol extends HTTP to create, move, copy and delete resources on web servers. StoRM WebDAV provides data storage access and management through the WebDAV protocol over a POSIX file system. Mainly designed to be used by the WLCG...

204. Advancing Large-Scale Scientific Collaborations with Rucio: A Data Management Story

Hugo Gonzalez Labrador (CERN)

22/10/2024, 16:51

Track 1 - Data and Metadata Organization, Management and Access

Talk

Managing the data deluge generated by large-scale scientific collaborations is a challenge. The Rucio Data Management platform is an open-source framework engineered to orchestrate the storage, distribution, and management of massive data volumes across a globally distributed computing infrastructure. Rucio meets the requirements of high-energy physics, astrophysics, genomics, and beyond,...

392. Data Movement Manager (DMM) for the SENSE-Rucio Interoperation Prototype

Aashay Arora (Univ. of California San Diego (US))

22/10/2024, 17:09

Track 1 - Data and Metadata Organization, Management and Access

Talk

The data movement manager (DMM) is a prototype interface between the CERN-developed data management software Rucio and the software-defined networking (SDN) service SENSE by ESnet. It allows for SDN-enabled high energy physics data flows using the existing Worldwide LHC Computing Grid infrastructure. In addition to the key feature of DMM, namely transfer-priority based bandwidth allocation for...

288. Validation of Shoveler XRootD monitoring

Katy Ellis (Science and Technology Facilities Council STFC (GB))

22/10/2024, 17:27

Track 1 - Data and Metadata Organization, Management and Access

Talk

The Large Hadron Collider (LHC) experiments rely heavily on the XRootD software suite for data transfer and streaming across the Worldwide LHC Computing Grid (WLCG) both within sites (LAN) and across sites (WAN). While XRootD offers extensive monitoring data, there's no single, unified monitoring tool for all experiments. This becomes increasingly critical as network usage grows, and with the...

57. FTS3 Token Support for a Proxy-less WLCG World

Mihai Patrascoiu (CERN)

22/10/2024, 17:45

Track 1 - Data and Metadata Organization, Management and Access

Talk

The WLCG community, with the main LHC experiments at the forefront, is moving away from X.509 certificates, replacing the Authentication and Authorization layer with OAuth2 tokens. FTS, as a middleware and core component of the WLCG, plays a crucial role in the transition from X.509 proxy certificates to tokens. The paper will present in detail the FTS token design and how this will serve the...

309. Recent Experience with the CMS Data Management System

Hasan Ozturk (CERN)

23/10/2024, 13:30

Track 1 - Data and Metadata Organization, Management and Access

Talk

The CMS experiment manages a large-scale data infrastructure, currently handling over 200 PB of disk and 500 PB of tape storage and transferring more than 1 PB of data per day on average between various WLCG sites. Utilizing Rucio for high-level data management, FTS for data transfers, and a variety of storage and network technologies at the sites, CMS confronts inevitable challenges due to...

373. DUNE Rucio development and monitoring

Wenlong Yuan (The University of Edinburgh (GB))

23/10/2024, 13:48

Track 1 - Data and Metadata Organization, Management and Access

Talk

The Deep Underground Neutrino Experiment (DUNE) is scheduled to start running in 2029 and is expected to record 30 PB/year of raw data. To handle this large-scale data, DUNE has adopted and deployed Rucio, the next-generation Data Replica service originally designed by the ATLAS collaboration, as an essential component of its Distributed Data Management system.

DUNE's use of Rucio has demanded...

59. FTS as a part of the SKA data movement pipeline

Rose Cooper

23/10/2024, 14:06

Track 1 - Data and Metadata Organization, Management and Access

Talk

The File Transfer Service (FTS) is a bulk data mover responsible for queuing, scheduling, dispatching and retrying file transfer requests, making it a critical infrastructure component for many experiments. FTS is primarily used by the LHC experiments, namely ATLAS, CMS and LHCb, but is also used by some non-LHC experiments, including both AMS and DUNE. FTS is an essential part in the data...

389. Distributed Data Management with Rucio for the Einstein Telescope

Lia Lavezzi (INFN Torino (IT))

23/10/2024, 14:24

Track 1 - Data and Metadata Organization, Management and Access

Talk

Modern physics experiments are often led by large collaborations including scientists and institutions from different parts of the world. To cope with the ever increasing computing and storage demands, computing resources are nowadays offered as part of a distributed infrastructure. Einstein Telescope (ET) is a future third-generation interferometer for gravitational wave (GW) detection, and...

282. Data Movement Model for the Vera C. Rubin Observatory

Fabio Hernandez (IN2P3 / CNRS computing centre)

23/10/2024, 14:42

Track 1 - Data and Metadata Organization, Management and Access

Talk

The set of sky images recorded nightly by the camera mounted on the telescope of the Vera C. Rubin Observatory will be processed in facilities located on three continents. Data acquisition will happen in Cerro Pachón in the Andes mountains in Chile where the observatory is located. A first copy of the raw image data set is stored at the summit site of the observatory and immediately...

441. The Belle II Raw Data Transfer System

Tristan Bloomfield (KEK IPNS)

23/10/2024, 15:00

Track 1 - Data and Metadata Organization, Management and Access

Talk

The Belle II raw data transfer system is responsible for transferring raw data from the Belle II detector to the local KEK computing centre, and from there to the GRID. The Belle II experiment recently completed its first Long Shutdown period - during this time many upgrades were made to the detector and tools used to handle and analyse the data. The Belle II data acquisition (DAQ) systems...

189. Adoption of ROOT RNTuple for the next main event data storage technology in the ATLAS production framework Athena

Marcin Nowak (Brookhaven National Laboratory (US))

23/10/2024, 16:15

Track 1 - Data and Metadata Organization, Management and Access

Talk

Since the start of the LHC in 2008, the ATLAS experiment has relied on ROOT to provide storage technology for all its processed event data. Internally, ROOT files are organized around TTree structures that are capable of storing complex C++ objects. The capabilities of TTrees have developed over the years and now offer support for advanced concepts like polymorphism, schema evolution and user...

142. RNTuple: A CMS Perspective

Nick Smith (Fermi National Accelerator Lab. (US))

23/10/2024, 16:33

Track 1 - Data and Metadata Organization, Management and Access

Talk

ROOT is planning to move from TTree to RNTuple as the data storage format for HL-LHC in order to, for example, speed up the IO, make the files smaller, and have a modern C++ API. Initially, RNTuple was not planned to support the same set of C++ data structures as TTree supports. CMS has explored the necessary transformations in its standard persistent data types to switch to RNTuple. Many...

110. ML-based Adaptive Prefetching and Data Placement for US HEP systems

Dr Byrav Ramamurthy (University of Nebraska-Lincoln)

23/10/2024, 16:51

Track 1 - Data and Metadata Organization, Management and Access

Talk

Although caching-based efforts [1] have been in place in the LHC infrastructure in the US, we show that integrating intelligent prefetching and targeted dataset placement into the underlying caching strategy can improve job efficiency further. Newer experiments and experiment upgrades such as HL-LHC and DUNE are expected to produce 10x the amount of data currently being produced. This...

256. Advancements in the in-file metadata system for the ATLAS experiment

Maciej Pawel Szymanski (Argonne National Laboratory (US))

23/10/2024, 17:09

Track 1 - Data and Metadata Organization, Management and Access

Talk

The High-Luminosity upgrade of the Large Hadron Collider (HL-LHC) will increase luminosity and the number of events by an order of magnitude, demanding more concurrent processing. Event processing is trivially parallel, but metadata handling is more complex and breaks that parallelism. However, correct and reliable in-file metadata is crucial for all workflows of the experiment, enabling tasks...

412. Efficient metadata management with the AMI ecosystem

Mr Fabian Lambert (LPSC Grenoble IN2P3/CNRS (FR))

23/10/2024, 17:27

Track 1 - Data and Metadata Organization, Management and Access

Talk

The ATLAS Metadata Interface (AMI) is a comprehensive ecosystem designed for metadata aggregation, transformation, and cataloging. With over 20 years of feedback in the LHC context, it is particularly well-suited for scientific experiments that generate large volumes of data.

This presentation explains, in a general manner, why managing metadata is essential regardless of the experiment's...

297. So FAIR, so good: the INFN strategy for Data Stewardship

Lorenzo Rinaldi (Universita e INFN, Bologna (IT)), Luciano Gaido

23/10/2024, 17:45

Track 1 - Data and Metadata Organization, Management and Access

Talk

Large international collaborations in the field of Nuclear and Subnuclear Physics have been leading the implementation of FAIR principles for managing research data. These principles are essential when dealing with large volumes of data over extended periods and involving scientists from multiple countries. Recently, smaller communities and individual experiments have also started adopting...

68. Comparing Cache Utilization Trends for Regional Data Caches

John Wu (Lawrence Berkeley National Laboratory)

24/10/2024, 13:30

Track 1 - Data and Metadata Organization, Management and Access

Talk

The surge in data volumes from large scientific collaborations, like the Large Hadron Collider (LHC), poses challenges and opportunities for High Energy Physics (HEP). With annual data projected to grow thirty-fold by 2028, efficient data management is paramount. The HEP community heavily relies on wide-area networks for global data distribution, often resulting in redundant long-distance...

368. Enhancing CMS XCache efficiency: A comparative study of Machine Learning techniques and LRU mechanisms

Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))

24/10/2024, 13:48

Track 1 - Data and Metadata Organization, Management and Access

Talk

The Large Hadron Collider (LHC) at CERN in Geneva is preparing for a major upgrade that will improve both its accelerator and particle detectors. This strategic move comes in anticipation of a tenfold increase in proton-proton collisions, expected to kick off by 2029 in the upcoming high-luminosity phase. The backbone of this evolution is the Worldwide LHC Computing Grid, crucial for handling...

249. Data Placement Optimization for ATLAS in a Multi-Tiered Storage System within a Data Center

Carlos Fernando Gamboa (Brookhaven National Laboratory (US))

24/10/2024, 14:06

Track 1 - Data and Metadata Organization, Management and Access

Talk

Scientific experiments and computations, especially in High Energy Physics, are generating and accumulating data at an unprecedented rate. Effectively managing this vast volume of data while ensuring efficient data analysis poses a significant challenge for data centers, which must integrate various storage technologies. This paper proposes addressing this challenge by designing a multi-tiered...

257. Advancing ATLAS DCS Data Analysis with a Modern Data Platform

Michelle Ann Solis (University of Arizona (US))

24/10/2024, 14:24

Track 1 - Data and Metadata Organization, Management and Access

Talk

This paper presents a novel approach to enhance the analysis of ATLAS Detector Control System (DCS) data at CERN. Traditional storage in Oracle databases, optimized for WinCC archiver operations, is challenged by the need for extensive analysis across long timeframes and multiple devices, alongside correlating conditions data. We introduce techniques to improve troubleshooting and analysis of...

194. Impact of RNTuple on storage resources for ATLAS production

Tatiana Ovsiannikova (University of Washington (US))

24/10/2024, 14:42

Track 1 - Data and Metadata Organization, Management and Access

Talk

Over the past years, the ROOT team has been developing a new I/O format called RNTuple to store data from experiments at CERN's Large Hadron Collider. RNTuple is designed to improve ROOT's existing TTree I/O subsystem by improving I/O speed and introducing a more efficient binary data format. It can be stored in both ROOT files and object stores, and it's optimized for modern storage hardware...

53. Upcoming database developments at CERN

Andrzej Nowicki (CERN)

24/10/2024, 16:15

Track 1 - Data and Metadata Organization, Management and Access

Talk

In this presentation, I will outline the upcoming transformations set to take place within CERN's database infrastructure. Among the challenges facing our database team during the Long Shutdown 3 (LS3) will be the upgrade of Oracle databases.

The forthcoming version of Oracle database is introducing a significant internal change as the databases will be converted to a container...

65. Physics Data Forge: Unveiling the Power of I/O Systems in CERN’s Test Infrastructure

Guilherme Amadio (CERN)

24/10/2024, 16:33

Track 1 - Data and Metadata Organization, Management and Access

Talk

Remote file access is critical in High Energy Physics (HEP) and is currently facilitated by XRootD and HTTP(S) protocols. With a tenfold increase in data volume expected for Run-4, higher throughput is critical. We compare some client-server implementations on 100GE LANs connected to high-throughput storage devices. A joint project between IT and EP departments aims to evaluate RNTuple as a...

58. Ceph at CERN in the multi-datacentre era

Zachary Goggin

24/10/2024, 16:51

Track 1 - Data and Metadata Organization, Management and Access

Talk

The recent commissioning of CERN’s Prevessin Data Centre (PDC) brings the opportunity for multi-datacentre Ceph deployments, bringing advantages for business continuity and disaster recovery. However, the simple extension of a single cluster across data centres is impractical due to the impact of latency on Ceph’s strong consistency requirements. This paper reports on our research towards...

333. Reading Tea Leaves - Understanding internal events and addressing performance issues within a CephFS/XRootD Storage Element.

Matt Doidge (Lancaster University (GB))

24/10/2024, 17:09

Track 1 - Data and Metadata Organization, Management and Access

Talk

Erasure-coded storage systems based on Ceph have become a mainstay within UK Grid sites as a means of providing bulk data storage whilst maintaining a good balance between data safety and space efficiency. A favoured deployment, as used at the Lancaster Tier-2 WLCG site, is to use CephFS mounted on frontend XRootD gateways as a means of presenting this storage to grid users.

These storage...

454. Distributed management and processing of ALICE monitoring data with Onedata

Dr Michał Orzechowski (AGH University of Krakow, Faculty of Computer Science, Poland)

24/10/2024, 17:27

Track 1 - Data and Metadata Organization, Management and Access

Talk

The Onedata [1] platform is a high-performance data management system with a distributed, global infrastructure that enables users to access heterogeneous storage resources worldwide. It supports various use cases ranging from personal data management to data-intensive scientific computations. Onedata has a fully distributed architecture that facilitates the creation of a hybrid cloud...

317. Carbon costs of storage: a UK perspective.

Samuel Cadellin Skipsey

24/10/2024, 17:45

Track 1 - Data and Metadata Organization, Management and Access

Talk

In order to achieve the higher performance year on year required by the 2030s for future LHC upgrades at a sustainable carbon cost to the environment, it is essential to start with accurate measurements of the state of play. Whilst there have been a number of studies of the carbon cost of compute for WLCG workloads published, rather less has been said on the topic of storage, both nearline...
