Conveners
Storage: Tue AM
- Patrick Fuhrmann (Deutsches Elektronen-Synchrotron (DE))
- Peter Clarke (The University of Edinburgh (GB))
Storage: Tue PM
- Cedric Serfon (Brookhaven National Laboratory (US))
- Peter Clarke (The University of Edinburgh (GB))
Storage: Wed AM
- Edoardo Martelli (CERN)
- Cedric Serfon (Brookhaven National Laboratory (US))
Storage: Wed PM
- Christophe Haen (CERN)
- Xavier Espinal (CERN)
DUNE is a neutrino physics experiment that is expected to start taking data in 2028. The data acquisition (DAQ) system of the experiment is designed to sustain several TB/s of incoming data, which will be temporarily buffered while being processed by a software-based data selection system.
In DUNE, some rare physics processes (e.g. Supernova Burst events) require storing the...
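As a minimal sketch of the buffering idea described above (not the actual DUNE DAQ code; buffer depth, block rate and the trigger interface are illustrative assumptions), incoming readout blocks can be held in a bounded in-memory buffer that is only persisted in full when a rare-event trigger, such as a supernova-burst candidate, fires:

    from collections import deque

    # Illustrative only: a bounded buffer that normally discards old blocks,
    # but dumps its entire contents when a rare-event trigger fires.
    BUFFER_SECONDS = 10          # assumed depth of the temporary buffer
    BLOCKS_PER_SECOND = 100      # assumed arrival rate of readout blocks

    buffer = deque(maxlen=BUFFER_SECONDS * BLOCKS_PER_SECOND)

    def persist(blocks):
        # Placeholder for writing the buffered blocks to permanent storage.
        print(f"persisting {len(blocks)} buffered blocks")

    def on_block(block, supernova_candidate):
        buffer.append(block)
        if supernova_candidate:
            persist(list(buffer))   # keep everything seen in the buffer window
            buffer.clear()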
The ATLAS experiment will undergo a major upgrade to take advantage of the new conditions provided by the upgraded High-Luminosity LHC. The Trigger and Data Acquisition (TDAQ) system will record data at unprecedented rates: the detectors will be read out at 1 MHz, generating around 5 TB/s of data. The Dataflow (DF) system, a component of TDAQ, introduces a novel design: readout data are buffered...
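For a sense of scale, the two figures quoted above imply an average readout event size of roughly 5 MB; a short back-of-the-envelope check:

    # Rough average event size implied by the quoted TDAQ figures.
    readout_rate_hz = 1_000_000          # 1 MHz readout rate
    throughput_bytes = 5e12              # ~5 TB/s of data

    avg_event_size = throughput_bytes / readout_rate_hz
    print(f"average event size ~ {avg_event_size / 1e6:.0f} MB")   # ~5 MB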
In recent years, cloud sync & share storage services, provided by academic and research institutions, have become a daily workplace environment for many local user groups in the High Energy Physics (HEP) community. These services, however, remain largely disconnected and deployed in isolation from one another, even though new technologies have been developed and integrated to further increase the value...
With the progress of many large HEP experiments, the amount of data that needs to be processed and stored has increased significantly, requiring upgrades to computing resources and improvements in the performance of storage software. This article discusses porting the EOS software from the x86_64 architecture to the aarch64 architecture, with the aim of finding a more cost-effective storage solution....
We propose disk-based custodial storage as an alternative to tape for preserving the raw data of the ALICE experiment at CERN.
The proposed storage system relies on the RAIN layout, the erasure-coding implementation in the CERN-developed EOS storage suite, for data protection, and takes full advantage of high-density JBOD enclosures to maximize storage capacity as well as to...
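To illustrate why an erasure-coded (RAIN) layout is attractive for custodial storage, the sketch below compares the raw-capacity overhead of a hypothetical k data + m parity stripe layout with plain two-fold replication; the stripe parameters are assumptions, not the layout actually deployed:

    # Illustrative comparison of storage overheads (parameters are assumptions).
    def raw_needed(logical_tb, data_stripes, parity_stripes):
        """Raw capacity needed to store `logical_tb` with k+m erasure coding."""
        return logical_tb * (data_stripes + parity_stripes) / data_stripes

    logical = 1000.0  # TB of raw detector data to preserve
    print("10+2 erasure coding:", raw_needed(logical, 10, 2), "TB raw")  # 1200 TB
    print("2-replica layout:   ", logical * 2, "TB raw")                 # 2000 TB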
The intelligent Data Delivery Service (iDDS) has been developed to cope with the large increase in computing and storage resource usage expected in the coming LHC data taking. iDDS has been designed to intelligently orchestrate workflow and data management systems, decoupling data pre-processing, delivery, and main processing in various workflows. It is an experiment-agnostic service around a workflow-...
The High Luminosity upgrade to the LHC, which aims for a ten-fold increase in the luminosity of proton-proton collisions at an energy of 14 TeV, is expected to start operation in 2028/29, and will deliver an unprecedented volume of scientific data at the multi-exabyte scale. This amount of data has to be stored and the corresponding storage system must ensure fast and reliable data delivery...
The dCache project provides open-source software deployed internationally to satisfy ever more demanding storage requirements. Its multifaceted approach provides an integrated way of supporting different use cases with the same storage, from high-throughput data ingest, data sharing over wide area networks, efficient access from HPC clusters and long-term data persistence on a tertiary...
Tape storage remains the most cost-effective system for safe long-term storage of petabytes of data and reliably accessing it on demand. It has long been widely used by Tier-1 centers in WLCG. GridKa uses tape storage systems for LHC and non-LHC HEP experiments. The performance requirements on the tape storage systems grow every year, creating an increasing number of challenges in...
Given the anticipated increase in the amount of scientific data, it is widely accepted that primarily disk-based storage will become prohibitively expensive. Tape-based storage, on the other hand, provides a viable and affordable solution for the ever-increasing demand for storage space. Coupled with a disk caching layer that temporarily holds a small fraction of the total data volume to allow...
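A minimal sketch of the disk-cache idea mentioned above, assuming a simple least-recently-used eviction policy (the actual cache policy and sizing are not specified in the abstract):

    from collections import OrderedDict

    class SmallDiskCache:
        """Toy LRU cache holding a small fraction of the tape-resident data."""
        def __init__(self, capacity_files):
            self.capacity = capacity_files
            self.files = OrderedDict()          # file name -> contents/handle

        def get(self, name, recall_from_tape):
            if name in self.files:
                self.files.move_to_end(name)    # cache hit: mark as recently used
                return self.files[name]
            data = recall_from_tape(name)       # cache miss: slow tape recall
            self.files[name] = data
            if len(self.files) > self.capacity:
                self.files.popitem(last=False)  # evict least recently used file
            return data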
A major goal of future dCache development will be to allow users to define file Quality of Service (QoS) in a more flexible way than currently available. This will mean implementing what might be called a QoS rule engine responsible for registering and managing time-bound QoS transitions for files or storage units. In anticipation of this extension to existing dCache capabilities, the...
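Purely as an illustration of what such a rule might express (this is not dCache's actual configuration or API), a time-bound QoS transition could be pictured as follows:

    from dataclasses import dataclass

    @dataclass
    class QosRule:
        """Hypothetical time-bound QoS transition for a file or storage unit."""
        selector: str        # which files or storage units the rule applies to
        initial_qos: str     # QoS at ingest, e.g. replicated on disk
        target_qos: str      # QoS after the transition, e.g. tape only
        after_days: int      # when the transition becomes due

    rule = QosRule(selector="storage-unit:rawdata",
                   initial_qos="disk+tape",
                   target_qos="tape",
                   after_days=90)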
The High Luminosity phase of the LHC, which aims for a ten-fold increase in the luminosity of proton-proton collisions, is expected to start operation in eight years. An unprecedented scientific data volume at the multi-exabyte scale will be delivered to particle physics experiments at CERN. This amount of data has to be stored, and the corresponding technology must ensure fast and...
The European-funded ESCAPE project (Horizon 2020) aims to address computing challenges in the context of the European Open Science Cloud. The project targets Particle Physics and Astronomy facilities and research infrastructures, focusing on the development of solutions to handle Exabyte-scale datasets. The science projects in ESCAPE are in different phases of evolution and count a variety of...
The CERN IT Storage Group ensures the symbiotic development and operations of storage and data transfer services for all CERN physics data, in particular the data generated by the four LHC experiments (ALICE, ATLAS, CMS and LHCb). In order to accomplish the objectives of the next run of the LHC (Run-3), the Storage Group has undertaken a thorough analysis of the experiments’...
The CERN Tape Archive (CTA) provides a tape backend to disk systems and, in conjunction with EOS, manages the data of the LHC experiments at CERN.
Magnetic tape storage offers the lowest cost per unit volume today, followed by hard disks and flash. In addition, current tape drives deliver solid bandwidth (typically 360 MB/s per device), but at the cost of high latencies, both for...
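To put the quoted drive bandwidth in perspective, a short calculation (the 1 TB recall size and the mount/seek latency are illustrative assumptions, not CTA figures):

    # Illustrative tape-access timing based on the quoted 360 MB/s drive bandwidth.
    drive_bandwidth = 360e6      # bytes per second, per drive (from the abstract)
    file_size = 1e12             # assumed 1 TB recall
    mount_and_seek = 120         # assumed seconds of latency before data flows

    streaming_time = file_size / drive_bandwidth
    print(f"streaming 1 TB: ~{streaming_time / 60:.0f} min "
          f"(+ ~{mount_and_seek} s mount/positioning latency)")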
In the HEP community, software plays a central role in the operation of experiments’ facilities and in reconstruction jobs, with CVMFS being the service that enables the distribution of software at scale. In view of the High-Luminosity LHC, CVMFS developers have investigated how to improve the publication workflow to support the most demanding use cases. This paper reports on recent CVMFS developments...
CERNBox is the cloud collaboration hub at CERN. The service has more than 37,000 user accounts. The backup of user and project data is critical for the service. The underlying storage system hosts over a billion files, which amount to 12 PB of storage distributed over several hundred disks with a two-replica RAIN layout. Performing a backup operation over this vast amount of data is a...
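For a rough sense of scale, and assuming the 12 PB figure refers to the total stored volume, the numbers above imply an average file size on the order of 10 MB:

    # Back-of-the-envelope average file size from the quoted CERNBox figures.
    total_bytes = 12e15          # ~12 PB of storage
    total_files = 1e9            # over a billion files

    print(f"average file size ~ {total_bytes / total_files / 1e6:.0f} MB")  # ~12 MB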
In 2016, CERN decided to phase out the legacy OpenAFS storage service due to concerns about the upstream project's longevity and the potential impact of a disorderly service stop on CERN's computing services. In early 2019, the risks of the OpenAFS project collapsing were reassessed and several early concerns were allayed. In this paper we recap the work done so far, highlight some of the...
Containers have become the de facto standard for packaging and distributing modern applications and their dependencies. The HEP community shows increasing interest in this technology, with scientists encapsulating their analysis workflow and code inside a container image. The analysis is first validated on a small dataset with minimal hardware resources, and then run at scale on the massive...
This paper presents the experience in providing CERN users with direct online access to their EOS/CERNBox-powered user storage from Windows. In production for about 15 months, a highly available Samba cluster is regularly used by a significant fraction of the CERN user base, following the migration of their central home folders from Microsoft DFS in the context of CERN’s strategy to move...
Metadata management is one of the three major functional areas of scientific data management, along with replica management and workflow management. Metadata is the information describing the data stored in a data item, such as a file or an object; it includes the item's provenance, recording conditions, format and other attributes. MetaCat is a metadata management database designed and...
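As a purely illustrative example (not MetaCat's actual schema or query language), the kind of per-file metadata described above can be pictured as a small attribute record that queries filter on; all field names here are assumptions:

    # Illustrative metadata record and filter; field names are assumptions.
    record = {
        "name": "run001234_raw.root",
        "provenance": {"parent": "daq-stream-3", "producer": "reco-v2"},
        "conditions": {"run": 1234, "beam_energy_gev": 6500},
        "format": "ROOT",
        "size_bytes": 2_147_483_648,
    }

    def matches(md, **criteria):
        """Return True if the flat criteria match top-level metadata fields."""
        return all(md.get(key) == value for key, value in criteria.items())

    print(matches(record, format="ROOT"))   # True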
Over the last decades, several data preservation efforts have been undertaken by the HEP community, as experiments are not repeatable and their data are consequently considered unique. ARCHIVER is a European Commission (EC) co-funded Horizon 2020 pre-commercial procurement project procuring R&D that combines multiple ICT technologies, including data-intensive scalability, network, service...
Over the last two decades, ROOT TTree has been used to store over one exabyte of High-Energy Physics (HEP) events. The TTree columnar on-disk layout has proved ideal for analyses of HEP data, which typically require access to many events but only a subset of the information stored for each of them. Future accelerators, and particularly the HL-LHC, will bring an increase of at least...
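A minimal NumPy sketch of the columnar idea: when events are stored column-wise, an analysis that needs only one attribute reads just that array instead of deserializing every complete event. The attribute names and sizes below are illustrative and do not reproduce the TTree or RNTuple format itself:

    import numpy as np

    n_events = 1_000_000

    # Row-wise: every event is a full record; reading one field touches all of it.
    events = np.zeros(n_events, dtype=[("pt", "f4"), ("eta", "f4"), ("phi", "f4")])

    # Column-wise: each attribute is its own contiguous array (as in TTree branches),
    # so selecting on a single quantity only reads that column.
    pt = np.zeros(n_events, dtype="f4")
    selected = pt > 25.0      # touches 4 MB instead of the full 12 MB of records
    print(selected.sum())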