Conveners
Track 4 – Data Organisation, Management and Access: community input, experiments and new perspectives
- Xavier Espinal (CERN)
Track 4 – Data Organisation, Management and Access: distributed computing software and beyond
- Brian Paul Bockelman (University of Nebraska Lincoln (US))
Track 4 – Data Organisation, Management and Access: Data transfer, data access and storage QoS
- Tigran Mkrtchyan (DESY)
Track 4 – Data Organisation, Management and Access: Caching
- Alessandra Forti (University of Manchester (GB))
Track 4 – Data Organisation, Management and Access: workflows and infrastructures
- Xavier Espinal (CERN)
Track 4 – Data Organisation, Management and Access: storage systems
- Brian Paul Bockelman (University of Nebraska Lincoln (US))
Track 4 – Data Organisation, Management and Access: storage systems evolution and challenges
- Alessandra Forti (University of Manchester (GB))
We will describe a component of the Intelligent Data Delivery Service being developed in collaboration with IRIS-HEP and the LHC experiments. ServiceX is an experiment-agnostic service to enable on-demand data delivery specifically tailored for nearly-interactive vectorized analysis. This work is motivated by the data engineering challenges posed by HL-LHC data volumes and the increasing...
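As a rough illustration of the vectorized analysis style targeted here (a hypothetical sketch in plain NumPy, not the ServiceX interface itself), a selection over delivered columnar arrays can be written without an explicit event loop:

    # Minimal sketch (not the ServiceX API): vectorized selection over a
    # delivered columnar event batch. Field names are hypothetical.
    import numpy as np

    # Pretend these columns were delivered on demand by a data-delivery service.
    pt  = np.array([42.1, 17.3, 88.0, 5.6])   # muon transverse momentum [GeV]
    eta = np.array([0.3, -1.2, 2.1, 0.7])     # muon pseudorapidity

    # Whole-array (vectorized) selection instead of a per-event loop.
    mask = (pt > 20.0) & (np.abs(eta) < 2.4)
    print("selected muons:", pt[mask])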
We will report on the status of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) after its fourth year. OSiRIS is delivering a distributed Ceph storage infrastructure coupled together with software-defined networking to support multiple science domains across Michigan’s three largest research universities. The project’s goal is to provide a single scalable, distributed storage...
The Belle II experiment started taking physics data in March 2019, with an estimated dataset of order 60 petabytes expected by the end of operations in the mid-2020s. Originally designed as a fully integrated component of the BelleDIRAC production system, the Belle II distributed data management (DDM) software needs to manage data across 70 storage elements worldwide for a collaboration of...
A new bookkeeping system called Jiskefet is being developed for A Large Ion Collider Experiment (ALICE) during Long Shutdown 2, to be in production until the end of LHC Run 4 (2029).
Jiskefet unifies two functionalities. The first is gathering, storing and presenting metadata associated with the operations of the ALICE experiment. The second is tracking the asynchronous processing of the...
(On behalf of the JUNO collaboration)

The JUNO (Jiangmen Underground Neutrino Observatory) experiment is designed to determine the neutrino mass hierarchy and precisely measure oscillation parameters with an unprecedented energy resolution of 3% at 1 MeV. It is composed of a 20 kton liquid scintillator central detector equipped with 18000 20” PMTs and 25000 3” PMTs, a water pool...
The ATLAS model for remote access to database resident information relies upon a limited set of dedicated and distributed Oracle database repositories complemented with the deployment of Frontier system infrastructure on the WLCG. ATLAS clients with network access can get the database information they need dynamically by submitting requests to a squid server in the Frontier network which...
ATLAS event processing requires access to centralized database systems where information about calibrations, detector status and data-taking conditions is stored. This processing is done at more than 150 computing sites on a worldwide computing grid, which access the database using the Squid-Frontier system. Some processing workflows have been found which overload the Frontier...
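For illustration only, the caching layer described above ultimately reduces to ordinary HTTP requests answered through a local Squid proxy; the sketch below (hypothetical hostnames and URL, not the actual ATLAS Frontier client) shows the basic access pattern:

    # Illustrative sketch only: Frontier clients ultimately issue HTTP GETs
    # that are cached by a local Squid proxy. Hostnames and URL are placeholders.
    import requests

    proxies = {"http": "http://squid.example.site:3128"}    # local Squid cache
    url = "http://frontier.example.org/Frontier/type=frontier_request:1:DEFAULT"

    resp = requests.get(url, proxies=proxies, timeout=30)
    resp.raise_for_status()
    print(resp.headers.get("X-Cache", "no cache header"), len(resp.content), "bytes")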
With increasing data volumes from Nuclear Physics experiments, the requirements on data storage and access are changing. To keep up with large data sets, new data formats are needed for efficient processing and analysis of the data. Frequently, experimental data goes through stages from data acquisition to reconstruction and data analysis, and is converted from one format to another...
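A hypothetical, minimal sketch of such a conversion step, turning row-oriented event records into a columnar layout better suited to analysis-stage processing (field names invented for illustration):

    # Hypothetical sketch: converting row-oriented reconstruction output into
    # a columnar layout that is friendlier to analysis-stage processing.
    import numpy as np

    rows = [  # one dict per event, as an earlier processing stage might emit
        {"run": 12, "event": 1, "energy": 4.7},
        {"run": 12, "event": 2, "energy": 9.1},
    ]

    # Columnar view: one contiguous array per field.
    columns = {key: np.array([r[key] for r in rows]) for key in rows[0]}
    print(columns["energy"].mean())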
For almost 10 years now, XRootD has been very successful at facilitating data management for the LHC experiments. Being the foundation and main component of numerous solutions employed within the WLCG collaboration (such as EOS and DPM), XRootD has grown into one of the most important storage technologies in the High Energy Physics (HEP) community. With the latest major release (5.0.0), the XRootD framework...
The File Transfer Service, developed at CERN and in production since 2014, has become a fundamental component of the LHC experiments' workflows.
Starting from the beginning of 2018 with the participation to the EU project Extreme Data Cloud (XDC) [1] and the activities carried out in the context of the DOMA TPC [2] and QoS [3] working groups, a series of new developments and improvements has been...
The XRootD software framework is essential for data access at WLCG sites. The WLCG community is exploring and expanding XRootD functionality. This presents a particular challenge at the RAL Tier-1 as the Echo storage service is a Ceph based Erasure Coded object store. External access to Echo uses gateway machines which run GridFTP and XRootD servers. This paper will describe how third party...
When the LHC started data taking in 2009, the data rates were unprecedented for the time and forced the WLCG community to develop a range of tools for managing their data across many different sites. A decade later, other science communities are finding that their data requirements have grown far beyond what they can easily manage and are looking for help. The RAL Tier-1’s primary mission has always...
The “Third Party Copy” (TPC) Working Group in the WLCG’s “Data Organization, Management, and Access” (DOMA) activity was proposed during a CHEP 2018 Birds of a Feather session in order to help organize the work toward developing alternatives to the GridFTP protocol. Alternate protocols enable the community to diversify; explore new approaches such as alternate authorization mechanisms; and...
Since its earliest days, the Worldwide LHC Computing Grid (WLCG) has relied on GridFTP to transfer data between sites. The announcement that Globus is dropping support of its open-source Globus Toolkit (GT), which forms the basis for several FTP clients and servers, has created an opportunity to re-evaluate the use of FTP. HTTP-TPC, an extension to HTTP compatible with WebDAV, has arisen...
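A hedged sketch of what an HTTP-TPC "pull" transfer request looks like from the client side (placeholder URLs and tokens; the header names follow the DOMA TPC convention, but the working group documentation is the authoritative reference):

    # Sketch of an HTTP-TPC "pull" request: the client asks the destination
    # endpoint to copy from a source URL. URLs and tokens are placeholders.
    import requests

    destination = "https://dest.example.org/store/user/file.root"
    source      = "https://src.example.org/store/user/file.root"

    headers = {
        "Source": source,                                   # ask the destination to pull
        "Authorization": "Bearer <token-for-destination>",  # placeholder credential
        # Credential the destination should present to the source:
        "TransferHeaderAuthorization": "Bearer <token-for-source>",
    }

    resp = requests.request("COPY", destination, headers=headers, timeout=60)
    print(resp.status_code)  # progress markers stream back in the response body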
A Third Party Copy (TPC) has existed in the pure XRootD storage environment for many years. However using XRootD TPC in the WLCG environment presents additional challenges due to the diversity of the storage systems involved such as EOS, dCache, DPM and ECHO, requiring that we carefully navigate the unique constraints imposed by these storage systems and their site-specific environments...
The anticipated increase in storage requirements for the forthcoming HL-LHC data rates is not matched by a corresponding increase in budget. This results in a shortfall in available resources if the computing models remain unchanged. Therefore, effort is being invested in looking for new and innovative ways to optimise the current infrastructure, thus minimising the impact of this...
Optimization of computing resources, and in particular storage, the costliest one, is a tremendous challenge for the High Luminosity LHC (HL-LHC) program. Several avenues are being investigated to address the storage issues foreseen for HL-LHC. Our expectation is that savings can be achieved in two primary areas: optimization of the use of various storage types and reduction of the required...
HL-LHC will confront the WLCG community with enormous data storage, management and access challenges. These are as much technical as economical. In the WLCG-DOMA Access working group, members of the experiments and site managers have explored different models for data access and storage strategies to reduce cost and complexity, taking into account the boundary conditions given by our...
The envisaged storage and compute needs for the HL-LHC will be a factor of up to 10 above what can be achieved by the evolution of current technology within a flat budget. The WLCG community is studying possible technical solutions to evolve the current computing infrastructure in order to cope with the requirements; one of the main focuses is resource optimization, with the ultimate objective of improving...
Computing needs projections for the HL-LHC era (2026+), following the current computing models, indicate that much larger resource increases would be required than technology evolution at a constant budget could provide. Since the worldwide budget for computing is not expected to increase, many research activities have emerged to improve the performance of the LHC processing software...
The University of California system has excellent networking between all of its campuses as well as to a number of other universities in California, including Caltech, most of them connected at 100 Gbps. UCSD and Caltech have thus joined their disk systems into a single logical XCache system, with worker nodes from both sites accessing data from disks at either site. This setup has been in place...
A general problem faced by opportunistic users of grid computing is that delivering opportunistic cycles is much simpler than delivering opportunistic storage. In this project we show how we integrated XRootD caches placed on the internet backbone to simulate a content delivery network for general science workflows. We will show that for some workflows on LIGO, DUNE, and...
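A minimal, hypothetical sketch of the client-side idea behind such a cache network: prefer a nearby cache and fall back to the origin redirector if none responds (hostnames and port are placeholders, and this is not the production cache tooling):

    # Hypothetical sketch: pick a nearby cache for a read, falling back to the
    # origin redirector if no cache is reachable. Hostnames are placeholders.
    import socket

    CACHES = ["xcache-east.example.net", "xcache-west.example.net"]
    ORIGIN = "origin-redirector.example.org"

    def choose_endpoint(port=1094, timeout=2.0):
        """Return the first cache that accepts a TCP connection, else the origin."""
        for host in CACHES:
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return f"root://{host}:{port}/"
            except OSError:
                continue
        return f"root://{ORIGIN}:{port}/"

    print("reading via", choose_endpoint() + "/store/data/example.root")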
With the increase in storage needs on the HL-LHC horizon, data management and access will be very challenging for this critical service. The evaluation of possible solutions within the DOMA, DOMA-FR (the IN2P3 project contribution to DOMA) and ESCAPE initiatives is a major activity for selecting the most suitable ones from the experiment and site points of view. The LAPP and LPSC teams have put...
Data movement between sites, replication and storage are very expensive operations, in terms of time and resources, for the LHC collaborations, and are expected to be even more so in the future. In this work we derived usage patterns based on traces and logs from the data and workflow management systems of CMS and ATLAS, and simulated the impact of different caching and data lifecycle...
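As a sketch of the kind of trace-driven study described here (not the actual simulation code used in this work), a least-recently-used cache can be replayed over an access trace to estimate the byte hit rate:

    # Minimal sketch: replay an access trace through an LRU cache of fixed
    # capacity and report the byte hit rate.
    from collections import OrderedDict

    def simulate_lru(trace, capacity_bytes):
        cache, used, hit_bytes, total_bytes = OrderedDict(), 0, 0, 0
        for name, size in trace:                        # (file name, file size)
            total_bytes += size
            if name in cache:
                cache.move_to_end(name)
                hit_bytes += size
                continue
            while used + size > capacity_bytes and cache:
                _, evicted = cache.popitem(last=False)  # evict least recently used
                used -= evicted
            cache[name] = size
            used += size
        return hit_bytes / total_bytes if total_bytes else 0.0

    trace = [("a", 10), ("b", 20), ("a", 10), ("c", 25), ("b", 20)]
    print(f"byte hit rate: {simulate_lru(trace, capacity_bytes=40):.2f}")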
The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites worldwide; currently, about 200 PB of disk storage and 250 PB of tape storage are used in total.
Data have different access characteristics due to various computational workflows. Raw data is only processed about once per year, whereas derived data are accessed...
Following a thorough review in 2018, the CMS experiment at the CERN LHC decided to adopt Rucio as its new data management system. Rucio is emerging as a community software project and will replace an aging CMS-only system before the start-up of LHC Run 3 in 2021. Rucio was chosen after an evaluation determined that Rucio could meet the technical and scale needs of CMS. The data management...
The Dynafed data federator is designed to present a dynamic and unified view of a distributed file repository. We describe our use of Dynafed to construct a production-ready WLCG storage element (SE) using existing Grid storage endpoints as well as object storage. In particular, Dynafed is used as the primary SE for the Canadian distributed computing cloud systems. Specifically, we have been...
As a data-intensive computing application, high-energy physics requires storage and computing for large amounts of data at the PB level. Performance demands and data-access imbalances in mass storage systems are increasing. Specifically, on the one hand, traditional cheap disk storage systems have been unable to handle services with high IOPS demands. On the other hand, a survey found that only a very...
The European-funded ESCAPE project will prototype a shared solution to computing challenges in the context of the European Open Science Cloud. It targets Astronomy and Particle Physics facilities and research infrastructures and focuses on developing solutions for handling Exabyte scale datasets.
The DIOS work package aims at delivering a Data Infrastructure for Open Science. Such an...
Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. For example, access libraries...
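For example, the high-level slicing abstraction looks like this with HDF5 via h5py (file and dataset names invented for illustration); the library reads only the requested hyperslab, and how efficiently it does so depends on the storage backend:

    # Example of the high-level slicing abstraction discussed above, using h5py.
    # File and dataset names are hypothetical.
    import h5py
    import numpy as np

    with h5py.File("events.h5", "w") as f:          # create a small demo file
        f.create_dataset("hits/energy", data=np.random.rand(1000, 16))

    with h5py.File("events.h5", "r") as f:
        dset = f["hits/energy"]
        block = dset[100:200, :4]                    # library reads only this slab
        print(block.shape, block.mean())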
The dCache project provides open-source software deployed internationally to satisfy ever more demanding storage requirements of various scientific communities. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high-throughput data ingest, through wide access and easy integration with existing systems, including event driven...
The DOMA activities gave the opportunity for DPM to contribute to the WLCG plans for Run 3 and beyond. Here we identify the themes that are relevant to site storage systems and explain how the approaches chosen in DPM are relevant for features like scalability, third party copy, bearer tokens, multi-site deployments and volatile caching pools.
We will also discuss the status of the...
High energy physics (HEP) experiments produce large amounts of data, which are usually stored and processed at distributed sites. Nowadays, distributed data management systems face challenges such as providing a global file namespace and efficient data access. Focusing on these problems, this paper proposes a cross-domain data access file system (CDFS), a data cache and access system across...
During 2019 and 2020, the CERN tape archive (CTA) will receive new data from LHC experiments and import existing data from CASTOR, which will be phased out for LHC experiments before Run 3.
This contribution will present the status of CTA as a service, its integration with EOS and FTS, and the data flow chains of the LHC experiments.
The latest enhancements and additions to the...
In November 2018, the KISTI Tier-1 centre started a project to design, develop and deploy a disk-based custodial storage system with an error rate and reliability compatible with tape-based storage. This project has been conducted as a collaboration between KISTI and CERN; in particular, the initial system design was laid out in intensive discussions with CERN IT and ALICE. The initial system...
The CERN IT Storage group operates multiple distributed storage systems to support all CERN data storage requirements: the physics data generated by LHC and non-LHC experiments; object and file storage for infrastructure services; block storage for the CERN cloud system; filesystems for general use and specialized HPC clusters; content distribution filesystem for software distribution and...
The S3 service at CERN (S3.CERN.CH) is a horizontally scalable object storage system built with a flexible number of virtual RADOS Gateways on top of a conventional Ceph cluster. A Traefik load balancing frontend (operated via Nomad and Consul) redirects HTTP traffic to the RGW backends, and LogStash publishes to ElasticSearch for monitoring the user traffic. User and quota management is...
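A short sketch of how a client might talk to such an S3-compatible RADOS Gateway frontend using boto3 (the bucket name and credentials are placeholders):

    # Sketch of a client talking to an S3-compatible RADOS Gateway endpoint
    # with boto3. Bucket name and credentials are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.cern.ch",       # service name from the abstract
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello object store")
    obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
    print(obj["Body"].read())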
EOS is the main storage system at CERN providing hundreds of PB of capacity to both physics experiments and also regular users of the CERN infrastructure. Since its first deployment in 2010, EOS has evolved and adapted to the challenges posed by ever increasing requirements for storage capacity, user friendly POSIX-like interactive experience and new paradigms like collaborative applications...
The storage group of CERN IT operates more than 20 individual EOS storage services with a raw data storage volume of more than 280 PB. Storage space is a major cost factor in HEP computing, and the planned future LHC Runs 3 and 4 will increase storage space demands by at least an order of magnitude.
A cost-effective storage model providing durability is erasure coding (EC). The decommissioning of...
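For orientation, the raw-capacity argument for EC can be made with one line of arithmetic; the (data, parity) layouts below are illustrative examples, not necessarily the EOS configuration:

    # Illustrative arithmetic (example layouts, not necessarily the EOS setup):
    # raw capacity stored per usable byte for erasure coding vs replication.
    def ec_overhead(k, m):
        """Raw bytes per usable byte for a k-data + m-parity layout."""
        return (k + m) / k

    print("2 replicas     :", 2.0)                 # plain replication
    print("EC 10+2 layout :", ec_overhead(10, 2))  # 1.2x raw per usable byte
    print("EC 8+3 layout  :", ec_overhead(8, 3))   # 1.375x, tolerates 3 losses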
The ATLAS Event Index was designed in 2012-2013 to provide a global event catalogue and limited event-level metadata for ATLAS analysis groups and users during LHC Run 2 (2015-2018). It provides a good and reliable service for the initial use cases (mainly event picking) and several additional ones, such as production consistency checks, duplicate event detection and measurements of the...
This paper describes the work done to test the performance of several current data transfer protocols. The work was carried out as part of the AENEAS Horizon 2020 project in collaboration with the DOMA project and investigated the interactions between the application, the transfer protocol, TCP/IP and the network elements. When operational, the two telescopes in Australia and South Africa that...
Data in HEP are usually stored in tuples (tables), trees, nested tuples (trees of tuples) or relational (SQL-like) databases, with or without a defined schema. But many of our data have a graph structure without a schema, or with a weakly imposed schema. They consist of entities with relations, some of which are known in advance, but many are created later, as needs evolve. Such structures are...
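A hedged sketch of such a schema-less graph of entities and relations, using networkx (the entities and relation labels are invented examples):

    # Sketch of a schema-less entity/relation graph as described above, using
    # networkx. Entities and relation labels are invented examples.
    import networkx as nx

    g = nx.MultiDiGraph()                       # allow several relations per pair
    g.add_edge("run_358031", "dataset_A", relation="produced")
    g.add_edge("dataset_A", "paper_X", relation="used_by")          # added later, no schema change
    g.add_edge("dataset_A", "calib_v3", relation="calibrated_with")

    for src, dst, data in g.out_edges("dataset_A", data=True):
        print(src, f"--{data['relation']}-->", dst)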