Conveners
T4 - Data handling: S1
- Tigran Mkrtchyan (A.Alikhanyan National Science Laboratory (AM))
- Tigran Mkrtchyan (DESY)
T4 - Data handling: S2
- Tigran Mkrtchyan (DESY)
- Tigran Mkrtchyan (A.Alikhanyan National Science Laboratory (AM))
T4 - Data handling: S3
- Costin Grigoras (CERN)
T4 - Data handling: S4
- Costin Grigoras (CERN)
T4 - Data handling: S5
- Costin Grigoras (CERN)
T4 - Data handling: S6
- Elizabeth Gallas (University of Oxford (GB))
T4 - Data handling: S7
- Elizabeth Gallas (University of Oxford (GB))
Patrick Meade (University of Wisconsin-Madison) | 09/07/2018, 11:00 | Track 4 - Data Handling | presentation
IceCube is a cubic kilometer neutrino detector located at the South Pole. Every year, 29 TB of data are transmitted via satellite, and 365 TB of data are shipped on archival media, to the data warehouse in Madison, WI, USA. The JADE Long Term Archive (JADE-LTA) software indexes and bundles IceCube files and transfers the archive bundles for long term storage and preservation into tape silos...
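The indexing-and-bundling step described above can be pictured as grouping files into size-capped archive bundles while keeping an index for later retrieval. The sketch below is purely illustrative, not the actual JADE-LTA code; the bundle size cap and the choice of SHA-512 checksums are assumptions.

```python
import hashlib
from pathlib import Path

# Illustrative cap per archive bundle; the real JADE-LTA policy is not
# given in the abstract.
BUNDLE_CAP = 500 * 10**9

def make_bundles(files, cap=BUNDLE_CAP):
    """Group files into bundles of at most `cap` bytes and return an
    index: bundle id -> list of (path, size, sha512) for retrieval."""
    index, current, current_size, bundle_id = {}, [], 0, 0
    for path in map(Path, files):
        size = path.stat().st_size
        if current and current_size + size > cap:
            index[bundle_id] = current          # close the full bundle
            bundle_id, current, current_size = bundle_id + 1, [], 0
        digest = hashlib.sha512(path.read_bytes()).hexdigest()
        current.append((str(path), size, digest))
        current_size += size
    if current:
        index[bundle_id] = current              # close the last bundle
    return index
```

The index is what makes the archive usable: a file can later be located by looking up which bundle holds it, without scanning the tape contents.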
Michael Davis (CERN) | 09/07/2018, 11:15 | Track 4 - Data Handling | presentation
The first production version of the CERN Tape Archive (CTA) software is planned for release by the end of 2018. CTA is designed to replace CASTOR as the CERN tape archive solution, in order to face the scalability and performance challenges arriving with LHC Run-3.
This contribution will describe the main commonalities and differences of CTA with CASTOR. We outline the functional enhancements...
Tigran Mkrtchyan (DESY) | 09/07/2018, 11:30 | Track 4 - Data Handling | presentation
The dCache project provides open-source storage software deployed internationally to satisfy ever more demanding scientific storage requirements. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high throughput data ingest, through wide access and easy integration with existing systems.
In supporting new communities, such as...
Dr Doris Ressmann (KIT) | 09/07/2018, 11:45 | Track 4 - Data Handling | presentation
Tape storage is still a cost-effective way to keep large amounts of data over a long period of time. It is expected that this will continue in the future. The GridKa tape environment is a complex system of many hardware components and software layers. Configuring this system for optimal performance for all use cases is a non-trivial task and requires a lot of experience. We present the current...
Valentin Y Kuznetsov (Cornell University (US)) | 09/07/2018, 12:00 | Track 4 - Data Handling | presentation
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this talk we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system...
Julia Andreeva (CERN) | 09/07/2018, 12:15 | Track 4 - Data Handling | presentation
The WLCG computing infrastructure provides distributed storage capacity hosted at the geographically dispersed computing sites.
In order to effectively organize storage and processing of the LHC data, the LHC experiments require a reliable and complete overview of the storage capacity in terms of the occupied and free space, the storage shares allocated to different computing activities, and...
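The overview described above amounts to aggregating per-site storage reports into occupied/free totals per computing activity. A minimal sketch of that aggregation, assuming a simple report layout (site dictionaries with per-activity shares) that is invented for illustration:

```python
def summarise_capacity(site_reports):
    """Aggregate per-site storage reports into occupied/free totals
    per computing activity, giving the combined overview the
    experiments need."""
    totals = {}
    for report in site_reports:
        for share in report["shares"]:
            t = totals.setdefault(share["activity"],
                                  {"occupied": 0, "free": 0})
            t["occupied"] += share["occupied"]  # sum over all sites
            t["free"] += share["free"]
    return totals
```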
Samuel Cadellin Skipsey | 09/07/2018, 14:00 | Track 4 - Data Handling | presentation
Pressures from both WLCG VOs and externalities have led to a desire to "simplify" data access and handling for Tier-2 resources across the Grid. This has mostly been imagined in terms of reducing book-keeping for VOs, and the total replicas needed across sites. One common direction of motion is to increase the amount of remote access to data for jobs, which is also seen as enabling the...
Dr Teng LI (University of Edinburgh) | 09/07/2018, 14:15 | Track 4 - Data Handling | presentation
The XCache (XRootD Proxy Cache) provides a disk-based caching proxy for data access via the XRootD protocol. This can be deployed at WLCG Tier-2 computing sites to provide a transparent cache service for the optimisation of data access, placement and replication.
We will describe the steps to enable full read/write operations to storage endpoints consistent with the distributed data...
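The transparent caching behaviour described above — serve a file locally when it is already cached, otherwise fetch it from the origin and keep a copy — can be sketched generically. This is not XCache or XRootD code; the class and its interface are invented to illustrate the read-through pattern.

```python
from pathlib import Path

class ReadThroughCache:
    """Minimal read-through cache: serve a file from local disk when
    present, otherwise fetch it from the remote origin and keep a copy
    so later reads stay local."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch            # callable: name -> bytes (remote read)
        self.hits = self.misses = 0

    def read(self, name):
        local = self.cache_dir / name
        if local.exists():            # cache hit: no remote traffic
            self.hits += 1
            return local.read_bytes()
        self.misses += 1
        data = self.fetch(name)       # cache miss: go to the origin
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)       # transparent caching for next time
        return data
```

The point of the design is that clients see one interface regardless of where the bytes actually come from, which is what makes the cache "transparent" to jobs.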
Christoph Heidecker (KIT - Karlsruhe Institute of Technology (DE)) | 09/07/2018, 14:30 | Track 4 - Data Handling | presentation
High throughput and short turnaround cycles are core requirements for the efficient processing of I/O-intense end-user analyses. Together with the tremendously increasing amount of data to be processed, this leads to enormous challenges for HEP storage systems, networks and the data distribution to end-users. This situation is further compounded by taking into account opportunistic resources...
Zbigniew Baranowski (CERN) | 09/07/2018, 14:45 | Track 4 - Data Handling | presentation
The interest in using Big Data solutions based on the Hadoop ecosystem is constantly growing in the HEP community. This drives the need for increased reliability and availability of the central Hadoop service and underlying infrastructure provided to the community by the CERN IT department.
This contribution will report on the overall status of the Hadoop platform and the recent enhancements and...
Dirk Duellmann (CERN) | 09/07/2018, 15:00 | Track 4 - Data Handling | presentation
The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as the back-end for general end-user storage (e.g. home directories/CERNBox). The disk failure metrics collected over a period of one year from a deployment of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use-cases. In this...
Elizabeth Gallas (University of Oxford (GB)) | 09/07/2018, 15:15 | Track 4 - Data Handling | presentation
Processing ATLAS event data requires a wide variety of auxiliary information from geometry, trigger, and conditions database systems. This information is used to dictate the course of processing and refine the measurement of particle trajectories and energies to construct a complete and accurate picture of the remnants of particle collisions. Such processing occurs on a worldwide computing...
Holger Schulz (Fermilab) | 09/07/2018, 15:30 | Track 4 - Data Handling | presentation
In their measurement of the neutrino oscillation parameters (PRL 118, 231801 (2017)), NOvA uses a sample of approximately 27 million reconstructed spills to search for electron-neutrino appearance events. These events are stored in an n-tuple format, in 180 thousand ROOT files. File sizes range from a few hundred KiB to a few MiB; the full dataset is approximately 3 TiB. These millions of...
Mario Lassnig (CERN) | 09/07/2018, 15:45 | Track 4 - Data Handling | presentation
With the LHC High Luminosity upgrade the workload and data management systems are facing new major challenges. To address those challenges ATLAS and Google agreed to cooperate on a project to connect Google Cloud Storage and Compute Engine to the ATLAS computing environment. The idea is to allow ATLAS to explore the use of different computing models, to allow ATLAS user analysis to benefit...
Lorenzo Rinaldi (Universita e INFN, Bologna (IT)) | 10/07/2018, 11:00 | Track 4 - Data Handling | presentation
The ATLAS experiment is approaching mid-life: the long shutdown period (LS2) between LHC Run 2 (ending in 2018) and the future collision data-taking of Runs 3 and 4 (starting in 2021). In advance of LS2, we have been assessing the future viability of existing computing infrastructure systems. This will permit changes to be implemented in time for Run 3. In systems with broad impact...
Lynn Wood (Pacific Northwest National Laboratory, USA) | 10/07/2018, 11:15 | Track 4 - Data Handling | presentation
The Belle II experiment at KEK is preparing for first collisions in early 2018. Processing the large amounts of data that will be produced requires conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. This was accomplished by relying on industry-standard tools and methods: the conditions database...
Dave Dykstra (Fermi National Accelerator Lab. (US)) | 10/07/2018, 11:30 | Track 4 - Data Handling | presentation
LHC experiments make extensive use of Web proxy caches, especially for software distribution via the CernVM File System and for conditions data via the Frontier Distributed Database Caching system. Since many jobs read the same data, cache hit rates are high and hence most of the traffic flows efficiently over Local Area Networks. However, it is not always possible to have local Web caches,...
A new mechanism to use the Conditions Database REST API to serve the ATLAS detector description | Alessandro De Salvo (Sapienza Universita e INFN, Roma I (IT)) | 10/07/2018, 11:45 | Track 4 - Data Handling | presentation
An efficient and fast access to the detector description of the ATLAS experiment is needed for many tasks, at different steps of the data chain: from detector development to reconstruction, from simulation to data visualization. Until now, the detector description was only accessible through dedicated services integrated into the experiment's software framework, or by the usage of external...
Marco Clemencic (CERN) | 10/07/2018, 12:00 | Track 4 - Data Handling | presentation
LHCb has been using the CERN/IT developed Conditions Database library COOL for several years, during LHC Run 1 and Run 2. With the opportunity window of the second long shutdown of LHC, in preparation for Run 3 and the upgraded LHCb detector, we decided to investigate alternatives to COOL as Conditions Database backend. In particular, given our conditions and detector description data model,...
Yaodong Cheng (Chinese Academy of Sciences (CN)) | 10/07/2018, 12:15 | Track 4 - Data Handling | presentation
The Beijing Spectrometer (BESIII) experiment has produced hundreds of billions of events. It has collected the world's largest data samples of J/ψ, ψ(3686), ψ(3770) and ψ(4040) decays. The typical branching fractions for interesting physics channels are of the order of O(10^-3). The traditional event-wise accessing of BOSS (Bes Offline Software System) is not effective for the selective accessing...
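With branching fractions of O(10^-3), reading every event to find a handful of candidates wastes most of the I/O; an event-tag index lets a job touch only the matching events. A schematic illustration of the idea (the function names and event layout are invented, not the BOSS API):

```python
def build_tag_index(events, tag_of):
    """Map each tag value to the positions of the events carrying it,
    so a later selection never scans the full sample."""
    index = {}
    for pos, event in enumerate(events):
        index.setdefault(tag_of(event), []).append(pos)
    return index

def select_events(events, index, tag):
    """Selective access: read only the events whose tag matches."""
    return [events[pos] for pos in index.get(tag, [])]
```

For a channel with a 10^-3 branching fraction, the selection reads roughly a thousandth of the events an exhaustive scan would touch.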
Rob Appleyard (STFC) | 10/07/2018, 14:00 | Track 4 - Data Handling | presentation
Since February 2017, the RAL Tier-1 has been storing production data from the LHC experiments on its new Ceph-backed object store called Echo. Echo has been designed to meet the data demands of LHC Run 3 and should scale to meet the challenges of HL-LHC. Echo is already providing better overall throughput than the service it will replace (CASTOR) even with significantly less hardware...
Mr Tigran Mkrtchyan (DESY) | 10/07/2018, 14:15 | Track 4 - Data Handling | presentation
The life cycle of scientific data is well defined: data is collected, then processed, archived and finally deleted. Data is never modified. The original data is used or new, derived data is produced: Write Once Read Many times (WORM). With this model in mind, dCache was designed to handle immutable files as efficiently as possible. Currently, data replication, HSM connectivity and...
Fabrizio Furano (CERN) | 10/07/2018, 14:30 | Track 4 - Data Handling | presentation
The DPM (Disk Pool Manager) system is a multiprotocol scalable technology for Grid storage that supports about 130 sites for a total of about 90 Petabytes online.
The system has recently completed the development phase that had been announced in the past years, which consolidates its core component (DOME: Disk Operations Management Engine) as a full-featured high performance engine that can...
Herve Rousseau (CERN) | 10/07/2018, 14:45 | Track 4 - Data Handling | presentation
The CERN IT Storage group operates multiple distributed storage systems and is responsible for supporting the infrastructure that accommodates all CERN storage requirements, from the physics data generated by LHC and non-LHC experiments to personal user files. EOS is now the key component of the CERN storage strategy. It allows operation at high incoming throughput for experiment...
Andrea Manzi (CERN) | 10/07/2018, 15:00 | Track 4 - Data Handling | presentation
The EOS namespace has outgrown its legacy in-memory implementation, presenting the need for an alternative solution. In response to this need we developed QuarkDB, a highly-available datastore capable of serving as the metadata backend for EOS. Even though the datastore was tailored to the needs of the namespace, its capabilities are generic.
We will present the overall system design, and our...
Hugo Gonzalez Labrador (CERN) | 10/07/2018, 15:15 | Track 4 - Data Handling | presentation
CERNBox is the CERN cloud storage hub. It allows synchronising and sharing files on all major desktop and mobile platforms (Linux, Windows, MacOSX, Android, iOS), aiming to provide universal access and offline availability to any data stored in the CERN EOS infrastructure.
With more than 12000 users registered in the system, CERNBox has responded to the high demand in our diverse community to...
Hugo Gonzalez Labrador (CERN) | 10/07/2018, 15:30 | Track 4 - Data Handling | presentation
In the last few years we have seen constant interest in technologies providing effective cloud storage for scientific use, matching the requirements of price, privacy and scientific usability. This interest is not limited to HEP and extends to other scientific fields due to the rapid growth of data: for example, "big data" is a characteristic of modern genomics, energy and financial...
Herve Rousseau (CERN) | 10/07/2018, 15:45 | Track 4 - Data Handling | presentation
The Ceph File System (CephFS) is a software-defined network filesystem built upon the RADOS object store. In the Jewel and Luminous releases, CephFS was labeled as production ready with horizontally scalable metadata performance. This paper seeks to evaluate that statement in relation to both the HPC and general IT infrastructure needs at CERN. We highlight the key metrics required by four...
Nicolo Magini (INFN e Universita Genova (IT)) | 11/07/2018, 11:30 | Track 4 - Data Handling | presentation
The ATLAS experiment is gradually transitioning from the traditional file-based processing model to dynamic workflow management at the event level with the ATLAS Event Service (AES). The AES assigns fine-grained processing jobs to workers and streams out the data in quasi-real time, ensuring fully efficient utilization of all resources, including the most volatile. The next major step in this...
Hasib Md (University of Delhi (IN)) | 11/07/2018, 11:45 | Track 4 - Data Handling | presentation
Alignment and calibration workflows in CMS require a significant operational effort, due to the complexity of the systems involved. To serve the variety of condition data management needs of the experiment, the alignment and calibration team has developed and deployed a set of web-based applications. The Condition DB Browser is the main portal to search, navigate and prepare a consistent set...
Thomas Maier (Ludwig Maximilians Universitat (DE)) | 11/07/2018, 12:00 | Track 4 - Data Handling | presentation
For high-throughput computing the efficient use of distributed computing resources relies on an evenly distributed workload, which in turn requires wide availability of input data that is used in physics analysis. In ATLAS, the dynamic data placement agent C3PO was implemented in the ATLAS distributed data management system Rucio; it identifies popular data and creates additional, transient...
Alvaro Fernandez Casani (Univ. of Valencia and CSIC (ES)) | 11/07/2018, 12:15 | Track 4 - Data Handling | presentation
The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at CERN Tier0, and at hundreds of grid sites, with a distributed data collection architecture using Object Stores to temporarily maintain the conveyed information, with references to them...
Brian Paul Bockelman (University of Nebraska Lincoln (US)) | 11/07/2018, 12:30 | Track 4 - Data Handling | presentation
GridFTP transfers and the corresponding Grid Security Infrastructure (GSI)-based authentication and authorization system have been data transfer pillars of the Worldwide LHC Computing Grid (WLCG) for more than a decade. However, in 2017, the end of support for the Globus Toolkit - the reference platform for these technologies - was announced. This has reinvigorated and expanded efforts to...
Brian Paul Bockelman (University of Nebraska Lincoln (US)) | 11/07/2018, 12:45 | Track 4 - Data Handling | presentation
Outside the HEP computing ecosystem, it is vanishingly rare to encounter user X509 certificate authentication (and proxy certificates are even more rare). The web never widely adopted the user certificate model, but increasingly sees the need for federated identity services and distributed authorization. For example, Dropbox, Google and Box instead use bearer tokens issued via the OAuth2...
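The bearer-token model mentioned above replaces client certificates with an HTTP `Authorization` header carried on each request. A minimal sketch using only the Python standard library; the URL and token value are placeholders:

```python
import urllib.request

def authorized_request(url, token):
    """Build an HTTP request carrying an OAuth2-style bearer token,
    the pattern used by services such as Dropbox, Google and Box."""
    req = urllib.request.Request(url)
    # The server validates the token instead of a client certificate.
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

Unlike an X509 handshake, nothing about the client's identity lives in the TLS layer; the token alone conveys the authorization, which is what makes the scheme easy to federate.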
Martin Barisits (CERN) | 12/07/2018, 11:00 | Track 4 - Data Handling | presentation
Rucio, the distributed data management system of the ATLAS collaboration, already manages more than 330 Petabytes of physics data on the grid. Rucio has seen incremental improvements throughout LHC Run-2 and is currently being prepared for the HL-LHC era of the experiment. Next to these improvements the system is currently evolving into a full-scale generic data management system for...
Janusz Martyniak | 12/07/2018, 11:15 | Track 4 - Data Handling | presentation
The SoLid experiment is a short-baseline neutrino project located at the BR2 research reactor in Mol, Belgium. It started data taking in November 2017. Data management, including long term storage, will be handled in close collaboration by VUB Brussels, Imperial College London and Rutherford Appleton Laboratory (RAL).
The data management system makes the data available for analysis on the...
Simone Campana (CERN) | 12/07/2018, 11:30 | Track 4 - Data Handling | presentation
The computing strategy document for HL-LHC identifies storage as one of the main WLCG challenges in one decade from now. In the naive assumption of applying today's computing model, the ATLAS and CMS experiments will need one order of magnitude more storage resources than what could be realistically provided by the funding agencies at the same cost of today. The evolution of the computing...
Ms Qiumei Ma (IHEP) | 12/07/2018, 11:45 | Track 4 - Data Handling | presentation
The BESIII experiment has been taking data for more than ten years, and about fifty thousand runs have been recorded, so managing such a large dataset is a big challenge for us. Over the years, we have created an efficient and complete data management system, including a MySQL database, a C++ API, a bookkeeping system, monitoring applications, etc. This talk focuses on the BESIII central database management system's...
Dr Malachi Schram (Pacific Northwest National Laboratory) | 12/07/2018, 12:00 | Track 4 - Data Handling | presentation
The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start taking physics data in early 2018 and aims to accumulate 50/ab, or approximately 50 times more data than the Belle experiment. The collaboration expects it will manage and process approximately 200 PB of data.
Computing at this scale requires efficient and coordinated use of the compute grids in North America,...
Patrick Meade (University of Wisconsin-Madison) | 12/07/2018, 12:15 | Track 4 - Data Handling | presentation
IceCube is a cubic kilometer neutrino detector located at the South Pole. Metadata for files in IceCube has traditionally been handled on an application-by-application basis, with no user-facing access. There has been no unified view of data files, and users often just ls the filesystem to locate files. Recently, effort has been put into creating such a unified view. Going for a simple...
Alastair Dewhurst (STFC-Rutherford Appleton Laboratory (GB)) | 12/07/2018, 14:00 | Track 4 - Data Handling | presentation
CVMFS has proved an extremely effective mechanism for providing scalable, POSIX-like access to experiment software across the Grid. The normal method for file access is http downloads via squid caches from a small number of Stratum 1 servers. In the last couple of years this mechanism has been extended to allow access of files from any storage offering http access. This has been named...
Silvio Pardi (INFN) | 12/07/2018, 14:15 | Track 4 - Data Handling | presentation
The implementation of cache systems in the computing model of HEP experiments accelerates access to hot data sets by scientists, opening new scenarios of data distribution and enabling the exploitation of the paradigm of storage-less sites.
In this work, we present a study for the creation of an http data-federation eco-system with caching functionality. By exploiting the volatile-pool concept...
Jan Erik Sundermann (Karlsruhe Institute of Technology (KIT)) | 12/07/2018, 14:30 | Track 4 - Data Handling | presentation
The computing center GridKa is serving the ALICE, ATLAS, CMS and LHCb experiments as one of the biggest WLCG Tier-1 centers worldwide with compute and storage resources. It is operated by the Steinbuch Centre for Computing at Karlsruhe Institute of Technology in Germany. In April 2017 a new online storage system was put into operation. In its current stage of expansion it offers the HEP...
Daniele Cesini (Universita e INFN, Bologna (IT)) | 12/07/2018, 14:45 | Track 4 - Data Handling | presentation
The development of data management services capable of coping with very large data resources is a key challenge to allow future e-infrastructures to address the needs of the next generation of extreme-scale scientific experiments.
To face this challenge, in November 2017 the H2020 "eXtreme DataCloud - XDC" project was launched. Lasting for 27 months and combining the expertise of 8 large...
Dr Marcus Ebert (University of Victoria) | 12/07/2018, 15:00 | Track 4 - Data Handling | presentation
The dynamic data federation software (Dynafed), developed by CERN IT, provides a federated storage cluster on demand using the HTTP protocol with WebDAV extensions. Traditional storage sites which support an experiment can be added to Dynafed without requiring any changes to the site. Dynafed also supports direct access to cloud storage such as S3 and Azure. We report on the usage of Dynafed...
Paul Millar (DESY) | 12/07/2018, 15:15 | Track 4 - Data Handling | presentation
Whatever the use case, for federated storage to work well some knowledge from each storage system must exist outside that system. This is needed to allow coordinated activity; e.g., executing analysis jobs on worker nodes with good accessibility to the data.
Currently, this is achieved by clients notifying central services of activity; e.g., a client notifies a replica catalogue after an...
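The current model described above — a client notifying a central replica catalogue after a successful transfer — can be sketched as a toy catalogue. The class and method names are invented for illustration; real systems add authentication, persistence and failure handling.

```python
class ReplicaCatalogue:
    """Toy central catalogue mapping a logical file name (LFN) to the
    storage endpoints known to hold a replica."""

    def __init__(self):
        self.replicas = {}

    def notify(self, lfn, endpoint):
        # Client-side notification after a successful transfer: this is
        # the knowledge that must exist outside the storage system.
        self.replicas.setdefault(lfn, set()).add(endpoint)

    def locate(self, lfn):
        """Where can a job read this file from?"""
        return sorted(self.replicas.get(lfn, ()))
```

The weakness the abstract hints at is visible even in the sketch: if the client crashes between the transfer and the notify call, the catalogue and the storage system silently disagree.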