CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis

10–14 Oct 2016

San Francisco Marriott Marquis

America/Los_Angeles timezone

Session

Track 4: Data Handling

10 Oct 2016, 11:00

GG C3 (San Francisco Marriott Marquis)

Track 4: Data Handling: Storage Middleware

Maria Girone (CERN)
Patrick Fuhrmann (DESY)

Track 4: Data Handling: Filesystems and Cloud Storage

Wahid Bhimji (Lawrence Berkeley National Lab. (US))
Maria Girone (CERN)

Track 4: Data Handling: Wider HEP and Beyond

Patrick Fuhrmann (DESY)
Maria Girone (CERN)

Track 4: Data Handling: Experiment Frameworks

Elizabeth Gallas (University of Oxford (GB))
Patrick Fuhrmann (DESY)

Track 4: Data Handling: Experiment Frameworks

Wahid Bhimji (Lawrence Berkeley National Lab. (US))
Elizabeth Gallas (University of Oxford (GB))

Track 4: Data Handling: Data Transfer, Caching and Federation

Maria Girone (CERN)
Wahid Bhimji (Lawrence Berkeley National Lab. (US))

41. DPM Evolution: a Disk Operations Management Engine for DPM

Oliver Keeble (CERN)

10/10/2016, 11:00

Track 4: Data Handling

Oral

The DPM (Disk Pool Manager) project is the most widely deployed solution for storage of large data repositories on Grid sites, and is completing the most important upgrade in its history, with the aim of bringing important new features, better performance and easier long-term maintainability.
Work has been done to make the so-called "legacy stack" optional, and substitute it with an advanced...
44. EOS Developments

Elvin Alin Sindrilaru (CERN)

10/10/2016, 11:15

Track 4: Data Handling

Oral

CERN has been developing and operating EOS as a disk storage solution successfully for 5 years. The CERN deployment provides 135 PB and stores 1.2 billion replicas distributed over two computer centres. Deployment includes four LHC instances, a shared instance for smaller experiments and since last year an instance for individual user data as well. The user instance represents the backbone of...
106. XROOT development update - support for metalinks and extreme copy

Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER)

10/10/2016, 11:30

Track 4: Data Handling

Oral

XRootD is a distributed, scalable system for low-latency file access. It is the primary data access framework for the high-energy physics community. One of the latest developments in the project has been to incorporate metalink and segmented file transfer technologies.
We report on the implementation of metalink metadata format support within the XRootD client. This includes both the CLI and...
528. dCache, managed Cloud Storage

Patrick Fuhrmann (DESY)

10/10/2016, 11:45

Track 4: Data Handling

Oral

Over the previous decade, high-performance, high-capacity Open Source storage systems have been designed and implemented to accommodate the demanding needs of the LHC experiments. However, with the general move away from the concept of local computer centers supporting their associated communities, towards large infrastructures providing Cloud-like solutions to a large variety of different...
533. Evaluation of ZFS as an efficient WLCG storage backend

Marcus Ebert (University of Edinburgh (GB))

10/10/2016, 12:00

Track 4: Data Handling

Oral

ZFS is a combination of file system, logical volume manager, and software RAID system developed by Sun Microsystems for the Solaris OS. ZFS simplifies the administration of disk storage, and on Solaris it has been well regarded for its high performance, reliability, and stability for many years. It is used successfully for enterprise storage administration around the globe, but so far on such...
281. dCache on steroids - delegated storage solutions

Tigran Mkrtchyan

10/10/2016, 14:00

Track 4: Data Handling

Oral

For over a decade, dCache.ORG has provided robust software that is used at more than 80 universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments and many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from all-in-one...
42. Making the most of cloud storage - a toolkit for exploitation by WLCG experiments.

Oliver Keeble (CERN)

10/10/2016, 14:15

Track 4: Data Handling

Oral

Understanding how cloud storage can be effectively used, either standalone or in support of its associated compute, is now an important consideration for WLCG.

We report on a suite of extensions to familiar tools targeted at enabling the integration of cloud object stores into traditional grid infrastructures and workflows. Notable updates include support for a number of object store...
556. The deployment of a large scale object store at the RAL Tier 1

Alastair Dewhurst (STFC - Rutherford Appleton Lab. (GB))

10/10/2016, 14:30

Track 4: Data Handling

Oral

Since 2014, the RAL Tier 1 has been working on deploying a Ceph-backed object store. The aim is to replace Castor for disk storage. This new service must be scalable to meet the data demands of the LHC to 2020 and beyond. As well as offering the access protocols the LHC experiments currently use, it must also provide industry-standard access protocols. In order to keep costs down the service...
66. CERN data services for LHC computing

Xavier Espinal Curull (CERN)

10/10/2016, 14:45

Track 4: Data Handling

Oral

Dependability, resilience, adaptability, and efficiency. Growing requirements require tailoring storage services and novel solutions. Unprecedented volumes of data coming from the detectors need to be quickly available in a highly scalable way for large-scale processing and data distribution while in parallel they are routed to tape for long-term archival. These activities are critical for the...
81. CERN's Ceph infrastructure: OpenStack, NFS, CVMFS, CASTOR, and more!

Xavier Espinal Curull (CERN)

10/10/2016, 15:00

Track 4: Data Handling

Oral

This work will present the status of Ceph-related operations and development within the CERN IT Storage Group: we summarise significant production experience at the petabyte scale as well as strategic developments to integrate with our core storage services. As our primary back-end for OpenStack Cinder and Glance, Ceph has provided reliable storage to thousands of VMs for more than 3 years;...
162. CEPHFS: a new generation storage platform for Australian high energy physics

Goncalo Borges (University of Sydney (AU))

10/10/2016, 15:15

Track 4: Data Handling

Oral

CEPH is a cutting edge, open source, self-healing distributed data storage technology which is exciting both the enterprise and academic worlds. CEPH delivers an object storage layer (RADOS), block storage layer, and file system storage in a single unified system. CEPH object and block storage implementations are widely used in a broad spectrum of enterprise contexts, from dynamic provision of...
289. OSiRIS: A Distributed Ceph Deployment Using Software Defined Networking for Multi-Institutional Research

Shawn Mc Kee (University of Michigan (US))

10/10/2016, 15:30

Track 4: Data Handling

Oral

We will report on the first year of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The project’s goal is to provide a single scalable, distributed storage...
21. Online & Offline Storage and Processing for the upcoming European XFEL Experiments

Martin Gasthuber (DESY)

11/10/2016, 11:00

Track 4: Data Handling

Oral

For the upcoming experiments at the European XFEL light source facility, a new online and offline data processing and storage infrastructure is currently being built and verified. Based on the experience of the system being developed for the Petra III light source at DESY, presented at the last CHEP conference, we further develop the system to cope with the much higher volumes and rates...
340. Storage Quality-of-Service in Cloud-based Scientific Environments: A Standardization Approach

Paul Millar

11/10/2016, 11:15

Track 4: Data Handling

Oral

When preparing the Data Management Plan for larger scientific endeavours, PIs have to balance the most appropriate qualities of storage space over the planned data lifecycle against its price and the available funding. Storage properties can be the media type, implicitly determining access latency and durability of stored data, the number and locality of replicas, as well as...
384. Unified data access to e-Infrastructure, Cloud and personal storage within INDIGO-DataCloud

Lukasz Dutka (Cyfronet)

11/10/2016, 11:30

Track 4: Data Handling

Oral

Nowadays users have a variety of options to get access to storage space, including private resources, commercial Cloud storage services as well as storage provided by e-Infrastructures. Unfortunately, all these services provide completely different interfaces for data management (REST, CDMI, command line) and different protocols for data transfer (FTP, GridFTP, HTTP). The goal of the...
513. SciDAC-Data, A Project to Enabling Data Driven Modeling of Exascale Computing

Leonidas Aliaga Soplin (College of William and Mary (US))

11/10/2016, 11:45

Track 4: Data Handling

Oral

The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that the Fermilab Data Center has collected on the organization, movement, and consumption of High Energy Physics data. The project is designed to analyze the analysis patterns and data organization that have been used by the CDF, DØ, NOνA, Minos, Minerva and other...
543. Taking HEP data management outside of HEP

Bo Jayatilaka (Fermi National Accelerator Lab. (US))

11/10/2016, 12:00

Track 4: Data Handling

Oral

High Energy Physics experiments have long had to deal with huge amounts of data. Other fields of study are now being faced with comparable volumes of experimental data and have similar requirements to organize access by a distributed community of researchers. Fermilab is partnering with the Simons Foundation Autism Research Initiative (SFARI) to adapt Fermilab’s custom HEP data management...
136. The ATLAS EventIndex General Dataflow and Monitoring Infrastructure

Alvaro Fernandez Casani (Instituto de Fisica Corpuscular (ES))

11/10/2016, 14:00

Track 4: Data Handling

Oral

The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing them in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast access. The system design and its optimization serve event picking, from requests of a few events up to scales of...
337. LHCb trigger streams optimization

Nikita Kazeev (Yandex School of Data Analysis (RU))

11/10/2016, 14:15

Track 4: Data Handling

Oral

The LHCb experiment stores around 10^11 collision events per year. A typical physics analysis deals with a final sample of up to 10^7 events. Event preselection algorithms (lines) are used for data reduction. They are run centrally and check whether an event is useful for a particular physics analysis. The lines are grouped into streams. An event is copied to all the streams its lines belong to,...
557. Astronomy data delivery and processing services with CADC and CANFAR

Dr Sebastien Fabbro (NRC Herzberg)

11/10/2016, 14:30

Track 4: Data Handling

Oral

The Canadian Advanced Network For Astronomical Research (CANFAR) is a digital infrastructure that has been operational for the last six years.

The platform allows astronomers to store, share, distribute and analyze large astronomical datasets. We have implemented multi-site storage and, in collaboration with an HEP group at the University of Victoria, multi-cloud processing. CANFAR is deeply...
147. Experiences with the new ATLAS Distributed Data Management System

Vincent Garonne (University of Oslo (NO))

11/10/2016, 14:45

Track 4: Data Handling

Oral

The ATLAS Distributed Data Management (DDM) system has evolved drastically in the last two years, with the Rucio software fully replacing the previous system before the start of LHC Run-2. The ATLAS DDM system now manages more than 200 petabytes spread over 130 storage sites and can handle file transfer rates of up to 30 Hz. In this talk, we discuss our experience acquired in...
92. Production Experience with the ATLAS Event Service

Vakho Tsulaia (Lawrence Berkeley National Lab. (US))

11/10/2016, 15:00

Track 4: Data Handling

Oral

The ATLAS Event Service (ES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real time delivery of fine grained...
295. GRID Storage Optimization in Transparent and User-Friendly Way for LHCb datasets

Mikhail Hushchyn (Yandex School of Data Analysis (RU))

12/10/2016, 11:15

Track 4: Data Handling

Oral

The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Petabytes of data are generated by the detectors and Monte-Carlo simulations. The LHCb Grid interware LHCbDIRAC is used to make data available to all collaboration members around the world. The data is replicated to the Grid sites in different locations. However, disk storage on the Grid is...
328. Dynamo - The dynamic data management system for the distributed CMS computing system

12/10/2016, 11:30

Track 4: Data Handling

Oral

The upgraded Dynamic Data Management framework, Dynamo, is designed to manage the majority of the CMS data in an automated fashion. At the moment all CMS Tier-1 and Tier-2 data centers together host about 50 PB of official CMS production data, all of which is managed by this system. There are presently two main pools that Dynamo manages: the Analysis pool for user analysis data, and the Production pool...
517. Design of the ProtoDUNE experiment data management infrastructure

Maxim Potekhin (Brookhaven National Laboratory (US))

12/10/2016, 11:45

Track 4: Data Handling

Oral

The Deep Underground Neutrino Experiment (DUNE) will employ a uniquely large (40 kt) Liquid Argon Time Projection Chamber as the main component of its Far Detector. In order to validate this design and characterize the detector performance, an ambitious experimental program (called "protoDUNE") has been created, which includes a beam test of a large-scale DUNE prototype at CERN. The amount of...
530. jade: An End-To-End Data Transfer and Catalog Tool

Patrick Meade (University of Wisconsin-Madison)

12/10/2016, 12:00

Track 4: Data Handling

Oral

The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope located at the Geographic South Pole. IceCube collects 1 TB of data every day. An online filtering farm processes this data in real time and selects 10% to be sent via satellite to the main data center at the University of Wisconsin-Madison. IceCube has two year-round on-site operators. New operators are hired every year,...
540. Data Management and Database Framework for the MICE Experiment

Janusz Martyniak

12/10/2016, 12:15

Track 4: Data Handling

Oral

The international Muon Ionization Cooling Experiment (MICE), currently operating at the Rutherford Appleton Laboratory in the UK, is designed to demonstrate the principle of muon ionization cooling for application to a future Neutrino Factory or Muon Collider. We present the status of the framework for the movement and curation of both raw and reconstructed data. We also review the...
549. Integrating Prediction, Provenance, and Optimization into High Energy Workflows

Malachi Schram

12/10/2016, 12:30

Track 4: Data Handling

Oral

Motivated by the complex workflows within Belle II, we propose an approach for efficient execution of workflows on distributed resources that integrates provenance, performance modeling, and optimization-based scheduling. The key components of this framework include modeling and simulation methods to quantitatively predict workflow component behavior; optimized decision making such as choosing...
55. Next Generation high performance, multi-dimensional scalable data transfer

Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Dr Roger Cottrell (SLAC National Accelerator Laboratory), Wei Yang (SLAC National Accelerator Laboratory (US)), Dr Wilko Kroeger (SLAC National Accelerator Laboratory)

13/10/2016, 11:00

Track 4: Data Handling

Oral

The exponentially increasing need for high speed data transfer is driven by big data and cloud computing, together with the needs of data intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software that has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a...
219. Caching Servers for ATLAS

13/10/2016, 11:15

Track 4: Data Handling

Oral

As many Tier 3 and some Tier 2 centers look toward streamlining operations, they are considering autonomously managed storage elements as part of the solution. These storage elements are essentially file caching servers. They can operate as whole file or data block level caches. Several implementations exist. In this paper we explore using XRootD caching servers that can operate in either...
571. HTTP as a Data Access Protocol: Trials with XrootD in CMS' AAA Project

Jean-Roch Vlimant (California Institute of Technology (US))

13/10/2016, 11:30

Track 4: Data Handling

Oral

The main goal of the project is to demonstrate the ability to use HTTP data federations in a manner analogous to today's AAA infrastructure used by the CMS experiment. An initial testbed at Caltech has been built and changes in the CMS software (CMSSW) are being implemented in order to improve HTTP support. A set of machines is already set up at the Caltech Tier2 in order to improve the...
501. Accessing Data Federations with CVMFS

Brian Paul Bockelman (University of Nebraska (US))

13/10/2016, 11:45

Track 4: Data Handling

Oral

Data federations have become an increasingly common tool for large collaborations such as CMS and ATLAS to efficiently distribute large data files. Unfortunately, these typically come with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The...
131. Using machine learning algorithms to forecast network and system load metrics for ATLAS Distributed Computing

Mario Lassnig (CERN)

13/10/2016, 12:00

Track 4: Data Handling

Oral

The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated...