Conveners
Track 4: Data Handling: Storage Middleware
- Maria Girone (CERN)
- Patrick Fuhrmann (DESY)
Track 4: Data Handling: Filesystems and Cloud Storage
- Wahid Bhimji (Lawrence Berkeley National Lab. (US))
- Maria Girone (CERN)
Track 4: Data Handling: Wider HEP and Beyond
- Patrick Fuhrmann (DESY)
- Maria Girone (CERN)
Track 4: Data Handling: Experiment Frameworks
- Elizabeth Gallas (University of Oxford (GB))
- Patrick Fuhrmann (DESY)
Track 4: Data Handling: Experiment Frameworks
- Wahid Bhimji (Lawrence Berkeley National Lab. (US))
- Elizabeth Gallas (University of Oxford (GB))
Track 4: Data Handling: Data Transfer, Caching and Federation
- Maria Girone (CERN)
- Wahid Bhimji (Lawrence Berkeley National Lab. (US))
Oliver Keeble (CERN) | 10/10/2016, 11:00 | Track 4: Data Handling | Oral
The DPM (Disk Pool Manager) project is the most widely deployed solution for storage of large data repositories on Grid sites, and is completing the most significant upgrade in its history, aiming to deliver important new features, better performance and easier long-term maintainability. Work has been done to make the so-called "legacy stack" optional and substitute it with an advanced...
Elvin Alin Sindrilaru (CERN) | 10/10/2016, 11:15 | Track 4: Data Handling | Oral
CERN has been developing and operating EOS as a disk storage solution successfully for 5 years. The CERN deployment provides 135 PB and stores 1.2 billion replicas distributed over two computer centres. Deployment includes four LHC instances, a shared instance for smaller experiments and since last year an instance for individual user data as well. The user instance represents the backbone of...
Andrew Hanushevsky (Stanford Linear Accelerator Center) | 10/10/2016, 11:30 | Track 4: Data Handling | Oral
XRootD is a distributed, scalable system for low-latency file access. It is the primary data access framework for the high-energy physics community. One of the latest developments in the project has been to incorporate metalink and segmented file transfer technologies. We report on the implementation of metalink metadata format support within the XRootD client, covering both the CLI and...
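Metalink (RFC 5854) is an XML format that lists, for a single file, the set of mirror URLs it can be fetched from, which is what allows a client to fail over between replicas. As a rough illustration (the file name and URLs below are invented, and this is not the XRootD client code itself), such a document can be parsed with the Python standard library:

```python
import xml.etree.ElementTree as ET

# A minimal Metalink 4 (RFC 5854) document: one file, two mirror URLs.
# File name and mirror hosts are purely illustrative.
METALINK = """<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="data.root">
    <url priority="1">root://site-a.example.org//store/data.root</url>
    <url priority="2">root://site-b.example.org//store/data.root</url>
  </file>
</metalink>"""

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

def mirror_urls(text):
    """Return (priority, url) pairs for each replica, best priority first."""
    root = ET.fromstring(text)
    urls = []
    for f in root.findall("ml:file", NS):
        for u in f.findall("ml:url", NS):
            urls.append((int(u.get("priority", "999999")), u.text))
    return sorted(urls)

print(mirror_urls(METALINK)[0][1])  # highest-priority replica
```

A real client would additionally use the per-URL priorities and the document's hash elements when selecting and verifying a replica.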
Patrick Fuhrmann (DESY) | 10/10/2016, 11:45 | Track 4: Data Handling | Oral
Over the past decade, high-performance, high-capacity open-source storage systems have been designed and implemented to accommodate the demanding needs of the LHC experiments. However, with the general move away from the concept of local computer centres supporting their associated communities, towards large infrastructures providing Cloud-like solutions to a large variety of different...
Marcus Ebert (University of Edinburgh (GB)) | 10/10/2016, 12:00 | Track 4: Data Handling | Oral
ZFS is a combination of file system, logical volume manager, and software RAID system developed by Sun Microsystems for the Solaris OS. ZFS simplifies the administration of disk storage, and on Solaris it has been well regarded for its high performance, reliability, and stability for many years. It is used successfully for enterprise storage administration around the globe, but so far on such...
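As a flavour of the administrative simplification the abstract refers to, the following sketch shows typical ZFS pool administration; pool and device names are placeholders, and the commands require root and real (or file-backed) block devices:

```shell
# Illustrative only: "tank" and the sd* device names are placeholders.

# Create a double-parity (raidz2) pool from four disks
zpool create tank raidz2 sdb sdc sdd sde

# Enable lz4 compression and check pool health
zfs set compression=lz4 tank
zpool status tank

# Periodic scrubs detect and repair silent data corruption
zpool scrub tank
```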
Tigran Mkrtchyan | 10/10/2016, 14:00 | Track 4: Data Handling | Oral
For over a decade, dCache.ORG has provided robust software that is used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments and many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from all-in-one...
Oliver Keeble (CERN) | 10/10/2016, 14:15 | Track 4: Data Handling | Oral
Understanding how cloud storage can be effectively used, either standalone or in support of its associated compute, is now an important consideration for WLCG.
We report on a suite of extensions to familiar tools targeted at enabling the integration of cloud object stores into traditional grid infrastructures and workflows. Notable updates include support for a number of object store...
Alastair Dewhurst (STFC - Rutherford Appleton Lab. (GB)) | 10/10/2016, 14:30 | Track 4: Data Handling | Oral
Since 2014, the RAL Tier 1 has been working on deploying a Ceph backed object store. The aim is to replace Castor for disk storage. This new service must be scalable to meet the data demands of the LHC to 2020 and beyond. As well as offering access protocols the LHC experiments currently use, it must also provide industry standard access protocols. In order to keep costs down the service...
Xavier Espinal Curull (CERN) | 10/10/2016, 14:45 | Track 4: Data Handling | Oral
Dependability, resilience, adaptability, and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the detectors need to be quickly available in a highly scalable way for large-scale processing and data distribution, while in parallel being routed to tape for long-term archival. These activities are critical for the...
Xavier Espinal Curull (CERN) | 10/10/2016, 15:00 | Track 4: Data Handling | Oral
This work will present the status of Ceph-related operations and development within the CERN IT Storage Group: we summarise significant production experience at the petabyte scale as well as strategic developments to integrate with our core storage services. As our primary back-end for OpenStack Cinder and Glance, Ceph has provided reliable storage to thousands of VMs for more than 3 years;...
Goncalo Borges (University of Sydney (AU)) | 10/10/2016, 15:15 | Track 4: Data Handling | Oral
Ceph is a cutting-edge, open-source, self-healing distributed data storage technology which is exciting both the enterprise and academic worlds. Ceph delivers an object storage layer (RADOS), a block storage layer, and file system storage in a single unified system. Ceph object and block storage implementations are widely used in a broad spectrum of enterprise contexts, from dynamic provisioning of...
Shawn Mc Kee (University of Michigan (US)) | 10/10/2016, 15:30 | Track 4: Data Handling | Oral
We will report on the first year of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The project’s goal is to provide a single scalable, distributed storage...
Martin Gasthuber (DESY) | 11/10/2016, 11:00 | Track 4: Data Handling | Oral
For the upcoming experiments at the European XFEL light source facility, a new online and offline data processing and storage infrastructure is currently being built and verified. Based on the experience of the system being developed for the Petra III light source at DESY, presented at the last CHEP conference, we further develop the system to cope with the much higher volumes and rates...
Paul Millar | 11/10/2016, 11:15 | Track 4: Data Handling | Oral
When preparing the data management plan for a larger scientific endeavour, PIs have to balance the most appropriate qualities of storage space over the planned data lifecycle against its price and the available funding. Storage properties can include the media type, which implicitly determines access latency and the durability of stored data, the number and locality of replicas, as well as...
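The trade-off described above can be pictured as choosing, per lifecycle phase, the cheapest storage class whose properties still satisfy the plan. The sketch below is purely illustrative: the class names, prices and latencies are invented and are not taken from the talk or any real storage system:

```python
# Hypothetical storage classes; all names and numbers are invented.
CLASSES = [
    {"name": "tape",      "latency_s": 3600, "replicas": 1, "eur_per_tb_year": 5},
    {"name": "disk",      "latency_s": 0.01, "replicas": 1, "eur_per_tb_year": 40},
    {"name": "disk+tape", "latency_s": 0.01, "replicas": 2, "eur_per_tb_year": 44},
]

def cheapest(max_latency_s, min_replicas):
    """Cheapest class whose access latency and replica count meet the plan."""
    ok = [c for c in CLASSES
          if c["latency_s"] <= max_latency_s and c["replicas"] >= min_replicas]
    return min(ok, key=lambda c: c["eur_per_tb_year"])["name"] if ok else None

# Archival data tolerates high latency; active analysis data does not.
print(cheapest(max_latency_s=86400, min_replicas=1))  # -> tape
print(cheapest(max_latency_s=1, min_replicas=2))      # -> disk+tape
```

In practice a plan also varies these requirements over time, e.g. demanding low latency while data are hot and relaxing it once they go to archive.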
Lukasz Dutka (Cyfronet) | 11/10/2016, 11:30 | Track 4: Data Handling | Oral
Nowadays users have a variety of options to get access to storage space, including private resources, commercial Cloud storage services as well as storage provided by e-Infrastructures. Unfortunately, all these services provide completely different interfaces for data management (REST, CDMI, command line) and different protocols for data transfer (FTP, GridFTP, HTTP). The goal of the...
Leonidas Aliaga Soplin (College of William and Mary (US)) | 11/10/2016, 11:45 | Track 4: Data Handling | Oral
The SciDAC-Data project is a DOE funded initiative to analyze and exploit two decades of information and analytics that have been collected, by the Fermilab Data Center, on the organization, movement, and consumption of High Energy Physics data. The project is designed to analyze the analysis patterns and data organization that have been used by the CDF, DØ, NO𝜈A, Minos, Minerva and other...
Bo Jayatilaka (Fermi National Accelerator Lab. (US)) | 11/10/2016, 12:00 | Track 4: Data Handling | Oral
High Energy Physics experiments have long had to deal with huge amounts of data. Other fields of study are now being faced with comparable volumes of experimental data and have similar requirements to organize access by a distributed community of researchers. Fermilab is partnering with the Simons Foundation Autism Research Initiative (SFARI) to adapt Fermilab’s custom HEP data management...
Alvaro Fernandez Casani (Instituto de Fisica Corpuscular (ES)) | 11/10/2016, 14:00 | Track 4: Data Handling | Oral
The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast access. The system design and its optimization serve event picking, from requests for a few events up to scales of...
Nikita Kazeev (Yandex School of Data Analysis (RU)) | 11/10/2016, 14:15 | Track 4: Data Handling | Oral
The LHCb experiment stores around 10^11 collision events per year. A typical physics analysis deals with a final sample of up to 10^7 events. Event preselection algorithms (lines) are used for data reduction. They are run centrally and check whether an event is useful for a particular physics analysis. The lines are grouped into streams. An event is copied to all the streams its lines belong to,...
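The line-to-stream grouping described above can be sketched as follows; the line and stream names are hypothetical, and the real LHCb stream definitions are far larger:

```python
# Hypothetical mapping of streams to their preselection lines.
STREAMS = {
    "Charm":  ["D2KPi_line", "Dst2D0Pi_line"],
    "Dimuon": ["Jpsi2MuMu_line"],
}

def streams_for(fired_lines):
    """An event is copied to every stream containing a line that fired,
    so an event firing lines in several streams is stored more than once."""
    fired = set(fired_lines)
    return sorted(s for s, lines in STREAMS.items() if fired & set(lines))

print(streams_for(["D2KPi_line", "Jpsi2MuMu_line"]))  # copied to both streams
```

Because events are duplicated into every matching stream, the choice of stream composition directly drives storage cost, which is what makes it a natural optimization target.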
Dr Sebastien Fabbro (NRC Herzberg) | 11/10/2016, 14:30 | Track 4: Data Handling | Oral
The Canadian Advanced Network For Astronomical Research (CANFAR) is a digital infrastructure that has been operational for the last six years. The platform allows astronomers to store, collaborate on, distribute and analyze large astronomical datasets. We have implemented multi-site storage and, in collaboration with an HEP group at the University of Victoria, multi-cloud processing. CANFAR is deeply...
Vincent Garonne (University of Oslo (NO)) | 11/10/2016, 14:45 | Track 4: Data Handling | Oral
The ATLAS Distributed Data Management (DDM) system has evolved drastically in the last two years, with the Rucio software fully replacing the previous system before the start of LHC Run 2. The ATLAS DDM system now manages more than 200 petabytes spread over 130 storage sites and can handle file transfer rates of up to 30 Hz. In this talk, we discuss our experience acquired in...
Vakho Tsulaia (Lawrence Berkeley National Lab. (US)) | 11/10/2016, 15:00 | Track 4: Data Handling | Oral
The ATLAS Event Service (ES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources such as spot-market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real-time delivery of fine-grained...
Mikhail Hushchyn (Yandex School of Data Analysis (RU)) | 12/10/2016, 11:15 | Track 4: Data Handling | Oral
The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Petabytes of data are generated by the detectors and Monte-Carlo simulations. The LHCb Grid interware LHCbDIRAC is used to make data available to all collaboration members around the world. The data is replicated to the Grid sites in different locations. However, disk storage on the Grid is...
12/10/2016, 11:30 | Track 4: Data Handling | Oral
The upgraded Dynamic Data Management framework, Dynamo, is designed to manage the majority of the CMS data in an automated fashion. At the moment, all CMS Tier-1 and Tier-2 data centers host about 50 PB of official CMS production data, all of which are managed by this system. There are presently two main pools that Dynamo manages: the Analysis pool for user analysis data, and the Production pool...
Maxim Potekhin (Brookhaven National Laboratory (US)) | 12/10/2016, 11:45 | Track 4: Data Handling | Oral
The Deep Underground Neutrino Experiment (DUNE) will employ a uniquely large (40kt) Liquid Argon Time Projection chamber as the main component of its Far Detector. In order to validate this design and characterize the detector performance an ambitious experimental program (called "protoDUNE") has been created which includes a beam test of a large-scale DUNE prototype at CERN. The amount of...
Patrick Meade (University of Wisconsin-Madison) | 12/10/2016, 12:00 | Track 4: Data Handling | Oral
The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope located at the Geographic South Pole. IceCube collects 1 TB of data every day. An online filtering farm processes this data in real time and selects 10% to be sent via satellite to the main data center at the University of Wisconsin-Madison. IceCube has two year-round on-site operators. New operators are hired every year,...
Janusz Martyniak | 12/10/2016, 12:15 | Track 4: Data Handling | Oral
The international Muon Ionization Cooling Experiment (MICE) currently operating at the Rutherford Appleton Laboratory in the UK, is designed to demonstrate the principle of muon ionization cooling for application to a future Neutrino Factory or Muon Collider. We present the status of the framework for the movement and curation of both raw and reconstructed data. We also review the...
Malachi Schram | 12/10/2016, 12:30 | Track 4: Data Handling | Oral
Motivated by the complex workflows within Belle II, we propose an approach for efficient execution of workflows on distributed resources that integrates provenance, performance modeling, and optimization-based scheduling. The key components of this framework include modeling and simulation methods to quantitatively predict workflow component behavior; optimized decision making such as choosing...
Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Dr Roger Cottrell (SLAC National Accelerator Laboratory), Wei Yang (SLAC National Accelerator Laboratory (US)), Dr Wilko Kroeger (SLAC National Accelerator Laboratory) | 13/10/2016, 11:00 | Track 4: Data Handling | Oral
The exponentially increasing need for high-speed data transfer is driven by big data and cloud computing, together with the needs of data-intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software, developed since 2013 to meet these growing needs by providing high-performance data transfer and encryption in a...
13/10/2016, 11:15 | Track 4: Data Handling | Oral
As many Tier 3 and some Tier 2 centers look toward streamlining operations, they are considering autonomously managed storage elements as part of the solution. These storage elements are essentially file caching servers. They can operate as whole file or data block level caches. Several implementations exist. In this paper we explore using XRootD caching servers that can operate in either...
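A caching XRootD server of this kind is configured as a proxy whose local storage acts as the cache. The fragment below is a sketch in the spirit of the XRootD proxy file cache documentation; the origin host, paths and sizes are placeholders, and exact directive names can differ between XRootD releases:

```
# Run as a caching proxy in front of a remote origin (host is a placeholder)
all.role server
ofs.osslib libXrdPss.so
pss.cachelib libXrdFileCache.so
pss.origin xrootd-origin.example.org:1094

# Where cached data lives, and basic cache tuning
oss.localroot /var/cache/xrootd
pfc.blocksize 512k
pfc.ram 8g
```

The choice between whole-file and block-level caching mentioned in the abstract corresponds to whether the cache prefetches complete files or only the blocks clients actually read.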
Jean-Roch Vlimant (California Institute of Technology (US)) | 13/10/2016, 11:30 | Track 4: Data Handling | Oral
The main goal of the project is to demonstrate the ability to use HTTP data federations in a manner analogous to today's AAA infrastructure used by the CMS experiment. An initial testbed at Caltech has been built, and changes in the CMS software (CMSSW) are being implemented in order to improve HTTP support. A set of machines has already been set up at the Caltech Tier-2 in order to improve the...
Brian Paul Bockelman (University of Nebraska (US)) | 13/10/2016, 11:45 | Track 4: Data Handling | Oral
Data federations have become an increasingly common tool for large collaborations such as CMS and ATLAS to efficiently distribute large data files. Unfortunately, these typically come with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The...
Mario Lassnig (CERN) | 13/10/2016, 12:00 | Track 4: Data Handling | Oral
The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high-luminosity physics, automation of everyday data management tasks has become necessary. Previously, many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated...