10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Session

Track 4: Data Handling

T4
10 Oct 2016, 11:00
GG C3 (San Francisco Marriott Marquis)

Conveners

Track 4: Data Handling: Storage Middleware

  • Maria Girone (CERN)
  • Patrick Fuhrmann (DESY)

Track 4: Data Handling: Filesystems and Cloud Storage

  • Wahid Bhimji (Lawrence Berkeley National Lab. (US))
  • Maria Girone (CERN)

Track 4: Data Handling: Wider HEP and Beyond

  • Patrick Fuhrmann (DESY)
  • Maria Girone (CERN)

Track 4: Data Handling: Experiment Frameworks

  • Elizabeth Gallas (University of Oxford (GB))
  • Patrick Fuhrmann (DESY)

Track 4: Data Handling: Experiment Frameworks

  • Wahid Bhimji (Lawrence Berkeley National Lab. (US))
  • Elizabeth Gallas (University of Oxford (GB))

Track 4: Data Handling: Data Transfer, Caching and Federation

  • Maria Girone (CERN)
  • Wahid Bhimji (Lawrence Berkeley National Lab. (US))

  1. Oliver Keeble (CERN)
    10/10/2016, 11:00
    Track 4: Data Handling
    Oral

    The DPM (Disk Pool Manager) project is the most widely deployed solution for storage of large data repositories on Grid sites, and is completing the most significant upgrade in its history, with the aim of bringing important new features, improved performance and easier long-term maintainability.
    Work has been done to make the so-called "legacy stack" optional, and substitute it with an advanced...

  2. Elvin Alin Sindrilaru (CERN)
    10/10/2016, 11:15
    Track 4: Data Handling
    Oral

    CERN has been developing and operating EOS as a disk storage solution successfully for 5 years. The CERN deployment provides 135 PB and stores 1.2 billion replicas distributed over two computer centres. The deployment includes four LHC instances, a shared instance for smaller experiments and, since last year, an instance for individual user data as well. The user instance represents the backbone of...

  3. Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER)
    10/10/2016, 11:30
    Track 4: Data Handling
    Oral

    XRootD is a distributed, scalable system for low-latency file access. It is the primary data access framework for the high-energy physics community. One of the latest developments in the project has been to incorporate Metalink and segmented file transfer technologies.
    We report on the implementation of Metalink metadata format support within the XRootD client. This includes both the CLI and...

  4. Patrick Fuhrmann (DESY), Patrick Fuhrmann (Deutsches Elektronen-Synchrotron (DE))
    10/10/2016, 11:45
    Track 4: Data Handling
    Oral

    For the past decade, high-performance, high-capacity Open Source storage systems have been designed and implemented, accommodating the demanding needs of the LHC experiments. However, with the general move away from the concept of local computer centers supporting their associated communities, towards large infrastructures providing Cloud-like solutions to a large variety of different...

  5. Marcus Ebert (University of Edinburgh (GB))
    10/10/2016, 12:00
    Track 4: Data Handling
    Oral

    ZFS is a combination of file system, logical volume manager, and software RAID system developed by Sun Microsystems for the Solaris OS. ZFS simplifies the administration of disk storage and on Solaris it has been well regarded for its high performance, reliability, and stability for many years. It is used successfully for enterprise storage administration around the globe, but so far on such...

  6. Tigran Mkrtchyan
    10/10/2016, 14:00
    Track 4: Data Handling
    Oral

    For over a decade, dCache.ORG has provided robust software that is used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments and many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from all-in-one...

  7. Oliver Keeble (CERN)
    10/10/2016, 14:15
    Track 4: Data Handling
    Oral

    Understanding how cloud storage can be effectively used, either standalone or in support of its associated compute, is now an important consideration for WLCG.

    We report on a suite of extensions to familiar tools targeted at enabling the integration of cloud object stores into traditional grid infrastructures and workflows. Notable updates include support for a number of object store...

  8. Alastair Dewhurst (STFC - Rutherford Appleton Lab. (GB))
    10/10/2016, 14:30
    Track 4: Data Handling
    Oral

    Since 2014, the RAL Tier 1 has been working on deploying a Ceph backed object store. The aim is to replace Castor for disk storage. This new service must be scalable to meet the data demands of the LHC to 2020 and beyond. As well as offering access protocols the LHC experiments currently use, it must also provide industry standard access protocols. In order to keep costs down the service...

  9. Xavier Espinal Curull (CERN)
    10/10/2016, 14:45
    Track 4: Data Handling
    Oral

    Dependability, resilience, adaptability, and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the detectors need to be quickly available in a highly scalable way for large-scale processing and data distribution while in parallel they are routed to tape for long-term archival. These activities are critical for the...

  10. Xavier Espinal Curull (CERN)
    10/10/2016, 15:00
    Track 4: Data Handling
    Oral

    This work will present the status of Ceph-related operations and development within the CERN IT Storage Group: we summarise significant production experience at the petabyte scale as well as strategic developments to integrate with our core storage services. As our primary back-end for OpenStack Cinder and Glance, Ceph has provided reliable storage to thousands of VMs for more than 3 years;...

  11. Goncalo Borges (University of Sydney (AU))
    10/10/2016, 15:15
    Track 4: Data Handling
    Oral

    Ceph is a cutting-edge, open source, self-healing distributed data storage technology which is exciting both the enterprise and academic worlds. Ceph delivers an object storage layer (RADOS), a block storage layer, and file system storage in a single unified system. Ceph object and block storage implementations are widely used in a broad spectrum of enterprise contexts, from dynamic provision of...

  12. Shawn Mc Kee (University of Michigan (US))
    10/10/2016, 15:30
    Track 4: Data Handling
    Oral

    We will report on the first year of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The project’s goal is to provide a single scalable, distributed storage...

  13. Martin Gasthuber (DESY)
    11/10/2016, 11:00
    Track 4: Data Handling
    Oral

    For the upcoming experiments at the European XFEL light source facility, a new online and offline data processing and storage infrastructure is currently being built and verified. Based on the experience with the system developed for the PETRA III light source at DESY, presented at the last CHEP conference, we further develop the system to cope with the much higher volumes and rates...

  14. Paul Millar
    11/10/2016, 11:15
    Track 4: Data Handling
    Oral

    When preparing the Data Management Plan for larger scientific endeavours, PIs have to balance the most appropriate qualities of storage space along the planned data lifecycle against its price and the available funding. Storage properties can be the media type, implicitly determining access latency and durability of stored data, the number and locality of replicas, as well as...

  15. Lukasz Dutka (Cyfronet)
    11/10/2016, 11:30
    Track 4: Data Handling
    Oral

    Nowadays users have a variety of options to get access to storage space, including private resources, commercial Cloud storage services as well as storage provided by e-Infrastructures. Unfortunately, all these services provide completely different interfaces for data management (REST, CDMI, command line) and different protocols for data transfer (FTP, GridFTP, HTTP). The goal of the...

  16. Leonidas Aliaga Soplin (College of William and Mary (US))
    11/10/2016, 11:45
    Track 4: Data Handling
    Oral

    The SciDAC-Data project is a DOE funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab Data Center on the organization, movement, and consumption of High Energy Physics data. The project is designed to analyze the analysis patterns and data organization that have been used by the CDF, DØ, NO𝜈A, MINOS, MINERvA and other...

  17. Bo Jayatilaka (Fermi National Accelerator Lab. (US))
    11/10/2016, 12:00
    Track 4: Data Handling
    Oral

    High Energy Physics experiments have long had to deal with huge amounts of data. Other fields of study are now being faced with comparable volumes of experimental data and have similar requirements to organize access by a distributed community of researchers. Fermilab is partnering with the Simons Foundation Autism Research Initiative (SFARI) to adapt Fermilab’s custom HEP data management...

  18. Alvaro Fernandez Casani (Instituto de Fisica Corpuscular (ES))
    11/10/2016, 14:00
    Track 4: Data Handling
    Oral

    The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast access. The system design and its optimization is serving event picking from requests of a few events up to scales of...

  19. Nikita Kazeev (Yandex School of Data Analysis (RU))
    11/10/2016, 14:15
    Track 4: Data Handling
    Oral

    The LHCb experiment stores around 10^11 collision events per year. A typical physics analysis deals with a final sample of up to 10^7 events. Event preselection algorithms (lines) are used for data reduction. They are run centrally and check whether an event is useful for a particular physics analysis. The lines are grouped into streams. An event is copied to all the streams its lines belong to,...

  20. Dr Sebastien Fabbro (NRC Herzberg)
    11/10/2016, 14:30
    Track 4: Data Handling
    Oral

    The Canadian Advanced Network For Astronomical Research (CANFAR) is a digital infrastructure that has been operational for the last six years. The platform allows astronomers to store, collaborate on, distribute and analyze large astronomical datasets. We have implemented multi-site storage and, in collaboration with an HEP group at the University of Victoria, multi-cloud processing. CANFAR is deeply...

  21. Vincent Garonne (University of Oslo (NO))
    11/10/2016, 14:45
    Track 4: Data Handling
    Oral

    The ATLAS Distributed Data Management (DDM) system has evolved drastically in the last two years, with the Rucio software fully replacing the previous system before the start of LHC Run-2. The ATLAS DDM system now manages more than 200 petabytes spread over 130 storage sites and can handle file transfer rates of up to 30 Hz. In this talk, we discuss our experience acquired in...

  22. Vakho Tsulaia (Lawrence Berkeley National Lab. (US))
    11/10/2016, 15:00
    Track 4: Data Handling
    Oral

    The ATLAS Event Service (ES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real time delivery of fine grained...

  23. Mikhail Hushchyn (Yandex School of Data Analysis (RU))
    12/10/2016, 11:15
    Track 4: Data Handling
    Oral

    The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Petabytes of data are generated by the detectors and Monte-Carlo simulations. The LHCb Grid interware LHCbDIRAC is used to make data available to all collaboration members around the world. The data is replicated to the Grid sites in different locations. However, disk storage on the Grid is...

  24. 12/10/2016, 11:30
    Track 4: Data Handling
    Oral

    The upgraded Dynamic Data Management framework, Dynamo, is designed to manage the majority of the CMS data in an automated fashion. At the moment all CMS Tier-1 and Tier-2 data centers host about 50 PB of official CMS production data, which are all managed by this system. There are presently two main pools that Dynamo manages: the Analysis pool for user analysis data, and the Production pool...

  25. Maxim Potekhin (Brookhaven National Laboratory (US))
    12/10/2016, 11:45
    Track 4: Data Handling
    Oral

    The Deep Underground Neutrino Experiment (DUNE) will employ a uniquely large (40 kt) Liquid Argon Time Projection Chamber as the main component of its Far Detector. In order to validate this design and characterize the detector performance, an ambitious experimental program (called "protoDUNE") has been created which includes a beam test of a large-scale DUNE prototype at CERN. The amount of...

  26. PATRICK MEADE (University of Wisconsin-Madison)
    12/10/2016, 12:00
    Track 4: Data Handling
    Oral

    The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope located at the Geographic South Pole. IceCube collects 1 TB of data every day. An online filtering farm processes this data in real time and selects 10% to be sent via satellite to the main data center at the University of Wisconsin-Madison. IceCube has two year-round on-site operators. New operators are hired every year,...

  27. Janusz Martyniak
    12/10/2016, 12:15
    Track 4: Data Handling
    Oral

    The international Muon Ionization Cooling Experiment (MICE), currently operating at the Rutherford Appleton Laboratory in the UK, is designed to demonstrate the principle of muon ionization cooling for application to a future Neutrino Factory or Muon Collider. We present the status of the framework for the movement and curation of both raw and reconstructed data. We also review the...

  28. Malachi Schram
    12/10/2016, 12:30
    Track 4: Data Handling
    Oral

    Motivated by the complex workflows within Belle II, we propose an approach for efficient execution of workflows on distributed resources that integrates provenance, performance modeling, and optimization-based scheduling. The key components of this framework include modeling and simulation methods to quantitatively predict workflow component behavior; optimized decision making such as choosing...

  29. Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Dr Roger Cottrell (SLAC National Accelerator Laboratory), Wei Yang (SLAC National Accelerator Laboratory (US)), Dr Wilko Kroeger (SLAC National Accelerator Laboratory)
    13/10/2016, 11:00
    Track 4: Data Handling
    Oral

    The exponentially increasing need for high-speed data transfer is driven by big data and cloud computing, together with the needs of data-intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software that has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a...

  30. 13/10/2016, 11:15
    Track 4: Data Handling
    Oral

    As many Tier 3 and some Tier 2 centers look toward streamlining operations, they are considering autonomously managed storage elements as part of the solution. These storage elements are essentially file caching servers. They can operate as whole file or data block level caches. Several implementations exist. In this paper we explore using XRootD caching servers that can operate in either...

  31. Jean-Roch Vlimant (California Institute of Technology (US))
    13/10/2016, 11:30
    Track 4: Data Handling
    Oral

    The main goal of the project is to demonstrate the ability of using HTTP data federations in a manner analogous to today's AAA infrastructure used by the CMS experiment. An initial testbed at Caltech has been built and changes in the CMS software (CMSSW) are being implemented in order to improve HTTP support. A set of machines is already set up at the Caltech Tier2 in order to improve the...

  32. Brian Paul Bockelman (University of Nebraska (US))
    13/10/2016, 11:45
    Track 4: Data Handling
    Oral

    Data federations have become an increasingly common tool for large collaborations such as CMS and ATLAS to efficiently distribute large data files. Unfortunately, these typically come with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The...

  33. Mario Lassnig (CERN)
    13/10/2016, 12:00
    Track 4: Data Handling
    Oral

    The increasing volume of physics data is posing a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously, many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated...
