Track 3 Session

13 Apr 2015, 14:00
OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495

Conveners

Track 3 Session: #1 (Databases)

  • Barthelemy Von Haller (CERN)

Track 3 Session: #2 (Databases, Data access protocols)

  • Laurent Aphecetche (Laboratoire de Physique Subatomique et des Technologies Associées)

Track 3 Session: #3 (Hardware and data archival)

  • Latchezar Betev (CERN)

Track 3 Session: #4 (Future use cases)

  • Shaun de Witt (STFC)

Track 3 Session: #5 (Future use cases)

  • Luca Magnoni (CERN)

Description

Data store and access


  1. Andrea Formica (CEA/IRFU, Centre d'étude de Saclay Gif-sur-Yvette (FR))
    13/04/2015, 14:00
    Track3: Data store and access
    oral presentation
    The ATLAS and CMS Conditions Database infrastructures have served each of the respective experiments well through LHC Run 1, providing efficient access to a wide variety of conditions information needed in online data taking and offline processing and analysis. During the long shutdown between Run 1 and Run 2, we have taken various measures to improve our systems for Run 2. In some cases, a...
  2. Zbigniew Baranowski (CERN)
    13/04/2015, 14:15
    Track3: Data store and access
    oral presentation
    During LHC run 1 ATLAS and LHCb databases have been using Oracle Streams replication technology for their use cases of data movement between online and offline Oracle databases. Moreover ATLAS has been using Streams to replicate conditions data from CERN to selected Tier 1s. GoldenGate is a new technology introduced by Oracle to replace and improve on Streams, by providing better performance,...
  3. Roland Sipos (Eotvos Lorand University (HU))
    13/04/2015, 14:30
    Track3: Data store and access
    oral presentation
    With the restart of the LHC in 2015, the growth of the CMS Conditions dataset will continue; the need for consistent and highly available access to the Conditions therefore makes a strong case for revisiting different aspects of the current data storage solutions. We present a study of alternative data storage backends for the Conditions Databases, by evaluating some of the most popular NoSQL...
  4. Federico Stagni (CERN)
    13/04/2015, 14:45
    Track3: Data store and access
    oral presentation
    Nowadays, many database systems are available, but they may not be optimized for storing time-series data. The DIRAC job monitoring is a typical use case of such time series. So far it was handled using a MySQL database, which is not well suited for such an application; therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series is not...
  5. Ms Marina Golosova (National Research Centre "Kurchatov Institute")
    13/04/2015, 15:00
    Track3: Data store and access
    oral presentation
    In recent years the concepts of Big Data have become well established in IT. Most systems (for example Distributed Data Management or Workload Management systems) produce metadata that describes actions performed on jobs, stored data or other entities, and its volume often takes one into the realm of Big Data. This metadata can be used to obtain information about the current...
  6. Michael Boehler (Albert-Ludwigs-Universitaet Freiburg (DE))
    13/04/2015, 15:15
    Track3: Data store and access
    oral presentation
    The ATLAS detector consists of several sub-detector systems. Both data taking and Monte Carlo (MC) simulation rely on an accurate description of the detector conditions from every sub system, such as calibration constants, different scenarios of pile-up and noise conditions, size and position of the beam spot, etc. In order to guarantee database availability for critical online applications...
  7. Dr Dario Barberis (Università e INFN Genova (IT))
    13/04/2015, 15:30
    Track3: Data store and access
    oral presentation
    The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on...
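The core of the EventIndex described above is a catalogue mapping each event identifier to the GUIDs of the files that contain it, supporting use cases such as event picking. A minimal sketch of such a catalogue (all class, method and GUID names here are hypothetical illustrations, not the actual ATLAS implementation):

```python
# Sketch of an event-index catalogue: each event identifier
# (run number, event number) maps to the GUIDs of the files that
# contain that event at each processing stage.
from collections import defaultdict

class EventIndex:
    def __init__(self):
        # (run, event) -> {processing stage -> file GUID}
        self._index = defaultdict(dict)

    def record(self, run, event, stage, guid):
        """Register that the given event is stored in file `guid`."""
        self._index[(run, event)][stage] = guid

    def pick(self, run, event, stage):
        """Event picking: return the GUID of the file holding the event,
        or None if the event is unknown at that processing stage."""
        return self._index[(run, event)].get(stage)

idx = EventIndex()
idx.record(run=266904, event=12345, stage="RAW", guid="A1B2-C3D4")
idx.record(run=266904, event=12345, stage="AOD", guid="E5F6-0708")
print(idx.pick(266904, 12345, "AOD"))  # -> E5F6-0708
```

In the real system the abstract also mentions storing the trigger pattern per event; that would simply be an extra field alongside the GUID map.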
  8. Javier Sanchez (Instituto de Fisica Corpuscular (ES))
    13/04/2015, 15:45
    Track3: Data store and access
    oral presentation
    The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal "pointer" to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for...
  9. Marko Bracko (Jozef Stefan Institute (SI))
    13/04/2015, 16:30
    Track3: Data store and access
    oral presentation
    The Belle II experiment, a next-generation B factory experiment at the KEK laboratory, Tsukuba, Japan, is expected to collect an experimental data sample fifty times larger than its predecessor, the Belle experiment. The data taking and processing rates are expected to be at least one order of magnitude larger as well. In order to cope with these large data processing rates and huge data...
  10. Christophe Haen (CERN)
    13/04/2015, 16:45
    Track3: Data store and access
    oral presentation
    In the distributed computing model of LHCb the File Catalog (FC) is a central component that keeps track of each file and replica stored on the Grid. It federates the LHCb data files in a logical namespace used by all LHCb applications. As a replica catalog, it is used for brokering jobs to sites where their input data is meant to be present, but also by jobs for finding alternative...
  11. Ruben Domingo Gaspar Aparicio (CERN)
    13/04/2015, 17:00
    Track3: Data store and access
    oral presentation
    CERN IT-DB group is migrating its storage platform, mainly NetApp NAS's running on 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one is made of 14 controllers and it will hold a range of critical databases, from administration to accelerator control or experiment control databases. This talk shows our setup: network, monitoring, use of features like transparent...
  12. Jeffrey Michael Dost (Univ. of California San Diego (US))
    13/04/2015, 17:15
    Track3: Data store and access
    oral presentation
    In April of 2014, the UCSD T2 Center deployed hdfs-xrootd-fallback, a UCSD-developed software system that interfaces Hadoop with XRootD to increase reliability of the Hadoop file system. The hdfs-xrootd-fallback system allows a site to depend less on local file replication and more on global replication provided by the XRootD federation to ensure data redundancy. Deploying the software has...
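The fallback policy sketched in the hdfs-xrootd-fallback contribution above — serve a read from the local file system if possible, otherwise fetch the data from the XRootD federation — can be illustrated as follows. The function and reader names are illustrative assumptions, not the UCSD implementation:

```python
# Sketch of a local-read-with-federation-fallback policy: try the
# local (HDFS-like) replica first and, on failure, fall back to a
# list of remote federation endpoints, so the site can rely on
# global rather than local replication for redundancy.
def read_with_fallback(path, local_reader, federation_readers):
    """Return file contents, preferring the local replica."""
    try:
        return local_reader(path)
    except IOError:
        pass  # local replica missing or corrupted: fall back
    for remote_read in federation_readers:
        try:
            return remote_read(path)
        except IOError:
            continue  # this endpoint failed: try the next one
    raise IOError("no replica of %s reachable" % path)

# Usage with stand-in readers: the local read fails, the remote succeeds.
def broken_local(path):
    raise IOError("block missing")

def remote_ok(path):
    return b"event data"

print(read_with_fallback("/store/file.root", broken_local, [remote_ok]))
```

The real system operates at the level of missing or corrupted HDFS blocks rather than whole files, but the ordering of the read attempts is the same.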
  13. Jakob Blomer (CERN)
    13/04/2015, 17:30
    Track3: Data store and access
    oral presentation
    Fermilab has several physics experiments including NOvA, MicroBooNE, and the Dark Energy Survey that have computing grid-based applications that need to read from a shared set of data files. We call this type of data Auxiliary data to distinguish it from (a) Event data which tends to be different for every job, and (b) Conditions data which tends to be the same for each job in a batch of...
  14. Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE))
    13/04/2015, 17:45
    Track3: Data store and access
    oral presentation
    With the exponential growth of LHC (Large Hadron Collider) data in the years 2010-2012, distributed computing has become the established way to analyze collider data. The ATLAS experiment Grid infrastructure includes more than 130 sites worldwide, ranging from large national computing centres to smaller university clusters. So far the storage technologies and access protocols to the clusters...
  15. Thomas Maier (Ludwig-Maximilians-Univ. Muenchen (DE))
    13/04/2015, 18:00
    Track3: Data store and access
    oral presentation
    I/O is a fundamental determinant in the overall performance of physics analysis and other data-intensive scientific computing. It is, further, crucial to effective resource delivery by the facilities and infrastructure that support data-intensive science. To understand I/O performance, clean measurements in controlled environments are essential, but effective optimization requires as well an...
  16. Oliver Keeble (CERN)
    13/04/2015, 18:15
    Track3: Data store and access
    oral presentation
    The DPM project offers an excellent opportunity for comparative testing of the HTTP and xroot protocols for data analysis:
      • The DPM storage itself is multi-protocol, allowing comparisons to be performed on the same hardware
      • The DPM has been instrumented to produce an i/o monitoring stream, familiar from the xrootd project, regardless of the protocol being used for access
      • The...
  17. Christopher Hollowell (Brookhaven National Laboratory)
    14/04/2015, 16:30
    Track3: Data store and access
    oral presentation
    The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990s, serving a worldwide, geographically diverse scientific community that is a major contributor to various HEP/NP projects. A central component of the RACF is the Linux-based worker node cluster that is used for both computing and data storage purposes. It currently has...
  18. Eric Cano (CERN)
    14/04/2015, 16:45
    Track3: Data store and access
    oral presentation
    CERN's tape-based archive system has collected over 70 Petabytes of data during the first run of the LHC. The Long Shutdown is being used for migrating the complete 100 Petabytes data archive to higher-density tape media. During LHC Run 2, the archive will have to cope with yearly growth rates of up to 40-50 Petabytes. In this contribution, we will describe the scalable architecture for...
  19. Dr Andrew Norman (Fermilab)
    14/04/2015, 17:00
    Track3: Data store and access
    oral presentation
    Many experiments in the HEP and Astrophysics communities generate large extremely valuable datasets, which need to be efficiently cataloged and recorded to archival storage. These datasets, both new and legacy, are often structured in a manner that is not conducive to storage and cataloging with modern data handling systems and large file archive facilities. In this paper we discuss in...
  20. Karsten Schwank (DESY)
    14/04/2015, 17:15
    Track3: Data store and access
    oral presentation
    We report on the status of the data preservation project at DESY for the HERA experiments and present the latest design of the storage system, which is a central element of bit preservation. The HEP experiments based at the HERA accelerator at DESY collected large and unique datasets during the period 1992 to 2007. In addition, corresponding Monte Carlo simulation datasets were produced, which...
  21. David Yu (BNL)
    14/04/2015, 17:30
    Track3: Data store and access
    oral presentation
    Brookhaven National Lab (BNL)'s RHIC and ATLAS Computing Facility (RACF) supports science experiments such as RHIC as its Tier-0 center and the U.S. ATLAS/LHC as a Tier-1 center. Scientific data is still growing exponentially after each upgrade. The RACF currently manages over 50 petabytes of data on robotic tape libraries, and we expect a 50% increase in data next year. Not only do we...
  22. Luca Mascetti (CERN)
    14/04/2015, 17:45
    Track3: Data store and access
    oral presentation
    CERN IT DSS operates the main storage resources for data taking and physics analysis, mainly via three systems: AFS, CASTOR and EOS. The total usable space available for users is about 100 PB (with relative ratios 1:20:120). EOS deploys disk resources across the two CERN computer centres (Meyrin and Wigner) with a current ratio of 60% to 40%. IT DSS is also providing sizable on-demand resources for...
  23. Mr Andreas Joachim Peters (CERN)
    14/04/2015, 18:00
    Track3: Data store and access
    oral presentation
    Archiving data to tape is a critical operation for any storage system, especially for the EOS system at CERN, which holds production data from all major LHC experiments. Each collaboration has an allocated quota it can use at any given time; therefore, a mechanism for archiving "stale" data is needed so that storage space is reclaimed for online analysis operations. The archiving tool that we...
  24. Mikhail Hushchyn (Moscow Institute of Physics and Technology, Moscow)
    14/04/2015, 18:15
    Track3: Data store and access
    oral presentation
    The amount of data produced by the LHCb experiment every year amounts to several petabytes. This data is kept on disk and tape storage systems. Disks are much faster than tapes, but are far more expensive, and hence disk space is limited. It is impossible to fit the whole dataset taken during the experiment's lifetime on disk, but fortunately fast access to datasets is no longer needed after the...
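The contribution above turns on deciding which datasets stay on scarce, fast disk and which can move to cheap tape. A popularity-based placement decision of that kind can be sketched as follows; the ranking rule, dataset names and slot count are illustrative assumptions only, not the LHCb model:

```python
# Sketch of a popularity-based disk/tape split: rank datasets by
# recent access count, keep the most popular ones on disk, and mark
# the rest for migration to tape.
def split_disk_tape(access_counts, disk_slots):
    """access_counts: {dataset name: recent access count}.
    Returns (keep_on_disk, migrate_to_tape), each a list of names."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return ranked[:disk_slots], ranked[disk_slots:]

counts = {"dsA": 120, "dsB": 3, "dsC": 45}
disk, tape = split_disk_tape(counts, disk_slots=2)
print(disk)  # -> ['dsA', 'dsC']
print(tape)  # -> ['dsB']
```

A production policy would predict future popularity (for example from access-history trends) rather than rank on raw past counts, but the disk/tape partitioning step is the same.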
  25. Manuel Delfino Reznicek (Universitat Autรฒnoma de Barcelona (ES))
    16/04/2015, 09:00
    Track3: Data store and access
    oral presentation
    Several scientific fields, including Astrophysics, Astroparticle Physics, Cosmology, Nuclear and Particle Physics, and Research with Photons, are estimating that by the 2020 decade they will require data handling systems with data volumes approaching the Zettabyte, distributed amongst as many as 10^18 individually addressable data objects (Zettabyte-Exascale systems). It may be...
  26. Martin Gasthuber (Deutsches Elektronen-Synchrotron (DE))
    16/04/2015, 09:15
    Track3: Data store and access
    oral presentation
    Data taking and analysis infrastructures in HEP have evolved over many years into a well-understood problem domain. In contrast to HEP, third-generation synchrotron light sources and existing and upcoming free-electron lasers are confronted with an explosion in data rates that is primarily driven by recent developments in 2D pixel array detectors. The next generation will produce data in the region...
  27. Dr Patrick Fuhrmann (DESY)
    16/04/2015, 09:30
    Track3: Data store and access
    oral presentation
    With the great success of the dCache storage technology in the framework of the Worldwide LHC Computing Grid, an increasing number of non-HEP communities were attracted to use dCache for their data management infrastructure. As a natural consequence, the dCache team was presented with new use cases that stimulated the development of interesting dCache features. Perhaps the most important...
  28. Dr Paul Millar (Deutsches Elektronen-Synchrotron (DE))
    16/04/2015, 09:45
    Track3: Data store and access
    oral presentation
    The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into the traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood, while the latter are mostly operated in the Cloud, resulting in a rather complex legal situation. Besides legal issues, those two...
  29. Mr Andreas Joachim Peters (CERN)
    16/04/2015, 10:00
    Track3: Data store and access
    oral presentation
    EOS is an open source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and life-cycle management using JBOD hardware for multi PB storage installations. The EOS design implies a split of hot and cold storage and introduced a change of the traditional HSM functionality based workflows...
  30. Christoph Wissing (Deutsches Elektronen-Synchrotron (DE))
    16/04/2015, 10:15
    Track3: Data store and access
    oral presentation
    The CMS experiment at the LHC relies on 7 Tier-1 centres of the WLCG to perform the majority of its bulk processing activity, and to archive its data. During the first run of the LHC, these two functions were tightly coupled as each Tier-1 was constrained to process only the data archived on its hierarchical storage. This lack of flexibility in the assignment of processing workflows...
  31. Dr Samuel Cadellin Skipsey
    16/04/2015, 11:00
    Track3: Data store and access
    oral presentation
    The Object Store model has quickly become the de-facto basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in object store design are similar, but not identical, to concepts in the design of Grid Storage Elements,...
  32. Mr Michael Poat (Brookhaven National Laboratory)
    16/04/2015, 11:15
    Track3: Data store and access
    oral presentation
    The STAR online computing environment is an intensive, ever-growing system used for first-hand data collection and analysis. As systems become more sophisticated, they produce a denser, more detailed collection of data output, and inefficient, limited storage systems have become an impediment to fast feedback for the online shift crews relying on data processing at near real-time speed. Motivation...
  33. Dr Hironori Ito (Brookhaven National Laboratory (US))
    16/04/2015, 11:30
    Track3: Data store and access
    oral presentation
    Ceph based storage solutions are becoming increasingly popular within the HEP/NP community over the last few years. With the current status of the Ceph project, both its object storage and block storage layers are production ready on a large scale, and even the Ceph file system (CephFS) storage layer is rapidly getting to that state as well. This contribution contains a thorough review of...
  34. Mr Andreas Joachim Peters (CERN)
    16/04/2015, 11:45
    Track3: Data store and access
    oral presentation
    In 2013, CERN IT evaluated then deployed a petabyte-scale Ceph cluster to support OpenStack use-cases in production. As of fall 2014, this cluster stores around 300 TB of data comprising more than a thousand VM images and a similar number of block device volumes. With more than a year of smooth operations, we will present our experience and tuning best-practices. Beyond the cloud storage...
  35. Mr Andreas Joachim Peters (CERN)
    16/04/2015, 12:00
    Track3: Data store and access
    oral presentation
    The EOS storage software was designed to cover CERN disk-only storage use cases in the medium term, trading scalability against latency. To cover and prepare for long-term requirements, the CERN IT data and storage services group (DSS) is actively conducting R&D and making open source contributions to experiment with a next-generation storage software based on CEPH. CEPH provides a scale-out object...