Conveners
Track 3 Session: #1 (Databases)
- Barthelemy Von Haller (CERN)
Track 3 Session: #2 (Databases, Data access protocols)
- Laurent Aphecetche (Laboratoire de Physique Subatomique et des Technologies Associées)
Track 3 Session: #3 (Hardware and data archival)
- Latchezar Betev (CERN)
Track 3 Session: #4 (Future use cases)
- Shaun de Witt (STFC)
Track 3 Session: #5 (Future use cases)
- Luca Magnoni (CERN)
Description
Data store and access
Andrea Formica
(CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR))
13/04/2015, 14:00
Track3: Data store and access
oral presentation
The ATLAS and CMS Conditions Database infrastructures have served each of the respective experiments well through LHC Run 1, providing efficient access to a wide variety of conditions information needed in online data taking and offline processing and analysis. During the long shutdown between Run 1 and Run 2, we have taken various measures to improve our systems for Run 2. In some cases, a...
Zbigniew Baranowski
(CERN)
13/04/2015, 14:15
Track3: Data store and access
oral presentation
During LHC Run 1, the ATLAS and LHCb databases used Oracle Streams replication technology for data movement between online and offline Oracle databases. Moreover, ATLAS has been using Streams to replicate conditions data from CERN to selected Tier 1s. GoldenGate is a new technology introduced by Oracle to replace and improve on Streams, by providing better performance,...
Roland Sipos
(Eotvos Lorand University (HU))
13/04/2015, 14:30
Track3: Data store and access
oral presentation
With the restart of the LHC in 2015, the CMS Conditions dataset will continue to grow; the need for consistent and highly available access to the Conditions is therefore a strong motivation to revisit different aspects of the current data storage solutions.
We present a study of alternative data storage backends for the Conditions Databases, by evaluating some of the most popular NoSQL...
Federico Stagni
(CERN)
13/04/2015, 14:45
Track3: Data store and access
oral presentation
Nowadays, many database systems are available, but they may not be optimized for storing time series data. DIRAC job monitoring is a typical time-series use case. So far it has been done using a MySQL database, which is not well suited to such an application; therefore, alternatives have been investigated.
Choosing an appropriate database for storing huge amounts of time series is not...
Ms
Marina Golosova
(National Research Centre "Kurchatov Institute")
13/04/2015, 15:00
Track3: Data store and access
oral presentation
In recent years the concepts of Big Data have become well established in IT. Most systems (for example, Distributed Data Management or Workload Management systems) produce metadata that describes actions performed on jobs, stored data or other entities, and its volume often reaches the realm of Big Data. This metadata can be used to obtain information about the current...
Michael Boehler
(Albert-Ludwigs-Universitaet Freiburg (DE))
13/04/2015, 15:15
Track3: Data store and access
oral presentation
The ATLAS detector consists of several sub-detector systems. Both data taking and Monte Carlo (MC) simulation rely on an accurate description of the detector conditions from every sub system, such as calibration constants, different scenarios of pile-up and noise conditions, size and position of the beam spot, etc. In order to guarantee database availability for critical online applications...
Dr
Dario Barberis
(Università e INFN Genova (IT))
13/04/2015, 15:30
Track3: Data store and access
oral presentation
The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on...
Javier Sanchez
(Instituto de Fisica Corpuscular (ES))
13/04/2015, 15:45
Track3: Data store and access
oral presentation
The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal "pointer" to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for...
Marko Bracko
(Jozef Stefan Institute (SI))
13/04/2015, 16:30
Track3: Data store and access
oral presentation
The Belle II experiment, a next-generation B factory experiment at the KEK laboratory, Tsukuba, Japan, is expected to collect an experimental data sample fifty times larger than its predecessor, the Belle experiment. The data taking and processing rates are expected to be at least one order of magnitude larger as well.
In order to cope with these large data processing rates and huge data...
Christophe Haen
(CERN)
13/04/2015, 16:45
Track3: Data store and access
oral presentation
In the distributed computing model of LHCb the File Catalog (FC) is a central component that keeps track of each file and replica stored on the Grid. It federates the LHCb data files in a logical namespace used by all LHCb applications. As a replica catalog, it is used for brokering jobs to sites where their input data is meant to be present, but also by jobs for finding alternative...
Ruben Domingo Gaspar Aparicio
(CERN)
13/04/2015, 17:00
Track3: Data store and access
oral presentation
The CERN IT-DB group is migrating its storage platform, mainly NetApp NAS systems running in 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one is made of 14 controllers and will hold a range of critical databases, from administration to accelerator control and experiment control databases. This talk shows our setup: network, monitoring, use of features like transparent...
Jeffrey Michael Dost
(Univ. of California San Diego (US))
13/04/2015, 17:15
Track3: Data store and access
oral presentation
In April of 2014, the UCSD T2 Center deployed hdfs-xrootd-fallback, a UCSD-developed software system that interfaces Hadoop with XRootD to increase reliability of the Hadoop file system. The hdfs-xrootd-fallback system allows a site to depend less on local file replication and more on global replication provided by the XRootD federation to ensure data redundancy. Deploying the software has...
Jakob Blomer
(CERN)
13/04/2015, 17:30
Track3: Data store and access
oral presentation
Fermilab has several physics experiments including NOvA, MicroBooNE, and the Dark Energy Survey that have computing grid-based applications that need to read from a shared set of data files. We call this type of data Auxiliary data to distinguish it from (a) Event data which tends to be different for every job, and (b) Conditions data which tends to be the same for each job in a batch of...
Johannes Elmsheuser
(Ludwig-Maximilians-Univ. Muenchen (DE))
13/04/2015, 17:45
Track3: Data store and access
oral presentation
With the exponential growth of LHC (Large Hadron Collider) data in the years 2010-2012, distributed computing has become the established way to analyze collider data. The ATLAS experiment Grid infrastructure includes more than 130 sites worldwide, ranging from large national computing centres to smaller university clusters. So far the storage technologies and access protocols to the clusters...
Thomas Maier
(Ludwig-Maximilians-Univ. Muenchen (DE))
13/04/2015, 18:00
Track3: Data store and access
oral presentation
I/O is a fundamental determinant in the overall performance of physics analysis and other data-intensive scientific computing. It is, further, crucial to effective resource delivery by the facilities and infrastructure that support data-intensive science. To understand I/O performance, clean measurements in controlled environments are essential, but effective optimization requires as well an...
Oliver Keeble
(CERN)
13/04/2015, 18:15
Track3: Data store and access
oral presentation
The DPM project offers an excellent opportunity for comparative testing of the HTTP and xroot protocols for data analysis.
- The DPM storage itself is multi-protocol, allowing comparisons to be performed on the same hardware
- The DPM has been instrumented to produce an I/O monitoring stream, familiar from the xrootd project, regardless of the protocol being used for access
- The...
Mean PB to Failure -- Initial results from a long-term study of disk storage patterns at the RACF
Christopher Hollowell
(Brookhaven National Laboratory)
14/04/2015, 16:30
Track3: Data store and access
oral presentation
The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990s, serving a worldwide, geographically diverse scientific community that is a major contributor to various HENP projects. A central component of the RACF is the Linux-based worker node cluster that is used for both computing and data storage purposes. It currently has...
Eric Cano
(CERN)
14/04/2015, 16:45
Track3: Data store and access
oral presentation
CERN's tape-based archive system has collected over 70 Petabytes of data during the first run of the LHC. The Long Shutdown is being used for migrating the complete 100 Petabytes data archive to higher-density tape media. During LHC Run 2, the archive will have to cope with yearly growth rates of up to 40-50 Petabytes. In this contribution, we will describe the scalable architecture for...
Dr
Andrew Norman
(Fermilab)
14/04/2015, 17:00
Track3: Data store and access
oral presentation
Many experiments in the HEP and Astrophysics communities generate large, extremely valuable datasets, which need to be efficiently cataloged and recorded to archival storage. These datasets, both new and legacy, are often structured in a manner that is not conducive to storage and cataloging with modern data handling systems and large file archive facilities. In this paper we discuss in...
Karsten Schwank
(DESY)
14/04/2015, 17:15
Track3: Data store and access
oral presentation
We report on the status of the data preservation project at DESY for the HERA experiments and present the latest design of the storage, which is a central element for bit-preservation. The HEP experiments based at the HERA accelerator at DESY collected large and unique datasets during the period 1992 to 2007. In addition, corresponding Monte Carlo simulation datasets were produced, which...
David Yu
(BNL)
14/04/2015, 17:30
Track3: Data store and access
oral presentation
Brookhaven National Lab (BNL)'s RHIC and ATLAS Computing Facility (RACF) supports science experiments, serving as the Tier-0 center for RHIC and as a Tier-1 center for U.S. ATLAS at the LHC. Scientific data is still growing exponentially after each upgrade. The RACF currently manages over 50 petabytes of data on robotic tape libraries, and we expect a 50% increase in data next year. Not only do we...
Luca Mascetti
(CERN)
14/04/2015, 17:45
Track3: Data store and access
oral presentation
CERN IT DSS operates the main storage resources for data taking and physics analysis mainly via three systems: AFS, CASTOR and EOS. The total usable space available for users is about 100 PB (with relative ratios 1:20:120). EOS deploys disk resources across the two CERN computer centres (Meyrin and Wigner) with a current ratio of 60% to 40%. IT DSS is also providing sizable on-demand resources for...
Mr
Andreas Joachim Peters
(CERN)
14/04/2015, 18:00
Track3: Data store and access
oral presentation
Archiving data to tape is a critical operation for any storage system, especially for the EOS system at CERN, which holds production data from all major LHC experiments. Each collaboration has an allocated quota it can use at any given time; therefore, a mechanism for archiving "stale" data is needed so that storage space is reclaimed for online analysis operations.
The archiving tool that we...
Mikhail Hushchyn
(Moscow Institute of Physics and Technology, Moscow)
14/04/2015, 18:15
Track3: Data store and access
oral presentation
The LHCb experiment produces several petabytes of data every year. This data is kept on disk and tape storage systems. Disks are much faster than tapes, but far more expensive, and hence disk space is limited. It is impossible to fit all the data taken during the experiment's lifetime on disk, but fortunately fast access to datasets is no longer needed after the...
Manuel Delfino Reznicek
(Universitat Autònoma de Barcelona (ES))
16/04/2015, 09:00
Track3: Data store and access
oral presentation
Several scientific fields, including Astrophysics, Astroparticle Physics, Cosmology, Nuclear and Particle Physics, and Research with Photons, are estimating that in the 2020s they will require data handling systems with data volumes approaching the Zettabyte distributed amongst as many as 10^18 individually addressable data objects (Zettabyte-Exascale systems). It may be...
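As a rough order-of-magnitude illustration of the scale quoted above, dividing the total volume V by the object count N gives the implied average object size. This assumes the SI definition of a Zettabyte (10^21 bytes) and takes both figures at their quoted upper limits, so it is an estimate only, not a figure from the contribution itself:

```latex
\bar{s} \approx \frac{V}{N} = \frac{10^{21}\,\text{B}}{10^{18}} = 10^{3}\,\text{B} \approx 1\,\text{kB}
```

In other words, under these assumptions such systems would be handling objects averaging only about a kilobyte each.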
Martin Gasthuber
(Deutsches Elektronen-Synchrotron (DE))
16/04/2015, 09:15
Track3: Data store and access
oral presentation
Data taking and analysis infrastructures in HEP have evolved over many years into a well-understood problem domain. In contrast to HEP, third-generation synchrotron light sources and existing and upcoming free-electron lasers are confronted with an explosion in data rates, driven primarily by recent developments in 2D pixel array detectors. The next generation will produce data in the region...
Dr
Patrick Fuhrmann
(DESY)
16/04/2015, 09:30
Track3: Data store and access
oral presentation
With the great success of the dCache storage technology within the Worldwide LHC Computing Grid, an increasing number of non-HEP communities have been attracted to use dCache for their data management infrastructure. As a natural consequence, the dCache team was presented with new use cases that stimulated the development of interesting dCache features.
Perhaps the most important...
Dr
Paul Millar
(Deutsches Elektronen-Synchrotron (DE))
16/04/2015, 09:45
Track3: Data store and access
oral presentation
The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into the traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood while the latter is mostly operated in the Cloud, resulting in a rather complex legal situation.
Besides legal issues, those two...
Mr
Andreas Joachim Peters
(CERN)
16/04/2015, 10:00
Track3: Data store and access
oral presentation
EOS is an open-source distributed disk storage system that has been in production at CERN since 2011. Development has focused on low-latency analysis use cases for LHC and non-LHC experiments, and on life-cycle management using JBOD hardware for multi-PB storage installations. The EOS design implies a split between hot and cold storage and introduced a change to the traditional workflows based on HSM functionality...
Christoph Wissing
(Deutsches Elektronen-Synchrotron (DE))
16/04/2015, 10:15
Track3: Data store and access
oral presentation
The CMS experiment at the LHC relies on 7 Tier-1 centres of the WLCG to perform the majority of its bulk processing activity, and to archive its data. During the first run of the LHC, these two functions were tightly coupled as each Tier-1 was constrained to process only the data archived on its hierarchical storage. This lack of flexibility in the assignment of processing workflows...
Dr
Samuel Cadellin Skipsey
16/04/2015, 11:00
Track3: Data store and access
oral presentation
The *Object Store* model has quickly become the de-facto basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems.
Many of the assumptions in object store design are similar, but not identical, to concepts in the design of Grid Storage Elements,...
Mr
Michael Poat
(Brookhaven National Laboratory)
16/04/2015, 11:15
Track3: Data store and access
oral presentation
The STAR online computing environment is an intensive, ever-growing system used for first-hand data collection and analysis. As systems become more sophisticated, they produce more detailed and denser data output, and inefficient, limited storage systems have become an impediment to fast feedback for the online shift crews who rely on near-real-time data processing. Motivation...
Dr
Hironori Ito
(Brookhaven National Laboratory (US))
16/04/2015, 11:30
Track3: Data store and access
oral presentation
Ceph-based storage solutions have become increasingly popular within the HEP/NP community over the last few years. With the current status of the Ceph project, both its object storage and block storage layers are production ready on a large scale, and even the Ceph file system (CephFS) storage layer is rapidly getting to that state as well. This contribution contains a thorough review of...
Mr
Andreas Joachim Peters
(CERN)
16/04/2015, 11:45
Track3: Data store and access
oral presentation
In 2013, CERN IT evaluated and then deployed a petabyte-scale Ceph cluster to support OpenStack use cases in production. As of fall 2014, this cluster stores around 300 TB of data comprising more than a thousand VM images and a similar number of block device volumes. After more than a year of smooth operation, we will present our experience and tuning best practices.
Beyond the cloud storage...
Mr
Andreas Joachim Peters
(CERN)
16/04/2015, 12:00
Track3: Data store and access
oral presentation
The EOS storage software was designed to cover CERN disk-only storage use cases in the medium term, trading scalability against latency. To cover and prepare for long-term requirements, the CERN IT Data and Storage Services group (DSS) is actively conducting R&D and making open source contributions to experiment with next-generation storage software based on CEPH.
CEPH provides a scale-out object...