Conveners
Track 3 Session: #1 (Databases)
- Barthelemy Von Haller (CERN)
Track 3 Session: #2 (Databases, Data access protocols)
- Laurent Aphecetche (Laboratoire de Physique Subatomique et des Technologies Associées)
Track 3 Session: #3 (Hardware and data archival)
- Latchezar Betev (CERN)
Track 3 Session: #4 (Future use cases)
- Shaun de Witt (STFC)
Track 3 Session: #5 (Future use cases)
- Luca Magnoni (CERN)
Description
Data store and access
- Andrea Formica (CEA/IRFU, Centre d'étude de Saclay, Gif-sur-Yvette (FR)) | 13/04/2015, 14:00 | Track 3: Data store and access | oral presentation
  The ATLAS and CMS Conditions Database infrastructures have served each of the respective experiments well through LHC Run 1, providing efficient access to a wide variety of conditions information needed in online data taking and offline processing and analysis. During the long shutdown between Run 1 and Run 2, we have taken various measures to improve our systems for Run 2. In some cases, a...
- Zbigniew Baranowski (CERN) | 13/04/2015, 14:15 | Track 3: Data store and access | oral presentation
  During LHC Run 1, the ATLAS and LHCb databases used Oracle Streams replication technology for their use cases of data movement between online and offline Oracle databases. Moreover, ATLAS used Streams to replicate conditions data from CERN to selected Tier-1s. GoldenGate is a new technology introduced by Oracle to replace and improve on Streams by providing better performance,...
- Roland Sipos (Eotvos Lorand University (HU)) | 13/04/2015, 14:30 | Track 3: Data store and access | oral presentation
  With the restart of the LHC in 2015, the growth of the CMS Conditions dataset will continue; the need for consistent and highly available access to the Conditions therefore makes it timely to revisit different aspects of the current data storage solutions. We present a study of alternative data storage backends for the Conditions Databases, evaluating some of the most popular NoSQL...
- Federico Stagni (CERN) | 13/04/2015, 14:45 | Track 3: Data store and access | oral presentation
  Nowadays, many database systems are available, but they may not be optimized for storing time-series data. DIRAC job monitoring is a typical use case of such time series. So far it has been done using a MySQL database, which is not well suited for such an application. Therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series is not...
- Ms Marina Golosova (National Research Centre "Kurchatov Institute") | 13/04/2015, 15:00 | Track 3: Data store and access | oral presentation
  In recent years the concepts of Big Data have become well established in IT. Most systems (for example Distributed Data Management or Workload Management systems) produce metadata that describes actions performed on jobs, stored data or other entities, and its volume takes one into the realms of Big Data on many occasions. This metadata can be used to obtain information about the current...
- Michael Boehler (Albert-Ludwigs-Universitaet Freiburg (DE)) | 13/04/2015, 15:15 | Track 3: Data store and access | oral presentation
  The ATLAS detector consists of several sub-detector systems. Both data taking and Monte Carlo (MC) simulation rely on an accurate description of the detector conditions from every sub-system, such as calibration constants, different scenarios of pile-up and noise conditions, size and position of the beam spot, etc. In order to guarantee database availability for critical online applications...
- Dr Dario Barberis (Università e INFN Genova (IT)) | 13/04/2015, 15:30 | Track 3: Data store and access | oral presentation
  The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on...
- Javier Sanchez (Instituto de Fisica Corpuscular (ES)) | 13/04/2015, 15:45 | Track 3: Data store and access | oral presentation
  The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal "pointer" to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for...
- Marko Bracko (Jozef Stefan Institute (SI)) | 13/04/2015, 16:30 | Track 3: Data store and access | oral presentation
  The Belle II experiment, a next-generation B factory experiment at the KEK laboratory, Tsukuba, Japan, is expected to collect an experimental data sample fifty times larger than its predecessor, the Belle experiment. The data taking and processing rates are expected to be at least one order of magnitude larger as well. In order to cope with these large data processing rates and huge data...
- Christophe Haen (CERN) | 13/04/2015, 16:45 | Track 3: Data store and access | oral presentation
  In the distributed computing model of LHCb the File Catalog (FC) is a central component that keeps track of each file and replica stored on the Grid. It federates the LHCb data files in a logical namespace used by all LHCb applications. As a replica catalog, it is used for brokering jobs to sites where their input data is meant to be present, but also by jobs for finding alternative...
- Ruben Domingo Gaspar Aparicio (CERN) | 13/04/2015, 17:00 | Track 3: Data store and access | oral presentation
  The CERN IT-DB group is migrating its storage platform, mainly NetApp NAS systems running in 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one is made of 14 controllers and will hold a range of critical databases, from administration to accelerator control or experiment control databases. This talk shows our setup: network, monitoring, use of features like transparent...
- Jeffrey Michael Dost (Univ. of California San Diego (US)) | 13/04/2015, 17:15 | Track 3: Data store and access | oral presentation
  In April of 2014, the UCSD T2 Center deployed hdfs-xrootd-fallback, a UCSD-developed software system that interfaces Hadoop with XRootD to increase reliability of the Hadoop file system. The hdfs-xrootd-fallback system allows a site to depend less on local file replication and more on global replication provided by the XRootD federation to ensure data redundancy. Deploying the software has...
- Jakob Blomer (CERN) | 13/04/2015, 17:30 | Track 3: Data store and access | oral presentation
  Fermilab has several physics experiments, including NOvA, MicroBooNE, and the Dark Energy Survey, that have grid-based computing applications that need to read from a shared set of data files. We call this type of data Auxiliary data to distinguish it from (a) Event data, which tends to be different for every job, and (b) Conditions data, which tends to be the same for each job in a batch of...
- Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE)) | 13/04/2015, 17:45 | Track 3: Data store and access | oral presentation
  With the exponential growth of LHC (Large Hadron Collider) data in the years 2010-2012, distributed computing has become the established way to analyze collider data. The ATLAS experiment Grid infrastructure includes more than 130 sites worldwide, ranging from large national computing centres to smaller university clusters. So far the storage technologies and access protocols to the clusters...
- Thomas Maier (Ludwig-Maximilians-Univ. Muenchen (DE)) | 13/04/2015, 18:00 | Track 3: Data store and access | oral presentation
  I/O is a fundamental determinant of the overall performance of physics analysis and other data-intensive scientific computing. It is, further, crucial to effective resource delivery by the facilities and infrastructure that support data-intensive science. To understand I/O performance, clean measurements in controlled environments are essential, but effective optimization also requires an...
- Oliver Keeble (CERN) | 13/04/2015, 18:15 | Track 3: Data store and access | oral presentation
  The DPM project offers an excellent opportunity for comparative testing of the HTTP and xroot protocols for data analysis.
  - The DPM storage itself is multi-protocol, allowing comparisons to be performed on the same hardware
  - The DPM has been instrumented to produce an i/o monitoring stream, familiar from the xrootd project, regardless of the protocol being used for access
  - The...
- Mean PB to Failure: Initial results from a long-term study of disk storage patterns at the RACF. Christopher Hollowell (Brookhaven National Laboratory) | 14/04/2015, 16:30 | Track 3: Data store and access | oral presentation
  The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990s, serving a worldwide, geographically diverse scientific community that is a major contributor to various HENP projects. A central component of the RACF is the Linux-based worker-node cluster that is used for both computing and data storage purposes. It currently has...
- Eric Cano (CERN) | 14/04/2015, 16:45 | Track 3: Data store and access | oral presentation
  CERN's tape-based archive system has collected over 70 petabytes of data during the first run of the LHC. The Long Shutdown is being used for migrating the complete 100-petabyte data archive to higher-density tape media. During LHC Run 2, the archive will have to cope with yearly growth rates of up to 40-50 petabytes. In this contribution, we will describe the scalable architecture for...
- Dr Andrew Norman (Fermilab) | 14/04/2015, 17:00 | Track 3: Data store and access | oral presentation
  Many experiments in the HEP and astrophysics communities generate large, extremely valuable datasets, which need to be efficiently cataloged and recorded to archival storage. These datasets, both new and legacy, are often structured in a manner that is not conducive to storage and cataloging with modern data handling systems and large file archive facilities. In this paper we discuss in...
- Karsten Schwank (DESY) | 14/04/2015, 17:15 | Track 3: Data store and access | oral presentation
  We report on the status of the data preservation project at DESY for the HERA experiments and present the latest design of the storage system, which is a central element for bit preservation. The HEP experiments based at the HERA accelerator at DESY collected large and unique datasets during the period 1992 to 2007. In addition, corresponding Monte Carlo simulation datasets were produced, which...
- David Yu (BNL) | 14/04/2015, 17:30 | Track 3: Data store and access | oral presentation
  Brookhaven National Lab (BNL)'s RHIC and ATLAS Computing Facility (RACF) supports science experiments, serving as the Tier-0 center for RHIC and as a U.S. ATLAS/LHC Tier-1 center. Scientific data is still growing exponentially after each upgrade. The RACF currently manages over 50 petabytes of data on robotic tape libraries, and we expect a 50% increase in data next year. Not only do we...
- Luca Mascetti (CERN) | 14/04/2015, 17:45 | Track 3: Data store and access | oral presentation
  CERN IT DSS operates the main storage resources for data taking and physics analysis, mainly via three systems: AFS, CASTOR and EOS. The total usable space available for users is about 100 PB (with relative ratios 1:20:120). EOS deploys disk resources across the two CERN computer centres (Meyrin and Wigner) with a current ratio of 60% to 40%. IT DSS is also providing sizable on-demand resources for...
- Mr Andreas Joachim Peters (CERN) | 14/04/2015, 18:00 | Track 3: Data store and access | oral presentation
  Archiving data to tape is a critical operation for any storage system, especially for the EOS system at CERN, which holds production data from all major LHC experiments. Each collaboration has an allocated quota it can use at any given time; therefore, a mechanism for archiving "stale" data is needed so that storage space is reclaimed for online analysis operations. The archiving tool that we...
- Mikhail Hushchyn (Moscow Institute of Physics and Technology, Moscow) | 14/04/2015, 18:15 | Track 3: Data store and access | oral presentation
  The amount of data produced by the LHCb experiment every year amounts to several petabytes. This data is kept on disk and tape storage systems. Disks are much faster than tapes but are far more expensive, and hence disk space is limited. It is impossible to fit all the data taken during the experiment's lifetime on disk, but fortunately fast access to datasets is no longer needed after the...
- Manuel Delfino Reznicek (Universitat Autònoma de Barcelona (ES)) | 16/04/2015, 09:00 | Track 3: Data store and access | oral presentation
  Several scientific fields, including Astrophysics, Astroparticle Physics, Cosmology, Nuclear and Particle Physics, and Research with Photons, estimate that by the 2020 decade they will require data handling systems with data volumes approaching the Zettabyte, distributed amongst as many as 10^18 individually addressable data objects (Zettabyte-Exascale systems). It may be...
- Martin Gasthuber (Deutsches Elektronen-Synchrotron (DE)) | 16/04/2015, 09:15 | Track 3: Data store and access | oral presentation
  Data taking and analysis infrastructures in HEP have evolved over many years into a well-known problem domain. In contrast to HEP, third-generation synchrotron light sources and existing and upcoming free-electron lasers are confronted with an explosion in data rates, primarily driven by recent developments in 2D pixel-array detectors. The next generation will produce data in the region...
- Dr Patrick Fuhrmann (DESY) | 16/04/2015, 09:30 | Track 3: Data store and access | oral presentation
  With the great success of the dCache storage technology in the framework of the Worldwide LHC Computing Grid, an increasing number of non-HEP communities were attracted to use dCache for their data management infrastructure. As a natural consequence, the dCache team was presented with new use cases that stimulated the development of interesting dCache features. Perhaps the most important...
- Dr Paul Millar (Deutsches Elektronen-Synchrotron (DE)) | 16/04/2015, 09:45 | Track 3: Data store and access | oral presentation
  The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into the traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood, while the latter is mostly operated in the Cloud, resulting in a rather complex legal situation. Besides legal issues, those two...
- Mr Andreas Joachim Peters (CERN) | 16/04/2015, 10:00 | Track 3: Data store and access | oral presentation
  EOS is an open-source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and on life-cycle management using JBOD hardware for multi-PB storage installations. The EOS design implies a split of hot and cold storage and introduced a change of the traditional HSM-functionality based workflows...
- Christoph Wissing (Deutsches Elektronen-Synchrotron (DE)) | 16/04/2015, 10:15 | Track 3: Data store and access | oral presentation
  The CMS experiment at the LHC relies on 7 Tier-1 centres of the WLCG to perform the majority of its bulk processing activity, and to archive its data. During the first run of the LHC, these two functions were tightly coupled as each Tier-1 was constrained to process only the data archived on its hierarchical storage. This lack of flexibility in the assignment of processing workflows...
- Dr Samuel Cadellin Skipsey | 16/04/2015, 11:00 | Track 3: Data store and access | oral presentation
  The *Object Store* model has quickly become the de-facto basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in object store design are similar, but not identical, to concepts in the design of Grid Storage Elements,...
- Mr Michael Poat (Brookhaven National Laboratory) | 16/04/2015, 11:15 | Track 3: Data store and access | oral presentation
  The STAR online computing environment is an intensive, ever-growing system used for first-hand data collection and analysis. As systems become more sophisticated, they produce a denser, more detailed collection of data output, and inefficient, limited storage systems have become an impediment to fast feedback for the online shift crews, who rely on data processing at near real-time speed. Motivation...
- Dr Hironori Ito (Brookhaven National Laboratory (US)) | 16/04/2015, 11:30 | Track 3: Data store and access | oral presentation
  Ceph-based storage solutions have become increasingly popular within the HEP/NP community over the last few years. With the current status of the Ceph project, both its object storage and block storage layers are production-ready at large scale, and even the Ceph file system (CephFS) storage layer is rapidly getting to that state as well. This contribution contains a thorough review of...
- Mr Andreas Joachim Peters (CERN) | 16/04/2015, 11:45 | Track 3: Data store and access | oral presentation
  In 2013, CERN IT evaluated and then deployed a petabyte-scale Ceph cluster to support OpenStack use cases in production. As of fall 2014, this cluster stores around 300 TB of data comprising more than a thousand VM images and a similar number of block device volumes. With more than a year of smooth operations, we will present our experience and tuning best practices. Beyond the cloud storage...
- Mr Andreas Joachim Peters (CERN) | 16/04/2015, 12:00 | Track 3: Data store and access | oral presentation
  The EOS storage software was designed to cover CERN disk-only storage use cases in the medium term, trading scalability against latency. To cover and prepare for long-term requirements, the CERN IT data and storage services group (DSS) is actively conducting R&D and making open-source contributions to experiment with a next-generation storage software based on Ceph. Ceph provides a scale-out object...