Tomasz Rybczynski
(AGH University of Science and Technology (PL))
14/10/2013, 13:30
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The LHCb experiment records millions of proton collisions every second, but only a fraction of them are useful for LHCb physics.
In order to filter out the "bad events", a large farm of x86 servers (~2000 nodes) has been put in place. These servers boot and run from NFS; however, they use their local disks to temporarily store data that cannot be processed in real time...
Prof.
Gang CHEN
(INSTITUTE OF HIGH ENERGY PHYSICS)
Dr
Wenjing Wu
(IHEP, CAS)
14/10/2013, 13:53
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The limitations of scheduling modules and the gradual addition of disk pools in distributed storage systems often result in imbalances among disk pools in terms of both available space and number of files. This can cause various problems for the storage system, such as single points of failure, low system throughput, and imbalanced resource utilization and system loads. An algorithm named...
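The imbalance described above can be quantified per metric (free space, file count). The sketch below is only an illustration and is not the algorithm the abstract refers to: it assumes a hypothetical pool layout, measures imbalance with the coefficient of variation, and applies a simple greedy placement rule of our own choosing.

```python
from statistics import mean, pstdev

def imbalance(values):
    """Coefficient of variation (population std dev / mean); 0.0 means perfectly balanced."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def pick_target_pool(pools):
    """Hypothetical greedy rule: send the next file to the pool with the most
    free space, breaking ties by preferring the pool holding fewer files."""
    return max(pools, key=lambda name: (pools[name]["free_tb"], -pools[name]["files"]))

# Hypothetical pool state: free space in TB and number of stored files.
pools = {
    "pool-a": {"free_tb": 10.0, "files": 900_000},
    "pool-b": {"free_tb": 40.0, "files": 200_000},
    "pool-c": {"free_tb": 25.0, "files": 400_000},
}

space_cv = imbalance([p["free_tb"] for p in pools.values()])
files_cv = imbalance([p["files"] for p in pools.values()])
target = pick_target_pool(pools)
print(f"free-space CV={space_cv:.2f}, file-count CV={files_cv:.2f}, next write -> {target}")
```

A purely greedy rule like this one tends to flood a newly added empty pool with writes, which is one reason a real balancing algorithm would also need to weigh file counts and load.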
Xavier Espinal Curull
(CERN)
14/10/2013, 14:16
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The Data Storage and Services (DSS) group at CERN stores and provides access to the data coming from the LHC and other physics experiments. We implement specialized storage services to provide tools for optimal data management, based on the evolution of data volumes, the available technologies, and the usage patterns observed for experiments and users. Our current solutions are CASTOR for...
Christos Filippidis
(Nat. Cent. for Sci. Res. Demokritos (GR))
14/10/2013, 14:39
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Given the current state of I/O and storage systems in petascale systems, incremental solutions in most aspects are unlikely to provide the required capabilities in exascale systems. Traditionally I/O has been considered as a separate activity that is performed before or after the main simulation or analysis computation, or periodically for activities such as check-pointing, but still as...
Dr
Daniel van der Ster
(CERN)
14/10/2013, 15:45
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Emerging storage requirements, such as the need for block storage for both OpenStack VMs and file services like AFS and NFS, have motivated the development of a generic backend storage service for CERN IT. The goals for such a service include (a) vendor neutrality, (b) horizontal scalability with commodity hardware, (c) fault tolerance at the disk, host, and network levels, and (d) support for...
Dr
Jakub Moscicki
(CERN)
14/10/2013, 16:08
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Individual users at CERN are attracted by external file hosting services such as Dropbox. This trend may lead to what is known as the "Dropbox Problem": sensitive organization data stored on servers outside of corporate control, outside of established policies, outside of enforceable SLAs and in unknown geographical locations. Mitigating this risk also provides a good incentive to rethink how...
Dr
Wang Lu
(Institute of High Energy Physics,CAS)
14/10/2013, 16:31
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Object storage systems based on Amazon's Simple Storage Service (S3) have substantially developed in the last few years. The scalability, durability and elasticity characteristics of those systems make them well suited for a range of use cases where data is written, seldom updated and frequently read. Storage of images, static web sites and backup systems are some of the use cases where S3...
Seppo Sakari Heikkila
(CERN)
14/10/2013, 16:54
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Cloud storage is an emerging architecture aiming to provide increased scalability and access performance, compared to more traditional solutions. CERN is evaluating this promise using Huawei UDS and OpenStack storage deployments, focusing on the needs of high-energy physics. Both deployed setups implement S3, one of the protocols that are emerging as standard in the cloud storage market. A set...
Dr
Jamie Shiers
(CERN)
14/10/2013, 17:25
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The international study group on data preservation in high energy physics, DPHEP, achieved a milestone in 2012 with the publication of its eagerly anticipated large scale report, which contains a description of data preservation activities from all major high energy physics collider-based experiments and laboratories. A central message of the report is that data preservation in HEP is not...
Mike Hildreth
(University of Notre Dame (US))
14/10/2013, 17:48
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Data and Software Preservation for Open Science (DASPOS) represents a first attempt to establish a formal collaboration tying together physicists from the CMS and ATLAS experiments at the LHC and the Tevatron experiments with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure. Recently funded by the...
Martin Philipp Hellmich
(University of Edinburgh (GB))
15/10/2013, 13:30
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Recent developments, including low power devices, cluster file systems and cloud storage, represent an explosion in the possibilities for deploying and managing grid storage. In this paper we present how different technologies can be leveraged to build a storage service with differing cost, power, performance, scalability and reliability profiles, using the popular DPM/dmlite storage solution...
Dr
Paul Millar
(Deutsches Elektronen-Synchrotron (DE))
15/10/2013, 13:53
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Storage is a continually evolving environment, with new solutions to both existing problems and new challenges. With over ten years in production use, dCache is also evolving to match this changing landscape. In this paper, we present three areas in which dCache is matching demand and driving innovation.
Providing efficient access to data that maximises both streaming and random-access...
Giacinto Donvito
(Universita e INFN (IT))
15/10/2013, 14:16
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
In this work we present the tests carried out on several distributed file systems to check their capability of supporting HEP data analysis.
In particular, we focused our attention and our tests on HadoopFS, CEPH, and GlusterFS.
All are open-source software.
HadoopFS is Apache Software Foundation software and is part of a more general framework that contains: task...
Andreas Petzold
(KIT - Karlsruhe Institute of Technology (DE))
15/10/2013, 14:39
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The need for storage continues to grow at a dazzling pace, and science and society have become dependent on access to digital data. The first sites storing an exabyte of data will be a reality within a few years. The common storage technology in small and large computer centers continues to be magnetic disks because of their very good price/performance ratio. Storage class memory and solid state disk...
Dr
Tony Wildish
(Princeton University (US))
15/10/2013, 15:45
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The data management elements in CMS are scalable, modular, and designed to work together. The main components are PhEDEx, the data transfer and location system; the Dataset Bookkeeping System (DBS), a metadata catalogue; and the Data Aggregation Service (DAS), designed to aggregate views and provide them to users and services. Tens of thousands of samples have been cataloged and petabytes of data...
Vincent Garonne
(CERN)
15/10/2013, 16:08
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
Rucio is the next-generation Distributed Data Management (DDM) system, benefiting from recent advances in cloud and "Big Data" computing to address the scaling requirements of HEP experiments. Rucio is an evolution of the ATLAS DDM system Don Quijote 2 (DQ2), which has demonstrated very large scale data management capabilities with more than 140 petabytes spread worldwide across 130 sites, and...
Ilija Vukotic
(University of Chicago (US))
15/10/2013, 16:31
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
In the past year the ATLAS Collaboration has accelerated its program to federate data storage resources using an architecture based on XRootD with its attendant redirection and storage integration services. The main goal of the federation is an improvement in the data access experience for the end user while allowing for more efficient and intelligent use of computing resources by monitoring...
Kenneth Bloom
(University of Nebraska (US))
15/10/2013, 16:54
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
CMS is in the process of deploying an Xrootd based infrastructure to facilitate a global data federation. The services of the federation are available to export data from half the physical capacity and the majority of sites are configured to read data over the federation as a back-up. CMS began with a relatively modest set of use-cases for recovery of failed local file opens, debugging and...
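The "read over the federation as a back-up" use case boils down to a simple control flow: try the local replica, and on failure retry the same logical file name through a redirector. The sketch below illustrates only that flow; the prefixes, the redirector host, and the injectable `opener` function are hypothetical stand-ins, not the CMS configuration or an XRootD client API.

```python
from typing import Any, Callable

def open_with_fallback(lfn: str, local_prefix: str, federation_prefix: str,
                       opener: Callable[[str], Any]) -> Any:
    """Try the site-local replica first; if the open fails, retry the same
    logical file name through the federation redirector."""
    try:
        return opener(local_prefix + lfn)
    except OSError as local_error:
        try:
            return opener(federation_prefix + lfn)
        except OSError as remote_error:
            # Surface both failures so the user can see why neither path worked.
            raise OSError(f"local open failed ({local_error}); "
                          f"federation open failed ({remote_error})") from remote_error

# Hypothetical prefixes; a real site would take these from its own configuration.
LOCAL = "/mnt/hadoop/cms"
FEDERATION = "root://redirector.example.org/"

def fake_opener(url):
    if url.startswith(LOCAL):
        raise FileNotFoundError(url)  # simulate a missing or unreadable local replica
    return f"opened {url}"

handle = open_with_fallback("/store/data/file.root", LOCAL, FEDERATION, fake_opener)
```

Injecting the opener keeps the sketch testable without a storage system; in practice it would wrap whatever file-open call the application already uses.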
Martin Barisits
(CERN)
15/10/2013, 17:25
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The ATLAS Distributed Data Management system stores more than 140PB of physics data across 100 sites worldwide. To cope with the anticipated ATLAS workload of the coming decade, Rucio, the next-generation data management system, has been developed. Replica management, as one of the key aspects of the system, has to satisfy critical performance requirements in order to keep pace with the...
Zbigniew Baranowski
(CERN)
15/10/2013, 17:48
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The Hadoop framework has proven to be an effective and popular approach for dealing with "Big Data" and, thanks to its scaling ability and optimised storage access, Hadoop Distributed File System-based projects such as MapReduce or HBase are seen as candidates to replace traditional relational database management systems whenever scalable speed of data processing is a priority. But do these...
Illya Shapoval
(CERN, KIPT),
Marco Clemencic
(CERN)
17/10/2013, 11:00
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The computing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and database states to architecture specifiers and software/data deployment locations. For instance, there is an important relation between the LHCb Conditions Database (CondDB), which provides versioned, time...
Manuel Giffels
(CERN)
17/10/2013, 11:23
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The Data Bookkeeping Service 3 (DBS 3) provides an improved event metadata catalog for Monte Carlo and recorded data of the CMS (Compact Muon Solenoid) experiment at the Large Hadron Collider (LHC). It provides the necessary information used for tracking datasets, like data processing history, files and runs associated with a given dataset on a scale of about 10^5 datasets and more than 10^7...
Elizabeth Gallas
(University of Oxford (GB))
17/10/2013, 11:46
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The ATLAS Conditions Database, based on the LCG Conditions Database infrastructure, contains a wide variety of information needed in online data taking and offline analysis. The total volume of ATLAS conditions data is in the multi-Terabyte range.
Internally, the active data is divided into 65 separate schemas (each with hundreds of underlying tables) according to overall data taking type,...
Jerome Fulachier
(Centre National de la Recherche Scientifique (FR))
17/10/2013, 12:09
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The "ATLAS Metadata Interface" framework (AMI) has been developed in the context of ATLAS, one of the largest scientific collaborations. AMI can be considered to be a mature application, since its basic architecture has been maintained for over 10 years.
In this paper we will briefly describe the architecture and the main uses of the framework within the experiment (Tag Collector for...
Brian Paul Bockelman
(University of Nebraska (US))
17/10/2013, 13:30
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
To efficiently read data over high-latency connections, ROOT-based applications must pay careful attention to user-level usage patterns and the configuration of the I/O layer. Starting in 2010, CMSSW began using and improving several ROOT "best practice" techniques such as enabling the TTreeCache object and avoiding reading events out-of-order. Since then, CMS has been deploying additional...
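Why reading events in order matters over a high-latency link can be seen with a toy model: if each cache miss costs one remote round trip that prefetches a window of consecutive baskets, sequential access amortizes that trip over many entries, while out-of-order access does not. This is a simplified illustration we chose, not ROOT's TTreeCache (which learns which branches are used and prefetches their baskets for a range of entries); the basket size and window below are arbitrary assumptions.

```python
import random

def count_round_trips(entry_order, entries_per_basket, window):
    """Count remote requests for a toy read-ahead cache: on a miss, one request
    fetches `window` consecutive baskets, replacing whatever was cached."""
    cached = range(0)
    trips = 0
    for entry in entry_order:
        basket = entry // entries_per_basket
        if basket not in cached:
            trips += 1
            cached = range(basket, basket + window)
    return trips

N_ENTRIES, PER_BASKET, WINDOW = 1000, 10, 10
in_order = list(range(N_ENTRIES))
shuffled = in_order[:]
random.Random(42).shuffle(shuffled)

print("in order:    ", count_round_trips(in_order, PER_BASKET, WINDOW))
print("out of order:", count_round_trips(shuffled, PER_BASKET, WINDOW))
```

With these parameters, sequential reading needs one request per window of 100 entries, whereas shuffled reading misses on most entries; each extra request costs a full network latency.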
Dr
Johannes Ebke
(TNG Technology Consulting)
17/10/2013, 13:53
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
In comparison to storing data packed by event, column data stores store event variables or sets of event variables in individual data packs. One well-known example is the CERN ROOT library's TTree, which has a mode where it behaves like a column store. Columnar data stores can offer fast processing of a subset of the event structure or individual variables.
In the experimental Drillbit...
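The contrast between storing data packed by event and storing each variable in its own data pack can be shown in a few lines: pivot event records into one contiguous array per variable, then compare how many bytes a single-variable query must touch. This is a toy illustration with made-up event variables (`pt`, `eta`, `phi`); it is not Drillbit's or ROOT's storage format, and it ignores compression and per-record overhead.

```python
from array import array

FIELDS = ("pt", "eta", "phi")

# Toy event records, packed "by event" as a row store would hold them.
rows = [
    {"pt": 25.0, "eta": 0.4, "phi": 1.2},
    {"pt": 40.0, "eta": -1.1, "phi": -2.0},
    {"pt": 15.0, "eta": 2.0, "phi": 0.3},
]

def to_columns(rows, fields):
    """Pivot event records into one contiguous array of doubles per variable."""
    return {f: array("d", (r[f] for r in rows)) for f in fields}

def bytes_read_row_store(n_events, fields):
    # Whole records must be read, so every variable of every event is touched
    # (8 bytes per double, ignoring per-record overhead).
    return n_events * len(fields) * 8

def bytes_read_column_store(columns, wanted):
    # Only the requested columns are read.
    return sum(columns[f].itemsize * len(columns[f]) for f in wanted)

columns = to_columns(rows, FIELDS)
mean_pt = sum(columns["pt"]) / len(columns["pt"])
print(mean_pt, bytes_read_row_store(len(rows), FIELDS), bytes_read_column_store(columns, ["pt"]))
```

Computing the mean `pt` reads one column (a third of the data here); the saving grows with the number of variables per event.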
Elizabeth Gallas
(University of Oxford (GB))
17/10/2013, 14:16
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
ATLAS maintains a rich corpus of event-by-event information that provides a global view of virtually all of the billions of events the collaboration has seen or simulated, along with sufficient auxiliary information to navigate to and retrieve data for any event at any production processing stage. This unique resource has been employed for a range of purposes, from monitoring, statistics,...
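Navigating "to any event at any production processing stage" implies an index keyed by event identity that points to a storage location per stage. The minimal in-memory sketch below illustrates that idea only; the `Location` fields, stage labels, dataset names, and the `EventIndex` class are hypothetical and do not describe the ATLAS schema.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class Location(NamedTuple):
    """Hypothetical pointer to where one event is stored at one processing stage."""
    dataset: str
    file_guid: str
    entry: int

class EventIndex:
    def __init__(self):
        # (run, event) -> {stage: Location}
        self._by_event = defaultdict(dict)

    def add(self, run: int, event: int, stage: str, location: Location) -> None:
        self._by_event[(run, event)][stage] = location

    def locate(self, run: int, event: int, stage: str) -> Optional[Location]:
        # .get() on the defaultdict does not create empty entries for unknown events.
        return self._by_event.get((run, event), {}).get(stage)

    def stages(self, run: int, event: int) -> list:
        return sorted(self._by_event.get((run, event), {}))

idx = EventIndex()
idx.add(200842, 1234567, "RAW", Location("hypothetical.RAW", "guid-0001", 17))
idx.add(200842, 1234567, "AOD", Location("hypothetical.AOD", "guid-0042", 3))
```

A production index would live in a database with this key as its primary index, but the lookup pattern, event identity first and then stage, stays the same.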
Gancho Dimitrov
(CERN)
17/10/2013, 14:39
Data Stores, Data Bases, and Storage Systems
Oral presentation to parallel session
The ATLAS Distributed Computing (ADC) project delivers production tools and services for ATLAS offline activities such as data placement and data processing on the Grid. The system has sustained the required computing activities with high efficiency during the first run of LHC data taking, and has demonstrated flexibility in reacting promptly to new challenges. Databases are a...