Track 3 Session

13 Apr 2015, 14:00
OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495

Conveners

Track 3 Session: #1 (Databases)

  • Barthelemy Von Haller (CERN)

Track 3 Session: #2 (Databases, Data access protocols)

  • Laurent Aphecetche (Laboratoire de Physique Subatomique et des Technologies Associées)

Track 3 Session: #3 (Hardware and data archival)

  • Latchezar Betev (CERN)

Track 3 Session: #4 (Future use cases)

  • Shaun de Witt (STFC)

Track 3 Session: #5 (Future use cases)

  • Luca Magnoni (CERN)

Description

Data store and access


  1. Andrea Formica (CEA/IRFU, Centre d'étude de Saclay Gif-sur-Yvette (FR))
    13/04/2015, 14:00
    Track3: Data store and access
    oral presentation
    The ATLAS and CMS Conditions Database infrastructures have served each of the respective experiments well through LHC Run 1, providing efficient access to a wide variety of conditions information needed in online data taking and offline processing and analysis. During the long shutdown between Run 1 and Run 2, we have taken various measures to improve our systems for Run 2. In some cases, a...
  2. Zbigniew Baranowski (CERN)
    13/04/2015, 14:15
    Track3: Data store and access
    oral presentation
    During LHC run 1 ATLAS and LHCb databases have been using Oracle Streams replication technology for their use cases of data movement between online and offline Oracle databases. Moreover ATLAS has been using Streams to replicate conditions data from CERN to selected Tier 1s. GoldenGate is a new technology introduced by Oracle to replace and improve on Streams, by providing better performance,...
  3. Roland Sipos (Eotvos Lorand University (HU))
    13/04/2015, 14:30
    Track3: Data store and access
    oral presentation
    With the restart of the LHC in 2015, the growth of the CMS Conditions dataset will continue; the need for consistent and highly available access to the Conditions therefore makes a strong case for revisiting different aspects of the current data storage solutions. We present a study of alternative data storage backends for the Conditions Databases, by evaluating some of the most popular NoSQL...
  4. Federico Stagni (CERN)
    13/04/2015, 14:45
    Track3: Data store and access
    oral presentation
    Nowadays, many database systems are available, but they may not be optimized for storing time-series data. The DIRAC job monitoring is a typical use case of such time series. So far it was handled using a MySQL database, which is not well suited for such an application; therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series is not...
  5. Ms Marina Golosova (National Research Centre "Kurchatov Institute")
    13/04/2015, 15:00
    Track3: Data store and access
    oral presentation
    In recent years the concepts of Big Data have become well established in IT. Most systems (for example Distributed Data Management or Workload Management systems) produce metadata that describes actions performed on jobs, stored data or other entities, and its volume often takes one into the realm of Big Data. This metadata can be used to obtain information about the current...
  6. Michael Boehler (Albert-Ludwigs-Universitaet Freiburg (DE))
    13/04/2015, 15:15
    Track3: Data store and access
    oral presentation
    The ATLAS detector consists of several sub-detector systems. Both data taking and Monte Carlo (MC) simulation rely on an accurate description of the detector conditions from every sub system, such as calibration constants, different scenarios of pile-up and noise conditions, size and position of the beam spot, etc. In order to guarantee database availability for critical online applications...
  7. Dr Dario Barberis (Università e INFN Genova (IT))
    13/04/2015, 15:30
    Track3: Data store and access
    oral presentation
    The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on...
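The core of the EventIndex described above is a catalogue mapping each event identifier to the GUIDs of the files that contain it, supporting use cases such as event picking. A minimal sketch of such a catalogue (all class, method and GUID names here are hypothetical illustrations, not the actual ATLAS implementation):

```python
# Sketch of an event-index catalogue: each event identifier
# (run number, event number) maps to the GUIDs of the files that
# contain that event at each processing stage.
from collections import defaultdict

class EventIndex:
    def __init__(self):
        # (run, event) -> {processing stage -> file GUID}
        self._index = defaultdict(dict)

    def record(self, run, event, stage, guid):
        """Register that the given event is stored in file `guid`."""
        self._index[(run, event)][stage] = guid

    def pick(self, run, event, stage):
        """Event picking: return the GUID of the file holding the event,
        or None if the event is unknown at that processing stage."""
        return self._index[(run, event)].get(stage)

idx = EventIndex()
idx.record(run=266904, event=12345, stage="RAW", guid="A1B2-C3D4")
idx.record(run=266904, event=12345, stage="AOD", guid="E5F6-0708")
print(idx.pick(266904, 12345, "AOD"))  # -> E5F6-0708
```

In the real system the abstract also mentions storing the trigger pattern per event; that would simply be an extra field alongside the GUID map.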
  8. Javier Sanchez (Instituto de Fisica Corpuscular (ES))
    13/04/2015, 15:45
    Track3: Data store and access
    oral presentation
    The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal "pointer" to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for...
  9. Marko Bracko (Jozef Stefan Institute (SI))
    13/04/2015, 16:30
    Track3: Data store and access
    oral presentation
    The Belle II experiment, a next-generation B factory experiment at the KEK laboratory, Tsukuba, Japan, is expected to collect an experimental data sample fifty times larger than its predecessor, the Belle experiment. The data taking and processing rates are expected to be at least one order of magnitude larger as well. In order to cope with these large data processing rates and huge data...
  10. Christophe Haen (CERN)
    13/04/2015, 16:45
    Track3: Data store and access
    oral presentation
    In the distributed computing model of LHCb the File Catalog (FC) is a central component that keeps track of each file and replica stored on the Grid. It federates the LHCb data files in a logical namespace used by all LHCb applications. As a replica catalog, it is used for brokering jobs to sites where their input data is meant to be present, but also by jobs for finding alternative...
  11. Ruben Domingo Gaspar Aparicio (CERN)
    13/04/2015, 17:00
    Track3: Data store and access
    oral presentation
    CERN IT-DB group is migrating its storage platform, mainly NetApp NAS's running on 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one is made of 14 controllers and it will hold a range of critical databases, from administration to accelerator control or experiment control databases. This talk shows our setup: network, monitoring, use of features like transparent...
  12. Jeffrey Michael Dost (Univ. of California San Diego (US))
    13/04/2015, 17:15
    Track3: Data store and access
    oral presentation
    In April of 2014, the UCSD T2 Center deployed hdfs-xrootd-fallback, a UCSD-developed software system that interfaces Hadoop with XRootD to increase reliability of the Hadoop file system. The hdfs-xrootd-fallback system allows a site to depend less on local file replication and more on global replication provided by the XRootD federation to ensure data redundancy. Deploying the software has...
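The fallback policy sketched in the hdfs-xrootd-fallback contribution above — serve a read from the local file system if possible, otherwise fetch the data from the XRootD federation — can be illustrated as follows. The function and reader names are illustrative assumptions, not the UCSD implementation:

```python
# Sketch of a local-read-with-federation-fallback policy: try the
# local (HDFS-like) replica first and, on failure, fall back to a
# list of remote federation endpoints, so the site can rely on
# global rather than local replication for redundancy.
def read_with_fallback(path, local_reader, federation_readers):
    """Return file contents, preferring the local replica."""
    try:
        return local_reader(path)
    except IOError:
        pass  # local replica missing or corrupted: fall back
    for remote_read in federation_readers:
        try:
            return remote_read(path)
        except IOError:
            continue  # this endpoint failed: try the next one
    raise IOError("no replica of %s reachable" % path)

# Usage with stand-in readers: the local read fails, the remote succeeds.
def broken_local(path):
    raise IOError("block missing")

def remote_ok(path):
    return b"event data"

print(read_with_fallback("/store/file.root", broken_local, [remote_ok]))
```

The real system operates at the level of missing or corrupted HDFS blocks rather than whole files, but the ordering of the read attempts is the same.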
  13. Jakob Blomer (CERN)
    13/04/2015, 17:30
    Track3: Data store and access
    oral presentation
    Fermilab has several physics experiments including NOvA, MicroBooNE, and the Dark Energy Survey that have computing grid-based applications that need to read from a shared set of data files. We call this type of data Auxiliary data to distinguish it from (a) Event data which tends to be different for every job, and (b) Conditions data which tends to be the same for each job in a batch of...
  14. Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE))
    13/04/2015, 17:45
    Track3: Data store and access
    oral presentation
    With the exponential growth of LHC (Large Hadron Collider) data in the years 2010-2012, distributed computing has become the established way to analyze collider data. The ATLAS experiment Grid infrastructure includes more than 130 sites worldwide, ranging from large national computing centres to smaller university clusters. So far the storage technologies and access protocols to the clusters...
  15. Thomas Maier (Ludwig-Maximilians-Univ. Muenchen (DE))
    13/04/2015, 18:00
    Track3: Data store and access
    oral presentation
    I/O is a fundamental determinant in the overall performance of physics analysis and other data-intensive scientific computing. It is, further, crucial to effective resource delivery by the facilities and infrastructure that support data-intensive science. To understand I/O performance, clean measurements in controlled environments are essential, but effective optimization requires as well an...
  16. Oliver Keeble (CERN)
    13/04/2015, 18:15
    Track3: Data store and access
    oral presentation
    The DPM project offers an excellent opportunity for comparative testing of the HTTP and xroot protocols for data analysis:
      • The DPM storage itself is multi-protocol, allowing comparisons to be performed on the same hardware
      • The DPM has been instrumented to produce an i/o monitoring stream, familiar from the xrootd project, regardless of the protocol being used for access
      • The...
  17. Christopher Hollowell (Brookhaven National Laboratory)
    14/04/2015, 16:30
    Track3: Data store and access
    oral presentation
    The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990s, serving a worldwide, geographically diverse scientific community that is a major contributor to various HEP/NP projects. A central component of the RACF is the Linux-based worker node cluster that is used for both computing and data storage purposes. It currently has...
  18. Eric Cano (CERN)
    14/04/2015, 16:45
    Track3: Data store and access
    oral presentation
    CERN's tape-based archive system has collected over 70 Petabytes of data during the first run of the LHC. The Long Shutdown is being used for migrating the complete 100 Petabytes data archive to higher-density tape media. During LHC Run 2, the archive will have to cope with yearly growth rates of up to 40-50 Petabytes. In this contribution, we will describe the scalable architecture for...
  19. Dr Andrew Norman (Fermilab)
    14/04/2015, 17:00
    Track3: Data store and access
    oral presentation
    Many experiments in the HEP and Astrophysics communities generate large extremely valuable datasets, which need to be efficiently cataloged and recorded to archival storage. These datasets, both new and legacy, are often structured in a manner that is not conducive to storage and cataloging with modern data handling systems and large file archive facilities. In this paper we discuss in...
  20. Karsten Schwank (DESY)
    14/04/2015, 17:15
    Track3: Data store and access
    oral presentation
    We report on the status of the data preservation project at DESY for the HERA experiments and present the latest design of the storage system, which is a central element of bit preservation. The HEP experiments based at the HERA accelerator at DESY collected large and unique datasets during the period 1992 to 2007. In addition, corresponding Monte Carlo simulation datasets were produced, which...
  21. David Yu (BNL)
    14/04/2015, 17:30
    Track3: Data store and access
    oral presentation
    Brookhaven National Lab (BNL)'s RHIC and ATLAS Computing Facility (RACF) supports science experiments such as RHIC as its Tier-0 center and the U.S. ATLAS/LHC as a Tier-1 center. Scientific data is still growing exponentially after each upgrade. The RACF currently manages over 50 petabytes of data on robotic tape libraries, and we expect a 50% increase in data next year. Not only do we...
  22. Luca Mascetti (CERN)
    14/04/2015, 17:45
    Track3: Data store and access
    oral presentation
    CERN IT DSS operates the main storage resources for data taking and physics analysis, mainly via three systems: AFS, CASTOR and EOS. The total usable space available for users is about 100 PB (with relative ratios 1:20:120). EOS deploys disk resources across the two CERN computer centres (Meyrin and Wigner) with a current ratio of 60% to 40%. IT DSS is also providing sizable on-demand resources for...
  23. Mr Andreas Joachim Peters (CERN)
    14/04/2015, 18:00
    Track3: Data store and access
    oral presentation
    Archiving data to tape is a critical operation for any storage system, especially for the EOS system at CERN, which holds production data from all major LHC experiments. Each collaboration has an allocated quota it can use at any given time; therefore, a mechanism for archiving "stale" data is needed so that storage space is reclaimed for online analysis operations. The archiving tool that we...
  24. Mikhail Hushchyn (Moscow Institute of Physics and Technology, Moscow)
    14/04/2015, 18:15
    Track3: Data store and access
    oral presentation
    The amount of data produced by the LHCb experiment every year amounts to several petabytes. This data is kept on disk and tape storage systems. Disks are much faster than tapes, but are far more expensive, and hence disk space is limited. It is impossible to fit the whole dataset taken during the experiment's lifetime on disk, but fortunately fast access to datasets is no longer needed after the...
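The contribution above turns on deciding which datasets stay on scarce, fast disk and which can move to cheap tape. A popularity-based placement decision of that kind can be sketched as follows; the ranking rule, dataset names and slot count are illustrative assumptions only, not the LHCb model:

```python
# Sketch of a popularity-based disk/tape split: rank datasets by
# recent access count, keep the most popular ones on disk, and mark
# the rest for migration to tape.
def split_disk_tape(access_counts, disk_slots):
    """access_counts: {dataset name: recent access count}.
    Returns (keep_on_disk, migrate_to_tape), each a list of names."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return ranked[:disk_slots], ranked[disk_slots:]

counts = {"dsA": 120, "dsB": 3, "dsC": 45}
disk, tape = split_disk_tape(counts, disk_slots=2)
print(disk)  # -> ['dsA', 'dsC']
print(tape)  # -> ['dsB']
```

A production policy would predict future popularity (for example from access-history trends) rather than rank on raw past counts, but the disk/tape partitioning step is the same.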
  25. Manuel Delfino Reznicek (Universitat Autรฒnoma de Barcelona (ES))
    16/04/2015, 09:00
    Track3: Data store and access
    oral presentation
    Several scientific fields, including Astrophysics, Astroparticle Physics, Cosmology, Nuclear and Particle Physics, and Research with Photons, are estimating that by the 2020 decade they will require data handling systems with data volumes approaching the Zettabyte, distributed amongst as many as 10^18 individually addressable data objects (Zettabyte-Exascale systems). It may be...
  26. Martin Gasthuber (Deutsches Elektronen-Synchrotron (DE))
    16/04/2015, 09:15
    Track3: Data store and access
    oral presentation
    Data taking and analysis infrastructures in HEP have evolved over many years into a well-understood problem domain. In contrast to HEP, third-generation synchrotron light sources and existing and upcoming free-electron lasers are confronted with an explosion in data rates that is primarily driven by recent developments in 2D pixel array detectors. The next generation will produce data in the region...
  27. Dr Patrick Fuhrmann (DESY)
    16/04/2015, 09:30
    Track3: Data store and access
    oral presentation
    With the great success of the dCache storage technology in the framework of the Worldwide LHC Computing Grid, an increasing number of non-HEP communities were attracted to use dCache for their data management infrastructure. As a natural consequence, the dCache team was presented with new use cases that stimulated the development of interesting dCache features. Perhaps the most important...
  28. Dr Paul Millar (Deutsches Elektronen-Synchrotron (DE))
    16/04/2015, 09:45
    Track3: Data store and access
    oral presentation
    The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into the traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood, while the latter are mostly operated in the Cloud, resulting in a rather complex legal situation. Besides legal issues, those two...
  29. Mr Andreas Joachim Peters (CERN)
    16/04/2015, 10:00
    Track3: Data store and access
    oral presentation
    EOS is an open source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and life-cycle management using JBOD hardware for multi PB storage installations. The EOS design implies a split of hot and cold storage and introduced a change of the traditional HSM functionality based workflows...
  30. Christoph Wissing (Deutsches Elektronen-Synchrotron (DE))
    16/04/2015, 10:15
    Track3: Data store and access
    oral presentation
    The CMS experiment at the LHC relies on 7 Tier-1 centres of the WLCG to perform the majority of its bulk processing activity, and to archive its data. During the first run of the LHC, these two functions were tightly coupled as each Tier-1 was constrained to process only the data archived on its hierarchical storage. This lack of flexibility in the assignment of processing workflows...
  31. Dr Samuel Cadellin Skipsey
    16/04/2015, 11:00
    Track3: Data store and access
    oral presentation
    The Object Store model has quickly become the de-facto basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in object store design are similar, but not identical, to concepts in the design of Grid Storage Elements,...
  32. Mr Michael Poat (Brookhaven National Laboratory)
    16/04/2015, 11:15
    Track3: Data store and access
    oral presentation
    The STAR online computing environment is an intensive, ever-growing system used for first-hand data collection and analysis. As systems become more sophisticated, they produce a denser, more detailed collection of data output, and inefficient, limited storage systems have become an impediment to fast feedback for the online shift crews relying on data processing at near real-time speed. Motivation...
  33. Dr Hironori Ito (Brookhaven National Laboratory (US))
    16/04/2015, 11:30
    Track3: Data store and access
    oral presentation
    Ceph based storage solutions are becoming increasingly popular within the HEP/NP community over the last few years. With the current status of the Ceph project, both its object storage and block storage layers are production ready on a large scale, and even the Ceph file system (CephFS) storage layer is rapidly getting to that state as well. This contribution contains a thorough review of...
  34. Mr Andreas Joachim Peters (CERN)
    16/04/2015, 11:45
    Track3: Data store and access
    oral presentation
    In 2013, CERN IT evaluated then deployed a petabyte-scale Ceph cluster to support OpenStack use-cases in production. As of fall 2014, this cluster stores around 300 TB of data comprising more than a thousand VM images and a similar number of block device volumes. With more than a year of smooth operations, we will present our experience and tuning best-practices. Beyond the cloud storage...
  35. Mr Andreas Joachim Peters (CERN)
    16/04/2015, 12:00
    Track3: Data store and access
    oral presentation
    The EOS storage software was designed to cover CERN disk-only storage use cases in the medium term, trading scalability against latency. To cover and prepare for long-term requirements, the CERN IT data and storage services group (DSS) is actively conducting R&D and making open source contributions to experiment with a next-generation storage software based on CEPH. CEPH provides a scale-out object...