Marco Cattaneo
(CERN)
14/10/2013, 13:30
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The LHCb experiment has taken data between December 2009 and February 2013. The data taking conditions and trigger rate have been adjusted several times to make optimal use of the luminosity delivered by the LHC and to extend the physics potential of the experiment.
By 2012, LHCb was taking data at twice the instantaneous luminosity and at 2.5 times the high-level trigger rate originally...
Thomas Kuhr
(KIT)
14/10/2013, 13:52
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The Belle II experiment, a next-generation B factory experiment at KEK, is expected to record a data volume two orders of magnitude larger than that of its predecessor, the Belle experiment. The data size and rate are comparable to or larger than those of the LHC experiments and require a change of the computing model from the Belle way, where basically all computing resources were provided by KEK, to a...
Simone Campana
(CERN)
14/10/2013, 14:14
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The ATLAS Distributed Computing project (ADC) was established in 2007 to develop and operate a framework, following the ATLAS computing model, to enable data storage, processing and bookkeeping on top of the WLCG distributed infrastructure. ADC development has always been driven by operations, and this has contributed to its success. The system has fulfilled the demanding requirements of...
Claudio Grandi
(INFN - Bologna)
14/10/2013, 14:36
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The CMS Computing Model was developed and documented in 2004. Since then the model has evolved to be more flexible and to take advantage of new techniques, but many of the original concepts remain and are in active use. In this presentation we will discuss the changes planned for the restart of the LHC program in 2015. We will discuss the changes planned in the use and definition of the...
Dr
Antonio Maria Perez Calero Yzquierdo
(Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
14/10/2013, 15:45
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
In the coming years, processor architectures based on much larger numbers of cores will most likely be the model that continues "Moore's Law"-style throughput gains. This not only results in many more parallel jobs running the monolithic applications of the LHC Run 1 era; the memory requirements of these processes also push worker-node architectures to their limit. One solution is parallelizing the...
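As a rough illustration of the memory argument above: on a fork-based platform, a single multi-process application can share one copy of large read-only data across its workers, where N independent single-core jobs would each load their own. A minimal Python sketch, with invented names and numbers:

```python
import multiprocessing as mp

# Large read-only payload (stand-in for detector conditions data).
# N independent single-core jobs would each load their own copy;
# a forked multi-process application shares one copy (copy-on-write)
# across all workers.
CONDITIONS = {i: 0.1 * i for i in range(1_000_000)}

def process_event(event_id):
    # Workers forked from the parent see CONDITIONS without reloading it.
    return CONDITIONS[event_id % len(CONDITIONS)] * 2.0

if __name__ == "__main__":
    with mp.Pool(processes=8) as pool:  # one worker per core
        results = pool.map(process_event, range(10_000))
    print(len(results), "events processed")
```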
Maxim Potekhin
(Brookhaven National Laboratory (US))
14/10/2013, 16:07
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The ATLAS Production System is the top-level workflow manager which translates physicists' needs for production-level processing into actual workflows executed across about a hundred processing sites used globally by ATLAS. As the production workload has increased in volume and complexity in recent years (the ATLAS production task count is above one million, with each task containing hundreds or...
Tadashi Maeno
(Brookhaven National Laboratory (US))
14/10/2013, 16:29
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
An important foundation underlying the impressive success of data processing and analysis in the ATLAS experiment at the LHC is the Production and Distributed Analysis (PanDA) workload management system. PanDA was designed specifically for ATLAS and proved to be highly successful in meeting all the distributed computing needs of the experiment. However, the core design of PanDA is not...
Dr
Michael Kirby
(Fermi National Accelerator Laboratory)
14/10/2013, 16:51
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The Fabric for Frontier Experiments (FIFE) project is a new far-reaching, major-impact initiative within the Fermilab Scientific Computing Division to drive the future of computing services for Fermilab Experiments. It is a collaborative effort between computing professionals and experiment scientists to produce an end-to-end, fully integrated set of services for computing on the grid and...
Michail Salichos
(CERN)
14/10/2013, 17:25
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
FTS is the service responsible for distributing the majority of LHC data across the WLCG infrastructure. From the experience of the last decade of supporting and monitoring FTS, reliability, robustness and high-performance data transfers have proved to be of high importance in the Data Management world. We are going to present the current status and features of the new File Transfer Service...
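For context, a transfer submission with the FTS3 Python "easy" bindings looks roughly like the sketch below. The module and function names follow the FTS3 client documentation of the time, but exact details may differ between client versions; the endpoint and file URLs are hypothetical:

```python
import fts3.rest.client.easy as fts3

# Hypothetical endpoint and replicas; real deployments authenticate
# with an X.509 proxy certificate.
context = fts3.Context("https://fts3.example.cern.ch:8446")
transfer = fts3.new_transfer(
    "gsiftp://source-se.example.org/store/file.root",  # source replica
    "srm://dest-se.example.org/store/file.root",       # destination
)
job = fts3.new_job([transfer])       # one job may bundle many transfers
job_id = fts3.submit(context, job)   # returns the job id for later polling
print("submitted FTS job", job_id)
```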
Brian Van Klaveren
(SLAC)
14/10/2013, 17:47
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The SLAC Computing Applications group (SCA) has developed a general-purpose data catalog framework, initially for use by the Fermi Gamma-Ray Space Telescope, and now in use by several other experiments. The main features of the data catalog system are:
* Ability to organize datasets in a virtual hierarchy without regard to physical location or access protocol
* Ability to catalog...
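The first listed feature, a virtual hierarchy decoupled from physical storage, can be pictured with a small sketch. This is illustrative Python with invented names, not the actual SCA implementation:

```python
from collections import defaultdict

class DataCatalog:
    """Toy virtual-hierarchy catalog: one logical path, many replicas."""

    def __init__(self):
        # logical path -> list of (site, protocol, physical path)
        self._replicas = defaultdict(list)

    def register(self, logical_path, site, protocol, physical_path):
        # The logical path says nothing about where or how the data is stored.
        self._replicas[logical_path].append((site, protocol, physical_path))

    def locate(self, logical_path, protocol=None):
        replicas = self._replicas.get(logical_path, [])
        return [r for r in replicas if protocol is None or r[1] == protocol]

catalog = DataCatalog()
catalog.register("/Fermi/LAT/photons/run001", "SLAC", "xrootd",
                 "root://xrootd.example.org//lat/run001.fits")
catalog.register("/Fermi/LAT/photons/run001", "IN2P3", "gridftp",
                 "gsiftp://se.example.org/fermi/run001.fits")
print(catalog.locate("/Fermi/LAT/photons/run001", protocol="xrootd"))
```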
Christopher Jung
(KIT - Karlsruhe Institute of Technology (DE))
15/10/2013, 13:30
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
Data play a central role in most fields of science. In recent years, the amount of data from experiment, observation, and simulation has increased rapidly, and the data complexity has grown. Also, communities and shared storage have become geographically more distributed. Therefore, the methods and techniques applied to scientific data need to be revised and partially replaced, while keeping...
Mrs
Tanya Levshina
(FERMILAB)
15/10/2013, 13:52
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The Open Science Grid (OSG) Public Storage project is focused on improving and simplifying the management of OSG Storage. Currently, OSG doesn't provide efficient means to manage public storage offered by participating sites. A Virtual Organization (VO) that relies on opportunistic storage has difficulties finding appropriate storage, verifying its availability, and monitoring its...
Dr
Robert Illingworth
(Fermilab)
15/10/2013, 14:14
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
Fermilab Intensity Frontier experiments such as Minerva, NOvA, and MicroBooNE are now using an improved version of the Fermilab SAM data handling system. SAM was originally used by the CDF and D0 experiments for Run II of the Fermilab Tevatron to provide file metadata and location cataloguing, uploading of new files to tape storage, dataset management, file transfers between global processing...
Dr
Adam Lyon
(Fermilab)
15/10/2013, 14:36
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
IFDH (Intensity Frontier Data Handling) is a suite of tools for data movement tasks for Fermilab experiments and is an important part of the FIFE (Fabric for Frontier Experiments) initiative described at this conference. IFDH encompasses moving input data from caches or storage elements to compute nodes (the "last mile" of data movement) and moving output data potentially to those caches as...
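The "last mile" pattern can be pictured as below. The sketch shells out to the ifdh command line tool, assuming plain `ifdh cp <src> <dst>` is the supported invocation; the paths are made up:

```python
import subprocess

def ifdh_cp(source, destination):
    # Thin wrapper around the ifdh command line tool; assumes ifdh is
    # on PATH and raises if the copy fails.
    subprocess.run(["ifdh", "cp", source, destination], check=True)

# Pull input from a storage element to the worker's local scratch area...
ifdh_cp("/pnfs/example/rawdata/run123.root", "./run123.root")  # hypothetical paths
# ...run the experiment application on the local copy, then push the
# produced output back to a cache or storage element.
ifdh_cp("./run123_reco.root", "/pnfs/example/reco/run123_reco.root")
```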
Dr
Simon Patton
(LAWRENCE BERKELEY NATIONAL LABORATORY)
15/10/2013, 15:45
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
In March 2012 the Daya Bay Neutrino Experiment published the first measurement of the theta_13 mixing angle. The publication of this result occurred 20 days after the last data appearing in the paper were taken, during which time normal data taking and processing continued. This achievement used over forty thousand 'core hours' of CPU and handled eighteen thousand files totaling 16 TB....
Juan Carlos Diaz Velez
(University of Wisconsin-Madison)
15/10/2013, 16:07
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for the processing of Monte Carlo simulations and data. IceProd runs as a separate layer on top of middleware and can take advantage of a variety of computing resources, including grids and batch systems such as gLite, Condor, NorduGrid, PBS and SGE. This is accomplished by a set of dedicated daemons...
Graeme Andrew Stewart
(CERN)
15/10/2013, 16:29
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become ever more constrained compared to the wealth of data generated by the LHC, the need to use resources efficiently and to manage complex workflows within a single grid job has increased.
In ATLAS, a new Job Transform framework has been...
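The idea of composing several processing steps into one grid job can be sketched as follows. The step names are hypothetical; this is not the actual ATLAS Job Transform API:

```python
# Each "transform" consumes the previous step's output, so intermediate
# data stay on the worker node instead of travelling between separate jobs.

def simulate(events):
    return [("hit", e) for e in events]

def reconstruct(hits):
    return [("track", h) for h in hits]

def make_ntuple(tracks):
    return {"n_tracks": len(tracks)}

STEPS = [simulate, reconstruct, make_ntuple]

def run_chain(payload):
    for step in STEPS:
        payload = step(payload)  # a real framework would validate each substep
    return payload

print(run_chain(range(100)))  # -> {'n_tracks': 100}
```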
Donald Petravick
(U)
15/10/2013, 16:51
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The Dark Energy Survey (DES) is designed to probe the origin of the accelerating universe and help uncover the nature of dark energy by measuring the 14-billion-year history of cosmic expansion with high precision. More than 120 scientists from 23 institutions in the United States, Spain, the United Kingdom, Brazil, Switzerland and Germany are working on the project. This...
Elisabetta Vilucchi
(Laboratori Nazionali di Frascati (LNF) - Istituto Nazionale di Fisica Nucleare)
15/10/2013, 17:25
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
In the ATLAS computing model, Grid resources are managed by the PanDA system, a data-driven workload management system designed for production and distributed analysis. Data are stored in various formats in ROOT files, and end-user physicists have the choice to use either the ATHENA framework or ROOT directly. The ROOT way of analyzing data provides users with the possibility to use PROOF to exploit...
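As a rough sketch of the PROOF route mentioned above, using standard ROOT entry points from PyROOT (the tree, file and selector names are invented, and a ROOT installation is assumed):

```python
import ROOT

# TProof::Open("") starts a local PROOF-Lite session; a cluster or PoD
# URL would distribute the work across remote workers instead.
proof = ROOT.TProof.Open("")

chain = ROOT.TChain("physics")  # hypothetical tree name
chain.Add("root://se.example.org//atlas/dpd/ntuple_*.root")  # hypothetical files
chain.SetProof()                # route Process() through the PROOF workers
chain.Process("MySelector.C+")  # hypothetical TSelector, compiled with ACLiC
```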
Mr
Igor Sfiligoi
(University of California San Diego)
15/10/2013, 17:47
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
User analysis in the CMS experiment is performed in a distributed way using both Grid and dedicated resources. In order to insulate the users from the details of the computing fabric, CMS relies on the CRAB (CMS Remote Analysis Builder) package as an abstraction layer. CMS has recently switched from a client-server version of CRAB to a purely client-based solution, with ssh being used to...
Oliver Gutsche
(Fermi National Accelerator Lab. (US))
17/10/2013, 11:00
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
During the first run, CMS collected and processed more than 10B data events and simulated more than 15B events. Up to 100k processor cores were used simultaneously and 100PB of storage was managed. Each month petabytes of data were moved and hundreds of users accessed data samples. In this presentation we will discuss the operational experience from the first run. We will present the workflows...
Wolfgang Ehrenfeld
(Universitaet Bonn (DE))
17/10/2013, 11:22
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
In this presentation we will review the ATLAS Monte Carlo production setup including the different production steps involved in full and fast detector simulation. A report on the Monte Carlo production campaigns during Run 1 and Long Shutdown 1 will be presented, including details on various performance aspects. Important improvements in the workflow and software will be...
Dr
Andrei Tsaregorodtsev
(Centre National de la Recherche Scientifique (FR))
17/10/2013, 11:44
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
DIRAC is a framework for building general-purpose distributed computing systems. It was originally developed for the LHCb experiment at CERN and is now used in several other HEP and astrophysics experiments, as well as by user communities in other scientific domains.
There is large interest from smaller user communities in a simple-to-use tool for accessing grid and other...
Igor Sfiligoi
(University of California San Diego)
17/10/2013, 12:06
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
The computing landscape is moving at an accelerated pace to many-core computing.
Nowadays, it is not unusual to get 32 cores on a single physical node.
As a consequence, there is increased pressure in the pilot systems domain to move beyond purely single-core scheduling and to allow multi-core jobs as well.
In order to allow for a gradual transition from single-core to multi-core user jobs, it...
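One way to picture the gradual transition is a pilot that holds a whole-node slot and backfills leftover cores with single-core jobs. A toy sketch with invented logic, not the actual glideinWMS scheduler:

```python
SLOT_CORES = 32  # whole-node pilot slot

def schedule(queue, free_cores=SLOT_CORES):
    # Greedily start jobs whose core request still fits in the slot;
    # single-core jobs naturally backfill whatever the multi-core
    # payloads leave unused.
    running = []
    for job in queue:
        if job["cores"] <= free_cores:
            running.append(job)
            free_cores -= job["cores"]
    return running, free_cores

queue = ([{"name": "reco-a", "cores": 8}, {"name": "reco-b", "cores": 8}]
         + [{"name": f"ana-{i}", "cores": 1} for i in range(20)])
running, idle = schedule(queue)
print(f"{len(running)} jobs started, {idle} cores left idle")
```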
Paul James Laycock
(University of Liverpool (GB))
17/10/2013, 13:30
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
While a significant fraction of ATLAS physicists directly analyse the AOD (Analysis Object Data) produced at the CERN Tier 0, a much larger fraction have opted to analyse data in a flat ROOT format. The large-scale production of this Derived Physics Data (DPD) format must cater both for detailed performance studies of the ATLAS detector and object reconstruction and for higher level and...
Federica Legger
(Ludwig-Maximilians-Univ. Muenchen (DE))
17/10/2013, 13:52
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
In the LHC operations era, analysis of the multi-petabyte ATLAS data sample by globally distributed physicists is a challenging task. To attain the required scale the ATLAS Computing Model was designed around the concept of grid computing, realized in the Worldwide LHC Computing Grid (WLCG), the largest distributed computational resource existing in the sciences. ATLAS currently stores over...
Marco Mascheroni
(Universita & INFN, Milano-Bicocca (IT))
17/10/2013, 14:14
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
ATLAS, CERN-IT, and CMS embarked on a project to develop a common system for submitting analysis jobs to the distributed computing infrastructure, based on elements of PanDA. After an extensive feasibility study and the development of a proof-of-concept prototype, the project has a basic infrastructure that can be used to support the analysis use case of both experiments with common services. In...
Mr
Wataru Takase
(High Energy Accelerator Research Organization (KEK), Japan)
17/10/2013, 14:36
Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models
Oral presentation to parallel session
In this paper we report on the setup, deployment and operation of a low-maintenance, policy-driven distributed data management system for scientific data based on the integrated Rule Oriented Data System (iRODS). The system is located at KEK, Tsukuba, Japan, with a satellite system at QMUL, London, UK. The system has been running stably in production for more than two years with minimal...
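The policy-driven character can be illustrated abstractly: data-management events trigger site-defined rules. The sketch below is plain Python for illustration, not the iRODS rule language, and all names are invented:

```python
def replicate(obj, target):
    print(f"replicate {obj} -> {target}")

def checksum(obj):
    print(f"checksum {obj}")

# Site policy: what to do when a new object is ingested. Changing the
# policy means editing rules, not the data-handling code.
POLICIES = {
    "on_put": [
        lambda obj: replicate(obj, target="QMUL"),  # mirror to the satellite site
        checksum,
    ],
}

def fire(event, obj):
    for action in POLICIES.get(event, []):
        action(obj)

fire("on_put", "/kek/example/raw/file001.dat")
```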