David Adams
(BNL)
13/02/2006, 14:00
Distributed Data Analysis
oral presentation
DIAL is a generic framework for distributed analysis. The heart of the system is a
scheduler (also called an analysis service) that receives high-level processing requests
expressed in terms of an input dataset and a transformation to act on that dataset.
The scheduler splits the dataset, applies the transformation to each subdataset to
produce a new subdataset, and then merges these to...
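The split/apply/merge flow described here is essentially a scatter-gather pattern. A
minimal Python sketch of that pattern, with a local process pool standing in for
DIAL's remote workers (all names are hypothetical illustrations, not the DIAL API):

```python
# Minimal sketch of the split/apply/merge pattern described above.
# All names are hypothetical illustrations, not the actual DIAL API.
from concurrent.futures import ProcessPoolExecutor

def split(dataset, n_parts):
    """Split a dataset (here, a list of events) into n roughly equal parts."""
    return [dataset[i::n_parts] for i in range(n_parts)]

def transform(subdataset):
    """Apply the user-supplied transformation to one subdataset."""
    return [event * 2 for event in subdataset]   # placeholder transformation

def merge(subresults):
    """Merge the transformed subdatasets back into a single dataset."""
    return [event for part in subresults for event in part]

if __name__ == "__main__":
    dataset = list(range(100))
    with ProcessPoolExecutor() as pool:          # stands in for remote workers
        subresults = list(pool.map(transform, split(dataset, 4)))
    print(len(merge(subresults)))                # 100 transformed events
```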
Dr Jörn Adamczewski
(GSI)
13/02/2006, 14:20
Distributed Data Analysis
oral presentation
The new version 3 of the ROOT-based GSI standard analysis framework GO4 (GSI Object
Oriented Online Offline) has been released. GO4 provides multithreaded remote
communication between the analysis process and the GUI process, a dynamically
configurable analysis framework, and a Qt-based GUI with embedded ROOT graphics.
In version 3 a new internal object manager was developed. Its...
Dr Gennady Kuznetsov
(Rutherford Appleton Laboratory, Didcot)
13/02/2006, 14:40
Distributed Data Analysis
oral presentation
DIRAC is the LHCb Workload and Data Management system used for Monte Carlo
production, data processing and distributed user analysis. Such a wide variety of
applications requires a general approach to the tasks of job definition,
configuration and management.
In this paper we present a suite of tools called the Production Console, which is a
general framework for job formulation,...
Gerardo Ganis
(CERN)
13/02/2006, 15:00
Distributed Data Analysis
oral presentation
The Parallel ROOT Facility, PROOF, enables the interactive analysis of distributed
data sets in a transparent way. It exploits the inherent parallelism in data of
uncorrelated events via a multi-tier architecture that optimizes I/O and CPU
utilization in heterogeneous clusters with distributed storage. Being part of the
ROOT framework, PROOF inherits the benefits of a performant...
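For orientation, a hedged sketch of how such an interactive PROOF session is
typically driven from PyROOT; the master URL, tree name, file paths and the selector
are placeholders, and a ROOT installation with PROOF support is assumed:

```python
# Sketch of driving a PROOF session from PyROOT; names are placeholders.
import ROOT

proof = ROOT.TProof.Open("proof://master.example.org")  # connect to the cluster
chain = ROOT.TChain("Events")                           # tree name is illustrative
chain.Add("root://se.example.org//data/run1/*.root")    # distributed input files
chain.SetProof()                                        # route Process() through PROOF
chain.Process("MySelector.C+")                          # hypothetical TSelector, built with ACLiC
```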
Caitriana Nicholson
(University of Glasgow)
13/02/2006, 16:00
Distributed Data Analysis
oral presentation
Simulations have been performed with the grid simulator OptorSim using the expected
analysis patterns from the LHC experiments and a realistic model of the LCG at LHC
startup, with thousands of user analysis jobs running at over a hundred grid sites.
It is shown, first, that dynamic data replication plays a significant role in the
overall analysis throughput in terms of optimising job...
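To make the mechanism concrete, a toy sketch of the kind of dynamic-replication
decision such a simulation models; the rule below is purely illustrative, not
OptorSim's actual optimisation algorithm:

```python
# Toy replication rule: copy a file to local storage once it has been
# requested often enough recently. Illustrative only, not OptorSim's logic.
def should_replicate(history, filename, window=10, threshold=3):
    """True once a file appears >= threshold times in the recent window."""
    return history[-window:].count(filename) >= threshold

local_replicas, history = set(), []
for request in ["f1", "f2", "f1", "f1", "f3", "f1"]:
    history.append(request)
    if request not in local_replicas and should_replicate(history, request):
        local_replicas.add(request)
        print(f"replicate {request} to local storage")   # fires for f1
```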
Dr Donatella Lucchesi
(INFN Padova), Dr Francesco Delli Paoli
(INFN Padova)
13/02/2006, 16:20
Distributed Data Analysis
oral presentation
The CDF experiment has a new trigger which selects events based on the
significance of the track impact parameters. With this trigger, a sample of events
enriched in b and c mesons has been selected; it is used for several important
physics analyses, such as Bs mixing. The dataset amounts to about 20 TB,
corresponding to an integrated luminosity of 1 fb-1 collected by CDF....
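As a toy illustration of an impact-parameter-significance selection of this kind
(thresholds, track format and the two-track requirement are illustrative, not the
CDF trigger definition):

```python
# Toy selection on impact parameter significance d0/sigma(d0).
def significant_tracks(tracks, min_sig=3.0):
    """Return tracks whose impact parameter significance exceeds min_sig."""
    return [t for t in tracks if abs(t["d0"]) / t["sigma_d0"] > min_sig]

tracks = [{"d0": 0.012, "sigma_d0": 0.003},   # displaced track, significance 4.0
          {"d0": 0.001, "sigma_d0": 0.003}]   # prompt track, significance ~0.3
passed = len(significant_tracks(tracks)) >= 2 # e.g. a two-track requirement
print(passed)                                 # False: only one displaced track
```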
Valeria Bartsch
(Fermilab / University College London)
13/02/2006, 16:40
Distributed Data Analysis
oral presentation
SAM is a data handling system that provides the Fermilab HEP experiments D0, CDF and
MINOS with the means to catalog, distribute and track the usage of their collected
and analyzed data. Annually, SAM serves petabytes of data to physics groups
performing data analysis, data reconstruction and simulation at various computing
centers across the world. Given the volume of the detector data, a...
John Huth
(Harvard University)
13/02/2006, 17:00
Distributed Data Analysis
oral presentation
The ATLAS experiment uses a tiered data Grid architecture that enables possibly
overlapping subsets, or replicas, of original datasets to be located across the ATLAS
collaboration. Many individual elements of these datasets can also be recreated
locally from scratch based on a limited number of inputs. We envision a time when a
user will want to determine which is more expedient,...
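A hedged sketch of the fetch-versus-recreate comparison this suggests; the cost
model below is illustrative only, not the ATLAS implementation:

```python
# Illustrative cost model: copy the replica, or regenerate it locally?
def fetch_or_recreate(size_gb, bandwidth_mb_s, recreate_cpu_hours, cpu_slots_free):
    """Estimate which option gives the shorter time to data."""
    transfer_h = size_gb * 1024.0 / bandwidth_mb_s / 3600.0
    recreate_h = recreate_cpu_hours / max(cpu_slots_free, 1)
    return "fetch replica" if transfer_h <= recreate_h else "recreate locally"

# 500 GB at 50 MB/s is ~2.8 h; recreating takes 200 CPU h over 20 slots = 10 h.
print(fetch_or_recreate(size_gb=500, bandwidth_mb_s=50,
                        recreate_cpu_hours=200, cpu_slots_free=20))
```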
Dr Julia Andreeva
(CERN)
13/02/2006, 17:20
Distributed Data Analysis
oral presentation
The ARDA project focuses on delivering analysis prototypes together with the LHC
experiments. The ARDA/CMS activity delivered a fully functional analysis prototype
exposed to a pilot community of CMS users. The current integration work of key
components into the CMS system is described: the activity focuses on providing a
coherent monitoring layer where information from diverse sources...
Dr Massimo Lamanna
(CERN)
14/02/2006, 14:00
Distributed Data Analysis
oral presentation
The ARDA project focuses on delivering analysis prototypes together with the LHC
experiments. Each experiment prototype is in principle independent, but commonalities
have been observed. The first level of commonality is represented by mature projects
which can be effectively shared across different users. The best example is GANGA,
providing a toolkit to organize users' activity,...
Mr Stuart Wakefield
(Imperial College, University of London, London, UNITED KINGDOM)
14/02/2006, 14:20
Distributed Data Analysis
oral presentation
BOSS (Batch Object Submission System) has been developed to provide logging,
bookkeeping and real-time monitoring of jobs submitted to a local farm or a grid
system. The information is persistently stored in a relational database for further
processing. By means of user-supplied filters, BOSS extracts the specific job
information to be logged from the standard streams of the job itself...
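A minimal sketch of that filtering idea: scan the job's standard output for tagged
key/value pairs and persist them to a database. The tag convention and schema are
illustrative, not BOSS's actual interface (requires Python 3.8+):

```python
# Extract tagged key=value pairs from a job's stdout and store them.
import re, sqlite3

FILTER = re.compile(r"^BOSS:\s*(\w+)=(.*)$")   # hypothetical tag convention

def extract(stdout_lines):
    """Pull tagged key=value pairs out of the job's standard output."""
    return [m.groups() for line in stdout_lines if (m := FILTER.match(line))]

db = sqlite3.connect(":memory:")               # stands in for the bookkeeping DB
db.execute("CREATE TABLE jobinfo (key TEXT, value TEXT)")
db.executemany("INSERT INTO jobinfo VALUES (?, ?)",
               extract(["BOSS: events_read=5000", "unrelated job output",
                        "BOSS: status=done"]))
print(db.execute("SELECT * FROM jobinfo").fetchall())
```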
Mr Giulio Eulisse
(Northeastern University, Boston)
14/02/2006, 14:40
Distributed Data Analysis
oral presentation
We describe how a new programming paradigm dubbed AJAX (Asynchronous Javascript and
XML) has enabled us to develop highly performant web-based graphics applications.
Specific examples are shown of our web clients for: CMS Event Display (real-time
Cosmic Challenge), remote detector monitoring with ROOT displays, and performant 3D
displays of GEANT4 descriptions of LHC detectors. The...
Mr Stuart Paterson
(University of Glasgow / CPPM, Marseille)
14/02/2006, 15:00
Distributed Data Analysis
oral presentation
DIRAC is the LHCb Workload and Data Management system for Monte Carlo simulation,
data processing and distributed user analysis. Using DIRAC, a variety of resources
may be integrated, including individual PCs, local batch systems and the LCG grid.
We report here on the progress made in extending DIRAC for distributed user analysis
on LCG. In this paper we describe the advances in the...
Dr Dietrich Liko
(CERN)
14/02/2006, 16:00
Distributed Data Analysis
oral presentation
The ATLAS strategy follows a service oriented approach to provide Distributed
Analysis capabilities to its users. Based on initial experiences with an Analysis
service, the ATLAS production system has been evolved to support analysis jobs. As
the ATLAS production system is based on several grid flavours (LCG, OSG and
NorduGrid), analysis jobs will be supported by specific executors on the...
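The flavour-specific executors suggest a simple dispatch pattern; a sketch under
assumed names (none of these classes are the ATLAS production system's actual API):

```python
# One executor per grid flavour; the production system picks by flavour.
class LCGExecutor:
    def submit(self, job): print(f"submit {job} via the LCG resource broker")

class OSGExecutor:
    def submit(self, job): print(f"submit {job} via an OSG gatekeeper")

class NorduGridExecutor:
    def submit(self, job): print(f"submit {job} via ARC middleware")

EXECUTORS = {"LCG": LCGExecutor(), "OSG": OSGExecutor(),
             "NorduGrid": NorduGridExecutor()}

def dispatch(job, flavour):
    """Route an analysis job to the executor for its grid flavour."""
    EXECUTORS[flavour].submit(job)

dispatch("analysis-job-42", "OSG")
```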
Dr Isidro Gonzalez Caballero
(Instituto de Fisica de Cantabria (CSIC-UC))
14/02/2006, 16:20
Distributed Data Analysis
oral presentation
A typical HEP analysis in the LHC experiments involves processing data
corresponding to several million events, i.e. terabytes of information, in the final
analysis phases. Currently, processing one million events on a single modern
workstation takes several hours, thus slowing the analysis cycle. The desirable
computing model for a physicist would be closer to a High Performance...
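To make the turnaround argument concrete, a back-of-envelope calculation under an
assumed per-event cost of 10 ms (the figure is illustrative, not a measurement from
the contribution):

```python
# Assumed ~10 ms/event on one workstation; the scaling is the point, not the number.
events, sec_per_event = 1_000_000, 0.01
for workers in (1, 10, 100):
    hours = events * sec_per_event / workers / 3600
    print(f"{workers:>3} worker(s): {hours:.2f} h")
# 1 worker: ~2.8 h per pass; 100 workers: ~100 s, i.e. interactive turnaround
```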
Dr Conrad Steenberg
(California Institute of Technology)
14/02/2006, 16:40
Distributed Data Analysis
oral presentation
We present the architecture and implementation of a bi-directional system for
monitoring long-running jobs on large computational clusters. JobMon comprises an
asynchronous intra-cluster communication server and a Clarens web service on a head
node, coupled with a job wrapper for each monitored job to provide monitoring
information both periodically and upon request. The Clarens web service...
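A minimal sketch of the job-wrapper side of such a design: run the payload, and push
status reports periodically while it lives. The print statement stands in for the
Clarens web service call; all names are illustrative:

```python
# Wrapper that reports the payload's status at a fixed interval until exit.
import subprocess, threading, time

def heartbeat(proc, interval=30):
    """Push a monitoring record every `interval` seconds while proc runs."""
    while proc.poll() is None:
        print(f"push status: pid={proc.pid} alive at {time.time():.0f}")
        time.sleep(interval)

job = subprocess.Popen(["sleep", "90"])          # stands in for the real payload
threading.Thread(target=heartbeat, args=(job,), daemon=True).start()
job.wait()
print("push final status: exit code", job.returncode)
```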
Mr Marco Corvo
(CNAF and CERN)
14/02/2006, 17:00
Distributed Data Analysis
oral presentation
CRAB (CMS Remote Analysis Builder) is a tool, developed by INFN within the CMS
collaboration, which gives physicists the possibility to analyze large amounts of
data by exploiting the computing power of distributed grid systems. It is currently
used to analyze simulated data needed to prepare the Physics Technical Design
Report. Data produced by CMS are distributed among several...
Oliver Gutsche
(Fermilab)
15/02/2006, 14:00
Distributed Data Analysis
oral presentation
The CMS computing model provides reconstruction and access to recorded data of the
CMS detector as well as to Monte Carlo (MC) generated data. Due to the increased
complexity, these functionalities will be provided by a tier structure of globally
located computing centers using GRID technologies. In the CMS baseline, user access
to data is provided by the CMS Remote Analysis Builder...
Mr Ashiq Anjum
(University of the West of England)
15/02/2006, 14:20
Distributed Data Analysis
oral presentation
We describe results from, and progress on, the development of a Data Intensive and
Network Aware (DIANA) scheduling engine, primarily for data-intensive sciences such
as physics analysis. Scientific analysis tasks can involve thousands of
computing, data handling, and network resources and the size of the input and
output files and the amount of overall storage space allotted to a user...
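A sketch of the kind of data- and network-aware placement decision this implies; the
cost model and site table are illustrative only, not DIANA's scheduler:

```python
# Pick the site minimising estimated queue wait plus data-transfer time.
def best_site(sites, input_gb):
    def cost(s):
        transfer_h = 0.0 if s["has_data"] else \
            input_gb * 1024 / s["mb_per_s"] / 3600
        return s["queue_h"] + transfer_h
    return min(sites, key=cost)["name"]

sites = [
    {"name": "siteA", "has_data": True,  "queue_h": 4.0, "mb_per_s": 100},
    {"name": "siteB", "has_data": False, "queue_h": 0.5, "mb_per_s": 200},
]
print(best_site(sites, input_gb=100))  # siteB: 0.5 h + ~0.14 h transfer beats 4 h
```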
Dr Ulrik Egede
(Imperial College London)
15/02/2006, 14:40
Distributed Data Analysis
oral presentation
Physics analysis of large amounts of data by many users requires the use of Grid
resources. It is however important that users see a single environment for
developing and testing algorithms locally and for running on large data samples on
the Grid. The Ganga job wizard, developed by LHCb and ATLAS, provides physicists with
such an integrated environment for job preparation, bookkeeping...
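Since Ganga jobs are configured from Python, the local-versus-Grid switch is
essentially one attribute; a hedged sketch assuming an interactive Ganga session
(where Job, Local and LCG are predefined; application and dataset setup elided):

```python
# Inside a Ganga session; not standalone Python.
j = Job()                 # job definition, identical in both cases
j.backend = Local()       # develop and test on the local machine
j.submit()

j2 = Job()                # the same definition...
j2.backend = LCG()        # ...now targeted at the LCG grid
j2.submit()
```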
Prof. Kaushik De
(University of Texas at Arlington)
15/02/2006, 15:00
Distributed Data Analysis
oral presentation
A new offline processing system for production and analysis, Panda, has been
developed for the ATLAS experiment and deployed on the OSG. ATLAS will accrue tens of
petabytes of data per year, and the Panda design is accordingly optimized for data
intensive processing. Its development followed three years of production experience,
the lessons from which drove a markedly different design for the...
Mr Pavel Jakl
(Nuclear Physics Inst., Academy of Sciences - Czech Republic)
15/02/2006, 16:00
Distributed Data Analysis
oral presentation
With its increasing data samples, the RHIC/STAR experiment has faced a challenging
data management dilemma: solutions using cheap disks attached to processing nodes
have rapidly become more economical than standard centralized storage. At the cost
of added data management complexity, the STAR experiment moved to a multi-component,
locally distributed data model rendered viable by the...
Mr Fabrizio Furano
(INFN sez. di Padova)
15/02/2006, 16:20
Distributed Data Analysis
oral presentation
The latencies induced by network communication often play a big role in reducing the
performance of systems which access large amounts of data in a distributed
environment. The problem is present in Local Area Networks, but is much more evident
in Wide Area Networks. It is generally perceived as a critical problem which makes it
very difficult to access remote data. However, a more...
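To see why, a back-of-envelope comparison of naive synchronous reads against keeping
many requests in flight (all numbers assumed for illustration):

```python
# Why WAN latency dominates naive remote reads, and how pipelining hides it.
rtt_s = 0.1           # assumed WAN round trip: 100 ms
service_s = 0.001     # assumed per-request server time: 1 ms
n = 1000              # reads issued by the analysis

sequential = n * (rtt_s + service_s)   # one request in flight at a time
pipelined = rtt_s + n * service_s      # many requests kept in flight
print(f"sequential: {sequential:.0f} s, pipelined: {pipelined:.1f} s")
# ~101 s vs ~1.1 s: overlapping requests amortises the round-trip latency
```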
Dr Douglas Smith
(Stanford Linear Accelerator Center)
15/02/2006, 16:40
Distributed Data Analysis
oral presentation
For the BaBar Computing Group:
Two years ago, the BaBar experiment changed its event store from an object-oriented
database system to one based on ROOT files. A new bookkeeping system was developed
to manage the meta-data of these files. This system has been in constant use since
that time, and has successfully provided the needed meta-data information for users'
analysis jobs,...
Andrew Hanushevsky
(Stanford Linear Accelerator Center)
15/02/2006, 17:00
Distributed Data Analysis
oral presentation
When the BaBar experiment transitioned to using the ROOT framework, a new data server
architecture, xrootd, was developed to address event analysis needs. This
architecture was deployed at SLAC two years ago and since then has also been deployed
at other BaBar Tier 1 sites: IN2P3, INFN, FZK, and RAL; as well as other non-BaBar
sites: CERN (Alice), BNL (Star), and Cornell (CLEO). As part of...