Conveners
Distributed data analysis and information management: DD 1
- Roger Jones (Lancaster University)
Distributed data analysis and information management: DD 2
- Roger Jones (Lancaster University)
Distributed data analysis and information management: DD 3
- Michael Ernst (BNL)
Distributed data analysis and information management: DD 4
- Ian Fisk (FNAL)
Distributed data analysis and information management: DD 5
- Roger Jones (Lancaster University)
Distributed data analysis and information management: DD 6
- Roger Jones (Lancaster University)
Dr
Andrew Maier
(CERN)
03/09/2007, 14:00
Distributed data analysis and information management
oral presentation
Ganga, the job-management system (http://cern.ch/ganga), developed as an ATLAS-LHCb common project,
offers a simple, efficient and consistent user experience in a variety of heterogeneous environments: from local
clusters to global Grid systems. Ganga helps end-users to organise their analysis activities on the Grid by providing
automatic persistency of the job's metadata. A user has...
Dr
Akram Khan
(Brunel University)
03/09/2007, 14:20
Distributed data analysis and information management
oral presentation
ASAP is a system for enabling distributed analysis for CMS physicists. It was
created with the aim of simplifying the transition from a locally running application
to one that is distributed across the Grid. The experience gained in operating the
system for the past 2 years has been used to redevelop a more robust, performant and
scalable version. ASAP consists of a client for job...
Mr
Jan Fiete Grosse Oetringhaus
(CERN)
03/09/2007, 14:40
Distributed data analysis and information management
oral presentation
ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - CERN Analysis
Facility) for fast analysis. The system is especially aimed at the prototyping phase of analyses that need a high
number of development iterations and thus desire a short response time. Typical examples are the tuning of cuts
during the development of an analysis as well as...
Dr
Stuart Paterson
(CERN)
03/09/2007, 15:00
Distributed data analysis and information management
oral presentation
The LHCb distributed data analysis system consists of the Ganga job submission
front-end and the DIRAC Workload and Data Management System. Ganga is jointly
developed with ATLAS and allows LHCb users to submit jobs to several backends,
including various batch systems, LCG and DIRAC. The DIRAC API provides a
transparent and secure way for users to run jobs on the Grid and is the default...
Dr
Johannes Elmsheuser
(Ludwig-Maximilians-Universität München)
03/09/2007, 15:20
Distributed data analysis and information management
oral presentation
Distributed data analysis using Grid resources is one of the
fundamental applications in high energy physics to be addressed
and realized before the start of LHC data taking. The demands on
resource management are very high. In every experiment, up to a
thousand physicists will be submitting analysis jobs to the Grid.
Appropriate user interfaces and helper applications have to be
made...
Mr
Adam Kocoloski
(MIT)
03/09/2007, 15:40
Distributed data analysis and information management
oral presentation
Modern Macintosh computers feature Xgrid, a distributed computing architecture built
directly into Apple's OS X operating system. While the approach is radically
different from those generally expected by the Unix based Grid infrastructures (Open
Science Grid, TeraGrid, EGEE), opportunistic computing on Xgrid is nonetheless a
tempting and novel way to assemble a computing cluster with a...
Leandro Franco
(CERN)
03/09/2007, 16:30
Distributed data analysis and information management
oral presentation
Particle accelerators produce huge amounts of information in every
experiment, and such a quantity cannot easily be stored on a personal
computer. For that reason, most of the analysis is done using remote
storage servers (this will be particularly true when the Large Hadron
Collider starts its operation in 2007). Seeing how the bandwidth has
increased in the last few years, the biggest...
Lassi Tuura
(Northeastern University)
03/09/2007, 16:50
Distributed data analysis and information management
oral presentation
The CMS experiment will need to sustain uninterrupted high reliability, high throughput and very diverse data
transfer activities as the LHC operations start. PhEDEx, the CMS data transfer system, will be responsible for the
full range of the transfer needs of the experiment. Covering the entire spectrum is a demanding task: from the
critical high-throughput transfers between CERN and...
Dr
Douglas Smith
(Stanford Linear Accelerator Center)
03/09/2007, 17:10
Distributed data analysis and information management
oral presentation
The BaBar high energy physics experiment has been running for many years now,
and has resulted in a data set of over a petabyte in size, containing
over two million files. The management of this set of data has to
support the requirements of further data production along with a
physics community that has vastly different needs. To support these
needs the BaBar bookkeeping system was developed,...
Andrew Cameron Smith
(CERN)
03/09/2007, 17:30
Distributed data analysis and information management
oral presentation
The LHCb Computing Model describes the dataflow model for all stages in the
processing of real and simulated events and defines the role of LHCb associated Tier1
and Tier2 computing centres. The WLCG “dress rehearsal” exercise aims to allow LHC
experiments to deploy the full chain of their Computing Models, making use of all
underlying WLCG services and resources, in preparation for real...
Dr
Roger Jones
(Lancaster University)
04/09/2007, 11:00
Distributed data analysis and information management
oral presentation
The ATLAS Computing Model was constructed after early tests and was captured in the ATLAS Computing TDR in
June 2005. Since then, the grid tools and services have evolved and their performance is starting to be understood
through large-scale exercises. As real data taking becomes imminent, the computing model continues to evolve,
with robustness and reliability being the watchwords for...
Dr
Simone Pagan Griso
(University and INFN Padova)
04/09/2007, 11:20
Distributed data analysis and information management
oral presentation
The upgrades of the Tevatron collider and of the CDF detector have considerably
increased the demand on computing resources, in particular for Monte Carlo production
for the CDF experiment. This has forced the collaboration to move beyond the usage of
dedicated resources and start exploiting Grid resources.
The CDF Analysis Farm (CAF) model has been reimplemented into
LcgCAF ...
Dr
Hartmut Stadie
(Universitaet Hamburg)
04/09/2007, 11:40
Distributed data analysis and information management
oral presentation
The detector and collider upgrades for the HERA-II running at DESY have considerably
increased the demand on computing resources for the ZEUS experiment.
To meet the demand, ZEUS commissioned an automated Monte Carlo (MC) production capable
of using Grid resources in November 2004. Since then, more than one billion events
have been simulated and reconstructed on the Grid, which corresponds...
Dr
Ashok Agarwal
(University of Victoria)
04/09/2007, 12:00
Distributed data analysis and information management
oral presentation
The present paper highlights the approach used to design and implement a web services
based BaBar Monte Carlo (MC) production grid using Globus Toolkit version 4. The grid
integrates the resources of two clusters at the University of Victoria, using the
ClassAd mechanism provided by the Condor-G metascheduler. Each cluster uses the
Portable Batch System (PBS) as its local resource...
Marco Clemencic
(European Organization for Nuclear Research (CERN))
05/09/2007, 14:00
Distributed data analysis and information management
oral presentation
The LHCb Conditions Database project provides the necessary tools to handle non-event
time-varying data. The main users of conditions are reconstruction and analysis
processes, which are running on the Grid. To allow efficient access to the data, we
need to use a synchronized replica of the content of the database located at the same
site as the event data file, i.e. the LHCb Tier1. The...
Dr
Lee Lueking
(FERMILAB)
05/09/2007, 14:20
Distributed data analysis and information management
oral presentation
The CMS experiment at the LHC has established an infrastructure using the FroNTier
framework to deliver conditions (i.e. calibration, alignment, etc.) data to
processing clients worldwide. FroNTier is a simple web service approach providing
client HTTP access to a central database service. The system for CMS has been
developed to work with POOL which provides object relational mapping...
Alexandre Vaniachine
(Argonne National Laboratory)
05/09/2007, 14:40
Distributed data analysis and information management
oral presentation
In preparation for ATLAS data taking, a coordinated shift from development towards
operations has occurred in ATLAS database activities. In addition to development and
commissioning activities in databases, ATLAS is active in the development and deployment
(in collaboration with the WLCG 3D project) of the tools that allow the worldwide
distribution and installation of databases and...
Dr
Douglas Smith
(Stanford Linear Accelerator Center)
05/09/2007, 15:00
Distributed data analysis and information management
oral presentation
There is a need for a large dataset of simulated events for use in
analysis of the data from the BaBar high energy physics experiment.
The largest cycle of this production in the history of the experiment
was just completed in the past year, simulating events against all
detector conditions in the history of the experiment, resulting in over
eleven billion events in eighteen months. ...
Ms
Helen McGlone
(University of Glasgow/CERN)
05/09/2007, 15:20
Distributed data analysis and information management
oral presentation
The ATLAS TAG database is a multi-terabyte event-level metadata selection system,
intended to allow discovery, selection of and navigation to events of interest to an
analysis. The TAG database encompasses file- and relational-database-resident
event-level metadata, distributed across all ATLAS Tiers.
...
Dr
Conrad Steenberg
(Caltech)
05/09/2007, 15:40
Distributed data analysis and information management
oral presentation
We describe how we have used the Clarens Grid Portal Toolkit to develop powerful
application and browser-level interfaces to ROOT and Pythia. The Clarens Toolkit is a
codebase that was initially developed under the auspices of the Grid Analysis
Environment project at Caltech, with the goal of enabling LHC physicists engaged in
analysis to bring the full power of the Grid to their desktops,...
Dr
Lucas Taylor
(Northeastern University, Boston)
06/09/2007, 14:00
Distributed data analysis and information management
oral presentation
The CMS experiment is about to embark on its first physics run at the LHC. To
maximize the effectiveness of physicists and technical experts at CERN and
worldwide and to facilitate their communications, CMS has established several
dedicated and inter-connected operations and monitoring centers. These
include a traditional “Control Room” at the CMS site in France, a “CMS Centre”
for...
Dr
John Kennedy
(LMU Munich)
06/09/2007, 14:20
Distributed data analysis and information management
oral presentation
The ATLAS production system is responsible for the distribution of
O(100,000) jobs per day to over 100 sites worldwide.
The tracking and correlation of errors and resource usage within such a
large distributed system is of extreme importance.
The monitoring system presented here is designed to abstract the
monitoring information away from the central database of jobs....
Dr
Tofigh Azemoon
(Stanford Linear Accelerator Center)
06/09/2007, 14:40
Distributed data analysis and information management
oral presentation
Petascale systems are in existence today and will become widespread in the
next few years. Such systems are inevitably very complex, highly distributed
and heterogeneous. Monitoring a petascale system in real time and
understanding its status at any given moment without impacting its
performance is a highly intricate task. Common approaches and off-the-shelf
tools are either...
Ricardo Rocha
(CERN)
06/09/2007, 15:00
Distributed data analysis and information management
oral presentation
The ATLAS Distributed Data Management (DDM) system is evolving to
provide a production-quality service for data distribution and data
management support for production and users' analysis.
Monitoring the different components in the system has emerged as one of
the key issues to achieve this goal. Its distributed nature over
different grid infrastructures (EGEE, OSG and NDGF)...
Dr
Fons Rademakers
(CERN)
06/09/2007, 15:20
Distributed data analysis and information management
oral presentation
The goal of PROOF (Parallel ROOt Facility) is to enable interactive
analysis of large data sets in parallel on a distributed cluster or
multi-core machine. PROOF represents a high-performance alternative
to a traditional batch-oriented computing system.
The ALICE collaboration is planning to use PROOF at the CERN Analysis Facility
(CAF) and has been stress testing the system since mid...
Mr
Fabrizio Furano
(INFN sez. di Padova)
06/09/2007, 15:40
Distributed data analysis and information management
oral presentation
HEP data processing and analysis applications typically deal
with the problem of accessing and processing data at high speed.
Recent study, development and test work has shown that the latencies
due to data access can often be hidden by parallelizing them
with the data processing, thus giving the ability
to have applications which process remote data with a high level of...
Dan Flath
(SLAC)
06/09/2007, 16:30
Distributed data analysis and information management
oral presentation
The Data Handling Pipeline ("Pipeline") has been developed for the Gamma-Ray Large
Area Space Telescope (GLAST) launching at the end of 2007. Its goal is to generically
process graphs of dependent tasks, maintaining a full record of its state, history
and data products. In cataloging the relationship between data, analysis results,
software versions, as well as statistics (memory usage,...
Dr
Vitaly Choutko
(Massachusetts Institute of Technology (MIT))
06/09/2007, 16:50
Distributed data analysis and information management
oral presentation
The AMS-02 detector will be installed on the ISS for at least 3 years. The data will be
transmitted from the ISS to NASA Marshall Space Flight Center (MSFC, Huntsville, Alabama)
and transferred to CERN (Geneva, Switzerland) for processing and analysis.
We present the AMS-02 Ground Data Handling scenario and the requirements for the AMS
ground centers: the Payload Operation and Control Center (POCC)...
Dr
Nicola De Filippis
(INFN Bari)
06/09/2007, 17:10
Distributed data analysis and information management
oral presentation
The Tracker detector has been taking real data with cosmics at the
Tracker Integration Facility (TIF) at CERN.
First DAQ checks and on-line monitoring tasks are executed at the
Tracker Analysis Centre (TAC) which is a dedicated Control Room at TIF with
limited computing resources. A set of software agents was developed
to perform the real-time data conversion in a standard Event...
Mr
Pavel Jakl
(Nuclear Physics Institute, Academy of Sciences of the Czech Republic)
06/09/2007, 17:30
Distributed data analysis and information management
oral presentation
Facing the reality of storage economics, NP experiments such as RHIC/STAR have been
engaged in a shift in the analysis model, and now heavily rely on using cheap disks
attached to processing nodes, as such a model is extremely beneficial over expensive
centralized storage. Additionally, exploiting storage aggregates with enhanced
distributed computing capabilities such as dynamic space...
Daniele Spiga
(Universita degli Studi di Perugia)
06/09/2007, 17:50
Distributed data analysis and information management
oral presentation
Starting from 2007 the CMS experiment will produce several Pbytes of data each
year, to be distributed over many computing centers located in many different
countries. The CMS computing model defines how the data are to be distributed such
that CMS physicists can access them in an efficient manner in order to
perform their physics analyses. CRAB (CMS Remote Analysis Builder) is a...