2–9 Sept 2007
Victoria, Canada
Europe/Zurich timezone
Please book accommodation as soon as possible.

Session

Distributed data analysis and information management

DD
3 Sept 2007, 14:00
Victoria, Canada

Conveners

Distributed data analysis and information management: DD 1

  • Roger Jones (Lancaster University)

Distributed data analysis and information management: DD 2

  • Roger Jones (Lancaster University)

Distributed data analysis and information management: DD 3

  • Michael Ernst (BNL)

Distributed data analysis and information management: DD 4

  • Ian Fisk (FNAL)

Distributed data analysis and information management: DD 5

  • Roger Jones (Lancaster University)

Distributed data analysis and information management: DD 6

  • Roger Jones (Lancaster University)


  1. Dr Andrew Maier (CERN)
    03/09/2007, 14:00
    Distributed data analysis and information management
    oral presentation
    Ganga, the job-management system (http://cern.ch/ganga), developed as an ATLAS-LHCb common project, offers a simple, efficient and consistent user experience in a variety of heterogeneous environments: from local clusters to global Grid systems. Ganga helps end-users to organise their analysis activities on the Grid by providing automatic persistency of the job's metadata. A user has...
  2. Dr Akram Khan (Brunel University)
    03/09/2007, 14:20
    Distributed data analysis and information management
    oral presentation
    ASAP is a system for enabling distributed analysis for CMS physicists. It was created with the aim of simplifying the transition from a locally running application to one that is distributed across the Grid. The experience gained in operating the system for the past 2 years has been used to redevelop a more robust, performant and scalable version. ASAP consists of a client for job...
  3. Mr Jan Fiete Grosse-Oetringhaus (CERN)
    03/09/2007, 14:40
    Distributed data analysis and information management
    oral presentation
    ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - Cern Analysis Facility) for fast analysis. The system is especially aimed at the prototyping phase of analyses that need a high number of development iterations and thus desire a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as...
  4. Dr Stuart Paterson (CERN)
    03/09/2007, 15:00
    Distributed data analysis and information management
    oral presentation
    The LHCb distributed data analysis system consists of the Ganga job submission front-end and the DIRAC Workload and Data Management System. Ganga is jointly developed with ATLAS and allows LHCb users to submit jobs to several backends, including batch systems, LCG and DIRAC. The DIRAC API provides a transparent and secure way for users to submit jobs to the Grid and is the default...
  5. Dr Johannes Elmsheuser (Ludwig-Maximilians-Universität München)
    03/09/2007, 15:20
    Distributed data analysis and information management
    oral presentation
    The distributed data analysis using Grid resources is one of the fundamental applications in high energy physics to be addressed and realized before the start of LHC data taking. The demands on resource management are very high: in every experiment, up to a thousand physicists will be submitting analysis jobs into the Grid. Appropriate user interfaces and helper applications have to be made...
  6. Mr Adam Kocoloski (MIT)
    03/09/2007, 15:40
    Distributed data analysis and information management
    oral presentation
    Modern Macintosh computers feature Xgrid, a distributed computing architecture built directly into Apple's OS X operating system. While the approach is radically different from those generally expected by the Unix based Grid infrastructures (Open Science Grid, TeraGrid, EGEE), opportunistic computing on Xgrid is nonetheless a tempting and novel way to assemble a computing cluster with a...
  7. Leandro Franco (CERN)
    03/09/2007, 16:30
    Distributed data analysis and information management
    oral presentation
    Particle accelerators produce huge amounts of information in every experiment and such quantity cannot be stored easily in a personal computer. For that reason, most of the analysis is done using remote storage servers (this will be particularly true when the Large Hadron Collider starts its operation in 2007). Seeing how the bandwidth has increased in the last few years, the biggest...
  8. Lassi Tuura (Northeastern University)
    03/09/2007, 16:50
    Distributed data analysis and information management
    oral presentation
    The CMS experiment will need to sustain uninterrupted high reliability, high throughput and very diverse data transfer activities as the LHC operations start. PhEDEx, the CMS data transfer system, will be responsible for the full range of the transfer needs of the experiment. Covering the entire spectrum is a demanding task: from the critical high-throughput transfers between CERN and...
  9. Dr Douglas Smith (Stanford Linear Accelerator Center)
    03/09/2007, 17:10
    Distributed data analysis and information management
    oral presentation
    The BaBar high energy experiment has been running for many years now, and has resulted in a data set of over a petabyte in size, containing over two million files. The management of this set of data has to support the requirements of further data production along with a physics community that has vastly different needs. To support these needs the BaBar bookkeeping system was developed,...
  10. Andrew Cameron Smith (CERN)
    03/09/2007, 17:30
    Distributed data analysis and information management
    oral presentation
    The LHCb Computing Model describes the dataflow model for all stages in the processing of real and simulated events and defines the role of LHCb associated Tier1 and Tier2 computing centres. The WLCG ‘dressed rehearsal’ exercise aims to allow LHC experiments to deploy the full chain of their Computing Models, making use of all underlying WLCG services and resources, in preparation for real...
  11. Dr Roger Jones (Lancaster University)
    04/09/2007, 11:00
    Distributed data analysis and information management
    oral presentation
    The ATLAS Computing Model was constructed after early tests and was captured in the ATLAS Computing TDR in June 2005. Since then, the grid tools and services have evolved and their performance is starting to be understood through large-scale exercises. As real data taking becomes imminent, the computing model continues to evolve, with robustness and reliability being the watchwords for...
  12. Dr Simone Pagan Griso (University and INFN Padova)
    04/09/2007, 11:20
    Distributed data analysis and information management
    oral presentation
    The upgrades of the Tevatron collider and of the CDF detector have considerably increased the demand on computing resources in particular for Monte Carlo production for the CDF experiment. This has forced the collaboration to move beyond the usage of dedicated resources and start exploiting Grid resources. The CDF Analysis Farm (CAF) model has been reimplemented into LcgCAF ...
  13. Dr Hartmut Stadie (Universitaet Hamburg)
    04/09/2007, 11:40
    Distributed data analysis and information management
    oral presentation
    The detector and collider upgrades for the HERA-II running at DESY have considerably increased the demand on computing resources for the ZEUS experiment. To meet the demand, ZEUS commissioned an automated Monte Carlo (MC) production capable of using Grid resources in November 2004. Since then, more than one billion events have been simulated and reconstructed on the Grid, which corresponds...
  14. Dr Ashok Agarwal (University of Victoria)
    04/09/2007, 12:00
    Distributed data analysis and information management
    oral presentation
    The present paper highlights the approach used to design and implement a web services based BaBar Monte Carlo (MC) production grid using Globus Toolkit version 4. The grid integrates the resources of two clusters at the University of Victoria, using the ClassAd mechanism provided by the Condor-G metascheduler. Each cluster uses the Portable Batch System (PBS) as its local resource...
  15. Marco Clemencic (European Organization for Nuclear Research (CERN))
    05/09/2007, 14:00
    Distributed data analysis and information management
    oral presentation
    The LHCb Conditions Database project provides the necessary tools to handle non-event time-varying data. The main users of conditions are reconstruction and analysis processes, which are running on the Grid. To allow efficient access to the data, we need to use a synchronized replica of the content of the database located at the same site as the event data file, i.e. the LHCb Tier1. The...
  16. Dr Lee Lueking (FERMILAB)
    05/09/2007, 14:20
    Distributed data analysis and information management
    oral presentation
    The CMS experiment at the LHC has established an infrastructure using the FroNTier framework to deliver conditions (i.e. calibration, alignment, etc.) data to processing clients worldwide. FroNTier is a simple web service approach providing client HTTP access to a central database service. The system for CMS has been developed to work with POOL which provides object relational mapping...
  17. Alexandre Vaniachine (Argonne National Laboratory)
    05/09/2007, 14:40
    Distributed data analysis and information management
    oral presentation
    In preparation for ATLAS data taking, ATLAS database activities have undergone a coordinated shift from development towards operations. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and...
  18. Dr Douglas Smith (Stanford Linear Accelerator Center)
    05/09/2007, 15:00
    Distributed data analysis and information management
    oral presentation
    There is a need for a large dataset of simulated events for use in analysis of the data from the BaBar high energy physics experiment. The largest cycle of this production in the history of the experiment was just completed in the past year, simulating events against all detector conditions in the history of the experiment, resulting in over eleven billion events in eighteen months. ...
  19. Ms Helen McGlone (University of Glasgow/CERN)
    05/09/2007, 15:20
    Distributed data analysis and information management
    oral presentation
    The ATLAS TAG database is a multi-terabyte event-level metadata selection system, intended to allow discovery, selection of and navigation to events of interest to an analysis. The TAG database encompasses file- and relational-database-resident event-level metadata, distributed across all ATLAS Tiers. ...
  20. Dr Conrad Steenberg (Caltech)
    05/09/2007, 15:40
    Distributed data analysis and information management
    oral presentation
    We describe how we have used the Clarens Grid Portal Toolkit to develop powerful application and browser-level interfaces to ROOT and Pythia. The Clarens Toolkit is a codebase that was initially developed under the auspices of the Grid Analysis Environment project at Caltech, with the goal of enabling LHC physicists engaged in analysis to bring the full power of the Grid to their desktops,...
  21. Dr Lucas Taylor (Northeastern University, Boston)
    06/09/2007, 14:00
    Distributed data analysis and information management
    oral presentation
    The CMS experiment is about to embark on its first physics run at the LHC. To maximize the effectiveness of physicists and technical experts at CERN and worldwide and to facilitate their communications, CMS has established several dedicated and inter-connected operations and monitoring centers. These include a traditional “Control Room” at the CMS site in France, a “CMS Centre” for...
  22. Dr John Kennedy (LMU Munich)
    06/09/2007, 14:20
    Distributed data analysis and information management
    oral presentation
    The ATLAS production system is responsible for the distribution of O(100,000) jobs per day to over 100 sites worldwide. The tracking and correlation of errors and resource usage within such a large distributed system is of extreme importance. The monitoring system presented here is designed to abstract the monitoring information away from the central database of jobs...
  23. Dr Tofigh Azemoon (Stanford Linear Accelerator Center)
    06/09/2007, 14:40
    Distributed data analysis and information management
    oral presentation
    Petascale systems are in existence today and will become widespread in the next few years. Such systems are inevitably very complex, highly distributed and heterogeneous. Monitoring a petascale system in real time and understanding its status at any given moment without impacting its performance is a highly intricate task. Common approaches and off the shelf tools are either...
  24. Ricardo Rocha (CERN)
    06/09/2007, 15:00
    Distributed data analysis and information management
    oral presentation
    The ATLAS Distributed Data Management (DDM) system is evolving to provide a production-quality service for data distribution and data management support for production and users' analysis. Monitoring the different components in the system has emerged as one of the key issues to achieve this goal. Its distributed nature over different grid infrastructures (EGEE, OSG and NDGF)...
  25. Dr Fons Rademakers (CERN)
    06/09/2007, 15:20
    Distributed data analysis and information management
    oral presentation
    The goal of PROOF (Parallel ROOt Facility) is to enable interactive analysis of large data sets in parallel on a distributed cluster or multi-core machine. PROOF represents a high-performance alternative to a traditional batch-oriented computing system. The ALICE collaboration is planning to use PROOF at the CERN Analysis Facility (CAF) and has been stress testing the system since mid...
  26. Mr Fabrizio Furano (INFN sez. di Padova)
    06/09/2007, 15:40
    Distributed data analysis and information management
    oral presentation
    HEP data processing and analysis applications typically deal with the problem of accessing and processing data at high speed. Recent study, development and test work has shown that the latencies due to data access can often be hidden by overlapping them with the data processing, thus making it possible for applications to process remote data with a high level of...
  27. Dan Flath (SLAC)
    06/09/2007, 16:30
    Distributed data analysis and information management
    oral presentation
    The Data Handling Pipeline ("Pipeline") has been developed for the Gamma-Ray Large Area Space Telescope (GLAST) launching at the end of 2007. Its goal is to generically process graphs of dependent tasks, maintaining a full record of its state, history and data products. In cataloging the relationship between data, analysis results, software versions, as well as statistics (memory usage,...
  28. Dr Vitaly Choutko (Massachusetts Institute of Technology (MIT))
    06/09/2007, 16:50
    Distributed data analysis and information management
    oral presentation
    The AMS-02 detector will be installed on the ISS for at least 3 years. The data will be transmitted from the ISS to the NASA Marshall Space Flight Center (MSFC, Huntsville, Alabama) and transferred to CERN (Geneva, Switzerland) for processing and analysis. We are presenting the AMS-02 Ground Data Handling scenario and requirements to AMS ground centers: the Payload Operation and Control Center (POCC)...
  29. Dr Nicola De Filippis (INFN Bari)
    06/09/2007, 17:10
    Distributed data analysis and information management
    oral presentation
    The Tracker detector has been taking real data with cosmics at the Tracker Integration Facility (TIF) at CERN. First DAQ checks and on-line monitoring tasks are executed at the Tracker Analysis Centre (TAC) which is a dedicated Control Room at TIF with limited computing resources. A set of software agents were developed to perform the real-time data conversion in a standard Event...
  30. Mr Pavel Jakl (Nuclear Physics Institute, Academy of Sciences of the Czech Republic)
    06/09/2007, 17:30
    Distributed data analysis and information management
    oral presentation
    Facing the reality of storage economics, NP experiments such as RHIC/STAR have been engaged in a shift in the analysis model, and now rely heavily on cheap disks attached to processing nodes, as such a model is far more economical than expensive centralized storage. Additionally, exploiting storage aggregates with enhanced distributed computing capabilities such as dynamic space...
  31. Daniele Spiga (Università degli Studi di Perugia)
    06/09/2007, 17:50
    Distributed data analysis and information management
    oral presentation
    Starting from 2007 the CMS experiment will produce several Pbytes of data each year, to be distributed over many computing centers located in many different countries. The CMS computing model defines how the data are to be distributed such that CMS physicists can access them in an efficient manner in order to perform their physics analyses. CRAB (CMS Remote Analysis Builder) is a...