CHEP 06

Name: CHEP 06
Start: 2006-02-13T08:00:00+01:00
End: 2006-02-17T19:30:00+01:00
Location: Tata Institute of Fundamental Research

13–17 Feb 2006

Europe/Zurich timezone

Support

chep06@tifr.res.in

Session

Distributed Data Analysis

DDA

13 Feb 2006, 14:00

Tata Institute of Fundamental Research

Homi Bhabha Road Mumbai 400005 India

39. DIAL: Distributed Interactive Analysis of Large Datasets

David Adams (BNL)

13/02/2006, 14:00

Distributed Data Analysis

oral presentation

DIAL is a generic framework for distributed analysis. The heart of the system is a scheduler (also called analysis service) that receives high-level processing requests expressed in terms of an input dataset and a transformation to act on that dataset. The scheduler splits the dataset, applies the transformation to each subdataset to produce a new subdataset, and then merges these to...
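The split, transform and merge cycle described in this abstract can be illustrated with a minimal sketch. It is not the DIAL API: the names `split`, `process`, `transformation` and `merge` are hypothetical, the dataset is assumed to be a plain in-memory sequence, and the sub-datasets are processed one after another rather than on distributed workers.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split(dataset: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Cut a dataset into at most n_parts contiguous sub-datasets."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = max(1, -(-len(dataset) // n_parts))  # ceiling division
    return [dataset[i:i + size] for i in range(0, len(dataset), size)]


def process(
    dataset: Sequence[T],
    transformation: Callable[[Sequence[T]], R],
    merge: Callable[[List[R]], R],
    n_parts: int = 4,
) -> R:
    """Apply the transformation to each sub-dataset, then merge the results."""
    # A real scheduler would dispatch these to workers; here they run in turn.
    subresults = [transformation(sub) for sub in split(dataset, n_parts)]
    return merge(subresults)


# Hypothetical example: "events" are integers, the transformation squares them,
# and the merge step concatenates the per-part outputs.
events = list(range(10))
result = process(
    events,
    transformation=lambda sub: [x * x for x in sub],
    merge=lambda parts: [x for part in parts for x in part],
    n_parts=3,
)
```

Because the request is expressed only as a dataset plus a transformation, the caller does not need to know how many parts were created or where they ran.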

61. Distributed object monitoring for ROOT analyses with GO4 v3

Dr Jörn Adamczewski (GSI)

13/02/2006, 14:20

Distributed Data Analysis

oral presentation

The new version 3 of the ROOT based GSI standard analysis framework GO4 (GSI Object Oriented Online Offline) has been released. GO4 provides multithreaded remote communication between analysis process and GUI process, a dynamically configurable analysis framework, and a Qt based GUI with embedded ROOT graphics. In the new version 3 a new internal object manager was developed. Its...

95. DIRAC Production Manager Tools

Dr Gennady KUZNETSOV (Rutherford Appleton Laboratory, Didcot)

13/02/2006, 14:40

Distributed Data Analysis

oral presentation

DIRAC is the LHCb Workload and Data Management system used for Monte Carlo production, data processing and distributed user analysis. Such a wide variety of applications requires a general approach to the tasks of job definition, configuration and management. In this paper, we present a suite of tools called a Production Console, which is a general framework for job formulation,...

98. PROOF - The Parallel ROOT Facility

Gerardo GANIS (CERN)

13/02/2006, 15:00

Distributed Data Analysis

oral presentation

The Parallel ROOT Facility, PROOF, enables the interactive analysis of distributed data sets in a transparent way. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. Being part of the ROOT framework, PROOF inherits the benefits of a performant...

118. Grid Data Management: Simulations of LCG 2008

Caitriana Nicholson (University of Glasgow)

13/02/2006, 16:00

Distributed Data Analysis

oral presentation

Simulations have been performed with the grid simulator OptorSim using the expected analysis patterns from the LHC experiments and a realistic model of the LCG at LHC startup, with thousands of user analysis jobs running at over a hundred grid sites. It is shown, first, that dynamic data replication plays a significant role in the overall analysis throughput in terms of optimising job...

130. A skimming procedure to handle large datasets at CDF

Dr Donatella Lucchesi (INFN Padova), Dr Francesco Delli Paoli (INFN Padova)

13/02/2006, 16:20

Distributed Data Analysis

oral presentation

The CDF experiment has a new trigger which selects events depending on the significance of the track impact parameters. With this trigger, a sample of events enriched in b and c mesons has been selected and is used for several important physics analyses, such as Bs mixing. The dataset is about 20 TBytes in size, corresponding to an integrated luminosity of 1 fb⁻¹ collected by CDF....

143. Automated recovery of data-intensive jobs in D0 and CDF using SAM

Valeria Bartsch (FERMILAB / University College London)

13/02/2006, 16:40

Distributed Data Analysis

oral presentation

SAM is a data handling system that provides the Fermilab HEP experiments D0, CDF and MINOS with the means to catalog, distribute and track the usage of their collected and analyzed data. Annually, SAM serves petabytes of data to physics groups performing data analysis, data reconstruction and simulation at various computing centers across the world. Given the volume of the detector data, a...

206. Resource Predictors in HEP Applications

John Huth (Harvard University)

13/02/2006, 17:00

Distributed Data Analysis

oral presentation

The ATLAS experiment uses a tiered data Grid architecture that enables possibly overlapping subsets, or replicas, of original datasets to be located across the ATLAS collaboration. Many individual elements of these datasets can also be recreated locally from scratch based on a limited number of inputs. We envision a time when a user will want to determine which is more expedient,...

237. CMS/ARDA activity within the CMS distributed computing system

Dr Julia Andreeva (CERN)

13/02/2006, 17:20

Distributed Data Analysis

oral presentation

The ARDA project focuses on delivering analysis prototypes together with the LHC experiments. The ARDA/CMS activity delivered a fully functional analysis prototype exposed to a pilot community of CMS users. The current integration work of key components into the CMS system is described: the activity focuses on providing a coherent monitoring layer where information from diverse sources...

239. ARDA experience in collaborating with the LHC experiments

Dr Massimo Lamanna (CERN)

14/02/2006, 14:00

Distributed Data Analysis

oral presentation

The ARDA project focuses on delivering analysis prototypes together with the LHC experiments. Each experiment prototype is in principle independent but commonalities have been observed. The first level of commonality is represented by mature projects which can be effectively shared across different users. The best example is GANGA, providing a toolkit to organize users’ activity,...

240. Evolution of BOSS, a tool for job submission and tracking

Mr Stuart WAKEFIELD (Imperial College, University of London, London, UNITED KINGDOM)

14/02/2006, 14:20

Distributed Data Analysis

oral presentation

BOSS (Batch Object Submission System) has been developed to provide logging and bookkeeping and real-time monitoring of jobs submitted to a local farm or a grid system. The information is persistently stored in a relational database for further processing. By means of user-supplied filters, BOSS extracts the specific job information to be logged from the standard streams of the job itself...
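The filter idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not BOSS code: the regular expression, the `job_info` table layout and the `log_job_output` helper are all hypothetical, and an in-memory SQLite database stands in for the relational database.

```python
import re
import sqlite3

# Hypothetical user-supplied filter: capture event counts printed by a job.
EVENTS_FILTER = re.compile(r"processed (\d+) events")


def log_job_output(conn, job_id, stdout_lines, pattern, key):
    """Scan a job's standard output and store every match of the filter."""
    for line in stdout_lines:
        match = pattern.search(line)
        if match:
            conn.execute(
                "INSERT INTO job_info (job_id, key, value) VALUES (?, ?, ?)",
                (job_id, key, match.group(1)),
            )
    conn.commit()


# In-memory database standing in for the persistent bookkeeping store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_info (job_id INTEGER, key TEXT, value TEXT)")

sample_stdout = [
    "starting job",
    "processed 500 events",
    "processed 1000 events",
    "done",
]
log_job_output(conn, 1, sample_stdout, EVENTS_FILTER, "events")

rows = conn.execute(
    "SELECT value FROM job_info WHERE job_id = 1 ORDER BY rowid"
).fetchall()
```

Keeping the filter separate from the storage step means a user can change what is extracted without touching the bookkeeping schema.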

256. Interactive Web-based Analysis Clients using AJAX: with examples for CMS, ROOT and GEANT4

Mr Giulio Eulisse (Northeastern University, Boston)

14/02/2006, 14:40

Distributed Data Analysis

oral presentation

We describe how a new programming paradigm dubbed AJAX (Asynchronous JavaScript and XML) has enabled us to develop highly performant web-based graphics applications. Specific examples are shown of our web clients for: CMS Event Display (real-time Cosmic Challenge), remote detector monitoring with ROOT displays, and performant 3D displays of GEANT4 descriptions of LHC detectors. The...

260. DIRAC Infrastructure for Distributed Analysis

Mr Stuart Paterson (University of Glasgow / CPPM, Marseille)

14/02/2006, 15:00

Distributed Data Analysis

oral presentation

DIRAC is the LHCb Workload and Data Management system for Monte Carlo simulation, data processing and distributed user analysis. Using DIRAC, a variety of resources may be integrated, including individual PCs, local batch systems and the LCG grid. We report here on the progress made in extending DIRAC for distributed user analysis on LCG. In this paper we describe the advances in the...

263. The ATLAS Strategy for Distributed Analysis on several Grid Infrastructures

Dr Dietrich Liko (CERN)

14/02/2006, 16:00

Distributed Data Analysis

oral presentation

The ATLAS strategy follows a service oriented approach to provide Distributed Analysis capabilities to its users. Based on initial experiences with an Analysis service, the ATLAS production system has been evolved to support analysis jobs. As the ATLAS production system is based on several grid flavours (LCG, OSG and Nordugrid), analysis jobs will be supported by specific executors on the...

267. Prototype of a Parallel Analysis System for CMS using PROOF

Dr Isidro Gonzalez Caballero (Instituto de Fisica de Cantabria (CSIC-UC))

14/02/2006, 16:20

Distributed Data Analysis

oral presentation

A typical HEP analysis in the LHC experiments involves the processing of data corresponding to several million events, terabytes of information, to be analysed in the last phases. Currently, processing one million events on a single modern workstation takes several hours, thus slowing the analysis cycle. The desirable computing model for a physicist would be closer to a High Performance...

272. JobMon: A Secure, Scalable, Interactive Grid Job Monitor

Dr Conrad Steenberg (CALIFORNIA INSTITUTE OF TECHNOLOGY)

14/02/2006, 16:40

Distributed Data Analysis

oral presentation

We present the architecture and implementation of a bi-directional system for monitoring long-running jobs on large computational clusters. JobMon comprises an asynchronous intra-cluster communication server and a Clarens web service on a head node, coupled with a job wrapper for each monitored job to provide monitoring information both periodically and upon request. The Clarens web service...

273. CRAB: a tool to enable CMS Distributed Analysis

Mr Marco Corvo (Cnaf and Cern)

14/02/2006, 17:00

Distributed Data Analysis

oral presentation

CRAB (CMS Remote Analysis Builder) is a tool, developed by INFN within the CMS collaboration, that lets physicists analyze large amounts of data by exploiting the huge computing power of grid distributed systems. It is currently used to analyze simulated data needed to prepare the Physics Technical Design Report. Data produced by CMS are distributed among several...

279. Distributed CMS Analysis on the Open Science Grid

Oliver Gutsche (FERMILAB)

15/02/2006, 14:00

Distributed Data Analysis

oral presentation

The CMS computing model provides reconstruction and access to recorded data of the CMS detector as well as to Monte Carlo (MC) generated data. Due to the increased complexity, these functionalities will be provided by a tier structure of globally located computing centers using GRID technologies. In the CMS baseline, user access to data is provided by the CMS Remote Analysis Builder...

275. DIANA Scheduler

Mr Ashiq Anjum (University of the West of England)

15/02/2006, 14:20

Distributed Data Analysis

oral presentation

Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources and the size of the input and output files and the amount of overall storage space allotted to a user...

317. Experience with distributed analysis in LHCb

Dr Ulrik Egede (IMPERIAL COLLEGE LONDON)

15/02/2006, 14:40

Distributed Data Analysis

oral presentation

Physics analysis of large amounts of data by many users requires the usage of Grid resources. It is however important that users can see a single environment for developing and testing algorithms locally and for running on large data samples on the Grid. The Ganga job wizard, developed by LHCb and ATLAS, provides physicists with such an integrated environment for job preparation, bookkeeping...

347. Panda: Production and Distributed Analysis System for ATLAS

Prof. Kaushik De (UNIVERSITY OF TEXAS AT ARLINGTON)

15/02/2006, 15:00

Distributed Data Analysis

oral presentation

A new offline processing system for production and analysis, Panda, has been developed for the ATLAS experiment and deployed in OSG. ATLAS will accrue tens of petabytes of data per year, and the Panda design is accordingly optimized for data intensive processing. Its development followed three years of production experience, the lessons from which drove a markedly different design for the...

351. From rootd to Xrootd, from physical to logical files: experience on accessing and managing distributed data.

Mr Pavel JAKL (Nuclear Physics Inst., Academy of Sciences - Czech Republic)

15/02/2006, 16:00

Distributed Data Analysis

oral presentation

With its increasing data samples, the RHIC/STAR experiment has faced a challenging data management dilemma: solutions using cheap disks attached to processing nodes have rapidly become economically beneficial over standard centralized storage. At the cost of data management, the STAR experiment moved to a multiple component locally distributed data model rendered viable by the...

368. Latencies and data access. Boosting the performance of distributed applications.

Mr Fabrizio Furano (INFN sez. di Padova)

15/02/2006, 16:20

Distributed Data Analysis

oral presentation

The latencies induced by network communication often play a big role in reducing the performance of systems that access large amounts of data in a distributed environment. The problem is present in Local Area Networks, but is much more evident in Wide Area Networks. It is generally perceived as a critical problem which makes it very difficult to access remote data. However, a more...

297. BaBar Bookkeeping - experience and use.

Dr Douglas Smith (STANFORD LINEAR ACCELERATOR CENTER)

15/02/2006, 16:40

Distributed Data Analysis

oral presentation

For the BaBar Computing Group: Two years ago, the BaBar experiment changed its event store from an object oriented database system, to one based on ROOT files. A new bookkeeping system was developed to manage the meta-data of these files. This system has been in constant use since that time, and has successfully provided the needed meta-data information for users' analysis jobs,...

407. Performance and Scalability of xrootd

Andrew Hanushevsky (Stanford Linear Accelerator Center)

15/02/2006, 17:00

Distributed Data Analysis

oral presentation

When the BaBar experiment transitioned to using the ROOT framework, a new data server architecture, xrootd, was developed to address event analysis needs. This architecture was deployed at SLAC two years ago and since then has also been deployed at other BaBar Tier 1 sites: IN2P3, INFN, FZK, and RAL; as well as other non-BaBar sites: CERN (Alice), BNL (Star), and Cornell (CLEO). As part of...
