13–17 Feb 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

Session

Distributed Data Analysis

DDA
13 Feb 2006, 14:00
Tata Institute of Fundamental Research

Homi Bhabha Road Mumbai 400005 India

  1. David Adams (BNL)
    13/02/2006, 14:00
    Distributed Data Analysis
    oral presentation
    DIAL is a generic framework for distributed analysis. The heart of the system is a scheduler (also called analysis service) that receives high-level processing requests expressed in terms of an input dataset and a transformation to act on that dataset. The scheduler splits the dataset, applies the transformation to each subdataset to produce a new subdataset, and then merges these to...
  2. Dr Jörn Adamczewski (GSI)
    13/02/2006, 14:20
    Distributed Data Analysis
    oral presentation
    The new version 3 of the ROOT based GSI standard analysis framework GO4 (GSI Object Oriented Online Offline) has been released. GO4 provides multithreaded remote communication between analysis process and GUI process, a dynamically configurable analysis framework, and a Qt based GUI with embedded ROOT graphics. In the new version 3 a new internal object manager was developed. Its...
  3. Dr Gennady KUZNETSOV (Rutherford Appleton Laboratory, Didcot)
    13/02/2006, 14:40
    Distributed Data Analysis
    oral presentation
    DIRAC is the LHCb Workload and Data Management system used for Monte Carlo production, data processing and distributed user analysis. Such a wide variety of applications requires a general approach to the tasks of job definition, configuration and management. In this paper, we present a suite of tools called a Production Console, which is a general framework for job formulation,...
  4. Gerardo GANIS (CERN)
    13/02/2006, 15:00
    Distributed Data Analysis
    oral presentation
    The Parallel ROOT Facility, PROOF, enables the interactive analysis of distributed data sets in a transparent way. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. Being part of the ROOT framework, PROOF inherits the benefits of a performant...
  5. Caitriana Nicholson (University of Glasgow)
    13/02/2006, 16:00
    Distributed Data Analysis
    oral presentation
    Simulations have been performed with the grid simulator OptorSim using the expected analysis patterns from the LHC experiments and a realistic model of the LCG at LHC startup, with thousands of user analysis jobs running at over a hundred grid sites. It is shown, first, that dynamic data replication plays a significant role in the overall analysis throughput in terms of optimising job...
  6. Dr Donatella Lucchesi (INFN Padova), Dr Francesco Delli Paoli (INFN Padova)
    13/02/2006, 16:20
    Distributed Data Analysis
    oral presentation
    The CDF experiment has a new trigger which selects events depending on the significance of the track impact parameters. With this trigger a sample of events enriched in b and c mesons has been selected, and it is used for several important physics analyses such as Bs mixing. The size of the dataset is about 20 TBytes, corresponding to an integrated luminosity of 1 fb⁻¹ collected by CDF....
  7. Valeria Bartsch (FERMILAB / University College London)
    13/02/2006, 16:40
    Distributed Data Analysis
    oral presentation
    SAM is a data handling system that provides the Fermilab HEP experiments D0, CDF and MINOS with the means to catalog, distribute and track the usage of their collected and analyzed data. Annually, SAM serves petabytes of data to physics groups performing data analysis, data reconstruction and simulation at various computing centers across the world. Given the volume of the detector data, a...
  8. John Huth (Harvard University)
    13/02/2006, 17:00
    Distributed Data Analysis
    oral presentation
    The ATLAS experiment uses a tiered data Grid architecture that enables possibly overlapping subsets, or replicas, of original datasets to be located across the ATLAS collaboration. Many individual elements of these datasets can also be recreated locally from scratch based on a limited number of inputs. We envision a time when a user will want to determine which is more expedient,...
  9. Dr Julia Andreeva (CERN)
    13/02/2006, 17:20
    Distributed Data Analysis
    oral presentation
    The ARDA project focuses on delivering analysis prototypes together with the LHC experiments. The ARDA/CMS activity delivered a fully functional analysis prototype exposed to a pilot community of CMS users. The current integration work of key components into the CMS system is described: the activity focuses on providing a coherent monitor layer where information from diverse sources...
  10. Dr Massimo Lamanna (CERN)
    14/02/2006, 14:00
    Distributed Data Analysis
    oral presentation
    The ARDA project focuses on delivering analysis prototypes together with the LHC experiments. Each experiment prototype is in principle independent but commonalities have been observed. The first level of commonality is represented by mature projects which can be effectively shared across different users. The best example is GANGA, providing a toolkit to organize users’ activity,...
  11. Mr Stuart WAKEFIELD (Imperial College, University of London, London, UNITED KINGDOM)
    14/02/2006, 14:20
    Distributed Data Analysis
    oral presentation
    BOSS (Batch Object Submission System) has been developed to provide logging and bookkeeping and real-time monitoring of jobs submitted to a local farm or a grid system. The information is persistently stored in a relational database for further processing. By means of user-supplied filters, BOSS extracts the specific job information to be logged from the standard streams of the job itself...
  12. Mr Giulio Eulisse (Northeastern University, Boston)
    14/02/2006, 14:40
    Distributed Data Analysis
    oral presentation
    We describe how a new programming paradigm dubbed AJAX (Asynchronous Javascript and XML) has enabled us to develop highly-performant web-based graphics applications. Specific examples are shown of our web clients for: CMS Event Display (real-time Cosmic Challenge), remote detector monitoring with ROOT displays, and performant 3D displays of GEANT4 descriptions of LHC detectors. The...
  13. Mr Stuart Paterson (University of Glasgow / CPPM, Marseille)
    14/02/2006, 15:00
    Distributed Data Analysis
    oral presentation
    DIRAC is the LHCb Workload and Data Management system for Monte Carlo simulation, data processing and distributed user analysis. Using DIRAC, a variety of resources may be integrated, including individual PCs, local batch systems and the LCG grid. We report here on the progress made in extending DIRAC for distributed user analysis on LCG. In this paper we describe the advances in the...
  14. Dr Dietrich Liko (CERN)
    14/02/2006, 16:00
    Distributed Data Analysis
    oral presentation
    The ATLAS strategy follows a service oriented approach to provide Distributed Analysis capabilities to its users. Based on initial experiences with an Analysis service, the ATLAS production system has been evolved to support analysis jobs. As the ATLAS production system is based on several grid flavours (LCG, OSG and Nordugrid), analysis jobs will be supported by specific executors on the...
  15. Dr Isidro Gonzalez Caballero (Instituto de Fisica de Cantabria (CSIC-UC))
    14/02/2006, 16:20
    Distributed Data Analysis
    oral presentation
    A typical HEP analysis in the LHC experiments involves the processing of data corresponding to several million events, terabytes of information, to be analysed in the last phases. Currently, processing one million events in a single modern workstation takes several hours, thus slowing the analysis cycle. The desirable computing model for a physicist would be closer to a High Performance...
  16. Dr Conrad Steenberg (CALIFORNIA INSTITUTE OF TECHNOLOGY)
    14/02/2006, 16:40
    Distributed Data Analysis
    oral presentation
    We present the architecture and implementation of a bi-directional system for monitoring long-running jobs on large computational clusters. JobMon comprises an asynchronous intra-cluster communication server and a Clarens web service on a head node, coupled with a job wrapper for each monitored job to provide monitoring information both periodically and upon request. The Clarens web service...
  17. Mr Marco Corvo (CNAF and CERN)
    14/02/2006, 17:00
    Distributed Data Analysis
    oral presentation
    CRAB (CMS Remote Analysis Builder) is a tool, developed by INFN within the CMS collaboration, which gives physicists the ability to analyze large amounts of data by exploiting the huge computing power of grid distributed systems. It is currently used to analyze simulated data needed to prepare the Physics Technical Design Report. Data produced by CMS are distributed among several...
  18. Oliver Gutsche (FERMILAB)
    15/02/2006, 14:00
    Distributed Data Analysis
    oral presentation
    The CMS computing model provides reconstruction and access to recorded data of the CMS detector as well as to Monte Carlo (MC) generated data. Due to the increased complexity, these functionalities will be provided by a tier structure of globally located computing centers using GRID technologies. In the CMS baseline, user access to data is provided by the CMS Remote Analysis Builder...
  19. Mr Ashiq Anjum (University of the West of England)
    15/02/2006, 14:20
    Distributed Data Analysis
    oral presentation
    Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources and the size of the input and output files and the amount of overall storage space allotted to a user...
  20. Dr Ulrik Egede (IMPERIAL COLLEGE LONDON)
    15/02/2006, 14:40
    Distributed Data Analysis
    oral presentation
    Physics analysis of large amounts of data by many users requires the usage of Grid resources. It is however important that users can see a single environment for developing and testing algorithms locally and for running on large data samples on the Grid. The Ganga job wizard, developed by LHCb and ATLAS, provides physicists with such an integrated environment for job preparation, bookkeeping...
  21. Prof. Kaushik De (UNIVERSITY OF TEXAS AT ARLINGTON)
    15/02/2006, 15:00
    Distributed Data Analysis
    oral presentation
    A new offline processing system for production and analysis, Panda, has been developed for the ATLAS experiment and deployed in OSG. ATLAS will accrue tens of petabytes of data per year, and the Panda design is accordingly optimized for data intensive processing. Its development followed three years of production experience, the lessons from which drove a markedly different design for the...
  22. Mr Pavel JAKL (Nuclear Physics Inst., Academy of Sciences - Czech Republic)
    15/02/2006, 16:00
    Distributed Data Analysis
    oral presentation
    With its increasing data samples, the RHIC/STAR experiment has faced a challenging data management dilemma: solutions using cheap disks attached to processing nodes have rapidly become economically beneficial over standard centralized storage. At the cost of data management, the STAR experiment moved to a multiple component locally distributed data model rendered viable by the...
  23. Mr Fabrizio Furano (INFN sez. di Padova)
    15/02/2006, 16:20
    Distributed Data Analysis
    oral presentation
    The latencies induced by network communication often play a big role in reducing the performance of systems that access large amounts of data in a distributed environment. The problem is present in Local Area Networks, but is much more evident in Wide Area Networks. It is generally perceived as a critical problem which makes it very difficult to access remote data. However, a more...
  24. Dr Douglas Smith (STANFORD LINEAR ACCELERATOR CENTER)
    15/02/2006, 16:40
    Distributed Data Analysis
    oral presentation
    For the BaBar Computing Group: Two years ago, the BaBar experiment changed its event store from an object-oriented database system to one based on ROOT files. A new bookkeeping system was developed to manage the meta-data of these files. This system has been in constant use since that time, and has successfully provided the needed meta-data information for users' analysis jobs,...
  25. Andrew Hanushevsky (Stanford Linear Accelerator Center)
    15/02/2006, 17:00
    Distributed Data Analysis
    oral presentation
    When the BaBar experiment transitioned to using the Root Framework, a new data server architecture, xrootd, was developed to address event analysis needs. This architecture was deployed at SLAC two years ago and since then has also been deployed at other BaBar Tier 1 sites: IN2P3, INFN, FZK, and RAL; as well as other non-BaBar sites: CERN (Alice), BNL (Star), and Cornell (CLEO). As part of...