2–9 Sept 2007
Victoria, Canada
Europe/Zurich timezone
Please book accommodation as soon as possible.

Session

Distributed data analysis and information management

DD
3 Sept 2007, 14:00
Victoria, Canada

Conveners

Distributed data analysis and information management: DD 1

  • Roger Jones (Lancaster University)

Distributed data analysis and information management: DD 2

  • Roger Jones (Lancaster University)

Distributed data analysis and information management: DD 3

  • Michael Ernst (BNL)

Distributed data analysis and information management: DD 4

  • Ian Fisk (FNAL)

Distributed data analysis and information management: DD 5

  • Roger Jones (Lancaster University)

Distributed data analysis and information management: DD 6

  • Roger Jones (Lancaster University)


  1. Dr Andrew Maier (CERN)
    03/09/2007, 14:00
    Distributed data analysis and information management
    oral presentation
    Ganga, the job-management system (http://cern.ch/ganga), developed as an ATLAS-LHCb common project, offers a simple, efficient and consistent user experience in a variety of heterogeneous environments: from local clusters to global Grid systems. Ganga helps end-users to organise their analysis activities on the Grid by providing automatic persistency of the job's metadata. A user has...
  2. Dr Akram Khan (Brunel University)
    03/09/2007, 14:20
    Distributed data analysis and information management
    oral presentation
    ASAP is a system for enabling distributed analysis for CMS physicists. It was created with the aim of simplifying the transition from a locally running application to one that is distributed across the Grid. The experience gained in operating the system for the past 2 years has been used to redevelop a more robust, performant and scalable version. ASAP consists of a client for job...
  3. Mr Jan Fiete Grosse-Oetringhaus (CERN)
    03/09/2007, 14:40
    Distributed data analysis and information management
    oral presentation
    ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - Cern Analysis Facility) for fast analysis. The system is especially aimed at the prototyping phase of analyses that need a high number of development iterations and thus desire a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as...
  4. Dr Stuart Paterson (CERN)
    03/09/2007, 15:00
    Distributed data analysis and information management
    oral presentation
    The LHCb distributed data analysis system consists of the Ganga job submission front-end and the DIRAC Workload and Data Management System. Ganga is jointly developed with ATLAS and allows LHCb users to submit jobs to several backends, including batch systems, LCG and DIRAC. The DIRAC API provides a transparent and secure way for users to submit jobs to the Grid and is the default...
  5. Dr Johannes Elmsheuser (Ludwig-Maximilians-Universität München)
    03/09/2007, 15:20
    Distributed data analysis and information management
    oral presentation
    The distributed data analysis using Grid resources is one of the fundamental applications in high energy physics to be addressed and realized before the start of LHC data taking. The demands on resource management are very high: in every experiment, up to a thousand physicists will be submitting analysis jobs into the Grid. Appropriate user interfaces and helper applications have to be made...
  6. Mr Adam Kocoloski (MIT)
    03/09/2007, 15:40
    Distributed data analysis and information management
    oral presentation
    Modern Macintosh computers feature Xgrid, a distributed computing architecture built directly into Apple's OS X operating system. While the approach is radically different from those generally expected by the Unix based Grid infrastructures (Open Science Grid, TeraGrid, EGEE), opportunistic computing on Xgrid is nonetheless a tempting and novel way to assemble a computing cluster with a...
  7. Leandro Franco (CERN)
    03/09/2007, 16:30
    Distributed data analysis and information management
    oral presentation
    Particle accelerators produce huge amounts of information in every experiment and such quantity cannot be stored easily in a personal computer. For that reason, most of the analysis is done using remote storage servers (this will be particularly true when the Large Hadron Collider starts its operation in 2007). Seeing how the bandwidth has increased in the last few years, the biggest...
  8. Lassi Tuura (Northeastern University)
    03/09/2007, 16:50
    Distributed data analysis and information management
    oral presentation
    The CMS experiment will need to sustain uninterrupted high reliability, high throughput and very diverse data transfer activities as the LHC operations start. PhEDEx, the CMS data transfer system, will be responsible for the full range of the transfer needs of the experiment. Covering the entire spectrum is a demanding task: from the critical high-throughput transfers between CERN and...
  9. Dr Douglas Smith (Stanford Linear Accelerator Center)
    03/09/2007, 17:10
    Distributed data analysis and information management
    oral presentation
    The BaBar high energy experiment has been running for many years now, and has resulted in a data set of over a petabyte in size, containing over two million files. The management of this set of data has to support the requirements of further data production along with a physics community that has vastly different needs. To support these needs the BaBar bookkeeping system was developed,...
  10. Andrew Cameron Smith (CERN)
    03/09/2007, 17:30
    Distributed data analysis and information management
    oral presentation
    The LHCb Computing Model describes the dataflow model for all stages in the processing of real and simulated events and defines the role of LHCb associated Tier1 and Tier2 computing centres. The WLCG ‘dressed rehearsal’ exercise aims to allow LHC experiments to deploy the full chain of their Computing Models, making use of all underlying WLCG services and resources, in preparation for real...
  11. Dr Roger Jones (Lancaster University)
    04/09/2007, 11:00
    Distributed data analysis and information management
    oral presentation
    The ATLAS Computing Model was constructed after early tests and was captured in the ATLAS Computing TDR in June 2005. Since then, the grid tools and services have evolved and their performance is starting to be understood through large-scale exercises. As real data taking becomes imminent, the computing model continues to evolve, with robustness and reliability being the watchwords for...
  12. Dr Simone Pagan Griso (University and INFN Padova)
    04/09/2007, 11:20
    Distributed data analysis and information management
    oral presentation
    The upgrades of the Tevatron collider and of the CDF detector have considerably increased the demand on computing resources in particular for Monte Carlo production for the CDF experiment. This has forced the collaboration to move beyond the usage of dedicated resources and start exploiting Grid resources. The CDF Analysis Farm (CAF) model has been reimplemented into LcgCAF ...
  13. Dr Hartmut Stadie (Universitaet Hamburg)
    04/09/2007, 11:40
    Distributed data analysis and information management
    oral presentation
    The detector and collider upgrades for the HERA-II running at DESY have considerably increased the demand on computing resources for the ZEUS experiment. To meet the demand, ZEUS commissioned an automated Monte Carlo (MC) production capable of using Grid resources in November 2004. Since then, more than one billion events have been simulated and reconstructed on the Grid, which corresponds...
  14. Dr Ashok Agarwal (University of Victoria)
    04/09/2007, 12:00
    Distributed data analysis and information management
    oral presentation
    The present paper highlights the approach used to design and implement a web services based BaBar Monte Carlo (MC) production grid using Globus Toolkit version 4. The grid integrates the resources of two clusters at the University of Victoria, using the ClassAd mechanism provided by the Condor-G metascheduler. Each cluster uses the Portable Batch System (PBS) as its local resource...
  15. Marco Clemencic (European Organization for Nuclear Research (CERN))
    05/09/2007, 14:00
    Distributed data analysis and information management
    oral presentation
    The LHCb Conditions Database project provides the necessary tools to handle non-event time-varying data. The main users of conditions are reconstruction and analysis processes, which are running on the Grid. To allow efficient access to the data, we need to use a synchronized replica of the content of the database located at the same site as the event data file, i.e. the LHCb Tier1. The...
  16. Dr Lee Lueking (FERMILAB)
    05/09/2007, 14:20
    Distributed data analysis and information management
    oral presentation
    The CMS experiment at the LHC has established an infrastructure using the FroNTier framework to deliver conditions (i.e. calibration, alignment, etc.) data to processing clients worldwide. FroNTier is a simple web service approach providing client HTTP access to a central database service. The system for CMS has been developed to work with POOL which provides object relational mapping...
  17. Alexandre Vaniachine (Argonne National Laboratory)
    05/09/2007, 14:40
    Distributed data analysis and information management
    oral presentation
    In preparation for ATLAS data taking, ATLAS database activities have undergone a coordinated shift from development towards operations. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and...
  18. Dr Douglas Smith (Stanford Linear Accelerator Center)
    05/09/2007, 15:00
    Distributed data analysis and information management
    oral presentation
    There is a need for a large dataset of simulated events for use in analysis of the data from the BaBar high energy physics experiment. The largest cycle of this production in the history of the experiment was just completed in the past year, simulating events against all detector conditions in the history of the experiment, resulting in over eleven billion events in eighteen months. ...
  19. Ms Helen McGlone (University of Glasgow/CERN)
    05/09/2007, 15:20
    Distributed data analysis and information management
    oral presentation
    The ATLAS TAG database is a multi-terabyte event-level metadata selection system, intended to allow discovery, selection of and navigation to events of interest to an analysis. The TAG database encompasses file- and relational-database-resident event-level metadata, distributed across all ATLAS Tiers. ...
  20. Dr Conrad Steenberg (Caltech)
    05/09/2007, 15:40
    Distributed data analysis and information management
    oral presentation
    We describe how we have used the Clarens Grid Portal Toolkit to develop powerful application and browser-level interfaces to ROOT and Pythia. The Clarens Toolkit is a codebase that was initially developed under the auspices of the Grid Analysis Environment project at Caltech, with the goal of enabling LHC physicists engaged in analysis to bring the full power of the Grid to their desktops,...
  21. Dr Lucas Taylor (Northeastern University, Boston)
    06/09/2007, 14:00
    Distributed data analysis and information management
    oral presentation
    The CMS experiment is about to embark on its first physics run at the LHC. To maximize the effectiveness of physicists and technical experts at CERN and worldwide and to facilitate their communications, CMS has established several dedicated and inter-connected operations and monitoring centers. These include a traditional “Control Room” at the CMS site in France, a “CMS Centre” for...
  22. Dr John Kennedy (LMU Munich)
    06/09/2007, 14:20
    Distributed data analysis and information management
    oral presentation
    The ATLAS production system is responsible for the distribution of O(100,000) jobs per day to over 100 sites worldwide. The tracking and correlation of errors and resource usage within such a large distributed system is of extreme importance. The monitoring system presented here is designed to abstract the monitoring information away from the central database of jobs...
  23. Dr Tofigh Azemoon (Stanford Linear Accelerator Center)
    06/09/2007, 14:40
    Distributed data analysis and information management
    oral presentation
    Petascale systems are in existence today and will become widespread in the next few years. Such systems are inevitably very complex, highly distributed and heterogeneous. Monitoring a petascale system in real time and understanding its status at any given moment without impacting its performance is a highly intricate task. Common approaches and off the shelf tools are either...
  24. Ricardo Rocha (CERN)
    06/09/2007, 15:00
    Distributed data analysis and information management
    oral presentation
    The ATLAS Distributed Data Management (DDM) system is evolving to provide a production-quality service for data distribution and data management support for production and users' analysis. Monitoring the different components in the system has emerged as one of the key issues to achieve this goal. Its distributed nature over different grid infrastructures (EGEE, OSG and NDGF)...
  25. Dr Fons Rademakers (CERN)
    06/09/2007, 15:20
    Distributed data analysis and information management
    oral presentation
    The goal of PROOF (Parallel ROOt Facility) is to enable interactive analysis of large data sets in parallel on a distributed cluster or multi-core machine. PROOF represents a high-performance alternative to a traditional batch-oriented computing system. The ALICE collaboration is planning to use PROOF at the CERN Analysis Facility (CAF) and has been stress testing the system since mid...
  26. Mr Fabrizio Furano (INFN sez. di Padova)
    06/09/2007, 15:40
    Distributed data analysis and information management
    oral presentation
    HEP data processing and analysis applications typically deal with the problem of accessing and processing data at high speed. Recent study, development and test work has shown that the latencies due to data access can often be hidden by overlapping them with the data processing, thus making it possible for applications to process remote data with a high level of...
  27. Dan Flath (SLAC)
    06/09/2007, 16:30
    Distributed data analysis and information management
    oral presentation
    The Data Handling Pipeline ("Pipeline") has been developed for the Gamma-Ray Large Area Space Telescope (GLAST) launching at the end of 2007. Its goal is to generically process graphs of dependent tasks, maintaining a full record of its state, history and data products. In cataloging the relationship between data, analysis results, software versions, as well as statistics (memory usage,...
  28. Dr Vitaly Choutko (Massachusetts Institute of Technology (MIT))
    06/09/2007, 16:50
    Distributed data analysis and information management
    oral presentation
    The AMS-02 detector will be installed on the ISS for at least 3 years. The data will be transmitted from the ISS to the NASA Marshall Space Flight Center (MSFC, Huntsville, Alabama) and transferred to CERN (Geneva, Switzerland) for processing and analysis. We are presenting the AMS-02 Ground Data Handling scenario and requirements to AMS ground centers: the Payload Operation and Control Center (POCC)...
  29. Dr Nicola De Filippis (INFN Bari)
    06/09/2007, 17:10
    Distributed data analysis and information management
    oral presentation
    The Tracker detector has been taking real data with cosmics at the Tracker Integration Facility (TIF) at CERN. First DAQ checks and on-line monitoring tasks are executed at the Tracker Analysis Centre (TAC) which is a dedicated Control Room at TIF with limited computing resources. A set of software agents were developed to perform the real-time data conversion in a standard Event...
  30. Mr Pavel Jakl (Nuclear Physics Institute, Academy of Sciences of the Czech Republic)
    06/09/2007, 17:30
    Distributed data analysis and information management
    oral presentation
    Facing the reality of storage economics, NP experiments such as RHIC/STAR have been engaged in a shift in the analysis model, and now rely heavily on cheap disks attached to processing nodes, as such a model is far more economical than expensive centralized storage. Additionally, exploiting storage aggregates with enhanced distributed computing capabilities such as dynamic space...
  31. Daniele Spiga (Università degli Studi di Perugia)
    06/09/2007, 17:50
    Distributed data analysis and information management
    oral presentation
    Starting from 2007 the CMS experiment will produce several Pbytes of data each year, to be distributed over many computing centers located in many different countries. The CMS computing model defines how the data are to be distributed such that CMS physicists can access them in an efficient manner in order to perform their physics analyses. CRAB (CMS Remote Analysis Builder) is a...