Conveners
Distributed data analysis and information management: DD 1
- Roger Jones (Lancaster University)
Distributed data analysis and information management: DD 2
- Roger Jones (Lancaster University)
Distributed data analysis and information management: DD 3
- Michael Ernst (BNL)
Distributed data analysis and information management: DD 4
- Ian Fisk (FNAL)
Distributed data analysis and information management: DD 5
- Roger Jones (Lancaster University)
Distributed data analysis and information management: DD 6
- Roger Jones (Lancaster University)
- Dr Andrew Maier (CERN), 03/09/2007, 14:00, oral presentation
  Ganga, the job-management system (http://cern.ch/ganga), developed as an ATLAS-LHCb common project, offers a simple, efficient and consistent user experience in a variety of heterogeneous environments: from local clusters to global Grid systems. Ganga helps end-users to organise their analysis activities on the Grid by providing automatic persistency of the job's metadata. A user has...
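  As a flavour of the user experience described above, here is a minimal sketch of a Ganga session; the calls follow the public Ganga Python interface (GPI), but the job itself is invented for illustration and assumes a working Ganga installation in whose interactive session Job, Executable and LCG are predefined.

      # Minimal Ganga session (illustrative sketch; run inside the
      # interactive 'ganga' shell, where the GPI names below exist).
      j = Job()                                     # new job, persisted automatically
      j.name = 'hello-grid'
      j.application = Executable(exe='/bin/echo', args=['hello'])
      j.backend = LCG()                             # swap for Local() to test locally
      j.submit()                                    # same call for every backend
      print(jobs)                                   # the persistent job repository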
- Dr Akram Khan (Brunel University), 03/09/2007, 14:20, oral presentation
  ASAP is a system for enabling distributed analysis for CMS physicists. It was created with the aim of simplifying the transition from a locally running application to one that is distributed across the Grid. The experience gained in operating the system for the past 2 years has been used to redevelop a more robust, performant and scalable version. ASAP consists of a client for job...
- Mr Jan Fiete Grosse-Oetringhaus (CERN), 03/09/2007, 14:40, oral presentation
  ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF, the CERN Analysis Facility) for fast analysis. The system is aimed especially at the prototyping phase of analyses that need a high number of development iterations and thus require a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as...
- Dr Stuart Paterson (CERN), 03/09/2007, 15:00, oral presentation
  The LHCb distributed data analysis system consists of the Ganga job submission front-end and the DIRAC Workload and Data Management System. Ganga is jointly developed with ATLAS and allows LHCb users to submit jobs to several backends, including various batch systems, LCG and DIRAC. The DIRAC API provides a transparent and secure way for users to run jobs on the Grid and is the default...
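  A minimal sketch of job submission through the DIRAC Python API; the module paths and method names below follow a later public DIRAC release and should be treated as assumptions rather than the exact 2007 interface.

      # Illustrative DIRAC API usage (module paths as in later public
      # DIRAC releases; treat them as assumptions).
      from DIRAC.Interfaces.API.Dirac import Dirac
      from DIRAC.Interfaces.API.Job import Job

      job = Job()
      job.setName('lhcb-analysis-test')
      job.setExecutable('/bin/echo', arguments='hello from the Grid')

      dirac = Dirac()
      result = dirac.submitJob(job)   # dict with 'OK' and 'Value' (the job ID)
      print(result)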
- Dr Johannes Elmsheuser (Ludwig-Maximilians-Universität München), 03/09/2007, 15:20, oral presentation
  Distributed data analysis using Grid resources is one of the fundamental applications in high energy physics to be addressed and realized before the start of LHC data taking. The demands on resource management are very high: in every experiment, up to a thousand physicists will be submitting analysis jobs to the Grid. Appropriate user interfaces and helper applications have to be made...
- Mr Adam Kocoloski (MIT), 03/09/2007, 15:40, oral presentation
  Modern Macintosh computers feature Xgrid, a distributed computing architecture built directly into Apple's OS X operating system. While the approach is radically different from those generally expected by the Unix-based Grid infrastructures (Open Science Grid, TeraGrid, EGEE), opportunistic computing on Xgrid is nonetheless a tempting and novel way to assemble a computing cluster with a...
- Leandro Franco (CERN), 03/09/2007, 16:30, oral presentation
  Particle accelerators produce huge amounts of data in every experiment, quantities that cannot easily be stored on a personal computer. For that reason, most of the analysis is done using remote storage servers (this will be particularly true when the Large Hadron Collider starts its operation in 2007). Seeing how the bandwidth has increased in the last few years, the biggest...
- Lassi Tuura (Northeastern University), 03/09/2007, 16:50, oral presentation
  The CMS experiment will need to sustain uninterrupted high-reliability, high-throughput and very diverse data transfer activities as LHC operations start. PhEDEx, the CMS data transfer system, will be responsible for the full range of the transfer needs of the experiment. Covering the entire spectrum is a demanding task: from the critical high-throughput transfers between CERN and...
- Dr Douglas Smith (Stanford Linear Accelerator Center), 03/09/2007, 17:10, oral presentation
  The BaBar high energy physics experiment has been running for many years now and has produced a data set of over a petabyte in size, containing over two million files. The management of this data set has to support the requirements of further data production along with a physics community that has vastly different needs. To support these needs the BaBar bookkeeping system was developed,...
- Andrew Cameron Smith (CERN), 03/09/2007, 17:30, oral presentation
  The LHCb Computing Model describes the dataflow model for all stages in the processing of real and simulated events and defines the role of LHCb-associated Tier1 and Tier2 computing centres. The WLCG "dressed rehearsal" exercise aims to allow LHC experiments to deploy the full chain of their Computing Models, making use of all underlying WLCG services and resources, in preparation for real...
- Dr Roger Jones (Lancaster University), 04/09/2007, 11:00, oral presentation
  The ATLAS Computing Model was constructed after early tests and was captured in the ATLAS Computing TDR in June 2005. Since then, the Grid tools and services have evolved and their performance is starting to be understood through large-scale exercises. As real data taking becomes imminent, the computing model continues to evolve, with robustness and reliability being the watchwords for...
- Dr Simone Pagan Griso (University and INFN Padova), 04/09/2007, 11:20, oral presentation
  The upgrades of the Tevatron collider and of the CDF detector have considerably increased the demand on computing resources, in particular for Monte Carlo production, for the CDF experiment. This has forced the collaboration to move beyond the usage of dedicated resources and start exploiting Grid resources. The CDF Analysis Farm (CAF) model has been reimplemented into LcgCAF ...
- Dr Hartmut Stadie (Universitaet Hamburg), 04/09/2007, 11:40, oral presentation
  The detector and collider upgrades for HERA-II running at DESY have considerably increased the demand on computing resources for the ZEUS experiment. To meet the demand, ZEUS commissioned an automated Monte Carlo (MC) production system capable of using Grid resources in November 2004. Since then, more than one billion events have been simulated and reconstructed on the Grid, which corresponds...
- Dr Ashok Agarwal (University of Victoria), 04/09/2007, 12:00, oral presentation
  The present paper highlights the approach used to design and implement a web-services-based BaBar Monte Carlo (MC) production grid using Globus Toolkit version 4. The grid integrates the resources of two clusters at the University of Victoria, using the ClassAd mechanism provided by the Condor-G metascheduler. Each cluster uses the Portable Batch System (PBS) as its local resource...
- Marco Clemencic (European Organization for Nuclear Research (CERN)), 05/09/2007, 14:00, oral presentation
  The LHCb Conditions Database project provides the necessary tools to handle non-event time-varying data. The main users of conditions are reconstruction and analysis processes, which are running on the Grid. To allow efficient access to the data, we need to use a synchronized replica of the content of the database located at the same site as the event data file, i.e. the LHCb Tier1. The...
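  The LHCb Conditions Database is built on the LCG COOL library; the PyCool sketch below shows the general shape of an interval-of-validity lookup. The connection string and folder path are invented for illustration, while the calls follow the public COOL Python bindings.

      # Illustrative PyCool lookup (folder path and connection string
      # are made up; API calls follow the COOL Python bindings).
      from PyCool import cool

      dbSvc = cool.DatabaseSvcFactory.databaseService()
      db = dbSvc.openDatabase('sqlite://;schema=conditions.db;dbname=CONDDB')
      folder = db.getFolder('/lhcb/Alignment/Velo')    # hypothetical folder
      obj = folder.findObject(cool.ValidityKeyMin, 0)  # payload valid at time 0, channel 0
      print(obj.payload())
      db.closeDatabase()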
- Dr Lee Lueking (FERMILAB), 05/09/2007, 14:20, oral presentation
  The CMS experiment at the LHC has established an infrastructure using the FroNTier framework to deliver conditions (i.e. calibration, alignment, etc.) data to processing clients worldwide. FroNTier is a simple web service approach providing client HTTP access to a central database service. The system for CMS has been developed to work with POOL, which provides object-relational mapping...
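  At the protocol level FroNTier is plain HTTP, which is what lets ordinary web proxies such as Squid cache the replies. The sketch below issues a generic HTTP GET in that spirit; the server URL, servlet path and query encoding are all assumptions made for illustration, not the actual FroNTier wire format.

      # Illustrative HTTP fetch of conditions data through a caching
      # proxy (hostnames, path and query string are hypothetical).
      import urllib.request

      url = ('http://frontier.example.org:8000/Frontier/'
             'type=frontier_request:1:DEFAULT&encoding=BLOB&p1=ENCODED_QUERY')
      proxy = urllib.request.ProxyHandler({'http': 'http://squid.example.org:3128'})
      opener = urllib.request.build_opener(proxy)
      with opener.open(url) as reply:
          data = reply.read()       # repeated requests are served by Squid
      print(len(data), 'bytes')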
- Alexandre Vaniachine (Argonne National Laboratory), 05/09/2007, 14:40, oral presentation
  In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and...
- Dr Douglas Smith (Stanford Linear Accelerator Center), 05/09/2007, 15:00, oral presentation
  There is a need for a large dataset of simulated events for use in analysis of the data from the BaBar high energy physics experiment. The largest production cycle in the history of the experiment was completed in the past year, simulating events against all detector conditions the experiment has seen and resulting in over eleven billion events in eighteen months. ...
- Ms Helen McGlone (University of Glasgow/CERN), 05/09/2007, 15:20, oral presentation
  The ATLAS TAG database is a multi-terabyte event-level metadata selection system, intended to allow discovery, selection of and navigation to events of interest to an analysis. The TAG database encompasses file- and relational-database-resident event-level metadata, distributed across all ATLAS Tiers. ...
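  Event-level metadata selection of the kind described here amounts to a physics cut expressed as a query against tag records. A hypothetical relational TAG table might be queried as below; the schema, column names and cut values are invented for illustration.

      # Hypothetical event selection against a relational TAG table
      # (schema and cuts are invented for illustration).
      import sqlite3

      conn = sqlite3.connect('tags.db')
      rows = conn.execute(
          """SELECT run_number, event_number, file_guid
             FROM event_tags
             WHERE n_muon >= 2 AND missing_et > 20.0""").fetchall()
      for run, event, guid in rows:
          print(run, event, guid)   # navigate to the event via its file GUID
      conn.close()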
- Dr Conrad Steenberg (Caltech), 05/09/2007, 15:40, oral presentation
  We describe how we have used the Clarens Grid Portal Toolkit to develop powerful application- and browser-level interfaces to ROOT and Pythia. The Clarens Toolkit is a codebase that was initially developed under the auspices of the Grid Analysis Environment project at Caltech, with the goal of enabling LHC physicists engaged in analysis to bring the full power of the Grid to their desktops,...
- Dr Lucas Taylor (Northeastern University, Boston), 06/09/2007, 14:00, oral presentation
  The CMS experiment is about to embark on its first physics run at the LHC. To maximize the effectiveness of physicists and technical experts at CERN and worldwide, and to facilitate their communications, CMS has established several dedicated and inter-connected operations and monitoring centers. These include a traditional "Control Room" at the CMS site in France, a "CMS Centre" for...
- Dr John Kennedy (LMU Munich), 06/09/2007, 14:20, oral presentation
  The ATLAS production system is responsible for the distribution of O(100,000) jobs per day to over 100 sites worldwide. The tracking and correlation of errors and resource usage within such a large distributed system is of extreme importance. The monitoring system presented here is designed to abstract the monitoring information away from the central database of jobs....
- Dr Tofigh Azemoon (Stanford Linear Accelerator Center), 06/09/2007, 14:40, oral presentation
  Petascale systems exist today and will become widespread in the next few years. Such systems are inevitably very complex, highly distributed and heterogeneous. Monitoring a petascale system in real time and understanding its status at any given moment without impacting its performance is a highly intricate task. Common approaches and off-the-shelf tools are either...
- Ricardo Rocha (CERN), 06/09/2007, 15:00, oral presentation
  The ATLAS Distributed Data Management (DDM) system is evolving to provide a production-quality service for data distribution and data management support for production and users' analysis. Monitoring the different components in the system has emerged as one of the key issues to achieve this goal. Its distributed nature over different grid infrastructures (EGEE, OSG and NDGF)...
- Dr Fons Rademakers (CERN), 06/09/2007, 15:20, oral presentation
  The goal of PROOF (Parallel ROOt Facility) is to enable interactive analysis of large data sets in parallel on a distributed cluster or multi-core machine. PROOF represents a high-performance alternative to a traditional batch-oriented computing system. The ALICE collaboration is planning to use PROOF at the CERN Analysis Facility (CAF) and has been stress-testing the system since mid...
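  For flavour, the PyROOT sketch below opens a PROOF session and processes a chain in parallel. TProof::Open, TChain::SetProof and TChain::Process are the standard ROOT calls; the dataset, tree and selector names are placeholders.

      # Illustrative PROOF session from PyROOT (dataset, tree and
      # selector names are placeholders).
      import ROOT

      proof = ROOT.TProof.Open('lite://')   # or 'user@proof-master.example.org'
      chain = ROOT.TChain('esdTree')
      chain.Add('root://server.example.org//data/run*.root')
      chain.SetProof()                      # route Process() through PROOF
      chain.Process('MySelector.C+')        # selector compiled and run on the workers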
- Mr Fabrizio Furano (INFN sez. di Padova), 06/09/2007, 15:40, oral presentation
  HEP data processing and analysis applications typically deal with the problem of accessing and processing data at high speed. Recent study, development and test work has shown that the latencies due to data access can often be hidden by overlapping them with the data processing, making it possible to build applications that process remote data with a high level of...
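  One concrete realisation of this idea in ROOT is the tree cache, which prefetches the baskets a job will need so that network round trips overlap with computation. A sketch, with file and tree names as placeholders:

      # Illustrative use of ROOT's tree cache to hide access latency
      # (file and tree names are placeholders).
      import ROOT

      f = ROOT.TFile.Open('root://server.example.org//data/events.root')
      tree = f.Get('Events')
      tree.SetCacheSize(30 * 1024 * 1024)   # 30 MB prefetch buffer
      tree.AddBranchToCache('*', True)      # cache all branches
      for i in range(tree.GetEntries()):
          tree.GetEntry(i)                  # reads are served from the cache in bulk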
- Dan Flath (SLAC), 06/09/2007, 16:30, oral presentation
  The Data Handling Pipeline ("Pipeline") has been developed for the Gamma-Ray Large Area Space Telescope (GLAST), launching at the end of 2007. Its goal is to generically process graphs of dependent tasks, maintaining a full record of its state, history and data products. In cataloging the relationship between data, analysis results, software versions, as well as statistics (memory usage,...
- Dr Vitaly Choutko (Massachusetts Institute of Technology (MIT)), 06/09/2007, 16:50, oral presentation
  The AMS-02 detector will be installed on the ISS for at least 3 years. The data will be transmitted from the ISS to the NASA Marshall Space Flight Center (MSFC, Huntsville, Alabama) and transferred to CERN (Geneva, Switzerland) for processing and analysis. We present the AMS-02 Ground Data Handling scenario and the requirements on AMS ground centers: the Payload Operation and Control Center (POCC)...
- Dr Nicola De Filippis (INFN Bari), 06/09/2007, 17:10, oral presentation
  The Tracker detector has been taking real data with cosmics at the Tracker Integration Facility (TIF) at CERN. First DAQ checks and on-line monitoring tasks are executed at the Tracker Analysis Centre (TAC), a dedicated control room at the TIF with limited computing resources. A set of software agents was developed to perform the real-time data conversion into a standard Event...
- Mr Pavel Jakl (Nuclear Physics Institute, Academy of Sciences of the Czech Republic), 06/09/2007, 17:30, oral presentation
  Facing the reality of storage economics, NP experiments such as RHIC/STAR have been engaged in a shift in their analysis model and now rely heavily on cheap disks attached to processing nodes, as such a model is extremely beneficial compared with expensive centralized storage. Additionally, exploiting storage aggregates with enhanced distributed computing capabilities such as dynamic space...
- Daniele Spiga (Università degli Studi di Perugia), 06/09/2007, 17:50, oral presentation
  Starting in 2007 the CMS experiment will produce several petabytes of data each year, to be distributed over many computing centers located in many different countries. The CMS computing model defines how the data are to be distributed such that CMS physicists can access them efficiently in order to perform their physics analyses. CRAB (CMS Remote Analysis Builder) is a...