# CHEP04

Europe/Zurich
Interlaken, Switzerland

#### Interlaken, Switzerland

Description
These are the Web pages providing information for the upcoming Computing in High Energy Physics (CHEP) conference in September 2004. CHEP conferences provide an international forum to exchange information on computing experience and needs for the High Energy Physics community, and to review recent, ongoing and future activities. CHEP conferences are held every 18 months, the previous one being held in San Diego in March 2003.
• Sunday, September 26
• 2:00 PM
Registration Foyer Ost

### Foyer Ost

#### Interlaken, Switzerland

• 6:00 PM
Reception Cocktail Konzerthalle

### Konzerthalle

#### Interlaken, Switzerland

• Monday, September 27
• 8:30 AM
Registration Foyer Ost

### Foyer Ost

#### Interlaken, Switzerland

• Plenary: Session 1 Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

Convener: Guy Wormser (LAL Orsay)
• 1
Speaker: Wolfgang von Rueden (CERN)
• 2
50 years of Computing at CERN
"Where are your Wares" Computing in the broadest sense has a long history, and Babbage (1791-1871), Hollerith (1860-1929) Zuse (1910-1995), many other early pioneers, and the wartime code breakers, all made important breakthroughs. CERN was founded as the first valve-based digital computers were coming onto the market. I will consider 50 years of Computing at CERN from the following viewpoints:- Where did we come from? What happened? Who was involved? Which wares (hardware, software, netware, peopleware and now middleware) were important? Where did computers (not) end up in a physics lab? What has been the impact of computing on particle physics? What about the impact of particle physics computing on other sciences? And the impact of our computing outside the scientific realm? I hope to conclude by looking at where we are going, and by reflecting on why computing is likely to remain challenging for a long time yet. The topic is so vast that my remarks are likely to be either prejudiced or trivial, or both.
Speaker: David Williams
• 3
Run II computing
In support of the Tevatron physics program, the Run II experiments have developed computing models and hardware facilities to support data sets at the petabyte scale, currently corresponding to 500 pb-1 of data and over 2 years of production operations. The systems are complete from online data collection to user analysis, and make extensive use of central services and common solutions developed with the FNAL CD and experiment collaborating institutions, and make use of global facilities to meet the computing needs. We describe the similiarities and differences between computing on CDF and D0 while describing solutions for database and database servers, data handling, movement and storage and job submission mechanisms. The facilities for production computing and analysis and the use of commody fileservers will also be described. Much of the knowledge gained from providing computing at this scale can be abstracted and applied to design and planning for future experiments with large scale computing.
Speaker: A. Boehnlein (FERMI NATIONAL ACCELERATOR LABORATORY)
• 10:30 AM
Coffee Break
• Plenary: Session 2 Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

Convener: Richard Mount (SLAC)
• 4
Computing for Belle
The Belle experiment operates at the KEKB accelerator, a high luminosity asymmetric energy e+ e- machine. KEKB has achieved the world highest luminosity of 1.39 times 10^34 cm-2s-1. Belle accumulates more than 1 million B Bbar pairs in one good day. This corresponds to about 1.2 TB of raw data per day. The amount of the raw and processed data accumulated so far exceeds 1.4 PB. Belle's computing model has been a traditional one and very successful so far. The computing has been managed by minimal number of people using cost effective solutions. Looking at the future, KEKB/Belle plans to improve the luminosity to a few times 10^35 cm- 2s-1, 10 times as much as we obtain now. This presentation describes Belle's efficient computing operations, struggles to manage large amount of raw and physics data, and plans for Belle computing for Super KEKB/Belle.
Speaker: N. KATAYAMA (KEK)
• 5
BaBar computing - From collisions to physics results
The BaBar experiment at SLAC studies B-physics at the Upsilon(4S) resonance using the high-luminosity e+e- collider PEP-II at the Stanford Linear Accelerator Center (SLAC). Taking, processing and analyzing the very large data samples is a significant computing challenge. This presentation will describe the entire BaBar computing chain and illustrate the solutions chosen as well as their evolution with the ever higher luminosity being delivered by PEP-II. This will include data acquisition and software triggering in a high availability, low-deadtime online environment, a prompt, automated calibration pass through the data SLAC and then the full reconstruction of the data that takes place at INFN-Padova within 24 hours. Monte Carlo production takes place in a highly automated fashion in 25+ sites. The resulting real and simulated data is distributed and made available at SLAC and other computing centers. For analysis a much more sophisticated skimming pass has been introduced in the past year, along with a reworked eventstore. This allows 120 highly customized analysis-specific skims to be produced for direct use by the analysis groups. This skim data format is the same eventstore data as that produced directly by the data and monte carlo productions and can be handled and distributed in the same way. The total data volume in BaBar is about 1.5PB.
Speaker: P. ELMER (Princeton University)
• 6
Concepts and technologies used in contemporary DAQ systems
The concepts and technologies applied in data acquisition systems have changed dramatically over the past 15 years. Generic DAQ components and standards such as CAMAC and VME have largely been replaced by dedicated FPGA and ASIC boards, and dedicated real-time operation systems like OS9 or VxWorks have given way to Linux- based trigger processor and event building farms. We have also seen a shift from standard or proprietary bus systems used in event building to GigaBit networks and commodity components, such as PCs. With the advances in processing power, network throughput, and storage technologes, today's data rates in large experiments routinely reach hundreds of MegaBytes/s. We will present examples of contemporary DAQ systems from different experiments, try to identify or categorize new approaches, and will compare the performance and throughput of existing DAQ systems with the projected data rates of the LHC experiments to see how close we have come to accomplish these goals. We will also try to look beyond the field of High-Energy Physics and see if there are trends and technologies out there which are worth keeping an eye on.
Speaker: M. Purschke (Brookhaven National Laboratory)
• 12:30 PM
Lunch
• Computer Fabrics Harder

### Harder

#### Interlaken, Switzerland

• 7
Current Status of Fabric Management at CERN
This paper describes the evolution of fabric management at CERN's T0/T1 Computing Center, from the selection and adoption of prototypes produced by the European DataGrid (EDG) project[1] to enhancements made to them. In the last year of the EDG project, developers and service managers have been working to understand and solve operational and scalability issues. CERN has adopted and strengthened Quattor[2], EDG's installation and configuration management toolsuite, for managing all Linux clusters and servers in the Computing Center, replacing existing legacy management systems. Enhancements to the original prototype include a redundant and scalable server architecture using proxy technology and producing plug-in components for configuring system and LHC computing services. CERN now coordinates the maintenance of Quattor, making it available to other sites. Lemon[3], the EDG fabric monitoring framework, has been progressively deployed onto all managed Linux nodes. We have developed sensors to instrument fabric nodes to provide us with complete performance and exception monitoring information. Performance visualization displays and interfaces to the existing alarm system have also been provided. LEAF[4], the LHC-Era Automated Fabric toolset, comprises the State Management System, a tool to enable high-level configuration commands to be issued to sets of nodes during both hardware and service management Use Cases, and the Hardware Management System, a tool for administering hardware workflows and for visualizing and locating equipment. Finally, we will describe issues currently being addressed and planned future developments.
Speaker: G. Cancio (CERN)
• 8
Developing & Managing a large Linux farm - the Brookhaven Experience
This presentation describes the experiences and the lessons learned by the RHIC/ATLAS Computing Facility (RACF) in building and managing its 2,700+ CPU (and growing) Linux Farm over the past 6+ years. We describe how hardware cost, end-user needs, infrastructure, footprint, hardware configuration, vendor selection, software support and other considerations have played a role in the process of steering the growth of the RACF Linux Farm, and how they help shape our future hardware purchase decisions. As well as a detailed description of the challenges encountered and of the solutions used in managing and configuring a large, heterogenous Linux Farm (2700+ CPU's) in the midst of an ongoing transition from being a generally local resource to a global, Grid-aware resource within a larger, distributed computing environment is provided.
Speaker: Tomasz WLODEK (BNL)
• 9
Lattice QCD Clusters at Fermilab
As part of the DOE SciDAC "National Infrastructure for Lattice Gauge Computing" project, Fermilab builds and operates production clusters for lattice QCD simulations. We currently operate three clusters: a 128-node dual Xeon Myrinet cluster, a 128-node Pentium 4E Myrinet cluster, and a 32-node dual Xeon Infiniband cluster. We will discuss the operation of these systems and examine their performance in detail. We will describe the uniform user runtime environment emerging from the SciDAC collaboration. The design of lattice QCD clusters requires careful attention towards balancing memory bandwidth, floating point throughput, and network performance. We will discuss our investigations of various commodity processors, including Pentium 4E, Xeon, Itanium2, Opteron, and PPC970, in terms of their suitability for building balanced QCD clusters. We will also discuss our early experiences with the emerging Infiniband and PCI Express architectures. Finally, we will examine historical trends in price to performance ratios of lattice QCD clusters, and we will present our predictions and plans for future clusters.
Speaker: Don Petravick
• 10
ScotGrid: A prototype Tier 2 centre
ScotGrid is a prototype regional computing centre formed as a collaboration between the universities of Durham, Edinburgh and Glasgow as part of the UK's national particle physics grid, GridPP. We outline the resources available at the three core sites and our optimisation efforts for our user communities. We discuss the work which has been conducted in extending the centre to embrace new projects both from particle physics and new user communities and explain our methodology for doing this.
Speaker: S. Thorn
• 11
A Regional Analysis Center at the University of Florida
The High Energy Physics Group at the University of Florida is involved in a variety of projects ranging from High Energy Experiments at hadron and electron positron colliders to cutting edge computer science experiments focused on grid computing. In support of these activities members of the Florida group have developed and deployed a local computational facility which consists of several service nodes, computational clusters and disk storage services. The resources contribute collectively or individually to a variety of production and development activities such as the UFlorida Tier2 center for the CMS experiment at the Large Hadron Collider (LHC), Monte Carlo production for the CDF experiment at Fermi Lab, the CLEO experiment, and research on grid computing for the GriPhyN and iVDGL projects. The entire collection of servers, clusters and storage services is managed as a single facility using the ROCKS cluster management system. Managing the facility as a single centrally managed system enhances our ability to relocate and reconfigure the resources as necessary in support of both research and production activities. In this paper we describe the architecture deployed, including details on our local implementation of the ROCKS systems, how this simplifies the maintenance and administration of the facility and finally the advantages and disadvantages of using such a scheme to manage a modest size facility.
Speaker: J. Rodriguez (UNIVERSITY OF FLORIDA)
• 12
CHOS, a method for concurrently supporting multiple operating system
Supporting multiple large collaborations on shared compute farms has typically resulted in divergent requirements from the users on the configuration of these farms. As the frameworks used by these collaborations are adapted to use Grids, this issue will likely have a significant impact on the effectiveness of Grids. To address these issues, a method was developed at Lawrence Berkeley National Lab and is being used in production on the PDSF cluster. This method, termed CHOS, uses a combination of a Linux kernel module, the change root system call, and several utilities to provide access to multiple Linux distributions and versions concurrently on a single system. This method will be presented, along with an explanation on how it is integrated into the login process, grid services, and batch scheduler systems. We will also describe how a distribution is installed and configured to run in this environment and explore some common problems that arise. Finally, we will relate our experience in deploying this framework on a production cluster used by several high energy and nuclear physics collaborations.
Speaker: S. Canon (NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER)
• 4:00 PM
Coffee break
• 13
Network Information and Management Infrastructure Project
Management of large site network such as FNAL LAN presents many technical and organizational challenges. This highly dynamic network consists of around 10 thousand network nodes. The nature of the activities FNAL is involved in and its computing policy require that the network remains as open as reasonably possible both in terms of connectivity to the outside networks and in with respect to procedural simplicity of joining the network by temporary network participants such as visitors notebook computers. The goal of the Network Information and Management Infrastructure project at FNAL is to build software infrastructure which would help network management and computer security teams organize monitoring and management of the network, simplify communication between these entities and users, integrate network management into FNAL computer center management infrastructure. Primary authors: Phil DeMar (FNAL), Igor Mandrichenko (FNAL), Don Petravick(FNAL), Dane Skow (FNAL)
Speaker: P. DeMar (FNAL)
• 14
Implementation of a reliable and expandable on-line storage for compute clusters
The HEP experiments that use the regional center GridKa will handle large amounts of data. Traditional access methods via local disks or large network storage servers show limitations in size, throughput or data management flexibility. High speed interconnects like Fibre Channel, iSCSI or Infiniband as well as parallel file systems are becoming increasingly important in large cluster installations to offer the scalable size and throughput needed for PetaByte storage. At the same time the reliable and proven NFS protocol allows local area storage access via traditional Ethernet very cost effectively. The cluster at GridKa uses the General Parallel File System (GPFS) on a 20 node file server farm that connects to over 1000 FC disks via a Storage Area Network. The 130 TB on-line storage is distributed to the 390 node cluster via NFS. A load balancing system ensures an even load distribution and additionally allows for on-line file server exchange. Discussed are the components of the storage area network, specific Linux tools, and the construction and optimisation of the cluster file system along with the RAID groups. A high availability is obtained and measurements prove high throughput under different conditions. The use of the file system administration and management possibilities is presented as is the implementation and effectiveness of the load balancing system.
Speaker: J. VanWezel (FORSCHUNGZENTRUM KARLSRUHE)
• 15
Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing
Gfarm v2 is designed for facilitating reliable file sharing and high-performance distributed and parallel data computing in a Grid across administrative domains by providing a Grid file system. A Grid file system is a virtual file system that federates multiple file systems. It is possible to share files or data by mounting the virtual file system. This paper discusses the design and implementation of secure, robust, scalable and high-performance Grid file system. The most time-consuming, but also the most typical, task in data computing such as high energy physics, astronomy, space exploration, human genome analysis, is to process a set of files in the same way. Such a process can be typically performed independently on every file in parallel, or at least have good locality. Gfarm v2 supports high-performance distributed and parallel computing for such a process by introducing a "Gfarm file", a new "file-affinity" process scheduling based on file locations, and new parallel file access semantics. An arbitrary group of files possibly dispersed across administrative domains can be managed as a single Gfarm file. Each member file will be accessed in parallel in a new file view called "local file view" by a parallel process possibly allocated by file-affinity scheduling based on replica locations of the member files. File-affinity scheduling and new file view enable the owner computes'' strategy, or move the computation to data'' approach for parallel and distributed data computing of member files of a Gfarm file in a single system image.
Speaker: O. Tatebe (GRID TECHNOLOGY RESEARCH CENTER, AIST)
• 16
Performance analysis of Cluster File System on Linux
With the development of Linux and improvement of PC's performance, PC cluster used as high performance computing system is becoming much popular. The performance of I/O subsystem and cluster file system is critical to a high performance computing system. In this work the basic characteristics of cluster file systems and their performance are reviewed. The performance of four distributed cluster file systems, AFS, NFS, PVFS and CASTOR, were measured. The measurements were carried out on CERN version RedHat 7.3.3 Linux using standard I/O performance benchmarks. Measurements show that for single-server single client configuration, NFS, CASTOR and PVFS have better performance and write rate slightly increases while the record length becomes larger. CASTOR has the best throughput when the number of write processes increases. PVFS and CASTOR are tested on multi-server and multi-client system. The two file systems nicely distribute data I/O to all servers. CASTOR RFIO protocol shows the best utilization of network bandwidth and optimized to large data size files. CASTOR also has the better scalability as a cluster file system. Based on the test some methods are proposed to improve the performance of cluster file system.
Speaker: Y. CHENG (COMPUTING CENTER,INSTITUTE OF HIGH ENERGY PHYSICS,CHINESE ACADEMY OF SCIENCES)
• 17
Chimera - a new, fast, extensible and Grid enabled namespace service
After successful implementation and deployment of the dCache system over the last years, one of the additional required services, the namespace service, is faced additional and completely new requirements. Most of these are caused by scaling the system, the integration with Grid services and the need for redundant (high availability) configurations. The existing system, having only an NFSv2 access path, is easy to understand and well accepted by the users. This single 'access path' limits data management task to make use of classical tools like 'find', 'ls' and others. This is intuitiv for most users, but failed while dealing with millions of entries (files) and more sophisticated organizational schemes (metadata). The new system should support a native programmable interface (deep coupled, but fast), the 'classical' NFS path (now version 3 at least), a dCache native access and the SQL path allowing any type of metadata to be used in complex queries. Extensions with other 'access paths' will be possible. Based on the experience with the current system we highlight on the following requirements: - large file support (64 Bit) + large number of files (> 10^8) - fast - Platform independents (runtime + persistent objects) - Grid name service integration - custom dCache integration - redundant, high available runtime configurations (concurrent backup etc.) - user usable metadata (store and query) - ACL support - pluggable authentication (e.g. GSSAPI) - external processes can register for namespace events (e.g. removal/creation of files The presentation will show a detailed analysis of the requirements, the choosen design and selection of existing components. The current schedule should allow to show the first prototype results.
Speaker: T. Mkrtchyan (DESY)
• Core Software Brunig

### Brunig

#### Interlaken, Switzerland

• 18
A Dynamically Reconfigurable Data Stream Processing System
The paper describes a component-based framework for data stream processing that allows for configuration, tailoring, and run-time system reconfiguration. The system’s architecture is based on a pipes and filters pattern, where data is passed through routes between components. Components process data and add, substitute, and/or remove named data items from a data stream. They can also manipulate data streams by buffering data, compressing/decompressing individual streams, and combining, splitting, or synchronizing multiple data streams. Configurable general- purpose filters for manipulating streams, visualizing data, persisting data, and reading data from various standard data sources are supplemented with many application specific filters, such as DSP, scripting, or instrumentation-specific components. A network of pipes and filters can be dynamically reconfigured at run- time, in response to a preplanned sequence of processing steps, operator intervention, or a change in one or more data streams. Four distinctive methods supporting reconfiguration are provided by the framework: modification of data routes, management of components’ activity states, triggering processing based on the content of the data, or the use of source addressing in components. The framework can be used to build static data stream processing applications such as monitoring or data acquisition systems as well as self-adjusting systems that would adapt their processing algorithm, presentation layer, or data persistency layer in response to changes in input data streams.
Speaker: J. Nogiec (FERMI NATIONAL ACCELERATOR LABORATORY)
• 19
The SEAL Component Model
This paper describes the component model that has been developed in the context of the LCG/SEAL project. This component model is an attempt to handle the increasing complexity in the current data processing applications of LHC experiments. In addition, it should facilitate the software re-use by the integration of software components from LCG and non-LCG into the experiment's applications. The component model provides the basic mechanisms and base classes that facilitate the decomposition of the whole C++ object-oriented application into a number of run-time pluggable software modules with well defined generic behavior, inter-component interaction protocols, run-time configuration and user customization. This new development is based on the ideas and practical experiences of the various software frameworks in use by the different LHC experiments for several years. The design and implementation choices will be described and the practical experiences and difficulties in adopting this model to existing experiment software systems will be outlined.
Speaker: R. Chytracek (CERN)
• 20
The SEAL C++ Reflection System
The C++ programming language has very limited capabilities for reflection information about its objects. In this paper a new reflection system will be presented, which allows complete introspection of C++ objects and has been developed in the context of the CERN/LCG/SEAL project in collaboration with the ROOT project. The reflection system consists of two different parts. The first part is a code generator that produces automatically reflection information from existing C++ classes. This generation of the reflection information is done in a non intrusive way, which means that the original C++ classes definition do not need to be changed or instrumented. The second part of the reflection system is able to load/build this information in memory and provides an API to the user. The user can query reflection information from any C++ class and also interact generically with the objects, like invocation of functions, setting and getting data members or constructing and deleting objects. When designing the different packages it was taken care of having minimal dependencies on external software and a possibility to port the software to different platforms/compilers. A quick overview of the current implementation in use by the LCG SEAL and POOL projects will be given. A more detailed description of the new model, which aims to reflect the complete C++ language and to be a common reflection system used also by the ROOT framework, will be given.
Speaker: S. Roiser (CERN)
• 21
Reflection-Based Python-C++ Bindings
Python is a flexible, powerful, high-level language with excellent interactive and introspective capabilities and a very clean syntax. As such it can be a very effective tool for driving physics analysis. Python is designed to be extensible in low-level C-like languages, and its use as a scientific steering language has become quite widespread. To this end, existing and custom-written C or C++ libraries are bound to the Python environment as so-called extension modules. A number of tools for easing the process of creating such bindings exist, such as SWIG or Boost.Python. Yet, the the process still requires a considerable amount of effort and expertise. The C++ language has little built-in introspective capabilities, but tools such as LCGDict and CINT add this by providing so-called dictionaries: libraries that contain information about the names, entry points, argument types, etc. of other libraries. The reflection information from these dictionaries can be used for the creation of bindings and so the process can be fully automated, as dictionaries are already provided for many end-user libraries for other purposes, such as object persistency. PyLCGDict is a Python extension module that uses LCG dictionaries, as PyROOT uses CINT reflection information, to allow Python users to access C++ libraries with essentially no preparation on the users' behalf. In addition, and in a similar way, PyROOT gives ROOT users access to Python libraries.
Speaker: W. LAVRIJSEN (LBNL)
• 22
AIDA, JAIDA and AIDAJNI: Data Analysis using interfaces
AIDA, Abstract Interfaces for Data Analysis, is a set of abstract interfaces for data analysis components: Histograms, Ntuples, Functions, Fitter, Plotter and other typical analysis categories. The interfaces are currently defined in Java, C++ and Python and implementations exist in the form of libraries and tools using C++ (Anaphe/Lizard, OpenScientist), Java (Java Analysis Studio) and Python (PAIDA). JAIDA is the full implementation of AIDA in Java. It is used internally by JAS3 as its analysis core but it can also be used independently for either batch or interactive processing, or for web applications to access data, make plots and simple data analysis through a browser. Some of the JAIDA features are the ability to open AIDA, ROOT and PAW files and the support of an extensible set of fit methods (chi-square, least squares, binned/unbinned likelihood, etc) to be matched with an extensible set of optimizers including Minuit and Uncmin. AIDAJNI is glue code between C++ and Java that allows any C++ code to access any Java implementation of the AIDA interfaces. For example AIDAJNI is used with Geant4 to access the JAIDA implementation of AIDA. This paper gives an update on the AIDA 3.2.1 interfaces and its corresponding JAIDA implementation. Examples will be provided on how to use JAIDA within JAS3, as a standalone library and from C++ using AIDAJNI. References: http://aida.freehep.org/ http://java.freehep.org/jaida http://java.freehep.org/aidajni http://jas.freehep.org/jas3
Speaker: Victor SERBO (AIDA)
• 23
Go4 analysis design
The GSI online-offline analysis system Go4 is a ROOT based framework for medium energy ion- and nuclear physics experiments. Its main features are a multithreaded online mode with a non-blocking Qt GUI, and abstract user interface classes to set up the analysis process itself which is organised as a list of subsequent analysis steps. Each step has its own event objects and a processor instance. It can handle its event i/o independently. It can be set up by macros or by generic a GUI. With respect to the more complex experiments planned at GSI, a configurable network of steps is required. Multiple IO channels per step and multiple references to steps can be set up by macros or via generic GUI. The required mechanisms are provided by an upgrade of the Go4 analysis step manager using the new ROOT TTasks. Support for IO configuration and references across the task tree is provided.
Speaker: H. Essel (GSI)
• 4:00 PM
Coffee Break
• 24
OpenScientist. Status of the project.
We want to present the status of this project. After quickly remembering the basic choices around GUI, visualization and scriptingm we would like to develop what had been done in order to have an AIDA-3.2.1 complient systen, to visualize Geant4 data (G4Lab module), to visualize ROOT data (Mangrove module), to have an hippodraw module and what had been done in order to run on MacOSX by using the native NextStep (Cocoa) environment.
Speaker: G B. Barrand (CNRS / IN2P3 / LAL)
• 25
The Athena Control Framework in Production, New Developments and Lessons Learned
Athena is the Atlas Control Framework, based on the common Gaudi architecture, originally developed by LHCb. In 2004 two major production efforts, the Data Challenge 2 and the Combined Test-beam reconstruction and analysis were structured as Athena applications. To support the production work we have added new features to both Athena and Gaudi: an "Interval of Validity" service to manage time-varying conditions and detector data; a History service, to manage the provenance information of each event data object; and a toolkit to simulate and analyze the overlay of multiple collisions during the detector sensitive time (pile-up). To support the analysis of simulated and test-beam data in athena we have introduced a python-based scripting interface, based on the CERN LCG tools PyLCGDict, PyRoot and PyBus. The scripting interface allows to fully configure any athena component, interactively browse and modify this configuration, as well as examine the content of any data object in the event or detector store.
Speaker: P. Calafiura (LBNL)
• 26
The AliRoot framework, status and perspectives
The ALICE collaboration at the LHC is developing since 1998 an OO offline framework, written entirely in C++. In 2001 a GRID system (AliEn - ALICE Environment) has been added and successfully integrated with ROOT and the offline. The resulting combination allows ALICE to do most of the design of the detector and test the validity of its computing model by performing large scale Data Challenges, using OO technology in a distributed framework. The early migration of all ALICE users to C++ and the adoption of advanced software development techniques are two of the strong points of the ALICE offline strategy. The offline framework is heavily based on virtual interfaces, which allows the use of different generators and even different Monte- Carlo transport codes with no change in the framework or the scoring, reconstruction and analysis code. This talk presents a review of the development path, current status and future perspectives of the ALICE Offline environment.
Speaker: F. Carminati (CERN)
• 27
Composite Framework for CMS Applications
We present a composite framework which exploits the advantages of the CMS data model and uses a novel approach for building CMS simulation, reconstruction, visualisation and future analysis applications. The framework exploits LCG SEAL and CMS COBRA plug-ins and extends the COBRA framework to pass communications between the GUI and event threads, using SEAL callbacks to navigate through the metadata and event data interactively in a distributed environment. We give examples of current applications based on this framework, including CMS test-beams, geometry description debugging, GEANT4 simulation, event reconstruction, and the verification of reconstruction and higher level trigger algorithms.
Speaker: I. Osborne (Northeastern University, Boston, USA)
• 28
IceTray: a Software Framework for IceCube
IceCube is a cubic kilometer-scale neutrino telescope under construction at the South Pole. The minimalistic nature of the instrument poses several challenges for the software framework. Events occur at random times, and frequently overlap, requiring some modifications of the standard event-based processing paradigm. Computational requirements related to modeling the detector medium necessitate the ability for software components to defer processing events. With minimal information from the detector, events must be reconstructed many times with different hypotheses or methods, and the results compared. The appropriate series of software components required to process an event varies considerably, and can be determined only at run time. Finally, reconstruction algorithms are constantly evolving, with development taking place throughout the collaboration, so it is essential that conversion of private analysis code to online production software be simple and, given the inaccessibility of the experimental site, robust. The IceCube collaboration has developed the IceTray framework, which meets these needs by blending aspects of push- and pull-based architectures to produce a highly modular system which nevertheless allows each software component a significant degree of control over the execution flow.
Speaker: T. DeYoung (UNIVERSITY OF MARYLAND)
• 29
The Offline Framework of the Pierre Auger Observatory
The Pierre Auger Observatory is designed to unveil the nature and the origin of the highest energy cosmic rays. Two sites, one currently under construction in Argentina, and another pending in the Northern hemisphere, will observe extensive air showers using a hybrid detector comprising a ground array of 1600 water Cerenkov tanks overlooked by four atmospheric fluorescence detectors. Though the computing demands of the experiment are less severe than those of traditional high energy physics experiments in terms of data volume and detector complexity, the large geographically dispersed collaboration and the heterogeneous set of simulation and reconstruction requirements confronts the offline software with some special challenges. We have designed and implemented a framework to allow collaborators to contribute algorithms and sequencing instructions to build up the variety of applications they require. The framework includes machinery to manage these user codes, to organize the abundance of user-contributed configuration files, to facilitate multi-format file handling, and to provide access to event and time-dependent detector information which can reside in various data sources. A number of utilities are also provided, including a novel geometry package which allows manipulation of abstract geometrical objects independent of coordinate system choice. The framework is implemented in C++, follows an object oriented paradigm, and takes advantage of some of the more widespread tools that the open source community offers, while keeping the user-side simple enough for C++ non-experts to learn in a reasonable time. The distribution system includes unit and acceptance testing in order to support rapid development of both the core framework and contributed user code. Great attention has been paid to the ease of installation.
Speaker: L. Nellen (I. DE CIENCIAS NUCLEARES, UNAM)
• Distributed Computing Services Theatersaal

### Theatersaal

#### Interlaken, Switzerland

Convener: Conrad Steenberg (CalTech)
• 30
Don Quijote - Data Management for the ATLAS Automatic Production System
As part of the ATLAS Data Challenges 2 (DC2), an automatic production system was introduced and with it a new data management component. The data management tools used for previous Data Challenges were built as separate components from the existing Grid middleware. These tools relied on a database of its own which acted as a replica catalog. With the extensive use of Grid technology expected for the most part of the DC2 production, no longer can a data management tool be independent of the Grid middleware. Each Grid relies on its own replica catalog and not on an ATLAS specific tool. ATLAS DC will attempt to use uniformly the resources provided by three Grids: NorduGrid, US Grid3 and LCG-2. Lecagy system will be supported as well. The proposed solution was to build a data management proxy system which consists of a common high-level interface, whose implementation depends on each Grid's replica and metadata catalog as well as the storage backend (mainly "classic" GridFTP servers and SRM). Don Quijote provides management of replicas in a services oriented architecture, across the several "flavours" of Grid middleware used by ATLAS DC. With a higher-level interface common across several Grids (and legacy systems) a user (such as the new automatic production system) can seamlessly manage replicas independently of their hosting environment. Given the services-based architecture, a lightweight command line tool is capable of interacting uniformly within each Grid and between Grids (e.g. moving files from LCG-2 to US Grid 3 while maintaining attributes such as the Global Unique Identifier).
Speaker: M. Branco (CERN)
• 31
Managed Data Storage and Data Access Services for Data Grids
The LHC needs to achieve reliable high performance access to vastly distributed storage resources across the network. USCMS has worked with Fermilab-CD and DESY-IT on a storage service that was deployed at several sites. It provides Grid access to heterogeneous mass storage systems and synchronization between them. It increases resiliency by insulating clients from storage and network failures, and facilitates file sharing and network traffic shaping. This new storage service is implemented as a Grid Storage Element (SE). It consists of dCache as the core storage system and an implementation of the Storage Resource Manager (SRM), that together allow both local and Grid based access to the mass storage facilities. It provides advanced functionalities for managing, accessing and distributing collaboration data. USCMS is using this system both as Disk Resource Manager at Tier-1 and Tier-2 sites, and as Hierarchical Resource Manager with Enstore as tape back-end at the Fermilab Tier-1. It is used for providing shared managed disk pools at sites and for streaming data between the CERN Tier-0, the Fermilab Tier-1 and U.S. Tier-2 centers Applications can reserve space for a time period, ensuring space availability when the application runs. Worker nodes without WAN connection can trigger data replication to the SE and then access data via the LAN. Moving the SE functionality off the worker nodes reduces load and improves reliability of the compute farm elements significantly. We describe architecture, components, and experience gained in CMS production and the DC04 Data Challenge.
Speaker: M. Ernst (DESY)
• 32
FroNtier: High Performance Database Access Using Standard Web Components in a Scalable Multi-tier Architecture
A high performance system has been assembled using standard web components to deliver database information to a large number (thousands?) of broadly distributed clients. The CDF Experiment at Fermilab is building processing centers around the world imposing a high demand load on their database repository. For delivering read-only data, such as calibrations, trigger information and run conditions data, we have abstracted the interface that clients use to retrieve database objects. A middle tier is deployed that translates client requests into database specific queries and returns the data to the client as HTTP datagrams. The database connection management, request translation, and data encoding are accomplished in servlets running under Tomcat. Squid Proxy caching layers are deployed near the Tomcat servers as well as close to the clients to significantly reduce the load on the database and provide a scalable deployment model. This system is highly scalable, readily deployable, and has a very low administrative overhead for data delivery to a large, distributed audience. Details of how the system is built and used will be presented including its architecture, design, interfaces, administration, and performance measurements.
Speaker: L. Lueking (FERMILAB)
• 33
On Distributed Database Deployment for the LHC Experiments
While there are differences among the LHC experiments in their views of the role of databases and their deployment, there is relatively widespread agreement on a number of principles: 1. Physics codes will need access to database-resident data. The need for database access is not confined to middleware and services: physics-related data will reside in databases. 2. Database-resident data will be distributed, and replicated. A single, centralized database, at CERN or elsewhere, does not suffice. 3. Distributed deployment infrastructure should be open to the use of different technologies as appropriate at the various Tier N sites. A variety of approaches to distributed deployment have been explored in the context of individual experiments; indeed, a degree of distributed deployment has been integral to the computing model tests of some experiments (cf. ATLAS) in their 2004 data challenges. Approaches to replication have also been investigated in the context of specific databases, often with vendor-specific replication tools (e.g., Oracle Replication via Streams for the LCG File Catalog and the Oracle instantiation of the LCG conditions database; MySQL tools for replication in the MySQL instantiation of the LCG conditions database). XML exchange mechanisms have also been discussed. Distributed database deployment, though, is more than a middleware and applications software issue—a successful strategy must involve those who will be responsible for systems deployment and administration at LHC grid sites. We describe the status of ongoing work in this area, and discuss the prospects for components of a common approach to distributed deployment in the time frame of the 2005 LHC data challenges.
Speaker: Dirk Duellmann
• 34
Experiences with Data Indexing services supported by the NorduGrid middleware
The NorduGrid middleware, ARC, has integrated support for querying and registering to Data Indexing services such as the Globus Replica Catalog and Globus Replica Location Server. This support allows one to use these Data Indexing services for for example brokering during job-submission, automatic registration of files and many other things. This integrated support is complemented by a set of command-line tools for registering to and querying these Data Indexing services. In this talk we will describe experiences with these Data Indexing services both from a daily work point of view and in production environments such as the Atlas Data-Challenges 1 and 2. We will describe the advantages of such Data Indexing services as well as their shortcomings. Finally we will present a proposal for an extended Data Indexing service which should deal with the shortcomings described. The development of such a Data Indexing service is being planned at the moment.
Speaker: O. Smirnova (Lund University, Sweden)
• 35
Evolution of LCG-2 Data Management
LCG-2 is the collective name for the set of middleware released for use on the LHC Computing Grid in December 2003. This middleware, based on LCG-1, had already several improvements in the Data Management area. These included the introduction of the Grid File Access Library(GFAL), a POSIX-like I/O Interface, along with MSS integration via the Storage Resource Manager(SRM)interface. LCG-2 was used in the Spring 2004 data challenges by all four LHC experiments. This produced the first useful feedback on scalability and functionality problems in the middleware, especially with regards to data management. One of the key goals for the Data Challenges in 2004 is to show that the LCG can handle the data for the LHC, even if the computing model is still quite simple. In light of the feedback from the data challenges, and in conjunction with the LHC experiments, a strategy for the improvements required in the data management area was developed. The aim of these improvements was to allow both easier interaction and better performance from the experiment frameworks and other middleware such as POOL. In this talk, we will first introduce the design of the current data management solution in LCG-2. We will cover the problems and issues highlighted by the data challenges, as well as the strategy for the required improvements to allow LCG-2 to handle effectively data management at LCG volumes. In particular, we will highlight the new APIs provided, and the integration of GFAL and the EDG Replica Manager functionality with ROOT.
Speaker: J-P. Baud (CERN)
• 4:00 PM
Coffee break
• 36
The Next Generation Root File Server
As the BaBar experiment shifted its computing model to a ROOT-based framework, we undertook the development of a high-performance file server as the basis for a fault-tolerant storage environment whose ultimate goal was to minimize job failures due to server failures. Capitalizing on our five years of experience with extending Objectivity's Advanced Multithreaded Server (AMS), elements were added to remove as many obstacles to server performance and fault-tolerance as possible. The final outcome was xrootd, upwardly and downwardly compatible with the current file server, rootd. This paper describes the essential protocol elements that make high performance and fault-tolerance possible; including asynchronous parallel requests, stream multiplexing, data pre-fetch, automatic data segmenting, and the framework for a structured peer-to-peer storage model that allows massive server scaling and client recovery from multiple failures. The internal architecture of the server is also described to explain how high performance was maintained and full compatibility was achieved. Now in production at Stanford Linear Accelerator Center, Rutherford Appleton Laboratory (RAL), INFN, and IN2P3; xrootd has shown that our design provides what we set out to achieve. The xrootd server is now part of the standard ROOT distribution so that other experiments can benefit from this data serving model within a standard HEP event analysis framework.
Speaker: A. Hanushevsky (SLAC)
• 37
Production mode Data-Replication framework in STAR using the HRM Grid
The STAR experiment utilizes two major computing facilities for its data processing needs - the RCF at Brookhaven and the PDSF at LBNL/NERSC. The sharing of data between these facilities utilizes data grid services for file replication, and the deployment of these services was accomplished in conjunction with the Particle Physics Data Grid (PPDG). For STAR's 2004 run it will be necessary to replicate ~100 TB. The file replication is based on Hierarchical Resource Managers (HRMs) along with Globus tools for security (GSI) and data transport (GridFTP). HRMs are grid middleware developed by the Scientific Data Management group at LBNL, and STAR file replication consists of an HRM interfaced to HPSS at each site with GridFTP transfers between the HRMs. Each site also has its own installation of the STAR file and metadata catalog, which is implemented in MySQL. Queries to the catalogs are used to generate file transfer requests. Single requests typically consist of many thousands of files with a volume of hundreds of GBs. The HRMs implement a plugin to a Replica Registration Service (or RRS) which is utilized for automatic registration of new files as they are successfully transferred across sites. This allows STAR users immediate use of the distributed data. Data transfer statistics and system architecture will be presented.
Speaker: E. Hjort (LAWRENCE BERKELEY LABORATORY)
• 38
Storage Resource Managers at Brookhaven
Providing Grid applications with effective access to large volumes of data residing on a multitude of storage systems with very different characteristics prompted the introduction of storage resource managers (SRM). Their purpose is to provide consistent and efficient wide-area access to storage resources unconstrained by their particular implementation (tape, large disk arrays, dispersed small disks). To assess their viability in the context of the US Atlas Tier 1 facility at Brookhaven, two implementations of SRM were tested: dCache (FNAL/DESY joint project) and HRM/DRM (NERSC Berkeley). Both systems included a connection to the local HPSS mass data store providing Grid access to the main tape repository. In addition, dCache offered storage aggregation of dispersed small disks (local drives on computing farm nodes). An overview of our experience with both systems will be presented, including details about configurations, performance, inter-site transfers, interoperability and limitations.
Speaker: Ofer RIND
• 39
File-Metadata Management System for the LHCb Experiment
The LHCb experiment needs to store all the information about the datasets and their processing history of recorded data resulting from particle collisions at the LHC collider at CERN as well as of simulated data. To achieve this functionality a design based on data warehousing techniques was chosen, where several user-services can be implemented and optimized individually without losing functionality nor performance. This approach results in an experiment- independent and flexible system. It allows fast access to the catalogue of available data, to detailed history information and to the catalogue of data replicas. Queries can be made based on these three sets of information. A flexible underlying database schema allows the implementation and evolution of these services without the need to change the basic database schema. The consequent implementation of interfaces based on XML-RPC allows to access and to modify the stored information using a well defined encapsulating API.
Speaker: C. CIOFFI (Oxford University)
• 40
Data Management in EGEE
Data management is one of the cornerstones in the distributed production computing environment that the EGEE project aims to provide for a European e-Science infrastructure. We have designed a set of services based on previous experience in other Grid projects, trying to address the requirements of our user communities. In this paper we summarize the most fundamental requirements and constraints as well as the security, reliability, stability and robustness considerations that have driven the architecture and the particular choice for service decomposition in our service-oriented architecture. We discuss the interaction of our services with each other, their deployment models and how failures are being managed. The three service groups for data management services are the Storage Element, the Data Scheduling and the Catalog services. The Storage Element exposes interfaces to Grid managed storage, with the appropriate semantics in the Grid distributed environment. The Catalog services contain all the metadata related to data: The File Catalog maintains a file-system-like view of the files in the Grid in a logical user namespace, the Replica Catalog keeps track of identical copies of the files distributed in different Storage Elements and the Metadata Catalog keeps application specific information about the files. The Data Scheduling services take care of controlled data transfer and keep the information in the Catalog services consistent with what is actually available in the Storage Elements, acting as the binding between the two. We conclude with first experiences and examples of use-cases for High Energy Physics applications.
Speaker: K. Nienartowicz (CERN)
• 41
SAMGrid Integration of SRMs
SAMGrid is the shared data handling framework of the two large Fermilab Run II collider experiments: DZero and CDF. In production since 1999 at D0, and since mid-2004 at CDF, the SAMGrid framework has been adapted over time to accommodate a variety of storage solutions and configurations, as well as the differing data processing models of these two experiments. This has been very successful for both experiments. Backed by primary data repositories of approximately 1 PB in size for each experiment, the SAMGrid framework delivers over 100 TB/day to DZero and CDF analyses at Fermilab and around the world. Each of the storage systems used with SAMGrid, however, has distinct interfaces, protocols, and behaviors. This led to different levels of integration of the various storage devices into the framework, which complicated the exploitation of their functionality and limited in some cases SAMGrid expansion across the experiments' Grid. In an effort to simplify the SAMGrid storage interfaces, SAMGrid has adopted the Storage Resource Manager (SRM) concept as the universal interface to all storage devices. This has simplified the SAMGrid framework, expecially the implementation of storage device interactions. It prepares the SAMGrid framework for future storage solutions equipped with SRM interfaces, without the need for long and risky software integration projects. In principle, any storage device with an SRM interface can be used now with the SAMGrid framework. The integration of SRMs is an important further step towards evolving the SAMGrid framework into a co-operating collection of distinct, modular grid-oriented services. To date, SRMs for Enstore, dCache, local caches, and permanent disk locations are tested and in production use. This report outlines how the SRMs were integrated into the existing SAMGrid framework without disturbing on-going operations, and describes our operational experience with SAMGrid and SRMs in the field.
Speaker: R. Kennedy (FERMI NATIONAL ACCELERATOR LABORATORY)
• Distributed Computing Systems and Experiences Ballsaal

### Ballsaal

#### Interlaken, Switzerland

• 42
The evolution of the distributed Event Reconstruction Control System in BaBar
The Event Reconstruction Control System of the BaBar experiment was redesigned in 2002, to satisfy the following major requirements: flexibility and scalability. Because of its very nature, this system is continuously maintained to implement the changing policies, typical of a complex, distributed production enviromnent. In 2003, a major revolution in the BaBar computing model, the Computing Model 2, brought a particularly vast set of new requirements in various respects, many of which had to be discovered during the early production effort, and promptly dealt with. Particularly, the reconstruction pipeline was expanded with the addition of a third stage. The first fast calibration stage was kept running at SLAC, USA, while the two stages doing most of the computation were moved to the ~400 CPU reconstruction facility of INFN, Italy. In this paper, we summarize the extent and nature of the evolution of the Control System, and we demonstrate how the modular, well engineered architecture of the system allowed to efficiently adapt and expand it, while making great reuse of existing code, leaving virtually intact the core layer, and exploiting the "engineering for flexibility" philosophy.
Speaker: A. Ceseracciu (SLAC / INFN PADOVA)
• 43
Production Management Software for the CMS Data Challenge
One of the goals of CMS Data Challenge in March-April 2004 (DC04) was to run reconstruction for sustained period at 25 Hz input rate with distribution of the produced data to CMS T1 centers for further analysis. The reconstruction was run at the T0 using CMS production software, of which the main components are RefDB (CMS Monte Carlo 'Reference Database' with Web interface) and McRunjob (a framework for creation and submission of large numbers of Monte Carlo jobs ). This paper presents an overview of CMS production cycle , describing production tools, covering data processing, bookkeeping and publishing issues, in the context of their use during the T0 reconstruction part of DC04.
Speaker: J. Andreeva (UC Riverside)
• 44
ATLAS Production System in ATLAS Data Challenge 2
In order to validate the Offline Computing Model and the complete software suite, ATLAS is running a series of Data Challenges (DC). The main goals of DC1 (July 2002 to April 2003) were the preparation and the deployment of the software required for the production of large event samples, and the production of those samples as a worldwide distributed activity. DC2 (May 2004 until October 2004) is divided into three phases: (i) Monte Carlo data are produced using GEANT4 on three different Grids, LCG, Grid3 and NorduGrid; (ii) simulate the first pass reconstruction of data expected in 2007, also called Tier0 exercise, using the MC sample; and (iii) test the Distributed Analysis model. A new automated data production system has been developed for DC2. The major design objectives are minimal human involvement, maximal robustness, and interoperability with several grid flavors and legacy systems. A central component of the production system is the production database holding information about all jobs. Multiple instances of a 'supervisor' component pick up unprocessed jobs from this database, distribute them to 'executor' processes, and verify them after execution. The 'executor' components interface to a particular grid or legacy flavour. The job distribution model is a combination of push and pull. A data management system keeps track of all produced data and allows for file transfers. The basic elements of the production system are described. Experience with the use of the system in world-wide DC2 production of ten million events will be presented. We also present how the three Grid flavors are operated and monitored. Finally we discuss the first attempts on using the Distributed Analysis system.
Speaker: L. GOOSSENS (CERN)
• 45
A GRID approach for Gravitational Waves Signal Analysis with a Multi-Standard Farm Prototype
The standard procedures for the extraction of gravitational wave signals coming from coalescing binaries provided by the output signal of an interferometric antenna may require computing powers generally not available in a single computing centre or laboratory. A way to overcome this problem consists in using the computing power available in different places as a single geographically distributed computing system. This solution is now effective within the GRID environment, that allows distributing the required computing effort for specific data analysis procedure among different sites according to the available computing power. Within this environment we developed a system prototype with application software for the experimental tests of a geographically distributed computing system for the analysis of gravitational wave signal from coalescing binary systems. The facility has been developed as a general purpose system that uses only standard hardware and software components, so that it can be easily upgraded and configured. In fact, it can be partially or totally configured as a GRID farm, as MOSIX farm or as MPI farm. All these three configurations may coexist since the facility can be split into configuration subsets. A full description of this farm is reported, together with the results of the performance tests and planned developments.
Speaker: S. Pardi (DIPARTIMENTO DI MATEMATICA ED APPLICAZIONI "R.CACCIOPPOLI")
• 46
The architecture of the AliEn system
AliEn (ALICE Environment) is a Grid framework developed by the Alice Collaboration and used in production for almost 3 years. From the beginning, the system was constructed using Web Services and standard network protocols and Open Source components. The main thrust of the development was on the design and implementation of an open and modular architecture. A large part of the component came from state-of-the- art modules available in the Open Source domain. Thus, in a very short time, the ALICE experiment had a prototype Grid that, while constantly evolving, has allowed large distributed simulation and reconstruction vital to the design of the experiment hardware and software to be performed with very limited manpower. This proved to be the correct path to which many Grid project and initiatives are now converging. The architecture of AliEn inspired the ARDA report and subsequently AliEn provided the foundation of components for the first EGEE prototype. This talk presents the architecture of the original AliEn system, describes its evolution. A critical review of the major technology choices, their implementation and the development process is also presented.
Speaker: P. Buncic (CERN)
• 47
A distributed, Grid-based analysis system for the MAGIC telescope
The observation of high-energetic gamma-rays with ground based air cerenkov telescopes is one of the most exciting areas in modern astro particle physics. End of the year 2003 the MAGIC telescope started operation.The low energy threshold for gamma-rays together with different background sources leads to a considerable amount of data. The analysis will be done in different institutes spread over Europe. The production of Monte Carlo events including the simulation of Cerenkov light in the atmosphere is very computing intensive and another challenge for a collaboration like MAGIC. Therefore the MAGIC telescope collaborations will take the opportunity to use Grid technology to set up a distributed computational and data intensive analysis system with nowadays available technology. The basic architecture of such a distributed, Europe wide Grid system will be presented. First implementation results will be shown. This Grid might be the starting point for a wider distributed astro particle Grid in Europe.
Speaker: H. Kornmayer (FORSCHUNGSZENTRUM KARLSRUHE (FZK))
• 4:00 PM
Coffee break
• 48
HEP Applications Experience with the European DataGrid Middleware and Testbed
The European DataGrid (EDG) project ran from 2001 to 2004, with the aim of producing middleware which could form the basis of a production Grid, and of running a testbed to demonstrate the middleware. HEP experiments (initially the four LHC experiments and subsequently BaBar and D0) were involved from the start in specifying requirements, and subsequently in evaluating the performance of the middleware, both with generic tests and through increasingly complex data challenges. A lot of experience has therefore been gained which may be valuable to future Grid projects, in particular LCG and EGEE which are using a substantial amount of the middleware developed in EDG. We report our experiences with job submission, data management and mass storage, information and monitoring systems, Virtual Organisation management and Grid operations, and compare them with some typical Use Cases defined in the context of LCG. We also describe some of the main lessons learnt from the project, in particular in relation to configuration, fault-tolerance, interoperability and scalability, as well as the software development process itself, and point out some areas where further work is needed. We also make some comments on how these issues are being addressed in LCG and EGEE.
Speaker: S. Burke (Rutherford Appleton Laboratory)
• 49
Deploying and operating LHC Computing Grid 2 (LCG2) During Data Challenges
LCG2 is a large scale production grid formed by more than 40 worldwide distributed sites. The aggregated number of CPUs exceeds 3000 several MSS systems are integrated in the system. Almost all sites form an independent administrative domain. On most of the larger sites the local computing resources have been integrated into the grid. The system has been used for large scale production by LHC experiments for several month. During the operation the software went through several versions and had to be upgraded including non backward compatible upgrades. We report on the experience gained setting up the service, integrating sites and operating it under the load of the production.
Speaker: M. Schulz (CERN)
• 50
The Open Science Grid (OSG)
The U.S.LHC Tier-1 and Tier-2 laboratories and universities are developing production Grids to support LHC applications running across a worldwide Grid computing system. Together with partners in computer science, physics grid projects and running experiments, we will build a common national production grid infrastructure which is open in its architecture, implementation and use. The OSG model builds upon the successful approach of last year’s joint Grid2003 project. The Grid3 shared infrastructure has for over eight months given significant computational resources and throughput to more than six applications, including ATLAS and CMS data challenges, SDSS, LIGO and Biology analyses and computer science demonstrators. To move towards LHC-scale data management, access and analysis capabilities, we will need to increase the scale, services, and sustainability of the current infrastructure by an order of magnitude. This requires a significant upgrade in its functionalities and technologies. The OSG roadmap is a strategy and work plan to build the U.S.LHC computing enterprise as a fully usable, sustainable and robust grid, which is part of the LHC global computing infrastructure and open to partners. The approach is to federate with other application communities in the U.S. to build a shared infrastructure open to other sciences and capable of being modified and improved to respond to needs of other applications, including CDF, D0, BaBar and RHIC experiments. We describe the application driven engineered services of the OSG, short term plans and status, and the roadmap for a consortium, its partnerships and national focus.
Speaker: R. Pordes (FERMILAB)
• 51
Use of Condor and GLOW for CMS Simulation Production
The University of Wisconsin distributed computing research groups developed a software system called Condor for high throughput computing using commodity hardware. An adaptation of this software, Condor-G, is part of Globus grid computing toolkit. However, original Condor has additional features that allows building of an enterprise level grid. Several UW departments have Condor computing pools that are integrated in such a way as to flock jobs from one pool to another as resources become available. An interdisciplinary team of UW researchers recently built a new distributed computing facility, the Grid Laboratory of Wisconsin (GLOW). In total Condor pools in the UW have about 2000 Intel CPUs (P-III and Xeon) which are available for scientific computation. By exploiting special features of Condor such as checkpointing and remote IO we have generated over 10 million fully simulated CMS events. We were able to harness about 260 CPU-days per day for a period of 2 months when we were operational late fall. We have scaled to using 500 CPUs concurrently when opportunity to exploit unused resources in laboratories on our campus. We have built a scalable job submission and tracking system called Jug using Python and mySQL which enabled us to scale to run hundreds of jobs simultaneously. Jug also ensured that the data generated is transferred to US Tier-I center at Fermilab. We have also built a portal to our resources and participated in Grid2003 project. We are currently adapting our environment for providing analysis resources. In this paper we will discuss our experience and observations regarding the use of opportunistic resources, and generalize them to wider grid computing context.
Speaker: S. Dasu (UNIVERSITY OF WISCONSIN)
• 52
Application of the SAMGrid Test-Harness for Performance Evaluation and Tuning of a Distributed Cluster Implementation of Data Handling Services
The SAMGrid team has recently refactored its test harness suite for greater flexibility and easier configuration. This makes possible more interesting applications of the test harness, for component tests, integration tests, and stress tests. We report on the architecture of the test harness and its recent application to stress tests of a new analysis cluster at Fermilab, to explore the extremes of analysis use cases and the relevant parameters for tuning in the SAMGrid station services. This reimplementation of the test harness is a python framework which usesXML for configuration and small plug-in python modules for specific test purposes. One current testing application is running on a 128-CPU analysis cluster with access to 6 TB distributed cache and also to a 2 TB centralized cache, permitting studies of different cache strategies. We have studied the service parameters which affect the performance of retrieving data from tape storage as well. The use cases studied vary from those which will require rapid file delivery with short processing time per file, to the opposite extreme of long processing time per file. We also show how the same harness can be used to run regular unit tests on a production system to aid early fault detection and diagnosis.These results are interesting for their implications with regard to Grid operations, and illustrate the type of monitoring and test facilities required to accomplish such performance tuning.
Speaker: A. Lyon (FERMI NATIONAL ACCELERATOR LABORATORY)
• 53
Job Submission & Monitoring on the PHENIX Grid*
The PHENIX collaboration records large volumes of data for each experimental run (now about 1/4 PB/year). Efficient and timely analysis of this data can benefit from a framework for distributed analysis via a growing number of remote computing facilities in the collaboration. The grid architecture has been, or is being deployed at most of these facilities. The experience being obtained in the transition to the Grid infrastructure with minimum of manpower is presented with particular emphasis on job monitoring and job submission in multi cluster environment. The integration of the existing subsystems (from Globus project, from several HEP collaborations), large application libraries, and other software tools to render the resulting architecture stable, robust, and useful for the end user is also discussed.
Speaker: A. Shevel (STATE UNIVERSITY OF NEW YORK AT STONY BROOK)
• Event Processing Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

• 54
LCG Generator
In the framework of the LCG Simulation Project, we present the Generator Services Sub-project, launched in 2003 under the oversight of the LHC Monte Carlo steering group (MC4LHC). The goal of the Generator Services Subproject is to guarantee the physics generator support for the LHC experiments. Work is divided into four work packages: Generator library; Storage, event interfaces and particle services; Public event files and event database; Validation and tuning. The current status and the future plans in the four different work packages are presented. Some emphasis is put on the Monte Carlo Generator Library (GENSER) and on the Monte Carlo Generator Database (MCDB). GENSER is the central code repository for Monte Carlo generators and generator tools. it was the first CVS repository in the LCG Simulation project and it is currently distributed in AFS. GENSER comprises release and building tools for librarian and end users. GENSER is going to gradually replace the obsolete CERN library in Monte Carlo generators support. MCDB is a public database for the configuration, book-keeping and storage of the generator level event files. The generator events often need to be prepared and documented by Monte Carlo experts. MCDB aims at facilitating the communication between Monte-Carlo experts and end-users. Its use can be optionally extended to the official event production of the LHC experiments.
Speaker: Dr P. Bartalini (CERN)
• 55
FAMOS, a FAst MOnte-Carlo Simulation for CMS
An object-oriented FAst MOnte-Carlo Simulation (FAMOS) has recently been developed for CMS to allow rapid analyses of all final states envisioned at the LHC while keeping a high degree of accuracy for the detector material description and the related particle interactions. For example, the simulation of the material effects in the tracker layers includes charged particle energy loss by ionization and multiple scattering, electron Bremsstrahlung and photon conversion. The particle showers are developed in the calorimeters with an emulation of GFLASH, finely interfaced with the calorimeter geometry (e.g., crystal positions, cracks, rear leakage, etc). As the same software framework is used for FAMOS and ORCA (the full Object-oriented Reconstruction software for CMS Analysis), the various Physics Objects (electrons, photons, muons, taus, jets, missing ET, charged particle tracks, ...) can be accessed with a similar code with both fast and full simulation, thus allowing any analysis algorithm to be transported from FAMOS to ORCA (and later, to data analysis and DST reading) or vice-versa without any additional work. Altogether, a gain in CPU time of about a hundred can be achieved with respect to the full simulation, with little loss in precision.
Speaker: Dr F. Beaudette (CERN)
• 56
Applications of the FLUKA Monte Carlo code in High Energy and Accelerator Physics
The FLUKA Monte Carlo transport code is being used for different applications in High Energy, Cosmic Ray and Accelerator Physics. Here we review some of the ongoing projects which are based on this simulation tool. In particular, as far as accelerator physics is concerned, we wish to summarize the work in progress for the LHC and the CNGS project. From the point of view of experimental activity, a part the activity going in the framework of LHC detectors, we wish to discuss as a major example the application of FLUKA to the ICARUS Liquid Argon TPC. Upgrades in cosmic ray calculations, to demonstrate the capability of FLUKA to reproduce existing experimental data, are also presented
Speaker: G. Battistoni (INFN Milano, Italy)
• 57
Update On the Status of the FLUKA Monte Carlo Transport Code
The FLUKA Monte Carlo transport code is a well-known simulation tool in High Energy Physics. FLUKA is a dynamic tool in the sense that it is being continually updated and improved by the authors. Here we review the progresses achieved in the last year on the physics models. From the point of view of hadronic physics, most of the effort is still in the field of nucleus--nucleus interactions. The currently available version of FLUKA already includes the internal capability to simulate inelastic nuclear interactions beginning with lab kinetic energies of 100 MeV/A up the the highest accessible energies by means of the DPMJET-II.5 event generator to handle the interactions for >5 GeV/A and rQMD for energies below that. The new developments concern, at high energy, the embedding of the DPMJET-III generator, which represent a major change with respect to the DPMJET-II structure. This will also allow to achieve a better consistency between the nucleus-nucleus section with the original FLUKA model for hadron-nucleus collisions. Work is also in progress to implenent a third event generator model based on the Master Boltzmann Equation approach, in order to extend the energy capability from 100 MeV/A down to the threshold for these reactions. In addition to these extended physics capabilities, structural changes to the programs input and scoring capabilities are continually being upgraded. In particular we want to mention the upgrades in the geometry packages, now capable of reaching higher levels of abstraction. Work is also proceeding to provide direct import into ROOT of the FLUKA output files for analysis and to deploy a user-friendly GUI input interface.
Speaker: L. Pinsky (UNIVERSITY OF HOUSTON)
• 58
Geant4: status and recent developments
Geant4 is relied upon in production for increasing number of HEP experiments and for applications in several other fields. Its capabilities continue to be extended, as its performance and modelling are enhanced. This presentation will give an overview of recent developments in diverse areas of the toolkit. These will include, amongst others, the optimisation for complex setups using different production thresholds, improvements in the propagation in fields, and highlights from the physics processes and event biasing. In addition it will note the physics validation effort undertaken in collaboration with a number of experiments, groups and users.
Speaker: Dr J. Apostolakis (CERN)
• 59
Physics validation of the simulation packages in a LHC-wide effort
In the framework of the LCG Simulation Physics Validation Project, we present comparison studies between the GEANT4 and FLUKA shower packages and LHC sub-detector test-beam data. Emphasis is given to the response of LHC calorimeters to electrons, photons, muons and pions. Results of "simple-benchmark" studies, where the above simulation packages are compared to data from nuclear facilities, are also shown.
Speaker: A. Ribon (CERN)
• 4:00 PM
Coffee break
• 60
Overview and new developments in Geant4 electromagnetic physics
We will summarize the recent and current activities of the Geant4 working group responsible of the standard package of electromagnetic physics. The major recent activities include an design iteration in energy loss and multiple scattering domain providing "process versus models" approach, and development of the following physics models: multiple scattering, ultra relativistic muon physics, photo- bsorbtion-ionisation model, ion ionisation, optical processes. An automatic acceptance suite of validation of physics is under development. Also we will comment on evolution of the concept of physics list.
Speaker: Prof. V. Ivantchenko (CERN, ESA)
• 61
Precision electromagnetic physics in Geant4: the atomic relaxation models
Various experimental configurations - such as, for instance, some gaseous detectors, require a high precision simulation of electromagnetic physics processes, accounting not only for the primary interactions of particles with matter, but also capable of describing the secondary effects deriving from the de-excitation of atoms, where primary collisions may have created vacancies. The Geant4 Simulation Toolkit encompasses a set of models to handle the atomic relaxation induced by the photoelectric effect, Compton scattering and ionization, with the production of X-ray fluorescence and of Auger electrons. We describe the physics models implemented in Geant4 to handle the atomic relaxation, the object-oriented design of the software and the validation of the models with respect to test beam data. In particular, we present a novel development of an original model for particle induced X-ray emission, to be released for the first time in the summer of 2004. We illustrate applications of Geant4 atomic relaxation models for physics reach studies in a real-life experimental context
Speaker: M.G. Pia (INFN GENOVA)
• 62
Ion transport simulation using Geant4 hadronic physics
The transportation of ions in matter is subject of much interest in not only high-energy ion-ion collider experiments such as RHIC and LHC but also many other field of science, engineering and medical applications. Geant4 is a tool kit for simulation of passage of particles through matter and its OO designs makes it easy to extend its capability for ion transports. To simulate ions interaction, we had to develop two major functionalities to Geant4. One is cross section calculators and the other is final stage generators for ion-ion interactions. For cross sections calculator, several empirical cross section formulas for the total reaction cross section of ion-ion interactions were investigated. And for final stage generator, binary cascade and quark-gluon string model of Geant4 were improved so that ions reaction with matter can also be calculated. Having successfully developed both functionalities, Geant4 can be applied to ion transportation problems. In the presentation we will explain cross section and final stage generator in detail and show comparisons with experimental data.
Speaker: Dr T. Koi (SLAC)
• 63
Adding Kaons to the Bertini Cascade Model
A version of the Bertini cascade model for hadronic interactions is part of the Geant4 toolkit, and may be used to simulate pion-, proton-, and neutron-induced reactions in nuclei. It is typically valid for incident energies of 10 GeV and below, making it especially useful for the simulation of hadronic calorimeters. In order to generate the intra-nuclear cascade, the code depends on tabulations of exclusive channel cross section data, parameterized angular distributions and phase-space generation of multi-particle final states. To provide a more detailed treatment of hadronic calorimetry, and kaon interactions in general, this model is being extended to include incident kaons up to an energy of 15 GeV. Exclusive channel cross sections, up to and including six-body final states, will be included for K+, K-, K0, K0bar, lambda, sigma+, sigma0, sigma-, xi0 and xi-. K+nucleon and K-nucleon cross sections are taken from various cross section catalogs, while most of the cross sections for incident K0, K0bar and hyperons are estimated from isospin and strangeness considerations. Because there is little data for incident hyperon cross sections, use of the extended model will be restricted to incident K+, K-, K0S and K0L. Hyperon cross sections are included only to handle the secondary interactions of hyperons created in the intra-nuclear cascade.
• 64
CHIPS based hadronization of quark-gluon strings
Quark-gluon strings are usually fragmented on the light cone in hadrons (PITHIA, JETSET) or in small hadronic clusters which decay in hadrons (HERWIG). In both cases the transverse momentum distribution is parameterized as an unknown function. In CHIPS the colliding hadrons stretch Pomeron ladders to each other and, when the Pomeron ladders meet in the rapidity space, they create Quasmons (hadronic clusters bigger then Amati-Veneziano clusters of HERWIG). The Quasmon size and the corresponding transverse momentum distributions are tuned by the Drell-Yan mu+mu- pairs. The final Quasmon fragmentation in CHIPS is tuned by the e+e- and proton-antiproton annihilation, which is already published.
Speaker: M. Kosov (CERN)
• 65
Synergia: A Modern Tool for Accelerator Physics Simulation
Computer simulations play a crucial role in both the design and operation of particle accelerators. General tools for modeling single-particle accelerator dynamics have been in wide use for many years. Multi-particle dynamics are much more computationally demanding than single-particle dynamics, requiring supercomputers or parallel clusters of PCs. Because of this, simulations of multi- particle dynamics have been much more specialized. Although several multi-particle simulation tools are now available, they tend to cover a narrow range of topics. Most also present difficulties for the end user ranging from platform portability to arcane interfaces. In this presentation, we discuss Synergia, a multi-particle accelerator simulation tool developed at Fermilab, funded by the DOE SciDAC program. Synergia was designed to cover a variety of physics processes while presenting a flexible and humane interface to the end user. It is a hybrid application, primarily based on the existing packages mxyzptlk/beamline and Impact. Our presentation covers Synergia's physics capabilities and human interface. We focus on the computational problems we encountered and solved in the process of building an application out of codes written in Fortran 90, C++, and wrapped with a Python front-end. We also discuss some approaches we have used in the visualization of the high-dimensional data that comes out of a particle accelerator simulations, especially our work with OpenDX.
Speaker: Dr P. Spentzouris (FERMI NATIONAL ACCELERATOR LABORATORY)
• Online Computing Jungfrau

### Jungfrau

#### Interlaken, Switzerland

• 66
New experiences with the ALICE High Level Trigger Data Transport Framework
The Alice High Level Trigger (HLT) is foreseen to consist of a cluster of 400 to 500 dual SMP PCs at the start-up of the experiment. It's input data rate can be up to 25GB/s. This has to be reduced to at most 1.2 GB/s before the data is sent to DAQ through event selection, filtering, and data compression. For these processing purposes, the data is passed through the cluster in several stages and groups for successive merging until, at the last stage, fully processed complete events are available. For the transport of the data through the stages of the cluster, a software framework is being developed consisting of multiple components. These components can be connected via a common interface to form complex configurations that define the data flow in the cluster. For the framework, new benchmark results are available as well as experience from tests and data challenges run in Heidelberg. The framework is scheduled to be used during upcoming testbeam experiments.
Speaker: T.M. Steinbeck (KIRCHHOFF INSTITUTE OF PHYSICS, RUPRECHT-KARLS-UNIVERSITY HEIDELBERG, for the Alice Collaboration)
• 67
The Architecture of the ZEUS Second Level Global Tracking Trigger
The architecture and performance of the ZEUS Global Track Trigger (GTT) are described. Data from the ZEUS silicon Micro Vertex detector's HELIX readout chips, corresponding to 200k channels, are digitized by 3 crates of ADCs and PowerPC VME board computers push cluster data for second level trigger processing and strip data for event building via Fast and GigaEthernet network connections. Additional tracking information from the central tracking chamber and forward straw tube tracker are interfaced into the 12 dual CPU PC farm of the global track trigger where track and vertex finding is performed by separately threaded algorithms. The system is data driven at the ZEUS first level trigger rates <500Hz, generating trigger results after a mean time of 10ms. The GTT integration into the ZEUS second level trigger and recent performance are reviewed.
Speaker: M. Sutton (UNIVERSITY COLLEGE LONDON)
• 68
A Level-2 trigger algorithm for the identification of muons in the Atlas Muon Spectrometer
The Atlas Level-2 trigger provides a software-based event selection after the initial Level-1 hardware trigger. For the muon events, the selection is decomposed in a number of broad steps: first, the Muon Spectrometer data are processed to give physics quantities associated to the muon track (standalone features extraction) then, other detector data are used to refine the extracted features. The "muFast" algorithm performs the standalone feature extraction, providing a first reduction of the muon event rate from Level-1. It confirms muon track candidates with a precise measurement of the muon momentum. The algorithm is designed to be both conceptually simple and fast so as to be readily implemented in the demanding online environment in which the Level-2 selection code will run. Never-the-less its physics performance approaches, in some cases, those of the offline reconstruction algorithms. This paper describes the implemented algorithm together with the software techniques employed to increase its timing performance.
Speaker: A. Di Mattia (INFN)
• 69
An Embedded Linux System Based on PowerPC
This article introduces a Embedded Linux System based on vme series PowerPC as well as the base method on how to establish the system. The goal of the system is to build a test system of VMEbus device. It also can be used to setup the data acquisition and control system. Two types of compiler are provided by the developer system according to the features of the system and the PowerPC. At the top of the article some typical embedded Operation system will be introduced and the features of different system will be provided. And then the method on how to build a embedded Linux system as well as the key technique will be discussed in detail. Finally a successful data acquisition example will be given based on the test system.
Speaker: M. Ye (INSTITUTE OF HIGH ENERGY PHYSICS, ACADEMIA SINICA)
• 70
The introduction to BES computing environment
BES is an experiment on Beijing Electron-Positron Collider (BEPC). BES computing environment consists of PC/Linux cluster and mainly relies on the free software. OpenPBS and Ganglia are used as job schedule and monitor system. With helps from CERN IT Division, CASTOR was implemented as storage management system. BEPC is being upgraded and luminosity will increase one hundred times comparing to current machine. The data produced by new BES-III detector will be about 700 Terabytes per year. To meet the computing demand, we proposed a solution based on PC/Linux/Cluster and SAN technology. CASTOR will be used to manage the storage resources of SAN. We started to develop a graphical interface for CASTOR. Some tests on data transmission performance of SAN environment were carried out. The result shows that I/O performance of SAN is better than that of traditional storage connection method including IDE, SCSI etc and it can satisfy BESIII experiment’s demand for data processing.
Speaker: G. CHEN (COMPUTING CENTER,INSTITUTE OF HIGH ENERGY PHYSICS,CHINESE ACADEMY OF SCIENCES)
• 71
The DAQ system for the Fluorescence Detectors of the Pierre Auger Observatory
S.Argiro(1), A. Kopmann (2), O.Martineau (2), H.-J. Mathes (2) for the Pierre Auger Collaboration (1) INFN, Sezione Torino (2) Forschungszentrum Karlsruhe The Pierre Auger Observatory currently under construction in Argentina will investigate extensive air showers at energies above 10^18 eV. It consists of a ground array of 1600 Cherenkov water detectors and 24 fluorescence telescopes to discover the nature and origin of cosmic rays at these ultra-high energies. The ground array is overlooked by 4 different fluorescence buildings which are equipped with 6 telescopes each. An independent local data acquisition (DAQ) is running in each building to readout 480 channels per telescope. In addition, a central DAQ merges data coming from the water detectors and all fluorescence buildings. The system architecture follows the object oriented paradigm and has been implemented using several of the most widespread open source tools for interprocess communication, data storage and user interfaces. Each local DAQ is connected with further sub-systems for calibration, for monitoring of atmospheric parameters and slow control. The latter is responsible for general safety functions and the experiment control. After a prototype phase to validate the system concept the Observatory is taking data in the final setup since September 2003. The data taking will continue during the construction phase and the integration of all sub-systems. We present the design and the present status of the system currently running in two different buildings with a total of 8 telescopes installed.
Speaker: H-J. Mathes (FORSCHUNGSZENTRUM KARLSRUHE, INSTITUT FüR KERNPHYSIK)
• 4:00 PM
break
• 72
Migrating PHENIX databases from object to relational model
To benefit from substantial advancements in Open Source database technology and ease deployment and development concerns with Objectivity/DB, the Phenix experiment at RHIC is migrating its principal databases from Objectivity to a relational database management system (RDBMS). The challenge of designing a relational DB schema to store a wide variety of calibration classes was solved by using ROOT I/O and storing each calibration object opaquely as a BLOB ( Binary Large OBject ). Calibration metadata is stored as built-in types to allow fast index-based database search. To avoid a database back-end dependency the application was made ODBC-compliant (Open DataBase Connectivity is a standard database interface). An existent well-designed calibration DB API allowed users to be shielded from the underlying database technology change. Design choices and experience with transferring a large amount of Objectivity data into relational DB will be presented.
Speaker: I. Sourikova (BROOKHAVEN NATIONAL LABORATORY)
• 73
The PHENIX Event Builder
The PHENIX detector consists of 14 detector subsystems. It is designed such that individual subsystems can be read out independently in parallel as well as a single unit. The DAQ used to read the detector is a highly-pipelined parallel system. Because PHENIX is interested in rare physics events, the DAQ is required to have a fast trigger, deep buffering, and very high bandwidth. The PHENIX Event Builder is a critical part of the back-end of the PHENIX DAQ. It is reponsible for assembling event fragments from each subsystem into complete events ready for archiving. It allows subsystems to be read out either in parallel or simultaneously and supports a high rate of archiving. In addition, it implements an environment where Level-2 trigger algorithms may be optionally executed, providing the ability to tag and/or filter rare physics events. The Event Builder is a set of three Windows NT/2000 multithreaded executables that run on a farm of over 100 dual-cpu 1U servers. All control and data messaging is transported over a Foundry Layer2/3 Gigabit switch. Capable of recording a wide range of event sizes from central Au-Au to p-p interactions, data archiving rates of over 400 MB/s at 2 KHz event rates have been achieved in the recent Run 4 at RHIC. Further improvements in performance are expected from migrating to Linux for Run 5. The PHENIX Event Builder design and implementation, as well as performance and plans for future development, will be discussed.
Speaker: D. Winter (COLUMBIA UNIVERSITY)
• 74
The DZERO Run II Level 3 Trigger and Data Aquisition System
The DZERO Level 3 Trigger and Data Aquisition (L3DAQ) system has been running continuously since Spring 2002. DZERO is loacated at one of the two interaction points in the Fermilab Tevatron Collider. The L3DAQ moves front-end readout data from VME crates to a trigger processor farm. It is built upon a Cisco 6509 Ethernet switch, standard PCs, and commodity VME single board computers. We will report on operating experience, performance, and upgrades. In particular, issues related to hardware quality, networking and security, and an expansion of the trigger farm will be discussed.
Speaker: D Chapin (Brown University)
• 75
Testbed Management for the ATLAS TDAQ
The talk presents the experience gathered during the testbed administration (~100 PC and 15+ switches) for the ATLAS Experiment at CERN. It covers the techniques used to resolve the HW/SW conflicts, network related problems, automatic installation and configuration of the cluster nodes as well as system/service monitoring in the heterogeneous dynamically changing cluster environment. Techniques range from manual actions to the fully automated procedures based on tools like Kickstart, SystemImager, Nagios, MRTG and Spectrum. Booting diskless nodes using EtherBoot, PXEboot is also investigated as a possible technique of managing Atlas Production Farms. Kernel customization techniques (building, deploying, distribution policy) allow users to freely choose proffered kernel flavors without sysadmin intervention. At the same time administrator retains full control over entire testbed. The overall experience has shown that the proper use of the open-source tools addresses very well the needs of the ATLAS Trigger DAQ community. This approach may also be interesting for addressing certain aspects of GRID Farm Management.
Speaker: M. ZUREK (CERN, IFJ KRAKOW)
• 76
Integration of ATLAS Software in the Combined Beam Test
The ATLAS collaboration had a Combined Beam Test from May until October 2004. Collection and analysis of data required integration of several software systems that are developed as prototypes for the ATLAS experiment, due to start in 2007. Eleven different detector technologies were integrated with the Data Acquisition system and were taking data synchronously. The DAQ was integrated with the High Level Trigger software, which will perform online selection of ATLAS events. The data quality was monitored at various stages of the Trigger and DAQ chain. The data was stored in a format foreseen for ATLAS and was analyzed using a prototype of the experiments' offline software, using the Athena framework. Parameters recorded by the Detector Control System were recorded in a prototype of the ATLAS Conditions Data Base and were made available for the offline analysis of the collected event data. The combined beam test provided a unique opportunity to integrate and to test the prototype of ATLAS online and offline software in its complete functionality.
Speaker: M. Dobson (CERN)
• 77
Performance of the ATLAS DAQ DataFlow system
The ATLAS Trigger and DAQ system is designed to use the Region of Interest (RoI)mechanism to reduce the initial Level 1 trigger rate of 100 kHz down to about 3.3 kHz Event Building rate. The DataFlow component of the ATLAS TDAQ system is responsible for the reading of the detector specific electronics via 1600 point to point readout links, the collection and provision of RoI to the Level 2 trigger, the building of events accepted by the Level 2 trigger and their subsequent input to the Event Filter system where they are subject to further selection criteria. To validate the design and implementation of the DAQ DataFlow system, a prototype setup representing 20% of the final system, has been put together at CERN. Thisbaseline prototype contains 68 PCs running Linux, and exchanging data via a 64-portand a 31-port Gigabit Ethernet switches for Event Building and RoI Collection. The system performance is measured by playing back simulated data through the system andrunning prototype algorithms in the Level 2 trigger. In parallel a full discrete event model of the system has been developed and tuned to the testbed results as an aid to studying the system performance at and beyond the size of the prototype setup. Measurements will be presented on the performance of the prototype setup, showing that the components of the current integrated system implementation can already sustain the their nominal ATLAS requirements using existing hardware and Gigabit network technology: 20 kHz RoI Collection rate per readout link, 3 kHz Event Building rate and 70 Mbyte/s throughput per event building node. The use of these results to calibrate the model will also be presented along with the model predications for the performance of the final DAQ DataFlow system.
Speaker: G. unel (UNIVERSITY OF CALIFORNIA AT IRVINE AND CERN)
• Tuesday, September 28
• Plenary: Session 3 Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

Convener: Mirco Mazzucato (INFN)
• 78
The LCG Project - Preparing for Startup
The talk will cover briefly the current status of the LHC Computing Grid project and will discuss the main challenges facing us as we prepare for the startup of LHC.
Speaker: Les Robertson (CERN)
• 79
Operating the LCG and EGEE Production Grids for HEP
In September 2003 the first LCG-1 service was put into production at most of the large Tier 1 sites and was quickly expanded up to 30 Tier 1 and Tier 2 sites by the end of the year. Several software upgrades were made and the LCG-2 service was put into production in time for the experiment data challenges that began in February 2004 and continued for several months. In particular LCG-2 introduced transparent access to mass storage and managed disk-only storage elements, and a first release of the Grid File Access library. Much valuable experience was gained during the data challenges in all aspects from the functionality and use of the middleware, to the deployment, maintenance, and operation of the services at many sites. Based on this experience a program of work to address the functional and operational issues is being implemented. The goal is to focus on essential areas such as data management and to build by the end of 2004 a basic grid system capable of handling the basic needs of LHC computing, providing direction for future middleware and service development. The LCG-2 infrastructure also forms the production service of EGEE. This involves supporting new application communities, bringing in new sites not associated with HEP and evolving a full scale 24x7 user and operational support structure. We will describe the EGEE infrastructure, how it supports and interacts with LCG, and how we expect the infrastructure to evolve over the next year of the EGEE project.
Speaker: I. Bird (CERN)
• 80
Grid3: An Application Grid Laboratory for Science
The U.S. Trillium Grid projects in collaboration with High Energy Experiment groups from the Large Hadron Collider (LHC), ATLAS and CMS, Fermi-Lab's BTeV, members of the LIGO , SDSS collaborations and groups from other scientific disciplines and computational centers have deployed a multi-VO, application-driven grid laboratory ("Grid3"). The grid laboratory has sustained for several months the production- level services required by the participating experiments. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, work loads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. The Grid3 infrastructure was deployed from grid level services provided by groups and applications within the collaboration. The services were organized into four distinct "grid level services" including: Grid3 Packaging, Monitoring and Information systems, User Authentication and the iGOC Grid Operations Center. In this paper we describe the Grid3 operational model, deployment strategies, and site installation and configuration procedures. We describe the grid middleware components used, how the components were packaged and deployed on sites each under its own loacl administrative domain, and how the pieces fit together to form the Grid3 grid infrastructure.
• Poster Session 1: Online Computing, Computer Fabrics, Wide Area Networking Coffee

### Coffee

#### Interlaken, Switzerland

Online Computing

• 81
A database prototype for managing computer systems configurations
We describe a database solution in a web application to centrally manage the configuration information of computer systems. It extends the modular cluster management tool Quattor with a user friendly web interface. System configurations managed by Quattor are described with the aid of PAN, a declarative language with a command line and a compiler interface. Using a relational schema, we are able to build a database for efficient data storage and configuration data processing. The relational schema ensures the consistency of the described model while the standard database interface ensures the fast retrieval of configuration information and statistic data. The web interface simplifies the typical administration and routine operations tasks, e.g. definition of new types, configuration comparisons and updates etc. We present a prototype built on the above ideas and used to manage a cluster of developer workstations and specialised services in CMS.
Speaker: Z. Toteva (Sofia University/CERN/CMS)
• 82
AutoBlocker: A system for detecting and blocking of network scanning based on analysis of netflow data.
In a large campus network, such as Fermilab's ten thousand nodes, scanning initiated from either outside of or within the campus network raises security concerns, may have very serious impact on network performance, and even disrupt normal operation of many services. In this paper we introduce a system for detecting and automatic blocking of excessive traffic of different nature, scanning, DoS attacks, virus infected computers. The system, called AutoBlocker, is a distributed computing system based on quasi-real time analysis of network flow data collected from the border router and core routers. AutoBlocker also has an interface to accept alerts from the IDS systems (e.g. BRO, SNORT) that are based on other technologies. The system has multiple configurable alert levels for the detection of anomalous behavior and configurable trigger criteria for automated blocking of the scans at the core or border routers. It has been in use at Fermilab for about 2 years, and become a very valuable tool to curtail scan activity within the Fermilab campus network.
Speaker: A. Bobyshev (FERMILAB)
• 83
Boosting the data logging rates in run 4 of the PHENIX experiment
With the improvements in CPU and disk speed over the past years, we were able to exceed the original design data logging rate of 40MB/s by a factor of 3 already for the Run 3 in 2002. For the Run 4 in 2003, we increased the raw disk logging capacity further to about 400MB/s. Another major improvement was the implementation of compressed data logging. The PHENIX raw data, after application of the standard data reduction techniques, were found to be further compressible by utilities like gzip by almost a factor of 2, and we defined a PHENIX standard of a compressed raw data format. The buffers that make up a raw data file consist of buffers that would get compressed and the resulting smaller data volume written out to disk. For a long time, this proved to be much too slow to be usable in the DAQ, until we could shift the compression to the event builder machines and so distributed the load over many fast CPU's. We also selected a different compression algorithm, LZO, which is about a factor of 4 faster than the "compress2" algorithm used internally in gzip. With the compression, the raw data volume shrinks to about 60% of the original size, boosting the original data rate before compression to more than 700MB/s. We will the present the techniques and architecture, and the impact this has had on the data taking in Run 4.
Speaker: Martin purschke
• 84
Cluster architectures used to provide CERN central CVS services
There are two cluster architecture approaches used at CERN to provide central CVS services. The first one (http://cern.ch/cvs) depends on AFS for central storage of repositories and offers automatic load-balancing and fail-over mechanisms. The second one (http://cern.ch/lcgcvs) is an N + 1 cluster based on local file systems, using data replication and not relying on AFS. It does not provide either dynamic load-balancing or automatic fail-over. Instead a series of tools were developed for repository relocation in case of fail-over and for manual load- balancing. Both architectures are used in production at CERN and project managers can chose one or the other, depending on their needs. If, eventually, one architecture proves to be significantly better, the other one may be phased out. This paper presents in detail both approaches and describes their relative advantages and drawbacks, as well as some data about them (number of repositories, average repository size, etc).
Speaker: M. Guijarro (CERN)
• 85
Control and state logging for the PHENIX DAQ System
The PHENIX DAQ system is managed by a control system responsible for the configuration and monitoring of the PHENIX detector hardware and readout software. At its core, the control system, called Runcontrol, is a process that manages the various components by way of a distributed architecture using CORBA. The control system, called Runcontrol, is a set of process that manages virtually all detector components through a distributed architecture base on CORBA. A key aspect of the distributed control system, the messaging system, is the ability to access critical detector state information, and deliver it to operators and applications of the control system. The goal of the system is to concentrate all output messages of the distributed processes, which would normally end up in log files or on a terminal, in a central place. The messages may originate from or be received by applications running on any of the multiple platforms which are in use including Linux, Windows, Solaris, and VxWorks. Listener applications allow the DAQ operators to get a comprehensive overview of all messages they are interested in, and also allows scripts or other programs to take automated action in response to certain messages. Messages are formatted to contain information about the source of the message, the message type, and its severity. Applications written to provide filtering of messages by the DAQ operators by type, severity and source will be presented. We will discuss the mechanism underlying this system, present examples of the use, and discuss performance and reliability issues.
Speaker: Martin purschke
• 86
Designing a Useful Email System
Email is an essential part of daily work. The FNAL gateways process in excess of 700,000 messages per week. Amomng those messages are many containing viruses and unwanted spam. This paper outlines the FNAL email system configuration. We will discuss how we have defined our systems to provide optimum uptime as well as protection against viruses, spam and unauthorized users.
Speaker: J. Schmidt (Fermilab)
• 87
Distributed Filesystem Evaluation and Deployment at the US-CMS Tier-1 Center
The scalable serving of shared filesystems across large clusters of computing resources continues to be a difficult problem in high energy physics computing. The US CMS group at Fermilab has performed a detailed evaluation of hardware and software solutions to allow filesysystem access to data from computing systems. The goal of the evaluation was to arrive at a solution that was able to meet the growing needs of the US-CMS Tier-1 facility. The system needed to be scalable and be able to grow with the increasing size of the facility, load balanced and with high performance for data access, reliable and redundant with protection against failures, and manageable and supportable given a reasonable level of effort. Over the course of a one year evaluation the group developed a suite of tools to analysis performance and reliability under load conditions, and then applied these tools to evaluations systems at Fermilab. In this presentation we will describe the suite of tools developed, the results of the evaluation process, the system and architecture that were eventually chosen, and the experience so far supporting a user community.
Speaker: L. Lisa Giacchetti (FERMILAB)
• 88
Experience with CORBA communication middleware in the ATLAS DAQ
As modern High Energy Physics (HEP) experiments require more distributed computing power to fulfill their demands, the need for an efficient distributed online services for control, configuration and monitoring in such experiments becomes increasingly important. This paper describes the experience of using standard Common Object Request Broker Architecture (CORBA) middleware for providing a high performance and scalable software, which will be used for the online control, configuration and monitoring in the ATLAS Data Acquisition (DAQ) system. It also presents the experience, which was gained from using several CORBA implementations and replacing one CORBA broker with another. Finally the paper introduces results of the large scale tests, which have been done on the cluster of more then 300 nodes, demonstrating the performance and scalability of the ATLAS DAQ online services. These results show that the CORBA standard is truly appropriate for the highly efficient online distributed computing in the area of modern HEP experiments.
Speaker: S. Kolos (CERN)
• 89
Experiences Building a Distributed Monitoring System
The NGOP Monitoring Project at FNAL has developed a package which has demonstrated the capability to efficiently monitor tens of thousands of entities on thousands of hosts, and has been in operation for over 4 years. The project has met the majority of its initial reqirements, and also the majority of the requirements discovered along the way. This paper will describe what worked, and what did not, in the first 4 years of the NGOP Project at Fermilab; and we hope will provide valuable lessons for others considering undertaking even larger (GRID-scale) monitoring projects.
Speaker: J. Fromm (Fermilab)
• 90
Future processors: What is on the horizon for HEP farms?
In 1995 I predicted that the dual-processor PC would start invading HEP computing and a couple of years later the x86-based PC was omnipresent in our computing facilities. Today, we cannot imagine HEP computing without thousands of PCs at the heart. This talk will look at some of the reasons why we may one day be forced to leave this sweet-spot. This would be not because we (the HEP community) want to, but rather because other market forces may pull in different directions. Amongst such forces, I will review the new generation of powerful game consoles where IBM's Power processor is currently making strong inroads. Then I will look at the huge mobile market where low-powered processing rules rather than power-hungry DP Xeon/Xeon-like processors, and thirdly I will explore in my talk the promise of enterprise servers with a large number of processors on each die (so-called Core Multi-Processors). For all the scenarios, we must, of course, keep in mind that HEP can only move when the price-performance ratio is right.
Speaker: S. Jarp (CERN)
• 91
InGRID - Installing GRID
The "gridification" of a computing farm is usually a complex and time consuming task. Operating system installation, grid specific software, configuration files customization can turn into a large problem for site managers. This poster introduces InGRID, a solution used to install and maintain grid software on small/medium size computing farms. Grid elements installation with InGRID consists in three steps. In the first step nodes are installed using RedHat Kickstart, an installation method that automate most of a Linux distribution installation, including disk partitioning, boot loader configuration, network configuration, base package selection. Grid specific software is than integrated using apt4rpm, a package management wrapper over the rpm commands. Apt automatically manages packages dependencies, and is able to download, install and upgrade RPMs from a central software repository. Finally, grid configuration files are customized through LCFGng, a system to setup and maintain Unix machines, that can configure many system files, execute scripts, create users, etc.
Speaker: F.M. Taurino (INFM - INFN)
• 92
Integrating Mutiple PC Farms into an uniform computing System with Maui
These are several on-going experiments at IHEP, such as BES, YBJ, and CMS collaboration with CERN. each experiment has its own computing system, these computing systems run separately. This leads to a very low CPU utilization due to different usage period of each experiment. The Grid technology is a very good candidate for integrating these separate computing systems into a "single image", but it is too early to be put into a production system as it is not stable and user-friendly as well. A realistic choice is to implement such an integration and sharing with Maui, an advacned scheduler. Each PC farm is thought as a partition, which is assigned high priority to its owner users with preemtor feature. this paper will describe the detail of implementation with Maui scheduler, as well as the entire system architecture and configuration and fuctions.
Speaker: G. Sun (INSTITUE OF HIGH ENERGY PHYSICS)
• 93
Linux for the CLEO-c Online system
The CLEO collaboration at the Cornell electron positron storage ring CESR has completed its transition to the CLEO-c experiment. This new program contains a wide array of Physics studies of $e^+e^-$ collisions at center of mass energies between 3 GeV and 5 GeV. New challenges await the CLEO-c Online computing system, as the trigger rates are expected to rise from < 100 Hz to around 300 Hz at the J/Psi production threshold, with a moderate increase in data throughput requirements. While the current Solaris and VxWorks based readout system will perform adequately under those conditions, there is a desire to improve the performance of the central components to extend monitoring capabilities and provide larger safety margins. The solution, as in most modern particle detector systems, is to deploy Linux on Intel architecture computers for the performance critical applications. For reasons of hardware and software availability, the existing CLEO Online and Offline computing environment has been ported to the Linux platform. This development allows the described challenge to be met. In this presentation, we will report on our experiences adapting the CLEO Online computing system for operation under Linux. Issues regarding third party software and code portability will be addressed. Performance measurements will be presented.
Speaker: H. Schwarthoff (CORNELL UNIVERSITY)
• 94
Managing software licences for a large research laboratory
Speaker: N. Hoeimyr (CERN IT)
• 95
Methodologies and techniques for analysis of network flow data
Network flow data gathered on border routers and core network switch/routers is used at Fermilab for statistical analysis of traffic patterns, passive network monitoring, and estimation of network performance characteristics. Flow data is also a critical tool in the investigation of computer security incidents. Development and enhancement of flow- based tools is on-going effort. The current state of flow analysis is based on the open source Flow-Tools package. This paper describes the most recent developments in flow analysis at Fermilab. Our goal is to provide a multidimensional view of network traffic patterns, with a detailed breakdown based on site, experiment, domain, subnet, hosts, protocol, or application. The latest analysis tool provides a descriptive and graphical representation of network traffic broken down by combinations of experiment and DNS domain. The tool can be utilized in real-time mode, as well as to provide a historical view. Another tool analyzes flow data to provide performance characteristics of completed multistream GridFTP data transfers. The current prototype provides a web interface for dynamic administration of the flow reports. We will describe and discuss the new features that we plan on developing in future enhancements to our flow analysis tool set.
Speaker: A. Bobyshev (FERMILAB)
• 96
Monitoring the CDF distributed computing farms
CDF is deploying a version of its analysis facility (CAF) at several globally distributed sites. On top of the hardware at each of these sites is either an FBSNG or Condor batch manager and a SAM data handling system which in some cases also makes use of dCache. The jobs which run at these sites also make use of a central database located at Fermilab. Each of these systems has its own monitoring. In order to maintain and effectively use the distributed system, it isimportant that both the administrators and the users can get a complete global view of the system. We will present a system which integrates the monitoring of all of these services into one globally accessible system based on the Monalisa product. This system is intended for administrators to monitor the system status and service level and for users to better locate resources and monitor job progress. In addition, it is meant to satisfy the request by the CDF International Finance Committee that global computing resource usage by CDF can be audited.
Speaker: I. Sfiligoi (INFN Frascati)
• 97
New compact hierarchical mass storage system at Belle realizing a peta-scale system with inexpensive ice-raid disks and an S-ait tape library
The Belle experiment has accumulated an integrated luminosity of more than 240fb-1 so far, and a daily logged luminosity now exceeds 800pb- 1. These numbers correspond to more than 1PB of raw and processed data stored on tape and an accumulation of the raw data at the rate of 1TB/day. To meet these storage demands, a new cost effective, compact hierarchical mass storage system has been constructed. The system consists of commodity RAID systems using IDE disks and Linux PC servers as the front-end and a tape library system using the new high density SONY S-AIT tape as the back-end.The SONY Peta Serv software manages migration and restoration of the files between tapes and disks. The capacity of the tape library is, at the moment, 500 TB in three 19 inch racks and the RAID system, 64 TB in two 19 inch racks. An extension of the system to 1.2 PB tape library in eight racks with 150 TB RAID in four racks is planned. In this talk, experiences with the new system will be discussed and the performance of the system when used for data processing and physics analysis of the Belle experiment will be demonstrated.
Speaker: N. Katayama (KEK)
• 98
Online Monitoring and online calibration/reconstruction for the PHENIX experiment
The PHENIX experiment consists of many different detectors and detector types, each one with its own needs concerning the monitoring of the data quality and the calibration. To ease the task for the shift crew to monitor the performance and status of each subsystem in PHENIX we developed a general client server based framework which delivers events at a rate in excess of 100Hz. This model was chosen to minimize the possibility of accidental interference with the monitoring tasks themselves. The user only interacts with the client which can be restarted any time without loss or alteration of information on the server side. It also enables multiple people to check simultaneously the same detector - if need be even from remote locations. The information is transferred in form of histograms which are processed by the client. These histograms are saved for each run and some html output is generated which is used later on to remove problematic runs from the offline analysis. An additional interface to a data base is provide to enable the display of long term trends. This framework was augmented to perform an immediate calibration pass and a quick reconstruction of rare signals in the counting house. This is achieved by filtering out interesting triggers and processing them on a local Linux cluster. That enabled PHENIX to e.g. keep track of the number of J/Psi's which could be expected while still taking data.
Speaker: Martin purschke
• 99
Parallel implementation of Parton String Model event generator
We report the results of parallelization and tests of the Parton String Model event generator at the parallel cluster of St.Petersburg State University Telecommunication center. Two schemes of parallelization were studied. In the first approach master process coordinates work of slave processes, gathers and analyzes data. Results of MC calculations are saved in local files. Local files are sent to the host computer on which the program of data processing is started. The second approach uses the parallel write in the common file shared between all processes. In this case the load of a communication subsystem of the cluster grows. Both approaches are realized with MPICH library. Some problems including the pseudorandom number generation inparallel computations were solved. The modified parallel version of the PSM code includes a number of the additional possibilities: a selection of the impact parameter windows,the account of acceptance of the experimental setup and trigger selection data, and the calculation of various long range correlations between such observables as mean transverse momentum and charged particles multiplicity.
Speaker: S. Nemnyugin (ASSOCIATE PROFESSOR)
• 100
Patching PCs
FNAL has over 5000 PCs running either Linux or Windows software. Protecting these systems efficiently against the latest vulnerabilities that arise has prompted FNAL to take a more central approach to patching systems. We outline the lab support structure for each OS and how we have provided a central solution that works within existing support boundaries. The paper will cover how we identify what patches are considered crucial for a system on the FNAL network and how we verify that systems are appropriately patched.
Speaker: J. Schmidt (Fermilab)
• 101
Portable Gathering System for Monitoring and Online Calibration at Atlas
During the runtime of any experiment, a central monitoring system that detects problems as soon as they appear has an essential role. In a large experiment, like Atlas, the online data acquisition system is distributed across the nodes of large farms, each of them running several processes that analyse a fraction of the events. In this architecture, it is necessary to have a central process that collects all the monitoring data from the different nodes, produces full statistics histograms and analyses them. In this paper we present the design of such a system, called the "gatherer". It allows to collect any monitoring object, such as histograms, from the farm nodes, from any process in the DAQ, trigger and reconstruction chain. It also adds up the statistics, if required, and processes user defined algorithms in order to analyse the monitoring data. The results are sent to a centralized display, that shows the information online, and to the archiving system, triggering alarms in case of problems. The innovation of our approach is that conceptually it abstracts the several communication protocols underneath, being able to talk with different processes using different protocols at the same time and, therefore, providing maximum flexibility. The software is easily adaptable to any trigger-DAQ system. The first prototype of the gathering system has been implemented for Atlas and will be running during this year's combined test beam. An evaluation of this first prototype will also be presented.
Speaker: P. Conde MUINO (CERN)
• 102
Raw Ethernet based hybrid control system for the automatic control of suspended masses in gravitational waves interferometric detectors
In this paper we examine the performance of the raw Ethernet protocol in deterministic, low-cost, real-time communication. Very few applications have been reported until now, and they focus on the use of the TCP and UDP protocols, which however add a sensible overhead to the communication and reduce the useful bandwidth. We show how low-level Ethernet access can be used for peer-to-peer, short distance communication, and how it allows the writing of applications requiring large bandwidth. We show some examples running on the Lynx real-time OS and on Linux, both in mixed and homogeneous environments. As an example of application of this technique, we describe the architecture of an hybrid Ethernet based real-time control system prototype we implemented in Napoli, discussing its characteristics and performances. Finally we discuss its application to the real-time control of a suspended mass of the mode cleaner of the 3m prototype optical interferometer for gravitational wave detection operational in Napoli.
Speaker: A. Eleuteri (DIPARTIMENTO DI SCIENZE FISICHE - UNIVERSITà DI NAPOLI FEDERICO II)
• 103
Remote Shifting at the CLEO Experiment
The CLEO III data acquisition was from the beginning in the late 90's designed to allow remote operations and monitoring of the experiment. Since changes in the coordination and operation of the CLEO experiment two years ago enabled us to separate tasks of the shift crew into an operational and a physics task, existing remote capabilities have been revisited. In 2002/03 CLEO started to deploy its remote monitoring tasks for performing remote shifts and evaluated various communication tools e.g. video conferencing and remote desktop sharing. Remote, collaborating institutions were allowed to perform the physicist shift part from their home institutions keeping only the professional operator of the CLEO experiment on site. After a one year long testing and evaluation phase the remote shifting for physicists is now in production mode. This talk reports on experiences made when evaluating and deploying various options and technologies used for remote control, operation and monitoring e.g. CORBA's IIOP, X11 and VNC in the CLEO experiment. Furthermore some aspects of the usage of video conferencing tools by distributed shift crews are being discussed.
• 104
Simplified deployment of an EDG/LCG cluster via LCFG-UML
The clusters using DataGrid middleware are usually installed and managed by means of an "LCFG" server. Originally developed by the Univ. of Edinburgh and extended by DataGrid, this is a complex piece of software. It allows for automated installation and configuration of a complete grid site. However, installation of the "LCFG"-Server takes most of the time, thus hinder widespread use. Our approach was to set up and preconfigure the LCFG-server inside a "User Mode Linux" (UML) instance in order to make deployment faster. The result is the "UML-LCFG-Sserver". It is provided as a prebuilt root-filesystem image which can be up and running within only with few configuration steps. Detailed instructions and experience are also provided on the basis of tests within the CrossGrid project. Altogether UML-LCFG makes it easier for a new site to join an EDG/LCG based Grid by bypassing most of the LCFG server installation.
Speaker: A. Garcia (KARLSRUHE RESEARCH CENTER (FZK))
• 105
Status of the alignment calibrations in the ATLAS-Muon experiment
ATLAS is a particle detector which will is being built at CERN in Geneva. The muon detection system is made up among other things, of 600 chambers measuring 2 to 6 m2 and 30 cm thick. The chambers' position must be known with an accuracy of +/- 30 m for translations and +/-100 rad for rotations for a range of +/- 5mm and +/-5mrad. In order to fulfill these requirements, we have designed different optical sensors. Due to (i) the very high accuracy required, (ii) the number of sensors (over 1000) and (iii) the different type of sensors, we developed one user interface which manages among other things several control command software. Each of this software is associated with an accurate calibration bench. In this conference, we will present only the most complex one which combines command control, an analysis module, real time processing and database access. These softwares are now currently used for sensors calibration.
Speaker: V. GAUTARD (CEA-SACLAY)
• 106
The ATLAS DAQ system
The 40 MHz collision rate at the LHC produces ~25 interactions per bunch crossing within the ATLAS detector, resulting in terabytes of data per second to be handled by the detector electronics and the trigger and DAQ system. A Level 1 trigger system based on custom designed and built electronics will reduce the event rate to 100 kHz. The DAQ system is responsible for the readout of the detector specific electronics via 1600 point to point links hosted by Readout Subsystems, the collection and provision of ''Region of Interest data'' to the Level 2 trigger, the building of events accepted by the Level 2 trigger and their subsequent input to the Event Filter system where they are subject to further selection criteria. Also the DAQ provides the functionality for the configuration, control, information exchange and monitoring of the whole ATLAS detector. The baseline ATLAS DAQ architecture and its implementation will be introduced. In this implementation, the configuration, control, information exchange and monitoring functionalities are provided with CORBA; the control aspects are handled by an expert system based on CLIPS and the data connection between 150 Readout Subsystems, up to 500 Level 2 Processing Units and to 80 Event building nodes is done Gigabit Ethernet network technology. The experience from using the DAQ system in a combined test beam environment where all ATLAS subdetectors are participating will be presented. The current performances of some DAQ components as measured in the laboratory environment will be summarized. Some results from the large scale functionality tests, on a system of a 300 nodes, aimed at understanding the scalability of the current implementation will also be shown.
Speaker: G. unel (UNIVERSITY OF CALIFORNIA AT IRVINE AND CERN)
• 107
The CMS User Analysis Farm at Fermilab
US-CMS is building up expertise at regional centers in preparation for analysis of LHC data. The User Analysis Farm (UAF) is part of the Tier 1 facility at Fermilab. The UAF is being developed to support the efforts of the Fermilab LHC Physics Center (LPC) and to enableefficient analysis of CMS data in the US. The support, infrastructure, and services to enable a local analysis community at a computing center which is remote from the physical detector and the majority of the collaboration present unique challenges. The current UAF is a farm running the LINUX operating system providing interactive and batch computing for users. Load balancing, resource and process management are realized with FBSNG, the batch system developed at Fermilab. Over the course of the next three years the UAF must grow in size and functionality, while continuing to support simulated analysis activities and test beam applications. In this presentation we will describe the development of the current cluster, the technology choices made, the services required to support regional analysis activities, and plans for the future.
Speaker: Ian FISK (FNAL)
• 108
The Condor based CDF CAF
The CDF Analysis Facility (CAF) has been in use since April 2002 and has successfully served 100s of users on 1000s of CPUs. The original CAF used FBSNG as a batch manager. In the current trend toward multisite deployment, FBSNG was found to be a limiting factor, so the CAF has been reimplemented to use Condor instead. Condor is a more widely used batch system and is well integrated with the emerging grid tools. One of the most useful being the ability to run seamlessly on top of other batch systems. The transition has brought us a lot of additional benefits, such as ease of installation, fault tolerance and increased manageability of the cluster. The CAF infrastructure has also been simplified a lot since Condor implements a number of features we had to implement ourselves with FBSNG. In addition, our users have found that Condor's fair share mechanism provides a more equitable and predictable distribution of resources. In this talk the Condor based CAF will be presented, with particular emphasis on the changes needed to run with Condor, the problems found during and the advantages gained by the transition. Some background and the plans for the future, as well as results from Condor scalability tests will also be presented.
• 109
The Configurations Database Challenge in the ATLAS DAQ System
The ATLAS data acquisition system uses the database to describe configurations for different types of data taking runs and different sub-detectors. Such configurations are composed of complex data objects with many inter-relations. During the DAQ system initialisation phase the configurations database is simultaneously accessed by a large number of processes. It is also required that such processes be notified about database changes that happen during or between data-taking runs. The paper describes the architecture of the configurations database. It presents the set of graphical tools which are available for the database schema design and the data editing. The automatic generation of data access libraries for C++ and Java languages is also described. They provide the programming interfaces to access the database either via a common file system or via remote database servers, and the notification mechanism on data changes. The paper presents results of recent performance and scalability tests, which allow a conclusion to be drawn about the applicability of the current configurations database implementation in the future DAQ system.
Speaker: I. Soloviev (CERN/PNPI)
• 110
The Design, Installation and Management of a Tera-Scale High Throughput Cluster for Particle Physics Research
We describe our experience in building a cost efficient High Throughput Cluster (HTC) using commodity hardware and free software within a university environment. Our HTC has a modular system architecture and is designed to be upgradable. The current, second phase configuration, consists of 344 processors and 20 Tbyte of RAID storage. In order to rapidly install and upgrade software, we have developed automatic remote system installation and configuration tools to deploy standard software configurations on individual machines. To efficiently manage machines we have written a custom cluster configuration database. This database is used to track all hardware components in the cluster, the network and power distribution and the software configuration. Access to this database and the cluster performance and monitoring systems is provided by a web portal, which allows efficient remote management in our low-manpower environment. We describe the performance of our system under a mixed load of scalar and parallel tasks and discuss future possible improvements.
Speaker: A. Martin (QUEEN MARY, UNIVERSITY OF LONDON)
• 111
The Project CampusGrid
A central idea of Grid Computing is the virtualization of heterogeneous resources. To meet this challenge the Institute for Scientific Computing, IWR, has started the project CampusGrid. Its medium term goal is to provide a seamless IT environment supporting the on-site research activities in physics, bioinformatics, nanotechnology and meteorology. The environment will include all kinds of HPC resources: vector computers, shared memory SMP servers and clusters of commodity components as well as a shared high-performance storage solution. After introducing the general ideas the talk will inform about the current project status and scheduled development tasks. This is associated with reports on other activities in the fields of Grid computing and high performance computing at IWR.
Speaker: O. Schneider (FZK)
• 112
Using HEP Systems to Provide Storage for Biologists
Protein analysis, imaging, and DNA sequencing are some of the branches of biology where growth has been enabled by the availability of computational resources. With this growth, biologists face an associated need for reliable, flexible storage systems. For decades the HEP community has been driving the development of such storage systems to meet their own needs. Two of these systems - the dCache disk caching system and the Enstore hierarchical storage manager - are viable candidates for addressing the storage needs of biologists. Both incorporate considerable experience from the HEP community. While biologists have much to gain from the HEP community's experience with storage systems, they face several issues that are unique to the biological sciences. There is a wider diversity in experiments, in number and size of datafiles, and in client operating systems in biology than there is in HEP. Patient information must be kept confidential. Disparate IT departments set up firewalls that separate client systems and the storage system. Vanderbilt University is developing a storage system with the goal of meeting biologists' needs. This system will use Enstore for its robustness and reliability, and will use the flexible door-based architecture of dCache to provide storage services to biologists via web-portal, the dCache copy command, and custom applications. This system will be deployed using an automated tape library, several secure central servers, and nodes placed near biologists' existing compute infrastructure to ensure locality of caches and secure data channels between researchers and the central servers.
Speaker: Alan Tackett
• 113
WAN Emulation Development and Testing at Fermilab
The Compact Muon Solenoid (CMS) experiment at CERN's Large Hadron Collider (LHC) is scheduled to come on-line in 2007. Fermilab will act as the CMS Tier-1 center for the US and make experiment data available to more than 400 researchers in the US participating in the CMS experiment. The US CMS Users Facility group, based at Fermilab, has initiated a project to develop a model for optimizing movement of CMS experiment data between CERN and the various tiers of US CMS data centers. Fermilab has initiated a project to design a WAN emulation facility which will enable controlled testing of unmodified or modified CMS applications and TCP implementations locally under conditions that emulate WAN connectivity. The WAN emulator facility is configurable for latency, jitter, and packet loss. The initial implementation is based on the NISTnet software product. In this paper we will describe the status of this project to date, the results of validation and comparison of performance measurements obtained in emulated and real environment for different applications including multistreams GridFTP. We also will introduce future short term and intermediate term plans, as well as outstanding problems and issues.
Speaker: A. Bobyshev (FERMILAB)
• Plenary: Session 4 Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

Convener: David Williams (CERN)
• 114
The BIRN Project: Distributed Information Infrastructure and Multi-scale Imaging of the Nervous System (BIRN = Biomedical Informatics Research Network)
The grand goal in neuroscience research is to understand how the interplay of structural, chemical and electrical signals in nervous tissue gives rise to behavior. Experimental advances of the past decades have given the individual neuroscientist an increasingly powerful arsenal for obtaining data, from the level of molecules to nervous systems. Scientists have begun the arduous and challenging process of adapting and assembling neuroscience data at all scales of resolution and across disciplines into computerized databases and other easily accessed sources. These databases will complement the vast structural and sequence databases created to catalogue, organize and analyze gene sequences and protein products. The general premise of the neuroscience goal is simple; namely that with "complete" knowledge of the genome and protein structures accruing rapidly we next need to assemble an infrastructure that will facilitate acquisition of an understanding for how functional complexes operate in their cell and tissue contexts. Our U.C. San Diego- based group is leading several interdisciplinary projects around this grand challenge. We are evolving a shared infrastructure that allows for mapping molecular and cellular brain anatomy in the context of a shared multi-scale mouse brain atlas system, the Cell-Centered Database (CCDB). Complementary to these neuroinformatics activities at the National Center for Microscopy and Imaging Research in San Diego (NCMIR) we have developed new molecular labeling methods compatible with advanced ultra-wide field laser-scanning light microscopy and multi- resolution 3 dimensional electron microscopy. These new labeling and imaging methods are being used to populate the CCDB, using as a driver mouse models of neurological and neuropsychiatric disorders. The informatics framework is facilitating cooperative work by distributed teams of scientists engaged in focused collaborations aimed to deliver new fundamental understanding of structures on the scale of 1 nm3 to 10's of µm3, a dimensional range that encompasses macromolecular complexes, organelles, and multi-component structures like synapses and the cellular interactions in the context of the complex organization of the entire nervous system. This is a unique and pioneering effort that links new neuroscience techniques and revolutionary advances in information technology. Database federation tools are critical to the scalability of these efforts and future development plans will be described in the context of the NIH-supported project to create a new framework for collaboration and data integration in the Biomedical Informatics Research Network (BIRN). BIRN is the leading example of a virtual database effort that is using the challenge of federating multi-scale distributed data about the nervous systems to help guide the evolution of an International Cyberinfrastructure serving all science disciplines, including biomedicine.
Speaker: M. Ellisman (National Center for Microscopy and Imaging Research of the Center for Research in Biological Systems - The Department of Neurosciences, University of California San Diego School of Medicine - La Jolla, California - USA)
• 115
Grid Security
The aim of Grid computing is to enable the easy and open sharing of resources between large and highly distributed communities of scientists and institutes across many independent administrative domains. Convincing site security officers and computer centre managers to allow this to happen in view of today's ever-increasing Internet security problems is a major challenge. Convincing users and application developers to take security seriously is equally difficult. This paper will describe the main Grid security issues, both in terms of technology and policy, that have been tackled over recent years in LCG and related Grid projects. Achievements to date will be described and opportunities for future improvements will be addressed.
Speaker: David Kelsey (RAL)
• 116
The impact of e-science
Just as the development of the World Wide Web has had its greatest impact outside particle physics, so it will be with the development of the Grid. E-science, of which the Grid is just a small part, is already making a big impact upon many scientific disciplines, and facilitating new scientific discoveries that would be difficult to achieve in any other way. Key to this is the definition and use of metadata.
Speaker: Ken Peach (RAL)
• 117
EU Grid Research - Projects and Vision
The European Grid Research vision as set out in the Information Society Technologies Work Programmes of the EU's Sixth Research Framework Programme is to advance, consolidate and mature Grid technologies for widespread e-science, industrial, business and societal use. A batch of Grid research projects with 52 Million EUR EU support was launched during the European Grid Technology Days 15 - 17 September 2004. The portfolio of projects has the potential for turning Europe's strong competence and critical mass in Grid Research into competitive advantages. In this presentation, the Grid research vision of the programme and the new project portfolio will be introduced. More information: www.cordis.lu/ist/grids.
Speaker: Max Lemke
• 12:45 PM
Excursion
• Wednesday, September 29
• Plenary: Session 5 Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

Convener: Yoshiyuki Watase (KEK)
• 118
The role of scientific middleware in the future of HEP computing
In the 18 months since the CHEP03 meeting in San Diego, the HEP community deployed the current generation of grid technologies in a veracity of settings. Legacy software as well as recently developed applications was interfaced with middleware tools to deliver end-to-end capabilities to HEP experiments in different stages of their life cycles. In a series of data challenges, reprocessing efforts and data distribution activities the community demonstrated the benefits distributed computing can offer and the power a range of middleware tools can deliver. After running millions of jobs, moving tera-bytes of data, creating millions of files and resolving hundreds of bug reports, the community also exposed the limitations of these middleware tools. As we move to the next level of challenges, requirements and expectations, we must also examine the methods and procedures we employ to develop, implement and maintain our common suite of middleware tools. The talk will focus on the role common middleware developed by the scientific community can and should play in the software stack of current and future HEP experiments.
Speaker: Miron Livny (Wisconsin)
• 119
The Evolution of Computing: Slowing down? Not Yet!
Dr Sutherland will review the evolution of computing over the past decade, focusing particularly on the development of the database and middleware from client server to Internet computing. But what are the next steps from the perspective of a software company? Dr Sutherland will discuss the development of Grid as well as the future applications revolving around collaborative working, which are appearing as the next wave of computing applications.
Speaker: Andrew Sutherland (ORACLE)
• 120
Grand Challenges facing Storage Systems
In this talk, we will discuss the future of storage systems. In particular, we will focus on several big challenges which we are facing in storage, such as being able to build, manage and backup really massive storage systems, being able to find information of interest, being able to do long-term archival of data, and so on. We also present ideas and research being done to address these challenges, and provide a perspective on how we expect these challenges to be resolved as we go forward.
Speaker: Jai Menon (IBM)
• Poster Session 2: Distributed Computing Services, Distributed Computing systems and Experiences Coffee

### Coffee

#### Interlaken, Switzerland

• 121
A general and flexible framework for virtual organization application tests in a grid system
A grid system is a set of heterogeneous computational and storage resources, distributed on a large geographic scale, which belong to different administrative domains and serve several different scientific communities named Virtual Organizations (VOs). A virtual organization is a group of people or institutions which collaborate to achieve common objectives. Therefore such system has to guarantee the coexistence of different VO’s applications providing them the suitable run-time environment. Hence tools are needed both at local and central level for testing and detecting eventually bad software configuration on a grid site. In this paper we present a web based tool which permits to a Grid Operational Centre (GOC) or a Site Manager to test a grid site from the VO viewpoint. The aim is to create a central repository for collecting both existing and emerging VO tests. EachVO test may include one ore more specific application tests, and each test could include one ore more subtests, arranged in a hierarchic structure. A general and flexible framework is presented capable to include VO tests straightforwardly by means of a description file. Submission of a bunch of tests to a particular grid site is made available through a web portal. On the same portal, past and current results and logs can be browsed.
Speaker: T. Coviello (INFN Via E. Orabona 4 I - 70126 Bari Italy)
• 122
A Multidimensional Approach to the Analysis of Grid Monitoring Data
Analyzing Grid monitoring data requires the capability of dealing with multidimensional concepts intrinsic to Grid systems. The meaningful dimensions identified in recent works are the physical dimension referring to geographical location of resources, the Virtual Organization (VO) dimension, the time dimension and the monitoring metrics dimension. In this paper, we discuss the application of On-Line Analytical Processing (OLAP), an approach to the fast analysis of shared multidimensional information, to the mentioned problem. OLAP relies on structures called OLAP cubes', that are created by a reorganization of data contained inside a relational database, thus transforming operational data into dimensional data. Our OLAP model is a four-dimension cuboid based on time, geographic, Virtual Organization (VO), and monitoring metric. Time and geographic dimensions have total order relation and form two concept hierarchies, respectively hours
Speaker: G. Rubini (INFN-CNAF)
• 123
Alibaba: A heterogeneous grid-based job submission system used by the BaBarexperiment
The BaBar experiment has accumulated many terabytes of data on particle physics reactions, accessed by a community of hundreds of users. Typical analysis tasks are C++ programs, individually written by the user, using shared templates and libraries. The resources have outgrown a single platform and a distributed computing model is needed. The grid provides the natural toolset. However, in contrast to the LHC experiments, BaBar has an existing user community with an existing non-Grid usage pattern, and providing users with an acceptable evolution presents a challenge. The 'Alibaba' system, developed as part of the UK GridPP project, provides the user with a familiar command line environment. It draws on the existing global file systems employed and understood by the current user base. The main difference is that they submit jobs with a 'gsub' command that looks and feels like the familiar'qsub'. However it enables them to submit jobs to computer systems at different institutions, with minimal requirements on the remote sites. Web based job monitoring is also provided. The problems and features (the input and output sandboxes, authentication, data location) and their solutions are described.
Speaker: M. Jones (Manchester University)
• 124
An intelligent resource selection system based on neural network for optimal application performance in a grid environment
Grid computing is a large scale geographically distributed and heterogeneous system that provides a common platform for running different grid enabled applications. As each application has different characteristics and requirements, it is a difficult task to develop a scheduling strategy able to achieve optimal performance because application-specific and dynamic system status have to be taken into account. Moreover it may be possible to obtain optimal performance for multiple application simultaneously using a single scheduler. Hence in a lot of cases the application scheduling strategy is assigned to an expert application user who provides a ranking criterion for selecting the best computational element on a set of available resources. Such criteria are based on user perception of system capabilities and knowledge about the features and requirements of his application. In this paper an intelligent mechanism has been both implemented and evaluated to select the best computational resource in a grid environment from the application viewpoint. A neural network based system has been used to capture automatically the knowledge of a grid application expert user. The system scalability problem is also tackled and a preliminary solution based on sorting algorithm is discussed. The aim is to allow a common grid application user to benefit of this expertise.
Speaker: T. Coviello (DEE – POLITECNICO DI BARI, V. ORABONA, 4, 70125 – BARI,ITALY)
• 125
ARDA Project Status Report
The ARDA project was started in April 2004 to support the four LHC experiments (ALICE, ATLAS, CMS and LHCb) in the implementation of individual production and analysis environments based on the EGEE middleware. The main goal of the project is to allow a fast feedback between the experiment and the middleware development teams via the construction and the usage of end-to-end prototypes allowing users to perform analyses out of the present data sets from recent montecarlo productions. In this talk the project is presented with highlights of the first results and lessons learnt so far. The relations of the project with similar initiatives within and outside the High Energy Physics community are reviewed (notably in the EGEE application identification and support).
Speaker: The ARDA Team
• 126
Beyond Persistence: Developments and Directions in ATLAS Data Management
As ATLAS begins validation of its computing model in 2004, requirements imposed upon ATLAS data management software move well beyond simple persistence, and beyond the "read a file, write a file" operational model that has sufficed for most simulation production. New functionality is required to support the ATLAS Tier 0 model, and to support deployment in a globally distributed environment in which the preponderance of computing resources--not only CPU cycles but data services as well--reside outside the host laboratory. This paper takes an architectural perspective in describing new developments in ATLAS data management software, including the ATLAS event-level metadata system and related infrastructure, and the mediation services that allow one to distinguish writing from registration and selection from retrieval, in a manner that is consistent both for event data and for time-varying conditions. The ever-broader role of databases and catalogs, and issues relatedto the distributed deployment thereof, are also addressed.
Speaker: D. Malon (ANL)
• 127
Building the LCG: from middleware integration to production quality software
In the last few years grid software (middleware) has become available from various sources. However, there are no standards yet which allow for an easy integration of different services. Moreover, middleware was produced by different projects with the main goal of developing new functionalities rather than production quality software. In the context of the LHC Computing Grid project (LCG) an integration, testing and certification activity is ongoing which aims at producing a stable coherent set of services. Here we report on the processes employed to produce the LCG middleware release and related activities, including the infrastructures used, the activities needed to integrate the various components and the certification process. Our certification process consists of a continuous iterative cycle that also involves feedback from the LCG production system and input from the software providers. The architecture of the LCG middleware is described, including additional components developed by LCG to improve scalability and performance. Other associated activities include packaging for deployment, porting to different platforms, debugging and patching of the software. Functionality and stress tests are performed via a large test-bed infrastructure that allows for benchmarking of different configurations. We describe also the results of our tests and our experience collected during the building of the LCG infrastructure.
Speaker: L. Poncet (LAL-IN2p3)
• 128
Central Reconstruction System on the RHIC Linux Farm in Brookhaven Laboratory
A description of a Condor-based, Grid-aware batch software system configured to function asynchronously with a mass storage system is presented. The software is currently used in a large Linux Farm (2700+ processors) at the RHIC and ATLAS Tier 1 Computing Facility at Brookhaven Lab. Design, scalability, reliability, features and support issues with a complex Condor-based batch system are addressed within the context of a Grid-like, distributed computing environment.
Speaker: T. Wlodek (Brookhaven National Lab)
• 129
CERN Modular Physics Screensaver or Using spare CPU cycles of CERN's Desktop PCs
CERN has about 5500 Desktop PCs. These computers offer a large pool of resources that can be used for physics calculations outside office hours. The paper describes a project to make use of the spare CPU cycles of these PCs for LHC tracking studies. The client server application is implemented as a lightweight, modular screensaver and a Web Application containing the physics job repository. The information exchange between client and server is done using the HTTP protocol. The design and implementation is presented together with results of performance and scalability studies. A typical LHC tracking study involves some 1500 jobs, each over 100,000 turns, requiring about 1 hour of CPU on a modern PC. A reliable and easy to use Linux interface to the CPSS Web application has been provided. It has been used for a production run of 15,000 jobs, using some 50 desktop Windows PCs, which uncovered a numerical incompatibility between Windows 2000 and XP. It is expected to make available up to two orders of magnitude more computing power for these studies at zero cost.
Speaker: A. Wagner (CERN)
• 130
Cross Experiment Workflow Management: The Runjob Project
Building on several years of sucess with the MCRunjob projects at DZero and CMS, the fermilab sponsored joint Runjob project aims to provide a Workflow description language common to three experiments: DZero, CMS and CDF. This project will encapsulate the remote processing experiences of the three experiments in an extensible software architecture using web services as a communication medium. The core of the Runjob project will be the Shahkar software packages that provide services for describing jobs and targeting them at different execution environments. A common interface to multiple storage and compute grid elements will be provided, alllowing the three experiments to share hardware resources in a transparent manner. Several tools provided by Shahkar are discussed including FileMetaBrokers, hich provide a uniform way to handle files and metadata over a distributed cluster, the ShREEK runtime execution environment that allows executable jobs to provide a real time monitoring and control interface to any system, the scriptObject generic task encapsulation objects and XMLProcessor object persistency tool.
Speaker: P. Love (Lancaster University)
• 131
D0 data processing within EDG/LCG
The D0 experiment at the Tevatron is collecting some 100 Terabytes of data each year and has a very high need of computing resources for the various parts of the physics program. D0 meets these demands by establishing a world - increasingly based on GRID technologies. Distributed resources are used for D0 MC production and data reprocessing of 1 billion events, requiring 250 TB to be transported over WANs. While in 2003 most of this computing at remote sites was distributed manually, some data reprocessing was performed with the EDG. In 2004 GRID tools are increasingly and successfully employed. We will report on performing MC production and data reprocessing using EDG and LCG. We will explain how the D0 computing environment was linked to these GRID platforms, and will discuss some lessons learned (for both Grid computing and preparing applications for distributed operation) from the D0 reprocessing on EDG, subjecting a generic Grid infrastructure to real data for the first time. An outlook on plans for applying LCG within D0 is given.
Speaker: T. Harenberg (UNIVERSITY OF WUPPERTAL)
• 132
Data management services of NorduGrid
In common grid installations, services responsible for storing big data chunks, replication of those data and indexing their availability are usually completely decoupled. And a task of synchronizing data is passed to either user-level tools or separate services (like spiders) which are subject to failure and usually cannot perform properly if one of underlying services fails too. The NorduGrid Smart Storage Element (SSE) was designed to try to overcome those problems by combining the most desirable features into one service. It uses HTTPS/G for secure data transfer, Web Services for control (through same HTTPS/G channel) and can provide information to indexing services used in middlewares based on the Globus Toolkit (TM). At the moment, those are the Replica Catalog and the Replica Location Service. The modular internal design of the SSE and the power of C++ object programming allows to add support for other indexing services in an easy way. There are plans to complement it with a Smart Indexing Service capable of resolving inconsistencies hence creating a robust distributed data storage system.
Speaker: O. Smirnova (Lund University, Sweden)
• 133
Data Rereprocessing on Worldwide Distributed Sytems
Abstract: The D0 experiment faces many challenges enabling access to large datasets for physicists on 4 continents. The strategy of solving these problems on worlwide distributed computing clusters is followed. Already since the begin of TEvatron RunII (March 2001) all Monte-Carlo simulations are produced outside of Fermilab at remote systems. For analyses as system of regional analysis centers (RACs) was established which supply the associated institutes with the data. This structure which is similar the the Tier structure foreseen for LHC was used in autumn 2003 to rereprocess all D0-data with the uptodate and much improved recontruction software. As the first running experiment D0 has implemented and operated all important computing dask of a high energy physics experiment on worldwide distributed systems. The experiences gained in D0 can be applied to judge the LHC computing model.
Speaker: D. Wicke (Fermilab)
• 134
Database Usage and Performance for the Fermilab Run II Experiments
The Run II experiments at Fermilab, CDF and D0, have extensive database needs covering many areas of their online and offline operations. Delivery of the data to users and processing farms based around the world has represented major challenges to both experiments. The range of applications employing databases includes data management, calibration (conditions), trigger information, run configuration, run quality, luminosity, and others. Oracle is the primary database product being used for these applications at Fermilab and some of its advanced features have been employed, such as table partitioning and replication. There is also experience with open source database products such as MySQL for secondary databases. A general overview of the operation, access patterns, and transaction rates is examined and the potential for growth in the next year presented. The two experiments, while having similar requirements for availability and performance, employ different architectures for database access. Details of the experience for these approaches will be compared and contrasted, as well as the evolution of the delivery systems throughout the run. Tools employed for monitoring the operation and diagnosing problems will also be described.
Speaker: L. Lueking (FERMILAB)
• 135
Deployment of SAM for the CDF Experiment
CDF is an experiment at the Tevatron at Fermilab. One dominating factor of the experiments' computing model is the high volume of raw, reconstructed and generated data. The distributed data handling services within SAM move these data to physics analysis applications. The SAM system was already in use at the D-Zero experiment. Due to difference in the computing model of the two experiments some aspects of the SAM system had to be adapted. We will present experiences from the adaptation and the deployment phase. This includes the behavior of the SAM system on batch systems of very different sizes and type as well as the interaction between the datahandling and the storage systems, ranging from disk pools to tape systems. In particular we will cover the problems faced on large scale compute farms. To accommodate the needs of Grid computing, CDF deployed installations consisting of SAM for datahandling and CAF for high throughput batch processing. The CDF experiment already had experiences with the CAF system. We will report on the deployment of the combined system.
Speaker: S. Stonjek (Fermi National Accelerator Laboratory / University of Oxford)
• 136
DIRAC Lightweight information and monitoring services using XML-RPC and Instant Messaging
The DIRAC system developed for the CERN LHCb experiment is a grid infrastructure for managing generic simulation and analysis jobs. It enables jobs to be distributed across a variety of computing resources, such as PBS, LSF, BQS, Condor, Globus, LCG, and individual workstations. A key challenge of distributed service architectures is that there is no single point of control over all components. DIRAC addresses this via two complementary features: a distributed Information System, and an XMPP (Extensible Messaging and Presence Protocol) Instant Messaging framework. The Information System provides a concept of local and remote information sources. Any information which is not found locally will be fetched from remote sources. This allows a component to define its own state, while fetching the state of other components directly from those components, or via a central Information Service. We will present the architecture, features, and performance of this system. XMPP has provided DIRAC with numerous advantages. As an authenticated, robust,lightweight, and scalable asynchronous message passing system, XMPP is used, in addition to XML-RPC, for inter- Service communication, making DIRAC very fault-tolerant, a critical feature when using Service Oriented Architectures. XMPP is also used for monitoring real-time behaviour of the various DIRAC components. Finally, XMPP provides XML-RPC like facilities which are being developed to provide control channels direct to Services, Agents, and Jobs. We will describe our novel use of Instant Messaging in DIRAC and discuss directions for the future.
Speaker: I. Stokes-Rees (UNIVERSITY OF OXFORD PARTICLE PHYSICS)
• 137
DIRAC Workload Management System
The Workload Management System (WMS) is the core component of the DIRAC distributed MC production and analysis grid of the LHCb experiment. It uses a central Task database which is accessed via a set of central Services with Agents running on each of the LHCb sites. DIRAC uses a 'pull' paradigm where Agents request tasks whenever they detect their local resources are available. The collaborating central Services allow new components to be plugged in easily. These Services can perform functions such as scheduling optimization, task prioritization, job splitting and merging, to name a few. They provide also job status information for various monitoring clients. We will discuss the services deployment and operation with particular emphasis on the robustness and scalability issues. The distributed Agents have modular design which allows easy functionality extensions to adapt to the needs of a particular site. The Agent installation have only basic pre-requisites which makes it easy for new sites to be incorporated. An Agent can be deployed on a gatekkeeper of a large cluster or just on a single worker node of the LCG grid. PBS,LSF,BQS, Condor,LCG,Globus can be used as the DIRAC computing resources. The WMS components use XML-RPC and instant messaging Jabber protocols for communication which increases the overall reliability of the system. The jobs handled by the WMS are described using Classad library which facilitates the interoperability with other grids.
Speaker: V. garonne (CPPM-IN2P3 MARSEILLE)
• 138
Distributed computing and oncological radiotherapy: technology transfer from HEP and experience with prototype systems
We show how nowadays it is possible to achieve the goal of accuracy and fast computation response in radiotherapic dosimetry using Monte Carlo methods, together with a distributed computing model. Monte Carlo methods have never been used in clinical practice because, even if they are more accurate than available commercial software, the calculation time needed to accumulate sufficient statistics is too long for a realistic use in radiotherapic treatment. We present a complete, fully functional prototype dosimetric system for radiotherapy, integrating various components based on HEP software systems: a Geant4-based simulation, an AIDA-based dosimetric analysis, a web-based user interface, and distributed processing either on a local computing farm or on geographically spread nodes. The performance of the dosimetric system has been studied in three execution modes: sequential on a single dedicated machine, parallel on a dedicated computing farm, parallel on a grid test-bed. An intermediate software layer, the DIANE system, makes the three execution modes completely transparent to the user, allowing to use the same code in any of the three configurations. Thanks to the integration in a grid environment, any hospital, even small ones or in less wealthy countries, that could not afford the high costs of commercial treatment planning software, may get the chance of using advanced software tools for oncological therapy, by accessing distributed computing resources, shared with other hospitals and institutes belonging to the same virtual organization
Speaker: M.G. Pia (INFN GENOVA)
• 139
Distributed Testing Infrastructure and Processes for the EGEE Grid Middleware
Extensive and thorough testing of the EGEE middleware is essential to ensure that a production quality Grid can be deployed on a large scale as well as across the broad range of heterogeneous resources that make up the hundreds of Grid computing centres both in Europe and worldwide. Testing of the EGEE middleware encompasses the tasks of both verification and validation. In adition we test the integrated middleware for stability, platform independence, stress resilience, scalability and performance. The EGEE testing infrastructure is distributed across three major EGEE grid centres in three countries: CERN, NIKHEF and RAL. As much as is possible the testing procedures are automated and integrated with the EGEE build system. This allows for continuous testing together with the incremental daily code builds, fast and early feedback to developers of bug, and for the easy inclusion of regression tests. This paper will report on the initial results of the testing procedures, frameworks and automation techniques adopted by the EGEE project, the advantages and disadvantages of test automation and the issues involved in testing a complex distributed middleware system in a distributed environment.
Speaker: L. Guy (CERN)
• 140
Distributed Tracking, Storage, and Re-use of Job State Information on the Grid
The Logging and Bookkeeping service tracks job passing through the Grid. It collects important events generated by both the grid middleware components and applications, and processes them at a chosen L&B server to provide the job state. The events are transported through secure reliable channels. Job tracking is fully distributed and does not depend on a single information source, the robustness is achieved through speculative job state computation in case of reordered, delayed or lost events. The state computation is easily adaptable to modified job control flow. The events are also passed to the related Job Provenance service. Its purpose is a long-term storage of information on job execution, environment, and the executable and input sandbox files. The data can be used for debugging, post-mortem analysis, or re-running jobs. The data are kept by the job-provenance storage service in a compressed format, accessible on per-job basis. A complementary index service is able to find particular jobs according to configurable criteria, e.g. submission time or "tags" assigned by the user. A user client to support job re-execution is planned. Both the L&B and Job Provenance index server provide web-service interfaces for querying. Those interfaces comply with the On-demand producer specification of the R-GMA infrastructure. Hence R-GMA capabilities can be utilized to perform complex distributed queries across multiple servers. Also, aggregate information about job collections can be easily provided. The L&B service was deployed in the EU DataGrid and Cern LCG projects, the Job Provenance will be deployed in the EGEE project.
Speaker: L. Matyska (CESNET, CZECH REPUBLIC)
• 141
Experience integrating a General Information System API in LCG Job Management and Monitoring Services
In a Grid environment, the access to information on system resources is a necessity in order to perform common tasks such as matching job requirements with available resources, accessing files or presenting monitoring information. Thus both middleware service, like workload and data management, and applications, like monitoring tools, requiere an interface to the Grid information service which provides that data. Even though a unique schema for the published information is defined, actual implementations use different data models, and define different access protocols. Applications interacting with the information service must therefore deal with several APIs, and be aware of the underlying technology in order to use the appropiate syntax for their queries or to publish new information. We have produced a new hign level C++ API that accomodates several existing implementations of the information service such as Globus MDS(LDAP based), MDS3(XML based) an R-GMA(SQL based). It allows applications to access information in a transparent manner loading the needed implementation specific library on demand. Features allowing for the adding and removal of dynamic information have been included as well. A general query language to make the API compatible with future protocols has been used. In this paper we described the design of this API and the results obtained integrating this API in the Workload Management system and in the GridIce monitoring system of LCG.
Speaker: P. Mendez Lorenzo (CERN IT/GD)
• 142
Experience with Deployment and Operation of the ATLAS Production System and the Grid3+ Infrastructure at Brookhaven National Lab
This paper describes the deployment and configuration of the production system for ATLAS Data Challenge 2 starting in May 2004, at Brookhaven National Laboratory, which is the Tier1 center in the United States for the International ATLAS experiment. We will discuss the installation of Windmill (supervisor) and Capone (executor) software packages on the submission host and the relevant security issues. The Grid3+ infrastructure and information service are used for the deployment of grid enabled ATLAS transformations on the Grid3+ computing elements. The Tier 1 hardware configuration includes 95 dual processor Linux compute nodes, 24 TB of NFS disk and an HPSS mass storage system. VOMS server maintains both VO services for US ATLAS and BNL local site policies. This paper describes the work of optimizing the performance and efficiency of this configuration.
Speaker: X. Zhao (Brookhaven National Laboratory)
• 143
Experiment Software Installation experience in LCG-2
The management of Application and Experiment Software represents a very common issue in emerging grid-aware computing infrastructures. While the middleware is often installed by system administrators at a site via customized tools that serve also for the centralized management of the entire computing facility, the problem of installing, configuring and validating Gigabytes of Virtual Organization (VO) specific software or frequently changing user applications remains an open issue. Following the requirements imposed by the experiments, in the LHC Computing Grid (LCG) Experiment Software Managers (ESM) are designated people with privileges of installing, removing and validating software for a specific VO on a per site basis. They can manage univocally identifying tags in the LCG Information System to announce the availability of a specific software version. Users of a VO can then select, via the published tag, sites to run their jobs. The solution adopted by LCG has mainly served its purpose but it presents many problems. The requirement imposed by the present solution for the existence of a shared file-system in a computing farm poses performance, reliability and scalability issues for large installations. With this work we present a more flexible service based on P2P technology that has been designed to tackle the limitation of the current system. This service allows the ESM to propagate the installation occuring in a given WN to the rest of the farm elements. We illustrate the deployment, the design, preliminary results obtained and the feedback from the LHC experiments and sites that have adopted it.
Speaker: R. santinelli (CERN/IT/GD)
• 144
Federating Grids: LCG meets Canadian HEPGrid
A large number of Grids have been developed, motivated by geo-political or application requirements. Despite being mostly based on the same underlying middleware, the Globus Toolkit, they are generally not inter-operable for a variety of reasons. We present a method of federating those disparate grids which are based on the Globus Toolkit, together with a concrete example of interfacing the LHC grid(LCG) with HEPGrid. HEPGrid consists of shared resources, at several Canadian research institutes, which are exposed via Globus gatekeepers, and makes use of Condor-G for resource advertisement, matchmaking and job submission. An LCG Computing Element(CE) based at the TRIUMF Laboratory hosts a HEPGrid User Interface(UI) which is contained within a custom jobmanager. This jobmanager appears in the LCG information system as a normal CE publishing an aggregation of the HEPGrid resources. The interface interprets the incoming job in terms of HEPGrid UI usage, submits it onto HEPGrid, and implements the jobmanager 'poll' and 'remove' methods, thus enabling monitoring and control across the grids. In this way non-LCG resources are integrated into LCG, without the need for LCG middleware on those resources. The same method can be used to create interfaces between other grids, with the details of the child-Grid being fully abstracted into the interface layer. The LCG-HEPGrid interface is operational, and has been used to federate 1300 CPU's at 4 sites into LCG for the Atlas Data Challenge (DC2).
Speaker: R. Walker (Simon Fraser University)
• 145
Generic logging layer for the distributed computing
Most HENP experiment software includes a logging or tracing API allowing for displaying in a particular format important feedback coming from the core application. However, inserting log statements into the code is a low-tech method for tracing the program execution flow and often leads to a flood of messages in which the relevant ones are occluded. In a distributed computing environment, accessing the information via a log-file is no longer applicable and the approach fails to provide runtime tracing. Running a job involves a chain of events where many components are involved often written in diverse languages and not offering a consistent and easily adaptable interface for logging important events. We will present an approach based on a new generic layer built on top of a logger family derived from the Jakarta log4j project that includes log4cxx, log4c, log4perl packages. This provides consistency across packages and framework. Additionally, the power of using log4j, is the possibility to enable logging (or features) at runtime without modifying the application binary or the wrapper layers. We provide a C++ abstract class library that serves as a proxy between the application framework and the distributed environment. The approach is designed so that the debugging statements can remain in shipped code without incurring a heavy performance cost. Logging equips the developer with as detailed context as necessary for application failures, from testing, quality assurance to a production mode limited amount of information. We will explain and show its implementation in the STAR production environment.
Speaker: V. Fine (BROOKHAVEN NATIONAL LABORATORY)
• 146
GILDA: a Grid for dissemination activities
Computational and data grids are now entering a more mature phase where experimental test-beds are turned into production quality infrastructures operating around the clock. All this is becoming true both at national level, where an example is the Italian INFN production grid (http://grid-it.cnaf.infn.it), and at the continental level, where the most strinking example is the European Union EGEE Project Infrastructure (http://www.eu-egee.org). However, the impact of grid technologies on the next future way of doing e-science and research in Europe will be proportional to the capability of National and European Grid Infrastructures to attract and serve many diverse scientific and industrial communities through serious and detailed dissemination and tutoring programs. In this contribution we present GILDA, the Grid Infn Laboratory for Dissemination Activities (http://gilda.ct.infn.it). GILDA is a complete suite of grid elements (Certification Authority, Virtual Organization, Distributed Test-bed, Grid Demonstrator, etc.) completely devoted to dissemination activities. GILDA can also act as a fast-prototyping test-bed where to start the porting/interfacing of new applications with the grid middle-ware. The use and exploitation of GILDA in the context of the Network Activities of the EGEE Project will be discussed.
Speaker: R. Barbera (Univ. Catania and INFN Catania)
• 147
Global Grid User Support for LCG
For very large projects like the LHC Computing Grid Project (LCG) involving 8,000 scientists from all around the world, it is an indispensable requirement to have a well organized user support. The Institute for Scientific Computing at the Forschungszentrum Karlsruhe started implementing a Global Grid User Support (GGUS) after official assignment of the Grid Deployment Board in March 2003. For this purpose a web portal and a helpdesk application have been developed. As a single entry point for all Grid related issues and problems GGUS follows the objectives of providing news, documentation and status information about Grid resources. The user will find forms to submit and track service requests. GGUS collaborates with different support teams in the Grid environment like the Grid Operations Center and the Experiment Specific Support. They can access the helpdesk system via web interface. GGUS stores all the incoming trouble tickets and outgoing solutions in a central database and plans to build up a knowledge base where all the information can be offered in a structured manner. As a prototype GGUS started operation at the Forschungszentrum Karlsruhe in October 2003 and supported local user groups of the German Tier 1 Computing Center, called GridKa. 4 month later the GGUS system was opened for the LCG community. The GGUS system will be explained and demonstrated. The present status of GGUS within the LCG environment will be discussed.
Speaker: T. ANTONI (GGUS)
• 148
GoToGrid - A Web-Oriented Tool in Support to Sites for LCG Installations
The installation and configuration of LCG middleware, as it is currently being done, is complex and delicate. An “accurate” configuration of all the services of LCG middleware requires a deep knowledge of the inside dynamics and hundreds of parameters to be dealt with. On the other hand, the number of parameters and flags that are strictly needed in order to run a working ”default” configuration of the middleware is relatively small, due to the fact that the values to be set mainly deal with environment configuration and with a limited set of possible operation scenarios. This “default” configuration appears to be the most suitable for sites joining LCG for the first time. The GoToGrid system is aimed to support Site Administrators to easily perform such a configuration. G2G combines the gathering of configuration information, provided by sites, with the dynamic adaptive creation of customized documentation and installation tools. By using a web interface and being requested only for the relevant configuration information, site Administrators will be able to design the desired configuration of their own LCG site. Site configuration data is collected and stored in a well defined format liable to be used as the interface to different configuration management tools.
Speaker: A. Retico (CERN)
• 149
Grid Deployment Experiences: The path to a production quality LDAP based grid information system
This paper reports on the deployment experience of the defacto grid information system, Globus MDS, in a large scale production grid. The results of this experience led to the development of an information caching system based on a standard openLDAP database. The paper then describes how this caching system was developed further into a production quality information system including a generic framework for information providers. This includes the deployment and operation experience and the results from performance tests on the information system to assess the scalability limits of it.
Speaker: L. Field (CERN)
• 150
GROSS: an end user tool for carrying out batch analysis of CMS data on the LCG-2 Grid.
GROSS (GRidified Orca Submission System) has been developed to provide CMS end users with a single interface for running batch analysis tasks over the LCG-2 Grid. The main purpose of the tool is to carry out job splitting, preparation, submission, monitoring and archiving in a transparent way which is simple to use for the end user. Central to its design has been the requirement for allowing multi-user analyses, and to accomplish this all persistent information is stored on a backend MySQL database. This database is additionally shared with BOSS, to which GROSS interfaces in order to provide job submission and real time monitoring capability. In this paper we present an overview of GROSS's architecture and functionality and report on first user tests of the system using CMS Data Challenge 2004 data (DC04).
Speaker: H. Tallini (IMPERIAL COLLEGE LONDON)
• 151
Installing and Operating a Grid Infrastructure at DESY
DESY is one of the world-wide leading centers for research with particle accelerators and a center for research with synchrotron light. The hadron-electron collider HERA houses four experiments which are taking data and will be operated until 2006 at least. The computer center manages a data volumes of order 1 PB and is the home for around 1000 CPUs. In 2003 DESY started to set up a Grid infrastructure on site. Monte Carlo production is the primer HEP application candidate for the Grid at DESY. The experiments have started major tests. A first Grid Testbed was based on EDG 1.4. Some effort was taken to install the binary distribution of the middleware on SuSE based Linux systems at DESY. With the first fixed LCG-2 release in spring 2004, the Grid Testbed2 was installed, which serves as the basis for all further DESY activities. The contribution to CHEP2004 will start by briefly summarizing the status of the Grid activities at DESY in the context of EGEE and D-GRID, in which DESY takes a leading role. In the following, we will discuss the integration of Grid components in the infrastructure of the DESY computer center. This includes technical aspects of the operating system, such as SuSE versus RedHat Linux, the interaction with the mass storage system, and the management of Virtual Organizations. We will finish with discussing installation and operation experiences of Grid middleware at DESY, also having in mind HEP and future synchrotron light experiments in the X-FEL era.
Speaker: A. Gellrich (DESY)
• 152
JIM Deployment for the CDF Experiment
JIM (Job and Information Management) is a grid extension to the mature data handling system called SAM (Sequential Access via Metadata) used by the CDF, DZero and Minos Experiments based at Fermilab. JIM uses a thin client to allow job submissions from any computer with Internet access, provided the user has a valid certificate or kerberos ticket. On completion the job output can be downloaded using a web interface. The JIM execution site software can be installed on shared resources, such as ScotGRID, as it may be configured for any batch system and does not require exclusive control of the hardware. Resources that do not belong entirely to CDF and thus cannot run DCAF (Decentralised CDF Analysis Farm), may therefore be accessed using JIM. We will report on the initial deployment of JIM for CDF and the steps taken to integrate JIM with DCAF.
Speaker: M. Burgon-Lyon (UNIVERSITY OF GLASGOW)
• 153
Job Interactivity using a Steering Service in an Interactive Grid Analysis Environment
In the context of Interactive Grid-Enabled Analysis Environment (GAE), physicists desire bi-directional interaction with the job they submitted. In one direction, monitoring information about the job and hence a “progress bar” should be provided to them. On other direction, physicist should be able to control their jobs. Before submission, they may direct the job to some specified resource or computing element. Before execution, its parameter may be changed or it may be moved to another location. During execution, its intermediate results should be fetched or it may be moved to another location. Also, physicists should be able to kill, restart, hold and resume their jobs. Interactive job execution requires that at each step, the user must make choices between alternative application components, files, or locations. So a dead end may be reached where no solution can be found, which would require backtracking to undo some previous choice. Another desire is reliable and optimal execution of the job. Grid should take some decisions regarding the job execution to help in reliable and optimal execution of the job. Reliability can be achieved using the job recovery mechanism. When a job on grid fails, the recovery mechanism should resubmit the job on either the same resource or on different resource. Check-pointing the job will make resource utilization low when recovering the job from failure. In this paper the architecture and design of an autonomous grid service is described that fulfills the above stated requirements for interactivity in Grid-enabled data analysis.
Speaker: A. Anjum (NIIT)
• 154
Job Monitoring in Interactive Grid Analysis Environment
Grid is emerging as a great computational resource but its dynamic behaviour makes the Grid environment unpredictable. System failure or network failure can occur or the system performance can degrade. So once the job has been submitted monitoring becomes very essential for user to ensure that the job is completed in an efficient way. In current environments once user submits a job he loses direct control over the job, system behaves like a batch system, user submits the job and gets the result back. Only information a user can obtain about a job is whether it is scheduled, running, cancelled or finished. This information is enough from the Grid management point of view but not from the point of view of a user. User wants interactive environment in which he can check the progress of the job, obtain intermediate results, terminate the job based on the progress of job or intermediate results, steer the job other nodes to achieve better performance and check the resources consumed by the job. So a mechanism is needed that can provide user with secure access to information about different attributes of a job. In this paper we describe a monitoring service, a java based web service that will provide secure access to different attributes of a job once a job has been submitted to Interactive Grid Analysis Environment.
Speaker: A. Anjum (NIIT)
• 155
Job-monitoring over the Grid with GridIce infrastructure.
In a wide-area distributed and heterogeneous grid environment, monitoring represents an important and crucial task. It includes system status checking, performance tuning, bottlenecks detecting, troubleshooting, fault notifying. In particular a good monitoring infrastructure must provide the information to track down the current status of a job in order to locate any problems. Job monitoring requires interoperation between the monitoring system and other grid services. Currently development and deployment LCG testbeds integrate GridICE monitoring system which measures and publics the state of a grid resource at a particular point in time. In this paper we present the efforts to integrate in the current GridICE infrastructure, additional useful information about job status, e.g. the name of job, the virtual organization to which it belongs, eventually real and mapped user who has submitted the job, the effective CPU time consumed and its exit status.
Speakers: G. Donvito (UNIVERSITà DEGLI STUDI DI BARI), G. Tortone (INFN Napoli)
• 156
K5 @ INFN.IT: an infrastructure for the INFN cross REALM & AFS cell authentication.
The infn.it AFS cell has been providing a useful single file-space and authentication mechanism for the whole INFN, but the lack of a distributed management system, has lead several INFN sections and LABs to setup local AFS cells. The hierarchical transitive cross-realm authentication introduced in the Kerberos 5 protocol and the new versions of the OpenAFS and MIT implementation of Kerberos 5, make possible to setup an AFS cross cell authentication in a transparent way, using the Kerberos 5 cross-realm one. The goal of the K5 @ INFN.IT project is to provide a Kerberos 5 authentication infrastructure for the INFN and cross-realm authentication to be used for the cross cell AFS authentication. In this work we describe the scenario, the results of various tests performed, the solution chosen and the status of the K5 @ INFN.IT project.
Speaker: E.M.V. Fasanelli (I.N.F.N.)
• 157
LEXOR, the LCG-2 Executor for the ATLAS DC2 Production System
In this paper we present an overview of the implementation of the LCG interface for the ATLAS production system. In order to take profit of the features provided by DataGRID software, on which LCG is based, we implemented a Python module, seamless integrated into the Workload Management System, which can be used as an object-oriented API to the submission services. On top of it we implemented Lexor,an executor component conforming to the pull/push model designed by the DC2 production system team. It pulls job descriptions from the supervisor component and uses them to create job objects, which in turn are submitted to the Grid. All the typical Grid operations (match-making with respect to input data location, registration of output data in the replica catalog, workload balancing) are performed by the underlying middleware, while interactions with ATLAS metadata catalog and the production database are granted by the integration with the Data Management System (Don Quijote) client module and via XML messages to the production supervisor (Windmill).
Speaker: D. Rebatto (INFN - MILANO)
• 158
LHC data files meet mass storage and networks: going after the lost performance
Experiments frequently produce many small data files for reasons beyond their control, such as output splitting into physics data streams, parallel processing on large farms, database technology incapable of concurrent writes into a single file, and constraints from running farms reliably. Resulting data file size is often far from ideal for network transfer and mass storage performance. Provided that time to analysis does not significantly deteriorate, files arriving from a farm could easily be merged into larger logical chunks, for example by physics stream and file type within a configurable time and size window. Uncompressed zip archives seem an attractive candidate for such file merging and are currently tested by the CMS experiment. We describe the main components now in use: the merging tools, tools to read and write zip files directly from C++, plug-ins to the database system, mass-storage access optimisation, consistent handling of application and replica metadata, and integration with catalogues and other grid tools. We report on the file size ratio obtained in the CMS 2004 data challenge and observations and analysis on changes to data access as well as estimated impact on network usage.
Speaker: L. Tuura (NORTHEASTERN UNIVERSITY, BOSTON, MA, USA)
• 159
Mass Storage Management and the Grid
The University of Edinburgh has an significant interest in mass storage systems as it is one of the core groups tasked with the roll out of storage software for the UK's particle physics grid, GridPP. We present the results of a development project to provide software interfaces between the SDSC Storage Resource Broker, the EU DataGrid and the Storage Resource Manager. This project was undertaken in association with the eDikt group at the National eScience Centre, the Universities of Bristol and Glasgow, Rutherford Appleton Laboratory and the San Diego Supercomputing Center.
Speaker: S. Thorn
• 160
MONARC2: A Processes Oriented, Discrete Event Simulation Framework for Modelling and Design of Large Scale Distributed Systems.
The design and optimization of the Computing Models for the future LHC experiments, based on the Grid technologies, requires a realistic and effective modeling and simulation of the data access patterns, the data flow across the local and wide area networks, and the scheduling and workflow created by many concurrent, data intensive jobs on large scale distributed systems. This paper presents the latest generation of the MONARC (MOdels of Networked Analysis at Regional Centers) simulation framework, as a design and modelling tool for large scale distributed systems applied to HEP experiments. A process-oriented approach for discrete event simulation is used for describing concurrent running programs, as well as the stochastic arrival patterns that characterize how such systems are used. The simulation engine is based on Threaded Objects, (or Active Objects) which offer great flexibility in simulating the complex behavior of distributed data processing programs. The engine provides an appropriate scheduling mechanism for the Active Objects with efficent support for interrupts. The framework provides a complete set of basic components (processing nodes, data servers, network components) together with dynamically loadable decision units (scheduling or data replication modules) for easily building complex Computing Model simulations. Examples of simulating complex data processing systems, specific for the LHC experiments (production tasks associated with data replication and interactive analysis on distributed farms) are presented, and the way the framework is used to compare different decision making algorithms or to optimize the overall Grid architecture.
Speaker: I. Legrand (CALTECH)
• 161
Monitoring a Petabyte Scale Storage System
Fermilab operates a petabyte scale storage system, Enstore, which is the primary data store for experiments' large data sets. The Enstore system regularly transfers greater than 15 Terabytes of data each day. It is designed using a client-server architecture providing sufficient modularity to allow easy addition and replacement of hardware and software components. Monitoring of this system is essential to insure the integrity of the data that is stored in it and to maintain the high volume access that this system supports. The monitoring of this distributed system is accomplished using a variety of tools and techniques that present information for use by a variety of roles (operator, storage system administrator, storage software developer, user). All elements of the system are monitored: performance, hardware, firmware, software, network, data integrity. We will present details of the deployed monitoring tools with an emphasis on the different techniques that have proved useful to each role. Experience with the monitoring tools and techniques, what worked and what did not will be presented.
Speaker: E. Berman (FERMILAB)
• 162
Monitoring CMS Tracker construction and data quality using a grid/web service based on a visualization tool
The complexity of the CMS Tracker (more than 50 million channels to monitor) now in construction in ten laboratories worldwide with hundreds of interested people , will require new tools for monitoring both the hardware and the software. In our approach we use both visualization tools and Grid services to make this monitoring possible. The use of visualization enables us to represent in a single computer screen all those million channels at once. The Grid will make it possible to get enough data and computing power in order to check every channel and also to reach the experts everywhere in the world allowing the early discovery of problems . We report here on a first prototype developed using the Grid environment already available now in CMS i.e. LCG2. This prototype consists on a Java client which implements the GUI for Tracker Visualization and a few data servers connected to the tracker construction database , to Grid catalogs of event datasets or directly to test beam setups data acquisition . All the communication between client and servers is done using data encoded in xml and standard Internet protocols. We will report on the experience acquired developing this prototype and on possible future developments in the framework of an interactive Grid and a virtual counting room allowing complete detector control from everywhere in the world.
Speaker: G. Zito (INFN BARI)
• 163
Multi-Terabyte EIDE Disk Arrays running Linux RAID5
High-energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle this data and make analysis at universities possible. Grid Computing is one method; however, the data must be cached at the various Grid nodes. We examine some storage techniques that exploit recent developments in commodity hardware. Disk arrays using RAID level 5 (RAID5) include both parity and striping. The striping improves access speed. The parity protects data in the event of a single disk failure, but not in the case of multiple disk failures. We report on tests of dual-processor Linux Software RAID5 arrays and Hardware RAID5 arrays using the 12- disk 3ware controller, in conjunction with 300 GB disks, for use in offline high-energy physics data analysis. The price of IDE disks is now less than \$1/GB. These RAID5 disk arrays can be scaled to sizes affordable to small institutions and used when fast random access at low cost is important.
Speaker: D. Sanders (UNIVERSITY OF MISSISSIPPI)
• 164
New distributed offline processing scheme at Belle
The Belle experiment has accumulated an integrated luminosity of more than 240fb-1 so far, and a daily logged luminosity has exceeded 800pb-1. This requires more efficient and reliable way of event processing. To meet this requirement, new offline processing scheme has been constructed, based upon technique employed for the Belle online reconstruction farm. Event processing is performed at PC farms, which consists of 60 quad(0.7GHz) and 225 dual(1.3GHz or 3.2GHz) CPU PC nodes. Raw event data are read from a Solaris tape server connected to a DTF2 tape drive, and they are distributed over all PC nodes. Reconstructed events are recorded onto 8 file servers, which are newly installed last year. To maximize processing capabilities, various optimizations such as PC clustering, job control, output data management and so on have been done. As a result, processing power with this scheme has been more than doubled, which corresponds to that more than 3 fb-1 of beam data per day can be processed. In this talk, stable operation of our new system, together with a description of the Belle offline computing model, will be demonstrated by showing computing performance obtained from experience in processing beam data.
Speaker: I. Adachi (KEK)
• 165
On the Management of Certification Authority in Large Scale GRID Infrastructure
The scope of this work is the study of scalability limits of the Certification Authority (CA), running for large scale GRID environments. The operation of Certification Authority is analyzed from the view of the rate of incoming requests, complexity of authentication procedures, LCG security restrictions and other limiting factors. It is shown, that standard CA operational model has some native "bottlenecks", which can be resolved with proper management and technical tools. The central point is the discussion of "decentralized" scheme with single CA and multiple authentication agents, called Registration Authorities (RA). Single CA retains a role for technical center, responsible for support of GRID security infrastructure, while general role of RAs is verification of requests from end-users. Practical implementation of this scheme (including the development and installation of end-user software) have been done in CERN in 2002 (http://service-grid-ca.web.cern.ch/service-grid-ca/help/RA.html). Second implementation of the same ideas was the GRID project of the Russia Ministry of Atomic Energy, 2003 (http://grid.ihep.su/MAG/). These two implementations are compared in aspects of security and functionality.
Speaker: E. Berdnikov (INSTITUTE FOR HIGH ENERGY PHYSICS, PROTVINO, RUSSIA)
• 166
OptorSim: a Simulation Tool for Scheduling and Replica Optimisation in Data Grids
In large-scale Grids, the replication of files to different sites is an important data management mechanism which can reduce access latencies and give improved usage of resources such as network bandwidth, storage and computing power. In the search for an optimal data replication strategy, the Grid simulator OptorSim was developed as part of the European DataGrid project. Simulations of various HEP Grid scenarios have been undertaken using different job scheduling and file replication algorithms, with the experimental emphasis being on physics analysis use-cases. Previously, the CMS Data Challenge 2002 testbed and UK GridPP testbed were among those simulated; recently, our focus has been on the LCG testbed. A novel economy-based strategy has been investigated as well as more traditional methods, with the economic models showing distinct advantages in terms of improved resource usage. Here, an overview of OptorSim's design and implementation is presented with a selection of recent results, showing its usefulness as a Grid simulator both in its current features and in the ease of extensibility to new scheduling and replication algorithms.
Speaker: C. Nicholson (UNIVERSITY OF GLASGOW)
• 167
Participation of Russian sites in the Data Challenge of ALICE experiment in 2004
The report presents an analysis of the Alice Data Challenge 2004. This Data Challenge has been performed on two different distributed computing environments. The first one is the Alice Environment for distributed computing (AliEn) used standalone. Presently this environment allows ALICE physicists to obtain results on simulation, reconstruction and analysis of data in ESD format for AA and pp collisions at LHC energies. The second environment is the LCG-2 middleware accessed via AliEn with the help of an interface, developed at INFN. Three Russian sites have been configured as AliEn nodes for the Data Challenge. These sites (IHEP at Protvino, ITEP in Moscow and JINR at Dubna) could run a maximal of 86 jobs. The initial analysis shows that the architecture of one site was not adequate for distributed computing. Another farm had nodes with insufficient RAM for efficient job processing. All these problems have been cured subsequent DC phases. Actions have also been taken to reduce the downtime due to wrong site configuration. The local AliEn server installed at the JINR site has been used as a standard configuration for the other Russian sites. The total number of jobs processed in Russia constitute ~2% of total run in the ALICE DC 2004.
Speaker: G. Shabratova (Joint Institute for Nuclear Research (JINR))
• 168
Patriot: Physics Archives and Tools required to Investigate Our Theories
PATRIOT is a project that aims to provide better predictions of physics events for the high-Pt physics program of Run2 at the Tevatron collider. Central to Patriot is an enstore or mass storage repository for files describing the high-Pt physics predictions. These are typically stored as StdHep files which can be handled by CDF and D0 and run through detector and triggering simulations. The definition of these datasets in the CDF and D0 data handling system SAM is under way. Patriot relies heavily on a new generation of Monte Carlo tools (such as MadEvent, Alpgen, Grappa, CompHEP, etc.) to calculate the hard structure of high-Pt events and the more venerable event generators (Pythia and Herwig) to make particle level predictions. An early informational database, describing the types of data files stored in Patriot, already exists. A new database is under development. In parallel with PATRIOT, we wish to develop the QCD tools that describe the detailed properties of high-Pt events. Some of the essential features of particle-level events must be described by non- perturbative functions, whose form is often constrained by theory, but which must be ultimately tuned to data.
Speaker: S. Mrenna (FERMILAB)
• 169
Performance of an operating High Energy Physics Data grid, D0SAR-grid
The D0 experiment at Fermilab's Tevatron will record several petabytes of data over the next five years in pursuing the goals of understanding nature and searching for the origin of mass. Computing resources required to analyze these data far exceed the capabilities of any one institution. Moreover, the widely scattered geographical distribution of collaborators poses further serious difficulties for optimal use of human and computing resources. These difficulties will be exacerbated in future high energy physics experiments, like those at the LHC. The computing grid has long been recognized as a solution to these problems. This technology is being made a more immediate reality to end users by developing a fully realized grid in the D0 Southern Analysis Region (D0SAR). D0SAR consists of eleven universities in the Southern US, Brazil, Mexico and India. The centerpiece of D0SAR is a data and resource hub, a Regional Analysis Center (RAC). Each D0SAR member institution constructs an Institutional Analysis Center (IAC), which acts as a gateway to the grid for users within that institution. These IACs combine dedicated rack-mounted servers and personal desktop computers into a local physics analysis cluster. D0SAR has been working on establishing an operational regional grid, D0SAR-Grid, using all available resources within it and a home-grown local task manager, McFarm. In this talk, we will describe the architecture of the D0SAR-Grid implementation, the use and functionality of the grid, and the experiences of operating the grid for simulation, reprocessing and analysis of data from a currently running HEP experiment.
Speaker: B. Quinn (The University of Mississippi)
• 170
Predicting Resource Requirements of a Job Submission
Grid computing provides key infrastructure for distributed problem solving in dynamic virtual organizations. However, Grids are still the domain of a few highly trained programmers with expertise in networking, high-performance computing, and operating systems. One of the big issues in the full-scale usage of a grid is the matching of the resource requirements of a job submission to available resources. In order for resource brokers/job schedulers to ensure efficient use of grid resources, an initial estimate of the likely resource usage of a submission must be made. In the context of the Grid Enabled Analysis Environment (GAE), physicists want the ability to discover, acquire, and reliably manage computational resources dynamically, in the course of their everyday activities. They do not want to be bothered with the location of these resources, the mechanisms that are required to use them, keeping track of the status of computational tasks operating on these resources, or with reacting to failure. They do care about how long their tasks are likely to run and how much these tasks will cost. So the grid scheduler must have the capability to estimate before job submission, how much time and resources the job will consume on execution site. Our proposed module, Prediction engine will be part of scheduler and it will provide estimates of resource use along with the duration of use. This will enable scheduler to choose the optimum site for job execution. This paper presents the survey of existing grid schedulers and then based on this survey states the need for resource usage estimation. Also the architecture and design of “grid prediction engine” that predicts the resource requirements of a job submission is discussed.
Speaker: A. Anjum (NIIT)
• 171
Production data export and archiving system for new data format of the BaBar experiment.
For The BaBar Computing Group BaBar has recently moved away from using Objectivity/DB for it's event store towards a ROOT-based event store. Data in the new format is produced at about 20 institutions worldwide as well as at SLAC. Among new challenges are the organization of data export from remote institutions, archival at SLAC and making the data visible to users for analysis and import to their own institutions. The new system is designed to be scalable, easily configurable on the client and server side and adaptive to server load. It's intergrated to work with SLAC's mass storage system (HPSS) and with the xrootd service. Design, implementation and experience with new system, as well as future development is discussed in this article.
• 172
Production Experience of the Storage Resource Broker in the BaBar Experiment
We describe the production experience gained from implementing and using exclusively the San Diego Super Computer Center developed Storage Resource Broker (SRB) to distribute the BaBar experiment's production event data stored in ROOT files from the experiment center at SLAC, California, USA to a Tier A computing center at ccinp3, Lyon France. In addition we outline how the system can be readily expanded to include more sites.
Speaker: A. Hasan (SLAC)
• 173
Production of simulated events for the BaBar experiment by using LCG
The BaBar experiment has been taking data since 1999. In 2001 the computing group started to evaluate the possibility to evolve toward a distributed computing model in a Grid environment. In 2003, a new computing model, described in other talks, was implemented, and ROOT I/O is now being used as the Event Store. We implemented a system, based onthe LHC Computing Grid (LCG) tools, to submit full-scale MonteCarlo simulation jobs in this new BaBar computing model framework. More specifically, the resources of the LCG implementation in Italy, grid.it, are used as computing elements (CE) and Worker Nodes (WN). A Resource Broker (RB) specific for the Babar computing needs was installed. Other BaBar requirements, such as the installation and usage of an object-oriented (Objectivity) Database to read detector conditions and calibration constants, were accomodated by using non-gridified hardware in a subset of grid.it sites. The BaBar simulation software was packed and installation on Grid elements was centrally managed with LCG tools. Sites were geographically mapped to Objectivity databases, and conditions were read by the WN either locally or remotely. An LCG User Interface (UI) has been used to submit simulation tests by using standard JDL commands. The ROOT I/O output files were retrieved from the WN and stored in the closest Storage Element (SE). Standard BaBar simulation production tools were then installed on the UI and configured such that the resulting simulated events can be merged and shipped to SLAC, like in the standard BaBar simulation production setup. Final validation of the system is being completed. This gridified approach results in the production of simulated events on geographically distributed resources with a large throughput and minimal, centralized system maintenance.
Speaker: D. Andreotti (INFN Sezione di Ferrara)
• 174
SAMGrid Experiences with the Condor Technology in Run II Computing
SAMGrid is a globally distributed system for data handling and job management, developed at Fermilab for the D0 and CDF experiments in Run II. The Condor system is being developed at the University of Wisconsin for management of distributed resources, computational and otherwise. We briefly review the SAMGrid architecture and its interaction with Condor, which was presented earlier. We then present our experiences using the system in production, which have two distinct aspects. At the global level, we deployed Condor-G, the Grid-extended Condor, for the resource brokering and global scheduling of our jobs. At the heart of the system is Condor's Matchmaking Service. As a more recent work at the computing element level, we have been benefitting from the large computing cluster at the University of Wisconsin campus. The architecture of the computing facility and the philosophy of Condor's resource management have prompted us to improve the application infrastructure for D0 and CDF, in aspects such as parting with the shared file system or reliance on resources being dedicated. As a result, we have increased productivity and made our applications more portable and Grid-ready. We include some statistics gathered from our experience. Our fruitful collaboration with the Condor team has been made possible by the Particle Physics Data Grid.
Speaker: I. Terekhov (FERMI NATIONAL ACCELERATOR LABORATORY)
• 175
SAMGrid Monitoring Service and its Integration with MonALisa
The SAMGrid team is in the process of implementing a monitoring and information service, which fulfills several important roles in the operation of the SAMGrid system, and will replace the first generation of monitoring tools in the current deployments. The first generation tools are in general based on text logfiles and represent solutions which are not scalable or maintainable. The roles of the monitoring and information service are: 1) providing diagnostics for troubleshooting the operation of SAMGrid services; 2) providing support for monitoring at the level of user jobs; 3) providing runtime support for local configuration and other information currently which currently must be stored centrally (thus moving thesystem toward greater autonomy for the SAM station services, which include cache management and job management services); 4) providing intelligent collection of statistics in order to enable performance monitoring and tuning. The architecture of this service is quite flexible, permitting input from any instrumented SAM application or service. It will allow multiple backend storage for archiving of(possibly) filtered monitoring events, as well as real time information displays andactive notification service for alarm conditions. This service will be able to export, in a configurable manner, information to higher level Grid monitoring services, such as MonALisa. We describe our experience to date with using a prototype version together with MonAlisa.
Speaker: A. Lyon (FERMI NATIONAL ACCELERATOR LABORATORY)
• 176
SRM AND GFAL TESTING FOR LCG2
Storage Resource Manager (SRM) and Grid File Access Library (GFAL) are GRID middleware components used for transparent access to Storage Elements. SRM provides a common interface (WEB service) to backend systems giving dynamic space allocation and file management. GFAL provides a mechanism whereby an application software can access a file at a site without having to know which transport mechanism to use or at which site it is running. Two separate Test Suites have been developed for testing of SRM interface v 1.1 and testing against the GFAL file system. Test Suites are written in C and Perl languages. SRM test suite: a script in Perl generates files and their replicas. These files are copied to the local SE and registered (published). Replicas of files are made to the specified SRM site. All replicas are used by the C-program. The SRM functions, such as get, put, pin, unPin etc. are tested using a program written in C. As SRMs do not perform file movement operations, the C-program transfers files using "globus-url-copy". It then compares the data files before and after transfer. GFAL test suite: as GFAL allows users to access a file in a Storage Element directly (read and write) without copying it locally, a C-program tests the implementation of POSIX I/O functions such as open/seek/read/write. A Perl script executes almost all Unix based commands: dd, cat, cp, mkdir and so on. Also the Perl script launches a stress test, creating many small files (~5000), nested directories and huge files. The investigation of interactions between the Replica Manager, the SRM and the file access mechanism will help making the Data Management software better.
Speaker: E. Slabospitskaya (Institute for High Energy Physics,Protvino,Russia)
• 177
Testing the CDF Distributed Computing Framework
To distribute computing for CDF (Collider Detector at Fermilab) a system managing local compute and storage resources is needed. For this purpose CDF will use the DCAF (Decentralized CDF Analysis Farms) system which is already at Fermilab. DCAF has to work with the data handling system SAM (Sequential Access to data via Metadata). However, both DCAF and SAM are mature systems which have not yet been used in combination, and on top of this DCAF has only been installed at Fermilab and not on local sites. Therefore tests of the systems are necessary to test the interplay of the data handling with the farms, the behaviour of the off-site DCAFs and the user friendliness of the whole system. The tests are focussed on the main tasks of the DCAFs, like Monte Carlo generation and stores, as well as the readout of data files and connected data handling. To achieve user friendliness the SAM station environment has to be common to all stations and adaptations to the environment have to be made.
Speaker: V. Bartsch (OXFORD UNIVERSITY)
• 178
The ATLAS Computing Model
The ATLAS Computing Model is under continuous active development. Previous exercises focussed on the Tier-0/Tier-1 interactions, with an emphasis on the resource implications and only a high-level view of the data and workflow. The work presented here considerably revises the resource implications, and attempts to describe in some detail the data and control flow from the High Level Trigger farms all the way through to the physics user. The model draws from the experience of previous and running experiments, but will be tested in the ATLAS Data Challenge 2 (DC2, described in other abstracts) and in the ATLAS Combined Testbeam exercises. An important part of the work is to devise the measurements and tests to be run during DC2. DC2 will be nearing completion in September 2004, and the first assessments of the performance of the computing model in scaled slice tests will be presented.
Speaker: R. JONES (LANCAS)
• 179
The BABAR Analysis Task Manager
The new BaBar bookkeeping system comes with tools to directly support data analysis tasks. This Task Manager system acts as an interface between datasets defined in the bookkeeping system, which are used as input to analyzes, and the offline analysis framework. The Task Manager organizes the processing of the data by creating specific jobs to be either submitted to a batch system, or run in the background on a local desktop, or laptop. The current system has been designed to support pbs and lsf batch systems. Changes to defined datasets due production is directly supported by the Task Manager, where new collections that add to a dataset or replace other collections are automatically detected, allowing an analysis at any time to be up-to-date with the latest available data. The output of tasks, whether new data collections, ntuple/hbook files, or text files, can be put back into a collections bookkeeping system or stored in the private Task Manager database. Currently MySQL and Oracle relational databases are supported. The BABAR Task Manager has been in use for data production since January this year, and the schema of the working system will be presented.
Speaker: Douglas Smith (Stanford Linear Accelerator Center)
• 180
The D0 Virtual Center and planning for large scale computing
The D0 experiment relies on large scale computing systems to achieve her physics goals. As the experiment lifetime spans, multiple generations of computing hardware, it is fundemental to make projective models in to use available resources to meet the anticipated needs. In addition, computing resources can be supplied as in-kind contributions by collaborating institutions and countries, however, such resources typically require scheduling, thus adding another dimension for planning. In addition, to avoid over-subscription of the resources, the experiment has to be educated on the limitations and trade-offs for various computing activities to enable the management to prioritze. We present the metrics and mechanisms used for planning and discuss the uncertainties and unknowns, as well as some of the mechanisms for communicating the resource load to the stakeholders. In order to correctly account for in-kind contributions of remote computing, D0 uses the concept of a Virtual Center, in which all of the costs are estimated as if the computing were located at solely at FNAL. In contrast to other such models in common use, D0 accounts for contributions based on computer usage rather than strictly on money spend on hardware. This gives incentive to acheive the maximum efficiency of the systems as well as encouraging active participation in the computing model by collaborating instititions. This method of operation leverages a common tool and infrastructure base for all production-type activites.
Speaker: A. Boehnlein (FERMI NATIONAL ACCELERATOR LABORATORY)
• 181
The deployment mechanisms for the ATLAS software.
One of the most important problems in software management of a very large and complex project such as Atlas is how to deploy the software on the running sites. By running sites we include computer sites ranging from computing centers in the usual sense down to individual laptops but also the computer elements of a computing grid organization. The deployment activity consists in constructing a well defined representation of the states of the working software (known as releases), and transporting them to the target sites, in such a way that the installation process can be entirely automated and can take care of discovering the context and adapting itself to it. A set of tools based on both CMT - the basic configuration management tool of ATLAS - and Pacman has been developed. The resulting mechanism now supports the systematic production of distribution kits for various binary conditions of every release, the partial or complete automatic installation of kits on any site and the running of test suites to validate the installed kits. This mechanism is meant to be fully compliant with the Grid requirements and has been tested in the context of LCG. Several issues related with the constraints on the target system, or with the incremental updates of the installation still need to be studied and will be discussed.
Speaker: C. ARNAULT (CNRS)
• 182
The LCG-AliEn interface, a realization of a MetaGrid system
AliEn (ALICE Environment) is a GRID middleware developed and used in the context of ALICE, the CERN LHC heavy-ion experiment. In order to run Data Challenges exploiting both AliEn “native” resources and any infrastructure based on EDG-derived middleware (such as the LCG and the Italian GRID.IT), an interface system was designed and implemented; some details of a prototype were already presented at CHEP2003. In the spring of 2004 an ALICE Data Challenge began with the simulated data production on this multiple infrastructure, thus qualifying as the first large production carried out transparently making use of very different middleware system. This system is a practical realisation of the “federated” or “meta-” grid concept, and it has been successfully tested in a very large production. This talk reports about new developments of the interface system, the successful DC running experience, the advantages and limitations of this concept, the plans for the future and some lessons learned.
Speaker: S. Bagnasco (INFN Torino)
• 183
The role of legacy services within ATLAS DC2
This paper presents an overview of the legacy interface provided for the ATLAS DC2 production system. The term legacy refers to any non-grid system which may be deployed for use within DC2. The reasoning behind providing such a service for DC2 is twofold in nature. Firstly, the legacy interface provides a backup solution should unforeseen problems occur while developing the grid based interfaces. Secondly, this system allows DC2 to use resources which have yet to deploy grid software, thus increasing the available computing power for the Data Challenge. The aim of the legacy system is to provide a simple framework which is easily adaptable to any given computing system. Here the term computing system refers to the batch system provided at a given site and also to the structure of the computing and storage systems at that site. The legacy interface provides the same functionality as the grid based interfaces and is deployed transparently within the DC2 production system. Following the push-pull model implemented for DC2 the system pulls jobs from a production database and pushes them onto a gives computing/batch system. In a world which is becoming increasingly grid orientated this project allows us to evaluate the role of non-grid solutions in dedicated production environments. Experiences, both good and bad, gained during DC2 are presented and the future of such systems is discussed.
Speaker: J. kennedy (LMU Munich)
• 184
Tools for GRID deployment of CDF offline and SAM data handling systems for Summer 2004 computing.
The Fermilab CDF Run-II experiment is now providing official support for remote computing, expanding this to about 1/4 of the total CDF computing during the Summer of 2004. I will discuss in detail the extensions to CDF software distribution and configuration tools and procedures, in support of CDF GRID/DCAF computing for Summer 2004. We face the challenge of unreliable networks, time differences, and remote managers with little experience with this particular software. We have made the first deployment of the SAM data handling system outside its original home in the D0 experiment. We have deployed to about 20 remote CDF sites. We have created light weight testing and monitoring tools to assure that these sites are in fact functional when installed. We are distributing and configuring both client code within CDF code releases, and the SAM servers to which the clients connect. Procedures which once took days are now performed in minutes. These tools can be used to install SAM servers for D0 and other experiments. Networks permitting, we will give a live SAM installation demonstration. We have separated the data handling components from the main CDF offline code releases by means of shared libraries, permitting live upgrades to otherwise frozen code. We now use a special 'development lite' release to ensure that all sites have the latest tools available. We have put subtantial effort into revision control, so that essentially all active CDF sites are running exactly the same code.
Speaker: A. Kreymer (FERMILAB)
• 185
Toward a Grid Technology Independent Programming Interface for HEP Applications
In the High Energy Physics (HEP) community, Grid technologies have been accepted as solutions to the distributed computing problem. Several Grid projects have provided software in the last years. Among of all them, the LCG - especially aimed at HEP applications - provides a set of services and respective client interfaces, both in the form of command line tools as well as programming language APIs in C, C++, Java, etc. Unfortunately, the programming interface presented to the end user (the physicist) is often not uniform or provides different levels of abstractions. In addition, Grid technologies face a constant change and an improvement process and it is of major importance to shield changes of underlying technology to the end users. As services evolve and new ones are introduced, the way users interact with them also changes. These new interfaces are often designed to work at a different level and with a different focus than the original ones. This makes it hard for the end user to build Grid applications. We have analyzed the existing LCG programming environment and identified several ways to provide high-level technology independent interfaces. In this article, we describe the use cases we were presented by the LCG experiments and the specific problems we encountered in documenting existing APIs and providing usage examples. As a main contribution, we also propose a prototype high-level interface for the information, authentication and authorization systems that is now under test on the LCG EIS testbed by the LHC experiments.
• 186
Usage of ALICE Grid middleware for medical applications
Breast cancer screening programs require managing and accessing a huge amount of data, intrinsically distributed, as they are collected in different Hospitals. The development of an application based on Computer Assisted Detection algorithms for the analysis of digitised mammograms in a distributed environment is a typical GRID use case. In particular, AliEn (ALICE Environment) services, whose development was carried on by the ALICE Collaboration, were used to configure a dedicated Virtual Organisation; a PERL-based interface to AliEn commands allows the registration of new patients and mammograms in the AliEn Data Catalogue as well as queries to retrieve images associated to selected patients. The analysis of selected mammograms can be performed interactively, making use of PROOF services, or taking advantage of the AliEn capabilities to generate "sub-jobs"; each of them analyzes the fraction of the selected sample stored on a site, and the results are merged. All the required functionality is available: by the end of 2004 a working prototype is foreseen, with an AliEn Client installed in each of the Hospitals participating to the INFN-funded MAGIC-5 project. The same approach will be applied in the near future in two other application areas: - Lung cancer screening, equivalent to the mammographic screening from the middleware point of view, where Computer Assisted Detection algorithms are being developed; - Diagnosis of the Alzheimer disease, where the application is intrinsically distributed: it should, in fact, compare the PET- generated image to a set of reference images which are scattered on many sites and merge the results.
Speaker: P. Cerello (INFN Torino)
• 187
Usage statistics and usage patterns on the NorduGrid: Analyzing the logging information collected on one of the largest production Grids of the world.
The Nordic Grid facility (NorduGrid) came into production operation during the summer of 2002 when the Scandinavian Atlas HEP group started to use the Grid for the Atlas Data Challenges and was thus the first Grid ever contributing to an Atlas production. Since then, the Grid facility has been in continuous 24/7 operation offering an increasing number of resources to a growing set of active users coming from various scientific areas including chemistry, biology, informatics. As of today the Grid has grown into one of the largest production Grids of the world continuously running Grid jobs on the more than 30 Grid-connected sites which offer over 2000 CPUs. This article will start with a short overview of the design and implementation of the Advanced Resource Connector (ARC), the NorduGrid middleware, which delivers reliable Grid services to the NorduGrid production facility. This will be followed by a presentation of the logging facility of NorduGrid, describing the logging service and the collected information. The main part of the talk will focus on the analysis of the collected logging information: usage statistics, usage patterns (what is a typical grid job on the NorduGrid looks like?). Use cases from different application domains will also be discussed. References: -NorduGrid live: www.nordugrid.org -> Grid Monitor -Atlas Data-Challenge 1 on NorduGrid: http://arxiv.org/abs/physics/0306013
Speaker: O. SMIRNOVA (Lund University, Sweden)
• 188
Using Tripwire to check cluster system integrity
Expansion of large computing fabrics/clusters throughout the world would create a need for stricter security. Otherwise any system could suffer damages such as data loss, data falsification or misuse. Perimeter security and intrusion detection system (IDS) are the two main aspects that must be taken into account in order to achieve system security. The main target of an intrusion detection system is early detection in the previously mentioned cases, as a way to minimize any damage in data contained in the system. Tripwire is one of the most powerful IDSs and is widely used as a security tool by the community of network administrators. Tripwire is oriented to monitor the status of files and directories, being able to detect the lightest change suffered by them. At Ciemat, Tripwire has been used to monitor our local clusters, involved in GRID projects such as implementation of LCG prototypes, to guarantee the integrability of data generated, and stored there. It is used as well to monitor any modificacion of operating system files and any other scientific core software.
Speaker: E. Perez-Calle (CIEMAT)
• 189
XTNetFile, a fault tolerant extension of ROOT TNetFile
This paper describes XTNetFile, the client side of a project conceived to address the high demand data access needs of modern physics experiments such as BaBar using the ROOT framework. In this context, a highly scalable and fault tolerant client/server architecture for data access has been designed and deployed which allows thousands of batch jobs and interactive sessions to effectively access the data repositories basing on the XROOTD data server, a complex extension of the rootd daemon. The majority of the communication problems are handled by the design of the client/server mechanism and the communication protocol. This allows us to build distributed data access systems which are highly robust, load balanced and scalable to an extent which allows 'no jobs to fail'. Furthermore XTNetFile ensures backward compatibility with the 'old' rootd server by using same API as the existing ROOT TFile/TNetFile classes. The code is designed with a high degree of modularity that allows to build other interfaces, such as administrative tools, based on the same communication layer. In addition the client plugin can also be used to readother types of (non-ROOT I/O) data files, providing the same benefits.
Speaker: F. Furano (INFN Padova)
• Plenary: Session 6 Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

Convener: Neil Geddes (CCLRC-RAL)
• 190
Evolution and Revolution in the Design of Computers Based on Nanoelectronics
Today's computers are roughly a factor of one billion less efficient at doing their job than the laws of fundamental physics state that they could be. How much of this efficiency gain will we actually be able to harvest? What are the biggest obstacles to achieving many orders of magnitude improvement in our computing hardware, rather that the roughly factor of two we are used to seeing with each new generation of chip? Shrinking components to the nanoscale offers both potential advantages and severe challenges. The transition from classical mechanics to quantum mechanics is a major issue. Others are the problems of defect and fault tolearance: defects are manufacturing mistakes or components that irreversibly break over time and faults are transient interuptions that occur during operation. Both of these issues become bigger problems as component sizes shrink and the number of components scales up massively. In 1955, John von Neumann showed that a completely general approach to building a reliable machine from unreliable components would require a redundancy overhead of at least 10,000 - this would completely negate any advantages of building at the nanoscale. We have been examining a variety of defect and fault tolerant techniques that are specific to particular structures or functions, and are vastly more efficient for their particular task than the general approach of von Neumann. Our strategy is to layer these techniques on top of each other to achieve high system reliability even with component reliability of no more than 97% or so, and a total redudancy of less than 3. This strategy preserves the advantages of nanoscale electronics with a relatively modest overhead.
Speaker: Stan Williams (HP)
• 191
Enterasys - Networks that Know
Today and in the future businesses need an intelligent network. And Enterasys has the smarter solution. Our active network uses a combination of context-based and embedded security technologies - as well as the industry’s first automated response capability - so it can manage who is using your network. Our solution also protects the entire enterprise - from the edge, through the distribution layer, and into the core of the network. Threats are recognized and isolated at the user level, rather than taking your entire network down. It even has the ability to coexist with and enhance your legacy data networking infrastructure and existing security appliances - regardless of the vendor. By continually offering a context-based analysis of network traffic, our solution allows you to see not only what the problem is, but also where it is, and who caused it. And, with the industry's most advanced controls, we're the first solution that's able to resolve threats across the entire network - dynamically or on demand.
Speaker: J. ROESE
• 192
The IBM Research Global Technology Outlook
The Global Technology Outlook (GTO) is IBM Research’s projection of the future for information technology (IT). The GTO identifies progress and trends in key indicators such as raw computing speed, bandwidth, storage, software technology, and business modeling. These new technologies have the potential to radically transform the performance and utility of tomorrow's information processing systems and devices, ultimately creating new levels of business value.
Speaker: Dave McQueeney (IBM)
• 12:30 PM
Lunch
• Computer Fabrics Harder

### Harder

#### Interlaken, Switzerland

• 193
Operation of the CERN Managed Storage environment; current status and future directions
This paper discusses the challenges in maintaining a stable Managed Storage Service for users built upon dynamic underlying disk and tape layers. Early in 2004 the tools and techniques used to manage disk, tape, and stage servers were refreshed in adopting the QUATTOR tool set. This has markedly increased the coherency and efficiency of the configuration of data servers. The LEMON monitoring suite was deployed to raise alarms and gather performance metrics. Exploiting this foundation, higher level service displays are being added, giving comprehensive and near-real-time views of operations. The scope of our monitoring has been broadened to include low-level machine sensors such as thermometer, IPMI and SMART readings, improving our ability to detect impending hardware failure. In terms of operations, widespread disk reliability problems which were manpower intensive to chase, were overcome by exchanging a bad batch of 1200 disks. Recent LHC data challenges have ventured into new operating domains for the CASTOR system, with massive disk resident file catalogues requiring special handling. The tape layer has focused on STK 9940 drives for bulk recording capacity: a large scale data migration to this media permitted old drive technologies to be retired. Repacking 9940A data to 9940B high density media allows us to recycle tapes, giving substantial savings by avoiding acquisition of new media. In addition to more robust software, hardware developments are required for LHC era services. We are moving from EIDE to SATA based disk storage and envisage a tape drive technology refresh. Details will be provided of our investigations in these areas.
Speaker: T. Smith (CERN)
• 194
The status of Fermilab Enstore Data Storage System
Fermilab has developed and successively uses Enstore Data Storage System. It is a primary data store for the Run II Collider Experiments, as well as for the others. It provides data storage in robotic tape libraries according to requirements of the experiments. High fault tolerance and availability, as well as multilevel priority based request processing allows experiments to effectively store and access data stored in the Enstore, including storing raw data from data acquisition systems. The distributed structure and modularity of Enstore allow to scale the system and add more storage equipment as the requirements grow. Currently Fermilab Data Storage System storage system Enstore includes 5 robotic tape libraries, 96 tape drives of different type. Amount of data stored in the system is ~1.7 PetaBytes. Users access Enstore directly using a special command. They also can use ftp, grid ftp, SRM interfaces to dCache system, that uses Enstore as its lower layer storage.
Speaker: A. Moibenko (FERMI NATIONAL ACCELERATOR LABORATORY, USA)
• 195
Disk storage technology for the LHC T0/T1 centre at CERN
By 2008, the T0/T1 centre for the LHC at CERN is estimated to use about 5000 TB of disk storage. This is a very significant increase over the about 250 TB running now. In order to be affordable, the chosen technology must provide the required performance and at the same time be cost-effective and easy to operate and use. We will present an analysis of the cost (both in terms of material and personnel) of the current implementation (network-attached storage), and then describe detailed performance studies with hardware currently in use at CERN in different configurations of filesystems on software or hardware RAID arrays over disks. Alternative technologies that have been evaluated by CERN in varying depth (such as arrays of SATA disks with a Fiber Channel uplink, distributed disk storage across worker nodes, iSCSI solutions, SANFS, ...) will be discussed. We will conclude with an outlook of the next steps to be taken at CERN towards defining the future disk storage model.
Speaker: H. Meinhard (CERN-IT)
• 196
64-Bit Opteron systems in High Energy and Astroparticle Physics
64-Bit commodity clusters and farms based on AMD technology meanwhile have been proven to achieve a high computing power in many scientific applications. This report first gives a short introduction into the specialties of the amd64 architecture and the characteristics of two-way Opteron systems. Then results from measuring the performance and the behavior of such systems in various Particle Physics applications as compared to the classical 32-Bit systems are presented. The investigations cover analysis tools like ROOT, Astrophysics simulations based on CORSIKA and event reconstruction programs. Another field of investigations are parallel high performance clusters for Lattice QCD calculations, and n-loop calculations based on perturbative methods in quantum field theory using the formula manipulation program FORM. In addition to the performance results the compatibility of 32- and 64-Bit architectures and Linux operating system issues, as well as the impact on fabric management are discussed. It is shown that for most of the considered applications the recently available 64-bit commodity computers from AMD are a viable alternative to comparable 32-Bit systems.
Speaker: S. Wiesand (DESY)
• 197
CERN's openlab for Datagrid applications
For the last 18 months CERN has collaborated closely with several industrial partners to evaluate, through the opencluster project, technology that may (and hopefully will) play a strong role in the future computing solutions, primarily for LHC but possibly also for other HEP computing environments. Unlike conventional field testing where solutions from industry are evaluated rather independently, the openlab principle is based on active collaboration between all partners, with the common goal of constructing a coherent system. The talk will discuss our experience to date with the following hardware - 64-bit computing (in our case represented by the Itanium processor). This will also include the porting of applications and Grid software to 64 bits. - Rack mounted servers - The use of 10 Gbps Ethernet for both LAN and WAN connectivity - An iSCSI-based Storage System that promises to scale to Petabyte dimensions - The use of 10 Gbps Infiniband as a cluster interconnect On the software side we will review our experience with the latest grid-enabled release of Oracle, the so-called release "10g". The talk will review the results obtained so far, either in stand alone tests or as part of the larger LCG testbed, and it will describe the plans for the future in this three-year collaboration with industry.
Speaker: S. Jarp (CERN)
• 198
InfiniBand for High Energy Physics
Distributed physics analysis techniques as provided by the rootd and proofd concepts require a fast and efficient interconnect between the nodes. Apart from the required bandwidth the latency of message transfers is important, in particular in environments with many nodes. Ethernet is known to have large latencies, between 30 and 60 micro seconds for the common Giga-bit Ethernet. The InfiniBand architecture is a relatively new, open industry standard. It defines a switched high-speed, low-latency fabric designed to connect compute nodes and I/O nodes with copper or fibre cables. The theoretical bandwidth is up to 30 Gbit/s. The Institute for Scientific Computing (IWR) at the Forschungszentrum Karlsruhe is testing InfiniBand technology since begin of 2003, and has a cluster of dual Xeon nodes using the 4X (10 Gbit/s) version of the interconnect. Bringing the RFIO protocol - which is part of the CERN CASTOR facilities for sequential file transfers - to InfiniBand has been a big success, allowing significant reduction of CPU consumption and increase of file transfer speed. A first prototype of a direct interface to InfiniBand for the root toolkit has been designed and implemented. Experiences with hard- and software, in particular MPI performance results, will be reported. The methods and first performance results on rfio and root will be shown and compared to other fabric technologies like Ethernet.
Speaker: A. Heiss (FORSCHUNGSZENTRUM KARLSRUHE)
• 4:00 PM
Coffee break
• 199
CASTOR: Operational issues and new Developments
The Cern Advanced STORage (CASTOR) system is a scalable high throughput hierarchical storage system developed at CERN. CASTOR was first deployed for full production use in 2001 and has expanded to now manage around two PetaBytes and almost 20 million files. CASTOR is a modular system, providing a distributed disk cache, a stager, and a back end tape archive, accessible via a global logical name-space. This paper focuses on the operational issues of the system currently in production, and first experiences with the new CASTOR stager which has undergone a significant redesign in order to cope with the data handling challenges posed by the LHC, which will be commissioned in 2007. The design target for the new stager was to scale to another order of magnitude above the current CASTOR, namely to be able to sustain peak rates of the order of 1000 file open requests per second for a PetaByte disk pool. The new developments have been inspired by the problems which arose managing massive installations of commodity storage hardware. The farming of disk servers poses new challenges to the disk cache management: request scheduling; resource sharing and partitioning; automated configuration and monitoring; and fault tolerance of unreliable hardware Management of the distributed component based CASTOR system across a large farm, provides an ideal example of the driving forces for the development of automated management suites. Quattor and Lemon frameworks naturally address CASTOR's operational requirements, and we will conclude by describing their deployment on the masstorage systems at CERN.
Speaker: J-D. Durand (CERN)
• 200
dCache, LCG Storage Element and enhanced use cases
The dCache software system has been designed to manage a huge amount of individual disk storage nodes and let them appear under a single file system root. Beside a variety of other features, it supports the GridFtp dialect, implements the Storage Resource Manager interface (SRM V1) and can be linked against the CERN GFAL software layer. These abilities makes dCache a perfect Storage Element in the context of LCG and possibly future grid initiatives as well. During the last year, dCache has been deployed at dozens of Tier-I and Tier-II centers for the CMS and CDF experiments in the US and Europe, including Fermilab, Brookhaven, San Diego, Karlsruhe and CERN. The largest implementation, the CDF system at FERMI, provides 150 TeraBytes of disk space and delivers up to 50 TeraBytes/day to its clients. Sites using the LCG dCache distribution are more or less operating the cache as black box and little knowledge is available about customization and enhanced features. This presentation is therefor intended to make non dCache users curious and enable dCache users to better integrate dCache into their site specific environment. Beside many other topics, paper will touch on the possibility of dCache to closely cooperate with tertiary storage systems, like Enstore, Tsm and HPSS. It will describe the way dCache can be configured to attach different pool nodes to different user groups but let them all use the same set of fall back pools. We will explain how dCache takes care of dataset replication, either by configuration or by automatic detection of data access hot spots. Finally we will report on ongoing development plans.
Speaker: P. Fuhrmann (DESY)
• 201
Storage Resource Manager
Speaker: T. Perelmutov (FERMI NATIONAL ACCELERATOR LABORATORY)
• 202
SRB system at Belle/KEK
The Belle experiment has accumulated an integrated luminosity of more than 240fb-1 so far, and a daily logged luminosity now exceeds 800pb-1. These numbers correspond to more than 1PB of raw and processed data stored on tape and an accumulation of the raw data at the rate of 1TB/day. The processed, compactified data, together with Monte Carlo simulation data for the final physics analyses amounts to more than 100TB. The Belle collaboration consists of more than 55 institutes in 14 countries and at most of the collaborating institutions, active physics data analysis programs are being undertaken. To meet these storage and data distribution demands, we have tried to adopt a resource broker, SRB. We have installed the SRB system at KEK, Australia, and other collaborating institutions and have started to share data. In this talk, experiences with the SRB system will be discussed and the performance of the system when used for data processing and physics analysis of the Belle experiment will be demonstrated.
Speaker: Y. Iida (HIGH ENERGY ACCELERATOR RESEARCH ORGANIZATION)
• 203
StoRM: grid middleware for disk resource management
Within a Grid the possibility of managing storage space is fundamental, in particular, before and during application execution. On the other hand, the increasing availability of highly performant computing resources raises the need for fast and efficient I/O operations and drives the development of parallel distributed file systems able to satisfy these needs granting access to distributed storage. The demand of POSIX compliant access to storage and the need to have a uniform interface for both Grid integrated and pure vanilla applications stimulate developers to investigate the possibility to integrate already existing filesystems into a Grid infrastructure, allowing users to take advantage of storage resources without being forced to change their applications. This paper describes the design and implementation of StoRM, a storage resource manager (SRM) for disk only. Through StoRM an application can reserve and manage space on disk storage systems. It can then access the space either in a Grid environment or locally in a transparent way via classic POSIX calls. The StoRM architecture is based on a pluggable model in order to easily add new functionalities. The StoRM implementation uses now filesystems such as GPFS or LUSTRE. The StoRM prototype includes space reservation functionalities that complement SRM space reservation to allow applications to directly access/use the managed space trough POSIX calls. Moreover, StoRM includes quota management and a space guard. StoRM will serve as policy enforcement point (PEP) for the Grid Policy Management System over disk resources. The experimental results obtained are promising.
Speaker: L. Magnoni (INFN-CNAF)
• 204
The SAMGrid Database Server Component: Its Upgraded Infrastructure and Future Development Path
The SAMGrid Database Server encapsulates several important services, such as accessing file metadata and replica catalog, keeping track of the processing information, as well as providing the runtime support for SAMGrid station services. Recent deployment of the SAMGrid system for CDF has resulted in unification of the database schema used by CDF and D0, and the complexity of changes required for the unified metadata catalog has warranted a complete redesign of the DB Server. We describe here the architecture and features of the new server. In particular, we discuss the new CORBA infrastructure that utilizes python wrapper classes around IDL structs and exceptions. Such infrastructure allows us to use the same code on both server and client sides, which in turn results in significantly improved code maintainability and easier development. We also discuss future integration of the new server with an SBIR II project which is directed toward allowing the dbserver to access distributed databases, implemented in different DB systems and possibly using different schema.
Speaker: S. Veseli (Fermilab)
• Core Software Brunig 1 + 2

### Brunig 1 + 2

#### Interlaken, Switzerland

• 205
LCIO persistency and data model for LC simulation and reconstruction
LCIO is a persistency framework and data model for the next linear collider. Its original implementation, as presented at CHEP 2003, was focused on simulation studies. Since then the data model has been extended to also incorporate prototype test beam data, reconstruction and analysis. The design of the interface has also been simplified. LCIO defines a common abstract user interface (API) in Java, C++ and Fortran in order to fulfill the needs of the global linear collider community. It is designed to be lightweight and flexible without introducing additional dependencies on other software packages. User code is completely separated from the concrete persistency implementation. SIO, a simple binary format that supports data compression and pointer retrieval is the current choice. LCIO is implemented in such a way that it can also be used as the transient data model in any linear collider application, e.g. a modular reconstruction program can use the LCIO event class (LCEvent) as the container for the modules' input and output data. As LCIO offers a common API for three languages it is also possible to construct a multi-language reconstruction framework that would facilitate the integration of already existing algorithms. A number of groups has already incorporated LCIO in their software frameworks and others plan to do so. We present the design and implementation of LCIO, focusing on new developments and uses.
Speaker: F. Gaede (DESY IT)
• 206
POOL Development Status and Plans
The LCG POOL project is now entering the third year of active development. The basic functionality of the project is provided but some functional extensions will move into the POOL system this year. This presentation will give a summary of the main functionality provided by POOL, which used in physics productions today. We will then present the design and implementation of the main new interfaces and components planned such as the POOL RDBMS abstraction layer and the RDBMS based Storage Manager back- end.
Speaker: D. Duellmann (CERN IT/DB & LCG POOL PROJECT)
• 207
POOL Integration into three Experiment Software Frameworks
The POOL software package has been successfully integrated with the three large experiment software frameworks of ATLAS, CMS and LHCb. This presentation will summarise the experience gained during these integration efforts and will try to highlight the commonalities and the main differences between the integration approaches. In particular we’ll discuss the role of the POOL object cache, the choice of the main storage technology in ROOT (tree or named objects) and approaches to collection and catalogue integration.
Speaker: Giacomo Govi
• 208
Recent Developments in the ROOT I/O
Since version 3.05/02, the ROOT I/O System has gone through significant enhancements. In particular, the STL container I/O has been upgraded to support splitting, reading without existing libraries and using directly from TTreeFormula (TTree queries). This upgrade to the I/O system is such that it can be easily extended (even by the users) to support the splitting and querying of almost any collections. The ROOT TTree queries engine has also been enhanced in many ways including an increase performance, better support for array printing and histograming, addition of the ability to call any external C or C++ functions, etc. We improved the I/O support for classes not inheriting from TObject, including support for automatic schema evolution without using an explicit class version. ROOT now support generating files larger than 2Gb. We also added plugins for several of the mass storage servers (Castor, DCache, Chirp, etc.). We will describe in details these new features and their implementation.
Speaker: P. Canal (FERMILAB)
• 209
XML I/O in ROOT
Till now, ROOT objects can be stored only in a binary ROOT specific file format. Without the ROOT environment the data stored in such files are not directly accessible. Storing objects in XML format makes it easy to view and edit (with some restriction) the object data directly. It is also plausible to use XML as exchange format with other applications. Therefore XML streaming has been implemented in ROOT. Any object which is in the ROOT dictionary can be stored/retrieved in XML format. Two layouts of object representation in XML are supported: class-dependent and generic. In the first case all XML tag names are derived from class and member names. To avoid name intersections, XML namespaces for each class are used. A Document Type Definition (DTD) file is automatically generated for each class (or set of classes). It can be used to validate the structure of the XML document. The generic layout of XML files includes tag names like "Object", "Member", "Item" and so on. In this case the DTD is common for all produced XML files. Further development is required to provide tools for accessing created XML files from other applications like: pure C++ code without ROOT libraries and dictionaries, Java and so on.
Speaker: S. Linev (GSI)
• 210
The FreeHEP Java Library Root IO package
The FreeHEP Java library contains a complete implementation of Root IO for Java. The library uses the "Streamer Info" embedded in files created by Root 3.x to dynamically create high performance Java proxies for Root objects, making it possible to read any Root file, including files with user defined objects. In this presentation we will discuss the status of this code, explain its implementation and demonstrate performance using benchmark comparisons to standard Root IO. We will also describe recently added support for reading files remotely using rootd and xrootd protocols. We will also show some uses of this library, including using JAS3 to analyze Root data, using the WIRED event display to visualize data from Root files and using rootd and Java servlet technology to make live plots web accessible - with examples from GLAST and BaBar. We will also explain how you can trivially make your own root data web-accessible using the AIDA Tag Library and Jakarta Tomcat.
Speaker: T. Johnson (SLAC)
• 4:00 PM
Coffee Break
• 211
EventStore: Managing Event Versioning and Data Partitioning using Legacy Data Formats
HEP analysis is an iterative process. It is critical that in each iteration the physicist's analysis job accesses the same information as previous iterations (unless explicitly told to do otherwise). This becomes problematic after the data has been reconstructed several times. In addition, when starting a new analysis, physicists normally want to use the most recent version of reconstruction. Such version control is useful for data managed by a single physicist using a laptop or small groups of physicists at a remote institution in addition to the collaboration wide managed data. In this presentation we will discuss our implementation of the EventStore which uses a data location, indexing and versioning service to manage legacy data formats (e.g. an experiment's existing proprietary file format or Root files). A plug-in architecture is used to support adding additional file formats. The core of the system is used to implement three different sizes of services: personal, group and collaboration.
Speaker: C. Jones (CORNELL UNIVERSITY)
• 212
ATLAS Metadata Interfaces (AMI) and ATLAS Metadata Catalogs
The ATLAS Metadata Interface (AMI) project provides a set of generic tools for managing database applications. AMI has a three-tier architecture with a core that supports a connection to any RDBMS using JDBC and SQL. The middle layer assumes that the databases have an AMI compliant self-describing structure. It provides a generic web interface and a generic command line interface. The top layer contains application specific features. The principal uses of AMI are the ATLAS Data Challenge dataset bookkeeping catalogs, and Tag Collector, a tool for release management. The first AMI Web service client was introduced in early 2004. It offers many advantages over earlier clients because: - Web services permit multi-language and multi-operating system support - The user interface is very effectively de-coupled from the implementation. Most upgrades can be implemented on the server side; no redistribution of client software is needed. In 2004 this client will be used for the ATLAS Data Challenge 2, for the ATLAS combined test beam offline bookkeeping, and also in the first prototypes of ARDA compliant analysis interfaces.
Speaker: S. Albrand (LPSC)
• 213
CMS Detector Description: New Developments
The CMS Detector Description Database (DDD) consists of a C++ API and an XML based detector description language. DDD is used by the CMS simulation (OSCAR), reconstruction (ORCA), and visualization (IGUANA) as well by test beam software that relies on those systems. The DDD is a sub-system within the COBRA framework of the CMS Core Software. Management of the XML is currently done using a separate Geometry project in CVS. We give an overview of the DDD integration and report on recent developments concerning detector description in CMS software: * The ability of client software to describe sub-detectors by providing an algorithm plug-in in C++ based on SEAL plug-in facilities. A typical algorithm plug-in makes use of the DDD API to describe detector properties. Through the API seamless access to data defined via the XML description language is ensured. * An Oracle schema was recently developed and the database populated by a DDD application. The geometrical structure of the detector is seen as a skeleton to which conditions or configuration data can be attached. * A C++ streaming mechanism to output the geometry as binary files was developed. This representation can be read into memory much more rapidly than the XML files can be parsed. The DDD API shields clients from each of the possible input sources. Even the simultaneous use of several different input sources is possible through various configuration options in the framework COBRA.
Speaker: M. Case (UNIVERSITY OF CALIFORNIA, DAVIS)
• 214
Addressing the persistency patterns of the time evolving HEP data in the ATLAS/LCG MySQL Conditions Databases
The size and complexity of the present HEP experiments represents an enormous effort in the persistency of data. These efforts imply a tremendous investment in the databases field not only for the event data but also for data that is needed to qualify this one - the Conditions Data. In the present document we'll describe the strategy for addressing the Conditions data problem in the ATLAS experiment, focusing in the ConditionsDB MySQL for the ATLAS/LCG project. The need for a persistent engine for structured conditions data has motivated the studies for an relational backend that maps transient structured objects in the relational database persistent engine. This paper illustrate the proposal for the storage of Conditions data in the LCG framework using it both to store only the Interval Of Validity (IOV) and a reference that represents the 'path' to an external persistent storage mechanism, and to archive the IOV and the data in relational tables mapping the costumizable CondDBTable objects. This allow to take advantages of all the relational features and also to directly map between transient objects and tables in the database server. The issue of distributed data storage and partitioning, is also analyzed in this paper, taking into account the different levels of indirection that are provided by the ConditionsDB MySQL implementation. These features represent a very important built in functionality in terms of scalability, data balance in a system that aims to be completely distributed and with a very high performance for hundreds of users.
Speaker: A. Amorim (FACULTY OF SCIENCES OF THE UNIVERSITY OF LISBON)
• 215
CDB - Distributed Conditions Database of BaBar Experiment
A new, completely redesigned Condition/DB was deployed in BaBar in October 2002. It replaced the old database software used through the first three and half years of data taking. The new software aims at performance and scalability limitations of the original database. However this major redesign brought in a new model of the metadata, brand new technology- and implementation- independent API, flexible configurability and extended functionality. One of the greatest strength of new CDB is that it's been designed to be a distributed kind database from the ground up to facilitate propagation and exchange of conditions (calibrations, detector alignments, etc.) in the realm of the international HEP collaboration. The first implementation of CDB uses Objectivity/DB as its underlying persistent technology. There is an ongoing study to understand how to implement CDB on top of other persistent technologies. The talk will cover the whole spectrum of topics ranging from the basic conceptual model of the new database through the way CDB is currently exploited in BaBar to the directions of further developments.
Speaker: I. Gaponenko (LAWRENCE BERKELEY NATIONAL LABORATORY)
• 216
LCG Conditions Database Project Overview
The Conditions Database project has been launched to implement a common persistency solution for experiment conditions data in the context of the LHC Computing Grid (LCG) Persistency Framework. Conditions data, such as calibration, alignment or slow control data, are non-event experiment data characterized by the fact that they vary in time and may have different versions. The LCG project draws on preexisting projects which have led to the definition of a generic C++ API for condition data access and its implementation using different storage technologies, such as Objectivity, MySQL or Oracle. The project is assigned the task to deliver a production release of the software including implementation libraries for several technologies and high level tools for data management. The presentation will review the current status of the LCG common project at the time of the conference and the plans for its evolution.
Speaker: A. Valassi (CERN)
• Distributed Computing Services Theatersaal

### Theatersaal

#### Interlaken, Switzerland

Convener: Rob Kennedy (FNAL)
• 217
Experience with POOL from the LCG Data Challenges of the three LHC experiments
This presentation will summarise the deployment experience gained with POOL during the first larger LHC experiments data challenges performed. In particular we discuss the storage access performance and optimisations, the integration issues with grid middleware services such as the LCG Replica Location Service (RLS) and the LCG Replica Manager and experience with the POOL proposed way of exchanging meta data (such as File Catalog catalogue entries) in a de-coupled production system.
Speaker: Maria Girone
• 218
Middleware for the next generation Grid infrastructure
The aim of the EGEE (Enabling Grids for E-Science in Europe) is to create a reliable and dependable European Grid infrastructure for e-Science. The objective of the Middleware Re-engineering and Integration Research Activity is to provide robust middleware components, deployable on several platforms and operating systems, corresponding to the core Grid services for resource access, data management, information collection, authentication & authorization, resource matchmaking and brokering, and monitoring and accounting. For achieving this objective, we developed an architecture and design of the next generation Grid middleware leveraging experiences and existing components mainly from AliEn, EDG, and VDT. The architecture follows the service breakdown developed by the LCG ARDA RTAG. Our goal is to do as little original development as possible but rather re-engineer and harden existing Grid services. The evolution of these middleware components towards a Service Oriented Architecture (SOA) adopting existing standards (and following emerging ones) as much as possible is another major goal of our activity. A rapid prototyping approach has been adopted, providing a sequence of more sophisticated prototypes to the EGEE candidate applications coming from the LHC HEP experiments and the Biomedical field. The close feedback loop with applications via these prototypes in indispensible for achieving our ultimate goals of providing a reliable and dependable Grid infrastructure. In this paper we will report on the architecture and design of the main Grid components and report on our experiences with early prototype systems.
Speaker: E. Laure (CERN)
• 219
The Clarens Grid-enabled Web Services Framework: Services and Implementation
Clarens enables distributed, secure and high-performance access to the worldwide data storage, compute, and information Grids being constructed in anticipation of the needs of the Large Hadron Collider at CERN. We report on the rapid progress in the development of a second server implementation in the Java language, the evolution of a peer-to-peer network of Clarens servers, and general improvements in client and server implementations. Services that are implemented at this time include read/write file access, service lookup and discovery, configuration management, job execution, Virtual Organization Management, an LHCb Information Service, as well as web service interfaces to POOL replica location and metadata catalogs, MonaLISA monitoring information, CMS MCRunjob workflow management, BOSS job monitoring and bookkeeping, Sphinx job scheduler and Chimera virtual data systems. Commodity web service protocols allows a wide variety of computing platoforms and applications to be used to securely access Clarens services, including a standard web browser, Java applets and stand-alone applications, the ROOT data analysis package, as well as libraries that provide programmatic access from the Python, C/C++ and Java languages.
Speaker: C. Steenberg (California Institute of Technology)
• 220
Experiences with the gLite Grid Middleware
The ARDA project was started in April 2004 to support the four LHC experiments (ALICE, ATLAS, CMS and LHCb) in the implementation of individual production and analysis environments based on the EGEE middleware. The main goal of the project is to allow a fast feedback between the experiment and the middleware development teams via the construction and the usage of end-to-end prototypes allowing users to perform analyses out of the present data sets from recent montecarlo productions. The LCG ARDA project is contributing to the development of the new EGEE Grid middleware by exercising it with realistic analysis systems developed within the four LHC experiments. We will present our experiences in using the EGEE middleware in first prototypes developed by the experiments together with the ARDA project. We will cover aspects such as the usability of individual components of the middleware and give an overview on which components are used by which experiments.
Speaker: Birger KOBLITZ (CERN)
• 221
Global Distributed Parallel Analysis using PROOF and AliEn
The ALICE experiment and the ROOT team have developed a Grid-enabled version of PROOF that allows efficient parallel processing of large and distributed data samples. This system has been integrated with the ALICE-developed AliEn middleware. Parallelism is implemented at the level of each local cluster for efficient processing and at the Grid level, for optimal workload management of distributed resources. This system allows harnessing large Computing on Demand capacity during an interactive session. Remote parallel computations are spawned close to the data, minimising network traffic. If several copies of the data are available, a workload management system decides automatically where to send the task. Results are automatically merged and displayed at the user workstation. The talk will describe the different components of the system (PROOF, the parallel ROOT engine, and the AliEn middleware), the present status and future plans for the development and deployment and the consequences for the ALICE computing model.
Speaker: F. Rademakers (CERN)
• 222
Software agents in data and workflow management
CMS currently uses a number of tools to transfer data which, taken together, form the basis of a heterogenous datagrid. The range of tools used, and the directed, rather than optimised nature of CMS recent large scale data challenge required the creation of a simple infrastructure that allowed a range of tools to operate in a complementary way. The system created comprises a hierarchy of simple processes (named agents) that propagate files through a number of transfer states. File locations and some application metadata were stored in POOL file catalogues, with LCG LRC or MySQL backends. Agents were assigned limited responsibilities, and were restricted to communicating state in a well-defined, indirect fashion through a central transfer management database. In this way, the task of distributing data was easily divided between different groups for implementation. The prototype system was developed rapidly, and achieved the required sustained transfer rate of ~10 MBps, with O(10^6) files distributed to 6 sites from CERN. Experience with the system during the data challenge raised issues with underlying technology (MSS write/read, stability of the LRC, maintenance of file catalogues, synchronisation of filespaces _) which have been successfully identified and handled. The development of this prototype infrastructure allows us to plan the evolution of backbone CMS data distribution from a simple hierarchy to a more autonomous, scalable model drawing on emerging agent and grid technology.
Speaker: T. Barrass (CMS, UNIVERSITY OF BRISTOL)
• 4:00 PM
Coffee break
• 223
Housing Metadata for the Common Physicist Using a Relational Database
SAM was developed as a data handling system for Run II at Fermilab. SAM is a collection of services, each described by metadata. The metadata are modeled on a relational database, and implemented in ORACLE. SAM, originally deployed in production for the D0 Run II experiment, has now been also deployed at CDF and is being commissioned at MINOS. This illustrates that the metadata decomposition of its services has a broader applicability than just one experiment. A joint working group on metadata with representatives from ATLAS, BaBar, CDF, CMS, D0, and LHCB in cooperation with EGEE has examined this metadata decomposition in the light of general HEP user requirements. Greater understanding of the required services of a performant data handling system has emerged from Run II experience. This experience is being merged with the understanding being developed in the course of LHC experience with data challenges and user case discussions. We describe the SAM schema and the commonalities of function and service support between this schema and proposals for the LHC experiments. We describe the support structure required for SAM schema updates, the use of development, integration, and production instances. We are also looking at the LHC proposals for the evolution of schema using keyword-value pairs that are then transformed into a normalized, performant database schema.
• 224
Lattice QCD Data and Metadata Archives at Fermilab and the International Lattice Data Grid
The lattice gauge theory community produces large volumes of data. Because the data produced by completed computations form the basis for future work, the maintenance of archives of existing data and metadata describing the provenance, generation parameters, and derived characteristics of that data is essential not only as a reference, but also as a basis for future work. Development of these archives according to uniform standards both in the data and metadata formats provided and in the software interfaces to the component services could greatly simplify collaborations between institutions and enable the dissemination of meaningful results. This paper describes the progress made in the development of a set of such archives at the Fermilab lattice QCD facility. We are coordinating the development of the interfaces to these facilities and the formats of the data and metadata they provide with the efforts of the international lattice data grid (ILDG) metadata and middleware working groups, whose goals are to develop standard formats for lattice QCD data and metadata and a uniform interface to archive facilities that store them. Services under development include those commonly associate with data grids: a service registry, a metadata database, a replica catalog, and an interface to a mass storage system. All services provide GSI authenticated web service interfaces following modern standards, including WSDL and SOAP, and accept and provide data and metadata following recent XML based formats proposed by the ILDG metadata working group.
Speaker: E. Neilsen (FERMI NATIONAL ACCELERATOR LABORATORY)
• 225
Huge Memory systems for data-intensive science
Speaker: Richard Mount (SLAC)
• Distributed Computing Systems and Experiences Ballsaal

### Ballsaal

#### Interlaken, Switzerland

• 226
BaBar simulation production - A millennium of work in under a year
for the BaBar Computing Group. The analysis of the BaBar experiment requires many times the measured data to be produced in simulation. This requirement has resulted in one of the largest distributed computing projects ever completed. The latest round of simulation for BaBar started in early 2003, and completed in early 2004, and encompassed over 1 million jobs, and over 2.2 billion events. By the end of the production cycle over 2 dozen different computing centers and nearly 1.5 thousand cpus were in constant use in North America and Europe. The whole effort was managed from a central database at SLAC, with real-time updates of the status of all jobs. Utilities were developed to tie together production with many different batch systems, and with different needs for security. The produced data was automatically transfered to SLAC for use and distribution to analysis sites. The system developed to manage this effort was a combination of web and database applications, and command line utilities. The technologies used to complete this effort along with its complete scope will be presented.
Speaker: D. Smith (STANFORD LINEAR ACCELERATOR CENTER)
• 227
Role of Tier-0, Tier-1 and Tier-2 Regional Centres in CMS DC04
The CMS 2004 Data Challenge (DC04) was devised to test several key aspects of the CMS Computing Model in three ways: by trying to sustain a 25 Hz reconstruction rate at the Tier-0; by distributing the reconstructed data to six Tier-1 Regional Centers (FNAL in US, FZK in Germany, Lyon in France, CNAF in Italy, PIC in Spain, RAL in UK) and handling catalogue issues; by redistributing data to Tier-2 centers for analysis. Simulated events, up to the digitization step, were produced prior to the DC as input for the reconstruction in the Pre-Challenge Production (PCP04). In this paper, the model of the Tier-0 implementation used in DC04 is described, as well as the experience gained in using the newly developed data distribution management layer, which allowed CMS to successfully direct the distribution of data from Tier-0 to Tier-1 sites by loosely integrating a number of available Grid components. While developing and testing this system, CMS explored the overall functionality and limits of each component, in any of the different implementations which were deployed within DC04. The role of Tier-1's is presented and discussed, from the import of reconstructed data from Tier-0, to the archiving on to the local mass storage system and the data distribution management to Tier-2's for analysis. Participating Tier- 1's differed in available resources, set-up and configuration: a critical evalutation of the results and performances achieved adopting different strategies in the organization and management of each Tier-1 center to support CMS DC04 is presented.
• 228
AMS-02 Computing and Ground Data Handling
AMS-02 Computing and Ground Data Handling. V.Choutko (MIT, Cambridge), A.Klimentov (MIT, Cambridge) and M.Pohl (Geneva University) AMS (Alpha Magnetic Spectrometer) is an experiment to search in space for dark matter and antimatter on the International Space Station (ISS). The AMS detector had a precursor flight in 1998 (STS- 91, June 2-12, 1998). More than 100M events were collected and analyzed. The final detector (AMS-02) will be installed on ISS in the fall of 2007 for at least 3 years. The data will be transmitted from ISS to NASA Marshall Space Flight Center (MSFC, Huntsvile, Alabama) and transfered to CERN (Geneva Switzerland) for processing and analysis. We are presenting the AMS-02 Ground Data Handling scenario and requirements to AMS ground centers: the Payload Operation and Control Center (POCC) and the Science Operation Center (SOC). The Payload Operation and Control Center is where AMS operations take place, including commanding, storage and analysis of house keeping data and partial science data analysis for rapid quality control and feed back. The AMS Science Data Center receives and stores all AMS science and house keeping data, as well as ancillary data from NASA. It ensures full science data reconstruction, calibration and alignment; it keeps data available for physics analysis and archives all data. We also discuss the AMS-02 distributed MC production currently running in 15 Universities and Labs in Europe, USA and Asia, with automatic jobs submission and control from one central place (CERN). The software uses CORBA technology to control and monitor MC production and an ORACLE relational database, to keep catalogues, event description as well as production and monitoring information.
Speaker: A. Klimentov (A)
• 229
Distributed Computing Grid Experiences in CMS DC04
In March-April 2004 the CMS experiment undertook a Data Challenge(DC04). During the previous 8 months CMS undertook a large simulated event production. The goal of the challenge was to run CMS reconstruction for sustained period at 25Hz input rate, distribute the data to the CMS Tier-1 centers and analyze them at remote sites. Grid environments developed in Europe by the LHC Computing Grid (LCG) in Europe and in the US with Grid2003 were utilized to complete the aspects of the challenge. During the simulation phase, US-CMS utilized Grid2003 to simulate and process approximately 17 million events. Simultaneous usage of CPU resources peaked at 1200 CPUs, controlled by a single FTE. Using Grid3 was a milestone for CMS computing in reaching a new magnitude in the number of autonomously cooperating computing sites for production. The use of Grid-based job execution resulted in reducing the overall support effort required to submit and monitor jobs by a factor of two. During the challenge itself, the CMS groups from Italy and Spain used the LCG Grid Environment to satisfy challenge requirements . The LCG Replica Manager was used to transfer the data. The CERN RLS provided the needed replica catalogue functionality. The LCG submission system based on the Resource Broker was used to submit analysis jobs to the sites hosting the data. A CMS dedicated GridICE monitoring was activated to monitor both services and resources. A description of the experiences, successes and lessons learned from both experiences with grid infrastructure is presented.
Speaker: A. Fanfani (INFN-BOLOGNA (ITALY))
• 230
Experience producing simulated events for the DZero experiment on the SAM-Grid
Most of the simulated events for the DZero experiment at Fermilab have been historically produced by the “remote” collaborating institutions. One of the principal challenges reported concerns the maintenance of the local software infrastructure, which is generally different from site to site. As the understanding of the community on distributed computing over distributively owned and shared resources progresses, it becomes increasingly interesting the adoption of grid technologies to address the production of montecarlo events for high energy physics experiments. The SAM-Grid is a software system developed at Fermilab, which integrates standard grid technologies for job and information management with SAM, the data handling system of the DZero and CDF experiments. During the past few months, this grid system has been tailored for the montecarlo production of DZero. Since the initial phase of deployment, this experience has exposed an interesting series of requirements to the SAM-Grid services, the standard middleware, the resources and their management and to the analysis framework of the experiment. As of today, the inefficiency due to the grid infrastructure has been reduced to as little as 1%. In this paper, we present our statistics and the "lesson learned" in running large high energy physics applications on a grid infrastructure.
Speaker: Rob KENNEDY (FNAL)
• 231
The ALICE Data Challenge 2004 and the ALICE distributed analysis prototype
During the first half of 2004 the ALICE experiment has performed a large distributed computing exercise with two major objectives: to test the ALICE computing model, included distributed analysis, and to provide data sample for a refinement of the ALICE Jet physics Monte-Carlo studies. Simulation reconstruction and analysis of several hundred thousand events were performed, using the heterogeneous resources of tens of computer centres worldwide. These resources belong to different GRID systems and were steered by the AliEn (ALICE Environment) framework, acting as a meta-GRID. This has been a very thorough test of the middleware of AliEn and LCG (LCG-2 and grid.it resources) and their compatibility. During the Data Challenge more than 1,500 jobs run in parallel for several weeks. More than 50 TB of data have been produced and analysed worldwide in one of the major exercises of this kind run to date. ALICE has developed an analysis system based on AliEn and ROOT. This system starts with a metadata selection in the AliEn file catalogue, followed by a computation phase. Analysis jobs are sent where the data is, thus minimising data movement. The control is performed by an intelligent workload management system. The analysis can be done either via batch or interactive jobs. The latter are "spawned" on remote systems and report the results back to the user workstation. The talk will describe the ALICE experience with this large-scale use of the Grid, the major lessons learned and the consequences for the ALICE computing model.
Speaker: A. Peters (ce)
• 4:00 PM
Coffee break
• 232
Results of the LHCb experiment Data Challenge 2004
The LHCb experiment performed its latest Data Challenge (DC) in May-July 2004. The main goal was to demonstrate the ability of the LHCb grid system to carry out massive production and efficient distributed analysis of the simulation data. The LHCb production system called DIRAC provided all the necessary services for the DC: Production and Bookkeeping Databases, File catalogs, Workload and Data Management systems, Monitoring and Accounting tools. It allowed to combine in a consistent way resources of more than 20 LHCb production sites as well as the LCG2 grid resources. 200M events constituting 90 TB of data were produced and stored in 6 Tier 1 centers. The subsequent analysis was carried out at CERN as well as in all the Tier 1 centers to where preselected datasets were distributed. The GANGA User Interface was used to assist users in preparation of their analysis jobs and running them on the local and remote computing resources. We will present the DC results, the experience gained utilising DIRAC and LCG2 grids as well as further developments necessary to achieve the scalability level of the real running LHCb experiment.
Speaker: J. Closier (CERN)
• 233
ATLAS Data Challenge Production on Grid3
We describe the design and operational experience of the ATLAS production system as implemented for execution on Grid3 resources. The execution environment consisted of a number of grid-based tools: Pacman for installation of VDT-based Grid3 services and ATLAS software releases, the Capone execution service built from the Chimera/Pegasus virtual data system for directed acyclic graph (DAG) generation, DAGMan/Condor-G for job submission and management , and the Windmill production supervisor which provides the messaging system for distributing production tasks to Capone. Produced datasets were registered into a distributed replica location service (Globus RLS) that was integrated with the Don Quixote proxy service for interoperability with other Grids used by ATLAS. We discuss performance, scalability, and fault handling during the first phase of ATLAS Data Challenge 2.
Speaker: M. Mambelli (UNIVERSITY OF CHICAGO)
• 234
Performance of the NorduGrid ARC and the Dulcinea Executor in ATLAS Data Challenge 2
This talk describes the various stages of ATLAS Data Challenge 2 (DC2) in what concerns usage of resources deployed via NorduGrid's Advanced Resource Connector (ARC). It also describes the integration of these resources with the ATLAS production system using the Dulcinea executor. ATLAS Data Challenge 2 (DC2), run in 2004, was designed to be a step forward in the distributed data processing. In particular, much coordination of task assignment to resources was planned to be delegated to Grid in its different flavours. An automatic production management system was designed, to direct the tasks to Grids and conventional resources. The Dulcinea executor is a part of this system that provides interface to the information system and resource brokering capabilities of the ARC middleware. The executor translates the job definitions recieved from the supervisor to the extended resource specification language (XRSL) used by the ARC middleware. It also takes advantage of the ARC middleware's built-in support for the Globus Replica Location Server (RLS) for file registration and lookup. NorduGrid's ARC has been deployed on many ATLAS-dedicated resources across the world in order to enable effective participation in ATLAS DC2. This was the first attempt to harness large amounts of strongly heterogeneous resources in various countries for a single collaborative exercise using Grid tools. This talk addresses various issues that arose during different stages of DC2 in this environment: preparation, such as ATLAS software installation; deployment of the middleware; and processing. The results and lessons are summarized as well.
• Event Processing Kongress-Saal

### Kongress-Saal

#### Interlaken, Switzerland

• 235
The simulation for the ATLAS experiment: present status and outlook
The simulation for the ATLAS experiment is presently operational in a full OO environment and it is presented here in terms of successful solutions to problems dealing with application in a wide community using a common framework. The ATLAS experiment is the perfect scenario where to test all applications able to satisfy the different needs of a big community. Following a well stated strategy of transition from the GEANT3 to the GEANT4-based simulation, a good validation programme during the last months confirmed the characteristics of reliability, performance and robustness of this new tool in comparison with the results of the previous simulation. Generation, simulation and digitization steps on different full sets of physics events were tested in terms of performance and robustness in comparisons with the same samples undergoing the old GEANT3-based simulation. The simulation program is simultaneously tested on all different testbeam setups characterizing the R&D programme of all subsystems belonging to the ATLAS detector with comparison to real data in order to validate the physics content and the reliability in the detector description of each component.
Speaker: Prof. A. Rimoldi (PAVIA UNIVERSITY & INFN)
• 236
An Object-Oriented Simulation Program for CMS
The CMS detector simulation package, OSCAR, is based on the Geant4 simulation toolkit and the CMS object-oriented framework for simulation and reconstruction. Geant4 provides a rich set of physics processes describing in detail electro-magnetic and hadronic interactions. It also provides the tools for the implementation of the full CMS detector geometry and the interfaces required for recovering information from the particle tracking in the detectors. This functionality is interfaced to the CMS framework, which, via its "action on demand" mechanisms, allows the user to selectively load desired modules and to configure and tune the final application. The complete CMS detector is rather complex with more than 12 million readout channels and more than 1 million geometrical volumes. OSCAR has been validated by comparing its results with test beam data and with results from simulation with a GEANT3-based program. It has been succesfully deployed in the 2004 data challenge for CMS, where ~20 million events for various LHC physics channels were simulated and analysed. Authors: S. Abdulline, V. Andreev, P. Arce, S. Arcelli, S. Banerjee, T. Boccali, M. Case, A. De Roeck, S. Dutta, G. Eulisse, D. Elvira, A. Fanfani, F. Ferro, M. Liendl, S. Muzaffar, A. Nikitenko, K. Lassila-Perini, I. Osborne, M. Stavrianakou, T. Todorov, L. Tuura, H.P. Wellisch, T. Wildish, S. Wynhoff, M. Zanetti, A. Zhokin, P. Zych
Speaker: M. Stavrianakou (FNAL)
• 237
The Virtual MonteCarlo : status and applications
The current major detector simulation programs, i.e. GEANT3, GEANT4 and FLUKA have largely incompatible environments. This forces the physicists willing to make comparisons between the different transport Monte Carlos to develop entirely different programs. Moreover, migration from one program to the other is usually very expensive, in manpower and time, for an experiment offline environment, as it implies substantial changes in the simulation code. To solve this problem, the ALICE Offline project has developed a virtual interface to these three programs allowing their seamless use without any change in the framework, the geometry description or the scoring code. Moreover a new geometrical modeller has been developed in collaboration with the ROOT team, and successfully interfaced to the three programs. This allows the use of one description of the geometry, which can be used also during reconstruction and visualisation. The talk will describe the present status and future plans for the Virtual Monte Carlo. It will also present the capabilities and performance of the geometrical modeller.
Speaker: A. Gheata (CERN)
• 238
From Geant 3 to Virtual Monte Carlo: Approach and Experience
The STAR Collaboration is currently using simulation software based on Geant 3. The emergence of the new Monte Carlo simulation packages, coupled with evolution of both STAR detector and its software, requires a drastic change of the simulation framework. We see the Virtual Monte Carlo (VMC) approach as providing a layer of abstraction that facilitates such transition. The VMC platform is a candidate to replace the present legacy software, and help avoid its certain shortcomings, such as the use of a particular algorithmic language to describe the detector geometry. It will also allow us to introduce a more flexible in-memory representation of the geometry. The Virtual Monte Carlo concept includes a platform-neutral kernel of the application, to the highest degree possible. This kernel is then equipped with interfaces to the modules responsible for simulating the physics of particle propagation, and tracking. We consider the geometry description classes in the ROOT system (in its latest form known as TGeo classes) as a good choice for the in-memory geometry representation. We present an application design based on the Virtual Monte Carlo, along with the results of testing, benchmarking and comparison to Geant 3. Internal event representation and IO model will be also discussed.
Speaker: M. POTEKHIN (BROOKHAVEN NATIONAL LABORATORY)
• 239
The High Level Trigger software for the CMS experiment
The observation of Higgs bosons predicted in supersymmetric theories will be a challenging task for the CMS experiment at the LHC, in particular for its High Level trigger (HLT). A prototype of the High Level Trigger software to be used in the filter farm of the CMS experiment and for the filtering of monte carlo samples will be presented. The implemented prototype heavily uses recursive processing of a HLT tree and allows dynamic trigger definition. Firstly the general architecture and design choices as well as the timing performance of the system will be reviewed in the light of the DAQ constrains. Secondly, specific trigger implementations in the context of the object-oriented Reconstruction for CMS Analysis (ORCA) software will be detailed. Finally, the analysis for the selection of a CP even Higgs decaying in tau pairs will be presented. The Aforementioned analysis will illustrate the importance of the trigger strategies required to achieve the various physics analysis in CMS.
Speaker: O. van der Aa (INSTITUT DE PHYSIQUE NUCLEAIRE, UNIVERSITE CATHOLIQUE DE LOUVAIN)
• 240
Fast tracking for the ATLAS LVL2 Trigger
We present a set of algorithms for fast pattern recognition and track reconstruction using 3D space points aimed for the High Level Triggers (HLT) of multi-collision hadron collider environments. At the LHC there are several interactions per bunch crossing separated along the beam direction, z. The strategy we follow is to (a) identify the z-position of the interesting interaction prior to any track reconstruction; (b) select groups of space points pointing back to this z-position, using a histogramming technique which avoids performing any combinatorics; and (c) proceed to the combinatorial tracking only within the individual groups of space points. The validity of this strategy will be demonstrated with results in terms of timing and physics performance for the LVL2 trigger of ATLAS at the LHC, although the strategy is generic and can be applied to any multi-collision hadron collider experiment. In addition, the algorithms are conceptually simple, flexible and robust and hence appropriate for use in demanding, online environments. We will also make qualitative comparisons with an alternative, complimentary strategy, based on the use of look-up tables for handling combinatorics, that has been developed for the ATLAS LVL2 trigger. These algorithms have been used for the results that appear in the ATLAS HLT, DAQ and Controls Technical Design Report, which was recently approved by the LHC Committee.
Speaker: Dr N. Konstantinidis (UNIVERSITY COLLEGE LONDON)
• 4:00 PM
Coffee break
• 241
Implementation and Performance of the High-Level Trigger electron and photon selection for the ATLAS experiment at the LHC
The ATLAS experiment at the Large Hadron Collider (LHC) will face the challenge of efficiently selecting interesting candidate events in pp collisions at 14 TeV center- of-mass energy, whilst rejecting the enormous number of background events, stemming from an interaction rate of about 10^9 Hz. The Level-1 trigger will reduce the incoming rate to around O(100 kHz). Subsequently, the High-Level Triggers (HLT), which are comprised of the second level trigger and the event filter, will need to reduce this rate further by a factor of O(10^3). The HLT selection is software based and will be implemented on commercial CPUs using a common framework, which is based on the standard ATLAS object-oriented software architecture. In this talk an overview of the current implementation of the selection for electrons and photons in the trigger is given. The performance of this implementation has been evaluated using Monte Carlo simulations in terms of the efficiency for the signal channels, the rate expected for the selection, the data preparation times, and the algorithm execution times. Besides the efficiency and rate estimates, some physics examples will be discussed, showing that the triggers are well adapted for the physics programme envisaged at LHC. The electron/gamma trigger software has been also integrated in the ATLAS 2004 combined test-beam, to validate the chosen selection architecture in a real on-line environment.
Speaker: Manuel Dias-Gomez (University of Geneva, Switzerland)
• 242
Event Data Model in ATLAS
The event data model (EDM) of the ATLAS experiment is presented. For large collaborations like the ATLAS experiment common interfaces and data objects are a necessity to insure easy maintenance and coherence of the experiments software platform over a long period of time. The ATLAS EDM improves commonality across the detector subsystems and subgroups such as trigger, test beam reconstruction, combined event reconstruction, and physics analysis. The object oriented approach in the description of the detector data allows the possibility to have one common raw data flow. Furthermore the EDM allows the use of common software between online data processing and offline reconstruction. One important component of the ATLAS EDM is a common track class which is used for combined track reconstruction across the innermost tracking subdetectors and is also used for tracking in the muon detectors. The structure of the track object and the variety of track parameters are presented. For the combined event reconstruction a common particle class is introduced which serves as the interface between event reconstruction and physics analysis.
Speaker: Edward Moyse
• 243
How to build an event store - the new Kanga Event Store for BaBar
In the past year, BaBar has shifted from using Objectivity to using ROOT I/O as the basis for our primary event store. This shift required a total reworking of Kanga, our ROOT-based data storage format. We took advantage of this opportunity to ease the use of the data by supporting multiple access modes that make use of many of the analysis tools available in ROOT. Specifically, our new event store supports: 1) the pre-existing separated transient + persistent model, 2) a transient based load-on-demand model currently being developed, 3) direct access to persistent data classes in compiled code, 4) fully interactive access to persistent data classes from either the ROOT prompt and via interpreted macros. We will describe key features of Kanga including: 1) the separation and management of transient and persistent representations of data, 2) the implementation of read on demand references in ROOT, 3) the modular and extensible persistent event design, 4) the implementation of schema evolution and 5) BaBar specific extensions to core ROOT classes that we used to preserve the end-user "feel" of ROOT.
Speaker: Dr M. Steinke (Ruhr Universitaet Bochum)
• 244
Using the reconstruction software, ORCA, in the CMS datachallenge
We report on the software for Object-oriented Reconstruction for CMS Analysis, ORCA. It is based on the Coherent Object-oriented Base for Reconstruction, Analysis and simulation (COBRA) and used for digitization and reconstruction of simulated Monte-Carlo events as well as testbeam data. For the 2004 data challenge the functionality of the software has been extended to store collections of reconstructed objects (DST) as well as the previously storable quantities (Digis) in multiple, parallel streams. We describe the structure of the DST, the way to ensure and store the configuration of reconstruction algorithms that fill the collections of reconstructed objects as well as the relations between them. Also the handling of multiple streams to store parts of selected events is discussed. The experience from the implementation used early 2004 and the modifications for future optimization of reconstruction and analysis are presented.
Speaker: