These are the Web pages providing information for the upcoming Computing in High Energy Physics (CHEP) conference in September 2004.
CHEP conferences provide an international forum to exchange information on computing experience and needs for the High Energy Physics community, and to review recent, ongoing and future activities.
CHEP conferences are held every 18 months, the previous one being held in San Diego in March 2003.
"Where are your Wares"
Computing in the broadest sense has a long history, and Babbage (1791-1871),
Hollerith (1860-1929), Zuse (1910-1995), many other early pioneers, and the wartime
code breakers, all made important breakthroughs. CERN was founded as the first
valve-based digital computers were coming onto the market.
I will consider 50 years of Computing at CERN from the following viewpoints:
Where did we come from? What happened? Who was involved? Which wares (hardware,
software, netware, peopleware and now middleware) were important? Where did
computers (not) end up in a physics lab? What has been the impact of computing on
particle physics? What about the impact of particle physics computing on other
sciences? And the impact of our computing outside the scientific realm?
I hope to conclude by looking at where we are going, and by reflecting on why
computing is likely to remain challenging for a long time yet.
The topic is so vast that my remarks are likely to be either prejudiced or trivial,
50 years of Computing at CERN
Run II computing
In support of the Tevatron physics program, the Run II experiments have
developed computing models and hardware facilities to support data sets at
the petabyte scale, currently corresponding to 500 pb^-1 of data and over 2
years of production operations. The systems are complete from online
data collection to user analysis, make extensive use of central services
and common solutions developed with the FNAL CD and experiment collaborating
institutions, and draw on global facilities to meet the computing needs.
We describe the similarities and differences between computing on CDF and
D0 while describing solutions for databases and database servers, data
handling, movement and storage, and job submission mechanisms. The
facilities for production computing and analysis and the use of commodity
fileservers will also be described. Much of the knowledge gained from
providing computing at this scale can be abstracted and applied to design
and planning for future experiments with large scale computing.
(FERMI NATIONAL ACCELERATOR LABORATORY)
Plenary: Session 2, Kongress-Saal
Computing for Belle
The Belle experiment operates at the KEKB accelerator, a high
luminosity asymmetric energy e+ e- machine. KEKB has achieved the
world's highest luminosity of 1.39 x 10^34 cm^-2 s^-1. Belle
accumulates more than 1 million B Bbar pairs in one good day.
This corresponds to about 1.2 TB of raw data per day. The amount of
the raw and processed data accumulated so far exceeds 1.4 PB.
Belle's computing model has been a traditional one and very
successful so far. The computing has been managed by a minimal number
of people using cost-effective solutions. Looking to the future,
KEKB/Belle plans to improve the luminosity to a few times 10^35
cm^-2 s^-1, 10 times as much as we obtain now. This presentation
describes Belle's efficient computing operations, the struggle to
manage large amounts of raw and physics data, and plans for
Belle computing for Super KEKB/Belle.
BaBar computing - From collisions to physics results
The BaBar experiment at SLAC studies B-physics at the Upsilon(4S) resonance using
the high-luminosity e+e- collider PEP-II at the Stanford Linear Accelerator Center
(SLAC). Taking, processing and analyzing the very large data samples is a
significant computing challenge.
This presentation will describe the entire BaBar computing chain and illustrate
the solutions chosen as well as their evolution with the ever higher luminosity
being delivered by PEP-II. This will include data acquisition and software
triggering in a high-availability, low-deadtime online environment, a prompt,
automated calibration pass through the data at SLAC, and then the full reconstruction of
the data, which takes place at INFN-Padova within 24 hours. Monte Carlo production
takes place in a highly automated fashion at 25+ sites. The resulting real and
simulated data is distributed and made available at SLAC and other computing centers.
For analysis a much more sophisticated skimming pass has been introduced in the
past year, along with a reworked eventstore. This allows 120 highly customized
analysis-specific skims to be produced for direct use by the analysis groups. This
skim data format is the same eventstore data as that produced directly by the data
and Monte Carlo productions and can be handled and distributed in the same way.
The total data volume in BaBar is about 1.5 PB.
Concepts and technologies used in contemporary DAQ systems
The concepts and technologies applied in data acquisition systems have changed
dramatically over the past 15 years. Generic DAQ components and standards such as
CAMAC and VME have largely been replaced by dedicated FPGA and ASIC boards, and
dedicated real-time operating systems like OS9 or VxWorks have given way to Linux-
based trigger processor and event building farms. We have also seen a shift from
standard or proprietary bus systems used in event building to Gigabit networks and
commodity components, such as PCs. With the advances in processing power, network
throughput, and storage technologies, today's data rates in large experiments
routinely reach hundreds of megabytes per second.
We will present examples of contemporary DAQ systems from different experiments, try
to identify or categorize new approaches, and will compare the performance and
throughput of existing DAQ systems with the projected data rates of the LHC
experiments to see how close we have come to accomplishing these goals. We will also
try to look beyond the field of High-Energy Physics and see if there are trends and
technologies out there which are worth keeping an eye on.
(Brookhaven National Laboratory)
Current Status of Fabric Management at CERN
This paper describes the evolution of fabric management at CERN's T0/T1 Computing
Center, from the selection and adoption of prototypes produced by the European
DataGrid (EDG) project to enhancements made to them.
In the last year of the EDG project, developers and service managers have been
working to understand and solve operational and scalability issues.
CERN has adopted and strengthened Quattor, EDG's installation and configuration
management toolsuite, for managing all Linux clusters and servers in the Computing
Center, replacing existing legacy management systems. Enhancements to the original
prototype include a redundant and scalable server architecture using proxy
technology and producing plug-in components for configuring system and LHC computing
CERN now coordinates the maintenance of Quattor, making it available to other sites.
Lemon, the EDG fabric monitoring framework, has been progressively deployed onto
all managed Linux nodes. We have developed sensors to instrument fabric nodes to
provide us with complete performance and exception monitoring information.
Performance visualization displays and interfaces to the existing alarm system have
also been provided.
LEAF, the LHC-Era Automated Fabric toolset, comprises the State Management
System, a tool to enable high-level configuration commands to be issued to sets of
nodes during both hardware and service management Use Cases, and the Hardware
Management System, a tool for administering hardware workflows and for visualizing
and locating equipment.
Finally, we will describe issues currently being addressed and planned future
Developing & Managing a large Linux farm - the Brookhaven Experience
This presentation describes the experiences and the
lessons learned by the RHIC/ATLAS Computing Facility
(RACF) in building and managing its 2,700+ CPU (and growing)
Linux Farm over the past 6+ years. We describe how
hardware cost, end-user needs, infrastructure,
footprint, hardware configuration, vendor selection,
software support and other considerations have
played a role in the process of steering the growth
of the RACF Linux Farm, and how they help shape our
future hardware purchase decisions. We also provide a detailed description
of the challenges encountered and of the solutions used in managing
and configuring a large, heterogeneous Linux Farm
(2,700+ CPUs) in the midst of an ongoing transition
from being a generally local resource to a global,
Grid-aware resource within a larger, distributed
computing environment.
Lattice QCD Clusters at Fermilab
As part of the DOE SciDAC "National Infrastructure for Lattice Gauge
Computing" project, Fermilab builds and operates production clusters for
lattice QCD simulations. We currently operate three clusters: a 128-node dual
Xeon Myrinet cluster, a 128-node Pentium 4E Myrinet cluster, and a 32-node
dual Xeon Infiniband cluster. We will discuss the operation of these systems
and examine their performance in detail. We will describe the uniform user
runtime environment emerging from the SciDAC collaboration.
The design of lattice QCD clusters requires careful attention towards
balancing memory bandwidth, floating point throughput, and network
performance. We will discuss our investigations of various commodity
processors, including Pentium 4E, Xeon, Itanium2, Opteron, and PPC970, in
terms of their suitability for building balanced QCD clusters. We will also
discuss our early experiences with the emerging Infiniband and PCI Express
architectures. Finally, we will examine historical trends in price to
performance ratios of lattice QCD clusters, and we will present our
predictions and plans for future clusters.
ScotGrid: A prototype Tier 2 centre
ScotGrid is a prototype regional computing centre formed as a collaboration between
the universities of Durham, Edinburgh and Glasgow as part of the UK's national
particle physics grid, GridPP. We outline the resources available at the three core
sites and our optimisation efforts for our user communities. We discuss the work
which has been conducted in extending the centre to embrace new projects both from
particle physics and new user communities and explain our methodology for doing this.
A Regional Analysis Center at the University of Florida
The High Energy Physics Group at the University of Florida is involved in a variety
of projects ranging from High Energy Experiments at hadron and electron-positron
colliders to cutting edge computer science experiments focused on grid computing.
In support of these activities members of the Florida group have developed and
deployed a local computational facility which consists of several service nodes,
computational clusters and disk storage services. The resources contribute
collectively or individually to a variety of production and development activities
such as the UFlorida Tier2 center for the CMS experiment at the Large Hadron
Collider (LHC), Monte Carlo production for the CDF experiment at Fermilab, the
CLEO experiment, and research on grid computing for the GriPhyN and iVDGL projects.
The entire collection of servers, clusters and storage services is managed as a
single facility using the ROCKS cluster management system. Managing the facility as
a single centrally managed system enhances our ability to relocate and reconfigure
the resources as necessary in support of both research and production activities.
In this paper we describe the architecture deployed, including details on our local
implementation of the ROCKS system, how this simplifies the maintenance and
administration of the facility, and finally the advantages and disadvantages of
using such a scheme to manage a modest-size facility.
(UNIVERSITY OF FLORIDA)
CHOS, a method for concurrently supporting multiple operating system
Supporting multiple large collaborations on shared compute
farms has typically resulted in divergent requirements from the
users on the configuration of these farms. As the frameworks used
by these collaborations are adapted to use Grids, this issue will likely
have a significant impact on the effectiveness of Grids.
To address these issues, a method was developed at Lawrence Berkeley National
Lab and is being used in production on the PDSF cluster. This method, termed
CHOS, uses a combination of a Linux kernel module, the chroot (change
root) system call, and several utilities to provide access to
multiple Linux distributions and versions concurrently on a
single system. This method will be presented, along with an explanation
of how it is integrated into the login process, grid services,
and batch scheduler systems. We will also describe how a distribution
is installed and configured to run in this environment and explore
some common problems that arise. Finally, we will relate our experience
in deploying this framework on a production cluster used by several
high energy and nuclear physics collaborations.
(NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER)
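As a rough illustration of the change-root idea described in this abstract, the following Python sketch maps a user's requested environment to an installed distribution tree and then switches into it. It is not the CHOS implementation: the kernel module is not modelled, and the variable name OS_CHOICE, the table DISTRO_ROOTS and its paths are assumptions made for this example only.

```python
import os

# Hypothetical table mapping an environment label to the directory that
# holds an installed distribution tree. The paths are illustrative only.
DISTRO_ROOTS = {
    "rh73": "/os/redhat-7.3",
    "sl3": "/os/scientific-3",
}


def resolve_root(env, roots=DISTRO_ROOTS, default="/"):
    """Return the root directory for the environment requested in `env`.

    `env` is a mapping such as os.environ; the variable name OS_CHOICE is
    an assumption made for this sketch.
    """
    choice = env.get("OS_CHOICE")
    if choice is None:
        return default
    if choice not in roots:
        raise KeyError(f"unknown environment: {choice!r}")
    return roots[choice]


def enter_root(root, argv):
    """Switch the process into `root` and replace it with the command `argv`.

    Needs root privileges. os.chroot, os.chdir and os.execvp are the
    standard library wrappers around the corresponding system calls.
    """
    if root != "/":
        os.chroot(root)
        os.chdir("/")
    os.execvp(argv[0], argv)
```

A login hook or batch wrapper could call resolve_root(os.environ) and pass the result to enter_root together with the user's shell or job command; only the second step requires privileges.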
Network Information and Management Infrastructure Project
Management of a large site network such as the FNAL LAN presents many
technical and organizational challenges. This highly dynamic network
consists of around 10 thousand network nodes. The nature of the
activities FNAL is involved in and its computing policy
require that the network remain as open as reasonably possible,
both in terms of connectivity to outside networks and with
respect to the procedural simplicity of joining the network for temporary
network participants such as visitors' notebook computers.
The goal of the Network Information and Management Infrastructure
project at FNAL is to build software infrastructure which would help
network management and computer security teams organize monitoring
and management of the network, simplify communication between
these entities and users, and integrate network management into
the FNAL computer center management infrastructure.
Primary authors: Phil DeMar (FNAL), Igor Mandrichenko (FNAL),
Don Petravick (FNAL), Dane Skow (FNAL)
Implementation of a reliable and expandable on-line storage for compute clusters
The HEP experiments that use the regional center GridKa will handle
large amounts of data. Traditional access methods via local disks or
large network storage servers show limitations in size, throughput or
data management flexibility.
High speed interconnects like Fibre Channel, iSCSI or Infiniband as
well as parallel file systems are becoming increasingly important in
large cluster installations to offer the scalable size and throughput
needed for PetaByte storage. At the same time the reliable and proven
NFS protocol allows local area storage access via traditional Ethernet
very cost effectively.
The cluster at GridKa uses the General Parallel File System (GPFS) on
a 20 node file server farm that connects to over 1000 FC disks via a
Storage Area Network. The 130 TB on-line storage is distributed to the
390 node cluster via NFS. A load balancing system ensures an even load
distribution and additionally allows for on-line file server exchange.
Discussed are the components of the storage area network, specific
Linux tools, and the construction and optimisation of the cluster file
system along with the RAID groups. High availability is obtained, and
measurements show high throughput under different conditions. The use of
the file system administration and management possibilities is presented
as is the implementation and effectiveness of the load balancing system.
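The load balancing described in this abstract can be pictured with a small Python sketch that spreads cluster nodes evenly over the file servers and leaves out any server that has been taken out for exchange. This is only an assumed least-loaded policy for illustration; the abstract does not say how the GridKa load balancer actually decides, and the function and node names are hypothetical.

```python
def assign_clients(clients, servers, draining=()):
    """Spread clients over file servers, skipping servers being drained.

    Returns (assignment, load): assignment maps client -> server and
    load maps each active server -> number of clients it serves.
    """
    drained = set(draining)
    active = [s for s in servers if s not in drained]
    if not active:
        raise ValueError("no active file servers")
    load = {s: 0 for s in active}
    assignment = {}
    for client in clients:
        # Pick the least loaded server; ties go to the first in the list.
        target = min(active, key=lambda s: load[s])
        assignment[client] = target
        load[target] += 1
    return assignment, load
```

With the figures quoted in the abstract (390 cluster nodes, 20 file servers) every server ends up with 19 or 20 clients; listing one server in draining redistributes its share over the remaining 19.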
Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing
Gfarm v2 is designed for facilitating reliable file sharing and
high-performance distributed and parallel data computing in a Grid
across administrative domains by providing a Grid file system. A Grid
file system is a virtual file system that federates multiple file
systems. It is possible to share files or data by mounting the
virtual file system. This paper discusses the design and
implementation of secure, robust, scalable and high-performance Grid
The most time-consuming, but also the most typical, task in data
computing in fields such as high energy physics, astronomy, space exploration
and human genome analysis is to process a set of files in the same way.
Such a process can typically be performed independently on every file
in parallel, or at least has good locality. Gfarm v2
supports high-performance distributed and parallel computing for such
a process by introducing a "Gfarm file", a new "file-affinity"
scheduling based on file locations, and new parallel file access
semantics. An arbitrary group of files possibly dispersed across
administrative domains can be managed as a single Gfarm file. Each
member file will be accessed in parallel in a new file view called
"local file view" by a parallel process possibly allocated by
file-affinity scheduling based on replica locations of the member
files. File-affinity scheduling and the new file view enable the "owner
computes" strategy, or "move the computation to data" approach, for
parallel and distributed data computing of member files of a Gfarm
file in a single system image.
(GRID TECHNOLOGY RESEARCH CENTER, AIST)
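The file-affinity idea can be sketched in a few lines of Python: each member file gets one process, placed on a node that already holds a replica whenever such a node is available. This is a simplified illustration under assumed data structures, not the Gfarm v2 scheduler; the function name and the greedy least-loaded tie-break are choices made for this example.

```python
def file_affinity_schedule(replicas, nodes):
    """Assign one process per member file, preferring a node with a replica.

    replicas: dict mapping member file name -> list of nodes holding a copy
    nodes:    list of nodes available for computation
    Returns a dict mapping file name -> (node, is_local).
    """
    load = {n: 0 for n in nodes}
    plan = {}
    for name in sorted(replicas):
        # Nodes that hold a replica and are available for computation.
        local = [n for n in replicas[name] if n in load]
        candidates = local or nodes
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        plan[name] = (node, bool(local))
    return plan
```

A file whose replicas all sit on unavailable nodes falls back to the least loaded node and is flagged as non-local, which is the case where data would have to move to the computation instead.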
Performance analysis of Cluster File System on Linux
With the development of Linux and the improvement of PC performance, PC clusters used
as high performance computing systems are becoming increasingly popular. The performance of
the I/O subsystem and the cluster file system is critical to a high performance computing
system. In this work the basic characteristics of cluster file systems and their
performance are reviewed. The performance of four distributed cluster file systems,
AFS, NFS, PVFS and CASTOR, was measured. The measurements were carried out on the CERN
version of RedHat 7.3.3 Linux using standard I/O performance benchmarks. The measurements
show that, for a single-server, single-client configuration, NFS, CASTOR and PVFS perform
better, and that the write rate increases slightly as the record length becomes
larger. CASTOR has the best throughput when the number of write processes increases.
PVFS and CASTOR were also tested on a multi-server, multi-client system. The two file
systems distribute data I/O well across all servers. The CASTOR RFIO protocol shows the
best utilization of network bandwidth and is optimized for large files. CASTOR
also has the better scalability as a cluster file system. Based on the tests, some
methods are proposed to improve the performance of cluster file systems.
(COMPUTING CENTER, INSTITUTE OF HIGH ENERGY PHYSICS, CHINESE ACADEMY OF SCIENCES)
Chimera - a new, fast, extensible and Grid enabled namespace service
After the successful implementation and deployment of the dCache system
over the last years, one of the additional required services, the
namespace service, faces additional and completely new
requirements. Most of these are caused by scaling the system, the
integration with Grid services and the need for redundant (high
availability) configurations. The existing system, having only an
NFSv2 access path, is easy to understand and well accepted by the
users. This single 'access path' limits data management tasks to the
use of classical tools like 'find', 'ls' and others. This is intuitive
for most users, but fails when dealing with millions of entries
(files) and more sophisticated organizational schemes (metadata). The
new system should support a native programmable interface (deep
coupled, but fast), the 'classical' NFS path (now version 3 at
a dCache native access and the SQL path allowing any type of metadata
to be used in complex queries. Extensions with other 'access paths'
will be possible. Based on the experience with the current system we
highlight the following requirements:
- large file support (64 Bit) + large number of files (> 10^8)
- Platform independence (runtime + persistent objects)
- Grid name service integration
- custom dCache integration
- redundant, highly available runtime configurations (concurrent
- user usable metadata (store and query)
- ACL support
- pluggable authentication (e.g. GSSAPI)
- external processes can register for namespace events (e.g.
The presentation will show a detailed analysis of the requirements,
the chosen design and selection of existing components. The current
schedule should allow us to show the first prototype results.
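To make the "SQL path" requirement concrete, here is a minimal Python sketch of a namespace kept in a relational database, with user metadata stored as key/value pairs that can be queried directly. The schema, table names and helper functions are assumptions made for this illustration and are not the Chimera design; SQLite is used only because it ships with Python.

```python
import sqlite3


def make_namespace():
    """Create an in-memory namespace database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entries (
            id   INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL,
            size INTEGER NOT NULL      -- SQLite integers hold 64-bit values
        );
        CREATE TABLE metadata (
            entry_id INTEGER NOT NULL REFERENCES entries(id),
            key      TEXT NOT NULL,
            value    TEXT NOT NULL
        );
        CREATE INDEX metadata_kv ON metadata(key, value);
    """)
    return conn


def add_file(conn, path, size, **tags):
    """Register a file and attach user-defined metadata tags to it."""
    cur = conn.execute(
        "INSERT INTO entries(path, size) VALUES (?, ?)", (path, size)
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO metadata(entry_id, key, value) VALUES (?, ?, ?)",
        [(entry_id, k, str(v)) for k, v in tags.items()],
    )
    return entry_id


def find_by_tag(conn, key, value):
    """Return the paths of all entries carrying the given metadata tag."""
    rows = conn.execute(
        "SELECT e.path FROM entries e "
        "JOIN metadata m ON m.entry_id = e.id "
        "WHERE m.key = ? AND m.value = ? ORDER BY e.path",
        (key, str(value)),
    )
    return [row[0] for row in rows]
```

A query such as find_by_tag(conn, "run", 1234) replaces a directory walk with 'find', which is the kind of operation the abstract says breaks down at millions of entries.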
A Dynamically Reconfigurable Data Stream Processing System
The paper describes a component-based framework for data stream processing that
allows for configuration, tailoring, and run-time system reconfiguration. The
system’s architecture is based on a pipes and filters pattern, where data is passed
through routes between components. Components process data and add, substitute,
and/or remove named data items from a data stream. They can also manipulate data
streams by buffering data, compressing/decompressing individual streams, and
combining, splitting, or synchronizing multiple data streams. Configurable general-
purpose filters for manipulating streams, visualizing data, persisting data, and
reading data from various standard data sources are supplemented with many
application specific filters, such as DSP, scripting, or instrumentation-specific
components. A network of pipes and filters can be dynamically reconfigured at run-
time, in response to a preplanned sequence of processing steps, operator
intervention, or a change in one or more data streams. Four distinctive methods
supporting reconfiguration are provided by the framework: modification of data
routes, management of components’ activity states, triggering processing based on
the content of the data, or the use of source addressing in components. The
framework can be used to build static data stream processing applications such as
monitoring or data acquisition systems as well as self-adjusting systems that would
adapt their processing algorithm, presentation layer, or data persistency layer in
response to changes in input data streams.
(FERMI NATIONAL ACCELERATOR LABORATORY)
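A minimal Python sketch of the pipes and filters pattern may help to picture two of the reconfiguration methods named in this abstract: changing data routes and changing a component's activity state at run time. The class and function names are hypothetical and the sketch is far simpler than the framework the paper describes.

```python
class Component:
    """A filter that transforms a dict of named data items."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.active = True   # inactive components pass data through unchanged
        self.route = None    # next component on the data route, if any

    def push(self, item):
        """Process one item and hand the result to the next component."""
        out = self.func(dict(item)) if self.active else item
        if out is None:      # a filter may drop an item entirely
            return None
        return self.route.push(out) if self.route else out


def connect(*components):
    """Chain components into one route and return the head of the chain."""
    for upstream, downstream in zip(components, components[1:]):
        upstream.route = downstream
    components[-1].route = None
    return components[0]
```

Calling connect again with a different order or subset rewires the route while the components keep their state, and toggling active switches a processing step off without touching the route.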
The SEAL Component Model
This paper describes the component model that has been developed in the context of
the LCG/SEAL project. This component model is an attempt to handle the increasing
complexity in the current data processing applications of LHC experiments. In
addition, it should facilitate the software re-use by the integration of software
components from LCG and non-LCG into the experiment's applications. The component
model provides the basic mechanisms and base classes that facilitate the
decomposition of the whole C++ object-oriented application into a number of run-time
pluggable software modules with well defined generic behavior, inter-component
interaction protocols, run-time configuration and user customization.
This new development is based on the ideas and practical experiences of the various
software frameworks in use by the different LHC experiments for several years. The
design and implementation choices will be described and the practical experiences
and difficulties in adopting this model to existing experiment software systems will
The SEAL C++ Reflection System
The C++ programming language has very limited capabilities for
reflection information about its objects. In this paper a new reflection
system will be presented, which allows complete introspection of C++
objects and has been developed in the context of the CERN/LCG/SEAL
project in collaboration with the ROOT project.
The reflection system consists of two different parts. The first part is
a code generator that automatically produces reflection information from
existing C++ classes. This generation of the reflection information is
done in a non-intrusive way, which means that the original C++ class
definitions do not need to be changed or instrumented. The second part of
the reflection system is able to load/build this information in memory
and provides an API to the user.
The user can query reflection information from any C++ class and also
interact generically with the objects, like invocation of functions,
setting and getting data members or constructing and deleting objects.
When designing the different packages, care was taken to keep
dependencies on external software minimal and to allow the software to
be ported to different platforms/compilers.
A quick overview of the current implementation in use by the LCG SEAL
and POOL projects will be given. A more detailed description of the new
model, which aims to reflect the complete C++ language and to be a
common reflection system used also by the ROOT framework, will be given.
Reflection-Based Python-C++ Bindings
Python is a flexible, powerful, high-level language with excellent
interactive and introspective capabilities and a very clean syntax. As
such it can be a very effective tool for driving physics analysis.
Python is designed to be extensible in low-level C-like languages, and
its use as a scientific steering language has become quite widespread.
To this end, existing and custom-written C or C++ libraries are bound
to the Python environment as so-called extension modules. A number of
tools for easing the process of creating such bindings exist, such as
SWIG or Boost.Python. Yet, the process still requires a
considerable amount of effort and expertise.
The C++ language has few built-in introspective capabilities, but
tools such as LCGDict and CINT add this by providing so-called
dictionaries: libraries that contain information about the names,
entry points, argument types, etc. of other libraries.
The reflection information from these dictionaries can be used for the
creation of bindings and so the process can be fully automated, as
dictionaries are already provided for many end-user libraries for
other purposes, such as object persistency.
PyLCGDict is a Python extension module that uses LCG dictionaries, as
PyROOT uses CINT reflection information, to allow Python users to
access C++ libraries with essentially no preparation on the users'
part. In addition, and in a similar way, PyROOT gives ROOT users
access to Python libraries.
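The principle of generating bindings from a dictionary can be illustrated with a short, self-contained Python sketch. Here the "dictionary" is a plain table of names, entry points and argument types, and the entry points are ordinary Python callables standing in for compiled C++ symbols. None of this is the PyLCGDict or PyROOT API; all names are assumptions made for the example.

```python
import math
from types import SimpleNamespace

# Stand-in for a reflection dictionary: name -> (entry point, argument types).
# A real dictionary would describe compiled C++ symbols; here the entry
# points are Python callables so that the sketch runs on its own.
DICTIONARY = {
    "hypot": (math.hypot, (float, float)),
    "ipow": (pow, (int, int)),
}


def make_wrapper(name, entry, argtypes):
    """Build a callable that checks and converts arguments before the call."""
    def wrapper(*args):
        if len(args) != len(argtypes):
            raise TypeError(f"{name}() takes {len(argtypes)} arguments")
        converted = [t(a) for t, a in zip(argtypes, args)]
        return entry(*converted)
    wrapper.__name__ = name
    return wrapper


def bind(dictionary):
    """Return a module-like object with one wrapper per dictionary entry."""
    return SimpleNamespace(**{
        name: make_wrapper(name, entry, argtypes)
        for name, (entry, argtypes) in dictionary.items()
    })
```

Because the wrappers are produced from the table alone, adding a library only means loading its dictionary; no hand-written glue code is needed per function.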
AIDA, JAIDA and AIDAJNI: Data Analysis using interfaces
AIDA, Abstract Interfaces for Data Analysis, is a set of abstract interfaces
for data analysis components: Histograms, Ntuples, Functions, Fitter,
Plotter and other typical analysis categories. The interfaces are currently
defined in Java, C++ and Python and implementations exist in the form of
libraries and tools using C++ (Anaphe/Lizard, OpenScientist), Java (Java
Analysis Studio) and Python (PAIDA).
JAIDA is the full implementation of AIDA in Java. It is used internally by
JAS3 as its analysis core but it can also be used independently for either
batch or interactive processing, or for web applications to access data,
make plots and simple data analysis through a browser. Some of the JAIDA
features are the ability to open AIDA, ROOT and PAW files and the support of
an extensible set of fit methods (chi-square, least squares, binned/unbinned
likelihood, etc) to be matched with an extensible set of optimizers
including Minuit and Uncmin.
AIDAJNI is glue code between C++ and Java that allows any C++ code to access
any Java implementation of the AIDA interfaces. For example AIDAJNI is used
with Geant4 to access the JAIDA implementation of AIDA.
This paper gives an update on the AIDA 3.2.1 interfaces and their
corresponding JAIDA implementation. Examples will be provided on how to use
JAIDA within JAS3, as a standalone library and from C++ using AIDAJNI.
Go4 analysis design
The GSI online-offline analysis system Go4 is a ROOT based framework for medium
energy ion- and nuclear physics experiments. Its main features are a multithreaded
online mode with a non-blocking Qt GUI, and abstract user interface classes to set
up the analysis process itself which is organised as a list of subsequent analysis
steps. Each step has its own event objects and a processor instance. It can handle
its event i/o independently. It can be set up by macros or by a generic GUI. With
respect to the more complex experiments planned at GSI, a configurable network of
steps is required. Multiple IO channels per step and multiple references to steps
can be set up by macros or via the generic GUI. The required mechanisms are provided by
an upgrade of the Go4 analysis step manager using the new ROOT TTasks. Support for
IO configuration and references across the task tree is provided.
OpenScientist. Status of the project.
We present the status of this project.
After briefly recalling the basic choices around GUI, visualization
and scripting, we describe what has been done in order to
have an AIDA-3.2.1 compliant system, to visualize Geant4 data (G4Lab module),
to visualize ROOT data (Mangrove module), to provide a HippoDraw module,
and to run on Mac OS X using the native
NextStep (Cocoa) environment.
G. Barrand
(CNRS / IN2P3 / LAL)
The Athena Control Framework in Production, New Developments and Lessons Learned
Athena is the ATLAS Control Framework, based on the common Gaudi architecture,
originally developed by LHCb. In 2004 two major production efforts, the Data
Challenge 2 and the Combined Test-beam reconstruction and analysis were structured as
Athena applications. To support the production work we have added new features to
both Athena and Gaudi: an "Interval of Validity" service to manage time-varying
conditions and detector data; a History service, to manage the provenance information
of each event data object; and a toolkit to simulate and analyze the overlay of
multiple collisions during the detector sensitive time (pile-up). To support the
analysis of simulated and test-beam data in Athena we have introduced a Python-based
scripting interface, based on the CERN LCG tools PyLCGDict, PyROOT and PyBus. The
scripting interface allows users to fully configure any Athena component, interactively
browse and modify this configuration, as well as examine the content of any data
object in the event or detector store.
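The notion of an interval of validity can be illustrated with a small Python sketch: each conditions payload is valid from its start time until the next payload begins, and a lookup finds the payload in force at a given time. The class below is a hypothetical stand-in written for this example and does not reproduce the Athena or Gaudi service interface.

```python
import bisect


class IntervalOfValidityStore:
    """Maps a time stamp to the conditions payload valid at that time."""

    def __init__(self):
        self._starts = []     # sorted interval start times
        self._payloads = []   # payload valid from the matching start time

    def add(self, start, payload):
        """Register a payload valid from `start` until the next start time."""
        i = bisect.bisect_left(self._starts, start)
        if i < len(self._starts) and self._starts[i] == start:
            self._payloads[i] = payload      # replace an existing interval
        else:
            self._starts.insert(i, start)
            self._payloads.insert(i, payload)

    def lookup(self, time):
        """Return the payload whose interval contains `time`."""
        i = bisect.bisect_right(self._starts, time) - 1
        if i < 0:
            raise LookupError(f"no conditions valid at {time}")
        return self._payloads[i]
```

Keeping the start times sorted makes each lookup a binary search, so an event loop can ask for the calibration valid at every event time stamp without scanning all intervals.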
The AliRoot framework, status and perspectives
The ALICE collaboration at the LHC has been developing an OO offline framework, written entirely in C++, since 1998.
In 2001 a GRID system (AliEn - ALICE Environment) was added and successfully integrated with ROOT
and the offline. The resulting combination allows ALICE to do most of the design of the detector and test the
validity of its computing model by performing large scale Data Challenges, using OO technology in a
distributed framework. The early migration of all ALICE users to C++ and the adoption of advanced software
development techniques are two of the strong points of the ALICE offline strategy. The offline framework is
heavily based on virtual interfaces, which allows the use of different generators and even different Monte-
Carlo transport codes with no change in the framework or the scoring, reconstruction and analysis code. This
talk presents a review of the development path, current status and future perspectives of the ALICE Offline
Composite Framework for CMS Applications
We present a composite framework which exploits the advantages of
the CMS data model and uses a novel approach for building CMS
simulation, reconstruction, visualisation and future analysis
applications. The framework exploits LCG SEAL and CMS COBRA plug-ins
and extends the COBRA framework to pass communications between the
GUI and event threads, using SEAL callbacks to navigate through the
metadata and event data interactively in a distributed environment.
We give examples of current applications based on this framework,
including CMS test-beams, geometry description debugging, GEANT4
simulation, event reconstruction, and the verification of
reconstruction and higher level trigger algorithms.
(Northeastern University, Boston, USA)
IceTray: a Software Framework for IceCube
IceCube is a cubic kilometer-scale neutrino telescope under construction at the South
Pole. The minimalistic nature of the instrument poses several challenges for the
software framework. Events occur at random times, and frequently overlap, requiring
some modifications of the standard event-based processing paradigm. Computational
requirements related to modeling the detector medium necessitate the ability for
software components to defer processing events. With minimal information from the
detector, events must be reconstructed many times with different hypotheses or
methods, and the results compared. The appropriate series of software components
required to process an event varies considerably, and can be determined only at run
time. Finally, reconstruction algorithms are constantly evolving, with development
taking place throughout the collaboration, so it is essential that conversion of
private analysis code to online production software be simple and, given the
inaccessibility of the experimental site, robust. The IceCube collaboration has
developed the IceTray framework, which meets these needs by blending aspects of push-
and pull-based architectures to produce a highly modular system which nevertheless
allows each software component a significant degree of control over the execution flow.
(UNIVERSITY OF MARYLAND)
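The idea of a component deferring events while still controlling the execution flow can be illustrated with a minimal sketch. It is not IceTray code: the names Module, BatchingModule and run_chain are hypothetical, and frames are plain dictionaries. Frames are pushed into the chain, but each module decides when a frame leaves its outbox, so a module may hold frames back until it has enough of them.

```python
from collections import deque


class Module:
    """A processing step that decides when frames are passed downstream."""

    def __init__(self):
        self.outbox = deque()

    def process(self, frame):
        # Default behaviour: forward each frame immediately.
        self.outbox.append(frame)

    def finish(self):
        # Called once at end of input; nothing is held back by default.
        pass


class BatchingModule(Module):
    """Defers frames until batch_size have arrived, then releases them together."""

    def __init__(self, batch_size):
        super().__init__()
        self.batch_size = batch_size
        self.pending = []

    def process(self, frame):
        self.pending.append(frame)
        if len(self.pending) >= self.batch_size:
            self._flush()

    def finish(self):
        self._flush()

    def _flush(self):
        for frame in self.pending:
            self.outbox.append(dict(frame, batched=True))
        self.pending.clear()


def run_chain(modules, frames):
    """Push frames through the chain; each module controls what leaves its outbox."""
    results = []

    def drain(index):
        # Hand everything module `index` has released to the next module.
        module = modules[index]
        while module.outbox:
            frame = module.outbox.popleft()
            if index + 1 < len(modules):
                modules[index + 1].process(frame)
                drain(index + 1)
            else:
                results.append(frame)

    for frame in frames:
        modules[0].process(frame)
        drain(0)
    for index, module in enumerate(modules):
        module.finish()
        drain(index)
    return results
```

Calling run_chain([BatchingModule(2), Module()], frames) delivers every frame in its original order, but only after the batching module has chosen to release it.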
The Offline Framework of the Pierre Auger Observatory
The Pierre Auger Observatory is designed to unveil the nature and the
origin of the highest energy cosmic rays. Two sites, one currently
under construction in Argentina, and another pending in the Northern
hemisphere, will observe extensive air showers using a hybrid detector
comprising a ground array of 1600 water Cerenkov tanks overlooked by
four atmospheric fluorescence detectors. Though the computing demands
of the experiment are less severe than those of traditional high
energy physics experiments in terms of data volume and detector
complexity, the large geographically dispersed collaboration and the
heterogeneous set of simulation and reconstruction requirements
confront the offline software with some special challenges.
We have designed and implemented a framework to allow collaborators to
contribute algorithms and sequencing instructions to build up the
variety of applications they require. The framework includes
machinery to manage these user codes, to organize the abundance of
user-contributed configuration files, to facilitate multi-format file
handling, and to provide access to event and time-dependent detector
information which can reside in various data sources. A number of
utilities are also provided, including a novel geometry package which
allows manipulation of abstract geometrical objects independent of
coordinate system choice. The framework is implemented in C++, follows
an object oriented paradigm, and takes advantage of some of the more
widespread tools that the open source community offers, while keeping
the user-side simple enough for C++ non-experts to learn in a
reasonable time. The distribution system includes unit and acceptance
testing in order to support rapid development of both the core
framework and contributed user code. Great attention has been paid to
the ease of installation.
(I. DE CIENCIAS NUCLEARES, UNAM)
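What "independent of coordinate system choice" can mean in practice is sketched below. This is an illustration only, not the Auger geometry package: CoordinateSystem and Point are hypothetical names, and a frame is assumed to be just a rotation about z plus a translation relative to a common root frame. A point stores its position once in the root frame and reports components in whichever system the caller names.

```python
import math


class CoordinateSystem:
    """A frame given by a rotation about z and a translation relative to a root frame."""

    def __init__(self, origin=(0.0, 0.0, 0.0), angle=0.0):
        self.origin = origin
        self.angle = angle

    def to_root(self, xyz):
        # Rotate by +angle, then translate by the frame origin.
        x, y, z = xyz
        c, s = math.cos(self.angle), math.sin(self.angle)
        ox, oy, oz = self.origin
        return (c * x - s * y + ox, s * x + c * y + oy, z + oz)

    def from_root(self, xyz):
        # Inverse of to_root: undo the translation, then rotate by -angle.
        ox, oy, oz = self.origin
        x, y, z = xyz[0] - ox, xyz[1] - oy, xyz[2] - oz
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (c * x + s * y, -s * x + c * y, z)


class Point:
    """An abstract point; components exist only relative to a named system."""

    def __init__(self, xyz, system):
        self._root = system.to_root(xyz)

    def coordinates(self, system):
        return system.from_root(self._root)

    def distance_to(self, other):
        # The distance needs no coordinate system to be specified.
        return math.dist(self._root, other._root)
```

User code written this way never needs to know in which system a point was originally defined.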
Distributed Computing Services (Theatersaal)
Don Quijote - Data Management for the ATLAS Automatic Production System
As part of the ATLAS Data Challenges 2 (DC2), an automatic production system was
introduced and with it a new data management component.
The data management tools used for previous Data Challenges were built as separate
components from the existing Grid middleware. These tools relied on a database of their
own which acted as a replica catalog.
With the extensive use of Grid technology expected for the most part of the DC2
production, no longer can a data management tool be independent of the Grid
middleware. Each Grid relies on its own replica catalog and not on an ATLAS specific
ATLAS DC will attempt to use uniformly the resources provided by three Grids:
NorduGrid, US Grid3 and LCG-2. Legacy systems will be supported as well.
The proposed solution was to build a data management proxy system which consists of a
common high-level interface, whose implementation depends on each Grid's replica and
metadata catalog as well as the storage backend (mainly "classic" GridFTP servers and
Don Quijote provides management of replicas in a services oriented architecture,
across the several "flavours" of Grid middleware used by ATLAS DC.
With a higher-level interface common across several Grids (and legacy systems) a user
(such as the new automatic production system) can seamlessly manage replicas
independently of their hosting environment. Given the services-based architecture, a
lightweight command line tool is capable of interacting uniformly within each Grid
and between Grids (e.g. moving files from LCG-2 to US Grid 3 while maintaining
attributes such as the Global Unique Identifier).
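A common interface over several Grid-specific catalogs could look roughly like the following sketch. It is not the Don Quijote code: ReplicaCatalog and DataManagementProxy are hypothetical names, the catalogs are in-memory stand-ins, and no data is actually transferred. The point illustrated is that a file keeps the same GUID when a replica is registered in a second Grid.

```python
class ReplicaCatalog:
    """In-memory stand-in for one Grid's replica catalog (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._replicas = {}  # GUID -> list of storage URLs

    def register(self, guid, url):
        self._replicas.setdefault(guid, []).append(url)

    def lookup(self, guid):
        return list(self._replicas.get(guid, []))


class DataManagementProxy:
    """One interface in front of several Grid-specific catalogs."""

    def __init__(self, catalogs):
        self._catalogs = {catalog.name: catalog for catalog in catalogs}

    def register(self, grid, guid, url):
        self._catalogs[grid].register(guid, url)

    def locate(self, guid):
        # Report replicas per Grid, omitting Grids that hold none.
        found = {}
        for name, catalog in self._catalogs.items():
            urls = catalog.lookup(guid)
            if urls:
                found[name] = urls
        return found

    def copy(self, guid, source, destination, destination_url):
        if not self._catalogs[source].lookup(guid):
            raise KeyError(f"{guid} has no replica in {source}")
        # A real system would move the bytes here; this sketch only records
        # the new replica under the unchanged GUID.
        self._catalogs[destination].register(guid, destination_url)
```

A caller such as a production system would talk only to the proxy and would not need to know which catalog technology sits behind each Grid.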
Managed Data Storage and Data Access Services for Data Grids
The LHC needs to achieve reliable high performance access to vastly distributed storage resources across the
network. USCMS has worked with Fermilab-CD and DESY-IT on a storage service that was deployed at several
sites. It provides Grid access to heterogeneous mass storage systems and synchronization between them. It
increases resiliency by insulating clients from storage and network failures, and facilitates file sharing and
network traffic shaping.
This new storage service is implemented as a Grid Storage Element (SE). It consists of dCache as the core
storage system and an implementation of the Storage Resource Manager (SRM), that together allow both local
and Grid based access to the mass storage facilities. It provides advanced functionalities for managing,
accessing and distributing collaboration data.
USCMS is using this system both as Disk Resource Manager at Tier-1 and Tier-2 sites, and as Hierarchical
Resource Manager with Enstore as tape back-end at the Fermilab Tier-1. It is used for providing shared
managed disk pools at sites and for streaming data between the CERN Tier-0, the Fermilab Tier-1 and U.S.
Applications can reserve space for a time period, ensuring space availability when the application runs. Worker
nodes without WAN connection can trigger data replication to the SE and then access data via the LAN. Moving
the SE functionality off the worker nodes reduces load and improves reliability of the compute farm elements
We describe architecture, components, and experience gained in CMS production and the DC04 Data
FroNtier: High Performance Database Access Using Standard Web Components in a Scalable Multi-tier Architecture
A high performance system has been assembled using standard web components to deliver
database information to a large number (thousands?) of broadly distributed clients.
The CDF Experiment at Fermilab is building processing centers around the world
imposing a high demand load on their database repository. For delivering read-only
data, such as calibrations, trigger information and run conditions data, we have
abstracted the interface that clients use to retrieve database objects. A middle tier
is deployed that translates client requests into database specific queries and
returns the data to the client as HTTP datagrams. The database connection management,
request translation, and data encoding are accomplished in servlets running under
Tomcat. Squid Proxy caching layers are deployed near the Tomcat servers as well as
close to the clients to significantly reduce the load on the database and provide a
scalable deployment model. This system is highly scalable, readily deployable, and
has a very low administrative overhead for data delivery to a large, distributed
audience. Details of how the system is built and used will be presented including its
architecture, design, interfaces, administration, and performance measurements.
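The effect of placing a cache in front of the middle tier can be shown with a toy model. It is not FroNtier itself: an in-memory SQLite database stands in for the database repository, JSON stands in for the data encoding, a dictionary stands in for the Squid layer, and the table layout and class names are assumptions. Because the data are read-only, a cached response can be reused without consulting the database again.

```python
import json
import sqlite3


def build_demo_database():
    """Create a small in-memory calibration table (hypothetical schema)."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE calibration (run INTEGER, channel INTEGER, gain REAL)")
    connection.executemany(
        "INSERT INTO calibration VALUES (?, ?, ?)",
        [(100, 1, 1.02), (100, 2, 0.98), (101, 1, 1.05)],
    )
    return connection


class MiddleTier:
    """Translates a client request into SQL and encodes the result."""

    def __init__(self, connection):
        self.connection = connection
        self.queries = 0  # number of times the database was actually hit

    def handle(self, table, run):
        if table != "calibration":  # only known objects may be requested
            raise ValueError(f"unknown object type: {table}")
        self.queries += 1
        rows = self.connection.execute(
            "SELECT channel, gain FROM calibration WHERE run = ? ORDER BY channel",
            (run,),
        ).fetchall()
        return json.dumps(rows)


class CachingProxy:
    """Serves repeated requests from memory instead of asking the backend."""

    def __init__(self, backend):
        self.backend = backend
        self._cache = {}

    def handle(self, table, run):
        key = (table, run)
        if key not in self._cache:
            self._cache[key] = self.backend.handle(table, run)
        return self._cache[key]
```

However many clients request the calibrations of a given run through the proxy, the database is queried once for that run.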
On Distributed Database Deployment for the LHC Experiments
While there are differences among the LHC experiments in their views of the role of
databases and their deployment, there is relatively widespread agreement on a number
1. Physics codes will need access to database-resident data. The need for database
access is not confined to middleware and services: physics-related data will reside
2. Database-resident data will be distributed, and replicated. A single,
centralized database, at CERN or elsewhere, does not suffice.
3. Distributed deployment infrastructure should be open to the use of different
technologies as appropriate at the various Tier N sites.
A variety of approaches to distributed deployment have been explored in the context
of individual experiments; indeed, a degree of distributed deployment has been
integral to the computing model tests of some experiments (cf. ATLAS) in their 2004
data challenges. Approaches to replication have also been investigated in the
context of specific databases, often with vendor-specific replication tools (e.g.,
Oracle Replication via Streams for the LCG File Catalog and the Oracle
instantiation of the LCG conditions database; MySQL tools for replication in the
MySQL instantiation of the LCG conditions database). XML exchange mechanisms have
also been discussed. Distributed database deployment, though, is more than a
middleware and applications software issue—a successful strategy must involve those
who will be responsible for systems deployment and administration at LHC grid
We describe the status of ongoing work in this area, and discuss the prospects for
components of a common approach to distributed deployment in the time frame of the
2005 LHC data challenges.
Experiences with Data Indexing services supported by the NorduGrid middleware
The NorduGrid middleware, ARC, has integrated support for querying and
registering to Data Indexing services such as the Globus Replica Catalog
and Globus Replica Location Server. This support allows one to use these
Data Indexing services for tasks such as brokering during job submission,
automatic registration of files and many other things. This
integrated support is complemented by a set of command-line tools for
registering to and querying these Data Indexing services.
In this talk we will describe experiences with these Data Indexing
services both from a daily work point of view and in production
environments such as the ATLAS Data Challenges 1 and 2. We will describe
the advantages of such Data Indexing services as well as their
shortcomings. Finally we will present a proposal for an extended Data
Indexing service which should deal with the shortcomings described. The
development of such a Data Indexing service is being planned at the
(Lund University, Sweden)
Evolution of LCG-2 Data Management
LCG-2 is the collective name for the set of middleware released for
use on the LHC Computing Grid in December 2003. This middleware,
based on LCG-1, already had several improvements in the Data
Management area. These included the introduction of the Grid File
Access Library (GFAL), a POSIX-like I/O interface, along with MSS
integration via the Storage Resource Manager (SRM) interface.
LCG-2 was used in the Spring 2004 data challenges by all four LHC
experiments. This produced the first useful feedback on scalability
and functionality problems in the middleware, especially with regard
to data management.
One of the key goals for the Data Challenges in 2004 is to show that
the LCG can handle the data for the LHC, even if the computing model
is still quite simple. In light of the feedback from the data
challenges, and in conjunction with the LHC experiments, a strategy
for the improvements required in the data management area was
developed. The aim of these improvements was to allow both easier
interaction and better performance from the experiment frameworks and
other middleware such as POOL.
In this talk, we will first introduce the design of the current data
management solution in LCG-2. We will cover the problems and issues
highlighted by the data challenges, as well as the strategy for the
required improvements to allow LCG-2 to handle data
management effectively at LCG volumes. In particular, we will highlight the new
APIs provided, and the integration of GFAL and the EDG Replica
Manager functionality with ROOT.
The Next Generation Root File Server
As the BaBar experiment shifted its computing model to a ROOT-based
framework, we undertook the development of a high-performance file server
as the basis for a fault-tolerant storage environment whose ultimate goal
was to minimize job failures due to server failures. Capitalizing on our
five years of experience with extending Objectivity's Advanced
Multithreaded Server (AMS), elements were added to remove as many
obstacles to server performance and fault-tolerance as possible. The final
outcome was xrootd, upwardly and downwardly compatible with the current
file server, rootd. This paper describes the essential protocol elements
that make high performance and fault-tolerance possible, including
asynchronous parallel requests, stream multiplexing, data pre-fetch,
automatic data segmenting, and the framework for a structured peer-to-peer
storage model that allows massive server scaling and client recovery from
multiple failures. The internal architecture of the server is also
described to explain how high performance was maintained and full
compatibility was achieved. Now in production at Stanford Linear
Accelerator Center, Rutherford Appleton Laboratory (RAL), INFN, and IN2P3,
xrootd has shown that our design provides what we set out to achieve. The
xrootd server is now part of the standard ROOT distribution so that other
experiments can benefit from this data serving model within a standard HEP
event analysis framework.
Production mode Data-Replication framework in STAR using the HRM Grid
The STAR experiment utilizes two major computing facilities for its data processing
needs - the RCF at Brookhaven and the PDSF at LBNL/NERSC. The sharing of data
between these facilities utilizes data grid services for file replication, and the
deployment of these services was accomplished in conjunction with the Particle
Physics Data Grid (PPDG). For STAR's 2004 run it will be necessary to replicate
~100 TB. The file replication is based on Hierarchical Resource Managers (HRMs)
along with Globus tools for security (GSI) and data transport (GridFTP). HRMs are
grid middleware developed by the Scientific Data Management group at LBNL, and STAR
file replication consists of an HRM interfaced to HPSS at each site with GridFTP
transfers between the HRMs. Each site also has its own installation of the STAR
file and metadata catalog, which is implemented in MySQL. Queries to the catalogs
are used to generate file transfer requests. Single requests typically consist of
many thousands of files with a volume of hundreds of GBs. The HRMs implement a
plugin to a Replica Registration Service (or RRS) which is utilized for automatic
registration of new files as they are successfully transferred across sites. This
allows STAR users immediate use of the distributed data. Data transfer statistics
and system architecture will be presented.
(LAWRENCE BERKELEY LABORATORY)
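The step from catalog queries to transfer requests, followed by registration on success, can be sketched as follows. This is not the STAR or HRM code: the catalogs are plain dictionaries mapping file names to sizes, and the transfer callable is a hypothetical stand-in for an HRM/GridFTP transfer.

```python
def build_transfer_request(source_catalog, destination_catalog):
    """List files present at the source but missing at the destination."""
    return sorted(set(source_catalog) - set(destination_catalog))


def replicate(request, source_catalog, destination_catalog, transfer):
    """Transfer each requested file; register it at the destination only on success."""
    failed = []
    for name in request:
        if transfer(name):
            # Registration happens as soon as the file has arrived, so it is
            # immediately visible to queries against the destination catalog.
            destination_catalog[name] = source_catalog[name]
        else:
            failed.append(name)
    return failed
```

Files whose transfer failed are returned to the caller and remain unregistered, so a later request built from the same catalogs picks them up again.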
Storage Resource Managers at Brookhaven
Providing Grid applications with effective access to large volumes of data residing
on a multitude of storage systems with very different characteristics prompted the
introduction of storage resource managers (SRM). Their purpose is to provide
consistent and efficient wide-area access to storage resources unconstrained by
their particular implementation (tape, large disk arrays, dispersed small disks). To
assess their viability in the context of the US Atlas Tier 1 facility at Brookhaven,
two implementations of SRM were tested: dCache (FNAL/DESY joint project) and HRM/DRM
(NERSC Berkeley). Both systems included a connection to the local HPSS mass data
store providing Grid access to the main tape repository. In addition, dCache offered
storage aggregation of dispersed small disks (local drives on computing farm nodes).
An overview of our experience with both systems will be presented, including details
about configurations, performance, inter-site transfers, interoperability and
File-Metadata Management System for the LHCb Experiment
The LHCb experiment needs to store all the information about the datasets and their
processing history of recorded data resulting from particle collisions at the LHC
collider at CERN as well as of simulated data.
To achieve this functionality a design based on data warehousing techniques was
chosen, where several user-services can be implemented and optimized individually
without losing functionality or performance. This approach results in an
experiment-independent and flexible system. It allows fast access to the catalogue of available
data, to detailed history information and to the catalogue of data replicas. Queries
can be made based on these three sets of information. A flexible underlying database
schema allows the implementation and evolution of these services without the need to
change the basic database schema. The consistent implementation of interfaces based
on XML-RPC allows the stored information to be accessed and modified through a
well-defined encapsulating API.
Data Management in EGEE
Data management is one of the cornerstones in the distributed production computing
environment that the EGEE project aims to provide for a European e-Science
infrastructure. We have designed a set of services based on previous experience in
other Grid projects, trying to address the requirements of our user communities.
In this paper we summarize the most fundamental requirements and constraints as well
as the security, reliability, stability and robustness considerations that have
driven the architecture and the particular choice for service decomposition in our
service-oriented architecture. We discuss the interaction of our services with each
other, their deployment models and how failures are being managed.
The three service groups for data management services are the Storage Element,
the Data Scheduling and the Catalog services. The Storage Element exposes interfaces
to Grid managed storage, with the appropriate semantics in the Grid distributed
environment. The Catalog services contain all the metadata related to data: The File
Catalog maintains a file-system-like view of the files in the Grid in a logical user
namespace, the Replica Catalog keeps track of identical copies of the files
distributed in different Storage Elements and the Metadata Catalog keeps application
specific information about the files. The Data Scheduling services take care of
controlled data transfer and keep the information in the Catalog services consistent
with what is actually available in the Storage Elements, acting as the binding
between the two.
We conclude with first experiences and examples of use-cases for High Energy Physics
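The division of labour between the three catalogs can be illustrated with a small in-memory model. It is not the EGEE implementation: the DataCatalogs class is hypothetical, and linking the catalogs through a GUID is an assumption of this sketch rather than something stated above.

```python
class DataCatalogs:
    """Toy model of the three catalogs; the GUID link is an assumption."""

    def __init__(self):
        self.file_catalog = {}      # logical file name -> GUID
        self.replica_catalog = {}   # GUID -> set of (storage element, physical name)
        self.metadata_catalog = {}  # GUID -> application-specific attributes

    def add_file(self, lfn, guid, **metadata):
        self.file_catalog[lfn] = guid
        self.replica_catalog.setdefault(guid, set())
        self.metadata_catalog[guid] = dict(metadata)

    def add_replica(self, lfn, storage_element, physical_name):
        guid = self.file_catalog[lfn]
        self.replica_catalog[guid].add((storage_element, physical_name))

    def list_directory(self, prefix):
        # File-system-like view over the logical namespace.
        prefix = prefix.rstrip("/") + "/"
        return sorted(lfn for lfn in self.file_catalog if lfn.startswith(prefix))

    def replicas(self, lfn):
        return sorted(self.replica_catalog[self.file_catalog[lfn]])

    def find(self, **criteria):
        # Select logical file names by application-specific metadata.
        hits = []
        for lfn, guid in self.file_catalog.items():
            attributes = self.metadata_catalog[guid]
            if all(attributes.get(key) == value for key, value in criteria.items()):
                hits.append(lfn)
        return sorted(hits)
```

A user works with logical names and metadata only; the physical locations are resolved through the replica catalog when the data are actually needed.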
SAMGrid Integration of SRMs
SAMGrid is the shared data handling framework of the two large Fermilab
Run II collider experiments: DZero and CDF. In production since 1999 at D0, and
since mid-2004 at CDF, the SAMGrid framework has been adapted over time to
accommodate a variety of storage solutions and configurations, as well as the
differing data processing models of these two experiments. This has been very
successful for both experiments. Backed by primary data repositories of
approximately 1 PB in size for each experiment, the SAMGrid framework delivers
over 100 TB/day to DZero and CDF analyses at Fermilab and around the world.
Each of the storage systems used with SAMGrid, however, has distinct
interfaces, protocols, and behaviors. This led to different levels of
integration of the various storage devices into the framework, which
complicated the exploitation of their functionality and limited in some cases
SAMGrid expansion across the experiments' Grid.
In an effort to simplify the SAMGrid storage interfaces, SAMGrid has
adopted the Storage Resource Manager (SRM) concept as the universal interface
to all storage devices. This has simplified the SAMGrid framework, especially
the implementation of storage device interactions. It prepares the SAMGrid
framework for future storage solutions equipped with SRM interfaces, without
the need for long and risky software integration projects. In principle, any
storage device with an SRM interface can be used now with the SAMGrid
framework. The integration of SRMs is an important further step towards
evolving the SAMGrid framework into a co-operating collection of distinct,
modular grid-oriented services. To date, SRMs for Enstore, dCache, local
caches, and permanent disk locations are tested and in production use. This
report outlines how the SRMs were integrated into the existing SAMGrid
framework without disturbing on-going operations, and describes our operational
experience with SAMGrid and SRMs in the field.
(FERMI NATIONAL ACCELERATOR LABORATORY)
Distributed Computing Systems and Experiences (Ballsaal)
The evolution of the distributed Event Reconstruction Control System in BaBar
The Event Reconstruction Control System of the BaBar experiment was redesigned in
2002, to satisfy the following major requirements: flexibility and scalability.
Because of its very nature, this system is continuously maintained to implement the
changing policies, typical of a complex, distributed production environment.
In 2003, a major revolution in the BaBar computing model, the Computing Model 2,
brought a particularly vast set of new requirements in various respects, many of
which had to be discovered during the early production effort, and promptly dealt
with. Particularly, the reconstruction pipeline was expanded with the addition of a
third stage. The first fast calibration stage was kept running at SLAC, USA, while
the two stages doing most of the computation were moved to the ~400 CPU
reconstruction facility of INFN, Italy.
In this paper, we summarize the extent and nature of the evolution of the Control
System, and we demonstrate how the modular, well engineered architecture of the
system allowed it to be efficiently adapted and expanded, while making great reuse of
existing code, leaving virtually intact the core layer, and exploiting the
"engineering for flexibility" philosophy.
(SLAC / INFN PADOVA)
Production Management Software for the CMS Data Challenge
One of the goals of the CMS Data Challenge in March-April 2004 (DC04) was to run
reconstruction for a sustained period at a 25 Hz input rate with distribution of the
produced data to CMS T1 centers for further analysis.
The reconstruction was run at the T0 using CMS production software, of which the main
components are RefDB (CMS Monte Carlo 'Reference Database' with Web interface) and
McRunjob (a framework for creation and submission of large numbers of Monte Carlo
This paper presents an overview of the CMS production cycle, describing production
tools, covering data processing, bookkeeping and publishing issues, in the context of
their use during the T0 reconstruction part of DC04.
ATLAS Production System in ATLAS Data Challenge 2
In order to validate the Offline Computing Model and the
complete software suite, ATLAS is running a series of Data
Challenges (DC). The main goals of DC1 (July 2002 to April
2003) were the preparation and the deployment of the
software required for the production of large event samples,
and the production of those samples as a worldwide
DC2 (May 2004 until October 2004) is divided into three
phases: (i) Monte Carlo data are produced using GEANT4 on
three different Grids, LCG, Grid3 and NorduGrid; (ii)
the first pass reconstruction of data expected in 2007, also
called the Tier0 exercise, is simulated using the MC sample; and
(iii) the Distributed Analysis model is tested.
A new automated data production system has been developed
for DC2. The major design objectives are minimal human
involvement, maximal robustness, and interoperability with
several grid flavors and legacy systems. A central
component of the production system is the production
database holding information about all jobs. Multiple
instances of a 'supervisor' component pick up unprocessed
jobs from this database, distribute them to 'executor'
processes, and verify them after execution. The 'executor'
components interface to a particular grid or legacy flavour.
The job distribution model is a combination of push and
pull. A data management system keeps track of all produced
data and allows for file transfers.
The basic elements of the production system are described.
Experience with the use of the system in world-wide DC2
production of ten million events will be presented. We also
present how the three Grid flavors are operated and
monitored. Finally we discuss the first attempts at using
the Distributed Analysis system.
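The interplay of production database, supervisors and executors can be sketched in a few lines. This is not the ATLAS production system: ProductionDB, Executor and Supervisor are hypothetical names, the database is a dictionary, and job execution is simulated. The supervisor pulls unprocessed jobs from the database, pushes them to its executor, and verifies the outcome before recording it.

```python
class ProductionDB:
    """In-memory stand-in for the production database."""

    def __init__(self, job_ids):
        self.jobs = {job_id: {"status": "pending", "output": None} for job_id in job_ids}

    def claim_pending(self, limit):
        # Mark up to `limit` unprocessed jobs as running and hand them out.
        claimed = []
        for job_id, job in self.jobs.items():
            if job["status"] == "pending" and len(claimed) < limit:
                job["status"] = "running"
                claimed.append(job_id)
        return claimed

    def finish(self, job_id, status, output):
        self.jobs[job_id].update(status=status, output=output)


class Executor:
    """Interface to one grid flavour; execution is simulated here."""

    def __init__(self, flavour, failing=()):
        self.flavour = flavour
        self.failing = set(failing)  # job ids that should fail, for illustration

    def run(self, job_id):
        if job_id in self.failing:
            return None
        return f"{self.flavour}:{job_id}.out"


class Supervisor:
    """Pulls jobs from the database, pushes them to an executor, verifies the result."""

    def __init__(self, db, executor, batch_size=2):
        self.db = db
        self.executor = executor
        self.batch_size = batch_size

    def cycle(self):
        claimed = self.db.claim_pending(self.batch_size)
        for job_id in claimed:
            output = self.executor.run(job_id)
            status = "done" if output is not None else "failed"
            self.db.finish(job_id, status, output)
        return len(claimed)
```

Several supervisors, each with an executor for a different grid flavour, can share one database; because a job is marked as running when it is claimed, no job is handed out twice.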
A GRID approach for Gravitational Waves Signal Analysis with a Multi-Standard Farm Prototype
The standard procedures for the extraction of gravitational wave signals coming
from coalescing binaries provided by the output signal of an interferometric
antenna may require computing powers generally not available in a single computing
centre or laboratory. A way to overcome this problem consists in using the
computing power available in different places as a single geographically
distributed computing system. This solution is now effective within the GRID
environment, which allows the computing effort required for a specific
data analysis procedure to be distributed among different sites according to the available computing
Within this environment we developed a system prototype with application software
for the experimental tests of a geographically distributed computing system for the
analysis of gravitational wave signals from coalescing binary systems. The facility
has been developed as a general purpose system that uses only standard hardware and
software components, so that it can be easily upgraded and configured. In fact, it
can be partially or totally configured as a GRID farm, as a MOSIX farm or as an MPI
farm. All these three configurations may coexist since the facility can be split
into configuration subsets. A full description of this farm is reported, together
with the results of the performance tests and planned developments.
(DIPARTIMENTO DI MATEMATICA ED APPLICAZIONI "R.CACCIOPPOLI")
The architecture of the AliEn system
AliEn (ALICE Environment) is a Grid framework developed by the ALICE Collaboration and used in production
for almost 3 years. From the beginning, the system was constructed using Web Services and standard
network protocols and Open Source components. The main thrust of the development was on the design and
implementation of an open and modular architecture. A large part of the components came from
state-of-the-art modules available in the Open Source domain. Thus, in a very short time, the ALICE experiment had a
prototype Grid that, while constantly evolving, has allowed large distributed simulation and reconstruction
vital to the design of the experiment hardware and software to be performed with very limited manpower.
This proved to be the correct path to which many Grid projects and initiatives are now converging. The
architecture of AliEn inspired the ARDA report and subsequently AliEn provided the foundation of components
for the first EGEE prototype. This talk presents the architecture of the original AliEn system and describes its
evolution. A critical review of the major technology choices, their implementation and the development
process is also presented.
A distributed, Grid-based analysis system for the MAGIC telescope
The observation of high-energy gamma rays with ground-based air Cerenkov telescopes is one of
the most exciting areas in modern astro particle physics. The MAGIC telescope started operation
at the end of 2003. The low energy threshold for gamma rays together with different background
sources leads to a considerable amount of data. The analysis will be done in different institutes
spread over Europe. The production of Monte Carlo events including the simulation of Cerenkov light
in the atmosphere is very computing intensive and another challenge for a collaboration like MAGIC.
Therefore the MAGIC telescope collaboration will take the opportunity to use Grid technology to set
up a distributed computational and data intensive analysis system with currently available
technology. The basic architecture of such a distributed, Europe wide Grid system will be presented.
First implementation results will be shown. This Grid might be the starting point for a wider distributed
astro particle Grid in Europe.
(FORSCHUNGSZENTRUM KARLSRUHE (FZK))
HEP Applications Experience with the European DataGrid Middleware and Testbed
The European DataGrid (EDG) project ran from 2001 to 2004, with the aim of producing
middleware which could form the basis of a production Grid, and of running a testbed
to demonstrate the middleware. HEP experiments (initially the four LHC experiments
and subsequently BaBar and D0) were involved from the start in specifying
requirements, and subsequently in evaluating the performance of the middleware, both
with generic tests and through increasingly complex data challenges. A lot of
experience has therefore been gained which may be valuable to future Grid projects,
in particular LCG and EGEE which are using a substantial amount of the middleware
developed in EDG. We report our experiences with job submission, data management and
mass storage, information and monitoring systems, Virtual Organisation management
and Grid operations, and compare them with some typical Use Cases defined in the
context of LCG. We also describe some of the main lessons learnt from the project,
in particular in relation to configuration, fault-tolerance, interoperability and
scalability, as well as the software development process itself, and point out some
areas where further work is needed. We also make some comments on how these issues
are being addressed in LCG and EGEE.
(Rutherford Appleton Laboratory)
Deploying and operating LHC Computing Grid 2 (LCG2) During Data Challenges
LCG2 is a large scale production grid formed by more than 40 worldwide distributed sites.
The aggregated number of CPUs exceeds 3000, and several MSS systems are integrated in the system. Almost
all sites form an independent administrative domain.
On most of the larger sites the local computing resources have been integrated into the grid.
The system has been used for large scale production by LHC experiments
for several months.
During the operation the software went through several versions and had to
be upgraded, including non-backward-compatible upgrades.
We report on the experience gained setting up the service, integrating sites and operating it under the
load of the production.
The Open Science Grid (OSG)
The U.S.LHC Tier-1 and Tier-2 laboratories and universities are developing production Grids to support LHC
applications running across a worldwide Grid computing system. Together with partners in computer science,
physics grid projects and running experiments, we will build a common national production grid
infrastructure which is open in its architecture, implementation and use.
The OSG model builds upon the successful approach of last year’s joint Grid2003 project. The Grid3 shared
infrastructure has for over eight months given significant computational resources and throughput to more
than six applications, including ATLAS and CMS data challenges, SDSS, LIGO and Biology analyses and
computer science demonstrators.
To move towards LHC-scale data management, access and analysis capabilities, we will need to increase the
scale, services, and sustainability of the current infrastructure by an order of magnitude. This requires a
significant upgrade in its functionalities and technologies.
The OSG roadmap is a strategy and work plan to build the U.S.LHC computing enterprise as a fully usable,
sustainable and robust grid, which is part of the LHC global computing infrastructure and open to partners.
The approach is to federate with other application communities in the U.S. to build a shared infrastructure
open to other sciences and capable of being modified and improved to respond to needs of other
applications, including CDF, D0, BaBar and RHIC experiments.
We describe the application driven engineered services of the OSG, short term plans and status, and the
roadmap for a consortium, its partnerships and national focus.
Use of Condor and GLOW for CMS Simulation Production
The University of Wisconsin distributed computing research groups
developed a software system called Condor for high throughput computing
using commodity hardware. An adaptation of this software, Condor-G, is
part of the Globus grid computing toolkit. However, the original Condor has
additional features that allow the building of an enterprise-level grid.
Several UW departments have Condor computing pools that are integrated
in such a way as to flock jobs from one pool to another as resources
become available. An interdisciplinary team of UW researchers recently
built a new distributed computing facility, the Grid Laboratory of
Wisconsin (GLOW). In total, the Condor pools at UW have about 2000 Intel
CPUs (P-III and Xeon) which are available for scientific computation.
By exploiting special features of Condor such as checkpointing and
remote IO we have generated over 10 million fully simulated CMS events.
We were able to harness about 260 CPU-days per day for a period of 2
months when we were operational in late fall. We have scaled to using 500
CPUs concurrently when the opportunity arose to exploit unused resources in
laboratories on our campus. We have built a scalable job submission and
tracking system called Jug, using Python and MySQL, which enabled us to
scale to running hundreds of jobs simultaneously. Jug also ensured that the
data generated was transferred to the US Tier-1 center at Fermilab. We have
also built a portal to our resources and participated in the Grid2003
project. We are currently adapting our environment to provide
analysis resources. In this paper we will discuss our experience and
observations regarding the use of opportunistic resources, and
generalize them to the wider grid computing context.
(UNIVERSITY OF WISCONSIN)
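As a rough cross-check of the production figures quoted in the abstract above, the following sketch converts them into an average CPU cost per event. It assumes, which the abstract does not state, that all 10 million events were produced during the two-month period at the quoted 260 CPU-days per day, and that a month is 30 days. The variable names are illustrative only.

```python
# Back-of-the-envelope estimate from the figures quoted in the abstract.
# Assumptions (not from the abstract): every event was produced during the
# two-month window, and a month is taken as 30 days.
CPU_DAYS_PER_DAY = 260        # sustained harvest quoted in the abstract
PRODUCTION_DAYS = 2 * 30      # "a period of 2 months", assumed to be 60 days
EVENTS = 10_000_000           # "over 10 million" events, taken at face value
SECONDS_PER_DAY = 86_400

total_cpu_days = CPU_DAYS_PER_DAY * PRODUCTION_DAYS
cpu_seconds_per_event = total_cpu_days * SECONDS_PER_DAY / EVENTS

print(f"total CPU-days harvested: {total_cpu_days}")
print(f"approximate CPU-seconds per event: {cpu_seconds_per_event:.0f}")
```

Under these assumptions the cost comes out at roughly two CPU-minutes per fully simulated event; the true figure depends on how much of the sample was actually produced in that window.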
Application of the SAMGrid Test-Harness for Performance Evaluation and Tuning of a Distributed Cluster Implementation of Data Handling Services
The SAMGrid team has recently refactored its test harness suite for
greater flexibility and easier configuration. This makes possible
more interesting applications of the test harness, for component
tests, integration tests, and stress tests. We report on the
architecture of the test harness and its recent application
to stress tests of a new analysis cluster at Fermilab, to explore the
extremes of analysis use cases and the relevant parameters for tuning
in the SAMGrid station services. This reimplementation of the test
harness is a Python framework which uses XML for configuration and
small plug-in Python modules for specific test purposes.
One current testing application is running on a 128-CPU analysis
cluster with access to 6 TB distributed cache and also to a 2 TB
centralized cache, permitting studies of different cache strategies.
We have studied the service parameters which affect the
performance of retrieving data from tape storage as well. The use
cases studied vary from those which will require rapid file delivery
with short processing time per file, to the opposite extreme of long
processing time per file. We also show how the same harness can be
used to run regular unit tests on a production system to aid
early fault detection and diagnosis. These results are interesting for
their implications with regard to Grid operations, and illustrate the
type of monitoring and test facilities required to accomplish such
(FERMI NATIONAL ACCELERATOR LABORATORY)
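The abstract above describes a Python framework configured through XML and extended with small plug-in modules. A minimal sketch of that general pattern might look as follows. The XML layout, the plug-in names and the `run_harness` function are all hypothetical, since the actual SAMGrid schema and plug-in interface are not given in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration format; the real SAMGrid schema is not
# described in the abstract.
CONFIG = """
<harness>
  <test plugin="echo" name="smoke">
    <param key="message" value="hello"/>
  </test>
  <test plugin="count_files" name="small-dataset">
    <param key="files" value="a.root,b.root,c.root"/>
  </test>
</harness>
"""

PLUGINS = {}  # plug-in name -> callable taking a dict of parameters


def plugin(name):
    """Register a function as a test plug-in under the given name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("echo")
def echo(params):
    # Trivial plug-in: return the configured message unchanged.
    return params["message"]


@plugin("count_files")
def count_files(params):
    # Trivial plug-in: count the comma-separated file names.
    return len(params["files"].split(","))


def run_harness(xml_text):
    """Parse the XML configuration and run each configured test in order."""
    results = {}
    for test in ET.fromstring(xml_text.strip()).findall("test"):
        params = {p.get("key"): p.get("value") for p in test.findall("param")}
        results[test.get("name")] = PLUGINS[test.get("plugin")](params)
    return results


results = run_harness(CONFIG)
print(results)
```

Keeping the test selection in the configuration file and the test logic in registered plug-ins means the same driver could, in principle, serve component, integration and stress tests by swapping the XML alone.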
Job Submission & Monitoring on the PHENIX Grid*
The PHENIX collaboration records large volumes of data for each
experimental run (now about 1/4 PB/year). Efficient and timely
analysis of this data can benefit from a framework for distributed
analysis via a growing number of remote computing facilities in the
collaboration. The grid architecture has been, or is being, deployed
at most of these facilities.
The experience being obtained in the transition to the Grid
infrastructure with a minimum of manpower is presented, with particular
emphasis on job monitoring and job submission in a multi-cluster
environment. The integration of the existing subsystems
(from the Globus project and from several HEP collaborations), large
application libraries, and other software tools to render the
resulting architecture stable, robust, and useful for the end user is
(STATE UNIVERSITY OF NEW YORK AT STONY BROOK)
In the framework of the LCG Simulation Project, we present the Generator
Services Sub-project, launched in 2003 under the oversight of the LHC Monte
Carlo steering group (MC4LHC). The goal of the Generator Services Sub-project
is to guarantee the physics generator support for the LHC experiments. Work is
divided into four work packages: Generator library; Storage, event
interfaces and particle services; Public event files and event database;
Validation and tuning. The current status and the future plans in the four
different work packages are presented. Some emphasis is put on the Monte
Carlo Generator Library (GENSER) and on the Monte Carlo Generator Database (MCDB).
GENSER is the central code repository for Monte Carlo generators and
generator tools. It was the first CVS repository in the LCG Simulation
project and it is currently distributed in AFS. GENSER comprises release and
build tools for librarians and end users. GENSER is going to gradually
replace the obsolete CERN library in Monte Carlo generator support.
MCDB is a public database for the configuration, book-keeping and storage of
the generator level event files. The generator events often need to be
prepared and documented by Monte Carlo experts. MCDB aims at facilitating
the communication between Monte-Carlo experts and end-users. Its use can be
optionally extended to the official event production of the LHC experiments.
FAMOS, a FAst MOnte-Carlo Simulation for CMS
An object-oriented FAst MOnte-Carlo Simulation (FAMOS) has recently been developed
for CMS to allow rapid analyses of all final states envisioned at the LHC while
keeping a high degree of accuracy for the detector material description and the
related particle interactions. For example, the simulation of the material effects in
the tracker layers includes charged particle energy loss by ionization and multiple
scattering, electron Bremsstrahlung and photon conversion. The particle showers are
developed in the calorimeters with an emulation of GFLASH, finely interfaced with the
calorimeter geometry (e.g., crystal positions, cracks, rear leakage).
As the same software framework is used for FAMOS and ORCA (the full Object-oriented
Reconstruction software for CMS Analysis), the various Physics Objects (electrons,
photons, muons, taus, jets, missing ET, charged particle tracks, ...) can be accessed
with similar code in both fast and full simulation, thus allowing any analysis
algorithm to be transported from FAMOS to ORCA (and later, to data analysis and DST
reading) or vice-versa without any additional work.
Altogether, a gain of about a factor of one hundred in CPU time can be achieved with respect to the
full simulation, with little loss in precision.
Applications of the FLUKA Monte Carlo code in High Energy and Accelerator Physics
The FLUKA Monte Carlo transport code is being used for different
applications in High Energy, Cosmic Ray and Accelerator Physics.
Here we review some of the ongoing projects which are
based on this simulation tool.
In particular, as far as accelerator physics is concerned, we wish
to summarize the work in progress for the LHC and the CNGS project.
From the point of view of experimental activity, apart from the activity
in the framework of LHC detectors, we wish to discuss
as a major example the application of FLUKA to the ICARUS Liquid
Upgrades in cosmic ray calculations, to demonstrate the capability
of FLUKA to reproduce existing experimental data, are also presented
(INFN Milano, Italy)
Update On the Status of the FLUKA Monte Carlo Transport Code
The FLUKA Monte Carlo transport code is a well-known simulation tool in High Energy
Physics. FLUKA is a dynamic tool in the sense that it is being continually updated
and improved by the authors. Here we review the progress achieved in the last
year on the physics models. From the point of view of hadronic physics, most of the
effort is still in the field of nucleus-nucleus interactions. The currently
available version of FLUKA already includes the internal capability to simulate
inelastic nuclear interactions beginning with lab kinetic energies of 100 MeV/A up
to the highest accessible energies, using the DPMJET-II.5 event generator to
handle interactions above 5 GeV/A and rQMD for energies below that. The new
developments concern, at high energy, the embedding of the DPMJET-III generator,
which represents a major change with respect to the DPMJET-II structure. This will
also allow better consistency between the nucleus-nucleus section and
the original FLUKA model for hadron-nucleus collisions. Work is also in progress
to implement a third event generator model based on the Master Boltzmann Equation
approach, in order to extend the energy capability from 100 MeV/A down to the
threshold for these reactions. In addition to these extended physics capabilities,
the structure of the program's input and scoring capabilities is continually
being upgraded. In particular we want to mention the upgrades in the geometry
packages, now capable of reaching higher levels of abstraction. Work is also
proceeding to provide direct import into ROOT of the FLUKA output files for
analysis and to deploy a user-friendly GUI input interface.
(UNIVERSITY OF HOUSTON)
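The energy ranges quoted in the abstract above can be summarised as a simple dispatch on the lab kinetic energy per nucleon. The sketch below is only an illustration of those ranges: the function name is hypothetical, it is not FLUKA code, and the treatment of exactly 5 GeV/A is an assumption because the abstract only says "above" and "below".

```python
def select_generator(energy_gev_per_nucleon):
    """Return the generator named in the abstract for a lab kinetic energy.

    Energies are in GeV per nucleon. Returns None below 100 MeV/A, where the
    abstract says a third model (Master Boltzmann Equation approach) is
    still being implemented.
    """
    if energy_gev_per_nucleon > 5.0:
        return "DPMJET-II.5"
    if energy_gev_per_nucleon >= 0.1:
        # Assumption: exactly 5 GeV/A falls to rQMD.
        return "rQMD"
    return None


for energy in (0.05, 0.1, 1.0, 5.0, 100.0):
    print(energy, select_generator(energy))
```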
Geant4: status and recent developments
Geant4 is relied upon in production for an increasing number of HEP
experiments and for applications in several other fields. Its
capabilities continue to be extended, as its performance and
modelling are enhanced.
This presentation will give an overview of recent developments in
diverse areas of the toolkit. These will include, amongst others,
the optimisation for complex setups using different production
thresholds, improvements in the propagation in fields, and highlights
from the physics processes and event biasing.
In addition it will note the physics validation effort undertaken in
collaboration with a number of experiments, groups and users.
Physics validation of the simulation packages in a LHC-wide effort
In the framework of the LCG Simulation Physics Validation Project, we present
comparison studies between the GEANT4 and FLUKA shower packages and LHC sub-detector
test-beam data. Emphasis is given to the response of LHC calorimeters to electrons,
photons, muons and pions. Results of "simple-benchmark" studies, where the above
simulation packages are compared to data from nuclear facilities, are also shown.
Overview and new developments in Geant4 electromagnetic physics
We will summarize the recent and current activities of the Geant4
working group responsible for the standard package of electromagnetic
physics. The major recent activities include a design iteration in
the energy loss and multiple scattering domain, providing a "process versus
models" approach, and development of the following physics models:
multiple scattering, ultra-relativistic muon physics, the
photo-absorption-ionisation model, ion ionisation, and optical processes. An
automatic acceptance suite for physics validation is under
development. We will also comment on the evolution of the concept of
Precision electromagnetic physics in Geant4: the atomic relaxation models
Various experimental configurations, such as some
gaseous detectors, require a high-precision simulation of
electromagnetic physics processes, accounting not only for the
primary interactions of particles with matter, but also capable of
describing the secondary effects deriving from the de-excitation of
atoms, where primary collisions may have created vacancies.
The Geant4 Simulation Toolkit encompasses a set of models to handle
the atomic relaxation induced by the photoelectric effect, Compton
scattering and ionization, with the production of X-ray fluorescence
and of Auger electrons.
We describe the physics models implemented in Geant4 to handle the
atomic relaxation, the object-oriented design of the software and
the validation of the models with respect to test beam data.
In particular, we present a novel development of an original model
for particle induced X-ray emission, to be released for the first
time in the summer of 2004.
We illustrate applications of Geant4 atomic relaxation models for
physics reach studies in a real-life experimental context
Ion transport simulation using Geant4 hadronic physics
The transport of ions in matter is a subject of much interest not only in
high-energy ion-ion collider experiments such as RHIC and LHC but also in many
other fields of science, engineering and medical applications. Geant4 is a toolkit
for simulating the passage of particles through matter, and its OO design
makes it easy to extend its capability to ion transport. To simulate ion
interactions, we had to add two major functionalities to Geant4. One is
cross section calculators and the other is final stage generators for ion-ion
interactions. For the cross section calculators, several empirical
formulas for the total reaction cross section of ion-ion interactions were
investigated. For the final stage generators, the binary cascade and quark-gluon
string models of Geant4 were improved so that ion reactions with matter can also be
calculated. With both functionalities successfully developed, Geant4 can be
applied to ion transport problems. In the presentation we will
explain the cross section calculators and final stage generators in detail and show comparisons with
Adding Kaons to the Bertini Cascade Model
A version of the Bertini cascade model for hadronic interactions is part of
the Geant4 toolkit, and may be used to simulate pion-, proton-, and
neutron-induced reactions in nuclei. It is typically valid for incident
energies of 10 GeV and below, making it especially useful for the simulation of
hadronic calorimeters. In order to generate the intra-nuclear cascade, the
code depends on tabulations of exclusive channel cross section data,
parameterized angular distributions and phase-space generation of
multi-particle final states. To provide a more detailed treatment of hadronic
calorimetry, and kaon interactions in general, this model is being extended to
include incident kaons up to an energy of 15 GeV. Exclusive channel cross
sections, up to and including six-body final states, will be included for K+,
K-, K0, K0bar, lambda, sigma+, sigma0, sigma-, xi0 and xi-. K+nucleon and
K-nucleon cross sections are taken from various cross section catalogs, while
most of the cross sections for incident K0, K0bar and hyperons are estimated
from isospin and strangeness considerations. Because there is little data for
incident hyperon cross sections, use of the extended model will be restricted
to incident K+, K-, K0S and K0L. Hyperon cross sections are included only to
handle the secondary interactions of hyperons created in the intra-nuclear
CHIPS based hadronization of quark-gluon strings
Quark-gluon strings are usually fragmented on the light cone into hadrons
(PYTHIA, JETSET) or into small hadronic clusters which decay into hadrons
(HERWIG). In both cases the transverse momentum distribution is
parameterized as an unknown function. In CHIPS the colliding hadrons
stretch Pomeron ladders to each other and, when the Pomeron ladders meet
in the rapidity space, they create Quasmons (hadronic clusters bigger
than the Amati-Veneziano clusters of HERWIG). The Quasmon size and the
corresponding transverse momentum distributions are tuned by the
Drell-Yan mu+mu- pairs. The final Quasmon fragmentation in CHIPS is
tuned by the e+e- and proton-antiproton annihilation, which is already
Synergia: A Modern Tool for Accelerator Physics Simulation
Computer simulations play a crucial role in both the design and
operation of particle accelerators. General tools for modeling
single-particle accelerator dynamics have been in wide use for many
years. Multi-particle dynamics are much more computationally
demanding than single-particle dynamics, requiring supercomputers or
parallel clusters of PCs. Because of this, simulations of multi-
particle dynamics have been much more specialized. Although several
multi-particle simulation tools are now available, they tend to
cover a narrow range of topics. Most also present difficulties for
the end user ranging from platform portability to arcane interfaces.
In this presentation, we discuss Synergia, a multi-particle
accelerator simulation tool developed at Fermilab, funded by the DOE
SciDAC program. Synergia was designed to cover a variety of physics
processes while presenting a flexible and humane interface to the
end user. It is a hybrid application, primarily based on the
existing packages mxyzptlk/beamline and Impact. Our presentation
covers Synergia's physics capabilities and human interface. We focus
on the computational problems we encountered and solved in the
process of building an application out of codes written in Fortran
90, C++, and wrapped with a Python front-end. We also discuss some
approaches we have used in the visualization of the high-dimensional
data that comes out of particle accelerator simulations,
especially our work with OpenDX.
(FERMI NATIONAL ACCELERATOR LABORATORY)
New experiences with the ALICE High Level Trigger Data Transport Framework
The ALICE High Level Trigger (HLT) is foreseen to consist of a
cluster of 400 to 500 dual SMP PCs at the start-up of the
experiment. Its input data rate can be up to 25 GB/s. This has to be
reduced to at most 1.2 GB/s, through event selection, filtering, and data
compression, before the data is sent to DAQ. For these
processing purposes, the data is passed through the cluster in
several stages and groups for successive merging until, at the last
stage, fully processed complete events are available. For the
transport of the data through the stages of the cluster, a
software framework is being developed consisting of multiple
components. These components can be connected via a common interface
to form complex configurations that define the data flow in the
cluster. For the framework, new benchmark results are available as
well as experience from tests and data challenges run in Heidelberg.
The framework is scheduled to be used during upcoming testbeam
(KIRCHHOFF INSTITUTE OF PHYSICS, RUPRECHT-KARLS-UNIVERSITY HEIDELBERG, for the Alice Collaboration)
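The two data rates quoted in the abstract above fix the overall reduction that event selection, filtering and compression must deliver together. A short calculation, using only those two figures and illustrative variable names, makes the requirement explicit.

```python
# Rates quoted in the abstract, in GB/s.
INPUT_RATE_GB_S = 25.0    # maximum input rate to the HLT
OUTPUT_RATE_GB_S = 1.2    # maximum rate that may be sent on to DAQ

# Combined reduction factor required from all processing stages together.
reduction_factor = INPUT_RATE_GB_S / OUTPUT_RATE_GB_S

print(f"required overall reduction: about {reduction_factor:.1f}x")
```

How this factor is shared between selection, filtering and compression is not stated in the abstract.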
The Architecture of the ZEUS Second Level Global Tracking Trigger
The architecture and performance of the ZEUS Global Track Trigger
(GTT) are described. Data from the ZEUS silicon Micro Vertex
detector's HELIX readout chips, corresponding to 200k channels, are
digitized by 3 crates of ADCs, and PowerPC VME board computers push
cluster data for second level trigger processing and strip data for
event building via Fast and Gigabit Ethernet network connections.
Additional tracking information from the central tracking chamber
and forward straw tube tracker is interfaced into the farm of 12 dual-CPU
PCs of the global track trigger, where track and vertex finding
is performed by separately threaded algorithms.
The system is data driven at ZEUS first level trigger rates
below 500 Hz, generating trigger results after a mean time of 10 ms. The
GTT integration into the ZEUS second level trigger and recent
performance are reviewed.
(UNIVERSITY COLLEGE LONDON)
A Level-2 trigger algorithm for the identification of muons in the ATLAS Muon Spectrometer
The ATLAS Level-2 trigger provides a software-based event selection
after the initial Level-1 hardware trigger. For the muon events, the
selection is decomposed into a number of broad steps: first, the Muon
Spectrometer data are processed to give physics quantities
associated with the muon track (standalone feature extraction); then,
other detector data are used to refine the extracted features.
The "muFast" algorithm performs the standalone feature extraction,
providing a first reduction of the muon event rate from Level-1. It
confirms muon track candidates with a precise measurement of the
muon momentum. The algorithm is designed to be both conceptually
simple and fast so as to be readily implemented in the demanding
online environment in which the Level-2 selection code will run.
Nevertheless, its physics performance approaches, in some cases,
that of the offline reconstruction algorithms. This paper describes
the implemented algorithm together with the software techniques
employed to increase its timing performance.
A. Di Mattia
An Embedded Linux System Based on PowerPC
This article introduces an Embedded Linux system based on a VME-series
PowerPC, together with the basic method for setting up the system.
The goal of the system is to build a test system for VMEbus devices.
It can also be used to set up a data acquisition and control
system. Two types of compiler are provided by the development system,
according to the features of the system and the PowerPC. At the start
of the article some typical embedded operating systems will be
introduced and the features of the different systems will be described.
Then the method for building an embedded Linux system, as well
as the key techniques, will be discussed in detail. Finally a
successful data acquisition example will be given based on the test
(INSTITUTE OF HIGH ENERGY PHYSICS, ACADEMIA SINICA)
Introduction to the BES computing environment
BES is an experiment at the Beijing Electron-Positron Collider (BEPC).
The BES computing environment consists of a PC/Linux cluster and relies mainly on free
software. OpenPBS and Ganglia are used as the job scheduling and monitoring systems. With
help from the CERN IT Division, CASTOR was deployed as the storage management system.
BEPC is being upgraded and its luminosity will increase one hundred times compared to
the current machine. The data produced by the new BES-III detector will be about 700
Terabytes per year. To meet the computing demand, we proposed a solution based on
PC/Linux cluster and SAN technology. CASTOR will be used to manage the storage
resources of the SAN. We have started to develop a graphical interface for CASTOR. Some tests
of the data transmission performance of the SAN environment were carried out. The results
show that the I/O performance of the SAN is better than that of traditional storage
connection methods such as IDE and SCSI, and that it can satisfy the BES-III experiment's
demand for data processing.
(COMPUTING CENTER, INSTITUTE OF HIGH ENERGY PHYSICS, CHINESE ACADEMY OF SCIENCES)
The DAQ system for the Fluorescence Detectors of the Pierre Auger Observatory
S. Argirò (1), A. Kopmann (2), O. Martineau (2), H.-J. Mathes (2)
for the Pierre Auger Collaboration
(1) INFN, Sezione Torino
(2) Forschungszentrum Karlsruhe
The Pierre Auger Observatory, currently under construction in Argentina, will
investigate extensive air showers at energies above 10^18 eV. It
consists of a ground array of 1600 Cherenkov water detectors and 24
fluorescence telescopes to discover the nature and origin of cosmic rays
at these ultra-high energies.
The ground array is overlooked by 4 different fluorescence buildings which are
equipped with 6 telescopes each. An independent local data acquisition (DAQ) is
running in each building to read out 480 channels per telescope. In addition, a
central DAQ merges data coming from the water detectors and all fluorescence
The system architecture follows the object oriented paradigm and has been
implemented using several of the most widespread open source tools for
interprocess communication, data storage and user interfaces.
Each local DAQ is connected with further sub-systems for calibration,
for monitoring of atmospheric parameters and slow control. The latter is
responsible for general safety functions and the experiment control.
After a prototype phase to validate the system concept, the Observatory has
been taking data in the final setup since September 2003. The data taking will
continue during the construction phase and the integration of all sub-systems.
We present the design and the present status of the system currently running
in two different buildings with a total of 8 telescopes installed.
(FORSCHUNGSZENTRUM KARLSRUHE, INSTITUT FüR KERNPHYSIK)
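The detector counts quoted in the abstract above are mutually consistent and imply the channel totals that each DAQ level has to handle. The following sketch simply multiplies the quoted figures; the variable names are illustrative.

```python
# Figures quoted in the abstract.
BUILDINGS = 4
TELESCOPES_PER_BUILDING = 6
CHANNELS_PER_TELESCOPE = 480
TELESCOPES_INSTALLED = 8      # "a total of 8 telescopes installed"

telescopes = BUILDINGS * TELESCOPES_PER_BUILDING
channels_per_local_daq = TELESCOPES_PER_BUILDING * CHANNELS_PER_TELESCOPE
channels_full_design = telescopes * CHANNELS_PER_TELESCOPE
channels_installed = TELESCOPES_INSTALLED * CHANNELS_PER_TELESCOPE

print(f"telescopes in the full design: {telescopes}")
print(f"channels read out per local DAQ: {channels_per_local_daq}")
print(f"channels in the full design: {channels_full_design}")
print(f"channels currently installed: {channels_installed}")
```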
Migrating PHENIX databases from object to relational model
To benefit from substantial advancements in Open Source database
technology and ease deployment and development concerns with
Objectivity/DB, the PHENIX experiment at RHIC is migrating its principal
databases from Objectivity to a relational database management system
(RDBMS). The challenge of designing a relational DB schema to store a
wide variety of calibration classes was
solved by using ROOT I/O and storing each calibration object opaquely as
a BLOB (Binary Large OBject). Calibration metadata is stored as
built-in types to allow fast index-based database search. To avoid a
database back-end dependency, the application
was made ODBC-compliant (Open DataBase Connectivity is a standard
database interface). An existing well-designed calibration DB API
allowed users to be shielded from the underlying database technology
change. Design choices and experience with transferring a large amount
of Objectivity data into a relational DB will be presented.
(BROOKHAVEN NATIONAL LABORATORY)
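The storage pattern described in the abstract above, an opaque serialized object alongside indexed metadata columns, can be sketched in a few lines. This is not the PHENIX implementation: Python's `sqlite3` stands in for the ODBC-accessed RDBMS, `pickle` stands in for ROOT I/O serialization, and the table, column and function names are hypothetical.

```python
import pickle
import sqlite3

# In-memory database as a stand-in for the real relational back end.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE calibration (
           name        TEXT    NOT NULL,  -- metadata: built-in types only
           valid_from  INTEGER NOT NULL,
           valid_until INTEGER NOT NULL,
           payload     BLOB    NOT NULL   -- opaque serialized object
       )"""
)
# Index on the metadata so lookups never need to inspect the BLOB.
conn.execute(
    "CREATE INDEX idx_calibration ON calibration (name, valid_from, valid_until)"
)


def store(name, valid_from, valid_until, obj):
    """Serialize obj and store it with its validity range."""
    conn.execute(
        "INSERT INTO calibration VALUES (?, ?, ?, ?)",
        (name, valid_from, valid_until, pickle.dumps(obj)),
    )


def fetch(name, run):
    """Return the calibration object valid for the given run, or None."""
    row = conn.execute(
        "SELECT payload FROM calibration "
        "WHERE name = ? AND valid_from <= ? AND ? <= valid_until "
        "ORDER BY valid_from DESC LIMIT 1",
        (name, run, run),
    ).fetchone()
    return pickle.loads(row[0]) if row else None


store("pedestals", 100, 199, {"ch0": 1.5, "ch1": 1.7})
store("pedestals", 200, 299, {"ch0": 1.6, "ch1": 1.8})
print(fetch("pedestals", 250))
```

Because the schema only knows about the metadata columns, new calibration classes can be added without schema changes, which is the benefit the abstract attributes to the BLOB approach.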
The PHENIX Event Builder
The PHENIX detector consists of 14 detector subsystems. It is designed such
that individual subsystems can be read out independently in parallel or
together as a single unit. The DAQ used to read the detector is a highly-pipelined
parallel system. Because PHENIX is interested in rare physics events, the DAQ
is required to have a fast trigger, deep buffering, and very high bandwidth.
The PHENIX Event Builder is a critical part of the back-end of the PHENIX DAQ.
It is responsible for assembling event fragments from each subsystem into
complete events ready for archiving. It allows subsystems to be read out
either in parallel or simultaneously and supports a high rate of archiving.
In addition, it implements an environment where Level-2 trigger algorithms may
be optionally executed, providing the ability to tag and/or filter rare
The Event Builder is a set of three Windows NT/2000 multithreaded executables
that run on a farm of over 100 dual-CPU 1U servers. All control and data
messaging is transported over a Foundry Layer2/3 Gigabit switch. Capable of
recording a wide range of event sizes from central Au-Au to p-p interactions,
data archiving rates of over 400 MB/s at 2 kHz event rates have been achieved
in the recent Run 4 at RHIC. Further improvements in performance are expected
from migrating to Linux for Run 5.
The PHENIX Event Builder design and implementation, as well as performance and
plans for future development, will be discussed.
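The core task described in the abstract above, assembling per-subsystem fragments into complete events, can be illustrated with a small single-threaded sketch. It is not the PHENIX code, which the abstract describes as multithreaded executables on a server farm; the class name, method names and subsystem labels are hypothetical.

```python
from collections import defaultdict


class EventAssembler:
    """Collect fragments per event number until every subsystem has reported."""

    def __init__(self, subsystems):
        self.expected = frozenset(subsystems)
        self.pending = defaultdict(dict)  # event number -> {subsystem: data}

    def add_fragment(self, event_number, subsystem, data):
        """Store one fragment; return the complete event once all have arrived."""
        if subsystem not in self.expected:
            raise ValueError(f"unknown subsystem: {subsystem}")
        fragments = self.pending[event_number]
        fragments[subsystem] = data
        if set(fragments) == self.expected:
            # All subsystems present: hand the event over and free the buffer.
            return self.pending.pop(event_number)
        return None


# Fragments may arrive interleaved across events and subsystems.
assembler = EventAssembler(["tracker", "calorimeter"])
arrivals = [
    (1, "tracker", b"t1"),
    (2, "calorimeter", b"c2"),
    (1, "calorimeter", b"c1"),
    (2, "tracker", b"t2"),
]
for event_number, subsystem, data in arrivals:
    event = assembler.add_fragment(event_number, subsystem, data)
    if event is not None:
        print(f"event {event_number} complete: {sorted(event)}")
```

Buffering incomplete events by event number is what allows subsystems to be read out independently; a production system would also need timeouts for fragments that never arrive, which this sketch omits.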
The DZERO Run II Level 3 Trigger and Data Acquisition System
The DZERO Level 3 Trigger and Data Acquisition (L3DAQ) system has been
running continuously since Spring 2002. DZERO is located at one of the
two interaction points in the Fermilab Tevatron Collider. The L3DAQ
moves front-end readout data from VME crates to a trigger processor
farm. It is built upon a Cisco 6509 Ethernet switch, standard PCs, and
commodity VME single board computers. We will report on operating
experience, performance, and upgrades. In particular, issues related to
hardware quality, networking and security, and an expansion of the
trigger farm will be discussed.
Testbed Management for the ATLAS TDAQ
The talk presents the experience gathered during the testbed
administration (~100 PCs and 15+ switches) for the ATLAS Experiment at
It covers the techniques used to resolve the HW/SW conflicts, network
related problems, automatic installation and configuration of the
cluster nodes as well as system/service monitoring in the heterogeneous
dynamically changing cluster environment.
Techniques range from manual actions to fully automated procedures
based on tools like Kickstart, SystemImager, Nagios, MRTG and
Spectrum. Booting diskless nodes using EtherBoot and PXEboot is also
investigated as a possible technique for managing ATLAS Production
Kernel customization techniques (building, deploying, distribution
policy) allow users to freely choose their preferred kernel flavors without
sysadmin intervention. At the same time, the administrator retains full
control over the entire testbed.
The overall experience has shown that the proper use of the
open-source tools addresses very well the needs of the ATLAS Trigger
DAQ community. This approach may also be interesting for addressing
certain aspects of GRID Farm Management.
(CERN, IFJ KRAKOW)
Integration of ATLAS Software in the Combined Beam Test
The ATLAS collaboration had a Combined Beam Test from May until
October 2004. Collection and analysis of data required integration
of several software systems that are developed as prototypes for
the ATLAS experiment, due to start in 2007. Eleven different detector
technologies were integrated with the Data Acquisition system and were
taking data synchronously. The DAQ was integrated with the High Level
Trigger software, which will perform online selection of ATLAS events.
The data quality was monitored at various stages of the Trigger and
DAQ chain. The data was stored in a format foreseen for ATLAS and was
analyzed with a prototype of the experiment's offline software, using
the Athena framework. Parameters recorded by the Detector Control System
were stored in a prototype of the ATLAS Conditions Data Base and were
made available for the offline analysis of the collected event data.
The combined beam test provided a unique opportunity to integrate and
to test the prototype of ATLAS online and offline software in its complete
Performance of the ATLAS DAQ DataFlow system
The ATLAS Trigger and DAQ system is designed to use the Region of
Interest (RoI) mechanism to reduce the initial Level 1 trigger rate of
100 kHz down to about 3.3 kHz Event Building rate.
The DataFlow component of the ATLAS TDAQ system is responsible
for the reading of the detector specific electronics via 1600 point
to point readout links, the collection and provision of RoI to the
Level 2 trigger, the building of events accepted by the Level 2
trigger and their subsequent input to the Event Filter
system where they are subject to further selection criteria.
To validate the design and implementation of the DAQ DataFlow system,
a prototype setup representing 20% of the final system has been put
together at CERN. This baseline prototype contains 68 PCs running
Linux and exchanging data via a 64-port and a 31-port Gigabit
Ethernet switch for Event Building and RoI Collection. The
system performance is measured by playing back simulated data through
the system and running prototype algorithms in the Level 2 trigger. In
parallel a full discrete event model of the system has been developed
and tuned to the testbed results as an aid to studying the system
performance at and beyond the size of the prototype setup.
Measurements will be presented on the performance of the prototype
setup, showing that the components of the current integrated system
implementation can already sustain their nominal ATLAS
requirements using existing hardware and Gigabit network technology:
20 kHz RoI Collection rate per readout link, 3 kHz Event Building
rate and 70 Mbyte/s throughput per event building node. The use of
these results to calibrate the model will also be presented along
with the model predictions for the performance of the final DAQ
(UNIVERSITY OF CALIFORNIA AT IRVINE AND CERN)
The talk will cover briefly the current status of the LHC Computing Grid project
and will discuss the main challenges facing us as we prepare for the startup of LHC.
Video in CDS
Operating the LCG and EGEE Production Grids for HEP
In September 2003 the first LCG-1 service was put into production at most of the
large Tier 1 sites and was quickly expanded up to 30 Tier 1 and Tier 2 sites by the
end of the year. Several software upgrades were made and the LCG-2 service was put
into production in time for the experiment data challenges that began in February
2004 and continued for several months. In particular LCG-2 introduced transparent
access to mass storage and managed disk-only storage elements, and a first release
of the Grid File Access library. Much valuable experience was gained during the
data challenges in all aspects from the functionality and use of the middleware, to
the deployment, maintenance, and operation of the services at many sites. Based on
this experience a program of work to address the functional and operational issues
is being implemented. The goal is to focus on essential areas such as data
management and to build by the end of 2004 a basic grid system capable of handling
the basic needs of LHC computing, providing direction for future middleware and
The LCG-2 infrastructure also forms the production service of EGEE. This involves
supporting new application communities, bringing in new sites not associated with
HEP and evolving a full scale 24x7 user and operational support structure. We will
describe the EGEE infrastructure, how it supports and interacts with LCG, and how
we expect the infrastructure to evolve over the next year of the EGEE project.
Video in CDS
Grid3: An Application Grid Laboratory for Science
The U.S. Trillium Grid projects, in collaboration with High Energy Experiment groups
from the Large Hadron Collider (LHC) experiments ATLAS and CMS, Fermilab's BTeV, members of
the LIGO and SDSS collaborations, and groups from other scientific disciplines and
computational centers, have deployed a multi-VO, application-driven grid laboratory
("Grid3"). The grid laboratory has sustained for several months the
production-level services required by the participating experiments. The deployed
infrastructure has been operating since November 2003 with 27 sites, a peak of 2800
processors, work loads from 10 different applications exceeding 1300 simultaneous
jobs, and data transfers among sites of greater than 2 TB/day.
The Grid3 infrastructure was deployed from grid level services provided by groups
and applications within the collaboration. The services were organized into four
distinct "grid level services" including: Grid3 Packaging, Monitoring and
Information systems, User Authentication and the iGOC Grid Operations Center. In
this paper we describe the Grid3 operational model, deployment strategies, and site
installation and configuration procedures. We describe the grid middleware
components used, how the components were packaged and deployed on sites each under
its own local administrative domain, and how the pieces fit together to form the
Grid3 grid infrastructure.
Video in CDS
Poster Session 1: Online Computing, Computer Fabrics, Wide Area Networking
A database prototype for managing computer systems configurations
We describe a database solution in a web application to centrally
manage the configuration information of computer systems. It extends the
modular cluster management tool Quattor with a user friendly web interface.
System configurations managed by Quattor are described with the aid of PAN, a
declarative language with a command line and a compiler interface. Using a
relational schema, we are able to build a database for efficient data storage and
configuration data processing. The relational schema ensures the
consistency of the described model while the standard database interface
ensures the fast retrieval of configuration information and statistical data.
The web interface simplifies typical administration and routine operation
tasks, e.g. definition of new types, configuration comparisons and updates.
We present a prototype built on the above ideas and used to manage a cluster of
developer workstations and specialised services in CMS.
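As an illustration only, the sketch below shows how a relational store can support the configuration comparisons mentioned above. It uses Python's built-in sqlite3 module. The two-table layout, the host names and the configuration paths are hypothetical and are not taken from the prototype or from Quattor.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE node (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE setting (
            node_id INTEGER NOT NULL REFERENCES node(id),
            path    TEXT NOT NULL,
            value   TEXT NOT NULL,
            PRIMARY KEY (node_id, path)  -- one value per path per node
        );
    """)
    db.executemany("INSERT INTO node (id, name) VALUES (?, ?)",
                   [(1, "ws01"), (2, "ws02")])
    db.executemany(
        "INSERT INTO setting (node_id, path, value) VALUES (?, ?, ?)",
        [
            (1, "/system/kernel/version", "2.4.21"),
            (1, "/software/compiler", "gcc-3.2"),
            (2, "/system/kernel/version", "2.4.20"),
            (2, "/software/compiler", "gcc-3.2"),
        ],
    )
    return db


def compare_nodes(db, name_a, name_b):
    """Return (path, value_a, value_b) for every shared path whose values differ."""
    query = """
        SELECT a.path, a.value, b.value
        FROM setting AS a
        JOIN node AS na ON na.id = a.node_id
        JOIN setting AS b ON b.path = a.path
        JOIN node AS nb ON nb.id = b.node_id
        WHERE na.name = ? AND nb.name = ? AND a.value <> b.value
        ORDER BY a.path
    """
    return db.execute(query, (name_a, name_b)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for path, value_a, value_b in compare_nodes(demo, "ws01", "ws02"):
        print(path, value_a, value_b)
```

The primary key on (node_id, path) is one way a schema can enforce consistency: a node cannot hold two values for the same path. The comparison only reports paths present on both nodes; paths missing from one side would need an outer join.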
AutoBlocker: A system for detecting and blocking of network scanning based on analysis of netflow data.
In a large campus network, such as Fermilab's ten thousand nodes, scanning initiated
from either outside of or within the campus network raises security concerns, may
have a very serious impact on network performance, and even disrupt the normal operation of
many services. In this paper we introduce a system for the detection and automatic
blocking of excessive traffic of various kinds, such as scanning, DoS attacks and traffic from
virus-infected computers. The system, called AutoBlocker, is a distributed computing system
based on quasi-real time analysis of network flow data collected from the border
router and core routers. AutoBlocker also has an interface to accept alerts from the
IDS systems (e.g. BRO, SNORT) that are based on other technologies. The system has
multiple configurable alert levels for the detection of anomalous behavior and
configurable trigger criteria for automated blocking of the scans at the core or
border routers. It has been in use at Fermilab for about 2 years, and has become a very
valuable tool to curtail scan activity within the Fermilab campus network.
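A minimal sketch of threshold-based scan detection on flow records is given below, assuming a simplified record of (timestamp, source, destination, destination port). The record layout, the alert level names and the thresholds are hypothetical. The abstract does not describe AutoBlocker's actual detection criteria.

```python
from collections import defaultdict

# Hypothetical thresholds: distinct (destination, port) pairs per window,
# listed from the most severe level to the least severe.
ALERT_LEVELS = [("block", 100), ("warn", 20)]


def classify_sources(flows, window_start, window_length):
    """Map each source address to an alert level for one time window.

    `flows` is an iterable of (timestamp, src, dst, dst_port) tuples, a
    simplified stand-in for real flow records. Sources below every
    threshold are left out of the result.
    """
    targets = defaultdict(set)
    window_end = window_start + window_length
    for timestamp, src, dst, dst_port in flows:
        if window_start <= timestamp < window_end:
            targets[src].add((dst, dst_port))

    result = {}
    for src, seen in targets.items():
        for level, threshold in ALERT_LEVELS:
            if len(seen) >= threshold:
                result[src] = level
                break
    return result
```

Counting distinct targets rather than raw flows means a host that talks heavily to a single server is not flagged, while a host touching many addresses in a short window is.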
Boosting the data logging rates in run 4 of the PHENIX experiment
With the improvements in CPU and disk speed over the past years, we
were able to exceed the original design data logging rate of 40MB/s by
a factor of 3 already for the Run 3 in 2002. For the Run 4 in 2003, we
increased the raw disk logging capacity further to about 400MB/s.
Another major improvement was the implementation of compressed data
logging. The PHENIX raw data, after application of the standard data
reduction techniques, were found to be further compressible by
utilities like gzip by almost a factor of 2, and we defined a PHENIX
standard of a compressed raw data format. The buffers that make up a
raw data file are compressed, and the resulting smaller data volume
is written out to disk. For a long time, this proved to be much too
slow to be usable in the DAQ, until we could shift the compression
to the event builder machines and so distribute the load over many
fast CPUs. We also selected a
different compression algorithm, LZO, which is about a factor of 4
faster than the "compress2" algorithm used internally in gzip. With
the compression, the raw data volume shrinks to about 60% of the
original size, boosting the original data rate before compression to
more than 700MB/s.
We will present the techniques and architecture, and the impact
this has had on the data taking in Run 4.
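The idea of buffer-wise compressed logging can be sketched as follows. This is an illustration under stated assumptions, not the PHENIX format: it uses zlib from the Python standard library in place of LZO, and the 8-byte length header is hypothetical.

```python
import struct
import zlib

# Hypothetical per-buffer header: original length and compressed length.
HEADER = struct.Struct("<II")


def pack_buffer(raw: bytes, level: int = 1) -> bytes:
    """Compress one buffer and prefix it with its two lengths."""
    body = zlib.compress(raw, level)
    return HEADER.pack(len(raw), len(body)) + body


def unpack_buffers(stream: bytes):
    """Yield the original buffers from a concatenation of packed buffers."""
    offset = 0
    while offset < len(stream):
        raw_len, body_len = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        raw = zlib.decompress(stream[offset:offset + body_len])
        if len(raw) != raw_len:
            raise ValueError("buffer length mismatch")
        offset += body_len
        yield raw


def effective_rate(disk_rate: float, compressed_fraction: float) -> float:
    """Uncompressed data rate that a given disk write rate can absorb."""
    return disk_rate / compressed_fraction
```

Because each buffer is compressed on its own, the work can be spread over several machines and a reader can skip from header to header. The last helper expresses the relation between disk rate and pre-compression rate: if the data shrinks to a fraction f of its size, a disk rate R absorbs an uncompressed rate of R / f.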
Cluster architectures used to provide CERN central CVS services
There are two cluster architecture approaches used at CERN to provide central CVS
services. The first one (http://cern.ch/cvs) depends on AFS for central storage of
repositories and offers automatic load-balancing and fail-over mechanisms.
The second one (http://cern.ch/lcgcvs) is an N + 1 cluster based on local file
systems, using data replication and not relying on AFS. It does not provide either
dynamic load-balancing or automatic fail-over. Instead a series of tools were
developed for repository relocation in case of fail-over and for manual load-
Both architectures are used in production at CERN and project managers can choose one
or the other, depending on their needs. If, eventually, one architecture proves to
be significantly better, the other one may be phased out. This paper presents in
detail both approaches and describes their relative advantages and drawbacks, as
well as some data about them (number of repositories, average repository size, etc).
Control and state logging for the PHENIX DAQ System
The PHENIX DAQ system is managed by a control system responsible for
the configuration and monitoring of the PHENIX detector hardware and
readout software. At its core, the control system, called Runcontrol,
is a set of processes that manages virtually all detector components
through a distributed architecture based on CORBA.
A key aspect of the distributed control system is the messaging
system, which provides the ability to access critical detector state
information and deliver it to operators and applications of the
control system. The goal of the system is to concentrate all output
messages of the distributed processes, which would normally end up
in log files or on a terminal, in a central place. The messages may
originate from or be received by applications running on any of the
multiple platforms which are in use including Linux, Windows,
Solaris, and VxWorks. Listener applications allow the DAQ operators
to get a comprehensive overview of all messages they are interested
in, and also allow scripts or other programs to take automated
action in response to certain messages.
Messages are formatted to contain information about the source of the
message, the message type, and its severity. Applications that allow
the DAQ operators to filter messages by type, severity and source
will be presented.
We will discuss the mechanism underlying this system, present
examples of the use, and discuss performance and reliability issues.
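To make the filtering by source, type and severity concrete, here is a small sketch. The class names, the severity scale and the field names are hypothetical. The abstract does not give the real message format, and the CORBA transport is left out entirely.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Optional, Set


class Severity(IntEnum):
    # Hypothetical severity scale; the real system's levels are not given.
    INFO = 0
    WARNING = 1
    ERROR = 2
    FATAL = 3


@dataclass(frozen=True)
class Message:
    source: str        # component that emitted the message
    msg_type: str      # message category, e.g. "hv" or "run"
    severity: Severity
    text: str


@dataclass
class Filter:
    min_severity: Severity = Severity.INFO
    sources: Optional[Set[str]] = None   # None means "any source"
    types: Optional[Set[str]] = None     # None means "any type"

    def accepts(self, msg: Message) -> bool:
        """Return True if the message passes all three criteria."""
        if msg.severity < self.min_severity:
            return False
        if self.sources is not None and msg.source not in self.sources:
            return False
        if self.types is not None and msg.msg_type not in self.types:
            return False
        return True


def select(messages: Iterable[Message], flt: Filter) -> List[Message]:
    """Return the messages a listener with this filter would display."""
    return [m for m in messages if flt.accepts(m)]
```

A listener for operators might use Filter(min_severity=Severity.WARNING), while a script that reacts to high-voltage problems might restrict the filter to types={"hv"}.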
Designing a Useful Email System
Email is an essential part of daily work. The FNAL gateways process in excess of
700,000 messages per week. Among those messages are many containing viruses and
unwanted spam. This paper outlines the FNAL email system configuration. We will
discuss how we have defined our systems to provide optimum uptime as well as
protection against viruses, spam and unauthorized users.
Distributed Filesystem Evaluation and Deployment at the US-CMS Tier-1 Center
The scalable serving of shared filesystems across large clusters of computing resources continues to be a
difficult problem in high energy physics computing. The US CMS group at Fermilab has performed a detailed
evaluation of hardware and software solutions to allow filesystem access to data from computing systems.
The goal of the evaluation was to arrive at a solution that was able to meet the growing needs of the US-CMS
Tier-1 facility. The system needed to be scalable and be able to grow with the increasing size of the facility,
load balanced and with high performance for data access, reliable and redundant with protection against
failures, and manageable and supportable given a reasonable level of effort.
Over the course of a one year evaluation the group developed a suite of tools to analyze performance and
reliability under load conditions, and then applied these tools to evaluate systems at Fermilab. In this
presentation we will describe the suite of tools developed, the results of the evaluation process, the system
and architecture that were eventually chosen, and the experience so far supporting a user community.
L. Lisa Giacchetti
Experience with CORBA communication middleware in the ATLAS DAQ
As modern High Energy Physics (HEP) experiments require more
distributed computing power to fulfill their demands, the need for
efficient distributed online services for control, configuration
and monitoring in such experiments becomes increasingly important.
This paper describes the experience of using standard Common Object
Request Broker Architecture (CORBA) middleware for providing
high performance and scalable software, which will be used for the online
control, configuration and monitoring in the ATLAS Data Acquisition
(DAQ) system. It also presents the experience, which was gained from
using several CORBA implementations and replacing one CORBA broker
Finally the paper presents results of large scale tests, which
have been done on a cluster of more than 300 nodes, demonstrating
the performance and scalability of the ATLAS DAQ online services.
These results show that the CORBA standard is truly appropriate for
the highly efficient online distributed computing in the area of
modern HEP experiments.
Experiences Building a Distributed Monitoring System
The NGOP Monitoring Project at FNAL has developed a package which has demonstrated
the capability to efficiently monitor tens of thousands of entities on thousands of
hosts, and has been in operation for over 4 years. The project has met the majority
of its initial requirements, and also the majority of the requirements discovered
along the way. This paper will describe what worked, and what did not, in the first
4 years of the NGOP Project at Fermilab; and we hope will provide valuable lessons
for others considering undertaking even larger (GRID-scale) monitoring projects.
Future processors: What is on the horizon for HEP farms?
In 1995 I predicted that the dual-processor PC would start invading HEP computing and
a couple of years later the x86-based PC was omnipresent in our computing facilities.
Today, we cannot imagine HEP computing without thousands of PCs at the heart.
This talk will look at some of the reasons why we may one day be forced to leave this
sweet-spot. This would be not because we (the HEP community) want to, but rather
because other market forces may pull in different directions. Amongst such forces, I
will review the new generation of powerful game consoles where IBM's Power processor
is currently making strong inroads. Then I will look at the huge mobile market where
low-powered processing rules rather than power-hungry DP Xeon/Xeon-like processors,
and thirdly I will explore in my talk the promise of enterprise servers with a large
number of processors on each die (so-called Core Multi-Processors). For all the
scenarios, we must, of course, keep in mind that HEP can only move when the
price-performance ratio is right.
InGRID - Installing GRID
The "gridification" of a computing farm is usually a complex and time consuming task.
Operating system installation, grid specific software and configuration file
customization can turn into a large problem for site managers.
This poster introduces InGRID, a solution used to install and maintain grid software
on small/medium size computing farms.
Grid element installation with InGRID consists of three steps.
In the first step nodes are installed using RedHat Kickstart, an installation
method that automates most of a Linux distribution installation, including disk
partitioning, boot loader configuration, network configuration and base package selection.
Grid specific software is then integrated using apt4rpm, a package management wrapper
over the rpm commands. Apt automatically manages package dependencies, and is able
to download, install and upgrade RPMs from a central software repository.
Finally, grid configuration files are customized through LCFGng, a system to set up
and maintain Unix machines, that can configure many system files, execute scripts,
create users, etc.
(INFM - INFN)
Integrating Multiple PC Farms into a uniform computing System with Maui
There are several ongoing experiments at IHEP, such as BES, YBJ, and the CMS
collaboration with CERN. Each experiment has its own computing system, and these
computing systems run separately. This leads to very low CPU utilization because
the experiments have different usage periods. Grid technology is a very
good candidate for integrating these separate computing systems into a "single
image", but it is too early to put it into a production system as it is not yet
stable or user-friendly. A realistic choice is to implement such
integration and sharing with Maui, an advanced scheduler. Each PC farm
is treated as a partition, in which its owner users are assigned high priority
with the preemptor feature. This paper will describe the details of the implementation
with Maui scheduler, as well as the entire system architecture and configuration
(INSTITUTE OF HIGH ENERGY PHYSICS)
Linux for the CLEO-c Online system
The CLEO collaboration at the Cornell electron positron storage ring
CESR has completed its transition to the CLEO-c experiment. This new
program contains a wide array of Physics studies of $e^+e^-$
collisions at center of mass energies between 3 GeV and 5 GeV.
New challenges await the CLEO-c Online computing system, as the
trigger rates are expected to rise from < 100 Hz to around 300 Hz at
the J/Psi production threshold, with a moderate increase in data
throughput requirements. While the current Solaris and VxWorks based
readout system will perform adequately under those conditions, there
is a desire to improve the performance of the central components to
extend monitoring capabilities and provide larger safety margins.
The solution, as in most modern particle detector systems, is to
deploy Linux on Intel architecture computers for the performance
For reasons of hardware and software availability, the existing CLEO
Online and Offline computing environment has been ported to the
Linux platform. This development allows the described challenge to
In this presentation, we will report on our experiences adapting
the CLEO Online computing system for operation under Linux. Issues
third party software and code portability will be addressed.
measurements will be presented.
Managing software licences for a large research laboratory
The Product Support (PS) group of the IT department at CERN distributes and
supports more than one hundred different software packages, ranging from tools
for computer aided design, field calculations, mathematical and structural
analysis to software development. Most of these tools, which are used on
a variety of Unix and Windows platforms by different user populations, are
commercial packages requiring a licence. The group is also charged with
license negotiations with the software vendors.
Keeping track of a large number and variety of licences is no easy task, so in
order to provide a more automated and more efficient service, the PS group has
developed a database system both to track detailed licence configurations
and to monitor their use. The system is called PSLicmon (PS Licence
Monitor) and is based on an earlier development from the former CE group.
PSLicmon consists of four main components: report generation, data loader,
Oracle product database and a PHP-based Web-interface. The license log
parser/loader is implemented in Perl and loads reports from the different
license managers into the Oracle database. The database contains information
about products, licenses and suppliers and is linked to CERN's human resource
database. The web-interface allows for on the fly generation of statistics
plots as well as data entry and updates. The system also includes an alarm
system for licence expiry.
Thanks to PSLicmon, the support team is able to better match licence
acquisitions with the diverse needs of its user community, and to be in
control of migration and phaseout scenarios between different products
and/or product versions. The tool has proved to be a useful aid when making
decisions regarding product support policy and licence acquisitions, in
particular ensuring the provision of the correct number of often expensive
software licences to match CERN's needs.
Methodologies and techniques for analysis of network flow data
Network flow data gathered on border routers and core network switch/routers is used
at Fermilab for statistical analysis of traffic patterns, passive network monitoring,
and estimation of network performance characteristics. Flow data is also a critical
tool in the investigation of computer security incidents. Development and enhancement
of flow-based tools is an ongoing effort. The current state of flow analysis is based
on the open source Flow-Tools package. This paper describes the most recent
developments in flow analysis at Fermilab. Our goal is to provide a multidimensional
view of network traffic patterns, with a detailed breakdown based on site,
experiment, domain, subnet, hosts, protocol, or application. The latest analysis
tool provides a descriptive and graphical representation of network traffic broken
down by combinations of experiment and DNS domain. The tool can be utilized in
real-time mode, as well as to provide a historical view. Another tool analyzes flow
data to provide performance characteristics of completed multistream GridFTP data
transfers. The current prototype provides a web interface for dynamic administration
of the flow reports. We will describe and discuss the new features that we plan on
developing in future enhancements to our flow analysis tool set.
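The kind of multidimensional breakdown described above can be sketched as a grouped sum over flow records. The record fields used here ('src_host', 'protocol', 'experiment', 'bytes') are hypothetical, and the code does not use or reproduce the Flow-Tools package.

```python
from collections import Counter


def domain_of(hostname: str, levels: int = 2) -> str:
    """Return the last `levels` labels of a host name, e.g. 'fnal.gov'."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-levels:])


def traffic_by(records, *keys):
    """Sum bytes over flow records grouped by the given keys.

    Each record is a dict with hypothetical fields such as 'src_host',
    'protocol', 'experiment' and 'bytes'. A derived 'domain' field is
    computed from 'src_host' before grouping, so 'domain' can be used
    as a key as well.
    """
    totals = Counter()
    for rec in records:
        row = dict(rec, domain=domain_of(rec["src_host"]))
        totals[tuple(row[k] for k in keys)] += rec["bytes"]
    return totals
```

Calling traffic_by(records, "experiment", "domain") gives one total per combination of experiment and DNS domain, and the same function serves any other combination of dimensions.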
Monitoring the CDF distributed computing farms
CDF is deploying a version of its analysis facility (CAF) at several globally
distributed sites. On top of the hardware at each of these sites is either an FBSNG
or Condor batch manager and a SAM data handling system which in some cases also
makes use of dCache.
The jobs which run at these sites also make use of a central database located at
Fermilab. Each of these systems has its own monitoring.
In order to maintain and effectively use the distributed system, it is important that
both the administrators and the users can get a complete global view of the system.
We will present a system which integrates the monitoring of all of these services
into one globally accessible system based on the MonALISA product. This system is
intended for administrators to monitor the system status and service level and for
users to better locate resources and monitor job progress.
In addition, it is meant to satisfy the request by the CDF International Finance
Committee that global computing resource usage by CDF can be audited.
New compact hierarchical mass storage system at Belle realizing a peta-scale system with inexpensive IDE-RAID disks and an S-AIT tape library
The Belle experiment has accumulated an integrated luminosity of more
than 240 fb-1 so far, and the daily logged luminosity now exceeds
800 pb-1. These numbers correspond to more than 1PB of raw and processed
data stored on tape and an accumulation of the raw data at the rate
of 1TB/day. To meet these storage demands, a new cost effective,
compact hierarchical mass storage system has been constructed. The
system consists of commodity RAID systems using IDE disks and Linux
PC servers as the front-end and a tape library system using the new
high density SONY S-AIT tape as the back-end. The SONY Peta Serv
software manages migration and restoration of the files between tapes
and disks. The capacity of the tape library is, at the moment, 500 TB
in three 19 inch racks and the RAID system, 64 TB in two 19 inch
racks. An extension of the system to 1.2 PB tape library in eight
racks with 150 TB RAID in four racks is planned. In this talk,
experiences with the new system will be discussed and the performance
of the system when used for data processing and physics analysis of
the Belle experiment will be demonstrated.
Online Monitoring and online calibration/reconstruction for the PHENIX experiment
The PHENIX experiment consists of many different detectors and
detector types, each one with its own needs concerning the
monitoring of the data quality and the calibration. To ease the task
for the shift crew to monitor the performance and status of each
subsystem in PHENIX we developed a general client server based
framework which delivers events at a rate in excess of 100Hz.
This model was chosen to minimize the possibility of accidental
interference with the monitoring tasks themselves. The user only
interacts with the client which can be restarted any time without
loss or alteration of information on the server side. It also
enables multiple people to check simultaneously the same detector -
if need be even from remote locations. The information is
transferred in the form of histograms which are processed by the client.
These histograms are saved for each run and some html output is
generated which is used later on to remove problematic runs from the
offline analysis. An additional interface to a data base is provided
to enable the display of long term trends.
This framework was augmented to perform an immediate calibration
pass and a quick reconstruction of rare signals in the counting
house. This is achieved by filtering out interesting triggers and
processing them on a local Linux cluster. That enabled PHENIX to
e.g. keep track of the number of J/Psi's which could be expected
while still taking data.
Parallel implementation of Parton String Model event generator
We report the results of parallelization and tests of the Parton
String Model event generator at the parallel cluster of St.Petersburg
State University Telecommunication center.
Two schemes of parallelization were studied. In the first approach
a master process coordinates the work of slave processes, and gathers
and analyzes data. Results of MC calculations are saved in local files.
Local files are sent to the host computer on which the data
processing program is started. The second approach uses parallel
writes to a common file shared between all processes. In this case
the load on the communication subsystem of the cluster grows. Both
approaches are realized with the MPICH library. Some problems including
the pseudorandom number generation in parallel computations were
The modified parallel version of the PSM code includes a number of
additional possibilities: selection of impact parameter
windows, accounting for the acceptance of the experimental setup and
trigger selection data, and the calculation of various long range
correlations between such observables as mean transverse momentum and
charged particle multiplicity.
FNAL has over 5000 PCs running either Linux or Windows software. Protecting these
systems efficiently against the latest vulnerabilities that arise has prompted FNAL
to take a more central approach to patching systems. We outline the lab support
structure for each OS and how we have provided a central solution that works within
existing support boundaries. The paper will cover how we identify what patches are
considered crucial for a system on the FNAL network and how we verify that systems
are appropriately patched.
Portable Gathering System for Monitoring and Online Calibration at Atlas
During the runtime of any experiment, a central monitoring system that
detects problems as soon as they appear has an essential role. In a large
experiment, like Atlas, the online data acquisition system is
distributed across the nodes of large farms, each of them running several
processes that analyse a fraction of the events. In this architecture, it is
necessary to have a central process that collects all the monitoring data from the
different nodes, produces full statistics histograms and analyses them.
In this paper we present the design of such a system, called the "gatherer". It
can collect any monitoring object, such as histograms, from the farm nodes and
from any process in the DAQ, trigger and reconstruction chain. It also adds up the
statistics, if required, and processes user defined algorithms in order
to analyse the monitoring data. The results are sent to a centralized display, that
shows the information online, and to the archiving system, triggering alarms in case
The innovation of our approach is that conceptually it abstracts the
several communication protocols underneath, being able to talk with different
processes using different protocols at the same time and, therefore, providing
maximum flexibility. The software is easily adaptable to any trigger-DAQ system.
The first prototype of the gathering system has been implemented for Atlas and will
be running during this year's combined test beam.
An evaluation of this first prototype will also be presented.
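The core gathering step, adding up statistics from many nodes and then running user-defined algorithms on the totals, can be sketched as follows. The data layout (histograms as plain lists of bin contents keyed by name) and the function names are hypothetical simplifications. The communication protocols that the gatherer abstracts are not modelled.

```python
from typing import Callable, Dict, Iterable, List

Histogram = List[int]              # bin contents only; identical binning assumed
Snapshot = Dict[str, Histogram]    # histogram name -> contents, from one node


def gather(snapshots: Iterable[Snapshot]) -> Snapshot:
    """Add up same-named histograms bin by bin across all nodes."""
    total: Snapshot = {}
    for snap in snapshots:
        for name, bins in snap.items():
            if name not in total:
                total[name] = list(bins)   # copy, so inputs stay untouched
                continue
            if len(total[name]) != len(bins):
                raise ValueError(f"binning mismatch for {name!r}")
            total[name] = [a + b for a, b in zip(total[name], bins)]
    return total


def run_checks(total: Snapshot,
               checks: Dict[str, Callable[[Histogram], bool]]) -> List[str]:
    """Return the names of histograms whose user-defined check fails."""
    return [name for name, ok in checks.items()
            if name in total and not ok(total[name])]
```

A failed check is the point at which a real system would raise an alarm. Here the caller simply receives the list of failing histogram names.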
P. Conde MUINO
Raw Ethernet based hybrid control system for the automatic control of suspended masses in gravitational waves interferometric detectors
In this paper we examine the performance of the raw Ethernet
protocol in deterministic, low-cost, real-time communication. Very
few applications have been reported until now, and they focus on the
use of the TCP and UDP protocols, which however add an appreciable
overhead to the communication and reduce the useful bandwidth. We
show how low-level Ethernet access can be used for peer-to-peer,
short distance communication, and how it allows the writing of
applications requiring large bandwidth. We show some examples
running on the Lynx real-time OS and on Linux, both in mixed and
homogeneous environments. As an example of the application of
this technique, we describe the architecture of a hybrid Ethernet
based real-time control system prototype we implemented in Napoli,
discussing its characteristics and performance. Finally we discuss
its application to the real-time control of a suspended mass of the
mode cleaner of the 3m prototype optical interferometer for
gravitational wave detection operational in Napoli.
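To illustrate the header overhead argument, the sketch below builds an Ethernet II frame by hand and compares the payload fraction with and without IPv4 and UDP headers. It is not the authors' implementation. The helper names are hypothetical, the EtherType 0x88B5 is the IEEE 802 local experimental value chosen here for illustration, and padding to the minimum frame size, the preamble and the frame check sequence are ignored.

```python
import struct

ETH_HEADER = struct.Struct("!6s6sH")   # destination MAC, source MAC, EtherType
ETH_HEADER_LEN = ETH_HEADER.size       # 14 bytes
IPV4_HEADER_LEN = 20                   # IPv4 header without options
UDP_HEADER_LEN = 8
TCP_HEADER_LEN = 20                    # TCP header without options
EXPERIMENTAL_ETHERTYPE = 0x88B5        # IEEE 802 local experimental EtherType


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError("a MAC address has six octets")
    return raw


def build_frame(dst: str, src: str, payload: bytes,
                ethertype: int = EXPERIMENTAL_ETHERTYPE) -> bytes:
    """Build an Ethernet II frame without preamble, padding or checksum."""
    header = ETH_HEADER.pack(mac_to_bytes(dst), mac_to_bytes(src), ethertype)
    return header + payload


def payload_fraction(payload_len: int, extra_headers: int = 0) -> float:
    """Fraction of header-plus-payload bytes that carry user data."""
    return payload_len / (ETH_HEADER_LEN + extra_headers + payload_len)
```

For short control messages the difference is noticeable: payload_fraction(64) is larger than payload_fraction(64, IPV4_HEADER_LEN + UDP_HEADER_LEN). Actually sending such a frame needs a raw socket (for example AF_PACKET on Linux, usually with elevated privileges), which is left out here.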
(DIPARTIMENTO DI SCIENZE FISICHE - UNIVERSITÀ DI NAPOLI FEDERICO II)
Remote Shifting at the CLEO Experiment
The CLEO III data acquisition was from the beginning in the late 90's
designed to allow remote operations and monitoring of the experiment.
Since changes in the coordination and operation of the CLEO experiment
two years ago enabled us to separate tasks of the shift crew into an
operational and a physics task, existing remote capabilities have
been revisited. In 2002/03 CLEO started to deploy its remote
tasks for performing remote shifts and evaluated various communication
tools e.g. video conferencing and remote desktop sharing. Remote,
collaborating institutions were allowed to perform the physicist shift
part from their home institutions keeping only the professional
of the CLEO experiment on site. After a one year long testing and
evaluation phase the remote shifting for physicists is now in
This talk reports on experiences made when evaluating and deploying
various options and technologies used for remote control, operation
and monitoring e.g. CORBA's IIOP, X11 and VNC in the CLEO experiment.
Furthermore some aspects of the usage of video conferencing tools by
distributed shift crews are discussed.
Simplified deployment of an EDG/LCG cluster via LCFG-UML
The clusters using DataGrid middleware are usually installed and
managed by means of an "LCFG" server. Originally developed by the
Univ. of Edinburgh and extended by DataGrid, this is a complex piece
of software. It allows for automated installation and configuration of
a complete grid site. However, installation of the "LCFG"-Server takes
most of the time, thus hindering widespread use.
Our approach was to set up and preconfigure the LCFG-server inside a
"User Mode Linux" (UML) instance in order to make deployment
faster. The result is the "UML-LCFG-Server". It is provided as a
prebuilt root-filesystem image which can be up and running with
only a few configuration steps. Detailed instructions and experience are
also provided on the basis of tests within the CrossGrid
project. Altogether UML-LCFG makes it easier for a new site to join an
EDG/LCG based Grid by bypassing most of the LCFG server installation.
(KARLSRUHE RESEARCH CENTER (FZK))
Status of the alignment calibrations in the ATLAS-Muon experiment
ATLAS is a particle detector which is being built at CERN in
Geneva. The muon detection system is made up, among other things, of
600 chambers measuring 2 to 6 m² and 30 cm thick. The chambers'
position must be known with an accuracy of +/- 30 µm for translations
and +/- 100 µrad for rotations for a range of +/- 5 mm and +/- 5 mrad.
In order to fulfill these requirements, we have designed different
Due to (i) the very high accuracy required, (ii) the number of
sensors (over 1000) and (iii) the different types of sensors, we
developed one user interface which manages, among other things,
several control-command software packages. Each of these packages is
associated with an accurate calibration bench. At this conference,
we will present only the most complex one, which combines command
control, an analysis module, real-time processing and database
access. These software packages are now used for sensors
The ATLAS DAQ system
The 40 MHz collision rate at the LHC produces ~25 interactions per bunch crossing
within the ATLAS detector, resulting in terabytes of data per second to be handled
by the detector electronics and the trigger and DAQ system. A Level 1 trigger system
based on custom designed and built electronics will reduce the event rate to 100 kHz.
The DAQ system is responsible for the readout of the detector specific electronics
via 1600 point to point links hosted by Readout Subsystems, the collection and
provision of ''Region of Interest data'' to the Level 2 trigger, the building of
events accepted by the Level 2 trigger and their subsequent input to the Event Filter
system where they are subject to further selection criteria. Also the DAQ provides
the functionality for the configuration, control, information exchange and monitoring
of the whole ATLAS detector.
The baseline ATLAS DAQ architecture and its implementation will be introduced. In
this implementation, the configuration, control, information exchange and monitoring
functionalities are provided with CORBA; the
control aspects are handled by an expert system based on CLIPS and the data
connection between 150 Readout Subsystems, up to 500 Level 2 Processing Units and up to
80 Event building nodes is done using Gigabit Ethernet network technology.
The experience from using the DAQ system in a combined test beam environment where
all ATLAS subdetectors are participating will be presented. The current performances
of some DAQ components as measured in the laboratory environment will be summarized.
Some results from the large scale functionality tests, on a system of 300 nodes,
aimed at understanding the scalability of the current implementation will also be shown.
(UNIVERSITY OF CALIFORNIA AT IRVINE AND CERN)
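The rates and counts quoted in this abstract imply some simple ratios. The following is only a back-of-the-envelope sketch using the numbers stated above; the variable names are illustrative, and an even spread of links over Readout Subsystems is an assumption, not something the abstract states.

```python
# Figures quoted in the abstract above.
BUNCH_CROSSING_RATE_HZ = 40e6   # LHC collision rate
LEVEL1_OUTPUT_RATE_HZ = 100e3   # event rate after the Level 1 trigger
READOUT_LINKS = 1600            # point-to-point readout links
READOUT_SUBSYSTEMS = 150        # Readout Subsystems hosting the links

# Rejection factor the Level 1 trigger has to achieve.
level1_rejection = BUNCH_CROSSING_RATE_HZ / LEVEL1_OUTPUT_RATE_HZ

# Average number of links per Readout Subsystem (assumes an even spread).
links_per_subsystem = READOUT_LINKS / READOUT_SUBSYSTEMS

print(f"Level 1 rejection factor: {level1_rejection:.0f}")
print(f"Average links per Readout Subsystem: {links_per_subsystem:.1f}")
```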
The CMS User Analysis Farm at Fermilab
US-CMS is building up expertise at regional centers in preparation for analysis of LHC data. The User Analysis
Farm (UAF) is part of the Tier 1 facility at Fermilab. The UAF is being developed to support the efforts of the
Fermilab LHC Physics Center (LPC) and to enable efficient analysis of CMS data in the US.
The support, infrastructure, and services to enable a local analysis community at a computing center which is
remote from the physical detector and the majority of the collaboration present unique challenges.
The current UAF is a farm running the LINUX operating system providing interactive and batch computing for
users. Load balancing, resource and process management are realized with FBSNG, the batch system
developed at Fermilab. Over the course of the next three years the UAF must grow in size and functionality,
while continuing to support simulated analysis activities and test beam applications.
In this presentation we will describe the development of the current cluster, the technology choices made, the
services required to support regional analysis activities, and plans for the future.
The Condor based CDF CAF
The CDF Analysis Facility (CAF) has been in use since April 2002
and has successfully served 100s of users on 1000s of CPUs.
The original CAF used FBSNG as a batch manager.
In the current trend toward multisite deployment,
FBSNG was found to be a limiting factor,
so the CAF has been reimplemented to use Condor instead.
Condor is a more widely used batch system and
is well integrated with the emerging grid tools,
one of its most useful features being the ability to run seamlessly
on top of other batch systems.
The transition has brought us a lot of additional benefits,
such as ease of installation, fault tolerance and
increased manageability of the cluster.
The CAF infrastructure has also been simplified a lot
since Condor implements a number of features we had to
implement ourselves with FBSNG.
In addition, our users have found that Condor's fair share mechanism
provides a more equitable and predictable distribution of resources.
In this talk the Condor based CAF will be presented,
with particular emphasis on the changes needed to run with Condor,
the problems found during and the advantages gained by the transition.
Some background and the plans for the future, as well as results
from Condor scalability tests will also be presented.
The Configurations Database Challenge in the ATLAS DAQ System
The ATLAS data acquisition system uses a database to describe configurations
for different types of data taking runs and different sub-detectors. Such
configurations are composed of complex data objects with many inter-relations.
During the DAQ system initialisation phase the configurations database is
simultaneously accessed by a large number of processes. It is also required that
such processes be notified about database changes that happen during or between
The paper describes the architecture of the configurations database. It presents
the set of graphical tools which are available for the database schema design and
the data editing. The automatic generation of data access libraries for C++ and Java
languages is also described. They provide the programming interfaces to access the
database either via a common file system or via remote database servers, and the
notification mechanism on data changes.
The paper presents results of recent performance and scalability tests, which
allow a conclusion to be drawn about the applicability of the current configurations
database implementation in the future DAQ system.
The Design, Installation and Management of a Tera-Scale High Throughput Cluster for Particle Physics Research
We describe our experience in building a cost efficient High Throughput Cluster (HTC)
using commodity hardware and free software within a university environment.
Our HTC has a modular system architecture and is designed to be upgradable.
The current, second phase configuration, consists of 344 processors and 20 Tbyte of
In order to rapidly install and upgrade software, we have developed
automatic remote system installation and configuration tools to deploy standard
software configurations on individual machines. To efficiently manage machines we
have written a custom cluster configuration database. This database is used to track
all hardware components in the cluster, the network and power distribution and the
software configuration. Access to this database and the cluster performance and
monitoring systems is provided by a web portal, which allows efficient remote
management in our low-manpower environment.
We describe the performance of our system under a mixed load of scalar and parallel
tasks and discuss future possible improvements.
(QUEEN MARY, UNIVERSITY OF LONDON)
The Project CampusGrid
A central idea of Grid Computing is the virtualization of
heterogeneous resources. To meet this challenge the Institute for
Scientific Computing, IWR, has started the project CampusGrid. Its
medium term goal is to provide a seamless IT environment
supporting the on-site research activities in physics,
bioinformatics, nanotechnology and meteorology. The environment
will include all kinds of HPC resources: vector computers, shared
memory SMP servers and clusters of commodity components as well as
a shared high-performance storage solution. After introducing the
general ideas, the talk will report on the current project
status and scheduled development tasks. This is accompanied by reports on other
activities in the fields of Grid computing and
high performance computing at IWR.
Using HEP Systems to Provide Storage for Biologists
Protein analysis, imaging, and DNA sequencing are some of the branches
of biology where growth has been enabled by the availability of
computational resources. With this growth, biologists face an
associated need for reliable, flexible storage systems. For decades
the HEP community has been driving the development of such storage
systems to meet their own needs. Two of these systems - the dCache
disk caching system and the Enstore hierarchical storage manager - are
viable candidates for addressing the storage needs of biologists.
Both incorporate considerable experience from the HEP community.
While biologists have much to gain from the HEP community's experience
with storage systems, they face several issues that are unique to the
biological sciences. There is a wider diversity in experiments, in
number and size of datafiles, and in client operating systems in
biology than there is in HEP. Patient information must be kept
confidential. Disparate IT departments set up firewalls that separate
client systems and the storage system.
Vanderbilt University is developing a storage system with the goal of
meeting biologists' needs. This system will use Enstore for its
robustness and reliability, and will use the flexible door-based
architecture of dCache to provide storage services to biologists via
web-portal, the dCache copy command, and custom applications. This
system will be deployed using an automated tape library, several
secure central servers, and nodes placed near biologists' existing
compute infrastructure to ensure locality of caches and secure data
channels between researchers and the central servers.
WAN Emulation Development and Testing at Fermilab
The Compact Muon Solenoid (CMS) experiment at CERN's Large Hadron Collider (LHC) is
scheduled to come on-line in 2007. Fermilab will act as the CMS Tier-1 center for the
US and make experiment data available to more than 400 researchers in the US
participating in the CMS experiment. The US CMS Users Facility group, based at
Fermilab, has initiated a project to develop a model for optimizing movement of CMS
experiment data between CERN and the various tiers of US CMS data centers. Fermilab
has initiated a project to design a WAN emulation facility which will enable
controlled testing of unmodified or modified CMS applications and TCP
implementations locally under conditions that emulate WAN connectivity. The WAN
emulator facility is configurable for latency, jitter, and packet loss. The initial
implementation is based on the NISTnet software product. In this paper we will
describe the status of this project to date and the results of validation and comparison
of performance measurements obtained in emulated and real environments for different
applications, including multi-stream GridFTP. We will also introduce future short-term
and intermediate-term plans, as well as outstanding problems and issues.
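To illustrate what "configurable for latency, jitter, and packet loss" means, here is a minimal sketch that applies the three parameters to a stream of packet send times. It is not the NISTnet interface: the function `emulate_wan`, its parameters, and the choice of uniformly distributed jitter are all assumptions made for this example.

```python
import random

def emulate_wan(send_times, latency_s, jitter_s, loss_rate, seed=0):
    """Return arrival times for packets sent at send_times (in seconds).

    Each packet is dropped with probability loss_rate (None marks a loss);
    otherwise it is delayed by latency_s plus a uniform jitter drawn from
    [-jitter_s, +jitter_s].
    """
    rng = random.Random(seed)
    arrivals = []
    for t in send_times:
        if rng.random() < loss_rate:
            arrivals.append(None)          # packet lost
            continue
        delay = latency_s + rng.uniform(-jitter_s, jitter_s)
        arrivals.append(t + max(delay, 0.0))
    return arrivals

# Hypothetical link: 60 ms one-way latency, 5 ms jitter, 1% loss.
sends = [i * 0.001 for i in range(10000)]
arrivals = emulate_wan(sends, latency_s=0.060, jitter_s=0.005, loss_rate=0.01)

delivered = [(s, a) for s, a in zip(sends, arrivals) if a is not None]
loss = 1 - len(delivered) / len(sends)
mean_delay = sum(a - s for s, a in delivered) / len(delivered)
print(f"observed loss: {loss:.3%}, mean delay: {mean_delay * 1000:.2f} ms")
```

Because jitter is applied per packet, packets can arrive out of order, which is one of the effects a TCP implementation under test would have to cope with.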
Plenary: Session 4, Kongress-Saal
The BIRN Project: Distributed Information Infrastructure and Multi-scale Imaging of the Nervous System (BIRN = Biomedical Informatics Research Network)
The grand goal in neuroscience research is to understand how the interplay of
structural, chemical and electrical signals in nervous tissue gives rise to
behavior. Experimental advances of the past decades have given the individual
neuroscientist an increasingly powerful arsenal for obtaining data, from the level
of molecules to nervous systems. Scientists have begun the arduous and challenging
process of adapting and assembling neuroscience data at all scales of resolution and
across disciplines into computerized databases and other easily accessed sources.
These databases will complement the vast structural and sequence databases created
to catalogue, organize and analyze gene sequences and protein products. The general
premise of the neuroscience goal is simple; namely that with "complete" knowledge of
the genome and protein structures accruing rapidly we next need to assemble an
infrastructure that will facilitate acquisition of an understanding of how
functional complexes operate in their cell and tissue contexts. Our U.C. San Diego-
based group is leading several interdisciplinary projects around this grand
challenge. We are evolving a shared infrastructure that allows for mapping
molecular and cellular brain anatomy in the context of a shared multi-scale mouse
brain atlas system, the Cell-Centered Database (CCDB). Complementary to these
neuroinformatics activities at the National Center for Microscopy and Imaging
Research in San Diego (NCMIR) we have developed new molecular labeling methods
compatible with advanced ultra-wide field laser-scanning light microscopy and multi-
resolution 3 dimensional electron microscopy. These new labeling and imaging
methods are being used to populate the CCDB, using as a driver mouse models of
neurological and neuropsychiatric disorders. The informatics framework is
facilitating cooperative work by distributed teams of scientists engaged in focused
collaborations aimed to deliver new fundamental understanding of structures on the
scale of 1 nm³ to tens of µm³, a dimensional range that encompasses macromolecular
complexes, organelles, and multi-component structures like synapses and the cellular
interactions in the context of the complex organization of the entire nervous
system. This is a unique and pioneering effort that links new neuroscience
techniques and revolutionary advances in information technology. Database
federation tools are critical to the scalability of these efforts and future
development plans will be described in the context of the NIH-supported project to
create a new framework for collaboration and data integration in the Biomedical
Informatics Research Network (BIRN). BIRN is the leading example of a virtual
database effort that is using the challenge of federating multi-scale distributed
data about the nervous systems to help guide the evolution of an International
Cyberinfrastructure serving all science disciplines, including biomedicine.
(National Center for Microscopy and Imaging Research of the Center for Research in Biological Systems - The Department of Neurosciences, University of California San Diego School of Medicine - La Jolla, California - USA)
Video in CDS
The aim of Grid computing is to enable the easy and open sharing of resources
between large and highly distributed communities of scientists and institutes across
many independent administrative domains. Convincing site security officers and
computer centre managers to allow this to happen in view of today's ever-increasing
Internet security problems is a major challenge. Convincing users and application
developers to take security seriously is equally difficult. This paper will describe
the main Grid security issues, both in terms of technology and policy, that have
been tackled over recent years in LCG and related Grid projects. Achievements to
date will be described and opportunities for future improvements will be addressed.
Video in CDS
The impact of e-science
Just as the development of the World Wide Web has had its greatest
impact outside particle physics, so it will be with the development
of the Grid.
E-science, of which the Grid is just a small part, is already making
a big impact upon many scientific disciplines, and facilitating new
scientific discoveries that would be difficult to achieve in any
other way. Key to this is the definition and use of metadata.
Video in CDS
EU Grid Research - Projects and Vision
The European Grid Research vision as set out in the Information
Society Technologies Work Programmes of the EU's Sixth Research
Framework Programme is to advance, consolidate and mature Grid
technologies for widespread e-science, industrial, business and
societal use. A batch of Grid research projects with 52 Million EUR EU
support was launched during the European Grid Technology Days 15 - 17
September 2004. The portfolio of projects has the potential for
turning Europe's strong competence and critical mass in Grid Research
into competitive advantages. In this presentation, the Grid research
vision of the programme and the new project portfolio will be
introduced. More information: www.cordis.lu/ist/grids.
The role of scientific middleware in the future of HEP computing
In the 18 months since the CHEP03 meeting in San Diego, the HEP community deployed
the current generation of grid technologies in a variety of settings. Legacy
software as well as recently developed applications were interfaced with middleware
tools to deliver end-to-end capabilities to HEP experiments in different stages of
their life cycles. In a series of data challenges, reprocessing efforts and data
distribution activities the community demonstrated the benefits distributed
computing can offer and the power a range of middleware tools can deliver. After
running millions of jobs, moving terabytes of data, creating millions of files and
resolving hundreds of bug reports, the community also exposed the limitations of
these middleware tools. As we move to the next level of challenges, requirements
and expectations, we must also examine the methods and procedures we employ to
develop, implement and maintain our common suite of middleware tools. The talk will
focus on the role common middleware developed by the scientific community can and
should play in the software stack of current and future HEP experiments.
Video in CDS
The Evolution of Computing: Slowing down? Not Yet!
Dr Sutherland will review the evolution of computing over the past
decade, focusing particularly on the development of the database and
middleware from client server to Internet computing.
But what are the next steps from the perspective of a software
company? Dr Sutherland will discuss the development of Grid as well
as the future applications revolving around collaborative working,
which are appearing as the next wave of computing applications.
Video in CDS
Grand Challenges facing Storage Systems
In this talk, we will discuss the future of storage systems. In particular, we will
focus on several big challenges which we are facing in storage, such as being able
to build, manage and backup really massive storage systems, being able to find
information of interest, being able to do long-term archival of data, and so on. We
also present ideas and research being done to address these challenges, and provide
a perspective on how we expect these challenges to be resolved as we go forward.
Video in CDS
Poster Session 2: Distributed Computing Services, Distributed Computing Systems and Experiences, Coffee
A general and flexible framework for virtual organization application tests in a grid system
A grid system is a set of heterogeneous computational and storage
resources, distributed on a large geographic scale, which belong to
different administrative domains and serve several different
scientific communities named Virtual Organizations (VOs). A virtual
organization is a group of people or institutions which collaborate
to achieve common objectives. Therefore such a system has to guarantee
the coexistence of different VOs' applications, providing them with a
suitable run-time environment. Hence tools are needed both at local
and central level for testing and detecting possible bad software
configurations on a grid site.
In this paper we present a web-based tool which permits a Grid
Operational Centre (GOC) or a Site Manager to test a grid site from
the VO viewpoint.
The aim is to create a central repository for collecting both
existing and emerging VO tests. Each VO test may include one or more
specific application tests, and each test could include one or more
subtests, arranged in a hierarchical structure.
A general and flexible framework is presented, capable of including VO
tests straightforwardly by means of a description file. Submission of
a set of tests to a particular grid site is made available through
a web portal. On the same portal, past and current results and logs
can be browsed.
(INFN Via E. Orabona 4 I - 70126 Bari Italy)
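The hierarchy of VO tests, application tests and subtests described above can be sketched as a nested description that is walked recursively. The format below is purely hypothetical (the abstract does not give the actual description file syntax), and `fake_execute` stands in for real submission to a grid site.

```python
# Hypothetical description: VO test -> application tests -> subtests.
DESCRIPTION = {
    "name": "example-vo",
    "children": [
        {"name": "app-a", "children": [
            {"name": "env-check", "command": "true"},
            {"name": "lib-check", "command": "true"},
        ]},
        {"name": "app-b", "children": [
            {"name": "data-access", "command": "false"},
        ]},
    ],
}

def run_test(node, execute, path=()):
    """Run a test node and return (passed, results).

    A leaf node carries a "command"; an inner node carries "children" and
    passes only if all of its children pass. results maps the
    slash-separated path of every node to True or False.
    """
    path = path + (node["name"],)
    results = {}
    if "children" in node:
        passed = True
        for child in node["children"]:
            child_passed, child_results = run_test(child, execute, path)
            results.update(child_results)
            passed = passed and child_passed
    else:
        passed = execute(node["command"])
    results["/".join(path)] = passed
    return passed, results

def fake_execute(command):
    # Stand-in for running the command on a grid site.
    return command != "false"

passed, results = run_test(DESCRIPTION, fake_execute)
for name in sorted(results):
    print("PASS" if results[name] else "FAIL", name)
```

Keeping a result for every node, not only for the leaves, is what lets a portal show a failing VO test and then drill down to the subtest that caused it.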
A Multidimensional Approach to the Analysis of Grid Monitoring Data
Analyzing Grid monitoring data requires the capability of dealing with
multidimensional concepts intrinsic to Grid systems. The meaningful
dimensions identified in recent works are the physical dimension
referring to geographical location of resources, the Virtual
Organization (VO) dimension, the time dimension and the monitoring
metrics dimension. In this paper, we discuss the application of
On-Line Analytical Processing (OLAP), an approach to the fast
analysis of shared multidimensional information, to the mentioned
problem. OLAP relies on structures called `OLAP cubes', that are
created by a reorganization of data contained inside a
relational database, thus transforming operational data into
Our OLAP model is a four-dimensional cuboid based on the time, geographic,
Virtual Organization (VO), and monitoring metric dimensions. The time and geographic
dimensions have a total order relation and form two concept
hierarchies, respectively hours
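As a rough illustration of the roll-up operation on such a four-dimensional cube, the sketch below aggregates a small fact table over the dimensions that are not kept. The sample data, the site and VO names, and the hours-to-days step of the time hierarchy are assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical fact table: (hour, site, vo, metric, value).
FACTS = [
    ("2004-09-27T10", "site-a", "vo1", "jobs_completed", 12),
    ("2004-09-27T10", "site-b", "vo1", "jobs_completed", 8),
    ("2004-09-27T11", "site-a", "vo1", "jobs_completed", 15),
    ("2004-09-27T11", "site-a", "vo2", "jobs_completed", 5),
    ("2004-09-28T09", "site-b", "vo2", "jobs_completed", 7),
]
DIMENSIONS = ("time", "site", "vo", "metric")

def hour_to_day(hour):
    """Move one level up the (assumed) time hierarchy: hour -> day."""
    return hour.split("T")[0]

def roll_up(facts, keep, time_level=None):
    """Sum values over every dimension not listed in keep."""
    cube = defaultdict(int)
    for *coords, value in facts:
        cell = dict(zip(DIMENSIONS, coords))
        if time_level is not None:
            cell["time"] = time_level(cell["time"])
        cube[tuple(cell[d] for d in keep)] += value
    return dict(cube)

per_day_vo = roll_up(FACTS, keep=("time", "vo"), time_level=hour_to_day)
per_site = roll_up(FACTS, keep=("site",))
print(per_day_vo)
print(per_site)
```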
Alibaba: A heterogeneous grid-based job submission system used by the BaBar experiment
The BaBar experiment has accumulated many terabytes of data on
particle physics reactions, accessed by a community of hundreds of
Typical analysis tasks are C++ programs, individually written by the
user, using shared templates and libraries. The resources have
outgrown a single platform and a distributed computing model is
needed. The grid provides the natural toolset. However, in contrast
to the LHC experiments, BaBar has an existing user community with an
existing non-Grid usage pattern, and providing users with an
acceptable evolution presents a challenge.
The 'Alibaba' system, developed as part of the UK GridPP project,
provides the user with a familiar command line environment. It draws
on the existing global file systems employed and understood by the
current user base. The main difference is that they submit jobs with
a 'gsub' command that looks and feels like the familiar 'qsub'.
However it enables them to submit jobs to computer systems at
different institutions, with minimal requirements on the remote
sites. Web based job monitoring is also provided. The problems and
features (the input and output sandboxes, authentication, data
location) and their solutions are described.
An intelligent resource selection system based on neural network for optimal application performance in a grid environment
A computing grid is a large-scale, geographically distributed and
heterogeneous system that provides a common platform for running
different grid-enabled applications. As each application has
different characteristics and requirements, it is a difficult
task to develop a scheduling strategy able to achieve optimal
performance, because application-specific requirements and dynamic system status
have to be taken into account.
Moreover, it may not be possible to obtain optimal performance for
multiple applications simultaneously using a single scheduler. Hence
in many cases the application scheduling strategy is assigned to
an expert application user who provides a ranking criterion for
selecting the best computational element from a set of
available resources. Such criteria are based on the user's perception of
system capabilities and knowledge about the features and requirements
of the application.
In this paper an intelligent mechanism has been both implemented and
evaluated to select the best computational resource in a grid
environment from the application viewpoint.
A neural network based system has been used to capture automatically
the knowledge of a grid application expert user. The system
scalability problem is also tackled and a preliminary solution based
on a sorting algorithm is discussed. The aim is to allow an
ordinary grid application user to benefit from this expertise.
(DEE – POLITECNICO DI BARI, V. ORABONA, 4, 70125 – BARI, ITALY)
ARDA Project Status Report
The ARDA project was started in April 2004 to support
the four LHC experiments (ALICE, ATLAS, CMS and LHCb)
in the implementation of individual
production and analysis environments based on the EGEE middleware.
The main goal of the project is to allow fast feedback between the
experiment and the middleware development teams via the
construction and use of end-to-end prototypes
allowing users to perform analyses on the present
data sets from recent Monte Carlo productions.
In this talk the project is presented with highlights of the
first results and lessons learnt so far.
The relations of the project with similar initiatives within
and outside the High Energy Physics community are reviewed
(notably in the EGEE application identification and support).
The ARDA Team
Beyond Persistence: Developments and Directions in ATLAS Data Management
As ATLAS begins validation of its computing model in 2004, requirements
imposed upon ATLAS data management software move well beyond simple persistence,
and beyond the "read a file, write a file" operational model that has sufficed for
most simulation production. New functionality is required to support the
ATLAS Tier 0 model, and to support deployment in a globally distributed environment
in which the preponderance of computing resources--not only CPU cycles but
data services as well--reside outside the host laboratory.
This paper takes an architectural perspective in describing new developments in ATLAS
data management software, including the ATLAS event-level metadata system and related
infrastructure, and the mediation services that allow one to distinguish writing from
registration and selection from retrieval, in a manner that is consistent both for
event data and for time-varying conditions. The ever-broader role of databases and
catalogs, and issues related to the distributed deployment thereof, are also
Building the LCG: from middleware integration to production quality software
In the last few years grid software (middleware) has become available
from various sources. However, there are no standards yet which
allow for an easy integration of different services.
Moreover, middleware was produced by different projects with the main
goal of developing new functionalities rather than production quality
In the context of the LHC Computing Grid project (LCG) an integration,
testing and certification activity is ongoing which aims at producing
a stable coherent set of services.
Here we report on the processes employed to produce the LCG middleware
release and related activities, including the infrastructures used, the
activities needed to integrate the various components and the
Our certification process consists of a continuous iterative cycle that
also involves feedback from the LCG production system and input from
the software providers.
The architecture of the LCG middleware is described, including
additional components developed by LCG to improve scalability and
Other associated activities include packaging for deployment, porting
to different platforms, debugging and patching of the software.
Functionality and stress tests are performed via a large test-bed
infrastructure that allows for benchmarking of different configurations.
We also describe the results of our tests and our experience
collected during the building of the LCG infrastructure.
Central Reconstruction System on the RHIC Linux Farm in Brookhaven Laboratory
A description of a Condor-based, Grid-aware batch
software system configured to function asynchronously
with a mass storage system is presented. The software
is currently used in a large Linux Farm (2700+
processors) at the RHIC and ATLAS Tier 1 Computing
Facility at Brookhaven Lab. Design, scalability,
reliability, features and support issues with a
complex Condor-based batch system are addressed within
the context of a Grid-like, distributed computing
(Brookhaven National Lab)
CERN Modular Physics Screensaver or Using spare CPU cycles of CERN's Desktop PCs
CERN has about 5500 Desktop PCs. These computers offer a large pool of resources
that can be used for physics calculations outside office hours.
The paper describes a project to make use of the spare CPU cycles of these PCs for
LHC tracking studies. The client server application is implemented as a lightweight,
modular screensaver and a Web Application containing the physics job repository. The
information exchange between client and server is done using the HTTP protocol. The
design and implementation is presented together with results of performance and
scalability studies. A typical LHC tracking study involves some 1500 jobs, each over
100,000 turns, requiring about 1 hour of CPU on a modern PC. A reliable and easy to
use Linux interface to the CPSS Web application has been provided. It has been used
for a production run of 15,000 jobs, using some 50 desktop Windows PCs, which
uncovered a numerical incompatibility between Windows 2000 and XP. It is expected
to make available up to two orders of magnitude more computing power for these
studies at zero cost.
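The "two orders of magnitude" estimate can be checked against the figures quoted in this abstract. This is only a rough sketch: it reads "about 1 hour of CPU" as the cost of a single job, and it assumes every desktop PC could contribute equally, neither of which the abstract states explicitly.

```python
import math

# Figures quoted in the abstract above.
TOTAL_DESKTOPS = 5500      # desktop PCs at CERN
PRODUCTION_PCS = 50        # desktop PCs used in the production run
PRODUCTION_JOBS = 15_000   # jobs in the production run
CPU_HOURS_PER_JOB = 1.0    # assumed: "about 1 hour of CPU" per job

# CPU time consumed by the production run, in total and per PC.
cpu_hours_total = PRODUCTION_JOBS * CPU_HOURS_PER_JOB
cpu_hours_per_pc = cpu_hours_total / PRODUCTION_PCS

# Potential gain if all desktops took part instead of 50.
scale_up = TOTAL_DESKTOPS / PRODUCTION_PCS
orders_of_magnitude = math.log10(scale_up)

print(f"production run: {cpu_hours_total:.0f} CPU hours, "
      f"{cpu_hours_per_pc:.0f} hours per PC")
print(f"scale-up with all desktops: x{scale_up:.0f} "
      f"(about {orders_of_magnitude:.1f} orders of magnitude)")
```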
Cross Experiment Workflow Management: The Runjob Project
Building on several years of success with the MCRunjob projects at
DZero and CMS, the Fermilab-sponsored joint Runjob project aims to
provide a Workflow description language common to three experiments:
DZero, CMS and CDF. This project will encapsulate the remote
processing experiences of the three experiments in an extensible
software architecture using web services as a
communication medium. The core of the Runjob project will be the
Shahkar software packages that provide services for describing jobs
and targeting them at different execution environments. A common
interface to multiple storage and compute grid elements will be
provided, allowing the three experiments to share hardware resources
in a transparent manner. Several tools provided by Shahkar are
discussed, including FileMetaBrokers, which provide a
uniform way to handle files and metadata over a distributed cluster,
the ShREEK runtime execution environment that allows executable jobs
to provide a real time monitoring and control interface to any
system, the scriptObject generic task encapsulation objects and
the XMLProcessor object persistency tool.
D0 data processing within EDG/LCG
The D0 experiment at the Tevatron is collecting some 100 Terabytes of data
each year and has a very high need of computing resources for the various
parts of the physics program. D0 meets these demands by establishing a
world - increasingly based on GRID technologies.
Distributed resources are used for D0 MC production and data
reprocessing of 1 billion events, requiring 250 TB to be transported over
WANs. While in 2003 most of this computing at remote sites was
distributed manually, some data reprocessing was performed with the EDG.
In 2004 GRID tools are increasingly and successfully employed.
We will report on performing MC production and data reprocessing using
EDG and LCG. We will explain how the D0 computing
environment was linked to these GRID platforms, and will discuss some lessons
learned (for both Grid computing and preparing applications for
distributed operation) from the D0 reprocessing on EDG, subjecting a
generic Grid infrastructure to real data for the first time.
An outlook on plans for applying LCG within D0 is given.
(UNIVERSITY OF WUPPERTAL)
Data management services of NorduGrid
In common grid installations, services responsible for storing big data
chunks, replicating those data and indexing their availability are usually
completely decoupled, and the task of synchronizing data is passed to either
user-level tools or separate services (like spiders), which are subject to
failure and usually cannot perform properly if one of the underlying services
The NorduGrid Smart Storage Element (SSE) was designed to try to
overcome those problems by combining the most desirable features into one
service. It uses HTTPS/G for secure data transfer, Web Services for
control (through same HTTPS/G channel) and can provide information to
indexing services used in middlewares based on the Globus Toolkit (TM). At
the moment, those are the Replica Catalog and the Replica Location
Service. The modular internal design of the SSE and the power of C++
object programming make it easy to add support for other indexing
services.
There are plans to complement it with a Smart Indexing Service capable of
resolving inconsistencies hence creating a robust distributed data storage
(Lund University, Sweden)
Data Rereprocessing on Worldwide Distributed Systems
The D0 experiment faces many challenges enabling access to large
datasets for physicists on 4 continents. The strategy followed is to solve these
problems on worldwide distributed computing clusters.
Since the beginning of Tevatron Run II (March 2001), all Monte Carlo
simulations have been produced outside Fermilab at remote systems.
A system of regional analysis centers (RACs) was established to
supply the associated institutes with the data. This structure, which
is similar to the Tier structure foreseen for the LHC, was used in autumn
2003 to rereprocess all D0 data with the up-to-date and
much improved reconstruction software.
D0 is the first running experiment to have implemented and operated all
important computing tasks of a high energy physics experiment on
worldwide distributed systems.
The experiences gained in D0 can be applied to judge the LHC
Database Usage and Performance for the Fermilab Run II Experiments
The Run II experiments at Fermilab, CDF and D0, have extensive database needs
covering many areas of their online and offline operations. Delivery of the data to
users and processing farms based around the world has represented major challenges to
both experiments. The range of applications employing databases includes data
management, calibration (conditions), trigger information, run configuration, run
quality, luminosity, and others. Oracle is the primary database product being used
for these applications at Fermilab and some of its advanced features have been
employed, such as table partitioning and replication. There is also experience with
open source database products such as MySQL for secondary databases. A general
overview of the operation, access patterns, and transaction rates is examined and the
potential for growth in the next year presented. The two experiments, while having
similar requirements for availability and performance, employ different architectures
for database access. Details of the experience for these approaches will be compared
and contrasted, as well as the evolution of the delivery systems throughout the run.
Tools employed for monitoring the operation and diagnosing problems will also be
Deployment of SAM for the CDF Experiment
CDF is an experiment at the Tevatron at Fermilab. One dominating
factor of the experiment's computing model is the high volume of raw,
reconstructed and generated data. The distributed data handling
services within SAM move these data to physics analysis
applications. The SAM system was already in use at the D-Zero
experiment. Due to differences in the computing models of the two
experiments some aspects of the SAM system had to be adapted. We will
present experiences from the adaptation and the deployment phase. This
includes the behavior of the SAM system on batch systems of very
different sizes and type as well as the interaction between the
datahandling and the storage systems, ranging from disk pools to tape
systems. In particular we will cover the problems faced on large scale
compute farms. To accommodate the needs of Grid computing, CDF deployed
installations consisting of SAM for datahandling and CAF for high
throughput batch processing. The CDF experiment already had experiences
with the CAF system. We will report on the deployment of the combined
(Fermi National Accelerator Laboratory / University of Oxford)
DIRAC Lightweight information and monitoring services using XML-RPC and Instant Messaging
The DIRAC system developed for the CERN LHCb experiment is a grid
infrastructure for managing generic simulation and analysis jobs. It
enables jobs to be distributed across a variety of computing
resources, such as PBS, LSF, BQS, Condor, Globus, LCG, and individual
A key challenge of distributed service architectures is that there is
no single point of control over all components. DIRAC addresses this
via two complementary features:
a distributed Information System, and an XMPP (Extensible Messaging
and Presence Protocol) Instant Messaging framework.
The Information System provides a concept of local and remote
Any information which is not found locally will be fetched from
remote sources. This allows a component to define its own state,
while fetching the state of other components directly from those
components, or via a central Information Service. We will present the
architecture, features, and performance of this system.
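
The local-then-remote lookup described above can be pictured with a small sketch. This is not DIRAC code: the class name `InfoSystem`, the key paths, and the use of a plain dictionary as a stand-in for a central Information Service are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# A remote source is modelled as any callable mapping a key to a value or None.
RemoteSource = Callable[[str], Optional[str]]


class InfoSystem:
    """Answer from local information first, then ask remote sources in order."""

    def __init__(self, local: Dict[str, str],
                 remotes: Optional[List[RemoteSource]] = None) -> None:
        self.local = dict(local)
        self.remotes = list(remotes or [])

    def get(self, key: str) -> Optional[str]:
        # The component's own state always wins.
        if key in self.local:
            return self.local[key]
        # Anything not found locally is fetched from remote sources.
        for fetch in self.remotes:
            value = fetch(key)
            if value is not None:
                return value
        return None


# Hypothetical central service, represented here by a dictionary.
central_service = {"/Sites/SiteA/Status": "Active"}

agent_info = InfoSystem(
    local={"/Agent/Status": "Running"},
    remotes=[central_service.get],
)
```

Here `agent_info.get("/Agent/Status")` is answered locally, while `agent_info.get("/Sites/SiteA/Status")` falls through to the remote source.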
XMPP has provided DIRAC with numerous advantages. As an
authenticated, robust, lightweight, and scalable asynchronous message
passing system, XMPP is used, in addition to XML-RPC, for
inter-Service communication, making DIRAC very fault-tolerant, a critical
feature when using Service Oriented Architectures. XMPP
is also used for monitoring real-time behaviour of the various DIRAC
Finally, XMPP provides XML-RPC like facilities which are being
developed to provide control channels direct to Services, Agents, and
Jobs. We will describe our novel use of Instant Messaging in DIRAC
and discuss directions for the future.
(UNIVERSITY OF OXFORD PARTICLE PHYSICS)
DIRAC Workload Management System
The Workload Management System (WMS) is the core component of the
DIRAC distributed MC production and analysis grid of the LHCb
experiment. It uses a central Task database which is accessed via
a set of central Services with Agents running on each of the LHCb
sites. DIRAC uses a 'pull' paradigm where Agents request tasks
whenever they detect their local resources are available.
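
As a rough illustration of the 'pull' paradigm, the sketch below has site Agents ask a central queue for work only when they have free slots. The class and method names (`TaskQueue`, `request_task`, `Agent.poll`) are hypothetical and the real Services and database are replaced by an in-memory queue.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class TaskQueue:
    """Central task store: it never pushes work, it only answers requests."""

    def __init__(self, tasks: List[Dict]) -> None:
        self._tasks: Deque[Dict] = deque(tasks)

    def request_task(self, site: str, free_slots: int) -> Optional[Dict]:
        # Hand out a task only if the requesting site reports free capacity.
        if free_slots <= 0 or not self._tasks:
            return None
        task = self._tasks.popleft()
        task["site"] = site
        return task


class Agent:
    """Site-side agent that pulls tasks while local resources are available."""

    def __init__(self, site: str, free_slots: int) -> None:
        self.site = site
        self.free_slots = free_slots

    def poll(self, queue: TaskQueue) -> List[Dict]:
        pulled = []
        while self.free_slots > 0:
            task = queue.request_task(self.site, self.free_slots)
            if task is None:
                break
            self.free_slots -= 1
            pulled.append(task)
        return pulled


queue = TaskQueue([{"id": 1}, {"id": 2}, {"id": 3}])
busy_site = Agent("site-busy", free_slots=0)
small_site = Agent("site-small", free_slots=2)
large_site = Agent("site-large", free_slots=5)

busy_tasks = busy_site.poll(queue)
small_tasks = small_site.poll(queue)
large_tasks = large_site.poll(queue)
```

A site with no free resources simply never asks, so the central side needs no up-to-date picture of each site's load.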
The collaborating central Services allow new components to be
plugged in easily. These Services can perform functions such as
scheduling optimization, task prioritization, job splitting and merging,
to name a few. They also provide job status information for various
monitoring clients. We will discuss the services deployment and operation
with particular emphasis on the robustness and scalability issues.
The distributed Agents have a modular design which allows easy functionality
extensions to adapt to the needs of a particular site. The Agent
installation has only basic prerequisites, which makes it easy for new
sites to be incorporated. An Agent can be deployed on a gatekeeper of a
large cluster or just on a single worker node of the LCG grid. PBS, LSF, BQS,
Condor, LCG and Globus can be used as the DIRAC computing resources.
The WMS components use XML-RPC and Jabber instant messaging protocols
for communication, which increases the overall reliability of the
system. The jobs handled by the WMS are described using the ClassAd library,
which facilitates interoperability with other grids.
Distributed computing and oncological radiotherapy: technology transfer from HEP and experience with prototype systems
We show how it is now possible to achieve both accuracy and fast computation response in radiotherapy dosimetry using Monte Carlo
methods, together with a distributed computing model.
Monte Carlo methods have never been used in clinical practice because, even though they are more accurate than available commercial software, the
calculation time needed to accumulate sufficient statistics is too long for realistic use in radiotherapy treatment.
We present a complete, fully functional prototype dosimetric system for radiotherapy, integrating various components based on HEP software
systems: a Geant4-based simulation, an AIDA-based dosimetric analysis, a web-based user interface, and distributed processing either on a local
computing farm or on geographically spread nodes.
The performance of the dosimetric system has been studied in three execution modes: sequential on a single dedicated machine, parallel on a
dedicated computing farm, parallel on a grid test-bed. An intermediate software layer, the DIANE system, makes the three execution modes
completely transparent to the user, allowing the same code to be used in any of the three configurations.
Thanks to the integration in a grid environment, any hospital, even a small one or one in a less wealthy country, that could not afford the high costs of
commercial treatment planning software may get the chance to use advanced software tools for oncological therapy by accessing distributed
computing resources shared with other hospitals and institutes belonging to the same virtual organization.
Distributed Testing Infrastructure and Processes for the EGEE Grid Middleware
Extensive and thorough testing of the EGEE middleware is essential to ensure that
a production quality Grid can be deployed on a large scale as well as
across the broad range of heterogeneous resources that make up the hundreds of
Grid computing centres both in Europe and worldwide.
Testing of the EGEE middleware encompasses the tasks of both verification and
validation. In addition we test the integrated middleware for
stability, platform independence, stress resilience, scalability and
The EGEE testing infrastructure is distributed across three
major EGEE grid centres in three countries: CERN, NIKHEF and RAL.
As far as possible, the testing procedures are automated and
integrated with the EGEE build system. This allows for continuous
testing together with the incremental daily code builds, fast and
early feedback of bugs to developers, and for the easy inclusion of
This paper will report on the initial results of the testing
procedures, frameworks and automation techniques adopted by the EGEE project,
the advantages and disadvantages of test automation and the
issues involved in testing a complex distributed middleware system in a
Distributed Tracking, Storage, and Re-use of Job State Information on the Grid
The Logging and Bookkeeping (L&B) service tracks jobs passing through the Grid. It collects
important events generated by both the grid middleware components and
applications, and processes them at a chosen L&B server to provide the job
state. The events are transported through secure reliable channels. Job
tracking is fully distributed and does not depend on a single information
source; robustness is achieved through speculative job state computation in
case of reordered, delayed or lost events. The state computation is easily
adaptable to modified job control flow.
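
One simple way to make a state computation tolerant of reordered or lost events is sketched below. It is a deliberate simplification and not the L&B algorithm: it assumes a linear, hypothetical set of states and takes the job state to be the most advanced state implied by any event received so far, whatever the arrival order.

```python
from typing import Iterable, Optional

# Hypothetical linear job life cycle; real control flows are richer.
STATE_ORDER = ["SUBMITTED", "SCHEDULED", "RUNNING", "DONE"]
RANK = {name: i for i, name in enumerate(STATE_ORDER)}


def job_state(events: Iterable[str]) -> Optional[str]:
    """Return the most advanced state implied by the events seen so far."""
    best: Optional[str] = None
    for event in events:
        if event not in RANK:
            continue  # ignore events that do not affect the state
        if best is None or RANK[event] > RANK[best]:
            best = event
    return best


# Events arriving out of order still give the same answer.
reordered = job_state(["RUNNING", "SUBMITTED", "SCHEDULED"])
# A lost intermediate event does not block progress either.
with_gap = job_state(["SUBMITTED", "DONE"])
```

Because the result depends only on the set of events received, a delayed event can refine the state later but never moves it backwards.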
The events are also passed to the related Job Provenance service. Its purpose
is the long-term storage of information on job execution, environment, and the
executable and input sandbox files. The data can be used for debugging,
post-mortem analysis, or re-running jobs. The data are kept by the
job-provenance storage service in a compressed format, accessible on
per-job basis. A complementary index service is able to find particular jobs
according to configurable criteria, e.g. submission time or "tags" assigned by
the user. A user client to support job re-execution is planned.
Both the L&B and the Job Provenance index servers provide web-service interfaces
for querying. Those interfaces comply with the On-demand producer specification
of the R-GMA infrastructure. Hence R-GMA capabilities can be utilized to
perform complex distributed queries across multiple servers. Also,
aggregate information about job collections can be easily provided.
The L&B service was deployed in the EU DataGrid and CERN LCG projects;
the Job Provenance service will be deployed in the EGEE project.
(CESNET, CZECH REPUBLIC)
Experience integrating a General Information System API in LCG Job Management and Monitoring Services
In a Grid environment, access to information on system resources is a necessity
in order to perform common tasks such as matching job requirements with available
resources, accessing files or presenting monitoring information. Thus both
middleware services, like workload and data management, and applications, like
monitoring tools, require an interface to the Grid information service which
provides that data.
Even though a unique schema for the published information is defined, actual
implementations use different data models, and define different access protocols.
Applications interacting with the information service must therefore deal with
several APIs, and be aware of the underlying technology in order to use the
appropriate syntax for their queries or to publish new information.
We have produced a new high-level C++ API that accommodates several existing
implementations of the information service, such as Globus MDS (LDAP based), MDS3 (XML
based) and R-GMA (SQL based). It allows applications to access information in a
transparent manner, loading the needed implementation-specific library on demand.
Features allowing for the adding and removal of dynamic information have been
included as well. A general query language to make the API compatible with future
protocols has been used.
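
The idea of one query interface over several back ends, each loaded only when first used, can be sketched as follows. The sketch is in Python rather than C++, and everything in it is hypothetical: the `InfoService` class, the table and element name `resources`, and the attribute names are placeholders, and no escaping of query values is attempted.

```python
from typing import Callable, Dict

Translator = Callable[[str, str], str]


def _ldap_translator() -> Translator:
    # LDAP-style search filter.
    return lambda attr, value: f"({attr}={value})"


def _sql_translator() -> Translator:
    # SQL-style query against a placeholder table.
    return lambda attr, value: f"SELECT * FROM resources WHERE {attr} = '{value}'"


def _xml_translator() -> Translator:
    # XPath-style query against a placeholder element.
    return lambda attr, value: f"//resources[{attr}='{value}']"


class InfoService:
    """Single entry point; back ends are instantiated on first use only."""

    _factories: Dict[str, Callable[[], Translator]] = {
        "ldap": _ldap_translator,
        "sql": _sql_translator,
        "xml": _xml_translator,
    }

    def __init__(self) -> None:
        self.loaded: Dict[str, Translator] = {}

    def query(self, backend: str, attr: str, value: str) -> str:
        if backend not in self.loaded:
            self.loaded[backend] = self._factories[backend]()
        return self.loaded[backend](attr, value)


info = InfoService()
ldap_query = info.query("ldap", "SiteName", "SiteA")
sql_query = info.query("sql", "SiteName", "SiteA")
```

The caller expresses the same condition once; only the back ends actually requested are ever loaded.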
In this paper we describe the design of this API and the results obtained by
integrating it in the Workload Management system and in the GridIce monitoring
system of LCG.
P. Mendez Lorenzo
Experience with Deployment and Operation of the ATLAS Production System and the Grid3+ Infrastructure at Brookhaven National Lab
This paper describes the deployment and configuration of the
production system for ATLAS Data Challenge 2 starting in May 2004,
at Brookhaven National Laboratory, which is the Tier1 center in
the United States for the International ATLAS experiment. We will
discuss the installation of Windmill (supervisor) and Capone (executor)
software packages on the submission host and the relevant security
issues. The Grid3+ infrastructure and information service are used
for the deployment of grid enabled ATLAS transformations on the Grid3+
computing elements. The Tier 1 hardware configuration includes 95
dual processor Linux compute nodes, 24 TB of NFS disk and an HPSS
mass storage system. A VOMS server maintains both VO services for US
ATLAS and BNL local site policies. This paper describes the work of
optimizing the performance and efficiency of this configuration.
(Brookhaven National Laboratory)
Experiment Software Installation experience in LCG-2
The management of Application and Experiment Software represents a very
common issue in emerging grid-aware computing infrastructures.
While the middleware is often installed by system administrators at a site
via customized tools that serve also for the centralized management of
the entire computing facility, the problem of installing, configuring and
validating Gigabytes of Virtual Organization (VO) specific software or
frequently changing user applications remains an open issue.
Following the requirements imposed by the experiments, in the LHC Computing
Grid (LCG) Experiment Software Managers (ESM) are designated people
with privileges of installing, removing and validating software for a
specific VO on a per site basis.
They can manage uniquely identifying tags in the LCG Information
System to announce the availability of a specific software version.
Users of a VO can then select, via the published tag, sites to run their jobs.
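
The tag mechanism amounts to a per-site set of published strings that users filter on. A minimal sketch, with invented site names and tag strings (the real tag naming convention is not given here):

```python
from typing import Dict, List, Set

# Tags published per site in the information system (all names hypothetical).
published_tags: Dict[str, Set[str]] = {
    "site-a": {"VO-myvo-app-1.0", "VO-myvo-app-1.1"},
    "site-b": {"VO-myvo-app-1.0"},
    "site-c": set(),
}


def publish(site: str, tag: str) -> None:
    """Called by the software manager after a validated installation."""
    published_tags.setdefault(site, set()).add(tag)


def unpublish(site: str, tag: str) -> None:
    """Called by the software manager after removing a version."""
    published_tags.get(site, set()).discard(tag)


def sites_with(tag: str) -> List[str]:
    """Sites a user could target for jobs needing the tagged software."""
    return sorted(site for site, tags in published_tags.items() if tag in tags)
```

For example, `sites_with("VO-myvo-app-1.0")` returns the two sites that currently advertise that version.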
The solution adopted by LCG has mainly served its purpose but it presents many problems.
The requirement imposed by the present solution for the existence of a
shared file-system in a computing farm poses performance,
reliability and scalability issues for large installations.
In this work we present a more flexible service based on P2P
technology that has been designed to tackle the limitations of the current system.
This service allows the ESM to propagate the installation occurring on a given WN to
the rest of the farm elements.
We illustrate the deployment, the design, preliminary results obtained and the
feedback from the LHC experiments and sites that have adopted it.
Federating Grids: LCG meets Canadian HEPGrid
A large number of Grids have been developed, motivated by
geo-political or application requirements. Despite being mostly based
on the same underlying middleware, the Globus Toolkit, they are
generally not inter-operable for a variety of reasons. We present a
method of federating those disparate grids which are based on the
Globus Toolkit, together with a concrete example of interfacing the
LHC grid (LCG) with HEPGrid. HEPGrid consists of shared resources, at
several Canadian research institutes, which are exposed via Globus
gatekeepers, and makes use of Condor-G for resource advertisement,
matchmaking and job submission. An LCG Computing Element (CE) based at
the TRIUMF Laboratory hosts a HEPGrid User Interface (UI) which is
contained within a custom jobmanager. This jobmanager appears in the
LCG information system as a normal CE publishing an aggregation of the
HEPGrid resources. The interface interprets the incoming job in terms
of HEPGrid UI usage, submits it onto HEPGrid, and implements the
jobmanager 'poll' and 'remove' methods, thus enabling monitoring and
control across the grids. In this way non-LCG resources are
integrated into LCG, without the need for LCG middleware on those
resources. The same method can be used to create interfaces between
other grids, with the details of the child-Grid being fully abstracted
into the interface layer. The LCG-HEPGrid interface is operational,
and has been used to federate 1300 CPUs at 4 sites into LCG for the
ATLAS Data Challenge (DC2).
(Simon Fraser University)
Generic logging layer for the distributed computing
Most HENP experiment software includes a logging or tracing API that displays,
in a particular format, important feedback coming from the core
application. However, inserting log statements into the code is a low-tech method
for tracing the program execution flow and often leads to a flood of messages in
which the relevant ones are occluded. In a distributed computing environment,
accessing the information via a log-file is no longer applicable and the approach
fails to provide runtime tracing.
Running a job involves a chain of events in which many components take part, often
written in diverse languages and not offering a consistent and easily adaptable
interface for logging important events.
We will present an approach based on a new generic layer built on top of a logger
family derived from the Jakarta log4j project that includes the log4cxx, log4c and log4perl
packages. This provides consistency across packages and frameworks.
Additionally, the power of using log4j is the possibility of enabling logging (or
features) at runtime without modifying the application binary or the wrapper layers.
We provide a C++ abstract class library that serves as a proxy between the
application framework and the distributed environment. The approach is designed so
that the debugging statements can remain in shipped code without incurring a heavy
performance cost. Logging equips the developer with as detailed a context as necessary
for application failures, from testing and quality assurance to a production mode
with a limited amount of information. We will explain and show its implementation in the
STAR production environment.
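
Two of the properties mentioned above, enabling logging at runtime and leaving debug statements in shipped code at low cost, can be illustrated with Python's standard `logging` module, which follows a similar logger, level and handler design. This is only an analogy to the log4j family, not the C++ layer described in the abstract; the logger name `reco.tracker` and the `Expensive` helper are made up.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

log = logging.getLogger("reco.tracker")  # hypothetical logger name
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.WARNING)  # production-like setting: debug is off


class Expensive:
    """Counts how often it is turned into text."""
    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "detailed-state"


# Disabled level: the message is dropped before any formatting happens.
log.debug("state: %s", Expensive())

# Enable debug output at runtime, without touching the calling code.
log.setLevel(logging.DEBUG)
log.debug("state: %s", Expensive())
```

Passing the argument separately from the format string is what keeps the disabled call cheap: the object is only rendered when a record is actually emitted.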
(BROOKHAVEN NATIONAL LABORATORY)
GILDA: a Grid for dissemination activities
Computational and data grids are now entering a more mature phase where experimental
test-beds are turned into production quality infrastructures operating around the
clock. All this is becoming true both at national level, where an example is the
Italian INFN production grid (http://grid-it.cnaf.infn.it), and at the continental
level, where the most striking example is the European Union EGEE Project
However, the impact of grid technologies on the future way of doing e-science
and research in Europe will be proportional to the capability of National and
European Grid Infrastructures to attract and serve
many diverse scientific and industrial communities through serious and detailed
dissemination and tutoring programs.
In this contribution we present GILDA, the Grid Infn Laboratory for Dissemination
Activities (http://gilda.ct.infn.it). GILDA is a complete suite of grid elements
(Certification Authority, Virtual Organization, Distributed Test-bed, Grid
Demonstrator, etc.) completely devoted to dissemination activities. GILDA can also
act as a fast-prototyping test-bed on which to start the porting/interfacing of new
applications with the grid middleware. The use and exploitation of GILDA in the
context of the Network Activities of the EGEE Project will be discussed.
(Univ. Catania and INFN Catania)
Global Grid User Support for LCG
For very large projects like the LHC Computing Grid Project (LCG), involving 8,000
scientists from all around the world, well-organized user support is an
indispensable requirement. The Institute for Scientific Computing at the
Forschungszentrum Karlsruhe started implementing a Global Grid User Support (GGUS)
after an official assignment by the Grid Deployment Board in March 2003. For this
purpose a web portal and a helpdesk application have been developed. As a single
entry point for all Grid related issues and problems GGUS follows the objectives of
providing news, documentation and status information about Grid resources. The user
will find forms to submit and track service requests. GGUS collaborates with
different support teams in the Grid environment like the Grid Operations Center and
the Experiment Specific Support. They can access the helpdesk system via web
interface. GGUS stores all the incoming trouble tickets and outgoing solutions in a
central database and plans to build up a knowledge base where all the information
can be offered in a structured manner.
As a prototype GGUS started operation at the Forschungszentrum Karlsruhe in October
2003 and supported local user groups of the German Tier 1 Computing Center, called
GridKa. Four months later the GGUS system was opened to the LCG community.
The GGUS system will be explained and demonstrated. The present status of GGUS
within the LCG environment will be discussed.
GoToGrid - A Web-Oriented Tool in Support to Sites for LCG Installations
The installation and configuration of LCG middleware, as it is currently being done,
is complex and delicate.
An “accurate” configuration of all the services of LCG middleware requires a deep
knowledge of the inside dynamics and hundreds of parameters to be dealt with. On the
other hand, the number of parameters and flags that are strictly needed in order to
run a working “default” configuration of the middleware is relatively small, due to
the fact that the values to be set mainly deal with environment configuration and
with a limited set of possible operation scenarios.
This “default” configuration appears to be the most suitable for sites joining LCG
for the first time.
The GoToGrid system aims to help Site Administrators easily perform such a
G2G combines the gathering of configuration information, provided by sites, with the
dynamic adaptive creation of customized documentation and installation tools.
By using a web interface and being asked only for the relevant configuration
information, Site Administrators will be able to design the desired configuration of
their own LCG site.
Site configuration data is collected and stored in a well defined format suitable
for use as the interface to different configuration management tools.
Grid Deployment Experiences: The path to a production quality LDAP based grid information system
This paper reports on the deployment experience of the de facto grid
information system, Globus MDS, in a large scale production grid. The
results of this experience led to the development of an information
caching system based on a standard OpenLDAP database. The paper then
describes how this caching system was developed further into
a production quality information system including a generic framework
for information providers. This includes the deployment and operation
experience and the results from performance tests on the information
system to assess its scalability limits.
GROSS: an end user tool for carrying out batch analysis of CMS data on the LCG-2 Grid.
GROSS (GRidified Orca Submission System) has been developed to provide CMS
end users with a single interface for running batch analysis tasks over
the LCG-2 Grid. The main purpose of the tool is to carry out job
splitting, preparation, submission, monitoring and archiving in a
transparent way which is simple to use for the end user. Central to its
design has been the requirement to allow multi-user analyses; to
accomplish this, all persistent information is stored in a backend MySQL
database. This database is additionally shared with BOSS, to which GROSS
interfaces in order to provide job submission and real time monitoring
In this paper we present an overview of GROSS's architecture and
functionality and report on first user tests of the system using CMS Data
Challenge 2004 data (DC04).
(IMPERIAL COLLEGE LONDON)
Installing and Operating a Grid Infrastructure at DESY
DESY is one of the world-wide leading centers for research with particle
accelerators and a center for research with synchrotron light.
The hadron-electron collider HERA houses four experiments which are taking
data and will be operated until 2006 at least.
The computer center manages a data volume of order 1 PB and is the home
for around 1000 CPUs.
In 2003 DESY started to set up a Grid infrastructure on site.
Monte Carlo production is the prime HEP application candidate for the Grid
at DESY. The experiments have started major tests.
A first Grid Testbed was based on EDG 1.4.
Some effort was needed to install the binary distribution of the middleware
on SuSE based Linux systems at DESY.
With the first fixed LCG-2 release in spring 2004, the Grid Testbed2 was
installed, which serves as the basis for all further DESY activities.
The contribution to CHEP2004 will start by briefly summarizing the status of the
Grid activities at DESY in the context of EGEE and D-GRID, in which DESY
takes a leading role.
In the following, we will discuss the integration of Grid components in
the infrastructure of the DESY computer center.
This includes technical aspects of the operating system, such as
SuSE versus RedHat Linux, the interaction with the mass storage system, and
the management of Virtual Organizations.
We will finish by discussing installation and operation experiences
of Grid middleware at DESY, also bearing in mind HEP and future synchrotron
light experiments in the X-FEL era.
JIM Deployment for the CDF Experiment
JIM (Job and Information Management) is a grid extension to the mature data handling
system called SAM (Sequential Access via Metadata) used by the CDF, DZero and Minos
Experiments based at Fermilab. JIM uses a thin client to allow job submissions from
any computer with Internet access, provided the user has a valid certificate or
Kerberos ticket. On completion, the job output can be downloaded using a web
interface. The JIM execution site software can be installed on shared resources,
such as ScotGRID, as it may be configured for any batch system and does not require
exclusive control of the hardware. Resources that do not belong entirely to CDF and
thus cannot run DCAF (Decentralised CDF Analysis Farm), may therefore be accessed
using JIM. We will report on the initial deployment of JIM for CDF and the steps
taken to integrate JIM with DCAF.
(UNIVERSITY OF GLASGOW)
Job Interactivity using a Steering Service in an Interactive Grid Analysis Environment
In the context of the Interactive Grid-Enabled Analysis Environment
(GAE), physicists desire bi-directional interaction with the jobs
they submit. In one direction, monitoring information about the
job, and hence a “progress bar”, should be provided to them. In the other
direction, physicists should be able to control their jobs. Before
submission, they may direct the job to some specified resource or
computing element. Before execution, its parameters may be changed or
it may be moved to another location. During execution, its
intermediate results should be fetched or it may be moved to another
location. Also, physicists should be able to kill, restart, hold and
resume their jobs.
Interactive job execution requires that at each step, the user must
make choices between alternative application components, files, or
locations. So a dead end may be reached where no solution can be
found, which would require backtracking to undo some previous
choice. Another desire is reliable and optimal execution of the job.
The Grid should take some decisions regarding the job execution to help
in reliable and optimal execution of the job. Reliability can be
achieved using a job recovery mechanism. When a job on the grid fails,
the recovery mechanism should resubmit the job on either the same
resource or a different resource. Checkpointing the job
keeps resource usage low when recovering the job from failure.
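
The interplay of resubmission and checkpointing can be sketched with a toy simulation. It is not the service described in this paper: the function name, the resource names and the `fail_at` parameter (the event index at which a simulated resource fails) are all invented for illustration.

```python
from typing import List, Optional, Tuple


class ResourceFailure(Exception):
    """Raised when the simulated resource dies while running the job."""


def run_with_recovery(
    total_events: int,
    resources: List[Tuple[str, Optional[int]]],
) -> List[Tuple[str, int, int]]:
    """Try resources in order, resuming from the last checkpoint each time.

    Returns a history of (resource, first_event, next_event_after_last_done).
    """
    checkpoint = 0  # number of events already completed and saved
    history: List[Tuple[str, int, int]] = []
    for name, fail_at in resources:
        start = checkpoint
        try:
            for i in range(start, total_events):
                if fail_at is not None and i == fail_at:
                    raise ResourceFailure(name)
                checkpoint = i + 1  # checkpoint after every event
            history.append((name, start, checkpoint))
            return history
        except ResourceFailure:
            # Record what was done, then resubmit on the next resource.
            history.append((name, start, checkpoint))
    raise RuntimeError("job failed on all resources")


# First resource fails at event 6; the second resumes from the checkpoint.
history = run_with_recovery(10, [("ce1", 6), ("ce2", None)])
```

In this run the second resource processes only events 6 to 9, so the work already done before the failure is not repeated.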
In this paper the architecture and design of an autonomous grid
service is described that fulfills the above stated requirements for
interactivity in Grid-enabled data analysis.
Job Monitoring in Interactive Grid Analysis Environment
The Grid is emerging as a great computational resource, but its dynamic behaviour makes
the Grid environment unpredictable. System or network failures can occur, or
system performance can degrade. So once a job has been submitted, monitoring
becomes essential for the user to ensure that the job is completed in an efficient
way. In current environments, once a user submits a job he loses direct control over
it; the system behaves like a batch system, where the user submits the job and gets the
result back. The only information a user can obtain about a job is whether it is
scheduled, running, cancelled or finished. This information is enough from the Grid
management point of view but not from the point of view of a user. The user wants an
interactive environment in which he can check the progress of the job, obtain
intermediate results, terminate the job based on its progress or intermediate
results, steer the job to other nodes to achieve better performance and check the
resources consumed by the job. So a mechanism is needed that can provide the user with
secure access to information about different attributes of a job. In this paper we
describe a monitoring service, a Java-based web service that will provide secure
access to different attributes of a job once it has been submitted to the Interactive
Grid Analysis Environment.
Job-monitoring over the Grid with GridIce infrastructure.
In a wide-area distributed and heterogeneous grid environment, monitoring
represents an important and crucial task. It includes system status checking,
performance tuning, bottleneck detection, troubleshooting and fault notification. In
particular a good monitoring infrastructure must provide the information to
track down the current status of a job in order to locate any problems. Job
monitoring requires interoperation between the monitoring system and other grid
Currently, the development and deployment LCG testbeds integrate the GridICE monitoring
system, which measures and publishes the state of a grid resource at a particular
point in time. In this paper we present the efforts to integrate into the current
GridICE infrastructure additional useful information about job status, e.g. the
name of the job, the virtual organization to which it belongs, possibly the real and
mapped user who has submitted the job, the effective CPU time consumed and its
(UNIVERSITÀ DEGLI STUDI DI BARI), G. Tortone
K5 @ INFN.IT: an infrastructure for the INFN cross REALM & AFS cell authentication.
The infn.it AFS cell has been providing a useful single file-space and authentication mechanism for the whole
INFN, but the lack of a distributed management system has led several INFN sections and laboratories to set up local
AFS cells. The hierarchical transitive cross-realm authentication introduced in the Kerberos 5 protocol and the
new versions of OpenAFS and of the MIT implementation of Kerberos 5 make it possible to set up AFS cross-cell
authentication in a transparent way, using Kerberos 5 cross-realm authentication. The goal of the K5 @ INFN.IT
project is to provide a Kerberos 5 authentication infrastructure for the INFN and cross-realm authentication
to be used for the cross-cell AFS authentication. In this work we describe the scenario, the results of various
tests performed, the solution chosen and the status of the K5 @ INFN.IT project.
LEXOR, the LCG-2 Executor for the ATLAS DC2 Production System
In this paper we present an overview of the implementation of the LCG
interface for the ATLAS production system. In order to take advantage
of the features provided by DataGRID software, on which LCG is based,
we implemented a Python module, seamlessly integrated into the Workload
Management System, which can be used as an object-oriented API to the
submission services. On top of it we implemented Lexor, an executor
component conforming to the pull/push model designed by the DC2
production system team. It pulls job descriptions from the supervisor
component and uses them to create job objects, which in turn are
submitted to the Grid. All the typical Grid operations (match-making
with respect to input data location, registration of output data in
the replica catalog, workload balancing) are performed by the
underlying middleware, while interactions with the ATLAS metadata
catalog and the production database are provided by the integration
with the Data Management System (Don Quijote) client module and via
XML messages to the production supervisor (Windmill).
(INFN - MILANO)
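The pull step described in the LEXOR abstract (an executor pulls job descriptions from a supervisor, turns them into job objects and submits them) can be illustrated with a minimal sketch. Everything named here is hypothetical: `FakeSupervisor`, `Executor`, `Job` and the `submit` callback are stand-ins invented for illustration, not the Lexor, Windmill or LCG interfaces, and the XML messaging and Grid submission are replaced by plain Python calls.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job object built from a pulled job description."""
    name: str
    input_files: list
    status: str = "defined"


class FakeSupervisor:
    """Stand-in for the production supervisor: hands out pending descriptions."""

    def __init__(self, descriptions):
        self._pending = list(descriptions)

    def request_jobs(self, n):
        # Return up to n descriptions and remove them from the pending list.
        batch, self._pending = self._pending[:n], self._pending[n:]
        return batch


class Executor:
    """Pulls descriptions, creates job objects and passes them to a submit callback."""

    def __init__(self, supervisor, submit):
        self.supervisor = supervisor
        self.submit = submit  # assumed to return True when submission succeeds
        self.jobs = []

    def pull_and_submit(self, n):
        for desc in self.supervisor.request_jobs(n):
            job = Job(desc["name"], desc["inputs"])
            job.status = "submitted" if self.submit(job) else "failed"
            self.jobs.append(job)
        return self.jobs
```

In this sketch the executor, not the supervisor, decides how many jobs to take per cycle, which is the essence of a pull model.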
LHC data files meet mass storage and networks: going after the lost performance
Experiments frequently produce many small data files for reasons beyond their control, such as output
splitting into physics data streams, parallel processing on large farms, database technology incapable of
concurrent writes into a single file, and constraints from running farms reliably. Resulting data file size is
often far from ideal for network transfer and mass storage performance. Provided that time to analysis does
not significantly deteriorate, files arriving from a farm could easily be merged into larger logical chunks, for
example by physics stream and file type within a configurable time and size window.
Uncompressed zip archives seem an attractive candidate for such file merging and are currently being tested by the
CMS experiment. We describe the main components now in use: the merging tools, tools to read and write zip
files directly from C++, plug-ins to the database system, mass-storage access optimisation, consistent
handling of application and replica metadata, and integration with catalogues and other grid tools. We report
on the file size ratio obtained in the CMS 2004 data challenge and observations and analysis on changes to
data access as well as estimated impact on network usage.
(NORTHEASTERN UNIVERSITY, BOSTON, MA, USA)
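As an illustration of merging small files into uncompressed zip archives within a size window, the following sketch uses Python's standard `zipfile` module with `ZIP_STORED` (no compression). It is not the CMS merging tool: the function names, the in-memory archives and the size threshold are assumptions made for this example, and the time window mentioned in the abstract is left out.

```python
import io
import zipfile


def _write_archive(members):
    """Pack (name, data) pairs into one uncompressed zip archive held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in members:
            zf.writestr(name, data)
    return buf.getvalue()


def merge_into_archives(files, max_bytes):
    """Group files in arrival order so each archive holds at most max_bytes of payload.

    A single file larger than max_bytes still gets an archive of its own.
    """
    archives, current, current_size = [], [], 0
    for name, data in files:
        if current and current_size + len(data) > max_bytes:
            archives.append(_write_archive(current))
            current, current_size = [], 0
        current.append((name, data))
        current_size += len(data)
    if current:
        archives.append(_write_archive(current))
    return archives


def read_member(archive_bytes, name):
    """Read one original file back out of a merged archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)
```

Because the members are stored rather than deflated, each original file occupies a contiguous byte range of the archive, which is what makes direct reading from an application plausible.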
Mass Storage Management and the Grid
The University of Edinburgh has a significant interest in mass storage systems as it
is one of the core groups tasked with the roll out of storage software for the UK's
particle physics grid, GridPP. We present the results of a development project to
provide software interfaces between the SDSC Storage Resource Broker, the EU DataGrid
and the Storage Resource Manager. This project was undertaken in association with the
eDikt group at the National eScience Centre, the Universities of Bristol and Glasgow,
Rutherford Appleton Laboratory and the San Diego Supercomputing Center.
MONARC2: A Processes Oriented, Discrete Event Simulation Framework for Modelling and Design of Large Scale Distributed Systems.
The design and optimization of the Computing Models for the future LHC experiments,
based on the Grid technologies, requires a realistic and effective modeling and
simulation of the data access patterns, the data flow across the local and wide area
networks, and the scheduling and workflow created by many concurrent, data intensive
jobs on large scale distributed systems.
This paper presents the latest generation of the MONARC (MOdels of Networked Analysis
at Regional Centers) simulation framework, as a design and modelling tool for large
scale distributed systems applied to HEP experiments. A process-oriented approach for
discrete event simulation is used for describing concurrent running programs, as well
as the stochastic arrival patterns that characterize how such systems are used. The
simulation engine is based on Threaded Objects, (or Active Objects) which offer great
flexibility in simulating the complex behavior of distributed data processing
programs. The engine provides an appropriate scheduling mechanism for the Active
Objects with efficient support for interrupts.
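To make the process-oriented style of discrete event simulation concrete, here is a minimal sketch in which each simulated job is a Python generator that yields the simulated time it wants to wait. This is not the MONARC engine: generators are substituted for its threaded Active Objects, interrupts are not modelled, and the names `Simulator` and `job` are invented for this example.

```python
import heapq


class Simulator:
    """Tiny event loop: processes are generators yielding delays in simulated time."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # entries are (wake-up time, sequence number, process)
        self._seq = 0     # tie-breaker so processes themselves are never compared

    def start(self, process):
        self._schedule(0.0, process)

    def _schedule(self, delay, process):
        heapq.heappush(self._queue, (self.now + delay, self._seq, process))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, process = heapq.heappop(self._queue)
            try:
                delay = next(process)  # resume the process until its next wait
            except StopIteration:
                continue               # the process has finished
            self._schedule(delay, process)


def job(sim, name, cpu_time, transfer_time, log):
    """Hypothetical job: fetch input data, then compute."""
    log.append((sim.now, name, "start"))
    yield transfer_time
    log.append((sim.now, name, "data ready"))
    yield cpu_time
    log.append((sim.now, name, "done"))
```

Each process reads as straight-line code describing one job, while the event queue interleaves many of them in simulated time.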
The framework provides a complete set of basic components (processing nodes, data
servers, network components) together with dynamically loadable decision units
(scheduling or data replication modules) for easily building complex Computing Model
Examples of simulating complex data processing systems, specific for the LHC
experiments (production tasks associated with data replication and interactive
analysis on distributed farms) are presented, and the way the framework is used to
compare different decision making algorithms or to optimize the overall Grid
Monitoring a Petabyte Scale Storage System
Fermilab operates a petabyte scale storage system, Enstore, which is the
primary data store for experiments' large data sets. The Enstore system
regularly transfers greater than 15 Terabytes of data each day. It is designed using a
client-server architecture providing sufficient modularity to allow easy addition and
replacement of hardware and software components. Monitoring of this system is
essential to ensure the integrity of the data that is stored in it and to maintain
the high volume access that this system supports.
The monitoring of this distributed system is accomplished
using a variety of tools and techniques that present information for use
by a variety of roles (operator, storage system administrator, storage software
All elements of the system are monitored: performance, hardware,
firmware, software, network, data integrity.
We will present details of the deployed monitoring tools with an
emphasis on the different techniques that have proved useful
to each role. Experience with the monitoring tools and techniques,
including what worked and what did not, will be presented.
Monitoring CMS Tracker construction and data quality using a grid/web service based on a visualization tool
The complexity of the CMS Tracker (more than 50 million channels to monitor), now under
construction in ten laboratories worldwide with hundreds of interested people, will
require new tools for monitoring both the hardware and the software. In our approach
we use both visualization tools and Grid services to make this monitoring possible.
The use of visualization enables us to represent on a single computer screen all
those millions of channels at once. The Grid will make it possible to get enough data
and computing power to check every channel and also to reach the experts
everywhere in the world, allowing the early discovery of problems.
We report here on a first prototype developed using the Grid environment already
available in CMS, i.e. LCG2. This prototype consists of a Java client which
implements the GUI for Tracker Visualization and a few data servers connected to the
tracker construction database, to Grid catalogs of event datasets or directly to
test beam data acquisition setups. All the communication between client and servers
is done using data encoded in XML and standard Internet protocols.
We will report on the experience acquired developing this prototype and on possible
future developments in the framework of an interactive Grid and a virtual counting
room allowing complete detector control from everywhere in the world.
Multi-Terabyte EIDE Disk Arrays running Linux RAID5
High-energy physics experiments are currently recording large amounts of data and in a few years will be
recording prodigious quantities of data. New methods must be developed to handle this data and make
analysis at universities possible. Grid Computing is one method; however, the data must be cached at the
various Grid nodes. We examine some storage techniques that exploit recent developments in commodity
hardware. Disk arrays using RAID level 5 (RAID5) include both parity and striping. The striping improves
access speed. The parity protects data in the event of a single disk failure, but not in the case of multiple disk
We report on tests of dual-processor Linux Software RAID5 arrays and Hardware RAID5 arrays using the 12-
disk 3ware controller, in conjunction with 300 GB disks, for use in offline high-energy physics data analysis.
The price of IDE disks is now less than $1/GB. These RAID5 disk arrays can be scaled to sizes affordable to
small institutions and used when fast random access at low cost is important.
(UNIVERSITY OF MISSISSIPPI)
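The statement that RAID5 parity protects against a single disk failure but not a multiple one follows from how the parity is formed: it is the byte-wise XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the others. The sketch below shows this for one stripe. It is a simplification made for illustration: real RAID5 rotates the parity block across the disks, whereas here it is always placed last, and the function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block at lost_index from all the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

With two blocks missing, the XOR of the survivors yields only the XOR of the two lost blocks, which is why a second failure cannot be recovered.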
New distributed offline processing scheme at Belle
The Belle experiment has accumulated an integrated
luminosity of more than 240 fb-1 so far, and the daily logged
luminosity has exceeded 800 pb-1. This requires a more
efficient and reliable way of processing events. To meet
this requirement, a new offline processing scheme has been
constructed, based upon the technique employed for the Belle
online reconstruction farm. Event processing is performed
at PC farms, which consist of 60 quad (0.7 GHz) and 225
dual (1.3 GHz or 3.2 GHz) CPU PC nodes. Raw event data are
read from a Solaris tape server connected to a DTF2 tape
drive, and they are distributed over all PC nodes.
Reconstructed events are recorded onto 8 file servers,
which were newly installed last year. To maximize
processing capabilities, various optimizations such as PC
clustering, job control and output data management
have been made. As a result, processing power with this
scheme has more than doubled, so that
more than 3 fb-1 of beam data can be
processed per day. In this talk, stable operation of our new
system, together with a description of the Belle offline
computing model, will be demonstrated by showing computing
performance obtained from experience in processing beam
On the Management of Certification Authority in Large Scale GRID Infrastructure
The scope of this work is the study of the scalability limits of a
Certification Authority (CA) running in large scale GRID environments.
The operation of the Certification Authority is analyzed in terms of
the rate of incoming requests, complexity of authentication procedures,
LCG security restrictions and other limiting factors. It is shown that
the standard CA operational model has some inherent "bottlenecks", which
can be resolved with proper management and technical tools.
The central point is the discussion of a "decentralized" scheme with
a single CA and multiple authentication agents, called Registration
Authorities (RA). The single CA retains the role of a technical center,
responsible for support of the GRID security infrastructure, while
the general role of the RAs is verification of requests from end-users.
A practical implementation of this scheme (including the development
and installation of end-user software) was carried out at CERN in 2002
A second implementation of the same ideas was the GRID project of the
Russian Ministry of Atomic Energy, 2003 (http://grid.ihep.su/MAG/).
These two implementations are compared in aspects of security
(INSTITUTE FOR HIGH ENERGY PHYSICS, PROTVINO, RUSSIA)
OptorSim: a Simulation Tool for Scheduling and Replica Optimisation in Data Grids
In large-scale Grids, the replication of files to different sites is an important
data management mechanism which can reduce access latencies and give improved usage
of resources such as network bandwidth, storage and computing power.
In the search for an optimal data replication strategy, the Grid simulator OptorSim
was developed as part of the European DataGrid project. Simulations of various HEP
Grid scenarios have been undertaken using different job scheduling and file
replication algorithms, with the experimental emphasis being on physics analysis
use-cases. Previously, the CMS Data Challenge 2002 testbed and UK GridPP testbed were
among those simulated; recently, our focus has been on the LCG testbed. A novel
economy-based strategy has been investigated as well as more traditional methods,
with the economic models showing distinct advantages in terms of improved resource
Here, an overview of OptorSim's design and implementation is presented with a
selection of recent results, showing its usefulness as a Grid simulator both in its
current features and in the ease of extensibility to new scheduling and replication
(UNIVERSITY OF GLASGOW)
Participation of Russian sites in the Data Challenge of ALICE experiment in 2004
The report presents an analysis of the ALICE Data Challenge 2004.
This Data Challenge has been performed on two different distributed
computing environments. The first one is the ALICE Environment for
distributed computing (AliEn) used standalone. Presently this
environment allows ALICE physicists to obtain results on simulation,
reconstruction and analysis of data in ESD format for AA and pp
collisions at LHC energies. The second environment is the LCG-2
middleware accessed via AliEn with the help of an interface,
developed at INFN. Three Russian sites have been configured as AliEn
nodes for the Data Challenge. These sites (IHEP at Protvino, ITEP in
Moscow and JINR at Dubna) could run a maximum of 86 jobs. The initial
analysis shows that the architecture of one site was not adequate for
distributed computing. Another farm had nodes with insufficient RAM
for efficient job processing. All these problems have been cured for
subsequent DC phases.
Actions have also been taken to reduce the downtime due to wrong site
configuration. The local AliEn server installed at the JINR site has
been used as a standard configuration for the other Russian sites.
The total number of jobs processed in Russia constitutes ~2% of the total
run in the ALICE DC 2004.
(Joint Institute for Nuclear Research (JINR))
Patriot: Physics Archives and Tools required to Investigate Our Theories
PATRIOT is a project that aims to provide better predictions of
physics events for the high-Pt physics program of Run2 at the
Central to Patriot is an Enstore or mass storage repository for files
describing the high-Pt physics predictions. These are typically
stored as StdHep files which can be handled by CDF and
D0 and run through detector and triggering simulations. The
definition of these datasets in the CDF and D0 data handling system
SAM is under way.
Patriot relies heavily on a new generation of Monte Carlo tools
(such as MadEvent, Alpgen, Grappa, CompHEP, etc.) to calculate the
hard structure of high-Pt events and the more venerable event
generators (Pythia and Herwig) to make particle level predictions.
An early informational database, describing the types of data files
stored in Patriot, already exists. A new database is under
In parallel with PATRIOT, we wish to develop the QCD tools that
describe the detailed properties of high-Pt events. Some of the
essential features of particle-level events must be described by non-
perturbative functions, whose form is often constrained by theory,
but which must be ultimately tuned to data.
Performance of an operating High Energy Physics Data grid, D0SAR-grid
The D0 experiment at Fermilab's Tevatron will record several petabytes of data over
the next five years in pursuing the goals of understanding nature and searching for
the origin of mass. Computing resources required to analyze these data far exceed
the capabilities of any one institution. Moreover, the widely scattered
geographical distribution of collaborators poses further serious difficulties for
optimal use of human and computing resources. These difficulties will be
exacerbated in future high energy physics experiments, like those at the LHC. The
computing grid has long been recognized as a solution to these problems. This
technology is being made a more immediate reality to end users by developing a
fully realized grid in the D0 Southern Analysis Region (D0SAR).
D0SAR consists of eleven universities in the Southern US, Brazil, Mexico and India.
The centerpiece of D0SAR is a data and resource hub, a Regional Analysis Center
(RAC). Each D0SAR member institution constructs an Institutional Analysis Center
(IAC), which acts as a gateway to the grid for users within that institution.
These IACs combine dedicated rack-mounted servers and personal desktop computers
into a local physics analysis cluster. D0SAR has been working on establishing an
operational regional grid, D0SAR-Grid, using all available resources within it and
a home-grown local task manager, McFarm.
In this talk, we will describe the architecture of the D0SAR-Grid implementation,
the use and functionality of the grid, and the experiences of operating the grid
for simulation, reprocessing and analysis of data from a currently running HEP
(The University of Mississippi)
Predicting Resource Requirements of a Job Submission
Grid computing provides key infrastructure for distributed problem solving in
dynamic virtual organizations. However, Grids are still the domain of a few highly
trained programmers with expertise in networking, high-performance computing, and
One of the big issues in the full-scale usage of a grid is the matching of the
resource requirements of a job submission to available resources. In order for
resource brokers/job schedulers to ensure efficient use of grid resources, an
initial estimate of the likely resource usage of a submission must be made. In the
context of the Grid Enabled Analysis Environment (GAE), physicists want the ability
to discover, acquire, and reliably manage computational resources dynamically, in
the course of their everyday activities. They do not want to be bothered with the
location of these resources, the mechanisms that are required to use them, keeping
track of the status of computational tasks operating on these resources, or with
reacting to failure. They do care about how long their tasks are likely to run and
how much these tasks will cost.
So the grid scheduler must be able to estimate, before job submission,
how much time and what resources the job will consume at the execution site. Our proposed
module, the Prediction Engine, will be part of the scheduler and will provide estimates of
resource use along with the duration of use. This will enable the scheduler to choose
the optimum site for job execution.
This paper presents a survey of existing grid schedulers and, based on this
survey, states the need for resource usage estimation. The architecture and
design of a “grid prediction engine” that predicts the resource requirements of a job
submission are also discussed.
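How a runtime estimate might feed into site selection can be sketched as follows. The abstract does not say how the proposed prediction engine forms its estimates, so the method used here (the mean runtime of past jobs of the same type at the same site, added to an assumed queue wait) is purely an illustrative assumption, as are the record fields and function names.

```python
from statistics import mean


def estimate_runtime(history, job_type, site):
    """Mean runtime of past jobs of this type at this site, or None without history."""
    samples = [
        record["runtime"]
        for record in history
        if record["type"] == job_type and record["site"] == site
    ]
    return mean(samples) if samples else None


def choose_site(history, job_type, queue_wait):
    """Pick the site with the smallest estimated wait plus runtime.

    queue_wait maps site name to the expected queueing delay (an assumed input).
    Sites with no history for this job type are skipped.
    """
    best = None
    for site, wait in queue_wait.items():
        runtime = estimate_runtime(history, job_type, site)
        if runtime is None:
            continue
        total = wait + runtime
        if best is None or total < best[1]:
            best = (site, total)
    return best
```

A real estimator would also need to handle sites with no history and jobs whose cost depends on input size; the sketch simply skips the former and ignores the latter.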
Production data export and archiving system for new data format of the BaBar experiment.
For The BaBar Computing Group
BaBar has recently moved away from using Objectivity/DB for its event
store towards a ROOT-based event store. Data in the new format is
produced at about 20 institutions worldwide as well as at SLAC. Among
the new challenges are the organization of data export from remote
institutions, archival at SLAC and making the data visible to users
for analysis and import to their own institutions.
The new system is designed to be scalable, easily configurable on the
client and server side and adaptive to server load. It is integrated
with SLAC's mass storage system (HPSS) and with the xrootd
service. The design, implementation and experience with the new system, as
well as future development, are discussed in this article.
Production Experience of the Storage Resource Broker in the BaBar Experiment
We describe the production experience gained from implementing and using
exclusively the Storage Resource Broker (SRB), developed at the San Diego
Supercomputer Center, to distribute the BaBar experiment's production event data stored in ROOT
files from the experiment center at SLAC, California, USA to a Tier A computing
center at CC-IN2P3, Lyon, France. In addition we outline how the system can be readily
extended to include more sites.
Production of simulated events for the BaBar experiment by using LCG
The BaBar experiment has been taking data since 1999. In 2001 the computing group
started to evaluate the possibility of evolving toward a distributed computing model in
a Grid environment. In 2003, a new computing model, described in other talks, was
implemented, and ROOT I/O is now being used as the Event Store. We implemented a
system, based on the LHC Computing Grid (LCG) tools, to submit full-scale Monte Carlo
simulation jobs in this new BaBar computing model framework. More specifically, the
resources of the LCG implementation in Italy, grid.it, are used as Computing
Elements (CE) and Worker Nodes (WN). A Resource Broker (RB) specific to the BaBar
computing needs was installed. Other BaBar requirements, such as the installation and
usage of an object-oriented (Objectivity) database to read detector conditions and
calibration constants, were accommodated by using non-gridified hardware in a subset
of grid.it sites. The BaBar simulation software was packed and installation on Grid
elements was centrally managed with LCG tools. Sites were geographically mapped to
Objectivity databases, and conditions were read by the WN either locally or remotely.
An LCG User Interface (UI) has been used to submit simulation tests by using standard
JDL commands. The ROOT I/O output files were retrieved from the WN and stored in the
closest Storage Element (SE). Standard BaBar simulation production tools were then
installed on the UI and configured such that the resulting simulated events can be
merged and shipped to SLAC, like in the standard BaBar simulation production setup.
Final validation of the system is being completed. This gridified approach results in
the production of simulated events on geographically distributed resources with a
large throughput and minimal, centralized system maintenance.
(INFN Sezione di Ferrara)
SAMGrid Experiences with the Condor Technology in Run II Computing
SAMGrid is a globally distributed system for data handling and job management,
developed at Fermilab for the D0 and CDF experiments in Run II. The Condor
system is being developed at the University of Wisconsin for management
of distributed resources, computational and otherwise. We briefly review the
SAMGrid architecture and its interaction with Condor, which was presented
earlier. We then present our experiences using the system in production,
which have two distinct aspects.
At the global level, we deployed Condor-G, the Grid-extended Condor, for
the resource brokering and global scheduling of our jobs. At the heart of
the system is Condor's Matchmaking Service. More recently, at the
computing element level, we have been benefitting from the large computing
cluster at the University of Wisconsin campus. The architecture of
the computing facility and the philosophy of Condor's resource management
have prompted us to improve the application infrastructure for D0 and CDF,
in aspects such as parting with the shared file system or reliance on
resources being dedicated. As a result, we have increased productivity
and made our applications more portable and Grid-ready. We include some
statistics gathered from our experience. Our fruitful collaboration
with the Condor team has been made possible by the Particle Physics Data Grid.
(FERMI NATIONAL ACCELERATOR LABORATORY)
SAMGrid Monitoring Service and its Integration with MonALisa
The SAMGrid team is in the process of implementing a monitoring and
information service, which fulfills several important roles in the
operation of the SAMGrid system, and will replace the first
generation of monitoring tools in the current deployments. The first
generation tools are in general based on text logfiles and
represent solutions which are not scalable or maintainable. The roles
of the monitoring and information service are: 1) providing
diagnostics for troubleshooting the operation of SAMGrid services; 2)
providing support for monitoring at the level of user jobs; 3)
providing runtime support for local configuration
and other information which currently must be stored
centrally (thus moving the system toward greater autonomy for the SAM
station services, which include cache management and job management
services); 4) providing intelligent collection of statistics in order
to enable performance monitoring and tuning. The architecture of
this service is quite flexible, permitting input from any
instrumented SAM application or service. It will allow multiple
backend storage for archiving of (possibly) filtered monitoring
events, as well as real time information displays and an active
notification service for alarm conditions. This service will
be able to export, in a configurable manner, information to higher
level Grid monitoring services, such as MonALisa. We describe our
experience to date with using a prototype version together with
(FERMI NATIONAL ACCELERATOR LABORATORY)
SRM AND GFAL TESTING FOR LCG2
Storage Resource Manager (SRM) and Grid File Access Library (GFAL) are GRID
middleware components used for transparent access to Storage Elements. SRM provides a
common interface (WEB service) to backend systems giving dynamic space allocation and
file management. GFAL provides a mechanism whereby application software can access
a file at a site without having to know which transport mechanism to use or at which
site it is running.
Two separate Test Suites have been developed for testing of SRM interface v 1.1 and
testing against the GFAL file system. The test suites are written in C and Perl.
SRM test suite: a script in Perl generates files and their replicas. These files are
copied to the local SE and registered (published). Replicas of files are made to the
specified SRM site. All replicas are used by the C-program. The SRM functions, such
as get, put, pin, unPin etc. are tested using a program written in C. As SRMs do not
perform file movement operations, the C-program transfers files using
"globus-url-copy". It then compares the data files before and after transfer.
GFAL test suite: as GFAL allows users to access a file in a Storage Element directly
(read and write) without copying it locally, a C-program tests the implementation of
POSIX I/O functions such as open/seek/read/write. A Perl script executes almost all
Unix based commands: dd, cat, cp, mkdir and so on. Also the Perl script launches a
stress test, creating many small files (~5000), nested directories and huge files.
The investigation of interactions between the Replica Manager, the SRM and the file
access mechanism will help make the Data Management software better.
(Institute for High Energy Physics, Protvino, Russia)
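The kind of check the GFAL test suite performs on POSIX-style I/O (open, write, seek, read) and the small-file stress test can be sketched as below. The actual suites described above are written in C and Perl and run against GFAL; this Python sketch is a substitute that exercises the operating system's own POSIX calls on a local directory through the standard `os` module, and the function names and file naming scheme are invented for the example.

```python
import os
import tempfile


def posix_roundtrip(path, payload, offset):
    """Write payload, seek to offset, and return everything read from there on."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, payload)
        if written != len(payload):
            raise IOError("short write")
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


def stress_small_files(directory, count, size):
    """Create count files of size bytes each and return how many the directory lists."""
    for i in range(count):
        with open(os.path.join(directory, f"f{i:05d}.dat"), "wb") as handle:
            handle.write(b"\0" * size)
    return len(os.listdir(directory))


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    print(posix_roundtrip(os.path.join(workdir, "t.bin"), b"0123456789", 4))
```

Comparing the bytes read back with the bytes written is the same before-and-after comparison the SRM suite applies to transferred files.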
Testing the CDF Distributed Computing Framework
To distribute computing for CDF (Collider Detector at Fermilab) a system managing
local compute and storage resources is needed. For this purpose CDF will use the
DCAF (Decentralized CDF Analysis Farms) system which is already in use at Fermilab. DCAF
has to work with the data handling system SAM (Sequential Access to data via
Metadata). However, both DCAF and SAM are mature systems which have not yet been
used in combination, and on top of this DCAF has only been installed at Fermilab and
not on local sites. Therefore tests of the systems are necessary to check the
interplay of the data handling with the farms, the behaviour of the off-site DCAFs
and the user friendliness of the whole system. The tests are focussed on the main
tasks of the DCAFs, like Monte Carlo generation and stores, as well as the readout
of data files and connected data handling. To achieve user friendliness the SAM
station environment has to be common to all stations and adaptations to the
environment have to be made.
The ATLAS Computing Model
The ATLAS Computing Model is under continuous active development.
Previous exercises focussed on the Tier-0/Tier-1 interactions, with
an emphasis on the resource implications and only a high-level view
of the data and workflow. The work presented here considerably
revises the resource implications, and attempts to describe in some
detail the data and control flow from the High Level Trigger farms
all the way through to the physics user. The model draws from the
experience of previous and running experiments, but will be tested in
the ATLAS Data Challenge 2 (DC2, described in other abstracts) and in
the ATLAS Combined Testbeam exercises.
An important part of the work is to devise the measurements and
tests to be run during DC2. DC2 will be nearing completion in
September 2004, and the first assessments of the performance of the
computing model in scaled slice tests will be presented.
The BABAR Analysis Task Manager
The new BaBar bookkeeping system comes with tools to directly support
data analysis tasks. This Task Manager system acts as an interface
between datasets defined in the bookkeeping system, which are used as
input to analyses, and the offline analysis framework. The Task
Manager organizes the processing of the data by creating specific jobs
to be either submitted to a batch system, or run in the background on a
local desktop, or laptop. The current system has been designed to
support PBS and LSF batch systems. Changes to defined datasets due to
production are directly supported by the Task Manager, where new
collections that add to a dataset or replace other collections are
automatically detected, allowing an analysis at any time to be
up-to-date with the latest available data. The output of tasks,
whether new data collections, ntuple/hbook files, or text files, can be
put back into a collections bookkeeping system or stored in the private
Task Manager database. Currently MySQL and Oracle relational databases
are supported. The BABAR Task Manager has been in use for data
production since January this year, and the schema of the working
system will be presented.
(Stanford Linear Accelerator Center)
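The dataset-update behaviour described in the Task Manager abstract (new collections that add to a dataset or replace others are detected so that an analysis can be brought up to date) can be illustrated with a small sketch. It does not reflect the BaBar bookkeeping schema: datasets are modelled as plain lists of collection names, and `apply_update` and `pending_collections` are hypothetical helpers.

```python
def apply_update(dataset, added=(), replaced=None):
    """Return the dataset after replacing superseded collections and appending new ones.

    replaced maps an old collection name to the name of the one that supersedes it.
    """
    replaced = replaced or {}
    updated = [replaced.get(name, name) for name in dataset]
    updated.extend(name for name in added if name not in updated)
    return updated


def pending_collections(dataset, processed):
    """Collections in the current dataset that the analysis has not yet run over."""
    return [name for name in dataset if name not in processed]
```

Only the collections returned by `pending_collections` would need new jobs, so an analysis can catch up without reprocessing what is unchanged.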
The D0 Virtual Center and planning for large scale computing
The D0 experiment relies on large scale computing systems to achieve its
physics goals. As the experiment lifetime spans multiple generations of
computing hardware, it is fundamental to make projective models in order to use
available resources to meet the anticipated needs. In addition, computing
resources can be supplied as in-kind contributions by collaborating
institutions and countries; however, such resources typically require
scheduling, thus adding another dimension to planning. In addition, to
avoid over-subscription of the resources, the experiment has to be educated
on the limitations and trade-offs of various computing activities to enable
the management to prioritize. We present the metrics and mechanisms used for
planning and discuss the uncertainties and unknowns, as well as some of the
mechanisms for communicating the resource load to the stakeholders.
In order to correctly account for in-kind contributions of remote computing,
D0 uses the concept of a Virtual Center, in which all of the costs are
estimated as if the computing were located solely at FNAL. In contrast
to other such models in common use, D0 accounts for contributions based on
computer usage rather than strictly on money spent on hardware. This gives
an incentive to achieve the maximum efficiency of the systems as well as
encouraging active participation in the computing model by collaborating
institutions. This method of operation leverages a common tool and
infrastructure base for all production-type activities.
(FERMI NATIONAL ACCELERATOR LABORATORY)
The deployment mechanisms for the ATLAS software.
One of the most important problems in software management of a very
large and complex project such as ATLAS is how to deploy the software
on the running sites. By running sites we include computer sites
ranging from computing centers in the usual sense down to individual
laptops but also the computer elements of a computing grid
organization. The deployment activity consists in constructing a well
defined representation of the states of the working software (known as
releases), and transporting them to the target sites, in such a way
that the installation process can be entirely automated and can take
care of discovering the context and adapting itself to it. A set of
tools based on both CMT - the basic configuration management tool of
ATLAS - and Pacman has been developed. The resulting mechanism now
supports the systematic production of distribution kits for various
binary conditions of every release, the partial or complete automatic
installation of kits on any site and the running of test suites to
validate the installed kits. This mechanism is meant to be fully
compliant with the Grid requirements and has been tested in the
context of LCG. Several issues related to the constraints on the
target system, or to the incremental updates of the installation,
still need to be studied and will be discussed.
The LCG-AliEn interface, a realization of a MetaGrid system
AliEn (ALICE Environment) is a GRID middleware developed and used in the context of ALICE, the CERN LHC
heavy-ion experiment. In order to run Data Challenges exploiting both AliEn “native” resources and any
infrastructure based on EDG-derived middleware (such as the LCG and the Italian GRID.IT), an interface
system was designed and implemented; some details of a prototype were already presented at CHEP2003. In
the spring of 2004 an ALICE Data Challenge began with the simulated data production on this multiple
infrastructure, thus qualifying as the first large production carried out transparently making use of very
different middleware systems. This system is a practical realisation of the “federated” or “meta-” grid concept,
and it has been successfully tested in a very large production. This talk reports on new developments of
the interface system, the successful DC running experience, the advantages and limitations of this concept,
the plans for the future and some lessons learned.
The role of legacy services within ATLAS DC2
This paper presents an overview of the legacy interface provided for
the ATLAS DC2 production system. The term legacy refers to any
non-grid system which may be deployed for use within DC2. The
reasoning behind providing such a service for DC2 is twofold in
nature. Firstly, the legacy interface provides a backup solution
should unforeseen problems occur while developing the grid based
interfaces. Secondly, this system allows DC2 to use resources which have yet to
deploy grid software, thus increasing the available computing power
for the Data Challenge.
The aim of the legacy system is to provide a simple framework which is
easily adaptable to any given computing system. Here the term
computing system refers to the batch system provided at a given site
and also to the structure of the computing and storage systems at that
site. The legacy interface provides the same functionality as the grid
based interfaces and is deployed transparently within the DC2
production system. Following the push-pull model implemented for DC2
the system pulls jobs from a production database and pushes them onto
a given computing/batch system.
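As a rough illustration of the push-pull model described above, the following Python sketch pulls pending jobs from a production database and pushes them onto a batch system. The table layout, the function names (`make_demo_db`, `pull_jobs`, `push_job`, `run_cycle`) and the use of a plain list as a stand-in for the batch system are assumptions made for illustration only; they are not taken from the DC2 production system.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory stand-in for the production database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, command TEXT, status TEXT)")
    db.executemany(
        "INSERT INTO jobs (command, status) VALUES (?, 'pending')",
        [("simulate run=1",), ("simulate run=2",), ("digitize run=1",)],
    )
    db.commit()
    return db

def pull_jobs(db, limit):
    """Pull up to `limit` pending jobs and mark them as claimed."""
    rows = db.execute(
        "SELECT id, command FROM jobs WHERE status = 'pending' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for job_id, _command in rows:
        db.execute("UPDATE jobs SET status = 'claimed' WHERE id = ?", (job_id,))
    db.commit()
    return rows

def push_job(batch_queue, job_id, command):
    """Push one job onto the local batch system; a plain list stands in for it here."""
    batch_queue.append((job_id, command))

def run_cycle(db, batch_queue, limit=2):
    """One pull-push cycle; returns the number of jobs handed to the batch system."""
    jobs = pull_jobs(db, limit)
    for job_id, command in jobs:
        push_job(batch_queue, job_id, command)
        db.execute("UPDATE jobs SET status = 'submitted' WHERE id = ?", (job_id,))
    db.commit()
    return len(jobs)
```

Adapting such a sketch to a given site would mainly mean replacing `push_job` with a call to that site's batch submission command, which is the kind of adaptability the legacy framework aims for.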
In a world which is becoming increasingly grid orientated this project
allows us to evaluate the role of non-grid solutions in dedicated
production environments. Experiences, both good and bad, gained during
DC2 are presented and the future of such systems is discussed.
Tools for GRID deployment of CDF offline and SAM data handling systems for Summer 2004 computing.
The Fermilab CDF Run-II experiment is now providing official support for
remote computing, expanding this to about 1/4 of the total CDF computing
during the Summer of 2004.
I will discuss in detail the extensions to CDF software distribution
and configuration tools and procedures, in support of CDF GRID/DCAF
computing for Summer 2004. We face the challenge of unreliable networks,
time differences, and remote managers with little experience with
this particular software.
We have made the first deployment of the SAM data handling system
outside its original home in the D0 experiment.
We have deployed to about 20 remote CDF sites.
We have created lightweight testing and monitoring tools
to assure that these sites are in fact functional when installed.
We are distributing and configuring both client code within CDF code releases,
and the SAM servers to which the clients connect.
Procedures which once took days are now performed in minutes.
These tools can be used to install SAM servers for D0 and other experiments.
Networks permitting, we will give a live SAM installation demonstration.
We have separated the data handling components from the main CDF offline
code releases by means of shared libraries, permitting live upgrades
to otherwise frozen code.
We now use a special 'development lite' release to ensure that all sites
have the latest tools available.
We have put substantial effort into revision control,
so that essentially all active CDF sites are running exactly the same code.
Toward a Grid Technology Independent Programming Interface for HEP Applications
In the High Energy Physics (HEP) community, Grid technologies have
been accepted as solutions to the distributed computing problem.
Several Grid projects have provided software in recent years. Among
them, the LCG - especially aimed at HEP applications -
provides a set of services and respective client interfaces, both as
command line tools and as programming language APIs
in C, C++, Java, etc.
Unfortunately, the programming interface presented to the end user
(the physicist) is often not uniform or provides different levels of
abstraction. In addition, Grid technologies undergo constant change
and improvement, and it is of major importance to shield end users
from changes in the underlying technology. As services
evolve and new ones are introduced, the way users interact with them
These new interfaces are often designed to work at a different level
and with a different focus than the original ones. This makes it hard
for the end user to build Grid applications.
We have analyzed the existing LCG programming environment and
identified several ways to provide high-level technology independent
interfaces. In this article, we describe the use cases
presented to us by the LCG experiments and the specific problems we
encountered in documenting existing APIs and providing
usage examples. As a main contribution, we also propose a prototype
high-level interface for the information, authentication and
authorization systems that is now under test on the LCG EIS testbed
by the LHC experiments.
Usage of ALICE Grid middleware for medical applications
Breast cancer screening programs require managing and accessing a
huge amount of data, intrinsically distributed, as they are collected
in different Hospitals. The development of an application based on
Computer Assisted Detection algorithms for the analysis of digitised
mammograms in a distributed environment is a typical GRID use case.
In particular, AliEn (ALICE Environment) services, whose development
was carried out by the ALICE Collaboration, were used to configure a
dedicated Virtual Organisation; a PERL-based interface to AliEn
commands allows the registration of new patients and mammograms in
the AliEn Data Catalogue as well as queries to retrieve images
associated with selected patients. The analysis of selected mammograms
can be performed interactively, making use of PROOF services, or
taking advantage of the AliEn capabilities to generate "sub-jobs";
each of them analyzes the fraction of the selected sample stored on a
site, and the results are merged. All the required functionality is
available: by the end of 2004 a working prototype is foreseen, with
an AliEn Client installed in each of the Hospitals participating in
the INFN-funded MAGIC-5 project.
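The sub-job pattern described above (one sub-job per site, each analysing the part of the selected sample stored there, followed by a merge) can be sketched in Python as follows. The catalogue entries, site names and the `analyse` placeholder are hypothetical and stand in for the real Data Catalogue and detection algorithms; no AliEn API is used here.

```python
from collections import defaultdict

# Hypothetical catalogue entries: (image name, site holding the file).
CATALOGUE = [
    ("patient01_L.img", "site_a"),
    ("patient01_R.img", "site_a"),
    ("patient02_L.img", "site_b"),
    ("patient03_L.img", "site_c"),
    ("patient03_R.img", "site_b"),
]

def split_by_site(selection):
    """Group the selected images by the site that stores them: one sub-job per site."""
    sub_jobs = defaultdict(list)
    for image, site in selection:
        sub_jobs[site].append(image)
    return dict(sub_jobs)

def analyse(images):
    """Placeholder for the detection algorithm: one result record per image."""
    return {image: {"processed": True} for image in images}

def run_and_merge(selection):
    """Run one sub-job per site (sequentially in this sketch) and merge the results."""
    merged = {}
    for _site, images in split_by_site(selection).items():
        partial = analyse(images)  # in a real system this would run next to the data
        merged.update(partial)
    return merged
```

The point of the split is that only the partial results travel between sites, while the images themselves stay where they were collected.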
The same approach will be applied in the near future in two other
- Lung cancer screening, equivalent to the mammographic screening
from the middleware point of view, where Computer Assisted Detection
algorithms are being developed;
- Diagnosis of Alzheimer's disease, where the application is
intrinsically distributed: it should, in fact, compare the PET-
generated image to a set of reference images which are scattered on
many sites and merge the results.
Usage statistics and usage patterns on the NorduGrid: Analyzing the logging information collected on one of the largest production Grids of the world.
The Nordic Grid facility (NorduGrid) came into production operation during
the summer of 2002 when the Scandinavian Atlas HEP group started to use
the Grid for the Atlas Data Challenges and was thus the first Grid ever
contributing to an Atlas production. Since then, the Grid facility has
been in continuous 24/7 operation offering an increasing number of
resources to a growing set of active users coming from various scientific
areas including chemistry, biology and informatics. As of today the Grid has
grown into one of the largest production Grids in the world, continuously
running Grid jobs on more than 30 Grid-connected sites which offer
over 2000 CPUs.
This article will start with a short overview of the design and
implementation of the Advanced Resource Connector (ARC), the NorduGrid
middleware, which delivers reliable Grid services to the NorduGrid
production facility. This will be followed by a presentation of the
logging facility of NorduGrid, describing the logging service and the
collected information. The main part of the talk will focus on the
analysis of the collected logging information: usage statistics, usage
patterns (what does a typical grid job on the NorduGrid look like?). Use
cases from different application domains will also be discussed.
-NorduGrid live: www.nordugrid.org -> Grid Monitor
-Atlas Data-Challenge 1 on NorduGrid: http://arxiv.org/abs/physics/0306013
(Lund University, Sweden)
Using Tripwire to check cluster system integrity
The expansion of large computing fabrics/clusters throughout the world
creates a need for stricter security. Otherwise any system could suffer damage
such as data loss, data falsification or misuse.
Perimeter security and intrusion detection systems (IDS) are the two main
aspects that must be taken into account in order to achieve system security.
The main goal of an intrusion detection system is early detection of
the previously mentioned cases, as a way to minimize any damage to the data
contained in the system.
Tripwire is one of the most powerful IDSs and is widely used as a
security tool by the community of network administrators. Tripwire is
designed to monitor the status of files and directories, and is
able to detect the slightest change to them.
At Ciemat, Tripwire has been used to monitor our local clusters, involved
in GRID projects such as implementation of LCG prototypes, to
guarantee the integrity of the data generated and stored there. It is
also used to monitor any modification of operating system files and
any other scientific core software.
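The general idea of detecting any change to monitored files can be sketched as a baseline of cryptographic digests that is compared with a later scan. The Python sketch below illustrates that idea only; it is not Tripwire's implementation or configuration format, the function names are hypothetical, and the choice of SHA-256 is an assumption made here.

```python
import hashlib
import os

def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(root):
    """Map every regular file under `root` (by relative path) to its digest."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = file_digest(full)
    return result

def compare(baseline, current):
    """Report files added, removed or modified since the baseline was taken."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        path for path in baseline.keys() & current.keys()
        if baseline[path] != current[path]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

In practice the baseline itself has to be protected (for example kept on read-only media), since an intruder who can rewrite it can hide any change.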
XTNetFile, a fault tolerant extension of ROOT TNetFile
This paper describes XTNetFile, the client side of a project
conceived to address the high demand data access needs of modern
physics experiments such as BaBar using the ROOT framework. In this
context, a highly scalable and fault tolerant client/server
architecture for data access has been designed and deployed which
allows thousands of batch jobs and interactive sessions to
effectively access the data repositories based on the XROOTD data
server, a complex extension of the rootd daemon. The majority of the
communication problems are handled by the design of the client/server
mechanism and the communication protocol.
This allows us to build distributed data access systems which are
highly robust, load balanced and scalable to an extent which
allows 'no jobs to fail'.
Furthermore XTNetFile ensures backward compatibility with the 'old'
rootd server by using the same API as the existing ROOT TFile/TNetFile
The code is designed with a high degree of modularity that allows
other interfaces, such as administrative tools, to be built on the
same communication layer. In addition the client plugin can also be
used to read other types of (non-ROOT I/O) data files, providing the
Plenary: Session 6
Kongress-Saal
Evolution and Revolution in the Design of Computers Based on Nanoelectronics
Today's computers are roughly a factor of one billion less efficient at doing their
job than the laws of fundamental physics state that they could be. How much of this
efficiency gain will we actually be able to harvest? What are the biggest obstacles
to achieving many orders of magnitude improvement in our computing hardware, rather
than the roughly factor of two we are used to seeing with each new generation of
chip? Shrinking components to the nanoscale offers both potential advantages and
severe challenges. The transition from classical mechanics to quantum mechanics is
a major issue. Others are the problems of defect and fault tolerance: defects are
manufacturing mistakes or components that irreversibly break over time, and faults
are transient interruptions that occur during operation. Both of these issues become
bigger problems as component sizes shrink and the number of components scales up
massively. In 1955, John von Neumann showed that a completely general approach to
building a reliable machine from unreliable components would require a redundancy
overhead of at least 10,000 - this would completely negate any advantages of
building at the nanoscale. We have been examining a variety of defect and fault
tolerant techniques that are specific to particular structures or functions, and are
vastly more efficient for their particular task than the general approach of von
Neumann. Our strategy is to layer these techniques on top of each other to achieve
high system reliability even with component reliability of no more than 97% or so,
and a total redundancy of less than 3. This strategy preserves the advantages of
nanoscale electronics with a relatively modest overhead.
Video in CDS
Enterasys - Networks that Know
Today and in the future businesses need an intelligent network.
And Enterasys has the smarter solution. Our active network uses a combination of
context-based and embedded security technologies -
as well as the industry’s first automated response capability
- so it can manage who is using your network.
Our solution also protects the entire enterprise - from the
edge, through the distribution layer, and into the core of
the network. Threats are recognized and isolated at the
user level, rather than taking your entire network down.
It even has the ability to coexist with and enhance your
legacy data networking infrastructure and existing security
appliances - regardless of the vendor. By continually offering
a context-based analysis of network traffic, our solution
allows you to see not only what the problem is, but also
where it is, and who caused it. And, with the industry's most
advanced controls, we're the first solution that's able to
resolve threats across the entire network - dynamically
or on demand.
Video in CDS
The IBM Research Global Technology Outlook
The Global Technology Outlook (GTO) is IBM Research’s projection of the
future for information technology (IT). The GTO identifies progress and
trends in key indicators such as raw computing speed, bandwidth, storage,
software technology, and business modeling. These new technologies have the
potential to radically transform the performance and utility of tomorrow's
information processing systems and devices, ultimately creating new levels
of business value.
Video in CDS
Operation of the CERN Managed Storage environment; current status and future directions
This paper discusses the challenges in maintaining a stable Managed Storage Service
for users built upon dynamic underlying disk and tape layers.
Early in 2004 the tools and techniques used to manage disk, tape, and stage servers
were refreshed by adopting the QUATTOR tool set. This has markedly increased the
coherency and efficiency of the configuration of data servers. The LEMON
monitoring suite was deployed to raise alarms and gather performance metrics.
Exploiting this foundation, higher level service displays are being added, giving
comprehensive and near-real-time views of operations. The scope of our monitoring
has been broadened to include low-level machine sensors such as thermometer, IPMI
and SMART readings, improving our ability to detect impending hardware failure.
In terms of operations, widespread disk reliability problems, which were manpower
intensive to chase, were overcome by exchanging a bad batch of 1200 disks. Recent
LHC data challenges have ventured into new operating domains for the CASTOR system,
with massive disk resident file catalogues requiring special handling. The tape
layer has focused on STK 9940 drives for bulk recording capacity: a large scale
data migration to this media permitted old drive technologies to be retired.
Repacking 9940A data to 9940B high density media allows us to recycle tapes, giving
substantial savings by avoiding acquisition of new media.
In addition to more robust software, hardware developments are required for LHC era
services. We are moving from EIDE to SATA based disk storage and envisage a tape
drive technology refresh. Details will be provided of our investigations in these
The status of Fermilab Enstore Data Storage System
Fermilab has developed and successfully uses the Enstore Data Storage
System. It is a primary data store for the Run II Collider Experiments,
as well as for the others. It provides data storage in robotic tape libraries
according to the requirements of the experiments. High fault tolerance and
availability, as well as multilevel priority-based request processing,
allow experiments to effectively store and access data in
Enstore, including storing raw data from data acquisition systems.
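To make the notion of multilevel priority-based request processing concrete, here is a minimal Python sketch of a queue that serves requests by priority level and preserves arrival order within a level. The class name, the convention that a lower number means higher priority, and the example requests are assumptions for illustration; the actual Enstore scheduling policy is not described in this abstract.

```python
import heapq
import itertools

class RequestQueue:
    """Serve requests by priority level; equal levels keep their arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, priority, request):
        # Lower number = higher priority (a convention chosen for this sketch).
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def next_request(self):
        """Return the most urgent pending request, or None if nothing is queued."""
        if not self._heap:
            return None
        _priority, _seq, request = heapq.heappop(self._heap)
        return request
```

With such a queue, a write of raw data from a data acquisition system could be submitted at a more urgent level than ordinary user reads, so that it is served first even if it arrives later.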
The distributed structure and modularity of Enstore allow the system
to scale and more storage equipment to be added as the requirements grow.
Currently the Fermilab Enstore Data Storage System includes
5 robotic tape libraries and 96 tape drives of different types. The amount of data
stored in the system is ~1.7 PetaBytes. Users access Enstore directly using
a special command. They can also use the ftp, GridFTP and SRM interfaces to the dCache
system, which uses Enstore as its lower layer storage.
(FERMI NATIONAL ACCELERATOR LABORATORY, USA)
Disk storage technology for the LHC T0/T1 centre at CERN
By 2008, the T0/T1 centre for the LHC at CERN is estimated to use
about 5000 TB of disk storage. This is a very significant increase
over the roughly 250 TB running now. In order to be affordable, the
chosen technology must provide the required performance and at the
same time be cost-effective and easy to operate and use.
We will present an analysis of the cost (both in terms of material
and personnel) of the current implementation (network-attached
storage), and then describe detailed performance studies with hardware
currently in use at CERN in different configurations of filesystems
on software or hardware RAID arrays over disks. Alternative
technologies that have been evaluated by CERN in varying depth (such
as arrays of SATA disks with a Fiber Channel uplink, distributed disk
storage across worker nodes, iSCSI solutions, SANFS, ...) will be
discussed. We will conclude with an outlook of the next steps to be
taken at CERN towards defining the future disk storage model.
64-Bit Opteron systems in High Energy and Astroparticle Physics
64-Bit commodity clusters and farms based on AMD technology have by now been
proven to achieve high computing power in many scientific applications. This report
first gives a short introduction to the special features of the amd64 architecture and
the characteristics of two-way Opteron systems.
Then results from measuring the performance and the behavior of such systems in
various Particle Physics applications as compared to the classical 32-Bit systems
are presented. The investigations cover analysis tools like ROOT, Astrophysics
simulations based on CORSIKA and event reconstruction programs. Further fields of
investigation are parallel high performance clusters for Lattice QCD
calculations, and n-loop calculations based on perturbative methods in quantum field
theory using the formula manipulation program FORM.
In addition to the performance results the compatibility of 32- and 64-Bit
architectures and Linux operating system issues, as well as the impact on fabric
management are discussed.
It is shown that for most of the considered applications the recently available
64-bit commodity computers from AMD are a viable alternative to comparable 32-Bit
CERN's openlab for Datagrid applications
For the last 18 months CERN has collaborated closely with several industrial partners
to evaluate, through the opencluster project, technology that may (and hopefully
will) play a strong role in the future computing solutions, primarily for LHC but
possibly also for other HEP computing environments. Unlike conventional field testing
where solutions from industry are evaluated rather independently, the openlab
principle is based on active collaboration between all partners, with the common goal
of constructing a coherent system.
The talk will discuss our experience to date with the following hardware
- 64-bit computing (in our case represented by the Itanium processor). This
includes the porting of applications and Grid software to 64 bits.
- Rack mounted servers
- The use of 10 Gbps Ethernet for both LAN and WAN connectivity
- An iSCSI-based Storage System that promises to scale to Petabyte dimensions
- The use of 10 Gbps Infiniband as a cluster interconnect
On the software side we will review our experience with the latest grid-enabled
release of Oracle, the so-called release "10g".
The talk will review the results obtained so far, either in stand alone tests or as
part of the larger LCG testbed, and it will describe the plans for the future in this
three-year collaboration with industry.
InfiniBand for High Energy Physics
Distributed physics analysis techniques as provided by the rootd
and proofd concepts require a fast and efficient interconnect between
the nodes. Apart from the required bandwidth the latency of message
transfers is important, in particular in environments with many nodes.
Ethernet is known to have large latencies, between 30 and 60 microseconds for
the common Gigabit Ethernet.
The InfiniBand architecture is a relatively new, open industry standard.
It defines a switched high-speed, low-latency fabric designed to connect compute
and I/O nodes with copper or fibre cables. The theoretical bandwidth
is up to 30 Gbit/s. The Institute for Scientific Computing (IWR) at the
Forschungszentrum Karlsruhe has been testing InfiniBand technology since the beginning of 2003,
and has a cluster of dual Xeon nodes using the 4X (10 Gbit/s) version of the interconnect.
Bringing the RFIO protocol - which is part of the CERN CASTOR
facilities for sequential file transfers - to InfiniBand has been
a big success, allowing significant reduction of CPU consumption
and increase of file transfer speed.
A first prototype of a direct interface to InfiniBand for the root
toolkit has been designed and implemented.
Experiences with hardware and software, in particular MPI performance results, will be
The methods and first performance results on rfio and root will be
shown and compared to other fabric technologies like Ethernet.
CASTOR: Operational issues and new Developments
The CERN Advanced STORage (CASTOR) system is a scalable high throughput
hierarchical storage system developed at CERN. CASTOR was first deployed
for full production use in 2001 and has expanded to now manage around two
PetaBytes and almost 20 million files. CASTOR is a modular system,
providing a distributed disk cache, a stager, and a back end tape archive,
accessible via a global logical name-space.
This paper focuses on the operational issues of the system currently in
production, and first experiences with the new CASTOR stager which has
undergone a significant redesign in order to cope with the data handling
challenges posed by the LHC, which will be commissioned in 2007.
The design target for the new stager was to scale to another order of
magnitude above the current CASTOR, namely to be able to sustain peak
rates of the order of 1000 file open requests per second for a PetaByte
disk pool. The new developments have been inspired by the problems which
arose managing massive installations of commodity storage hardware. The
farming of disk servers poses new challenges to the disk cache management:
request scheduling; resource sharing and partitioning; automated
configuration and monitoring; and fault tolerance of unreliable hardware
Management of the distributed, component-based CASTOR system across a large
farm provides an ideal example of the driving forces for the development
of automated management suites. The Quattor and Lemon frameworks naturally
address CASTOR's operational requirements, and we will conclude by
describing their deployment on the mass storage systems at CERN.
dCache, LCG Storage Element and enhanced use cases
The dCache software system has been designed to manage a
huge number of individual disk storage nodes and make them
appear under a single file system root. Besides a variety
of other features, it supports the GridFtp dialect, implements
the Storage Resource Manager interface (SRM V1) and can be linked
against the CERN GFAL software layer. These abilities make
dCache a perfect Storage Element in the context of LCG and
possibly future grid initiatives as well.
During the last year, dCache has been deployed at dozens of
Tier-I and Tier-II centers for the CMS and CDF experiments in
the US and Europe, including Fermilab, Brookhaven, San Diego,
Karlsruhe and CERN. The largest implementation, the CDF system
at Fermilab, provides 150 TeraBytes of disk space and delivers up
to 50 TeraBytes/day to its clients.
Sites using the LCG dCache distribution are more or less
operating the cache as a black box, and little knowledge is
available about customization and enhanced features.
This presentation is therefore intended to make non-dCache users
curious and enable dCache users to better integrate dCache into
their site specific environment. Besides many other topics, the
paper will touch on the ability of dCache to cooperate closely
with tertiary storage systems, like Enstore, TSM and HPSS. It
will describe the way dCache can be configured to attach
different pool nodes to different user groups but let them all
use the same set of fall back pools. We will explain how dCache
takes care of dataset replication, either by configuration or by
automatic detection of data access hot spots. Finally we will
report on ongoing development plans.
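The arrangement mentioned above, in which different pool nodes are attached to different user groups while all groups share one set of fall back pools, can be pictured with a small Python sketch. The pool names, the group mapping and the selection rule (first pool with enough free space) are hypothetical; this is not dCache configuration syntax, and the real pool selection logic is not described in this abstract.

```python
# Hypothetical layout: dedicated pools per user group plus shared fallback pools.
GROUP_POOLS = {
    "cms": ["pool_cms_1", "pool_cms_2"],
    "cdf": ["pool_cdf_1"],
}
FALLBACK_POOLS = ["pool_shared_1", "pool_shared_2"]

def select_pool(group, free_space, needed):
    """Pick the first pool with enough free space.

    The group's own pools are tried first, then the shared fallback pools.
    `free_space` maps pool name to free bytes; returns None if nothing fits.
    """
    candidates = GROUP_POOLS.get(group, []) + FALLBACK_POOLS
    for pool in candidates:
        if free_space.get(pool, 0) >= needed:
            return pool
    return None
```

A group without dedicated pools simply starts at the fallback set, which is what lets all groups share the same safety net.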
Storage Resource Manager
Storage Resource Managers (SRMs) are middleware components whose function is to
provide dynamic space allocation and file management on shared storage components on
the Grid. SRMs support protocol negotiation and reliable replication mechanisms. The
SRM standard allows independent institutions to implement their own SRMs, thus
allowing for a uniform access to heterogeneous storage elements. SRMs leave the
policy decision to be made independently by each implementation at each site.
Resource Reservations made through SRMs have limited lifetimes and allow for
automatic collection of unused resources thus preventing clogging of storage systems
with "forgotten" files.
Storage systems can be classified on the basis of their longevity and the persistence of
their data. Data can also be temporary or permanent. To support these notions, SRM
defines Volatile, Durable and Permanent types of files and spaces. Volatile files can
be removed by the system to make space for new files upon the expiration of their
lifetimes. Permanent files are expected to exist in the storage system for the
lifetime of the storage system. Finally, Durable files have both a lifetime
associated with them and a mechanism of notification of owners and administrators of
lifetime expiration, but cannot be deleted automatically by the system and require
Fermilab's data handling system uses the SRM management interface, the dCache
Distributed Disk Cache and the Enstore Tape Storage System as key components to
satisfy current and future user requests.
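The Volatile, Durable and Permanent file types described above differ in what happens when a lifetime expires. The following Python sketch encodes those rules as stated in the text: expired Volatile files may be removed, expired Durable files trigger a notification but are kept, and Permanent files are never touched. The class and function names are hypothetical and do not correspond to the SRM interface itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FileType(Enum):
    VOLATILE = "volatile"
    DURABLE = "durable"
    PERMANENT = "permanent"

@dataclass
class StoredFile:
    name: str
    file_type: FileType
    expires_at: Optional[float] = None  # None means no lifetime (Permanent files)

def sweep(files, now):
    """Apply the expiry rules once.

    Returns (kept files, names removed, names whose owners should be notified).
    """
    kept, removed, notify = [], [], []
    for f in files:
        expired = f.expires_at is not None and f.expires_at <= now
        if f.file_type is FileType.VOLATILE and expired:
            removed.append(f.name)   # the system may reclaim this space
            continue
        if f.file_type is FileType.DURABLE and expired:
            notify.append(f.name)    # notify, but never delete automatically
        kept.append(f)
    return kept, removed, notify
```

Running `sweep` periodically is one simple way to picture the automatic collection of unused resources that keeps "forgotten" files from clogging a storage system.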
The Storage Resource Manager specification is the result of an international collaborative
effort by representatives of JLAB, LBNL, FNAL, EDG-WP2 and EDG-WP5.
(FERMI NATIONAL ACCELERATOR LABORATORY)
SRB system at Belle/KEK
The Belle experiment has accumulated an integrated luminosity of more than 240 fb-1
so far, and the daily logged luminosity now exceeds 800 pb-1. These numbers correspond
to more than 1 PB of raw and processed data stored on tape and an accumulation of
the raw data at the rate of 1 TB/day. The processed, compactified data, together
with Monte Carlo simulation data for the final physics analyses, amount to more
than 100 TB. The Belle collaboration consists of more than 55 institutes in 14
countries and at most of the collaborating institutions, active physics data
analysis programs are being undertaken. To meet these storage and data distribution
demands, we have tried to adopt a resource broker, SRB. We have installed the SRB
system at KEK, Australia, and other collaborating institutions and have started to
share data. In this talk, experiences with the SRB system will be discussed and the
performance of the system when used for data processing and physics analysis of the
Belle experiment will be demonstrated.
(HIGH ENERGY ACCELERATOR RESEARCH ORGANIZATION)
StoRM: grid middleware for disk resource management
Within a Grid the possibility of managing storage space is fundamental, in
particular, before and during application execution. On the other hand, the
increasing availability of highly performant computing resources raises the need for
fast and efficient I/O operations and drives the development of parallel distributed
file systems able to satisfy these needs granting access to distributed storage.
The demand for POSIX-compliant access to storage and the need to have a uniform
interface for both Grid integrated and pure vanilla applications stimulate developers
to investigate the possibility of integrating already existing filesystems into a Grid
infrastructure, allowing users to take advantage of storage resources without being
forced to change their applications.
This paper describes the design and implementation of StoRM, a storage resource
manager (SRM) for disk only. Through StoRM an application can reserve and manage
space on disk storage systems. It can then access the space either in a Grid
environment or locally in a transparent way via classic POSIX calls.
The StoRM architecture is based on a pluggable model in order to easily add new
functionalities. The StoRM implementation currently uses filesystems such as GPFS or
LUSTRE. The StoRM prototype includes space reservation functionalities that
complement SRM space reservation to allow applications to directly access/use the
managed space through POSIX calls. Moreover, StoRM includes quota management
and a space guard. StoRM will serve as policy enforcement point (PEP) for the Grid
Policy Management System over disk resources. The experimental results obtained are
The SAMGrid Database Server Component: Its Upgraded Infrastructure and Future Development Path
The SAMGrid Database Server encapsulates several important services, such as
accessing file metadata and replica catalog, keeping track of the processing
information, as well as providing the runtime support for SAMGrid station
services. Recent deployment of the SAMGrid system for CDF has resulted in
unification of the database schema used by CDF and D0, and the complexity
of changes required for the unified metadata catalog has warranted a
complete redesign of the DB Server.
We describe here the architecture and features of the new server. In particular,
we discuss the new CORBA infrastructure that utilizes python wrapper classes
around IDL structs and exceptions. Such infrastructure allows us to
use the same code on both server and client sides, which in turn results
in significantly improved code maintainability and easier development.
We also discuss future integration of the new server with an SBIR II
project which is directed toward allowing the dbserver to access distributed
databases, implemented in different DB systems and possibly using different
Core Software
Brunig 1 + 2
LCIO persistency and data model for LC simulation and reconstruction
LCIO is a persistency framework and data model for the next linear
collider. Its original implementation, as presented at CHEP 2003,
was focused on simulation studies. Since then the data model has
been extended to also incorporate prototype test beam data,
reconstruction and analysis. The design of the interface has also
been simplified. LCIO defines a common abstract user interface (API)
in Java, C++ and Fortran in order to fulfill the needs of the global
linear collider community. It is designed to be lightweight and
flexible without introducing additional dependencies on other software
User code is completely separated from the concrete persistency
implementation. SIO, a simple binary format that supports data
compression and pointer retrieval is the current choice. LCIO is
implemented in such a way that it can also be used as the transient
data model in any linear collider application, e.g. a modular
can use the LCIO event class (LCEvent) as the container for the
modules' input and output data. As LCIO offers a common API for three
languages it is also possible to construct a multi-language
reconstruction framework that would facilitate the integration of
already existing algorithms.
A number of groups have already incorporated LCIO in their software
frameworks and others plan to do so.
We present the design and implementation of LCIO, focusing on new
developments and uses.
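As a rough illustration of the transient data model idea described above, the sketch below shows processing modules that exchange data only through a shared event container. It is plain Python and does not use the LCIO API: the `Event` class, the collection names and the two toy modules are hypothetical stand-ins for LCEvent and for real reconstruction algorithms.

```python
from typing import Callable, Dict, List


class Event:
    """Hypothetical stand-in for an event container such as LCEvent."""

    def __init__(self, number: int) -> None:
        self.number = number
        self._collections: Dict[str, list] = {}

    def add_collection(self, name: str, items: list) -> None:
        # Refuse to overwrite: each module adds its output under a new name.
        if name in self._collections:
            raise KeyError(f"collection {name!r} already exists")
        self._collections[name] = items

    def get_collection(self, name: str) -> list:
        return self._collections[name]


def clustering(event: Event) -> None:
    """Toy module: reads 'Hits', writes 'Clusters' (hits with energy above 1.0)."""
    hits = event.get_collection("Hits")
    event.add_collection("Clusters", [h for h in hits if h > 1.0])


def energy_sum(event: Event) -> None:
    """Toy module: reads 'Clusters', writes a one-element 'TotalEnergy'."""
    clusters = event.get_collection("Clusters")
    event.add_collection("TotalEnergy", [sum(clusters)])


def run_chain(event: Event, modules: List[Callable[[Event], None]]) -> Event:
    """Run the modules in order; the event is their only shared state."""
    for module in modules:
        module(event)
    return event


event = Event(number=1)
event.add_collection("Hits", [0.5, 1.5, 2.5])
run_chain(event, [clustering, energy_sum])
```

Because the modules share nothing but the container, reordering or replacing one of them only requires that the collections it reads are already present in the event.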
POOL Development Status and Plans
The LCG POOL project is now entering the third year of active development. The basic functionality of the
project is provided but some functional extensions will move into the POOL system this year. This
presentation will give a summary of the main functionality provided by POOL, which is used in physics
productions today. We will then present the design and implementation of the main new interfaces and
components planned such as the POOL RDBMS abstraction layer and the RDBMS based Storage Manager back-
(CERN IT/DB & LCG POOL PROJECT)
POOL Integration into three Experiment Software Frameworks
The POOL software package has been successfully integrated with the three large experiment software
frameworks of ATLAS, CMS and LHCb. This presentation will summarise the experience gained during these
integration efforts and will try to highlight the commonalities and the main differences between the
integration approaches. In particular we’ll discuss the role of the POOL object cache, the choice of the main
storage technology in ROOT (tree or named objects) and approaches to collection and catalogue integration.
Recent Developments in the ROOT I/O
Since version 3.05/02, the ROOT I/O System has gone through
In particular, the STL container I/O has been upgraded to support
splitting, reading without existing libraries and using directly from
TTreeFormula (TTree queries).
This upgrade to the I/O system is such that it can be easily extended
(even by the users) to support the splitting and querying of almost
any collections. The ROOT TTree queries engine has also been enhanced
in many ways, including increased performance, better support for
array printing and histogramming, addition of the ability to call any
external C or C++ functions, etc.
We improved the I/O support for classes not inheriting from TObject,
including support for automatic schema evolution without using an
explicit class version. ROOT now supports generating files larger than
2 GB. We also added plugins for several of the mass storage servers
(Castor, DCache, Chirp, etc.).
We will describe these new features and their implementation in detail.
XML I/O in ROOT
Until now, ROOT objects could be stored only in a binary ROOT-specific file format.
Without the ROOT environment the data stored in such files are not directly
accessible. Storing objects in XML format makes it easy to view and edit (with some
restrictions) the object data directly. It is also plausible to use XML as an exchange
format with other applications. Therefore XML streaming has been implemented in
ROOT. Any object which is in the ROOT dictionary can be stored/retrieved in XML
format. Two layouts of object representation in XML are supported: class-dependent
and generic. In the first case all XML tag names are derived from class and member
names. To avoid name intersections, XML namespaces for each class are used. A
Document Type Definition (DTD) file is automatically generated for each class (or
set of classes). It can be used to validate the structure of the XML document. The
generic layout of XML files includes tag names like "Object", "Member", "Item" and
so on. In this case the DTD is common for all produced XML files. Further
development is required to provide tools for accessing created XML files from other
applications like: pure C++ code without ROOT libraries and dictionaries, Java and
The FreeHEP Java Library Root IO package
The FreeHEP Java library contains a complete implementation
of Root IO for Java. The library uses the "Streamer Info" embedded in files created
by Root 3.x to dynamically create high performance Java proxies for Root objects,
making it possible to read any Root file, including files with user defined
objects. In this presentation we will discuss the status of this code, explain its
implementation and demonstrate performance using benchmark comparisons to standard
Root IO. We will also describe recently added support for reading files remotely
using rootd and xrootd protocols.
We will also show some uses of this library, including using JAS3 to analyze Root
data, using the WIRED event display to visualize data from Root files and using
rootd and Java servlet technology to make live plots web accessible - with examples
from GLAST and BaBar. We will also explain how you can trivially make your own root
data web-accessible using the AIDA Tag Library and Jakarta Tomcat.
EventStore: Managing Event Versioning and Data Partitioning using Legacy Data Formats
HEP analysis is an iterative process. It is critical that in each iteration the physicist's analysis job accesses the
same information as previous iterations (unless explicitly told to do otherwise). This becomes problematic
after the data has been reconstructed several times. In addition, when starting a new analysis, physicists
normally want to use the most recent version of reconstruction. Such version control is useful for data
managed by a single physicist using a laptop or small groups of physicists at a remote institution, in addition
to the collaboration-wide managed data.
In this presentation we will discuss our implementation of the EventStore which uses a data location, indexing
and versioning service to manage legacy data formats (e.g. an experiment's existing proprietary file format or
Root files). A plug-in architecture is used to support adding additional file formats. The core of the system is
used to implement three different sizes of services: personal, group and collaboration.
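The versioning behaviour described above can be sketched as follows. This is a minimal illustration in plain Python, not the EventStore implementation: the `VersionIndex` class, the integer timestamps, the version labels and the file paths are all assumptions made for the example. An analysis that records a timestamp on its first iteration keeps resolving to the same data afterwards, while a new analysis gets the most recent reconstruction.

```python
from typing import Dict, List, Optional, Tuple


class VersionIndex:
    """Toy index mapping (run, data kind) to reconstruction versions over time."""

    def __init__(self) -> None:
        # key -> list of (registration_time, version, file_location)
        self._entries: Dict[Tuple[int, str], List[Tuple[int, str, str]]] = {}

    def register(self, run: int, kind: str, time: int,
                 version: str, location: str) -> None:
        self._entries.setdefault((run, kind), []).append((time, version, location))

    def resolve(self, run: int, kind: str,
                as_of: Optional[int] = None) -> Tuple[str, str]:
        """Return (version, location) of the newest entry registered at or before as_of.

        With as_of=None the most recent version is returned, which is what a
        newly started analysis would normally want.
        """
        candidates = [
            entry for entry in self._entries.get((run, kind), [])
            if as_of is None or entry[0] <= as_of
        ]
        if not candidates:
            raise LookupError("no version registered for this request")
        _, version, location = max(candidates, key=lambda entry: entry[0])
        return version, location


index = VersionIndex()
# Hypothetical entries: a legacy-format file, later re-reconstructed into a Root file.
index.register(1001, "tracks", time=10, version="reco-v1", location="/data/v1/run1001.dat")
index.register(1001, "tracks", time=20, version="reco-v2", location="/data/v2/run1001.root")

pinned = index.resolve(1001, "tracks", as_of=15)  # analysis started at time 15
latest = index.resolve(1001, "tracks")            # analysis starting now
```

The same index could back a personal, group or collaboration service; only the place where the entries are stored would differ.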
ATLAS Metadata Interfaces (AMI) and ATLAS Metadata Catalogs
The ATLAS Metadata Interface (AMI) project provides a set of generic
tools for managing database applications. AMI has a three-tier
architecture with a core that supports a connection to any RDBMS
using JDBC and SQL. The middle layer assumes that the databases have
an AMI compliant self-describing structure. It provides a generic
web interface and a generic command line interface. The top layer
contains application specific features. The principal uses of AMI
are the ATLAS Data Challenge dataset bookkeeping catalogs, and Tag
Collector, a tool for release management.
The first AMI Web service client was introduced in early 2004. It
offers many advantages over earlier clients because:
- Web services permit multi-language and multi-operating system
- The user interface is very effectively de-coupled from the
Most upgrades can be implemented on the server side; no
redistribution of client software is needed. In 2004 this client
will be used for the ATLAS Data Challenge 2, for the ATLAS
combined test beam offline bookkeeping, and also in the first
prototypes of ARDA compliant analysis interfaces.
CMS Detector Description: New Developments
The CMS Detector Description Database (DDD) consists of a C++ API and an XML based
detector description language. DDD is used by the CMS simulation (OSCAR),
reconstruction (ORCA), and visualization (IGUANA) as well as by test beam software that
relies on those systems. The DDD is a sub-system within the COBRA framework of the
CMS Core Software. Management of the XML is currently done using a separate Geometry
project in CVS.
We give an overview of the DDD integration and report on recent developments
concerning detector description in CMS software:
* The ability of client software to describe sub-detectors by providing an algorithm
plug-in in C++ based on SEAL plug-in facilities. A typical algorithm plug-in makes
use of the DDD API to describe detector properties. Through the API seamless access
to data defined via the XML description language is ensured.
* An Oracle schema was recently developed and the database populated by a DDD
application. The geometrical structure of the detector is seen as a skeleton to which
conditions or configuration data can be attached.
* A C++ streaming mechanism to output the geometry as binary files was developed.
This representation can be read into memory much more rapidly than the XML files can
The DDD API shields clients from each of the possible input sources. Even the
simultaneous use of several different input sources is possible through various
configuration options in the framework COBRA.
(UNIVERSITY OF CALIFORNIA, DAVIS)
Addressing the persistency patterns of the time evolving HEP data in
the ATLAS/LCG MySQL Conditions Databases
The size and complexity of present HEP experiments
demand an enormous effort in data persistency. This effort implies
a tremendous investment in databases, not only for the event data
but also for the data needed to qualify them: the Conditions Data.
In the present document we describe the strategy for addressing the
Conditions data problem in the ATLAS experiment, focusing on the ConditionsDB
MySQL for the ATLAS/LCG project. The need for a persistent engine for
structured conditions data has motivated studies of a relational backend
that maps transient structured objects onto the relational database persistent
engine. This paper illustrates the proposal for the storage of Conditions data
in the LCG framework, using it both to store only the Interval Of Validity
(IOV) and a reference that represents the 'path' to an external
persistent storage mechanism, and to archive the IOV and the data in
relational tables mapping the customizable CondDBTable objects. This makes it
possible to take advantage of all the relational features and also to
map directly between transient objects and tables in the database server.
The issue of distributed data storage and partitioning is also analyzed in
this paper, taking into account the different levels of indirection that are
provided by the ConditionsDB MySQL implementation. These features represent
very important built-in functionality in terms of scalability and data balance
in a system that aims to be completely distributed and to deliver very high
performance for hundreds of users.
(FACULTY OF SCIENCES OF THE UNIVERSITY OF LISBON)
CDB - Distributed Conditions Database of BaBar Experiment
A new, completely redesigned Condition/DB was deployed in BaBar in October 2002. It
replaced the old database software used through the first three and half years of
The new software addresses the performance and scalability limitations of the original
database. Moreover, this major redesign brought in a new model of the metadata, a brand
new technology- and implementation-independent API, flexible configurability and
One of the greatest strengths of the new CDB is that it has been designed as a
distributed database from the ground up to facilitate propagation and exchange
of conditions (calibrations, detector alignments, etc.) in the realm of the
international HEP collaboration.
The first implementation of CDB uses Objectivity/DB as its underlying persistent
technology. There is an ongoing study to understand how to implement CDB on top of
other persistent technologies.
The talk will cover the whole spectrum of topics ranging from the basic conceptual
model of the new database through the way CDB is currently exploited in BaBar to the
directions of further developments.
(LAWRENCE BERKELEY NATIONAL LABORATORY)
LCG Conditions Database Project Overview
The Conditions Database project has been launched
to implement a common persistency solution for experiment conditions data
in the context of the LHC Computing Grid (LCG) Persistency Framework.
Conditions data, such as calibration, alignment or slow control data,
are non-event experiment data characterized by the fact
that they vary in time and may have different versions.
The LCG project draws on preexisting projects which have led
to the definition of a generic C++ API for condition data access
and its implementation using different storage technologies,
such as Objectivity, MySQL or Oracle.
The project is assigned the task to deliver a production release
of the software including implementation libraries for several
technologies and high level tools for data management.
The presentation will review the current status of the LCG common project
at the time of the conference and the plans for its evolution.
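To make the two defining properties of conditions data concrete (validity in time and multiple versions), here is a minimal lookup sketch. It is written in Python for brevity and is not the generic C++ API mentioned above: the `ConditionsObject` and `ConditionsFolder` names, the integer time units and the tag strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class ConditionsObject:
    since: int    # start of validity (inclusive), arbitrary time units
    till: int     # end of validity (exclusive)
    tag: str      # version label, e.g. a calibration pass
    payload: Any  # the calibration, alignment or slow control value


class ConditionsFolder:
    """Toy in-memory store; a real backend would be Objectivity, MySQL or Oracle."""

    def __init__(self) -> None:
        self._objects: List[ConditionsObject] = []

    def store(self, obj: ConditionsObject) -> None:
        if obj.since >= obj.till:
            raise ValueError("empty interval of validity")
        self._objects.append(obj)

    def find(self, time: int, tag: str) -> ConditionsObject:
        """Return the object with the given tag whose interval contains time."""
        for obj in self._objects:
            if obj.tag == tag and obj.since <= time < obj.till:
                return obj
        raise LookupError(f"no object valid at {time} for tag {tag!r}")


folder = ConditionsFolder()
folder.store(ConditionsObject(0, 100, "v1", {"gain": 1.00}))
folder.store(ConditionsObject(100, 200, "v1", {"gain": 1.02}))
folder.store(ConditionsObject(0, 200, "v2", {"gain": 1.01}))  # later re-calibration

first_pass = folder.find(150, "v1").payload
second_pass = folder.find(150, "v2").payload
```

The linear scan keeps the example short; a production implementation would index the intervals and enforce that intervals with the same tag do not overlap.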
Distributed Computing Services
Theatersaal
Experience with POOL from the LCG Data Challenges of the three LHC experiments
This presentation will summarise the deployment experience gained
with POOL during the first large data challenges performed by the
LHC experiments. In particular we discuss storage access
performance and optimisations, integration issues with grid
middleware services such as the LCG Replica Location Service
(RLS) and the LCG Replica Manager, and experience with the way POOL
proposes to exchange metadata (such as file catalogue
entries) in a de-coupled production system.
Middleware for the next generation Grid infrastructure
The aim of the EGEE (Enabling Grids for E-Science in Europe) project is to
create a reliable and dependable European Grid infrastructure for
e-Science. The objective of the Middleware Re-engineering and Integration
Research Activity is to provide robust middleware components,
deployable on several platforms and operating systems, corresponding
to the core Grid services for resource access, data management,
information collection, authentication & authorization, resource
matchmaking and brokering, and monitoring and accounting.
To achieve this objective, we developed an architecture and
design of the next generation Grid middleware leveraging experiences
and existing components mainly from AliEn, EDG, and VDT. The
architecture follows the service breakdown developed by the LCG ARDA
RTAG. Our goal is to do as little original development as possible but
rather re-engineer and harden existing Grid services. The evolution of
these middleware components towards a Service Oriented Architecture
(SOA) adopting existing standards (and following emerging ones) as
much as possible is another major goal of our activity.
A rapid prototyping approach has been adopted, providing a sequence of
more sophisticated prototypes to the EGEE candidate applications
coming from the LHC HEP experiments and the Biomedical field. The
close feedback loop with applications via these prototypes is
indispensable for achieving our ultimate goal of providing a reliable
and dependable Grid infrastructure.
In this paper we will report on the architecture and design of the
main Grid components and report on our experiences with early
The Clarens Grid-enabled Web Services Framework: Services and Implementation
Clarens enables distributed, secure and high-performance access to the
worldwide data storage, compute, and information Grids being constructed in
anticipation of the needs of the Large Hadron Collider at CERN. We report on
the rapid progress in the development of a second server implementation in
the Java language, the evolution of a peer-to-peer network of Clarens
servers, and general improvements in client and server implementations.
Services that are implemented at this time include read/write file access,
service lookup and discovery, configuration management, job execution,
Virtual Organization Management, an LHCb Information Service, as well as web
service interfaces to POOL replica location and metadata catalogs, MonaLISA
monitoring information, CMS MCRunjob workflow management, BOSS job
monitoring and bookkeeping, Sphinx job scheduler and Chimera virtual data
Commodity web service protocols allow a wide variety of computing
platforms and applications to be used to securely access Clarens services,
including a standard web browser, Java applets and stand-alone applications,
the ROOT data analysis package, as well as libraries that provide
programmatic access from the Python, C/C++ and Java languages.
(California Institute of Technology)
Experiences with the gLite Grid Middleware
The ARDA project was started in April 2004 to support
the four LHC experiments (ALICE, ATLAS, CMS and LHCb)
in the implementation of individual
production and analysis environments based on the EGEE middleware.
The main goal of the project is to enable fast feedback between the
experiment and the middleware development teams through the
construction and use of end-to-end prototypes
that allow users to perform analyses on the
data sets from recent Monte Carlo productions.
The LCG ARDA project is contributing to the development
of the new EGEE Grid middleware by exercising it with realistic
analysis systems developed within the four LHC experiments. We will
present our experiences in using the EGEE middleware in first
prototypes developed by the experiments together with the ARDA
project. We will cover aspects such as the usability of individual
components of the middleware and give an overview on which
components are used by which experiments.
Global Distributed Parallel Analysis using PROOF and AliEn
The ALICE experiment and the ROOT team have developed a Grid-enabled version of PROOF that allows
efficient parallel processing of large and distributed data samples. This system has been integrated with the
ALICE-developed AliEn middleware. Parallelism is implemented at the level of each local cluster for efficient
processing and at the Grid level, for optimal workload management of distributed resources. This system
allows harnessing large Computing on Demand capacity during an interactive session. Remote parallel
computations are spawned close to the data, minimising network traffic. If several copies of the data are
available, a workload management system decides automatically where to send the task. Results are
automatically merged and displayed at the user workstation. The talk will describe the different components
of the system (PROOF, the parallel ROOT engine, and the AliEn middleware), the present status and future
plans for the development and deployment and the consequences for the ALICE computing model.
Software agents in data and workflow management
CMS currently uses a number of tools to transfer data which, taken together, form
the basis of a heterogeneous datagrid. The range of tools used, and the directed,
rather than optimised, nature of CMS's recent large-scale data challenge required the
creation of a simple infrastructure that allowed a range of tools to operate in a
The system created comprises a hierarchy of simple processes (named agents) that
propagate files through a number of transfer states. File locations and some
application metadata were stored in POOL file catalogues, with LCG LRC or MySQL
backends. Agents were assigned limited responsibilities, and were restricted to
communicating state in a well-defined, indirect fashion through a central transfer
management database. In this way, the task of distributing data was easily divided
between different groups for implementation.
The prototype system was developed rapidly, and achieved the required sustained
transfer rate of ~10 MBps, with O(10^6) files distributed to 6 sites from CERN.
Experience with the system during the data challenge raised issues with underlying
technology (MSS write/read, stability of the LRC, maintenance of file catalogues,
synchronisation of filespaces, etc.) which have been successfully identified and
handled. The development of this prototype infrastructure allows us to plan the
evolution of backbone CMS data distribution from a simple hierarchy to a more
autonomous, scalable model drawing on emerging agent and grid technology.
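The agent pattern described in this abstract can be sketched as follows: each agent owns one state transition and communicates with the others only through a central transfer management database. This is an illustration under stated assumptions, not the CMS implementation. The state names (`new`, `staged`, `transferred`, `registered`), the table layout and the `make_agent` helper are hypothetical, and an in-memory SQLite database stands in for the central database.

```python
import sqlite3
from typing import Callable


def make_agent(db: sqlite3.Connection, source: str, target: str,
               action: Callable[[str], bool]) -> Callable[[], int]:
    """Build an agent that moves files from state `source` to state `target`."""

    def run_once() -> int:
        rows = db.execute(
            "SELECT name FROM transfers WHERE state = ? ORDER BY name", (source,)
        ).fetchall()
        moved = 0
        for (name,) in rows:
            # The real work (staging, copying, cataloguing) would happen in action().
            if action(name):
                db.execute(
                    "UPDATE transfers SET state = ? WHERE name = ? AND state = ?",
                    (target, name, source),
                )
                moved += 1
        db.commit()
        return moved

    return run_once


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transfers (name TEXT PRIMARY KEY, state TEXT NOT NULL)")
db.executemany("INSERT INTO transfers VALUES (?, 'new')", [("f1",), ("f2",), ("f3",)])

# Agents never call each other; the database is their only shared channel.
agents = [
    make_agent(db, "new", "staged", lambda name: True),
    make_agent(db, "staged", "transferred", lambda name: name != "f2"),  # f2 fails
    make_agent(db, "transferred", "registered", lambda name: True),
]
for agent in agents:
    agent()

final_states = dict(db.execute("SELECT name, state FROM transfers ORDER BY name"))
```

A file whose action fails simply stays in its current state and is picked up again on the next pass, which is one reason such a design is easy to divide between groups.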
(CMS, UNIVERSITY OF BRISTOL)
Housing Metadata for the Common Physicist Using a Relational Database
SAM was developed as a data handling system for Run II at Fermilab. SAM is a
collection of services, each described by metadata. The metadata are modeled on a
relational database, and implemented in ORACLE. SAM, originally deployed in
production for the D0 Run II experiment, has now been also deployed at CDF and is
being commissioned at MINOS. This illustrates that the metadata decomposition of its
services has a broader applicability than just one experiment. A joint working group
on metadata with representatives from ATLAS, BaBar, CDF, CMS, D0, and LHCb in
cooperation with EGEE has examined this metadata decomposition in the light of
general HEP user requirements.
Greater understanding of the required services of a performant data handling system
has emerged from Run II experience. This experience is being merged with the
understanding being developed in the course of LHC experience with data challenges
and use case discussions. We describe the SAM schema and the commonalities of
function and service support between this schema and proposals for the LHC
experiments. We describe the support structure required for SAM schema updates, the
use of development, integration, and production instances. We are also looking at
the LHC proposals for the evolution of schema using keyword-value pairs that are
then transformed into a normalized, performant database schema.
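One way to picture the keyword-value approach mentioned above is a pivot from a free-form keyword-value table into a table with one typed column per keyword. The sketch below uses Python's `sqlite3` module purely for illustration; SAM itself is implemented in ORACLE, and the table names, keywords and file names here are hypothetical rather than taken from the SAM schema or the LHC proposals.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Flexible stage: any keyword can be attached to a file, all values are text.
db.execute("CREATE TABLE file_kv (file_name TEXT, keyword TEXT, value TEXT)")
db.executemany("INSERT INTO file_kv VALUES (?, ?, ?)", [
    ("a.dat", "run", "1001"), ("a.dat", "events", "500"), ("a.dat", "stream", "muon"),
    ("b.dat", "run", "1002"), ("b.dat", "events", "750"), ("b.dat", "stream", "jet"),
])

# Settled stage: once the keywords are agreed, give each one a typed column.
db.execute("""
    CREATE TABLE file_meta (
        file_name TEXT PRIMARY KEY,
        run       INTEGER NOT NULL,
        events    INTEGER NOT NULL,
        stream    TEXT NOT NULL
    )
""")
db.execute("""
    INSERT INTO file_meta (file_name, run, events, stream)
    SELECT file_name,
           CAST(MAX(CASE WHEN keyword = 'run' THEN value END) AS INTEGER),
           CAST(MAX(CASE WHEN keyword = 'events' THEN value END) AS INTEGER),
           MAX(CASE WHEN keyword = 'stream' THEN value END)
    FROM file_kv
    GROUP BY file_name
""")

rows = db.execute(
    "SELECT file_name, run, events, stream FROM file_meta ORDER BY file_name"
).fetchall()
```

Queries against the typed table can use ordinary indexes and numeric comparisons, which is the performance argument for the transformation.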
Lattice QCD Data and Metadata Archives at Fermilab and the International Lattice Data Grid
The lattice gauge theory community produces large volumes of
data. Because the data produced by completed computations form the
basis for future work, the maintenance of archives of existing data
and metadata describing the provenance, generation parameters, and
derived characteristics of that data is essential not only as a
reference, but also as a basis for future work. Development of these
archives according to uniform standards both in the data and metadata
formats provided and in the software interfaces to the component
services could greatly simplify collaborations between institutions
and enable the dissemination of meaningful results.
This paper describes the progress made in the development of a set of
such archives at the Fermilab lattice QCD facility. We are
coordinating the development of the interfaces to these facilities
and the formats of the data and metadata they provide with the efforts
of the international lattice data grid (ILDG) metadata and middleware
working groups, whose goals are to develop standard formats for
lattice QCD data and metadata and a uniform interface to archive
facilities that store them. Services under development include those
commonly associated with data grids: a service registry, a metadata
database, a replica catalog, and an interface to a mass storage
system. All services provide GSI authenticated web service interfaces
following modern standards, including WSDL and SOAP, and accept and
provide data and metadata following recent XML based formats proposed
by the ILDG metadata working group.
(FERMI NATIONAL ACCELERATOR LABORATORY)
Huge Memory systems for data-intensive science
Distributed Computing Systems and Experiences
Ballsaal
BaBar simulation production - A millennium of work in under a year
for the BaBar Computing Group.
The analysis of the BaBar experiment requires many times the measured
data to be produced in simulation. This requirement has resulted in
one of the largest distributed computing projects ever completed.
The latest round of simulation for BaBar started in early 2003, and
completed in early 2004, and encompassed over 1 million jobs, and
over 2.2 billion events. By the end of the production cycle over two
dozen different computing centers and nearly 1,500 CPUs were
in constant use in North America and Europe. The whole effort was
managed from a central database at SLAC, with real-time updates of
the status of all jobs. Utilities were developed to tie together
production with many different batch systems, and with different
needs for security.
The produced data were automatically transferred to SLAC for use and
distribution to analysis sites. The system developed to manage this
effort was a combination of web and database applications, and
command line utilities. The technologies used to complete this
effort along with its complete scope will be presented.
(STANFORD LINEAR ACCELERATOR CENTER)
Role of Tier-0, Tier-1 and Tier-2 Regional Centres in CMS DC04
The CMS 2004 Data Challenge (DC04) was devised to test several key
aspects of the CMS Computing Model in three ways: by trying to
sustain a 25 Hz reconstruction rate at the Tier-0; by distributing
the reconstructed data to six Tier-1 Regional Centers (FNAL in US,
FZK in Germany, Lyon in France, CNAF in Italy, PIC in Spain, RAL in
UK) and handling catalogue issues; by redistributing data to Tier-2
centers for analysis. Simulated events, up to the digitization step,
were produced prior to the DC as input for the reconstruction in the
Pre-Challenge Production (PCP04).
In this paper, the model of the Tier-0 implementation used in DC04 is
described, as well as the experience gained in using the newly
developed data distribution management layer, which allowed CMS to
successfully direct the distribution of data from Tier-0 to Tier-1
sites by loosely integrating a number of available Grid components.
While developing and testing this system, CMS explored the overall
functionality and limits of each component, in any of the different
implementations which were deployed within DC04.
The role of the Tier-1s is presented and discussed, from the import of
data from Tier-0, to the archiving onto the local mass storage
system and the data distribution management to the Tier-2s for
analysis. Participating Tier-1s differed in available resources,
set-up and configuration: a critical evaluation of the results
and performance achieved by adopting different strategies in the
management of each Tier-1 center to support CMS DC04 is presented.
AMS-02 Computing and Ground Data Handling
V.Choutko (MIT, Cambridge), A.Klimentov (MIT, Cambridge) and
M.Pohl (Geneva University)
AMS (Alpha Magnetic Spectrometer) is an experiment to search in
space for dark matter and antimatter on the International Space
Station (ISS). The AMS detector had a precursor flight in 1998 (STS-91,
June 2-12, 1998). More than 100M events were collected and
The final detector (AMS-02) will be installed on ISS in the fall of
2007 for at least 3 years. The data will be transmitted from ISS to
NASA Marshall Space Flight Center (MSFC, Huntsville, Alabama) and
transferred to CERN (Geneva, Switzerland) for processing and analysis.
We present the AMS-02 Ground Data Handling scenario and the
requirements for the AMS ground centers: the Payload Operation and Control
Center (POCC) and the Science Operation Center (SOC).
The Payload Operation and Control Center is where AMS operations
take place, including commanding, storage and analysis of housekeeping
data and partial science data analysis for rapid quality
control and feedback.
The AMS Science Data Center receives and stores all AMS science and
housekeeping data, as well as ancillary data from NASA. It ensures
full science data reconstruction, calibration and alignment; it keeps
data available for physics analysis and archives all data.
We also discuss the AMS-02 distributed MC production currently
running in 15 Universities and Labs in Europe, USA and Asia, with
automatic job submission and control from one central place (CERN).
The software uses CORBA technology to control and monitor MC
production and an ORACLE relational database to keep catalogues,
event descriptions as well as production and monitoring information.
Distributed Computing Grid Experiences in CMS DC04
In March-April 2004 the CMS experiment undertook a Data Challenge (DC04).
During the previous 8 months CMS undertook a large simulated event
production. The goal of the challenge was to run CMS reconstruction for
a sustained period at a 25 Hz input rate, distribute the data to the CMS Tier-1
centers and analyze them at remote sites. Grid environments developed in
Europe by the LHC Computing Grid (LCG) and in the US with Grid2003 were
utilized to complete aspects of the challenge.
During the simulation phase, US-CMS utilized Grid2003 to simulate and
process approximately 17 million events. Simultaneous usage of CPU
resources peaked at 1200 CPUs, controlled by a single FTE. Using Grid3 was a
milestone for CMS computing in reaching a new magnitude in the
number of autonomously cooperating computing sites for production. The
use of Grid-based job execution resulted in reducing the overall support effort
required to submit and monitor jobs by a factor of two.
During the challenge itself, the CMS groups from Italy and Spain used the LCG Grid
Environment to satisfy challenge requirements. The LCG Replica
Manager was used to transfer the data. The CERN RLS provided the needed
replica catalogue functionality. The LCG submission system based on the
Resource Broker was used to submit analysis jobs to the sites hosting the
data. A CMS dedicated GridICE monitoring was activated to monitor both
services and resources.
A description of the experiences, successes and lessons learned from both
experiences with grid infrastructure is presented.
Experience producing simulated events for the DZero experiment on the SAM-Grid
Most of the simulated events for the DZero experiment at Fermilab have been
historically produced by the “remote” collaborating institutions. One of the
principal challenges reported concerns the maintenance of the local software
infrastructure, which is generally different from site to site. As the community's
understanding of distributed computing over distributively owned and shared
resources progresses, the adoption of grid technologies to address the
production of Monte Carlo events for high energy physics experiments becomes
increasingly interesting. The SAM-Grid is a software system developed at Fermilab, which
integrates standard grid technologies for job and information management with SAM,
the data handling system of the DZero and CDF experiments. During the past few
months, this grid system has been tailored for the Monte Carlo production of DZero.
Since the initial phase of deployment, this experience has exposed an interesting
series of requirements on the SAM-Grid services, the standard middleware, the
resources and their management, and the analysis framework of the experiment. As of
today, the inefficiency due to the grid infrastructure has been reduced to as little
as 1%. In this paper, we present our statistics and the "lessons learned" in running
large high energy physics applications on a grid infrastructure.
The ALICE Data Challenge 2004 and the ALICE distributed analysis prototype
During the first half of 2004 the ALICE experiment performed a large distributed
computing exercise with two major objectives: to test the ALICE computing model,
including distributed analysis, and to provide a data sample for refining the
ALICE jet physics Monte Carlo studies. Simulation, reconstruction and analysis of
several hundred thousand events were performed, using the heterogeneous resources of
tens of computer centres worldwide. These resources belong to different GRID systems
and were steered by the AliEn (ALICE Environment) framework, acting as a meta-GRID.
This has been a very thorough test of the middleware of AliEn and LCG (LCG-2 and
grid.it resources) and their compatibility. During the Data Challenge more than
1,500 jobs ran in parallel for several weeks. More than 50 TB of data were
produced and analysed worldwide in one of the major exercises of this kind run to
date. ALICE has developed an analysis system based on AliEn and ROOT. This system
starts with a metadata selection in the AliEn file catalogue, followed by a
computation phase. Analysis jobs are sent where the data is, thus minimising data
movement. The control is performed by an intelligent workload management system. The
analysis can be done either via batch or interactive jobs. The latter are "spawned"
on remote systems and report the results back to the user workstation. The talk will
describe the ALICE experience with this large-scale use of the Grid, the major
lessons learned and the consequences for the ALICE computing model.
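The principle that analysis jobs are sent to where the data is can be
illustrated with a minimal Python sketch. The catalogue contents, the site
labels and the helper functions select_files and assign_jobs are hypothetical
and are not part of AliEn; the sketch only shows a metadata selection followed
by grouping the selected files by hosting site, so that one job is created per
site and no file has to move.

```python
from collections import defaultdict

# Hypothetical file catalogue: logical file name -> (hosting site, metadata).
CATALOGUE = {
    "run1/evts_001.root": ("CERN", {"type": "jet", "run": 1}),
    "run1/evts_002.root": ("CNAF", {"type": "jet", "run": 1}),
    "run2/evts_001.root": ("CERN", {"type": "minbias", "run": 2}),
    "run2/evts_002.root": ("CNAF", {"type": "jet", "run": 2}),
}


def select_files(catalogue, **criteria):
    """Metadata selection: keep files whose metadata match all criteria."""
    return [
        lfn
        for lfn, (_site, meta) in catalogue.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    ]


def assign_jobs(catalogue, lfns):
    """Group the selected files by hosting site: one job per site."""
    jobs = defaultdict(list)
    for lfn in lfns:
        site, _meta = catalogue[lfn]
        jobs[site].append(lfn)
    return dict(jobs)


selected = select_files(CATALOGUE, type="jet")
jobs = assign_jobs(CATALOGUE, selected)
for site, files in jobs.items():
    print(site, files)
```

A real workload management system would also weigh site load and replica
availability; the sketch deliberately ignores both.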
Results of the LHCb experiment Data Challenge 2004
The LHCb experiment performed its latest Data Challenge (DC) in May-July 2004.
The main goal was to demonstrate the ability of the LHCb grid system to carry out
massive production and efficient distributed analysis of the simulation data.
The LHCb production system, DIRAC, provided all the necessary services for the
DC: Production and Bookkeeping Databases, File catalogs, Workload and Data
Management systems, Monitoring and Accounting tools. It combined, in a
consistent way, the resources of more than 20 LHCb production sites as well as the
LCG2 grid resources. 200M events constituting 90 TB of data were produced and stored in
6 Tier 1 centers. The subsequent analysis was carried out at CERN as well as in all
the Tier 1 centers to which preselected datasets were distributed. The GANGA User
Interface was used to assist users in preparing their analysis jobs and
running them on the local and remote computing resources.
We will present the DC results, the experience gained using the DIRAC and
LCG2 grids, and the further developments needed to reach the scale
required by the running LHCb experiment.
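As a quick consistency check on the figures quoted above, the short Python
calculation below derives the average event size and the data volume per
Tier 1 centre. It assumes decimal terabytes and, purely for illustration, an
even split across the six centres; neither assumption comes from the abstract.

```python
EVENTS = 200e6        # 200M events (from the abstract)
TOTAL_BYTES = 90e12   # 90 TB, assuming decimal terabytes
TIER1_CENTRES = 6     # from the abstract

# Average size of one stored event.
bytes_per_event = TOTAL_BYTES / EVENTS

# Volume per centre, assuming (hypothetically) an even split.
tb_per_tier1 = TOTAL_BYTES / TIER1_CENTRES / 1e12

print(f"average event size: {bytes_per_event / 1e3:.0f} kB")
print(f"data per Tier 1 (even split): {tb_per_tier1:.0f} TB")
```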
ATLAS Data Challenge Production on Grid3
We describe the design and operational experience of the ATLAS production system as
implemented for execution on Grid3 resources. The execution environment consisted
of a number of grid-based tools: Pacman for installation of VDT-based Grid3 services
and ATLAS software releases, the Capone execution service built from the
Chimera/Pegasus virtual data system for directed acyclic graph (DAG) generation,
DAGMan/Condor-G for job submission and management, and the Windmill production
supervisor which provides the messaging system for distributing production tasks to
Capone. Produced datasets were registered into a distributed replica location
service (Globus RLS) that was integrated with the Don Quixote proxy service for
interoperability with other Grids used by ATLAS. We discuss performance,
scalability, and fault handling during the first phase of ATLAS Data Challenge 2.
(UNIVERSITY OF CHICAGO)
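The role of a directed acyclic graph (DAG) in the production system described
above can be sketched in Python with the standard library's graphlib module.
The job names and the ready_batches helper are hypothetical; this is not how
Chimera/Pegasus or DAGMan are implemented, only an illustration of how a DAG
determines which jobs may run at the same time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "generate": set(),
    "simulate_a": {"generate"},
    "simulate_b": {"generate"},
    "merge": {"simulate_a", "simulate_b"},
    "register": {"merge"},
}


def ready_batches(dag):
    """Return successive batches of jobs whose dependencies are all done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that can run in parallel
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


batches = ready_batches(WORKFLOW)
for step, batch in enumerate(batches):
    print(step, batch)
```

Jobs within one batch are independent of each other, which is what allows a
submission system to dispatch them to different sites concurrently.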
Performance of the NorduGrid ARC and the Dulcinea Executor in ATLAS Data Challenge 2
This talk describes the various stages of ATLAS Data Challenge 2 (DC2)
with regard to the use of resources deployed via NorduGrid's Advanced
Resource Connector (ARC). It also describes the integration of these
resources with the ATLAS production system using the Dulcinea
ATLAS Data Challenge 2 (DC2), run in 2004, was designed to be a step
forward in distributed data processing. In particular, much of the
coordination of task assignment to resources was to be
delegated to the Grid in its different flavours. An automatic production
management system was designed, to direct the tasks to Grids and
The Dulcinea executor is the part of this system that provides an interface
to the information system and resource brokering capabilities of the
ARC middleware. The executor translates the job definitions received
from the supervisor to the extended resource specification language
(XRSL) used by the ARC middleware. It also takes advantage of the ARC
middleware's built-in support for the Globus Replica Location Server
(RLS) for file registration and lookup.
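The translation step described above can be sketched in Python. Everything
here is an assumption made for illustration: the layout of the job dictionary,
the to_xrsl helper, the example URLs, and the small subset of XRSL attributes
used (executable, arguments, jobName, inputFiles, outputFiles), which should
be checked against the ARC XRSL documentation. It is not the Dulcinea code.

```python
def quote(value):
    """Wrap a value in double quotes (assumes it contains no quotes)."""
    return f'"{value}"'


def to_xrsl(job):
    """Render a hypothetical job definition as an XRSL-style string."""
    parts = [
        f"(executable={quote(job['executable'])})",
        "(arguments=" + " ".join(quote(a) for a in job["arguments"]) + ")",
        f"(jobName={quote(job['name'])})",
    ]
    # File lists are rendered as ("local name" "source or destination URL").
    for attribute, key in (("inputFiles", "inputs"), ("outputFiles", "outputs")):
        pairs = "".join(
            f"({quote(name)} {quote(url)})" for name, url in job[key].items()
        )
        if pairs:
            parts.append(f"({attribute}={pairs})")
    return "&" + "".join(parts)


# Hypothetical job definition as it might arrive from a supervisor.
JOB = {
    "name": "dc2-sim-0001",
    "executable": "run_sim.sh",
    "arguments": ["--events", 100],
    "inputs": {"evgen.root": "rls://example.org/evgen.root"},
    "outputs": {"hits.root": "rls://example.org/hits.root"},
}

xrsl = to_xrsl(JOB)
print(xrsl)
```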
NorduGrid's ARC has been deployed on many ATLAS-dedicated resources
across the world in order to enable effective participation in ATLAS
DC2. This was the first attempt to harness large amounts of strongly
heterogeneous resources in various countries for a single
collaborative exercise using Grid tools. This talk addresses various
issues that arose during different stages of DC2 in this environment:
preparation, such as ATLAS software installation; deployment of the
middleware; and processing. The results and lessons are summarized as
The simulation for the ATLAS experiment: present status and outlook
The simulation for the ATLAS experiment is now operational in a fully
object-oriented (OO) environment. It is presented here in terms of successful
solutions to the problems of deploying an application across a wide community
using a common framework. The ATLAS experiment is an ideal setting in which to
test applications that must satisfy the different needs of a large community.
Following a well-defined strategy of transition from the GEANT3-based to the
GEANT4-based simulation, a validation programme carried out over the last
months confirmed the reliability, performance and robustness of the new tool
in comparison with the results of the previous simulation. The generation,
simulation and digitization steps were tested on different full sets of
physics events for performance and robustness, in comparison with the same
samples processed by the old GEANT3-based simulation. The simulation program
is simultaneously tested on all the testbeam setups of the R&D programme of
the ATLAS detector subsystems, with comparison to real data, in order to
validate the physics content and the reliability of the detector description
of each component.
(PAVIA UNIVERSITY & INFN)
An Object-Oriented Simulation Program for CMS
The CMS detector simulation package, OSCAR, is based on the Geant4 simulation toolkit
and the CMS object-oriented framework for simulation and reconstruction.
Geant4 provides a rich set of physics processes describing in detail electromagnetic
and hadronic interactions. It also provides the tools for the implementation of the
full CMS detector geometry and the interfaces required for recovering information
from the particle tracking in the detectors.
This functionality is interfaced to the CMS framework, which, via its "action on
demand" mechanisms, allows the user to selectively load desired modules and to
configure and tune the final application.
The complete CMS detector is rather complex with more than 12 million readout
channels and more than 1 million geometrical volumes.
OSCAR has been validated by comparing its results with test beam data and with
results from simulation with a GEANT3-based program.
It has been successfully deployed in the 2004 data challenge for CMS, where ~20
million events for various LHC physics channels were simulated and analysed.
S. Abdulline, V. Andreev, P. Arce, S. Arcelli, S. Banerjee, T. Boccali,
M. Case, A. De Roeck, S. Dutta, G. Eulisse, D. Elvira, A. Fanfani, F. Ferro,
M. Liendl, S. Muzaffar, A. Nikitenko, K. Lassila-Perini, I. Osborne,
M. Stavrianakou, T. Todorov, L. Tuura, H.P. Wellisch, T. Wildish, S. Wynhoff,
M. Zanetti, A. Zhokin, P. Zych
The Virtual Monte Carlo: status and applications
The current major detector simulation programs, i.e. GEANT3, GEANT4
and FLUKA, have largely incompatible environments. This forces
physicists who want to compare the different
transport Monte Carlos to develop entirely different programs.
Moreover, migration from one program to the other is usually
very expensive, in manpower and time, for an experiment offline
environment, as it implies substantial changes in the simulation
code. To solve this problem, the ALICE Offline project has developed
a virtual interface to these three programs allowing their seamless
use without any change in the framework, the geometry description or
the scoring code. Moreover, a new geometrical modeller has been
developed in collaboration with the ROOT team, and successfully
interfaced to the three programs. This allows the use of one
description of the geometry, which can also be used during
reconstruction and visualisation. The talk will describe the present
status and future plans for the Virtual Monte Carlo. It will also
present the capabilities and performance of the geometrical modeller.
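The idea of a virtual interface that lets the framework switch transport
programs without code changes can be sketched in Python. The class and
function names below (TransportEngine, StubEngine, simulate) are hypothetical
and do not reproduce the actual Virtual Monte Carlo API; the stub backends
merely stand in for wrappers around real transport programs.

```python
from abc import ABC, abstractmethod


class TransportEngine(ABC):
    """Hypothetical common interface seen by the experiment framework."""

    @abstractmethod
    def name(self):
        """Return a label identifying the backend."""

    @abstractmethod
    def transport(self, geometry, particles):
        """Propagate particles through the geometry and return hits."""


class StubEngine(TransportEngine):
    """Stand-in backend; a real one would wrap GEANT3, GEANT4 or FLUKA."""

    def __init__(self, label):
        self._label = label

    def name(self):
        return self._label

    def transport(self, geometry, particles):
        # Placeholder physics: record one "hit" per particle per volume.
        return [(self._label, vol, p) for p in particles for vol in geometry]


def simulate(engine, geometry, particles):
    """Framework code: depends only on the TransportEngine interface."""
    return engine.transport(geometry, particles)


# One geometry description and one particle list shared by all backends.
GEOMETRY = ["tracker", "calorimeter"]
PARTICLES = ["pi+", "e-"]

results = {
    engine.name(): simulate(engine, GEOMETRY, PARTICLES)
    for engine in (StubEngine("engine_a"), StubEngine("engine_b"))
}
print({label: len(hits) for label, hits in results.items()})
```

Because simulate never names a concrete backend, comparing two transport
programs reduces to passing a different engine object.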
From Geant 3 to Virtual Monte Carlo: Approach and Experience
The STAR Collaboration is currently using simulation software
based on Geant 3. The emergence of new Monte Carlo
simulation packages, coupled with the evolution of both the STAR
detector and its software, requires a drastic change of
the simulation framework.
We see the Virtual Monte Carlo (VMC) approach as providing
a layer of abstraction that facilitates such a transition.
The VMC platform is a candidate to replace the present legacy
software and to help avoid certain of its shortcomings, such as
the use of a particular algorithmic language to describe the
detector geometry. It will also allow us to introduce a more
flexible in-memory representation of the geometry.
The Virtual Monte Carlo concept includes an application kernel
that is platform-neutral to the highest degree possible.
This kernel is then equipped with interfaces to the modules
responsible for simulating the physics of particle propagation,
We consider the geometry description classes in the ROOT
system (in its latest form known as TGeo classes) as a good
choice for the in-memory geometry representation.
We present an application design based on the Virtual Monte Carlo,
along with the results of testing, benchmarking and comparison
to Geant 3. The internal event representation and IO model will
also be discussed.
(BROOKHAVEN NATIONAL LABORATORY)
The High Level Trigger software for the CMS experiment
The observation of Higgs bosons predicted in supersymmetric theories
will be a challenging task for the CMS experiment at the LHC, in
particular for its High Level Trigger (HLT). A prototype of the
High Level Trigger software to be used in the filter farm of the CMS
experiment and for the filtering of Monte Carlo samples will be
presented. The implemented prototype makes heavy use of recursive
processing of an HLT tree and allows dynamic trigger definition.
First, the general architecture and design choices, as well
as the timing performance of the system, will be reviewed in the
light of the DAQ constraints. Second, specific trigger
implementations in the context of the Object-oriented Reconstruction
for CMS Analysis (ORCA) software will be detailed.
Finally, the analysis for the selection of a CP-even Higgs boson decaying
to tau pairs will be presented. This analysis will
illustrate the importance of the trigger strategies required to
carry out the various physics analyses in CMS.
O. van der Aa
(INSTITUT DE PHYSIQUE NUCLEAIRE, UNIVERSITE CATHOLIQUE DE LOUVAIN)
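Recursive processing of a trigger tree, as mentioned in the abstract above,
can be illustrated with a minimal Python sketch. The Node class, the evaluate
function, the node names and all thresholds are hypothetical and are not
taken from the CMS software; the point is only that a failed node prunes its
whole subtree, and that the tree is ordinary data that can be built at run
time.

```python
class Node:
    """One trigger step: a named test plus optional refinement steps."""

    def __init__(self, name, test, children=()):
        self.name = name
        self.test = test
        self.children = list(children)


def evaluate(node, event, path=()):
    """Recursively walk the tree; return the accepted leaf paths."""
    if not node.test(event):
        return []  # prune: the children of a failed node are never run
    path = path + (node.name,)
    if not node.children:
        return ["/".join(path)]
    accepted = []
    for child in node.children:
        accepted.extend(evaluate(child, event, path))
    return accepted


# Hypothetical tree; it could equally be assembled from a configuration.
TREE = Node(
    "l1_seed",
    lambda e: e["l1_et"] > 30,
    [
        Node(
            "calo_iso",
            lambda e: e["iso"] < 0.1,
            [Node("track_match", lambda e: e["n_tracks"] in (1, 3))],
        ),
        Node("high_et", lambda e: e["l1_et"] > 100),
    ],
)

tau_like = {"l1_et": 45, "iso": 0.05, "n_tracks": 1}
soft = {"l1_et": 10, "iso": 0.05, "n_tracks": 1}
print(evaluate(TREE, tau_like))
print(evaluate(TREE, soft))
```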
Fast tracking for the ATLAS LVL2 Trigger
We present a set of algorithms for fast pattern recognition and track
reconstruction using 3D space points, aimed at the High Level
Triggers (HLT) of multi-collision hadron collider environments. At
the LHC there are several interactions per bunch crossing separated
along the beam direction, z. The strategy we follow is to (a)
identify the z-position of the interesting interaction prior to any
track reconstruction; (b) select groups of space points pointing
back to this z-position, using a histogramming technique which
avoids performing any combinatorics; and (c) proceed to the
combinatorial tracking only within the individual groups of space
points. The validity of this strategy will be demonstrated with
results in terms of timing and physics performance for the LVL2
trigger of ATLAS at the LHC, although the strategy is generic and
can be applied to any multi-collision hadron collider experiment.
In addition, the algorithms are conceptually simple, flexible and
robust and hence appropriate for use in demanding, online
environments. We will also make qualitative comparisons with an
alternative, complementary strategy, based on the use of look-up
tables for handling combinatorics, that has been developed for the
ATLAS LVL2 trigger. These algorithms have been used for the results
that appear in the ATLAS HLT, DAQ and Controls Technical Design
Report, which was recently approved by the LHC Committee.
(UNIVERSITY COLLEGE LONDON)
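Step (b) of the strategy above, grouping space points by histogramming
instead of forming combinations, can be sketched in Python. This is a
simplified illustration under stated assumptions, not the ATLAS algorithm:
tracks are treated as straight lines from the beam line, curvature in phi and
bin-edge effects are ignored, and the function name, bin widths and layer
threshold are hypothetical.

```python
import math
from collections import defaultdict


def group_space_points(points, z0, eta_bin=0.1, phi_bin=0.05, min_layers=3):
    """Group (layer, r, phi, z) points by (eta, phi) bin as seen from z0.

    Each point is visited exactly once, so the cost grows linearly with
    the number of points; no pairs or triplets are ever formed.
    """
    bins = defaultdict(list)
    for layer, r, phi, z in points:
        eta = math.asinh((z - z0) / r)  # pseudorapidity seen from (0, z0)
        key = (math.floor(eta / eta_bin), math.floor(phi / phi_bin))
        bins[key].append((layer, r, phi, z))
    # Keep only bins populated by enough distinct detector layers.
    return [
        pts for pts in bins.values()
        if len({p[0] for p in pts}) >= min_layers
    ]


# Three points on a straight line from z0 = 5.0, plus two noise points.
Z0 = 5.0
POINTS = [
    (0, 50.0, 0.312, 30.0),
    (1, 90.0, 0.312, 50.0),
    (2, 120.0, 0.312, 65.0),
    (0, 50.0, 2.0, -40.0),
    (1, 90.0, -1.0, 10.0),
]

groups = group_space_points(POINTS, Z0)
print(len(groups), [p[0] for p in groups[0]])
```

Combinatorial tracking would then run only inside each returned group, which
is where the saving over a global search comes from.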
Implementation and Performance of the High-Level Trigger electron and photon selection for the ATLAS experiment at the LHC
The ATLAS experiment at the Large Hadron Collider (LHC) will face the challenge of
efficiently selecting interesting candidate events in pp collisions at 14 TeV center-
of-mass energy, whilst rejecting the enormous number of background events, stemming
from an interaction rate of about 10^9 Hz. The Level-1 trigger will reduce the
incoming rate to O(100 kHz). Subsequently, the High-Level Triggers (HLT),
which comprise the second level trigger and the event filter, will need to
reduce this rate further by a factor of O(10^3). The HLT selection is software based
and will be implemented on commercial CPUs using a common framework, which is based
on the standard ATLAS object-oriented software architecture. In this talk an
overview of the current implementation of the selection for electrons and photons in
the trigger is given. The performance of this implementation has been evaluated
using Monte Carlo simulations in terms of the efficiency for the signal channels,
the rate expected for the selection, the data preparation times, and the algorithm
execution times. Besides the efficiency and rate estimates, some physics examples
will be discussed, showing that the triggers are well adapted for the physics
programme envisaged at the LHC. The electron/gamma trigger software has also been
integrated in the ATLAS 2004 combined test-beam, to validate the chosen selection
architecture in a real on-line environment.
(University of Geneva, Switzerland)
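The order-of-magnitude rate reduction quoted above can be made explicit with
a short Python calculation. The three inputs are taken from the abstract;
treating the interaction rate as the input to Level-1 is a simplifying
assumption, so the results are indicative only.

```python
INTERACTION_RATE_HZ = 1e9  # about 10^9 Hz (from the abstract)
LVL1_OUTPUT_HZ = 100e3     # O(100 kHz) after the Level-1 trigger
HLT_REDUCTION = 1e3        # further factor of O(10^3) required of the HLT

# Rejection achieved by Level-1 relative to the interaction rate.
lvl1_rejection = INTERACTION_RATE_HZ / LVL1_OUTPUT_HZ

# Rate left after the High-Level Triggers.
hlt_output_hz = LVL1_OUTPUT_HZ / HLT_REDUCTION

# Combined reduction from interactions to recorded events.
overall_rejection = INTERACTION_RATE_HZ / hlt_output_hz

print(f"Level-1 rejection: {lvl1_rejection:.0e}")
print(f"HLT output rate:   {hlt_output_hz:.0f} Hz")
print(f"overall rejection: {overall_rejection:.0e}")
```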
Event Data Model in ATLAS
The event data model (EDM) of the ATLAS experiment is presented. For large
collaborations like the ATLAS experiment, common interfaces and data objects
are a necessity to ensure easy maintenance and coherence of the experiment's
software platform over a long period of time. The ATLAS EDM improves
commonality across the detector subsystems and subgroups such as trigger, test
beam reconstruction, combined event reconstruction, and physics analysis. The
object-oriented approach to the description of the detector data makes
one common raw data flow possible. Furthermore, the EDM allows the
use of common software between online data processing and offline
reconstruction. One important component of the ATLAS EDM is a common track
class which is used for combined track reconstruction across the innermost
tracking subdetectors and is also used for tracking in the muon detectors. The
structure of the track object and the variety of track parameters are
presented. For the combined event reconstruction a common particle class is
introduced which serves as the interface between event reconstruction and
How to build an event store - the new Kanga Event Store for BaBar
In the past year, BaBar has shifted from using Objectivity to using ROOT I/O
as the basis for our primary event store. This shift required a total
reworking of Kanga, our ROOT-based data storage format. We took advantage
of this opportunity to ease the use of the data by supporting multiple
access modes that make use of many of the analysis tools available in
Specifically, our new event store supports: 1) the pre-existing separated
transient + persistent model, 2) a transient-based load-on-demand model
currently being developed, 3) direct access to persistent data classes in
compiled code, 4) fully interactive access to persistent data classes from
either the ROOT prompt or interpreted macros.
We will describe key features of Kanga including: 1) the separation and
management of transient and persistent representations of data, 2) the
implementation of read on demand references in ROOT, 3) the modular and
extensible persistent event design, 4) the implementation of schema
evolution and 5) BaBar specific extensions to core ROOT classes that we
used to preserve the end-user "feel" of ROOT.
(Ruhr Universitaet Bochum)
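The read-on-demand references listed among the Kanga features can be
illustrated with a small Python sketch. The LazyRef class, the in-memory
STORE and the load_from_store function are hypothetical stand-ins and say
nothing about how Kanga implements this in ROOT; they only show the
behaviour: the referenced object is read when first accessed, and only once.

```python
class LazyRef:
    """Hypothetical read-on-demand reference: loads its target on first use."""

    def __init__(self, loader, key):
        self._loader = loader
        self._key = key
        self._loaded = False
        self._value = None

    def get(self):
        if not self._loaded:
            self._value = self._loader(self._key)  # the only actual read
            self._loaded = True
        return self._value


# Stand-in for the persistent store, with a log of the reads performed.
STORE = {"tracks/42": {"pt": 1.7, "charge": -1}}
reads = []


def load_from_store(key):
    reads.append(key)
    return STORE[key]


ref = LazyRef(load_from_store, "tracks/42")
reads_before_access = list(reads)  # creating the reference reads nothing
first = ref.get()                  # triggers the read
second = ref.get()                 # served from the cached value
print(reads_before_access, reads, first is second)
```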
Using the reconstruction software, ORCA, in the CMS data challenge
We report on the software for Object-oriented Reconstruction for CMS
Analysis, ORCA. It is based on the Coherent Object-oriented Base for
Reconstruction, Analysis and simulation (COBRA) and used for
digitization and reconstruction of simulated Monte-Carlo events as
well as testbeam data.
For the 2004 data challenge the functionality of the software has
been extended to store collections of reconstructed objects (DST) as
well as the previously storable quantities (Digis) in multiple,
We describe the structure of the DST, the way to ensure and store
the configuration of reconstruction algorithms that fill the
collections of reconstructed objects as well as the relations
between them. Also the handling of multiple streams to store parts
of selected events is discussed. The experience from the
implementation used in early 2004 and the modifications for future
optimization of reconstruction and analysis are presented.
H1OO - an analysis framework for H1
During the years 2000 and 2001 the HERA machine and the H1
experiment performed substantial luminosity upgrades. To cope with
the increased demands on data handling an effort was made to
redesign and modernize the analysis software. The main goal was to
lower the turn-around time for physics analysis by providing a single
framework for data storage, event selection, physics analysis and
event display. The new object-oriented analysis environment uses
C++ and is based on the ROOT framework. Data layers with a high
level of abstraction are defined, i.e. physics particles, event
summary information and user specific information.
A generic interface makes the use of reconstruction output stored
in BOS format transparent to the user. Links between all data layers
and partial event reading allow correlating quantities of different
abstraction levels with high performance. Detailed physics
analysis is performed by passing transient data between different
analysis modules. Binding existing Fortran-based libraries on
demand allows the use of existing utility functions and provides an
interface to the existing database. On this basis, tools with enhanced
functionality are provided. This framework has become the standard for
data analyses of the previously and currently collected data.