Conveners
Computer facilities, production grids and networking: CF 1–CF 7
- Kors Bos (NIKHEF)
Dr
Jamie Shiers
(CERN)
9/3/07, 2:00 PM
Computer facilities, production grids and networking
oral presentation
This talk summarises the main lessons learnt from deploying WLCG production services,
with a focus on reliability, scalability and accountability, which together lead to
both manageability and usability.
Each topic is analysed in turn. Techniques for zero-user-visible downtime for the
main service interventions are described, together with pathological cases that need
special treatment. The...
Dr
Markus Schulz
(CERN)
9/3/07, 2:20 PM
Computer facilities, production grids and networking
oral presentation
Today's production Grids connect large numbers of distributed hosts using high
throughput networks and hence are valuable targets for attackers. In the same way
users transparently access any Grid service independently of its location, an
attacker may attempt to propagate an attack to different sites that are part of a
Grid. In order to contain and resolve the incident, and since such an...
Mrs
Ruth Pordes
(FERMILAB)
9/3/07, 2:40 PM
Computer facilities, production grids and networking
oral presentation
The Open Science Grid (OSG) is receiving five years of funding across six program offices of the Department of
Energy Office of Science and the National Science Foundation. OSG is responsible for operating a secure
production-quality distributed infrastructure, a reference software stack including the Virtual Data Toolkit (VDT),
extending the capabilities of the high throughput virtual...
Dr
Jeremy Coles
(RAL)
9/3/07, 3:00 PM
Computer facilities, production grids and networking
oral presentation
Over the last few years, UK research centres have provided significant computing
resources for many high-energy physics collaborations under the guidance of the
GridPP project. This paper reviews recent progress in the Grid deployment and
operations area including findings from recent experiment and infrastructure service
challenges. These results are discussed in the context of how GridPP...
Dr
Pavel Murat
(Fermilab)
9/3/07, 3:20 PM
Computer facilities, production grids and networking
oral presentation
The CDF II detector at Fermilab has been taking physics data since 2002.
The architecture of the CDF computing system has evolved substantially
over the years of data taking and has now reached a stable
configuration that will allow the experiment to process and analyse the data
until the end of Run II.
We describe the major architectural components of the CDF offline
computing - dedicated...
Mr
Lars Fischer
(Nordic Data Grid Facility)
9/3/07, 3:40 PM
Computer facilities, production grids and networking
oral presentation
The Tier-1 facility operated by the Nordic DataGrid Facility (NDGF) differs
significantly from other Tier-1s in several aspects: it is not located at one or a few
sites but is instead distributed throughout the Nordic countries; it is not under the
governance of a single organization but is instead a "virtual" Tier-1 built out of
resources under the control of a number of different national...
Dr
Richard Mount
(SLAC)
9/3/07, 4:30 PM
Computer facilities, production grids and networking
oral presentation
The PetaCache project started at SLAC in 2004 with support from DOE
Computer Science and the SLAC HEP program. PetaCache focuses on using
cost-effective solid state storage for the hottest data under analysis. We chart
the evolution of metrics such as accesses per second per dollar for different
storage technologies and deduce the near inevitability of a massive use of solid-
state...
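As a rough illustration (not from the talk itself) of the accesses-per-second-per-dollar metric the abstract mentions, the sketch below compares hypothetical storage devices; all figures are placeholders, not measured values.

```python
# Illustrative sketch: comparing storage technologies by the
# "accesses per second per dollar" metric named in the abstract.
# All device figures below are hypothetical placeholders.

def accesses_per_second_per_dollar(iops: float, unit_cost: float) -> float:
    """Random-read operations per second delivered per dollar spent."""
    return iops / unit_cost

# Hypothetical example devices; hot data under analysis is access-bound,
# so capacity is deliberately ignored here.
devices = {
    "disk drive":   {"iops": 150.0,    "unit_cost": 100.0},
    "flash device": {"iops": 10_000.0, "unit_cost": 500.0},
}

for name, d in devices.items():
    metric = accesses_per_second_per_dollar(d["iops"], d["unit_cost"])
    print(f"{name}: {metric:.1f} accesses/s per dollar")
```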
Dr
Giuseppe Lo Presti
(CERN/INFN)
9/3/07, 4:50 PM
Computer facilities, production grids and networking
oral presentation
In this paper we present the architecture design of the CERN Advanced Storage system
(CASTOR) and its new disk cache management layer (CASTOR2).
Mass storage systems at CERN have evolved over time to meet growing requirements,
both in terms of scalability and fault resiliency. CASTOR2 has been designed as a
Grid-capable storage resource sharing facility, with a database-centric...
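As a generic illustration of the database-centric pattern the abstract refers to (not CASTOR's actual schema or code), the sketch below shows components coordinating through a shared request table rather than talking to each other directly:

```python
# Generic sketch of a database-centric request-handling pattern;
# this is NOT CASTOR's actual schema or implementation.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, file TEXT, state TEXT)")

# A client-facing component records a stage-in request in the database...
db.execute("INSERT INTO requests (file, state) VALUES (?, ?)",
           ("/castor/cern.ch/user/data.root", "PENDING"))
db.commit()

# ...and a stager daemon later claims and processes it from the same table,
# so all components coordinate through the database.
row = db.execute("SELECT id, file FROM requests WHERE state = 'PENDING'").fetchone()
if row:
    req_id, path = row
    db.execute("UPDATE requests SET state = 'STAGED' WHERE id = ?", (req_id,))
    db.commit()
    print(f"staged {path}")
```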
Dr
Horst Goeringer
(GSI)
9/3/07, 5:10 PM
Computer facilities, production grids and networking
oral presentation
GSI in Darmstadt (Germany) is a center for heavy ion research
and hosts an ALICE Tier-2 center.
For the future FAIR experiments at GSI,
CBM and PANDA, the planned data rates
will reach those of the current LHC experiments at CERN.
gStore, the GSI Mass Storage System, has been
successfully in operation for more than ten years.
It is a hierarchical storage system with a unique name...
Paul Avery
(University of Florida)
9/3/07, 5:30 PM
Computer facilities, production grids and networking
oral presentation
UltraLight is a collaboration of experimental physicists and network engineers whose
purpose is to provide the network advances required to enable and facilitate
petabyte-scale analysis of globally distributed data. Existing Grid-based
infrastructures provide massive computing and storage resources, but are currently
limited by their treatment of the network as an external, passive, and...
Ms
Alessandra Forti
(University of Manchester)
9/3/07, 5:50 PM
Computer facilities, production grids and networking
oral presentation
The HEP department of the University of Manchester has purchased a 1000-node
cluster. The cluster is dedicated to running EGEE and LCG software and currently
supports 12 active VOs. Each node is equipped with
2 x 250 GB disks, for a total of 500 GB; there is no tape storage behind it, nor
are RAID arrays used. Three different storage solutions are
currently being deployed to...
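The per-node and aggregate capacities implied by these figures work out as follows:

```python
# Aggregate raw capacity implied by the abstract's figures:
# 1000 nodes, each with 2 x 250 GB local disks, no RAID, no tape behind.
nodes = 1000
disks_per_node = 2
disk_size_gb = 250

per_node_gb = disks_per_node * disk_size_gb   # 500 GB per node
total_tb = nodes * per_node_gb / 1000         # ~500 TB across the cluster
print(f"{per_node_gb} GB per node, ~{total_tb:.0f} TB aggregate")
```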
Dr
Ian Fisk
(FNAL)
9/4/07, 11:00 AM
Computer facilities, production grids and networking
oral presentation
In preparation for the start of the experiment, CMS has conducted computing, software, and analysis challenges to
demonstrate the functionality, scalability, and usability of the computing and software components. These
challenges are designed to validate the CMS distributed computing model by demonstrating the functionality of
many components simultaneously. In the challenges CMS...
Mr
Michel Jouvin
(LAL / IN2P3)
9/4/07, 11:20 AM
Computer facilities, production grids and networking
oral presentation
Quattor is a tool aimed at the efficient management of fabrics with hundreds or
thousands of Linux machines, while remaining easy enough to use for smaller
clusters. It was originally developed inside the European DataGrid (EDG)
project. It is now in use at more than 30 grid sites running gLite middleware,
ranging from small LCG T3 sites to very large ones like CERN.
Main goals and specific...
Torsten Antoni
(Forschungszentrum Karlsruhe)
9/4/07, 11:40 AM
Computer facilities, production grids and networking
oral presentation
The organization and management of the user support in a global e-science computing
infrastructure such as EGEE is one of the challenges of the grid. Given the widely
distributed nature of the organisation, and the spread of expertise for installing,
configuring, managing and troubleshooting the grid middleware services, a standard
centralized model could not be deployed in EGEE. This...
Mr
Antonio Retico
(CERN)
9/4/07, 12:00 PM
Computer facilities, production grids and networking
oral presentation
Grids have the potential to revolutionise computing by providing ubiquitous, on
demand access to computational services and resources. They promise to allow for on
demand access and composition of computational services provided by multiple
independent sources. Grids can also provide unprecedented levels of parallelism for
high-performance applications. On the other hand, grid...
Dirk Duellmann
(CERN)
9/5/07, 2:00 PM
Computer facilities, production grids and networking
oral presentation
Relational database services are a key component of the computing models for the Large Hadron Collider (LHC). A
large proportion of non-event data including detector conditions, calibration, geometry and production
bookkeeping metadata require reliable storage and query services in the LHC Computing Grid (LCG). Also core grid
services to catalogue and distribute data cannot operate...
Dr
Xavier Espinal
(PIC/IFAE)
9/5/07, 2:20 PM
Computer facilities, production grids and networking
oral presentation
In preparation for first data at the LHC, a series of Data Challenges, of
increasing scale and complexity, have been performed. Large quantities of
simulated data have been produced on three different Grids, integrated into
the ATLAS production system. During 2006, the emphasis moved towards providing
stable continuous production, as is required in the immediate run-up to first
data, and...
Mr
Jose Hernandez Calama
(CIEMAT)
9/5/07, 2:40 PM
Computer facilities, production grids and networking
oral presentation
Monte Carlo production in CMS has received a major boost in performance and
scale since the last CHEP conference. The production system has been re-engineered
in order to incorporate the experience gained in running the previous system
and to integrate production with the new CMS event data model, data management
system and data processing framework. The system is interfaced to the two...
Yuri Smirnov
(Brookhaven National Laboratory)
9/5/07, 3:20 PM
Computer facilities, production grids and networking
oral presentation
The Open Science Grid infrastructure provides one of the largest distributed
computing systems deployed in the ATLAS experiment at the LHC. During the CSC
exercise in 2006-2007, OSG resources provided about one third of the worldwide
distributed computing resources available in ATLAS. About half a petabyte of ATLAS MC
data is stored on OSG sites, and about 2000k SpecInt2000 of CPU capacity is available....
Mr
Dave Evans
(Fermi National Accelerator Laboratory)
9/5/07, 3:40 PM
Computer facilities, production grids and networking
oral presentation
The CMS production system has undergone a major architectural upgrade from its
predecessor, with the goals of reducing the operations manpower requirement and
preparing for the large scale production required by the CMS physics plan.
This paper discusses the CMS Monte Carlo Workload Management architecture. The
system consists of three major components: ProdRequest, ProdAgent, and ProdMgr...
Mr
Philip DeMar
(FERMILAB)
9/5/07, 4:30 PM
Computer facilities, production grids and networking
oral presentation
Fermilab hosts the American Tier-1 Center for the LHC/CMS experiment. In preparation
for the startup of CMS, and building upon extensive experience supporting TeVatron
experiments and other science collaborations, the Laboratory has established high
bandwidth, end-to-end (E2E) circuits with a number of US-CMS Tier2 sites, as well as
other research facilities in the collaboration. These...
Mr
Maxim Grigoriev
(FERMILAB)
9/5/07, 4:50 PM
Computer facilities, production grids and networking
oral presentation
The LHC experiments will start very soon, creating immense data volumes that can
demand the allocation of an entire network circuit for task-driven applications.
Circuit-based alternate network paths are one solution to meeting the LHC high
bandwidth network requirements. The Lambda Station project is aimed at addressing
growing requirements for dynamic allocation of alternate network...
Dr
Matt Crawford
(FERMILAB)
9/5/07, 5:10 PM
Computer facilities, production grids and networking
oral presentation
Due to shortages of IPv4 address space - real or artificial - many HEP
computing installations have turned to NAT and application gateways.
These workarounds carry a high cost in application complexity and
performance. Recently a few HEP facilities have begun to deploy IPv6
and it is expected that many more must follow within several years.
While IPv6 removes the problem of address...
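As a minimal illustration of the kind of transition step such deployments involve (standard-library code, not taken from the paper), a single IPv6 listening socket can also accept IPv4 clients via v4-mapped addresses, avoiding per-protocol gateways:

```python
# Minimal dual-stack listener sketch: one IPv6 socket that also accepts
# IPv4 clients via v4-mapped addresses. Illustrative only; the port is
# an arbitrary example.
import socket

srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
# Clear IPV6_V6ONLY so IPv4 connections arrive as ::ffff:a.b.c.d
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
srv.bind(("::", 8443))   # listen on all IPv6 (and mapped IPv4) addresses
srv.listen(5)
print("listening on [::]:8443 for both IPv4 and IPv6 clients")
```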
Mr
Maxim Grigoriev
(FERMILAB)
9/5/07, 5:30 PM
Computer facilities, production grids and networking
oral presentation
End-to-end (E2E) circuits are used to carry high impact data movement into and out of
the US CMS Tier-1 Center at Fermilab. E2E circuits have been implemented to
facilitate the movement of raw experiment data from Tier-0, as well as processed data
to and from a number of the US Tier-2 sites. Troubleshooting and monitoring those
circuits presents a challenge, since the circuits typically...
Dr
Luc Goossens
(CERN)
9/5/07, 5:50 PM
Computer facilities, production grids and networking
oral presentation
ATLAS is a multi-purpose experiment at the LHC at CERN,
which will start taking data in November 2007.
Handling and processing the unprecedented data rates expected
at the LHC (at nominal operation, ATLAS will record about
10 PB of raw data per year) poses a huge challenge for the
computing infrastructure.
The ATLAS Computing Model foresees a multi-tier hierarchical
model to perform this...
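For scale, the quoted 10 PB per year corresponds to a sustained average rate of roughly 300 MB/s:

```python
# Back-of-the-envelope rate implied by the abstract's figure of
# ~10 PB of raw data per year at nominal ATLAS operation.
raw_pb_per_year = 10
seconds_per_year = 365 * 24 * 3600                      # ~3.15e7 s
rate_mb_s = raw_pb_per_year * 1e9 / seconds_per_year    # PB -> MB, decimal units
print(f"average raw-data rate: ~{rate_mb_s:.0f} MB/s sustained")
```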
Dr
Lukas Nellen
(I. de Ciencias Nucleares, UNAM)
9/6/07, 2:00 PM
Computer facilities, production grids and networking
oral presentation
The EELA project aims at building a grid infrastructure in Latin
America and at attracting users to this infrastructure. The EELA
infrastructure is based on the gLite middleware, developed by the EGEE
project. A test-bed, including several European and Latin American
countries, was set up in the first months of the project. Several
applications from different areas, especially...
Dr
Alexei Klimentov
(BNL)
9/6/07, 2:20 PM
Computer facilities, production grids and networking
oral presentation
The ATLAS Distributed Data Management (DDM) Operations Team unites experts from
Tier-1 and Tier-2 computing centers. The group is responsible for all day-to-day
ATLAS data distribution between different sites and centers.
In our paper we describe the ATLAS DDM operations model and address
data management and operations issues. A series of Functional Tests has
been conducted in the past and is in...
Dr
Daniele Bonacorsi
(INFN-CNAF, Bologna, Italy)
9/6/07, 2:40 PM
Computer facilities, production grids and networking
oral presentation
The CMS experiment is gaining experience towards data taking through several computing preparation activities, and a
roadmap towards a mature computing operations model stands as a primary target. The responsibility of the
Computing Operations project in the complex CMS computing environment spans a wide area and aims at
integrating the management of the CMS Facilities Infrastructure,...
Luca dell'Agnello
(INFN-CNAF)
9/6/07, 3:00 PM
Computer facilities, production grids and networking
oral presentation
Performance, reliability and scalability in data access are key issues when
considered in the context of HEP data processing and analysis applications.
The importance of these topics is even larger when considering the quantity of data
and the request load that an LHC data center has to support.
In this paper we give the results and the technical details of a large scale
validation,...
Jan van Eldik
(CERN)
9/6/07, 3:20 PM
Computer facilities, production grids and networking
oral presentation
This paper presents work, both completed and planned, for streamlining the
deployment, operation and re-tasking of Castor2 instances. We present a summary of
what has recently been done to reduce the human intervention necessary for bringing
systems into operation; including the automation of Grid host certificate requests
and deployment in conjunction with the CERN Trusted CA and...
Mr
Timur Perelmutov
(FERMILAB)
9/6/07, 3:40 PM
Computer facilities, production grids and networking
oral presentation
The Storage Resource Manager (SRM) and WLCG collaborations recently
defined version 2.2 of the SRM protocol, with the goal of satisfying
the requirements of the LHC experiments. The dCache team has now
finished the implementation of all SRM v2.2 elements required by the
WLCG. The new functions include space reservation, more advanced data
transfer, and new namespace and permission...
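As a conceptual sketch of one of these new functions, a space reservation might look as follows from a client's point of view. The wrapper function and endpoint below are hypothetical stand-ins, while the srmReserveSpace operation and its retention-policy and access-latency attributes come from the SRM v2.2 specification:

```python
# Conceptual sketch of an SRM v2.2 space reservation. The wrapper and
# endpoint are hypothetical; a real client would issue a SOAP
# srmReserveSpace call to the storage element.
from dataclasses import dataclass

@dataclass
class SpaceRequest:
    retention_policy: str     # e.g. "REPLICA" or "CUSTODIAL" (SRM v2.2 values)
    access_latency: str       # e.g. "ONLINE" or "NEARLINE"
    desired_size_bytes: int
    lifetime_seconds: int

def srm_reserve_space(endpoint: str, req: SpaceRequest) -> str:
    """Hypothetical wrapper: pretend to call srmReserveSpace and
    return the space token the storage element would hand back."""
    print(f"srmReserveSpace -> {endpoint}: {req}")
    return "SPACE-TOKEN-0001"   # placeholder token

token = srm_reserve_space(
    "httpg://dcache.example.org:8443/srm/managerv2",   # hypothetical endpoint
    SpaceRequest("REPLICA", "ONLINE",
                 desired_size_bytes=10 * 1024**4,      # 10 TiB
                 lifetime_seconds=30 * 86400),         # 30 days
)
print("space token:", token)
```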
Dr
Maxim Potekhin
(BROOKHAVEN NATIONAL LABORATORY)
9/6/07, 4:30 PM
Computer facilities, production grids and networking
oral presentation
The simulation program for the STAR experiment at the Relativistic Heavy Ion Collider at
Brookhaven National Laboratory is growing in scope and responsiveness to the needs of
the research conducted by the Physics
Working Groups. In addition, there is a significant ongoing R&D activity aimed at
future upgrades of the STAR detector, which also requires extensive simulations
support. The...
Igor Sfiligoi
(Fermilab)
9/6/07, 4:50 PM
Computer facilities, production grids and networking
oral presentation
Pilot jobs are becoming increasingly popular in the Grid world. Experiments like
ATLAS and CDF are
using them in production, while others, like CMS, are actively evaluating them.
Pilot jobs enter Grid sites using a generic pilot credential, and once on a worker
node, call home to fetch the job of an actual user.
However, this operation mode poses several new security problems when...
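A minimal sketch of the call-home step described above; the factory URL and payload format are hypothetical illustrations, not the actual ATLAS or CDF implementations:

```python
# Minimal sketch of the pilot-job pattern: a generic pilot lands on a
# worker node and "calls home" to fetch a real user's job.
import json
import subprocess
import urllib.request

FACTORY_URL = "https://pilot-factory.example.org/fetch-job"  # hypothetical

def call_home() -> dict | None:
    """Ask the submitting factory whether a user job is waiting."""
    with urllib.request.urlopen(FACTORY_URL, timeout=30) as resp:
        payload = json.load(resp)
    return payload or None

job = call_home()
if job:
    # The security issue the abstract raises: the payload now runs under
    # the generic pilot credential, not the actual user's identity.
    subprocess.run(job["command"], check=False)  # expects an argv list
```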
Dr
Simone Campana
(CERN/IT/PSS)
9/6/07, 5:10 PM
Computer facilities, production grids and networking
oral presentation
The ATLAS experiment has been running continuous simulated event production for
more than two years. A considerable fraction of the jobs is submitted and
handled daily via the gLite Workload Management System, which overcomes several
limitations of the previous LCG Resource Broker. The gLite WMS has been tested very
intensively for the LHC experiments' use cases for more than six months,...
Mr
Sergey Chechelnitskiy
(Simon Fraser University)
9/6/07, 5:30 PM
Computer facilities, production grids and networking
oral presentation
SFU is responsible for running two different clusters: one is designed for WestGrid
internal jobs with its specific software, and the other should run ATLAS jobs only.
In addition to a different software configuration, the ATLAS cluster requires a
different networking configuration. We would also like to have the flexibility of
running jobs on different hardware. That is why it has been...
Ms
Zhenping Liu
(BROOKHAVEN NATIONAL LABORATORY)
9/6/07, 5:50 PM
Computer facilities, production grids and networking
oral presentation
The BNL ATLAS Computing Facility needs to provide a Grid-based storage system with these
requirements: a total of one gigabyte per second of incoming and outgoing data rate
between BNL and ATLAS T0, T1 and T2 sites, thousands of reconstruction/analysis jobs
accessing locally stored data objects, three petabytes of disk/tape storage in 2007
scaling up to 25 petabytes by 2011, and a...
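For context, a sustained 1 GB/s aggregate rate corresponds to moving roughly 86 TB per day:

```python
# Daily volume implied by the abstract's 1 GB/s aggregate WAN requirement.
rate_gb_s = 1.0
daily_tb = rate_gb_s * 86400 / 1000   # seconds per day, GB -> TB (decimal)
print(f"~{daily_tb:.0f} TB moved per day at a sustained 1 GB/s")
```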