Speaker
Mr
Martin Bly
(STFC/RAL)
Description
The GridPP Tier-1 Centre at RAL is one of 10 Tier-1 centres worldwide preparing for
the start of LHC data taking in late 2007. The RAL Tier-1 is expected to provide a
reliable grid-based computing service running thousands of simultaneous batch jobs
with access to a multi-petabyte CASTOR-managed disk storage pool and tape silo, and
will support the ATLAS, CMS and LHCb experiments as well as many other experiments
already taking or analysing data.
The RAL Tier-1 is already well advanced towards readiness for LHC data-taking. We
describe some of the reliability and performance issues encountered with
various generations of storage hardware in use at RAL and how the problems were
addressed.
We describe the networking challenges of shipping large volumes of data into
and out of the Tier-1 storage systems, and between systems within the Tier-1, and
the changes made to accommodate the expected data volumes.
We describe the scalability and reliability issues encountered with the
grid services and the various strategies used to minimise the impact of problems,
including increasing the number of service hosts, splitting services across several
hosts, and upgrading services to more resilient hardware.
Authors
Dr
Andrew Sansum
(STFC/RAL)
Mr
Martin Bly
(STFC/RAL)