Gary Stiehr
(The Genome Center at Washington University)
06/05/2008, 14:00
Data centre management, availability, and reliability
Over the last couple of years, The Genome Center at Washington University in St. Louis has been involved with the planning and construction of a new data center. We will provide updates since our data center presentation at HEPiX Fall 2007 in St. Louis. In addition, we will share our experiences and lessons learned as we prepare to move into the new data center in May 2008.
Stefan Haller
(GSI)
06/05/2008, 14:30
Data centre management, availability, and reliability
Wim Heubers
(NIKHEF)
06/05/2008, 15:00
Data centre management, availability, and reliability
Extension of the NIKHEF/SARA data centre
Arne Wiebalck
(CERN)
06/05/2008, 16:00
Data centre management, availability, and reliability
CERN's AFS installation serves between 1 and 2 billion
accesses per day to its roughly 20,000 users. Keeping
track of the system's overall status and trying to find
problems before the users do is not a trivial task, especially
as the installation is growing in almost all aspects.
This talk will present CERN's AFS Console, a Lemon- and
web-based monitoring tool used by the AFS...
Tony Chan
(Brookhaven National Laboratory)
06/05/2008, 16:30
Data centre management, availability, and reliability
This presentation provides an update on the status of the new Data Center to support the
ATLAS Tier 1 Center and RHIC Computing at Brookhaven. A brief discussion provides
details of the new facility at Brookhaven, as well as timelines for its availability to both the
ATLAS and RHIC programs. Some of our experiences described in this presentation will
also be beneficial to other sites who are...
Tony Chan
(Brookhaven National Laboratory)
06/05/2008, 16:50
Data centre management, availability, and reliability
The RACF provides computing support to a broad spectrum of programs at
Brookhaven. The growth of the facility, the varying needs of the scientific
programs, and the necessity for distributed computing require the RACF to
change from a system-based to a service-based SLA with our end users. This
presentation describes the adjustments made by the RACF to transition to
a service-based SLA,...
Sebastian Lopienski
(CERN)
06/05/2008, 17:10
Data centre management, availability, and reliability
Managing large clusters that host complex services poses particular challenges. Operations such as checking configuration consistency, running actions on one or more nodes, and moving nodes between clusters are very frequent. When scaling up to thousands of CPU and storage nodes in order to meet LHC requirements, some of these challenges become more evident. These scaling challenges...
Eric Grancher
(CERN)
06/05/2008, 17:40
Data centre management, availability, and reliability
Problem tracking at CERN