CERN tier-0 Site Report - 26 March 2007
=======================================

Castor
------

* Performance problems in the Atlas Stager DB were caused by a lack of
  physical memory: the machine was swapping (as shown in the Lemon
  graphs). We added 1 GB of memory and restarted the instance.
  Performance has increased significantly while the load on the machine
  remains quite low.
* Database corruption episodes on the Atlas Stager and LHCb DLF
  databases are fixed or being fixed.
* Over 150 CASTOR disk servers have been drained, reinstalled and
  reconfigured as part of a configuration "ballet", to allow for the
  interventions to fix the microcode on Western Digital disks.
* As announced earlier, we plan to move the Castor nameserver database
  to new hardware on April 2. At the same time, three switches hosting
  various Grid services will be replaced. As the Castor nameserver is
  shared between all Castor services, the Castor service will be
  unavailable during this upgrade. Lxbatch will therefore be paused or
  drained. The intervention will start at 08:00, and services should
  resume by 10:30.

Tape
----

* A series of problems accumulated to cause long wait times (up to
  2 days) for users of the T10K robot.
* A hardware failure of the SL8500 513 robot over the weekend meant that
  no tape mounts in T10KR1 were possible from Sunday afternoon to Monday
  morning. Sun is investigating the root cause, but no explanation has
  been given yet.
* IBM robot stability has improved following the move to a
  Squid-protected web server, but a problem was seen soon after SNMP
  polling was turned on. We will run for a few more days with it turned
  off again to check that the problem is related to the SNMP server, and
  then report our findings to IBM.

Site Grid Services
------------------

* CE101/CE102 reinstalled and back in production.
* CE105 retired.
* Further plans:
  - We'll retire batch-HW-based CEs one by one, i.e. schedule a downtime
    until they are drained, and then reinstall them as batch worker
    nodes (as done for CE105).

Core Grid Services
------------------

* Middleware upgrade (update 18) done on all the gLite WMS 3.0.
* All the old gdrbxx nodes (LCG RBs) to be removed from production
  definitively next Friday 22 March 2007.

Physics Database Services
-------------------------

* Together with CS, we are investigating interference between the new
  firewall set-up and the database replication to the Tier1s.
* A 3D DBA workshop was held at SARA, Amsterdam, on 20th-21st March and
  was attended by experiment and site DBAs. The main focus of the
  workshop was the 3D deployment in production mode, foreseen for
  April 2007.