To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
NB: Reports were not received in advance of the meeting from:
dCache (gridka-dCache.fzk.de): software update of the dCache SE on 2008-07-16 from 07:00 to 09:00 (UTC).
LFC (lfc-2-fzk.gridka.de) and FTS (fts2-fzk.gridka.de): databases down for Oracle upgrade on 2008-07-24 from 07:30 to 11:30 (UTC).
Over the past weekend an incident affected connectivity to the INFN Tier-1. On Saturday July 5, 2008 at 03:29 (all times are local) a 10 Gigabit/s interface on one of the Tier-1 core switches started flapping. This interface is part of a bundle of four 10GE interfaces. Although the flapping should not in itself have caused much disturbance to the network infrastructure, it resulted in intermittent connectivity to various sets of computers across the Tier-1.
Troubleshooting started immediately, with system specialists looking for possible causes from the morning of Saturday July 5, 2008. The fault was not immediately obvious because no traces of the flapping were recorded in the log files of the core switch actually exhibiting the problem. On Sunday night an official EGEE broadcast message was issued.
On Monday July 7, 2008 the request to replace the faulty network card was escalated with the switch vendor to the highest possible priority. At 17:00 the faulty network card was replaced, and the network was operational again. As a side effect of the network problem, several systems had become stuck and had to be rebooted.
On Tuesday July 8, 2008 at 11:00, during certification of all INFN Tier-1 subsystems and services, further network problems were detected. At 12:30 log messages allowed the cause to be traced to a faulty core switch management card, which was causing random packet loss among other problems. Another ticket was opened with the switch vendor, and in the afternoon a replacement management card was received. The replacement of the card and a related operating system upgrade of the core switches finished at 22:00.
On the morning of Wednesday July 9, 2008 all network, storage and farm subsystems and services were checked and certified as ready for operation. Since the INFN Tier-1 was still in downtime, however, the decision was taken to replace, on Thursday July 10, 2008, the broken component of the electrical switch mentioned above.
Now all services (network, farming and storage) are up and running.