06/24/2022

7 nodes became blackhole nodes because of a CVMFS issue; this was later traced to one of the squid servers.

06/29/2022

One of the SLATE squid servers, sl-um-es5, stopped working because of both an iptables issue and a full /var partition. This caused intermittent CVMFS issues, and we received 2 GGUS tickets about it.

06/30/2022

Starting 06/28, the SAM test jobs stopped running. This began after the SAM test team made changes on ETF (modifying the leave_in_queue conditions). After a couple of days of debugging we could not find an obvious cause. Eventually we restarted the condor-ce services on both ATLAS gatekeepers. This got the SAM test jobs running again, but it also removed all running jobs on the gatekeepers, about 4000 jobs in total.

07/06/2022

Upgraded dCache from 7.2.16 to 7.2.19 (with a reboot into the new kernel).
All WNs are updated and ready to reboot into the new kernel.
Starting a rolling drain and reboot of the WNs in batches.

All R6525 servers (AMD Milan 7413) from the January 2022 order have shipped.
A fraction have already been received.