Personnel update/correction:
- Wenjing came back from China before the full travel ban took effect.
Now working from home during the self-quarantine period.
Tickets:
- old / now solved ticket 144783 12-Jan-2020 AGLT2: lost heartbeat
- new / assigned ticket 144982 28-Jan-2020 AGLT2: lost heartbeat
Found and retired one worker node that was failing all jobs
with what looked like a file system problem, but this is probably not related to the lost heartbeat errors.
No other acute problem found.
Still suspect that most of these errors came from the global pilot problem active around that time.
Currently only 5% of failures come from lost heartbeat.
https://bigpanda.cern.ch/errors/?computingsite=AGLT2_UCORE&jobstatus=failed
Hardware:
- Last R740XD2 is online and in production for dCache.
Finishing migration and retirement of the oldest dCache disk shelves at MSU.
Services:
- xrootd.aglt2.org certificate SANs restored.