Hardware:
- No big changes, no major issues; typical maintenance.
- Progress continues on retiring older T2 storage at MSU and T3 storage at UM.
Services:
- One new GGUS ticket (144783) about jobs losing heartbeat. We verified this at the site.
The number of jobs losing heartbeat has been consistent at the site, about 100-200 jobs per day.
The symptoms appear similar to those seen at other sites (see MWT2 ticket 144756)
and have been tentatively tracked down to the pilot, with a fix recently put in place.
- Condor problem: on Jan 21st, starting around 4am, the number of running jobs in Condor began dropping, down to 20%.
We spent a few hours investigating; eventually, rebooting the Condor central server
and another Tier 3 submission machine solved the problem.
- Getting close to adding (restoring) the xrootd.aglt2.org SAN to the dCache doors' SSL certificate.
NOTE: Wenjing Wu is on vacation starting today through the next two weeks and will then be working from China for one week (use non-Gmail email to reach her: wuwj@ihep.ac.cn or wwu@cern.ch). She will be back on the 17th of February.