- Some reduction in production in the last 30 days.
- Two central outages:
- 1/14/24-1/16/24: A change at CERN caused BNL to fail, and sites drained until they were moved to the CERN FTS instance.
- 2/6/24: One of the two Harvester instances at CERN had a database issue. US sites using HTCondor-CE drained.
- This did not affect NET2 or the Kubernetes part of CPB.
- For the month of January the Illinois site at MWT2 was offline, reducing MWT2 production by about 1/3.
- From Jan 2-15 the site was down for the move to a new building.
- From Jan 16-22 (approximately) authentication was not working.
- From Jan 23-31 (approximately) systems were rebuilt as RHEL9 using a new Puppet setup.
- There were also various hardware and power balance issues.
- NET2 had a couple of interruptions to get their 400G uplink working.
- The good news is the 400G is in service and working well!
- OU_OSCER_ATLAS was generally stable and ran lots of opportunistic jobs.
- SWT2_CPB worked most of January to get their site up and running on AlmaLinux 9.
- Things stabilized on 2/3/24.
- CPB did not refill last week for one whole day after the harvester issue was fixed.
- The cause of the slow refilling is under investigation.
- Procurement Planning
- By the end of February, we need to come up with a list of extra network gear on which to spend $2-$4 million, split between the Tier sites.
- Procurement plans will likely be due by the end of March now that the equipment funding levels are known.
- Operations Planning
- Now that we are past the EL9 updates (except MSU), we need to plan for what we do going forward.
- Clearly storage tokens will need to be supported at all sites.
- Some sites need to update to OSG24/Condor24.
- All sites have all public-facing servers dual-stacked and supporting IPv6, except the CE at OU.
- AGLT2 and CPB still need to go to jumbo frames.