UTA
- A pair of incidents with the campus chilled water supply caused disruptions. We were able to maintain storage access, but in the first incident we had to drop all of the computational load. In the second incident we lost about 1/4 of the computational load.
- Scaling up the internal K8s cluster. The previous K8s cluster will be merged into SWT2_CPB.
- Power balancing operations have started.
OU
- Last Wednesday was OSCER maintenance, during which they upgraded network switches. That apparently didn't go too well: on Friday afternoon the core network collapsed, reportedly with high CPU usage on the core switches and broadcast storms. It was fixed Saturday morning.
- Also saw some fraction of stage-in transfers fail with a strange IPv4 network error. Not clear whether that started around the same time or was already there at a low level before. A restart of xrootd (both the proxy on se1 and the backend storage) seems to have fixed it.
- The old OCHEP squid server stopped reporting to CERN monitoring. Cause not yet known; investigating.