Katy was at CHEP for the last two meetings.

Echo had problems from Friday until yesterday. These were originally thought to be related to the reweighting of new disk hardware, and were then also blamed on the vRead change hitting Echo with more requests than normal; the number of IOPS was too high. SAM tests were red on Friday and Saturday. Katy put CMS into drain as jobs were failing at a high rate (many more stage-out errors). Transfers were also failing. On Sunday the tests were green as the load had been removed - Katy put CMS back into production.

On Monday and Tuesday SAM tests failed again and CMS went back into drain automatically. On Tuesday afternoon the WN-xrootd-access test (accessing Echo) continued to fail, while all other tests were green after the vRead changes were removed. The xrootd-access test files themselves were accessible. The xrootd-access tests started passing again about 5 hours after the other tests went green. This delay in tests passing after the end of an incident has been observed several times before. Suspicion that this is related to the AAA redirector being blacklisted for too long - a known issue?

Batch farm upgrades have been ongoing for the last week and a half, with several half-batch farm drains. CMS are currently (still) capped at 8k cores due to the suspected pressure on the network in recent weeks. The cap should be lifted when we move LHCONE off Janet.

To Do: test Tape REST API