AAA machines somehow lost ability to authenticate with certificates. It is unclear why this happened. Jyothish fixed it today. Certificate SAM tests failed for a couple of days, while token tests remained green.
Problems with Echo gateways since last Wednesday and over the weekend. Particular gateways were seen to be failing and timing out. These gateways were removed to mitigate the problem. SAM tests had a red overall status on Wednesday, Friday and Saturday.
CMS went into production drain...however we kept our slots due to 'Tier 0' jobs that are (still) not respecting the site status. In this case everything was fine - performance of the jobs was excellent.
I believe the IPv6 inaccessibility problem with AAA was fixed by DI. This also affected other machines not using LHCONE or LHCOPN.
Seeing a 'glitch' most days in SAM tests, affecting CE tests and sometimes others as well. Possible network disconnection? Error is:
"Job completed but failed to get job output"