SAM test issues:
- Timeout failures on svc20 (AAA server) on Friday - Jyothish removed it from the cluster. Telegraf and Icinga were also down. Jyothish has a ticket open with Fabric.
- Network problems on Saturday
- After the network problems on Saturday, the other AAA servers and the manager have failed the 'federation' test fairly consistently. Restarts of the usual services by Katy and Jyothish have not fixed it.
- ARC-CE xrootd-access test requires AAA. This has failed intermittently due to the AAA 'federation' failures above. Fortunately not every CE is failing the test simultaneously, so we do not get a red mark in the summary.
- New tokens tests for CEs are generally working, but the 'basic' test is in warning due to jobs almost entirely landing on 2018/9 WNs which do not have IPv6 (Tom Birkett might comment).
- 'Connection' test for Antares endpoints in warning due to no IPv6 - how are the tests for the new EOS nodes going?
Job efficiency dropped sharply during the network issue on Saturday.
Suspect CMS is running empty pilots again - I am seeing major monitoring discrepancies (as of Tuesday night). Have messaged the Submission Infrastructure team.