It would be good to get the token tests green for CMS. These SAM tests have been running for a while, and although they do not affect the 'site status' yet, it would be good to start working on this now that the Tape REST API is in place. You can see the tests here: https://cmssst.web.cern.ch/siteStatus/detail.html?site=T1_UK_RAL
The 'federation' SAM test for AAA machines has been failing since early Tuesday morning. This is a CMS-wide problem, affecting many European sites. It is being investigated on the CMS side.
Also for AAA, the EU collector has been turned off, and we have been warned that any machine with this config that attempts a restart will fail. The collector was turned off because its owner did not want to update the OS. Shoveler will replace it in time. The RAL-based AAA proxies already had this monitoring commented out. Jyothish and I removed the monitoring from the various redirectors under our control and committed the change in Aquilon.
Katy is attempting to test Shoveler further and validate it on behalf of CMS. Alessandra will also do some work on this for ATLAS, but perhaps later in the year. Jyothish already has a new VM which will be the 'production' Shoveler instance, but it has not been sending any monitoring information. We suspect the firewall is not open to this VM and have raised a ticket with DI requesting this. Once that is done, we hope to see data in the plots immediately.
We also need Shoveler to run on the WN gateways. This would add a line to the Xrootd config on each WN gateway. Katy to test on the CMS test WN and report back. Once it is working, request a roll-out on the batch farm.
The new AAA proxy machine (svc20) is now being monitored in Vande. It seems to show the same number of xrootd connections as the other machines but the throughput is higher. My assumption is this is expected due to it being a newer, better machine. Jyothish confirmed that the number of xrootd connections being the same is expected due to the round-robin assignment of requests.
Discussion this week at RAL in the #networking Slack channel over the IPv6 connectivity of the AAA machines. A ticket has been sent to DI.
Job performance variable again - further issues being investigated on the CMS side for particular campaigns with very low efficiency.
CMS job submission to use EL9 queue only - this seems to be mostly working. CMS think only EL9 is being used. However, at RAL we still see a few EL7 jobs - Jose pointed out that they are coming from one particular CMS machine. Katy has requested more information about this.
Transfer failures to Antares from both CERN and Echo since last Tuesday evening were caused by an upgrade issue and the xrootd version not being 'pinned'. The Antares team fixed this. CMS has not had a lot of tape activity this week so was not affected by the issue over garbage collection. Katy is still following up the handful of production transfers that have been failing for several weeks - CMS DM has a ticket and we are currently a bit confused about Rucio's behaviour.
Operational issues: