● Outstanding tickets
- GGUS #145614, #145931: the Manchester headnode memory issue may have resurfaced.
- GGUS #145688: Alessandra is discussing with the RAL expert (Jose).
- GGUS #145804: Matt is investigating. Multi-core job submission to CREAM is failing.
- GGUS #145610: Glasgow Ceph test disk is working again, so Sam will close the ticket.
- GGUS #145510: James is working on timeouts accessing RAL Echo from WN jobs. For stage-in, he is looking at transfer times. For stage-out, the new Rucio version 1.21.9 was expected to fix the problem, but it did not. The stage-out issue is not specific to RAL: Rod sees similar error rates from other sites.
● CPU
- Lancaster saw a drop in submissions last Friday, which has since been fixed. It may have coincided with the apfmon failure. Peter reported that apfmon is being updated.
- RHUL running mostly single-core jobs. Looking to install HTCondor defrag package.
- Durham has been down for various interventions. Sam expects it to ramp up now.
- James reported that Stewart has looked at the CPU pledges. Since the pledge period runs Apr-Mar, we need to compare current ATLAS reporting against the 2019 pledge. The pledge lines match REBUS 2019.
● CentOS7 - Sussex
Dan discussed this with Patrick a couple of days ago. The WN kernels should now be up to date. He should be ready to accept ATLAS jobs again, but the site is not yet in HammerCloud. He should email atlas-support-cloud-uk@cern.ch to be enabled again.
● Glasgow Ceph storage
Sam will upgrade to the Ceph Nautilus release. He can then check the stage-in and stage-out errors. Sam commented that the stage-out errors may not be the same as those experienced at other sites (see GGUS #145510 above).
● Grand Unified queues
No news.
● News round-table
- Dan sees a lot of job failures from DaviX. That should be the secondary protocol.
- Elena is investigating problems with Pilots.
- James: NETR
- Matt: NETR; jobs flowing.
- Peter: will sort out the Lancaster problem ASAP.
- Sam: NETR
- Tim: NETR
- Vip: NETR
● AOB
Peter requested that future reminders for this meeting be sent earlier. James agreed to remind on Tuesday.
James asked about site plans concerning quarantine for Coronavirus.
Matt said that working from home is OK for many sites.