Outstanding tickets
- 149842 UKI-SCOTGRID-ECDF less urgent assigned 2020-12-09 11:15:00 UKI-SCOTGRID-ECDF: Low transfer efficiency due to TRANSFER ERROR: Copy failed with mode 3rd pull, wi…
- DAVS/HTTPS transfers at ECDF: headnodes possibly overloaded, compared with other protocols (Sam's interpretation).
- Rob looking into this
- 149811 UKI-LT2-QMUL less urgent in progress 2020-12-09 16:16:00 Transfer and deletion errors from UKI-LT2-QMUL as dst site
- Storage back online; several systems need rebuilding for the compute nodes.
- ProxMox cluster taken down. The HP SSDs holding the journals have an uptime bug that bricks them after x hours. 2 out of 3 SSDs taken out.
- Positive comments made about ProxMox; it runs on Debian/Ubuntu.
- Downtime next week for power work
- 149750 UKI-SOUTHGRID-RALPP less urgent in progress 2020-12-09 11:50:00 UKI-SOUTHGRID-RALPP: unable to connect to host
- IPv4 problems reaching the site for FTS transfers via Rucio.
- Site will attempt a router reboot to fix.
- Also exposed a bug in Rucio in the default IP version used when none is specified in the RSE.
- RSE default looks to have been updated; transfers are now succeeding over IPv6.
- 149738 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-12-09 14:16:00 UKI-NORTHGRID-LANCS-HEP: deletion errors
- Two sets of files declared lost.
- Recovery of a remaining unique set is still being attempted; will stop by Monday.
- 149362 UKI-SOUTHGRID-RALPP urgent in progress 2020-12-04 10:14:00 ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
- No progress; however, may have some relation to IPv4/IPv6 differences; to be followed up.
- 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-11-27 10:00:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- Received file list from disk 40. Some might be recoverable, but unlikely.
- To be declared lost once cleaned from the namespace.
- JW: to create Jira ticket, and get the unique files.
- 146651 RAL-LCG2 urgent on hold 2020-10-16 11:56:00 singularity and user NS setup at RAL
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-11-05 10:52:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- ARC now working correctly. LDAP issue not yet started. More nodes being added, but network failures in the DC are still to be fixed.
- Final nodes need provisioning; aim to finish early next year.
CPU
RAL
- HC test failures (due to an updated ROOT version in one of the tests) caused sites to go into test. Recovery of lost slots is taking time.
Northgrid
- Lancs: misconfiguration of the submission directory on the NFS mounts; should now be fixed.
London
- QMUL issues (as reported above)
SouthGrid
- OX observed a similar HC dip to RAL.
Scotgrid
- Durham: problematic disk server over the weekend.
- Glasgow: some additional cores added; now running with 40 kHS06.
Other new issues
- Glasgow Site Avail/Rel
- ETF information appears to be correct, but its interpretation by the ATLAS Topology enrichment via VOFeed needs to be understood and updated.
Ongoing issues
- CentOS7 - Sussex
- TPC with http
- Storageless Site tests (Oxford)
- No progress; discussions ongoing on how to configure the ARC-CE queues.
- ECDF volatile storage
- Ticket updated; a number of config changes needed on the ATLAS side; JW to follow up.
- Glasgow DPM Decommissioning
- Still need LOCALGROUPDISK set up on Ceph. Discussion on pool naming vs endpoint naming.
News round-table
- Vip
- Dan
- Matt
- Peter
- Sam
- Gareth
- JW
- Patrick
AOB
- Future meetings to use the new CERN-hosted Zoom room, integrated into Indico.
- Next week (17th) is the last Cloud Support meeting of the year. Expect to restart on the 7th.