ATLAS UK Cloud Support
Vidyo
Outstanding tickets
-
148589 UKI-LT2-UCL-HEP less urgent in progress 2020-09-09 10:50:00 Failovers from UKI-LT2-UCL-HEP to CERN backup proxy
- In progress; a local user is causing the issues. Squid not set to be monitored in GOCDB.
-
148578 UKI-NORTHGRID-LANCS-HEP urgent in progress 2020-09-09 15:04:00 cannot download files from UKI-NORTHGRID-LANCS-HEP_LOCALGROUPDISK
- Some files lost, others recovered on LOCALGROUPDISK
- ZFS needs time to complete the list.
- JW - to delete the 7 LGD files.
-
148544 UKI-SCOTGRID-ECDF less urgent in progress 2020-09-07 13:09:00 UKI-SCOTGRID-ECDF failed jobs
- Possible checksum timeouts for large files
- Upgrades in progress
-
148401 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-09-09 19:20:00 UKI-NORTHGRID-LANCS-HEP: globus_ftp_client failures
- Initial filelist declared lost
- Diskserver finally failed the disk, preparing list of lost files.
-
148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-09-04 05:40:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- Work on capacity in the ceph pool in progress
-
146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-09-05 18:57:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
- Waiting for upgrades to finish
-
146651 RAL-LCG2 urgent on hold 2020-08-10 10:59:00 singularity and user NS setup at RAL
- on hold
-
146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-07-22 14:53:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- on hold
-
144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-08-10 09:54:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- on hold
-
142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- on hold
CPU
-
RAL
- Echo downtime for network firmware upgrades. A manual downtime needed to be set.
-
Northgrid
- Issues with Lancs (as described above)
-
London
- QMUL in downtime for lustre updates; will extend into next week.
-
SouthGrid
-
Scotgrid
Other new issues
- Active http TPC endpoints
- LANCS upgraded; works fine
- ECDF upgraded, but some certificate issues remain.
- Durham (with macaroons): upload and download only, not TPC; not in functional tests.
Ongoing issues
- CentOS7 - Sussex
- No update
- Grand Unified queues
- Awaiting SHEF
News round-table
-
Vip
- Paul to send Vip instructions for DPM upgrades.
- Some manual changes needed for TPC.
- Still working on residual fallout from previous DC power issues
-
Dan
- In Downtime for upgrades; hardware done
- One difficult ATLAS directory (many files to verify checksums); downtime to next week to finish migration.
- Additionally, AC situation will be improved by next week.
- Change to mountpoints needed; to confirm via email.
-
Matt
- Working on the Storage issues
- More jobs running from other VOs.
-
Peter
- ATLAS will switch to python3 from cvmfs; should be transparent.
-
Alessandra
- NTR
-
Sam
- ATLAS to move to 40G Ceph
- Starting to look at XRootD 5 for some new features.
-
JW
- Work on TPC with HTTP for CEPH ongoing.
AOB