ATLAS UK Cloud Support
Europe/London
Vidyo
Outstanding tickets
- 148719 UKI-LT2-IC-HEP less urgent in progress 2020-09-22 19:43:00 Failovers from UKI-LT2-IC-HEP to CERN CVMFS backup proxy
- Active discussion on ticket
- 148401 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-09-22 12:47:00 UKI-NORTHGRID-LANCS-HEP: globus_ftp_client failures
- Files declared lost (again, with typo fixed); few residual files to be investigated once Matt is back.
- 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-09-15 12:59:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- Internal deletions complete; Sam to update ticket
- 146651 RAL-LCG2 urgent on hold 2020-08-10 10:59:00 singularity and user NS setup at RAL
- On hold
- 146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-09-11 13:35:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- On hold
- 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-08-10 09:54:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- On hold
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- No update
CPU
- A number of ATLAS / CERN issues are affecting sites.
-
New Pilot version misreporting corecount:
- Affected scaling values used for accounting (e.g. wallclock and slots used)
- Jobs killed incorrectly because of miscalculated mem-limit values.
- Fix deployed yesterday and now rolling out
-
Update to VOMS server yesterday introduced issues:
- The upgraded VOMS server issued a VOMS extension that could not be validated by existing (and supported) VOMS C/C++ libraries.
- The problem was observed on XRootD since XRootD links against VOMS libraries, but any C/C++ software linking against the VOMS library would be affected (e.g., the StoRM frontend server).
- Change has been rolled back, but ATLAS may still see some lingering effects?
-
Harvester_Central_B stopped submitting jobs this morning - under investigation
-
All StoRM sites blacklisted since VOMS incident + pilot update (may be related to the VOMS issue?):
- “pilot, 1324: Service not available at the moment”
-
RAL
- Small drop in jobs due to pilot problems; now slowly claiming back jobs from other VOs
- Not seemingly affected by other issues.
-
Northgrid
- All jobs dropped off.
-
London
- All jobs dropped off.
- QMUL briefly back up to 20kHS06 before new issues arose
-
SouthGrid
- Most sites gone; BHAM not affected
-
Scotgrid
- Most sites gone; ECDF not affected
Other new issues
- GLASGOW:
- CEPH_DATADISK no longer in TEST (set to DATADISK in AGIS)
- DPM DATADISK now set as test
- PQ set offline for DPM queues
- QMUL:
- Space reporting now ok
- Additional space for ATLAS (with some further space coming)
Ongoing issues
-
CentOS7 - Sussex
- No update
-
TPC with http
- No update
News round-table
-
Dan
-
1/2 PB further to add for ATLAS
- ATLAS to propose spacetoken split
-
Peter
- Learning arc-ce
-
Sam
-
Reported on discussion in Storage mtg. on future planning,
- e.g. moving to Storageless sites (even if storage not initially decommissioned):
-
To hold off final commissioning until VOMS / related issues are resolved.
-
Gareth
- Noted general problems due to the VOMS issues
-
JW
- NTR
AOB
- Move to Zoom?
- No strong preference in either direction;
- Noted that additional (organisation) overhead on Host may be the deciding factor.