Outstanding tickets
- 148234 RAL-LCG2 less urgent in progress 2020-08-12 10:38:00 RAL-LCG2 deletion errors
- Deletion failure rate on Echo around 10%; possibly just a load issue? Failed deletions do eventually complete
- 148228 UKI-SOUTHGRID-OX-HEP less urgent waiting for reply 2020-08-12 10:17:00 UKI-SOUTHGRID-OX-HEP transfer failures as destination
- 148169 UKI-SCOTGRID-ECDF less urgent in progress 2020-08-05 10:25:00 Failovers from jobs running at UKI-SCOTGRID-ECDF_CLOUD to CERN backup proxy
- 147979 UKI-NORTHGRID-MAN-HEP less urgent in progress 2020-08-04 09:28:00 UKI-NORTHGRID-MAN-HEP transfer timeout errors and also deletion errors
- 146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-08-10 10:23:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
- Mitigation still in place and working; a permanent solution is still being explored
- 146651 RAL-LCG2 urgent on hold 2020-08-10 10:59:00 singularity and user NS setup at RAL
- RAL is the last big site still to provide this; the gap is impacting containerised workflow jobs
- 146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-07-22 14:53:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- Some test jobs have got through, but issues remain
- 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-08-10 09:54:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- Ticket updated; access restrictions in place; working with the admins to get the relevant systems in place
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- Access to the data centre is now feasible. Need to consolidate pieces of kit delivered to various places and start preparing the new node.
CPU
Other new issues
Ongoing issues
- CentOS7 - Sussex
- Grand Unified queues
News round-table
Vip
- 896 threads added to the pool
- Noted lower efficiency; GR pointed out this may just be due to an increase in reco jobs
Dan
- AC issues, but more nodes should now be available
Peter
Sam
- XRootD: is ATLAS seeing similar streaming issues to LHCb?
- JW: we do see some errors in user jobs (using direct I/O)
- Recent case of a production job now running with direct I/O, with a similar issue
Gareth
- Noted regarding job efficiency:
- Special evgen jobs (?): some jobs may try to take two threads
- Reco jobs can lower efficiency (JW: more are running due to reprocessing campaigns)
- Performance improvements planned for Ceph / infrastructure / network bonding; ‘timescale’
- 1400 cores; starting to hit the GridFTP limits
JW
Patrick