ATLAS UK Cloud Support
Vidyo
Outstanding tickets
- 148946 UKI-LT2-QMUL less urgent in progress 2020-10-07 10:34:00 Failovers from jobs running at UKI-LT2-QMUL queue
- WNs available with IPV6
- 148908 UKI-NORTHGRID-LANCS-HEP less urgent waiting for reply 2020-10-07 16:56:00 UKI-NORTH-LANCS-HEP jobs failing due to “lost heartbeat”
- Downtime for improvements with shared FS done.
- ZFS failed files; checks ongoing.
- Current HC failures with root://fal-pygrid-30.lancs.ac.uk:1094//dpm/lancs.ac.uk/home/atlas/atlasdatadisk/rucio/data18_13TeV/96/e0/data18_13TeV.00349263.physics_Main.merge.AOD.f937_m1972._lb0150._0003.1
- JW to declare the HC file lost, to get HC passing again
- 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-10-07 18:15:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- SS apologies for absence; report via email:
- disk cleanup/deletions on the DPM are being handled locally for things on disk063, which seems to be weird dark data
- Pickup in failure rate overnight for CEPH:
- initially, it looks like putting the 40GB/s connection into the ceph cluster might have caused some load spikes. Later today I’m going to see what I can do to shape the traffic a bit here - it looks like write traffic is really the only thing seriously affected.
- 146651 RAL-LCG2 urgent on hold 2020-08-10 10:59:00 singularity and user NS setup at RAL
- Update requested from Grid Services team on timeline
- 146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-09-11 13:35:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- No update
- 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-08-10 09:54:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- No update
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- No update
CPU
- RAL
- Running above pledge; CMS problems released slots
- Northgrid
- LANCS issues, and recent drop for MAN
- London
- SouthGrid
- Scotgrid
- GLA Ceph related issues noted above
Other new issues
Ongoing issues
- CentOS7 - Sussex
- on hold
- Grand Unified queues
- on hold
News round-table
- Vip
- 26-27th possible downtime?
- To find a time to discuss Storageless tests and plans
- Dan
- NTR; asked for relevant info from ATLAS S&C week to be passed back to T2s.
- JW mentioned that the Data-carousel model is moving into production mode
- Matt
- Expecting more disks to arrive
- Peter
- Raised interest in Covid working arrangements at other sites
- Sam
- Sent apologies
- JW
- TPC-http tests reveal an issue with writing data in Pulls (RAL as Dest).