ATLAS UK Cloud Support
Vidyo
Outstanding tickets
-
147390 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-06-09 10:48:00 Failovers from jobs running at UKI-SCOTGRID-GLASGOW_CEPH to CERN backup proxy
- Routing issue between sides of DC; attempting some static routing, but will need physical access to finally resolve
-
147361 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-06-08 07:43:00 Deletion errors at UKI-SCOTGRID-GLASGOW
- Check: if the lost files are not in our DPM DB, do they need to be removed on the ATLAS side?
- Tricky to delete multiple replicas; risk of deleting the whole object, not just the replica on disk039.
- JW to ask the DDM Ops people.
-
146918 UKI-SCOTGRID-ECDF less urgent in progress 2020-06-09 10:03:00 Failovers from jobs running at UKI-SCOTGRID-ECDF_CLOUD to CERN backup proxy
- Still with other pressing priorities
-
146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-06-07 16:11:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
- Problematic xroot with DPM; plan is still to upgrade to CentOS 7.
-
146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL
- In todo list
-
146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-06-06 23:57:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- Queue set to TEST, progress being made
-
145688 UKI-NORTHGRID-MAN-HEP less urgent on hold 2020-04-02 09:20:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP
- On hold
-
145510 RAL-LCG2 urgent on hold 2020-05-13 13:07:00 RAL-LCG2: timeouts on stage-in/outs
- To close -> DirectIO comparisons
-
144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-06-09 07:59:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- As previously; the HW needs to be changed.
-
142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- Updated; physical access required to manage migration
CPU
Pledge lines - only visible in 30-day mode now
Other new issues
Ongoing issues
-
CentOS7 DPM Lancs
- NTR
-
CentOS7 - Sussex
- As mentioned above
-
Glasgow Ceph storage
- Not bandwidth problems; those are unlikely in the Ceph cluster
- Problem seems to be in the disk cache. If there is a problem getting a file, will it store the truncated file? Is the cache then poisoned by the corrupt copies?
- Using xrootd 4.12, compiled.
- Perhaps try 4.11? (Can use the exact version that RAL uses.)
- Ceph itself appears more stable after configuration changes
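One way to test the cache-poisoning suspicion is to compare each cached copy against the size and checksum recorded in the catalogue. The following is a minimal sketch only: it assumes adler32 is the recorded checksum type, and the function names (`adler32_of_file`, `cached_copy_is_valid`) are hypothetical helpers, not part of xrootd or Ceph.

```python
import os
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the adler32 of a file as 8 lowercase hex digits, read in chunks."""
    value = 1  # adler32 starting value
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def cached_copy_is_valid(path, expected_size, expected_adler32):
    """Hypothetical check: reject a cached copy that is truncated or corrupt.

    expected_size and expected_adler32 are assumed to come from the
    catalogue entry for the file.
    """
    # A size mismatch catches truncation without reading the file.
    if os.path.getsize(path) != expected_size:
        return False
    return adler32_of_file(path) == expected_adler32.lower()
```

A cached file that fails this check could be evicted and fetched again rather than served to jobs.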
-
Grand Unified queues
- Awaiting SHEF
News round-table
-
Vip
- NTR
-
Dan
- Migration to CentOS7 for several services in progress
-
Matt
- NTR
-
Peter
- School closures continue to interrupt normal work
-
Alessandra
- DPM 1.14 in testing; needed for TPC tests in production; contains puppet and memory libraries (to avoid full mem)
- Petr, RAL off RAL-FTS (on to CERN), to have the TPC capabilities
-
Sam
- NTR
-
Gareth
- NTR
-
Tim
- TPC: transfers (xrootd) to test have checksum issues; checksumming is too slow for the stress test. Can it be improved by checksumming close to the storage?
- Can also reduce the number of simultaneous connections?
- Petr pushing to look at HTTP (may be the eventual preferred protocol)
- Current issues are with the xrootd server, not the protocol
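To illustrate what checksumming close to the storage could mean in practice, the sketch below computes the checksum while the data is being written, so no second read pass over the file is needed. It is an illustration only, not how the xrootd server does it: adler32 is assumed as the checksum type, and `ChecksummingWriter` and `copy_with_checksum` are hypothetical names.

```python
import io
import zlib


class ChecksummingWriter:
    """Hypothetical wrapper: updates a running adler32 as chunks are written."""

    def __init__(self, sink):
        self._sink = sink        # any object with a write(bytes) method
        self._adler = 1          # adler32 starting value
        self.bytes_written = 0

    def write(self, chunk):
        # Update the checksum from the chunk already in memory.
        self._adler = zlib.adler32(chunk, self._adler)
        self.bytes_written += len(chunk)
        return self._sink.write(chunk)

    def hexdigest(self):
        return format(self._adler & 0xFFFFFFFF, "08x")


def copy_with_checksum(source, sink, chunk_size=1024 * 1024):
    """Copy source to sink in chunks; return (adler32 hex, bytes copied)."""
    writer = ChecksummingWriter(sink)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        writer.write(chunk)
    return writer.hexdigest(), writer.bytes_written


if __name__ == "__main__":
    src = io.BytesIO(b"example payload" * 1000)
    dst = io.BytesIO()
    print(copy_with_checksum(src, dst))
```

The checksum is ready as soon as the last chunk lands, which is the saving compared with reading the whole file back after the transfer.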
-
JW
- NTR