147390 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-06-09 10:48:00 Failovers from jobs running at UKI-SCOTGRID-GLASGOW_CEPH to CERN backup proxy
- Routing issue between the two sides of the DC; attempting some static routing, but physical access will be needed to finally resolve it
147361 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-06-08 07:43:00 Deletion errors at UKI-SCOTGRID-GLASGOW
- Check: if the lost files are not in our DPM DB, do they need to be removed on the ATLAS side?
- Tricky to delete multiple replicas; risk of deleting the whole object rather than just the replica on disk039.
- JW to ask the DDM Ops people.
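The cross-check discussed above can be sketched as a set difference between the local DPM DB contents and what ATLAS (Rucio) still has registered. This is a minimal illustration only; the file names are hypothetical and a real check would work from a DPM DB dump and a Rucio replica listing.

```python
# Sketch: compare a (hypothetical) DPM DB file list against the replicas
# ATLAS still has registered, to find entries that would need cleaning up
# on the ATLAS side rather than being deleted again locally.

def lost_on_atlas_side(dpm_files, atlas_replicas):
    """Replicas ATLAS still knows about but the DPM DB no longer has."""
    return sorted(set(atlas_replicas) - set(dpm_files))

# Illustrative inputs, not real datasets:
dpm_files = {"data18/file_a.root", "data18/file_b.root"}
atlas_replicas = {"data18/file_a.root", "data18/file_b.root", "data18/file_c.root"}

print(lost_on_atlas_side(dpm_files, atlas_replicas))
# -> ['data18/file_c.root']
```

Working from the difference in this direction avoids touching objects that still have valid local replicas, which is the risk noted for disk039.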
146918 UKI-SCOTGRID-ECDF less urgent in progress 2020-06-09 10:03:00 Failovers from jobs running at UKI-SCOTGRID-ECDF_CLOUD to CERN backup proxy
- Still with other pressing priorities
146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-06-07 16:11:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
- Problematic xrootd with DPM; the plan is still to upgrade to CentOS 7.
146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL
146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-06-06 23:57:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- Queue set to TEST, progress being made
145688 UKI-NORTHGRID-MAN-HEP less urgent on hold 2020-04-02 09:20:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP
145510 RAL-LCG2 urgent on hold 2020-05-13 13:07:00 RAL-LCG2: timeouts on stage-in/outs
- To close after the DirectIO comparisons
144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-06-09 07:59:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- As previously; the hardware needs to be changed.
142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- Updated; physical access required to manage migration
Pledge lines - only visible in 30day mode now
Other new issues
CentOS7 DPM Lancs
CentOS7 - Sussex
Glasgow Ceph storage
- Not bandwidth problems; unlikely to be within the Ceph cluster.
- Problem seems to be in the disk cache: if there is a problem fetching a file, does the cache store the truncated file, and hence get poisoned by the corrupt copies?
- Using xrootd 4.12, compiled.
- Perhaps try 4.11? (Could use the exact version that RAL uses.)
- Ceph itself appears more stable after configuration changes.
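The suspected cache-poisoning mode above can be illustrated with a simple guard: a cached copy should only be served if it has the full expected length, otherwise an interrupted fetch leaves a truncated file that keeps being served. This is a sketch of the idea only, not the actual xrootd cache logic; paths and sizes are illustrative.

```python
# Sketch of the suspected failure mode: an interrupted fetch leaves a
# truncated file in the disk cache, which then "poisons" later reads.
# A size check against the expected length would reject such copies.
import os
import tempfile

def cached_copy_is_valid(path, expected_size):
    """Accept a cached file only if it exists and has the full length."""
    return os.path.exists(path) and os.path.getsize(path) == expected_size

# Simulate an interrupted fetch: only a few bytes of a 1 MiB file arrive.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"partial con")
    truncated = f.name

print(cached_copy_is_valid(truncated, expected_size=1024 * 1024))  # False
os.unlink(truncated)
```

A checksum comparison would catch corruption that a pure size check misses, at the cost of reading the whole cached file back.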
Grand Unified queues
- Migration to CentOS 7 for several services in progress
- School closures continue to interrupt work as normal
- DPM 1.14 in testing; needed for TPC tests in production; contains Puppet and memory library updates (to avoid exhausting memory)
- Petr: RAL moved off RAL-FTS (on to CERN FTS) to gain the TPC capabilities
- TPC: xrootd transfers under test have checksum issues; too slow for the stress test. Could this be improved by checksumming close to the storage?
- Could the number of simultaneous connections also be reduced?
- Petr pushing to look at HTTP (which may be the eventual preferred protocol)
- Current issues are with the xrootd server, not the protocol
There are minutes attached to this event.