New Rucio RSE RAL-LCG2-ECHO_TEST created for ATLAS functional tests (FT) of WebDAV transfers; it points to /atlas:test/.
The initial set of transfers looks OK.
Checksums from XRootD: by default, xrootd-ceph does not store the adler32 checksum as the XrdCks.adler32 xattr.
This may cause problems, for example in FTS transfers where the destination wants to verify the checksum.
Proposal: make storing this xattr the default behaviour.
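For spot checks, the value being verified is the standard Adler-32 checksum of the whole file. The minimal Python sketch below recomputes it independently, for instance to compare against what a transfer reports. It assumes only `zlib.adler32` from the standard library. The function name and chunk size are illustrative, and it does not reproduce how xrootd-ceph computes or stores XrdCks.adler32.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string.

    Illustrative helper (hypothetical name); reads the file in chunks so
    large files do not need to fit in memory.
    """
    value = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so chunked and one-shot results agree
            value = zlib.adler32(chunk, value)
    # zlib.adler32 returns an unsigned 32-bit int in Python 3; zero-pad to 8 hex digits
    return f"{value:08x}"
```

Zero-padded lowercase hex is the usual way adler32 values are written in ATLAS and Rucio tooling, so the output can be compared as a plain string.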
Oxford is almost ready to start initial tests with XCache; reminder that RAL will serve as the endpoint.
RAL will need to accept the Oxford certificate for the forwarding proxy, with read-only access to ATLAS data.
Are there continuing issues with the ARC-CEs? We see occasional drops in the number of jobs and changes in the number of idle jobs, which are not easily explained by what ATLAS observes.
SAM tests on the AAA proxies and manager are no longer failing. AAA was quiet over the weekend, so all SAM tests were green.
AAA has become busier since yesterday, and interestingly the maximum rate, which previously always appeared to be capped at ~1 GB/s, is now reaching ~1.5 GB/s. This may be a result of my change to the throttling last week, when I increased the limit from 200 to 400 concurrent connections on ceph-gw10 only. Puzzlingly, both gw10 and gw11 seem to have increased their maximum rate equally. I would like to experiment further with this setting, although I have two observations (perhaps related to each other) about making such changes:
1. Changes don't seem to take effect until some days after they are made.
2. I am not sure whether there is a particular order in which services should be restarted on the various redirectors.
Meanwhile, some of the WN tests are failing intermittently, and yesterday the failures were frequent enough (10% or more) to make the whole day fail site readiness. WN-xrootd-access is one of the failing tests, but not the only one: we are also seeing a couple of failures in WN-analysis and CONDOR-jobSubmit.
1. Low number of running jobs
a. Likely just a fairshare issue
b. Would like to have some documentation about this
2. Streaming issue for user jobs
a. Waiting for a fix to be developed
i. What is the timescale? (We are now in the second week of February.)
b. Testing a lower block size as a mitigation
i. Not yet clear whether it has helped; checks ongoing
3. User jobs failing because they cannot open files
a. Related to the GGUS ticket that was opened in December and closed in January
b. Edge case of what was fixed within the LHCb software in January
i. Run I simulation, which cannot be fixed this way (it uses xroot v3.x)
No complaints / running reasonably smoothly at RAL