Resulted in failed transfers and failed job uploads/downloads and, most annoyingly, in file loss (GGUS 683184)
Caused by race conditions between the https and root protocols
There should be a way to mitigate this, e.g.:
Poison the DNS entry for webdav.echo.stfc.ac.uk on the WNs (for LHCb jobs at least), so that it is also redirected to the local gateway
Introduce some locking/protection (e.g. do not execute a delete if it arrived more than N minutes ago)?
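The second mitigation (refusing to act on stale delete requests) could look roughly like the sketch below. It is illustrative only: the function name, the 10-minute default for N and the idea that each request carries a timestamp are all assumptions, not a description of how the gateway or XRootD handles deletes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold standing in for "N minutes"; the real value is undecided.
MAX_DELETE_AGE = timedelta(minutes=10)


def should_execute_delete(requested_at, now=None, max_age=MAX_DELETE_AGE):
    """Return True only if the delete request is recent enough to be trusted.

    requested_at: timezone-aware time at which the delete was issued (assumed
                  to be available to whatever component executes the delete).
    now:          injectable current time, mainly to make the check testable.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # A request older than max_age may have been overtaken by a later write
    # through the other protocol, so it is dropped rather than executed.
    return now - requested_at <= max_age


if __name__ == "__main__":
    issued = datetime.now(timezone.utc) - timedelta(minutes=30)
    print(should_execute_delete(issued))  # stale request: not executed
```

A dropped delete leaves a file behind that can be cleaned up later, whereas a late delete that overtakes a fresh upload loses data, so the check errs on the side of keeping the file.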
13:50
→
13:55
ALICE Operations Report
5m
Speaker:
Alexander Rogovskiy (Rutherford Appleton Laboratory)
13:55
→
14:00
LSST Operations Report
5m
Speakers:
Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
LSST jobs from the latest pipeline are failing.
We currently think this is because the job pulls in its instructions (QG), writes any changes out to Echo, then reads locally and complains that its changes are not there. Contacting the Middleware team about this.
Moving data to the 'correct' location on Echo: started on Friday, 220,000 files moved so far. Originals and copies are checksummed and, if they match, the old one is deleted.
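The copy, checksum, then delete procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses local files as a stand-in for Echo objects, SHA-256 as the checksum (the notes do not say which algorithm is used), and the hypothetical helper names `sha256_of` and `move_verified`.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src, dst):
    """Copy src to dst and delete src only if both checksums match.

    Returns True when the original was removed, False when the copy was
    rejected (the bad copy is removed and the original is kept).
    """
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        os.remove(dst)  # discard the mismatching copy, keep the original
        return False
    os.remove(src)  # copy verified, so the old file can go
    return True
```

Deleting the original only after the comparison means an interrupted or corrupted copy never leaves the data without at least one good replica.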
14:00
→
14:01
Tier-1 Projects
1m
14:15
→
14:25
Antares Upgrade
10m
New EOS nodes
Tape Robotics downtime
Speakers:
George Patargias, Thomas Byrne
14:25
→
14:35
XRootD Development
10m
Speakers:
Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
14:35
→
14:45
Utilizing GPUs
10m
Speakers:
Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
14:45
→
14:46
AOB
1m
14:46
→
14:55
Summary of Operational Status and Issues
9m
Speakers:
Brian Davies (Lancaster University (GB)), Darren Moore