Operational issues:
- Upload failures on RAL (WNs) -> CERN channel
- Issues on WNs last Sunday
- Both job and transfer failures were seen;
- The most meaningful error message came from one of the jobs:
runtime/cgo: pthread_create failed: Resource temporarily unavailable
- The issue disappeared on Monday;
- Possibly a PID limit was exceeded; Tom would like to upgrade it to the newer version (after the break, obviously);
- Ticket GSTSM-277 has been opened.
- LHCb pilots are being killed at RAL due to excessive memory usage
- That looks suspicious since all types of jobs are suffering from it
- Could it be some wrong memory accounting?
- Some WNs have non-zero values in the CVMFS I/O error counter, which causes warnings in ETF tests.
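
A possible aid for following up on the pthread_create failure: on Linux, pthread_create returns EAGAIN ("Resource temporarily unavailable") when a thread or process limit is hit, so comparing the live thread count on a WN against the kernel-wide and per-user limits may help confirm or rule out the PID limit theory. The sketch below is only an illustration and not part of any existing RAL tooling; the helper names (read_int, headroom, count_threads, report) are hypothetical. It assumes a Linux /proc filesystem, and it does not look at cgroup limits (pids.max), whose path depends on how the batch system places jobs.

```python
import resource
from pathlib import Path


def read_int(path):
    """Return the integer stored in a procfs file, or None if unreadable or non-numeric."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def headroom(current, limit):
    """Return remaining slots, or None when either value is unknown or unlimited."""
    if current is None or limit is None:
        return None
    return limit - current


def count_threads():
    """Count all threads on the host by summing task entries under /proc."""
    total = 0
    try:
        for entry in Path("/proc").iterdir():
            if entry.name.isdigit():
                try:
                    total += sum(1 for _ in (entry / "task").iterdir())
                except OSError:
                    pass  # process exited while we were looking
    except OSError:
        return None  # no /proc available (not Linux)
    return total


def report():
    """Collect the thread count and the limits that can make pthread_create fail."""
    soft_nproc, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft_nproc == resource.RLIM_INFINITY:
        soft_nproc = None
    threads = count_threads()
    threads_max = read_int("/proc/sys/kernel/threads-max")
    pid_max = read_int("/proc/sys/kernel/pid_max")
    return {
        "threads_in_use": threads,
        "threads_max": threads_max,
        "pid_max": pid_max,
        "rlimit_nproc_soft": soft_nproc,  # per-user limit for the calling user
        "headroom_threads_max": headroom(threads, threads_max),
        "headroom_pid_max": headroom(threads, pid_max),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

A small or negative headroom at the time of the failures would support the limit theory. The per-user RLIMIT_NPROC value is only meaningful when the script runs as the same user as the failing jobs.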
Other:
- Writeable WN GW sandbox deployed to a few WNs on the preprod farm last week
- Did not work
- A few bugs found, to be fixed (in the new year)