Review scheduler retry logic for archive and retrieve
- We decided to apply a minimum "time window" before canceling any job request.
- Until this threshold is reached, the request should not be canceled. Instead, if necessary, it should be delayed and retried. The exact details still need to be defined.
- The delay should be applied per tape file. We can take advantage of copies on different tapes to bypass delays on some tapes.
- TODO: Write document proposing new behaviour/approach.
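The retry behaviour above is still to be defined. Purely as a rough illustration of the three ideas agreed on (no cancellation before a minimum time window, delays applied per tape file, other tape copies used to bypass a delayed tape), here is a minimal Python sketch. All names (RetrieveRequest, on_failure, pick_copy, MIN_CANCEL_WINDOW_S, RETRY_DELAY_S) and both numeric values are hypothetical placeholders, not existing CTA code or agreed parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical minimum window (seconds) before a request may be canceled; real value undecided.
MIN_CANCEL_WINDOW_S = 4 * 3600
# Hypothetical per-tape-file retry delay (seconds); real value undecided.
RETRY_DELAY_S = 600


@dataclass
class RetrieveRequest:
    """Illustrative retrieve request for a file with one or more tape copies."""
    created_at: float
    copies: List[str]  # VIDs of the tapes holding a copy of the file
    # Earliest time each tape copy may be retried; a missing entry means "eligible now".
    not_before: Dict[str, float] = field(default_factory=dict)


def on_failure(req: RetrieveRequest, vid: str, now: float) -> str:
    """Decide what to do after a failed attempt on tape `vid`.

    Returns "cancel" only once the minimum time window has elapsed;
    otherwise only the failing tape copy is delayed and the request is kept.
    """
    if now - req.created_at >= MIN_CANCEL_WINDOW_S:
        return "cancel"
    # Delay this tape file only; copies on other tapes remain eligible.
    req.not_before[vid] = now + RETRY_DELAY_S
    return "retry"


def pick_copy(req: RetrieveRequest, now: float) -> Optional[str]:
    """Return the first tape copy that is not currently delayed, or None if all are."""
    for vid in req.copies:
        if req.not_before.get(vid, 0.0) <= now:
            return vid
    return None
```

For example, with copies on tapes "T1" and "T2", a failure on T1 at t=100 s delays only T1, so pick_copy immediately falls back to T2; the request is canceled only once MIN_CANCEL_WINDOW_S has elapsed. Whether a request should still be canceled after the window when another copy is available is one of the details left open above.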
Handle 'unavailable' files in user and repack retrieves originating from problematic tapes
- We decided to change our approach to this problem. Instead of using the IS_ACCESSIBLE column (which needs to be reverted in the git repo before any new release), we will simply remove all retry logic from repack retrieve requests. Operators can then quickly get a list of all tape files that failed to be retrieved (the files that remain on the tape after the repack) and manually issue a new repack, mount the tape on a different drive, or otherwise handle the tape as they see fit.
- Vlado will write a document on how the retry logic for repack (failed segments) should work, taking into account the discussion during this meeting.
- Catalogue commits are to be reverted from main and put back into a separate branch. The commit that adds IS_ACCESSIBLE should be removed from this branch.
Allow VO override for repack
- We will not discuss this for now. We will revisit this topic after New Year's Eve, once we are more familiar with operating the new REPACKING behaviours.
REPACKING tape state and queue cleanup - Wrong WARNING messages
- For now, operations will filter out these messages, since they do not indicate a real problem.
- They will be permanently removed (or have their priority reduced) in a future commit.
Several Free drives reported as STALE because of long global scheduler lock acquisition times
- The only thing to do on the dev side is to increase the STALL constant. The rest will be handled by operations.
r_alice_test_datachallenge archive queues not being absorbed
- Vova will create a dev issue and link it to the existing ops issue.