Review scheduler retry logic for archive and retrieve
- Implementing a "try again after T seconds" mechanism is complex and would require substantial changes to the current object store implementation.
- In particular, we would need to create a new queue subtype to keep track of the requests to be retried later, which is a non-trivial task.
- The new PostgreSQL scheduler will make this feature much easier to implement in the future (#147).
- Therefore, we will not implement it for now.
As a compromise, we will set the number of retries to 0 (zero) for repack requests, as discussed in the next topic.
Handle 'unavailable' files in user and repack retrieves originating from problematic tapes
- We discussed the two options presented in issue #218.
- The two options are not mutually exclusive. However, option #2 (do not retry when repacking) is much simpler to implement and operate, while option #1 (manually disable some files on a problematic tape) is more complex and requires changes to the catalogue.
- Therefore, we will implement option #2, but will keep discussing with our external collaborators whether option #1 is also necessary.
Amend code convention: header includes should use the full path from the project root
- It was decided to change all header includes to use the full path.
- Richard will handle it.
REPACKING tape state and queue cleanup
- Release 4.8.1-1 successfully fixed the protobuf bug introduced in 4.8.0-1, as confirmed by the monitoring data.
Several free drives marked STALE because of long global scheduler lock acquisition times
- We will only mark a free drive as STALE if it has not updated its status in the past 4 hours (increased from 10 minutes to 4 hours).
- This change is made only on the client side (the backend does not compute this status).
stagerrm issues (continued)
- There are several stagerrm-related issues in both our operations and development pages.
- We need to gather all of them and agree on a common approach.
- To be discussed between Joao, Julien and Richard.
Improvements in GitLab CI workflow
- The cta_valgrind CI stage has been taking a long time, which delays merging commits into main.
- Therefore, we will remove cta_valgrind from the list of mandatory CI stages (it will be kept as optional). It will still run as part of the scheduled CI tests.
- The person tagging a release must check that the latest commits pass the Valgrind tests. This must be added to the release checklist!
- In addition, the ReleaseNotes.md file is a constant source of rebase conflicts. We need a strategy to avoid these conflicts (for example, by keeping each person's entries in separate files, or in separate sections of the same file).