CTA Dev Meeting
CTA Release Roadmap
Release 4.8.1-1
- Milestones link
- Release date: 12 Dec
- Pre-prod deployment date: 12 Dec
- Prod deployment date: 12 Dec
- Fixed protobuf error introduced by the previous release v4.8.0-1/v5.8.0-1.
Release 4.8.2-1
- Milestones link
- Release date: 14 Dec
- Pre-prod deployment date: 15 Dec
- Prod deployment date: -
- Optimisations on catalogue DB queries
Public Release
- Latest version available on public repo: v4.7.14-1, v5.7.14-1
- Release 4.8.2-1 was done in a separate branch. It needs to be merged back into main.
- We don't plan any other tagged releases before Christmas.
- Jacek and Michael discussed creating a merge request for issue #242 (cta-frontend-grpc - problem with loading pem_root_certs), so that Michael can review it.
CTA dev topics
Review scheduler retry logic for archive and retrieve
- We decided to apply a "pause" between all retries. How to implement this delay is still to be discussed.
- For details check #37
Handle 'unavailable' files in user and repack retrieves originated from problematic tapes
- How should unavailable files be handled during the repack retrieve workflow?
- Use is_unavailable flag vs reducing repack retrieve retries to zero.
- For details check #218
Amend code convention: include headers should use the complete path from the project root
- Use full path vs relative file locations in header files.
- For details check #249
Allow VO override for repack
- For details check #31
REPACKING tape state and queue cleanup
- Feedback of deploying v4.8.1-1 fix on production (fixed protobuf error - #ops-937).
Several Free drive STALE because of long global scheduler lock acquisition time
- For details check ops issue #ops-929
stagerrm issues continued
- For details check ops issue #ops-943
- There are several other stagerrm issues ongoing, such as #152 and #151. We should have a unified approach.
221213 Database intervention
- For details check ops issue #ops-948
journalctl filling disk causing problems in CI
- For details check ops issue #ops-956
"Needs discussion" topics
"Dev issue needed" topics
Review scheduler retry logic for archive and retrieve
- Implementing a "try again after T seconds" mechanism is complex and requires playing with the current implementation of the object store.
- In particular, we would need to create a new queue subtype to keep track of the requests that we want to retry later. This is a non-trivial task.
- The new PostgreSQL scheduler will make it much easier to implement this feature in the future (#147).
- Therefore, we will not implement this yet.
- As a compromise, we will set the number of retries to 0 (zero) for repack requests, as discussed in the following topic.
Handle 'unavailable' files in user and repack retrieves originated from problematic tapes
- We discussed the two options presented in issue #218.
- The two options are not mutually exclusive. However, option #2 (do not retry when repacking) is much simpler to implement and operate, while option #1 (manually disable some files on a problematic tape) is more complex and requires changing the catalogue.
- Therefore, we will implement option #2, but will keep discussing with our external collaborators if option #1 is also necessary.
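A minimal sketch of option #2, for illustration only. It is not the CTA scheduler code: the names RetrieveRequest, max_retries and should_retry, and the default of 2 retries for user requests, are assumptions made up for this example.

```python
from dataclasses import dataclass

# Assumed placeholder: the real default for user retrieves is not stated in these minutes.
DEFAULT_MAX_RETRIES = 2


@dataclass(frozen=True)
class RetrieveRequest:
    """Hypothetical stand-in for a retrieve request."""
    is_repack: bool
    failures: int = 0  # number of failed attempts so far


def max_retries(request: RetrieveRequest) -> int:
    # Option #2: repack retrieves are never retried.
    return 0 if request.is_repack else DEFAULT_MAX_RETRIES


def should_retry(request: RetrieveRequest) -> bool:
    # Meant to be called after a failure, so request.failures >= 1.
    # Total attempts allowed = 1 initial attempt + max_retries.
    return request.failures <= max_retries(request)


if __name__ == "__main__":
    print(should_retry(RetrieveRequest(is_repack=True, failures=1)))
    print(should_retry(RetrieveRequest(is_repack=False, failures=1)))
```

With this policy a repack retrieve that fails once is reported as failed immediately, while user retrieves keep their existing retry budget.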
Amend code convention: include headers should use the complete path from the project root
- It was decided that we will change all the headers to full path.
- Richard will handle it.
REPACKING tape state and queue cleanup
- Release 4.8.1-1 successfully fixed the protobuf bug introduced in 4.8.0-1, as the monitoring data shows.
Several Free drive STALE because of long global scheduler lock acquisition time
- A free drive will only be marked as STALE if it has not updated its status in the past 4 hours (threshold increased from 10 minutes to 4 hours).
- This change is made only on the client side (the backend does not calculate this).
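The client-side check could look roughly like the sketch below. This is an assumption-laden illustration, not the actual client code: the function name is_stale, the status string "FREE" and the way timestamps are passed in are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold agreed in the meeting: 4 hours (previously 10 minutes).
STALE_THRESHOLD = timedelta(hours=4)


def is_stale(drive_status: str,
             last_update: datetime,
             now: datetime,
             threshold: timedelta = STALE_THRESHOLD) -> bool:
    """Hypothetical check: only free drives with an old status update are STALE."""
    # "FREE" is an assumed status label, used here only for illustration.
    return drive_status == "FREE" and (now - last_update) > threshold


if __name__ == "__main__":
    now = datetime(2022, 12, 16, 12, 0, tzinfo=timezone.utc)
    print(is_stale("FREE", now - timedelta(minutes=30), now))
    print(is_stale("FREE", now - timedelta(hours=5), now))
```

Keeping the threshold as a parameter makes it easy to compare the old 10-minute behaviour with the new 4-hour one in tests.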
stagerrm issues continued
- There are several stagerrm-related issues in both our operations and development pages.
- We need to aggregate all of them and discuss a common approach.
- To be discussed between Joao, Julien and Richard.
Improvements in gitlab CI workflow
- The CI stage cta_valgrind has been taking a long time, which increases the time it takes to merge a commit into main.
- Therefore, we will remove cta_valgrind from the list of mandatory CI stages (will be kept as optional). It will still be done as part of the scheduled CI tests.
- The person tagging the release must check that the latest commits pass the Valgrind tests. This must be added to the release checklist.
- In addition, the file ReleaseNotes.md is a constant source of rebase conflicts. We need to think of a strategy to avoid these conflicts (for example, by clearly separating each person's commits into different files, or into different segments of the same file).
CTA dev board review
Objective
- Look at the active issues in our CTA dev board.
- Decide if they should be kept, removed, reassigned, prioritised, etc.
Review "In progress" issues
- Full CTA board: link
Review specific topic
- Every week we review the issues of one topic.
- This week: "Scheduler" and "Object Store" labels
AOB
Other
- Room 513/R-068 is booked every week, until the EOY, for the CTA dev meeting.