CTA Dev Meeting
16:00 → 16:10
16:10 → 16:20
CTA Release Roadmap 10m
Release 4.8.0-1
- Milestones link
- Release date: 28 Nov
- Pre-prod deployment date: 30 Nov
- Prod deployment date: 5 Dec
Release 4.8.1-1 - DELAYED (see minutes)
- Milestones link
- Release date: TBD (already stress tested)
- Pre-prod deployment date: TBD
- Prod deployment date: TBD
- Both 4.8.1-1 and 4.8.2-1 will be released simultaneously
Release 4.8.2-1 - DELAYED (see minutes)
- Milestones link
- Release date: TBD
- Pre-prod deployment date: TBD
- Prod deployment date: TBD
- Catalogue v13 release
- Both 4.8.1-1 and 4.8.2-1 will be released simultaneously
Public Release
- Version available on public repo: v4.7.14-1, v5.7.14-1
- New versions will be released after we have used them internally in production.
- We have decided to change the approach to issue #218 (unavailable files). Instead of using the new IS_ACCESSIBLE column, we will simply not retry repack retrieve requests. We therefore need to revert some of the existing commits, and there is no urgency to release 4.8.2-1 (catalogue v13.0) for now.
16:20 → 16:30
CTA dev topics 10m
Review scheduler retry logic for archive and retrieve
- The retry logic should take the type of error into account when deciding whether, and for how long, the queue should sleep between retries.
- For details check #37
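As a starting point for the discussion, the idea of choosing the sleep time from the error type could be sketched as below. This is only an illustration, not the CTA scheduler code: the error categories, the sleep values and the function name are all assumptions.

```python
from enum import Enum, auto
from typing import Optional


class ErrorKind(Enum):
    # Hypothetical error categories, for illustration only.
    TRANSIENT = auto()     # e.g. a resource was briefly unavailable
    TAPE_PROBLEM = auto()  # e.g. the tape could not be mounted or read
    PERMANENT = auto()     # e.g. the request itself is invalid


# Assumed sleep times in seconds; None means "do not retry at all".
SLEEP_BY_ERROR = {
    ErrorKind.TRANSIENT: 60,
    ErrorKind.TAPE_PROBLEM: 3600,
    ErrorKind.PERMANENT: None,
}


def queue_sleep_seconds(error: ErrorKind) -> Optional[int]:
    """Return how long the queue should sleep before the next retry,
    or None if this kind of error should not be retried."""
    return SLEEP_BY_ERROR[error]
```

With a table like this, "if" and "how long" are answered in one lookup, and the values can be tuned without touching the retry loop itself.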
Handle 'unavailable' files in user and repack retrieves originating from problematic tapes
- When should we check for unavailable files during the retrieve workflow?
- For details check #218
Allow VO override for repack
- For details check #31
REPACKING tape state and queue cleanup
- Feedback from deploying version v4.8.0-1 in production.
Several Free drive STALE because of long global scheduler lock acquisition time
- For details check ops issue #ops-929
r_alice_test_datachallenge archives queues not being absorbed
- For details check ops issue #ops-918
"Needs discussion" topics
"Dev issue needed" topics
Review scheduler retry logic for archive and retrieve
- We decided to apply a minimum "time window" before canceling any job request.
- Until this threshold is reached, a request should not be cancelled. If necessary, it should be delayed or retried instead. The exact details still need to be defined.
- The delay should be applied per tape file. We can take advantage of different tape copies to bypass delays on some tapes.
- TODO: Write document proposing new behaviour/approach.
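To make the points above concrete ahead of the proposal document, here is a minimal sketch of a minimum time window combined with per-tape-file delays. It is not the agreed design: the 24-hour window, the class and function names and the example VIDs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed value: the minimum "time window" during which a request is never cancelled.
MIN_WINDOW_S = 24 * 3600


@dataclass
class RetrieveRequest:
    created_at: float
    # One entry per tape file (copy): tape VID -> earliest time it may be tried again.
    retry_after: Dict[str, float] = field(default_factory=dict)


def may_cancel(req: RetrieveRequest, now: float) -> bool:
    """A request may only be cancelled once the minimum window has elapsed."""
    return now - req.created_at >= MIN_WINDOW_S


def delay_copy(req: RetrieveRequest, vid: str, now: float, delay_s: float) -> None:
    """Delay a single tape file; other copies of the file are unaffected."""
    req.retry_after[vid] = now + delay_s


def next_copy(req: RetrieveRequest, now: float) -> Optional[str]:
    """Return the VID of a copy that is not currently delayed, or None if all are."""
    ready = [vid for vid, t in req.retry_after.items() if t <= now]
    return min(ready, key=lambda v: req.retry_after[v]) if ready else None
```

For example, with copies on two hypothetical tapes "V00001" and "V00002", delaying "V00001" after a failure still lets `next_copy` return "V00002" immediately, which is how a second tape copy bypasses the delay on the first.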
Handle 'unavailable' files in user and repack retrieves originating from problematic tapes
- We decided to change the approach to this problem. Instead of using the IS_ACCESSIBLE column (which needs to be reverted in the git repo before any new release), we will simply remove all retry logic from repack retrieve requests. This means that operators can quickly get a list of all tape files that failed to retrieve (the files that remain on the tape after the repack). They can then manually issue a new repack, mount the tape on a different drive, or handle the tape as they see fit.
- Vlado will write a document on how the retry logic should be done for repacking (failed segments), taking into account the discussion during this meeting.
- Catalogue commits are to be reverted from main and put back into a separate branch. The commit that adds IS_ACCESSIBLE should be removed from this branch.
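The "no retries for repack retrieves" behaviour can be illustrated with the sketch below. It is only a simplified model under assumed names: `repack_retrieve` and the `retrieve` callback are hypothetical and do not correspond to actual CTA functions.

```python
from typing import Callable, Iterable, List


def repack_retrieve(files: Iterable[str], retrieve: Callable[[str], None]) -> List[str]:
    """Try each tape file exactly once and return the ones that failed.

    `retrieve` is a hypothetical callback that raises an exception on failure.
    """
    failed: List[str] = []
    for tape_file in files:
        try:
            retrieve(tape_file)
        except Exception:
            # No retry: the file stays on the tape and is reported to the operator.
            failed.append(tape_file)
    return failed
```

The returned list corresponds to the files that remain on the tape after the repack, which operators can then handle manually.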
Allow VO override for repack
- We will not discuss this for now. We will revisit the topic after the New Year, once we are more familiar with operating the new REPACKING behaviours.
REPACKING tape state and queue cleanup - Wrong WARNING messages
- For now, operations will filter out these messages, since they are not a problem.
- They will be permanently removed (or have their priority reduced) in a future commit.
Several Free drive STALE because of long global scheduler lock acquisition time
- The only thing to do on the dev side is to increase the STALL constant. The rest will be handled by operations.
r_alice_test_datachallenge archives queues not being absorbed
- Vova will create a dev issue and link it to the existing ops issue.
16:30 → 16:40
CTA dev board review 10m
Objective
- Look at the active issues in our CTA dev board.
- Decide if they should be kept, removed, reassigned, prioritised, etc.
Review "In progress" issues
- Full CTA board: link
Review specific topic
- Every week we review the issues of one topic.
- This week: "Scheduler" and "Object Store" labels
16:40 → 16:50
AOB 10m
Other
- Room 513/R-068 is booked every week, until the EOY, for the CTA dev meeting.