CTA Dev Meeting

Europe/Zurich
513/R-068 (CERN)

19
Michael Davis (CERN)
Videoconference: CTA Dev Meeting
Zoom Meeting ID: 68646510714
Description: CTA Dev Meeting
Host: Michael Davis
    • 2:00 PM - 2:10 PM
      CTA Release Workflow 10m
    • 2:10 PM - 2:20 PM
      CTA Release Roadmap 10m

      See CTA Release Roadmap

      Release 4.8.1-1

      • Milestones link
      • Release date: 12 Dec
      • Pre-prod deployment date: 12 Dec
      • Prod deployment date: 12 Dec
      • Fixed protobuf error introduced by the previous release v4.8.0-1/v5.8.0-1.

      Release 4.8.2-1

      • Milestones link
      • Release date: 14 Dec
      • Pre-prod deployment date: 15 Dec
      • Prod deployment date: -
      • Optimisations of catalogue DB queries

      Public Release

      • Latest version available on public repo: v4.7.14-1, v5.7.14-1
      • Release 4.8.2-1 was done in a separate branch. It needs to be merged back into main.
      • We don't plan any other tagged releases before Christmas.
      • Jacek and Michael discussed creating a merge request for issue #242 (cta-frontend-grpc: problem loading pem_root_certs) so that Michael can review it.
    • 2:20 PM - 2:30 PM
      CTA dev topics 10m

      Review scheduler retry logic for archive and retrieve

      • We decided to apply a "pause" between all retries. How this delay should be implemented is still to be discussed.
      • For details check #37

      Handle 'unavailable' files in user and repack retrieves originated from problematic tapes

      • How should unavailable files be handled during the repack retrieve workflow?
      • Options: use an is_unavailable flag, or reduce the number of repack retrieve retries to zero.
      • For details check #218

      Amend code convention: include headers should use the complete path from the project root

      • Use full paths from the project root vs relative file locations in #include directives.
      • For details check #249

      Allow VO override for repack

      • For details check #31

      REPACKING tape state and queue cleanup

      • Feedback on deploying the v4.8.1-1 fix to production (fix for the protobuf error, #ops-937).

      Several free drives STALE because of long global scheduler lock acquisition time

      stagerrm issues continued

      • For details check ops issue #ops-943
      • There are several other ongoing stagerrm issues, such as #151 and #152. We should have a unified approach.

      221213 Database intervention

      journalctl filling the disk and causing problems in CI

      "Needs discussion" topics

      "Dev issue needed" topics

      Review scheduler retry logic for archive and retrieve

      • Implementing a "try again after T seconds" mechanism is complex and requires reworking the current object store implementation.
      • In particular, we would need to create a new queue subtype to keep track of the requests that we want to retry later. This is a non-trivial task.
      • The new PostgreSQL scheduler will make this feature much easier to implement in the future (#147).
      • Therefore, we will not implement it yet.

      As a compromise, we will set the number of retries to 0 (zero) for repack requests, as discussed in the following topic.

      Handle 'unavailable' files in user and repack retrieves originated from problematic tapes

      • We discussed the two options presented in issue #218.
      • The two options are not mutually exclusive. However, option #2 (do not retry when repacking) is much simpler to implement and operate, while option #1 (manually disable some files on a problematic tape) is more complex and requires changing the catalogue.
      • Therefore, we will implement option #2, but will keep discussing with our external collaborators whether option #1 is also necessary.
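
      The intended effect of option #2 can be illustrated with a minimal Python sketch. It assumes a hypothetical RetrieveRequest type, a hypothetical max_retries() helper and an arbitrary default of 2 retries for user retrieves; none of these names or values come from the CTA scheduler code, and the "pause" between retries is not modelled. It only shows that a repack retrieve of an unreadable file fails on its first attempt instead of being retried.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed retry budget for user retrieves (illustrative value only).
DEFAULT_USER_RETRIEVE_RETRIES = 2


@dataclass
class RetrieveRequest:
    """Hypothetical, minimal stand-in for a queued retrieve request."""
    file_id: int
    is_repack: bool


def max_retries(request: RetrieveRequest) -> int:
    """Option #2: repack retrieves get no retries at all."""
    return 0 if request.is_repack else DEFAULT_USER_RETRIEVE_RETRIES


def run_retrieve(request: RetrieveRequest,
                 attempt: Callable[[RetrieveRequest], bool]) -> Tuple[bool, int]:
    """Call `attempt` until it succeeds or the retry budget is exhausted.
    Returns (succeeded, number_of_attempts)."""
    attempts = 0
    while True:
        attempts += 1
        if attempt(request):
            return True, attempts
        if attempts > max_retries(request):
            return False, attempts


if __name__ == "__main__":
    def always_fails(req: RetrieveRequest) -> bool:
        # Simulates a file that cannot be read from a problematic tape.
        return False

    print(run_retrieve(RetrieveRequest(file_id=1, is_repack=True), always_fails))   # (False, 1)
    print(run_retrieve(RetrieveRequest(file_id=2, is_repack=False), always_fails))  # (False, 3)
```

      With zero retries, a file on a problematic tape is reported as failed after a single attempt, so the repack does not keep cycling on it, while user retrieves keep their normal retry budget.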

      Amend code convention: include headers should use the complete path from the project root

      • It was decided to change all #include directives to use full paths from the project root.
      • Richard will handle it.

      REPACKING tape state and queue cleanup

      • Release 4.8.1-1 successfully fixed the protobuf bug introduced in 4.8.0-1, as the monitoring data shows.

      Several free drives STALE because of long global scheduler lock acquisition time

      • A free drive will only be marked as STALE if it has not updated its status in the past 4 hours (threshold increased from 10 minutes to 4 hours).
      • This change is only made on the client side (the backend does not compute the STALE state).
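
      The new client-side rule for free drives can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, it covers only drives in the free state (the rules for other states were not discussed), and treating an update exactly 4 hours old as not yet stale (strict comparison) is an assumption, since the exact boundary was not specified.

```python
from datetime import datetime, timedelta

# Previous threshold for marking a free drive as STALE.
OLD_FREE_DRIVE_STALE_THRESHOLD = timedelta(minutes=10)
# New threshold agreed in this meeting.
FREE_DRIVE_STALE_THRESHOLD = timedelta(hours=4)


def is_free_drive_stale(last_update: datetime, now: datetime,
                        threshold: timedelta = FREE_DRIVE_STALE_THRESHOLD) -> bool:
    """Hypothetical client-side check for a drive in the free state:
    it is STALE only if its last status update is older than `threshold`.
    The strict '>' comparison is an assumption."""
    return now - last_update > threshold


if __name__ == "__main__":
    now = datetime(2022, 12, 14, 14, 0)
    # A free drive that last reported 30 minutes ago, e.g. while it was
    # waiting for the global scheduler lock.
    last = now - timedelta(minutes=30)
    print(is_free_drive_stale(last, now, OLD_FREE_DRIVE_STALE_THRESHOLD))  # True
    print(is_free_drive_stale(last, now))  # False
```

      A free drive that is merely slow to report because of lock contention is no longer shown as STALE, while a drive that has been silent for several hours still is.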

      stagerrm issues continued

      • There are several stagerrm-related issues in both our operations and development pages.
      • We need to aggregate all of them and discuss a common approach.
      • To be discussed between Joao, Julien and Richard.

      Improvements in gitlab CI workflow

      • The CI stage cta_valgrind takes a long time, which delays merging commits into main.
      • Therefore, we will remove cta_valgrind from the list of mandatory CI stages (it will be kept as optional). It will still run as part of the scheduled CI tests.
      • The person tagging the release must check that the latest commit passes the Valgrind tests. This must be added to the release checklist!

       

      • Besides this, the file ReleaseNotes.md is a frequent source of rebase conflicts. We need a strategy to avoid these conflicts (for example, by clearly separating each person's changes into different files, or into different sections of the same file).
    • 2:30 PM - 2:40 PM
      CTA dev board review 10m

      Objective

      • Look at the active issues in our CTA dev board.
      • Decide if they should be kept, removed, reassigned, prioritised, etc.

      Review "In progress" issues

      • Full CTA board: link

      Review specific topic

      We did not cover this topic this week.

      It will be kept for a future dev meeting.

    • 2:40 PM - 2:50 PM
      AOB 10m

      Other

      • Room 513/R-068 is booked weekly for the CTA dev meeting until the end of the year.