Ddev

Europe/Zurich
2/R-014 (CERN)

Alexandre Franck Boyer (CERN)
Zoom Meeting ID: 62504856418
Host: Federico Stagni

# DIRAC Development Meeting (Ddev)

**At CERN**: Yan, Federico, Christophe, Juraj, Benedikt
**On Zoom**: Loris, Heloise, Bertrand, Ryun, Hideki, Simon, Xiaomei, Daniela, Mazen, Jorge, Andrei, Alexandre, Christopher
**Apologies**: 

## Product Goals & Roadmaps

- Transition to DiracX:

```mermaid
flowchart LR
    subgraph CWL["CWL"]
        CWL1("CWL submission endpoint"):::inprogress
        CWL2("CWL production system")
        CWL3("Transformation system machinery"):::blocked
        CWL4("Use CWL natively in new matcher"):::blocked
    end

    subgraph Core["Core"]
        CoreTasks("Tasks"):::inprogress
        Core2("RSS"):::inprogress
        Core3("DMS")
    end

    subgraph WMS["WMS"]
        WMS1("Matcher"):::inprogress
        WMS2("Pilot authentication"):::inprogress
        WMS3("Pilot submission"):::blocked
    end

    CWL3 --> CWL4
    CoreTasks --> Core2 --> Core3
    CoreTasks --> WMS1
    CoreTasks --> CWL3
    WMS1 --> CWL4
    CoreTasks --> WMS3

    click CoreTasks "https://www.github.com" "This is a tooltip for a link"

    classDef done fill:#B2DFDB,stroke:#00897B,color:black,stroke-width:2px;
    classDef inprogress fill:#FFF9C4,stroke:#F9A825,color:black,stroke-width:2px;
    classDef blocked fill:#BBBBBB,stroke:#222222,color:black,stroke-width:2px;

    subgraph Legend
        L2("Completed"):::done
        L4("In progress"):::inprogress
        L1("Ready for work")
        L3("Blocked"):::blocked
    end
```

- CWL integration:

```mermaid
flowchart LR
    subgraph dirac_cwl["dirac-cwl"]
        job1("Prototype Job Endpoint"):::done
        transformation("Prototype Transformation Endpoint"):::inprogress
        workflows("Workflows"):::inprogress
        prod("Prototype Production Endpoint"):::inprogress
    end

    subgraph DiracX1["DiracX"]
        prod_diracx("Implement the CWL Production System")
        trans_diracx("Implement the CWL Transformation endpoint")
        trans_diracx_original("Implement the Transformation System"):::blocked
        diracx_tasks("Implement DiracX Tasks"):::blocked
        job_diracx("Implement the CWL Job Endpoint"):::inprogress
    end

    diracx_tasks --> trans_diracx_original
    trans_diracx_original --> trans_diracx
    transformation --> trans_diracx
    job1 --> workflows
    prod --> prod_diracx
    prod_diracx -.-> deliver2(["Can submit productions to DiracX /productions"]):::milestone
    trans_diracx -.-> deliver3(["Can submit transformations to DiracX /transformations"]):::milestone
    job_diracx -.-> deliver5(["Can submit jobs to DiracX /jobs"]):::milestone

    classDef done fill:#B2DFDB,stroke:#00897B,color:black,stroke-width:2px;
    classDef inprogress fill:#FFF9C4,stroke:#F9A825,color:black,stroke-width:2px;
    classDef blocked fill:#BBBBBB,stroke:#222222,color:black,stroke-width:2px;
    classDef milestone fill:#FFDFE5,stroke:#FF5978,color:#8E2236,stroke-width:2px;

    subgraph Legend
        L1("Completed"):::done
        L2("In progress"):::inprogress
        L3("Ready for work")
        L4("Blocked"):::blocked
        L5("Milestone"):::milestone
    end
```

 

## Refinements 

### Needs triage
https://github.com/orgs/DIRACGrid/projects/30/views/7

**Goal: build a shared understanding of the project.**

> DIRAC
- [pixi 0.68 issue](https://github.com/DIRACGrid/DIRAC/issues/8533)

> DiracOS2
- [Handle updates?](https://github.com/DIRACGrid/DIRACOS2/issues/30)
- [Python warnings](https://github.com/DIRACGrid/DIRACOS2/issues/146)


> WebAppDIRAC

> diracx-web
- [Oauth2-proxy](https://github.com/DIRACGrid/diracx-web/issues/482)
- [Sync types with backend openapi](https://github.com/DIRACGrid/diracx-web/issues/237)

> diracx
- [Freezegun 1](https://github.com/DIRACGrid/diracx/issues/536)
- [Freezegun 2](https://github.com/DIRACGrid/diracx/issues/544)
- [Freezegun 3](https://github.com/DIRACGrid/diracx/issues/548)
- [Freezegun 4](https://github.com/DIRACGrid/diracx/issues/549)
- [Harden GH Actions Config](https://github.com/DIRACGrid/diracx/issues/914)
- [CLI tests](https://github.com/DIRACGrid/diracx/pull/104) - oldest PR in diracx. [name=Chris] will resurrect it later.
- [Config mechanism](https://github.com/DIRACGrid/diracx/issues/830) - comment from Federico, not addressed
- [Integrate MCP Server](https://github.com/DIRACGrid/diracx/issues/827) - still needs to be discussed on DOps first
- [RSS](https://github.com/DIRACGrid/diracx/issues/790)
    - [Phase1](https://github.com/DIRACGrid/diracx/issues/836)
    - [Phase2](https://github.com/DIRACGrid/diracx/issues/889)

> Pilot

> diracx-charts
- [Deploy a pod serving the git config repo](https://github.com/DIRACGrid/diracx-charts/issues/148)
- [Use stargz snapshotter for faster loading](https://github.com/DIRACGrid/diracx-charts/issues/161)

> dirac-cwl
- [assigning an output sandbox to a job from the api](https://github.com/DIRACGrid/dirac-cwl/issues/92)
- [dirac-cwl executor tests](https://github.com/DIRACGrid/dirac-cwl/issues/116)

> signurlarity

- [name=Chris] TODO: we should add the LHCb workflow transition documentation directly in diracx, for the benefit of the other communities
- TODO: add a word about dropping pre-commit ci
- TODO: create an issue for the index prefix in diracx

- [name=Simon] we could run bandit as a pre-commit hook
- [name=Chris^2] we have this in diracx (check if it's supported within Ruff) -> TODO: create an issue for that

**External deps**

### [Temporary Section] In progress, predating the new organization

https://github.com/orgs/DIRACGrid/projects/30/views/8

Various people still need to deal with old and stale PRs. We will take them into account in the next sprints.

 

### External dependencies

https://github.com/orgs/DIRACGrid/projects/30/views/9


---

[Planning Poker](https://en.wikipedia.org/wiki/Planning_poker)
Story point values (based on the Fibonacci sequence)
- `1pt`: Trivial, very clear (small bug fix, config change)
- `2pts`: Small, well understood (small feature, clear requirements)
- `3pts`: Medium, some unknowns (moderate feature)
- `5pts`: Large, significant complexity (major feature, integration)
- `8pts`: Very large, many unknowns (should probably be split)
- `13+pts`: TOO BIG - must split!
- `?`: not enough knowledge to answer (remember it's ok to ask any questions)
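
The scale above can be read as a small decision rule. The sketch below is only an illustration, not part of the team's tooling: the function name `review_estimate` and the returned strings are hypothetical, and `None` stands in for the `?` card.

```python
from typing import Optional

# Scale copied from the list above; None stands for the "?" card.
SCALE = {
    1: "Trivial, very clear",
    2: "Small, well understood",
    3: "Medium, some unknowns",
    5: "Large, significant complexity",
    8: "Very large, many unknowns",
}


def review_estimate(points: Optional[int]) -> str:
    """Return the follow-up suggested by the scale for one estimate."""
    if points is None:
        # "?" card: not enough knowledge to answer yet.
        return "ask questions before estimating"
    if points >= 13:
        return "too big: must split"
    if points not in SCALE:
        raise ValueError(f"{points} is not a card on the scale")
    if points == 8:
        return "accept, but consider splitting"
    return "accept"
```

For example, `review_estimate(13)` asks for a split, while `review_estimate(4)` raises an error because 4 is not a card on the scale.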

## Sprints

### Planning (Velocity and Planning Poker)

- Backlog: https://github.com/orgs/DIRACGrid/projects/30/views/3
- Current Sprint: https://github.com/orgs/DIRACGrid/projects/30/views/1

![](https://codimd.web.cern.ch/uploads/upload_83f6e993596c286787a956420109a61f.png)

**Average Velocity: 3.07 x FTEs** *Last update: Jan 21st*

#### :warning: Velocity is a planning tool, not a performance target

- Velocity going down is NOT bad
- Velocity going up is NOT always good (might mean over-estimation)
- Velocity varies sprint-to-sprint
- We track it to improve estimation, not to judge people

**What affects velocity:**
- Estimation accuracy (we're still learning)
- Complexity of work

**Our focus:** Delivering value and hitting commitments, not maximizing velocity numbers.
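
As an illustration of how the velocity figure can be used for planning, here is a minimal sketch. It assumes that a sprint's velocity is its story points divided by the FTEs available (as in the sprint summaries), that the average is a plain mean of per-sprint velocities, and that capacity is FTEs times that average. The function names and the 3.5 FTE example are hypothetical; the input figures are the sprints recorded in these minutes from November 10th to January 21st.

```python
from statistics import mean

# (sprint, story points, FTEs) as recorded in the sprint summaries,
# from November 10th up to the January 21st update.
SPRINTS = [
    ("November 10th", 22, 4.3),
    ("November 26th", 6, 3.0),
    ("December 10th", 6, 3.0),
    ("January 7th", 15, 3.9),
    ("January 21st", 6, 2.5),
]


def velocity(story_points: float, ftes: float) -> float:
    """Story points delivered per FTE in one sprint."""
    if ftes <= 0:
        raise ValueError("FTEs must be positive")
    return story_points / ftes


def average_velocity(sprints) -> float:
    """Plain mean of per-sprint velocities (an assumption of this sketch)."""
    return mean(velocity(points, ftes) for _, points, ftes in sprints)


def expected_story_points(ftes: float, avg_velocity: float) -> float:
    """Capacity estimate for a sprint: FTEs times average velocity."""
    return ftes * avg_velocity


if __name__ == "__main__":
    avg = average_velocity(SPRINTS)
    print(f"average velocity: {avg:.2f} x FTEs")
    # 3.5 FTEs is a made-up availability, used only as an example.
    print(f"capacity for 3.5 FTEs: {expected_story_points(3.5, avg):.1f} SP")
```

With these inputs the mean comes out at about 3.07, which matches the average quoted above. In line with the warning above, the result is meant as a planning aid for estimation, not as a target.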

### June 11th (IN PROGRESS):

#### Target and Context
- Clean up existing issues/PRs: [Burning Charts](https://github.com/orgs/DIRACGrid/projects/30/insights?period=3M)
- Finish Phase1 of the RSS migration
- Finish Phase3 of the Matcher
- Make diracx-web stable
- CWL job submission endpoint
- ? Example: transition from LHCb workflow modules to commands

All the above are a copy of the previous sprint, which did not meet the targets.

#### Availability

- [name=alexandre] % 
- [name=natthan] %
- [name=luisa] %
- [name=loris] %
- [name=stella] %
- [name=jorge] %
- [name=ryun] %
- [name=federico] %
- [name=heloise] %
- [name=christophe] %
- [name=chris] %
- [name=janusz] %
- [name=mazen] %
- [name=andrei] %
- [name=yan] %
- [name=daniela] %

_ FTEs * _ = _ story points

Expected Story Points:
Persons:
Expected Velocity:

#### Sprint Planning: 

- Backlog: https://github.com/orgs/DIRACGrid/projects/30/views/3
- Sprint: https://github.com/orgs/DIRACGrid/projects/30/views/1

### May 28th (DONE):

#### Target and Context
- Clean up existing issues/PRs: [Burning Charts](https://github.com/orgs/DIRACGrid/projects/30/insights?period=3M)
- Finish Phase1 of the RSS migration
- Finish Phase3 of the Matcher
- Make diracx-web stable
- CWL job submission endpoint
- Example: transition from LHCb workflow modules to commands

#### Availability

- [name=alexandre] 50% 
- [name=natthan]
- [name=luisa]
- [name=loris] 70%
- [name=stella] 
- [name=jorge] 80%
- [name=ryun] 30%
- [name=federico]
- [name=heloise] 30%
- [name=christophe] 10%
- [name=chris] 
- [name=janusz]
- [name=mazen] 10%
- [name=andrei] 
- [name=yan] 100%
- [name=daniela] 

4.2 FTEs * 18.8 = 79 story points

Expected Story Points: 79
Persons: 4.2
Expected Velocity: 18.8

Comments: the gap between expected and actual velocity is the biggest so far. CHEP planning and long weekends (CH, France) largely account for it.

#### Sprint Planning: 

- Backlog: https://github.com/orgs/DIRACGrid/projects/30/views/3
- Sprint: https://github.com/orgs/DIRACGrid/projects/30/views/1

#### Sprint retrospective

*The sprint is a boat :boat: ; we are trying to reach an island (target); identify anchors (what slowed you down), wind (what helped), and rocks ahead (risks for next sprint)*

:warning: **Focus on the process, not people. We're here to improve together! 🚀**

**:anchor: Anchors (what slowed you down)**
- *Example: Unclear requirements on X; Waiting for Y delayed Z; ...*
    - RSS Phase1 was more complex than we expected: splitting the issue could have let us spot the problem earlier in Phase1 (for instance, if we had implemented the RSSSource with the CacheableSource)
    - pixi bug prevented us from merging a few PRs in DIRAC: need to investigate

**:cloud: Wind (what helped)**
- *Example: Good communication in weekly meetings; Quick code reviews; Clear acceptance criteria on user stories; ...*

**🪨 Rocks (risks for next sprint)**
- *Example: Team member K on vacation; Dependency on external API L; Technical debt in M; ...*
    - CHEP

---

### Previous Sprints
#### Summary

- May 14th:
  - *11 Story Points / 4 people = 2.75 velocity*
  - Comments:
    - Lowest velocity since we started, but:
    - RSS end of phase1 is actually trickier than what we initially thought
    - Many people having to start preparing presentations (CHEP, LPC retreat)
    - Many people are working on tasks that are not in the scope of the sprint (to prepare future sprints): LHCb commands to replace the workflow modules, integration of CWL job submission endpoint within diracx
    - A CI failure in DIRAC prevented PRs from being merged

#### Sprint review: https://github.com/orgs/DIRACGrid/projects/30/views/11

Related to our goals:
- **DIRAC to DiracX transition:**
    - Mostly bug fixes

- **CWL integration:**
    - NTR

- **Match-Making POC:**
    - Finding all eligible jobs for a given node (python implementation)

- **DIRAC maintenance:**
    - Mostly bug fixes


- April 30th:
  - *42 Story Points / 4.1 people = 10.2 velocity*
  - Comments:
    - ~1/4 of the counted SP come from the integration of the tasks
    - `diracx-tasks` are here :tada: 
    - All the essential components are here to transition now.
    - RSS Phase1:
      - Should have been completed but it's still under development. [name=Loris] any blocking point? 
    - New Matcher: 
      - working on a v0.2 schema design that spells out in more detail what we want

- April 16th:
  - *20 Story Points / 2.6 people = 7.7 velocity*
  - Comments:
    - Fewer people available during the sprint (holidays, CTAO had deadlines). Some people also spent more time than originally declared, others less.
    - A lot of bug fixes that were not planned

- April 2nd:
  - *37 Story Points / 3.1 people = 11.9 velocity*
  - Comments: NTR

- March 19th:
  - *23 Story Points / 3.5 people = 7.2 velocity* 
  - Comments:
    - Need to adapt the velocity computation because we are processing a lot of tasks not planned originally in the sprint (which is expected since we still have a lot of PRs without any attached issue to process, ...)

- March 5th:
  - *19 Story Points / 2.8 people = 6.8 velocity*
  - Comments:
    - Fewer people available during this sprint, but more realistic expectations: we almost reached the expected velocity!!
    - LHCb AI hackathon: [name=Alexandre] was much less available than expected.
    - Took into account items that were in progress before the scrum process (added some SP): resurrecting diracx-web, RSS simplified...

- February 19th:
  - *38 Story Points / 4.4 people = 8.6 velocity*
  - Comments:
    - French holidays
    - [name=Alexandre] was more available than expected, but did not manage to quickly follow all the PRs.
    - A few tasks have been delayed (10 SP): waiting for further discussion on scheduling and diagrams for new LHCb workflows
    - Lots of "unplanned" items: expected as long as we have to deal with the large backlog of old items.

- February 5th:
  - *29 Story Points / 3.1 people = 9.4 velocity*
  - Comments:
    - LHCb-CERN had a computing workshop
    - Various people worked on old PRs that I did not take into account :warning:

- January 21st:
  - *6 Story Points / 2.5 people = 2.4 velocity*
  - Comments:
    - LHCb-CERN had a team retreat, LHCb-Spain had a conference.

- January 7th:
  - *15 Story Points / 3.9 people = 3.8 velocity*
  - Comments:
    - No specific comment, the sprint was split by the holidays.

- December 10th:
  - *6 Story Points / 3 people = 2 velocity*
  - Comments:
    - About the same as the previous sprint: still a gap between expected/actual availability

- November 26th:
  - *6 Story Points / 3 people = 2 velocity*
  - Comments:
    - Much lower than the previous sprint because it included tasks started before the sprint.
    - Lots of "almost done" PRs: we are improving the description of the tasks and their size but still not enough (each task should bring value though).

- November 10th:
  - *22 Story Points / 4.3 people = 5.1 velocity*
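
The summary lines above all follow the same `Story Points / people = velocity` pattern, so they can be recomputed mechanically. The sketch below is a hypothetical helper, not an existing script: the regular expression, the name `check_line` and the 0.1 tolerance (to absorb rounding to one decimal) are assumptions made for illustration.

```python
import re

# Pattern for summary lines such as
# "*42 Story Points / 4.1 people = 10.2 velocity*".
LINE = re.compile(
    r"(?P<sp>\d+(?:\.\d+)?) Story Points / (?P<fte>\d+(?:\.\d+)?) people"
    r" = (?P<v>\d+(?:\.\d+)?) velocity"
)


def check_line(text: str, tolerance: float = 0.1):
    """Return (recorded, recomputed, consistent) for one summary line."""
    match = LINE.search(text)
    if match is None:
        raise ValueError(f"not a velocity line: {text!r}")
    points = float(match["sp"])
    ftes = float(match["fte"])
    recorded = float(match["v"])
    recomputed = points / ftes
    return recorded, recomputed, abs(recorded - recomputed) <= tolerance


if __name__ == "__main__":
    samples = [
        "*42 Story Points / 4.1 people = 10.2 velocity*",
        "*23 Story Points / 3.5 people = 7.2 velocity*",
    ]
    for sample in samples:
        recorded, recomputed, ok = check_line(sample)
        status = "ok" if ok else "differs"
        print(f"{recorded:.1f} recorded, {recomputed:.2f} recomputed: {status}")
```

Applied to the entries above, only the March 19th line differs from the plain ratio by more than rounding. Its comment mentions adapting the velocity computation for unplanned tasks, which may explain the difference.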


#### Actionable Results from the Retrospective

- **Action:** Feature PRs should be thoroughly tested in certification.
  - Owner: developers
  - When: Sprint12
  - Status: 29/04/26 In Progress
- **Action:** Avoid verbose (AI-generated) issues with many implementation details that can become outdated over time.
  - Owner: developers and product owners
  - When: Sprint11
  - Status: 15/04/26 In Progress
- **Action:** Better view of the PRs ready to be reviewed vs needing changes.
  - Owner: developers
  - When: Sprint8
  - Status: 15/04/26 In Progress
- **Action:** Better communicate when a PR is going to be big, as soon as possible. Split the work in this case.
  - Owner: developers
  - When: Sprint6
  - Status: 21/01/26 DONE
- **Action:** Better use of the mattermost channel to get reviews on a given PR
  - Owner: everyone
  - By when: Sprint3
  - Status: 04/02/26 DONE
- **Action:** Define estimates and velocity based on Sprint2's results, taking into account external contributions (bonus Story Points) and availability
  - Owner: alexandre
  - By when: Sprint3
  - Status: DONE
- **Action:** Better define the scrum roles
  - Owner: alexandre
  - By when: Sprint5
  - Status: DONE
- **Action:** Better define `DONE` criteria (what should be included into the PR, and how to make sure we are not introducing too much technical debt)
  - Owner: everyone
  - By when: Sprint2
  - Status: DONE
- **Action:** Avoid planning dependent tasks in the same sprint
  - Owner: everyone
  - By when: Sprint2
  - Status: DONE

## AOB

 

    • Dirac(X) developers (Ddev): Dirac(X) developers
      Convener: Alexandre Franck Boyer (CERN)