Dops + Ddev
The monthly Dops meeting (Dirac(X) operations) will run just before the weekly Ddev (Dirac(X) developers) meeting.
10:00 → 11:00
Dirac(X) operations (Dops)
Convener: Federico Stagni (CERN)
Dops – 18/06/2026
At CERN: Federico, Alexandre, Ryunosuke, Cedric, Marco, Ryun
On Zoom: Christophe, Luisa, Loris, Jorge, Simon, Daniela (first 30 min only), Heloise, Ueda, Janusz, Bertrand, Dhiraj, Xiaomei, Juraj, Yan, Stella
Apologies: Andrei
This meeting is being recorded.
Previous meetings + follow-ups
- Dops 4 weeks ago. Follow-ups
- On the request for a long term support release (DIRAC):
- We agreed on creating a rel-v9r0 branch, to which all “needed” fixes have been backported from integration
- Release(s) created accordingly
- v9.0.22 is “everyone’s” target
- This is to be considered a “lily-pad” release, for jumping to higher ones later on.
- Daniela wants to test one more issue on that, and will get around to it soonish, sorry
- Federico sent out a google form for collecting requirements for DiracX Transformation/Production system
- 3 “simple” answers received. Most of the “higher requirements” ones still to come
- older CWL is coming to DiracX, and the new “hints”: https://codimd.web.cern.ch/SllN13jAQNSG25MjHB8Swg?both
- “Large” PR needs to be split
Communities issues and requests : roundtable
LHCb:
Federico+Christopher+Christophe+Alexandre+Ryun
- Running in production the latest releases of everything
Belle2
Ueda, Hideki, Cedric
- Smooth operation
Juno+BES3:
Xiaomei
- Nothing new
- fromPreviousMeeting monitorFiles not working correctly for production transformations. Maybe buggy.
- Federico+Chris: in LHCb a different mechanism is used
- Luisa: in CTAO a different solution is also used
- 20th May Andrei found a bug, hot-fixed, will commit if tested OK
- 18th June Andrei submitted a fix PR: [8.0] fix: do not reset files for not yet submitted jobs
EGI+IN2P3
Andrei, Mazen
- NTR
JINR
Igor
- NTR
CTAO
Luisa, Natthan, Loris, Stella
- Answers prepared for the prod system, feedback from colleagues needed
- Preliminary design review of CTAO computing earlier this week
- Stressed the importance and usefulness of Transformation plugins
- fromPreviousMeeting Use case of possibly 10k short transformations (out of 1 production request)
CLIC
André
- NTR
FCC
Juraj
- Started effort, pointers collected. Started collecting answers for the form.
CMS
Andrea, Marco
- Not yet
GridPP:
Daniela, Simon, Janusz
- Production: v8.0.78 + security patches (will upgrade next week to v8.0.80)
- Fixed a couple of minor bugs in WebAppDIRAC, now in 5.0.14
Releases announcements and reviews
DiracOS
- OK to remove the CentOS7 support (follow-up from https://github.com/DIRACGrid/DIRACOS2/pull/181#issuecomment-4312989818): ++new issue/PR
- Federico: this https://github.com/DIRACGrid/DIRAC/pull/8575 as first?
- 2.6x is the last version
- fromPreviousMeeting 2 issues opened by Daniela
- dependencies https://github.com/DIRACGrid/DIRACOS2/issues/173
- include latest htcondor: https://github.com/DIRACGrid/DIRACOS2/issues/169 (not urgent, however I was hoping for the latest long term support release --Daniela)
DIRAC
- Note on the existing branches:
- there are now 3 “live” branches: rel-v8r0, rel-v9r0, integration
- please, target integration for “everything” unless:
- it’s a real bug fix (maybe backported)
- it’s a security fix (but that should go through a CVE)
- the “sweeper” of PRs is no longer in action: you need to create separate PRs yourself
- to avoid “I’ll do it later”, PRs will only be merged if they have already been created for all the branches.
-
- includes some security fixes (ported to every branch)
-
- various backports from integration
- v9.1.12 + v9.1.11, v9.1.10
- several not-only-security fixes
- several performance fixes
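The branch policy noted above (target integration by default, backport real bug fixes, port security fixes everywhere, merge only once every branch has its PR) can be read as a small decision rule. A minimal sketch, assuming a three-way split of changes into feature, bug fix and security fix; `ChangeKind`, `target_branches` and `can_merge` are hypothetical names and not part of DIRAC or its CI:

```python
from __future__ import annotations

from enum import Enum

# Live branches as listed in these notes; the rule below is only a sketch of the policy.
LIVE_BRANCHES = ("integration", "rel-v9r0", "rel-v8r0")


class ChangeKind(Enum):
    """Hypothetical classification of a change, mirroring the policy above."""
    FEATURE = "feature"
    BUG_FIX = "bug fix"
    SECURITY_FIX = "security fix"


def target_branches(kind: ChangeKind, backport_to: tuple[str, ...] = ()) -> list[str]:
    """Return the branches that each need their own PR for this change."""
    if kind is ChangeKind.SECURITY_FIX:
        # Security fixes are ported to every live branch.
        return list(LIVE_BRANCHES)
    if kind is ChangeKind.FEATURE and backport_to:
        raise ValueError("features target integration only")
    targets = ["integration"]
    # Real bug fixes may additionally be backported to release branches.
    for branch in backport_to:
        if branch not in LIVE_BRANCHES:
            raise ValueError(f"unknown branch: {branch}")
        if branch not in targets:
            targets.append(branch)
    return targets


def can_merge(kind: ChangeKind, branches_with_pr: set[str],
              backport_to: tuple[str, ...] = ()) -> bool:
    """No sweeper: merge only once a PR exists for every target branch."""
    return all(b in branches_with_pr for b in target_branches(kind, backport_to))
```

For example, `can_merge(ChangeKind.BUG_FIX, {"integration"}, ("rel-v9r0",))` stays `False` until the rel-v9r0 PR has also been opened.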
DiracX
- v0.2.0
- added AsyncTwoLevelCache
CWL
- The big PR was closed, will be re-worked into actually reviewable PRs
- Interactions with CWL founder in https://github.com/DIRACGrid/diracx/issues/858
DiracX-web
- first non-alpha release?
- Not yet
Pilot
- NTR
Agentic and AI developments
- Open issue to discuss at some point, or to be followed up among communities.
Feature requests, and developers’ issues: inputs and prioritizations from communities
Transformation/Production “system” in DiracX
- Nothing more to report for this meeting. We expect all updates by the next DOps.
Prioritized backlog: communities input
https://github.com/orgs/DIRACGrid/projects/30/views/3 contains the prioritized backlog.
- objections?
- something from https://github.com/orgs/DIRACGrid/projects/30/views/7 ?
AOB
- DIRAC is now officially an “HSF affiliated project” : https://hepsoftwarefoundation.org/projects/projects.html
- We should somehow “use this”
- CHEP highlights
- Alexandre presents some Dirac(X)-related CHEP highlights
Next appointments
- Next meetings:
- Next DOps on August 13th (no DOps in the middle of summer)
- WS/hackathons/conferences:
- DiracX hackathon: 1st and 2nd of July
- 20 people registered. Search for another room unsuccessful
- Social event on the evening of July 1st.
- 12th DUW: 13th-16th October
- registrations open! Free of charge, thanks to a local sponsor (waiting for Jiri to update a few things)
- Planning for DiracX hackathons in 2027
- (Almost) booked IdeaSquare for hackathon on 13th and 14th January 2027: https://indico.cern.ch/event/1699808/
reply:
I have made a temporary booking for your event in our calendar. I will contact you in September to confirm the booking with you
11:00 → 12:00
Dirac(X) developers (Ddev)
Convener: Alexandre Franck Boyer (CERN)
# DIRAC Development Meeting (Ddev)
**At CERN**: Federico, Christophe, Chris
**On Zoom**: Loris, Heloise, Natthan, Simon, Bertrand, Amir, Yan, Andrei, Jorge, Stella, Daniela, Mazen
**Apologies**:

## Product Goals & Roadmaps
- Transition to DiracX:
```mermaid
flowchart LR
subgraph CWL["CWL"]
CWL1("CWL submission endpoint"):::inprogress
CWL2("CWL production system")
CWL3("Transformation system machinery"):::blocked
CWL4("Use CWL natively in new matcher"):::blocked
end
subgraph Core["Core"]
CoreTasks("Tasks"):::inprogress
Core2("RSS"):::inprogress
Core3("DMS")
end
subgraph WMS["WMS"]
WMS1("Matcher"):::inprogress
WMS2("Pilot authentication"):::inprogress
WMS3("Pilot submission"):::blocked
end
CWL3 --> CWL4
CoreTasks --> Core2 --> Core3
CoreTasks --> WMS1
CoreTasks --> CWL3
WMS1 --> CWL4
CoreTasks --> WMS3
classDef done fill:#B2DFDB,stroke:#00897B,color:black,stroke-width:2px;
classDef inprogress fill:#FFF9C4,stroke:#F9A825,color:black,stroke-width:2px;
classDef blocked fill:#BBBBBB,stroke:#222222,color:black,stroke-width:2px;
subgraph Legend
L2("Completed"):::done
L4("In progress"):::inprogress
L1("Ready for work")
L3("Blocked"):::blocked
end
```

- CWL integration:
```mermaid
flowchart LR
subgraph dirac_cwl["dirac-cwl"]
job1("Prototype Job Endpoint"):::done
transformation("Prototype Transformation Endpoint"):::inprogress
workflows("Workflows"):::inprogress
prod("Prototype Production Endpoint"):::inprogress
end
subgraph DiracX1["DiracX"]
prod_diracx("Implement the CWL Production System")
trans_diracx("Implement the CWL Transformation endpoint")
trans_diracx_original("Implement the Transformation System"):::blocked
diracx_tasks("Implement DiracX Tasks"):::blocked
job_diracx("Implement the CWL Job Endpoint"):::inprogress
end
diracx_tasks --> trans_diracx_original
trans_diracx_original --> trans_diracx
transformation --> trans_diracx
job1 --> workflows
prod --> prod_diracx
prod_diracx -.-> deliver2(["Can submit productions to DiracX /productions"]):::milestone
trans_diracx -.-> deliver3(["Can submit transformations to DiracX /transformations"]):::milestone
job_diracx -.-> deliver5(["Can submit jobs to DiracX /jobs"]):::milestone
classDef done fill:#B2DFDB,stroke:#00897B,color:black,stroke-width:2px;
classDef inprogress fill:#FFF9C4,stroke:#F9A825,color:black,stroke-width:2px;
classDef blocked fill:#BBBBBB,stroke:#222222,color:black,stroke-width:2px;
classDef milestone fill:#FFDFE5,stroke:#FF5978,color:#8E2236,stroke-width:2px;
subgraph Legend
L1("Completed"):::done
L2("In progress"):::inprogress
L3("Ready for work")
L4("Blocked"):::blocked
L5("Milestone"):::milestone
end
```

## Refinements
### Needs triage
https://github.com/orgs/DIRACGrid/projects/30/views/7

**Goal: build a shared understanding of the project.**
> DIRAC
> DiracOS2
- [Handle updates?](https://github.com/DIRACGrid/DIRACOS2/issues/30)
- [Python warnings](https://github.com/DIRACGrid/DIRACOS2/issues/146)

> WebAppDIRAC
> diracx-web
- [Oauth2-proxy](https://github.com/DIRACGrid/diracx-web/issues/482)
- [Sync types with backend openapi](https://github.com/DIRACGrid/diracx-web/issues/237)

> diracx
- [pixi lock file hook](https://github.com/DIRACGrid/diracx/issues/942)
- [Fix warnings in test suite](https://github.com/DIRACGrid/diracx/issues/935)
- [Cache static endpoint responses](https://github.com/DIRACGrid/diracx/issues/835)
- [ADRs](https://github.com/DIRACGrid/diracx/issues/588)
- [CLI tests](https://github.com/DIRACGrid/diracx/pull/104) - oldest PR in diracx. [name=Chris] will resurrect it later.
- [Integrate MCP Server](https://github.com/DIRACGrid/diracx/issues/827) - still needs to be discussed on DOps first
- [RSS](https://github.com/DIRACGrid/diracx/issues/790)
- [Phase1](https://github.com/DIRACGrid/diracx/issues/836)
- [Phase2](https://github.com/DIRACGrid/diracx/issues/889)

> Pilot
> diracx-charts
> dirac-cwl
- [assigning an output sandbox to a job from the api](https://github.com/DIRACGrid/dirac-cwl/issues/92)
- [dirac-cwl executor tests](https://github.com/DIRACGrid/dirac-cwl/issues/116)

> signurlarity
- [rustfs docker image not stable](https://github.com/DIRACGrid/signurlarity/issues/38)
- [name=Chris] TODO: we should add LHCb workflow transition documentation directly in diracx for the other communities
- TODO: add a word about dropping pre-commit.ci

**External deps**
### [Temporary Section] In progress, predating the new organization
https://github.com/orgs/DIRACGrid/projects/30/views/8
Various people still need to deal with old and stale PRs. We will take them into account in the next sprints.
### External dependencies
https://github.com/orgs/DIRACGrid/projects/30/views/9
---

[Planning Poker](https://en.wikipedia.org/wiki/Planning_poker)
Story point values (based on the Fibonacci sequence)
- `1pt`: Trivial, very clear (small bug fix, config change)
- `2pts`: Small, well understood (small feature, clear requirements)
- `3pts`: Medium, some unknowns (moderate feature)
- `5pts`: Large, significant complexity (major feature, integration)
- `8pts`: Very large, many unknowns (should probably be split)
- `13+pts`: TOO BIG - must split!
- `?`: not enough knowledge to answer (remember it's OK to ask any question)

## Sprints
### Planning (Velocity and Planning Poker)
- Backlog: https://github.com/orgs/DIRACGrid/projects/30/views/3
- Current Sprint: https://github.com/orgs/DIRACGrid/projects/30/views/1
**Average Velocity: 3.07 x FTEs** *Last update: Jan 21st*

#### :warning: Velocity is a planning tool, not a performance target
- Velocity going down is NOT bad
- Velocity going up is NOT always good (might mean over-estimation)
- Velocity varies sprint-to-sprint
- We track it to improve estimation, not to judge people

**What affects velocity:**
- Estimation accuracy (we're still learning)
- Complexity of work

**Our focus:** Delivering value and hitting commitments, not maximizing velocity numbers.
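The sprint numbers below follow simple arithmetic: velocity is completed story points divided by persons (FTEs), and the planning target is the average velocity times the FTEs available (the `_ FTEs * _ = _ story points` template). A minimal sketch, assuming exactly those two formulas and that availability percentages simply add up into FTEs; the helper names are hypothetical and not part of any tool used by the team:

```python
from __future__ import annotations

# Assumed formulas, inferred from the sprint summaries in these notes:
#   velocity        = completed story points / persons (FTEs)
#   expected points = average velocity * available FTEs


def velocity(story_points: float, persons: float) -> float:
    """Story points completed per full-time person in one sprint."""
    if persons <= 0:
        raise ValueError("persons must be positive")
    return story_points / persons


def ftes_from_availability(availability: dict[str, float | None]) -> float:
    """Turn per-person availability percentages into FTEs.

    People whose availability is not filled in yet (None) are skipped.
    """
    return sum(pct for pct in availability.values() if pct is not None) / 100


def expected_story_points(average_velocity: float, ftes: float) -> float:
    """Planning target for the next sprint: average velocity x FTEs."""
    return average_velocity * ftes


if __name__ == "__main__":
    # April 30th sprint from the summary below: 42 SP with 4.1 people.
    print(f"{velocity(42, 4.1):.1f}")
    # Hypothetical availability sheet with one entry not yet filled in.
    ftes = ftes_from_availability({"alice": 50, "bob": 70, "carol": None})
    print(f"{ftes:.1f} FTEs -> {expected_story_points(3.07, ftes):.1f} story points")
```

Run as a script, this prints `10.2` (matching the April 30th summary) and `1.2 FTEs -> 3.7 story points`.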
### June 24th (IN PROGRESS):
#### Target and Context
- **Transition**:
- Clean up existing issues/PRs: [Burning Charts](https://github.com/orgs/DIRACGrid/projects/30/insights?period=3M)
- Integrate ADRs & precise roadmap
- ~~Finish Job Monitoring and stabilize diracx-web~~
- **RSS**: Finish Phase1
- **Match Making**: Finish Phase3
- **Pilot**: Finish PR1

#### Availability
- [name=alexandre] 50%
- [name=natthan] 20%
- [name=luisa] %
- [name=loris] 60%
- [name=stella] 30%
- [name=jorge] 70%
- [name=ryun] %
- [name=federico] 20%
- [name=heloise] 30%
- [name=christophe] %
- [name=chris] %
- [name=janusz] %
- [name=mazen] %
- [name=andrei] %
- [name=yan] 100%
- [name=Simon] 10%
- [name=daniela] 10%
- [name=Hideki] %
- [name=Benedikt] 20%

_ FTEs * _ = _ story points
Expected Story Points:
Persons:
Expected Velocity:

#### Sprint Planning:
- Backlog: https://github.com/orgs/DIRACGrid/projects/30/views/3
- Sprint: https://github.com/orgs/DIRACGrid/projects/30/views/1

### June 11th (DONE):
Expected Story Points: 56
Persons: 3.8
Expected Velocity: 56/3.8 = 14.7
Effective Velocity: 20/3.8 = 5.2
Comments: big difference between expected velocity and effective one.
- CHEP planning and long weekends (CH, France) largely affected it.

#### Sprint Planning:
- Backlog: https://github.com/orgs/DIRACGrid/projects/30/views/3
- Sprint: https://github.com/orgs/DIRACGrid/projects/30/views/1

#### Sprint review: https://github.com/orgs/DIRACGrid/projects/30/views/11
Related to our goals:
- **DIRAC to DiracX transition:**
- Mostly bug fixes
- **CWL integration:**
- NTR
- **Match-Making POC:**
- NTR
- **DIRAC maintenance:**
- Prepared rel-v9r0: “lily-pad” release

#### Sprint retrospective
*The sprint is a boat :boat: ; we are trying to reach an island (target); identify anchors (what slowed you down), wind (what helped), and rocks ahead (risks for next sprint)*
:warning: **Focus on the process, not people. We're here to improve together! 🚀**
**:anchor: Anchors (what slowed you down)**
- *Example: Unclear requirements on X; Waiting for Y delayed Z; ...*
- LHCb got issues in production with the AuthDB not cleaned up correctly: we should have better described the issue originally. This is expected to be better with new issues and the template.
- Certification machine can't be accessed by everyone: should define a clear policy about what we expect from this instance and who should take care of the tests.
- A lot of delay due to a lack of reviewers: should we let developers review PRs (they could do a first pass, then be assisted by architects)? This could come with some guidelines.
- Should be more careful with PR titles: we merged a `feat` PR which should have been `chore` or `docs` (bumped the diracx version in the wrong way - corrected before the release)
- pixi lock file was migrated to v7 and then reverted to v6 in another PR: we should be more careful about it. Maybe we could have a pre-commit hook to make sure the file is not touched if there are no dependency changes -> minimum pixi version in pixi.toml
- CWL PR:
- was meant to move work done from dirac-cwl to diracx, but many other features were added, making it hard to review (breaking our rule that big PRs should be split)
- was rediscussed with the architects after being implemented and certified: as developers, we should better explain our roadmap (CWL, but also in general) because it was not clear to everyone (let's come up with a plan we all agree on before the hackathon)

**:cloud: Wind (what helped)**
- *Example: Good communication in weekly meetings; Quick code reviews; Clear acceptance criteria on user stories; ...*

**🪨 Rocks (risks for next sprint)**
- *Example: Team member K on vacation; Dependency on external API L; Technical debt in M; ...*
- LHCb week
- CTAO review (Christophe away for a week)

---
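Two of the retrospective anchors above (a `feat` PR title that should have been `chore` or `docs`, and an unintended pixi lock file change) are the kind of mistakes a small CI or pre-commit check could catch. A minimal sketch, assuming the common Conventional Commits mapping (`feat` → minor, `fix` → patch, `!` → major, any other type → no release) and assuming `pixi.toml` and `pyproject.toml` are the only dependency manifests; these notes do not say which mapping the diracx release tooling actually uses, and `bump_for_title` and `lock_only_change` are hypothetical helpers:

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumed Conventional Commits title shape, with DIRAC's optional "[8.0] " branch prefix.
_TITLE_RE = re.compile(
    r"^(?:\[[^\]]+\]\s*)?"  # optional branch prefix, e.g. "[8.0] "
    r"(?P<type>[a-z]+)"     # type: feat, fix, docs, chore, ...
    r"(?:\([^)]*\))?"       # optional scope, e.g. "(deps)"
    r"(?P<breaking>!)?: "   # "!" flags a breaking change
)
# Assumed mapping; every other type (docs, chore, ...) triggers no release.
_BUMPS = {"feat": "minor", "fix": "patch"}

# Assumed dependency manifests; pixi.lock should only move when one of these does.
DEPENDENCY_MANIFESTS = {"pixi.toml", "pyproject.toml"}


def bump_for_title(title: str) -> str | None:
    """Version bump implied by a PR title, or None if it should not release."""
    match = _TITLE_RE.match(title)
    if match is None:
        raise ValueError(f"not a Conventional Commits title: {title!r}")
    if match.group("breaking"):
        return "major"
    return _BUMPS.get(match.group("type"))


def lock_only_change(changed_files: list[str]) -> bool:
    """True if pixi.lock changed while no dependency manifest did."""
    names = {PurePosixPath(path).name for path in changed_files}
    return "pixi.lock" in names and not (names & DEPENDENCY_MANIFESTS)
```

With this sketch, a `docs:` title yields `None` (no version bump), and a change list containing only `pixi.lock` is flagged for a closer look before merging.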
### Previous Sprints
#### Summary

- May 28th:
- *16.2 Story Points / 4.2 people = 3.9 velocity*
- Comments:
- Big difference between expected velocity and effective one.
- CHEP planning and long weekends (CH, France) largely affected it.
- May 14th:
- *11 Story Points / 4 people = 2.75 velocity*
- Comments:
- Lowest velocity since we started, but:
- RSS end of phase1 is actually trickier than what we initially thought
- Many people having to start preparing presentations (CHEP, LPC retreat)
- Many people are working on tasks that are not in the scope of the sprint (to prepare future sprints): LHCb commands to replace the workflow modules, integration of CWL job submission endpoint within diracx
- A CI failure in DIRAC prevented merging
- April 30th:
- *42 Story Points / 4.1 people = 10.2 velocity*
- Comments:
- ~1/4 of the counted SP come from the integration of the tasks
- `diracx-tasks` are here :tada:
- All the essential components are here to transition now.
- RSS Phase1:
- Should have been completed but it's still under development. [name=Loris] any blocking point?
- New Matcher:
- working on a v0.2 schema design spelling out in more detail what we want
- April 16th:
- *20 Story Points / 2.6 people = 7.7 velocity*
- Comments:
- Fewer people available during the sprint (holidays, CTAO had deadlines). Also, some people seemed to spend more time than originally planned, others less.
- A lot of bug fixes that were not planned
- April 2nd:
- *37 Story Points / 3.1 people = 11.9 velocity*
- Comments: NTR
- March 19th:
- *23 Story Points / 3.5 people = 6.6 velocity*
- Comments:
- Need to adapt the velocity computation because we are processing a lot of tasks not planned originally in the sprint (which is expected since we still have a lot of PRs without any attached issue to process, ...)
- March 5th:
- *19 Story Points / 2.8 people = 6.8 velocity*
- Comments:
- Fewer people available during this sprint, but with more realistic expectations we almost reached the expected velocity!!
- LHCb AI hackathon: [name=Alexandre] was much less available than expected.
- Took into account items that were in progress before the scrum process (added some SP): resurrecting diracx-web, RSS simplified...
- February 19th:
- *38 Story Points / 4.4 people = 8.6 velocity*
- Comments:
- French holidays
- [name=Alexandre] was more available than expected, but did not manage to quickly follow all the PRs.
- A few tasks have been delayed (10 SP): waiting for further discussion on scheduling and diagrams for new LHCb workflows
- Lots of "unplanned" items: expected as long as we have to deal with the large backlog of old items.
- February 5th:
- *29 Story Points / 3.1 people = 9.4 velocity*
- Comments:
- LHCb-CERN had a computing workshop
- Various people worked on old PRs I did not take into account :warning:
- January 21st:
- *6 Story Points / 2.5 people = 2.4 velocity*
- Comments:
- LHCb-CERN had a team retreat, LHCb-Spain had a conference.
- January 7th:
- *15 Story Points / 3.9 people = 3.8 velocity*
- Comments:
- No specific comment, the sprint was split by the holidays.
- December 10th:
- *6 Story Points / 3 people = 2 velocity*
- Comments:
- About the same as the previous sprint: still a gap between expected/actual availability
- November 26th:
- *6 Story Points / 3 people = 2 velocity*
- Comments:
- Much lower than the previous sprint because it included tasks started before the sprint.
- Lots of "almost done" PRs: we are improving the description of the tasks and their size but still not enough (each task should bring value though).
- November 10th:
- *22 Story Points / 4.3 people = 5.1 velocity*
#### Actionable Results from the Retrospective

- **Action:** Feature PRs should be thoroughly tested in certification.
- Owner: developers
- When: Sprint12
- Status: 29/04/26 In Progress
- **Action:** Avoid verbose (AI-generated) issues with many implementation details that can deprecate over time.
- Owner: developers and product owners
- When: Sprint11
- Status: 15/04/26 In Progress
- **Action:** Better view of the PRs ready to be reviewed vs needing changes.
- Owner: developers
- When: Sprint8
- Status: 15/04/26 In Progress
- **Action:** Better communicate when a PR is going to be big, as soon as possible. Split the work in this case.
- Owner: developers
- When: Sprint6
- Status: 21/01/26 DONE
- **Action:** Better use of the mattermost channel to get reviews on a given PR
- Owner: everyone
- By when: Sprint3
- Status: 04/02/26 DONE
- **Action:** Define estimates and velocity based on Sprint2's results, taking into account external contributions (bonus Story Points) and availability
- Owner: alexandre
- By when: Sprint3
- Status: DONE
- **Action:** Better define the scrum roles
- Owner: alexandre
- By when: Sprint5
- Status: DONE
- **Action:** Better define `DONE` criteria (what should be included into the PR, and how to make sure we are not introducing too much technical debt)
- Owner: everyone
- By when: Sprint2
- Status: DONE
- **Action:** Avoid planning dependent tasks in a same sprint
- Owner: everyone
- By when: Sprint2
- Status: DONE

## AOB
-