Dops + Ddev
Europe/Zurich
2/R-014 (CERN)
Description
The monthly Dops meeting (Dirac(X) operations) will run just before the weekly Ddev (Dirac(X) developers) meeting.
Dops – 12/03/2026
At CERN: Federico, Christophe, Christopher, Alexandre, André, Alan
On Zoom: Andrei, Hideki, Xiaomei, Luisa, Heloise, Stella, Loris, Vladimir, Daniela, Simon, Janusz, Bertrand, Mazen, Vladimir, Natthan
Apologies:
Previous meetings + follow-ups
- Dops 4 weeks ago. Follow-ups:
- Jobs rescheduling: summary went to https://github.com/DIRACGrid/diracx/issues/760#issuecomment-3891302360
- ready for design
- RSS in DIRAC, for DiracX:
- Federico created the “epic” issue https://github.com/DIRACGrid/diracx/issues/790
- Discussed among a few of us; 2 clear parts:
- “urgent”: minimal interface in DiracX for reading RSS status
- “later”: full machinery
- A lengthy comment has been published, detailing a possible schedule of work
- 2 PRs in DIRAC:
- RSS: remove “nodes” tables, added developer docs
- Merged, part of DIRAC v9.1.0
- Patched in https://github.com/DIRACGrid/DIRAC/pull/8480
- RSS: removals and simplifications
- Merged, part of DIRAC v9.1.1
- Follow-up will be discussed later today
Communities' issues and requests: roundtable
LHCb:
Federico+Christopher+Christophe+Alexandre+Ryun
- Kept installing latest release (now running with 9.1.1 - patched, see below)
- Regularly running 350k+ jobs
- Matcher under stress when LHCb HLT farm starts ramping up
- Database optimizations (some of which went into DIRAC v9.1.0) had to be added to address the heavy load
Belle2
Hideki, Ueda
- Memory consumption increased on one of the servers
- Test server moving to v9
Juno+BES3:
Xiaomei
- (From previous meeting) Heavy productions since August. Running on 2 servers.
- “only 20k” running jobs, but the high job frequency due to short jobs is putting pressure on the SandboxStore and the CS
- Federico + Christophe: solutions in v8 are mostly outside of DIRAC itself:
- disk performance is critical
- in LHCb we created a DNS load balancer
- increase the validity of the CS (option to increase the refresh time)
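On the CS refresh-time point above, a minimal configuration sketch — assuming the standard /DIRAC/Configuration/RefreshTime option; the exact option path and default should be verified against the deployed DIRAC version:

```
DIRAC
{
  Configuration
  {
    # Interval (seconds) between CS refreshes by clients/agents.
    # Raising it reduces CS load caused by many short, frequent jobs.
    # NOTE: option name/path assumed from standard DIRAC CS layout;
    # verify it for the DIRAC version actually deployed.
    RefreshTime = 600
  }
}
```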
EGI+IN2P3
Andrei, Mazen
- Set up a second service (IN2P3) for a specific VO
- Test system v9+X
- choosing the FC plugin for v9
CTAO
Luisa, Nattan, Loris, Stella
- NTR
CLIC
André
NTR
GridPP:
Daniela, Simon
- Nothing to report
Releases announcements and reviews
DIRAC
-
- Last of the v9.0 patches
-
- First minor release since “TBD” adoption
- Contains MySQL changes (optional, suggested) detailed in the release notes
- See “Deployment” notes in release notes
- Yanked in pypi
-
- Should have also been a minor release
- Contains MySQL changes (optional, suggested) detailed in the release notes
- See “Deployment” notes in release notes
- Yanked in pypi
-
- Patched version due to bug introduced in v9.1.0 (also in v9.1.1)
DiracX
- v0.0.10
- “technical”
DiracOS
- 2.58
- No news since last Dops
Dirac-CWL
- Introduced a new JobWrapper that can run in DIRAC; a JobReport of its status was added recently
Pilot
- Will (finally) complete the removal of py2 support on Monday
Feature requests, and developers’ issues: inputs and prioritizations from communities
Jobs’ match-making (matching) mechanism for DiracX: issues and plans
- Federico sent out a mail thread with questions; Andrei, Luisa, and Ueda answered. The questions, with a summary of the answers:
- What are the limitations you encountered with the current system?
- Expressing RAM requirements
- Federico: it’s in v9
- No priority boost for long Waiting jobs
- No priority manipulation
- It is not easy to understand the reason why a job has not been matched.
- there is an attempt to address this in a script
- Do you make use of “Tags” for match-making? Which ones, and why?
- Yes, for specific classification
- Do you inject in the JDL specific, VO-dependent parameters?
- Belle2 adds OS tags
- Do some of your VOs or users use or are interested in CWL?
- Yes (CTAO), not yet (EGI-FG, Belle2)
- Do you have access to nodes that include heterogeneous resources (e.g. in HPCs)?
- Not yet, but on the horizon
- What could be a target rate of match-making operations?
- 200 Hz
- Next:
- Federico will create an “epic” issue with user stories
- A design will follow
MP Jobs accounting
Existing issue: https://github.com/DIRACGrid/diracx/issues/294
- Federico sent out a mail thread with questions and a short list of user stories. Collected a few answers; in extreme summary:
- MP jobs and MP pilots run more or less everywhere, but at a low level; the missing accounting for them is only partially noticeable, yet nonetheless needed
- Next:
- Federico will add the user stories to the issue above
- Design and implementation should follow soon after (rather high priority)
- Connected; noted here for visibility: https://github.com/DIRACGrid/diracx/issues/562
- this “epic” is about accounting and monitoring (OLAP). The plan in there does not work as-is, but a few ideas could be borrowed from it. To be designed.
Need for TSCatalog?
- Does anyone use the TSCatalog? https://github.com/DIRACGrid/diracx/issues/807
- seemingly not, but the option is recognized as possibly useful in the future
Prioritized backlog: communities input
https://github.com/orgs/DIRACGrid/projects/30/views/3 contains the prioritized backlog.
- objections?
- something from https://github.com/orgs/DIRACGrid/projects/30/views/7 ?
AOB
- Certification machines
- Documentation for how to use them for developments/testing added to https://github.com/DIRACGrid/DIRAC/wiki/Certifications – linked from mattermost channel
- CHEP abstract
- Submitted 2 abstracts:
- DiracX in action
- accepted as an oral, Alexandre as speaker
- Aligning DIRAC Workflows with CWL: A Unified and Reproducible Workflow Model for Grid-Scale Computing
- accepted as an oral, Ryun as speaker
- DIRAC as an “HSF affiliated project”: https://hepsoftwarefoundation.org/projects/affiliated.html
- NTR, but Andrei will circulate the report
Next appointments
- Next Dops: 16th April
- Topics (draft):
- Agentic DiracX: https://github.com/DIRACGrid/diracx/issues/827
- WS/hackathons/conferences:
- DiracX hackathon: 24-25 March
- Another hackathon in June or beginning of July
- 12th DUW: 13th-16th October
- starting one day earlier (Tuesday) than previously announced, ending Friday 16th at lunchtime