Dops + Ddev
The monthly Dops meeting (Dirac(X) operations) will run just before the weekly Ddev (Dirac(X) developers) meeting.
Dops – 16/04/2026
At CERN: Federico, Christophe, Christopher, Alexandre, André, Yan
On Zoom: Andrei, Hideki, Xiaomei, Luisa, Heloise, Dhiraj, Loris, Daniela, Bertrand, Vladimir, Ryun, Natthan
Apologies:
This meeting is being recorded.
From now on, we plan to record every meeting.
Previous meetings + follow-ups
- Dops 5 weeks ago. Follow-ups
- RSS for DiracX
- Phase 1: https://github.com/DIRACGrid/diracx/issues/836. 3 tasks, 2 of them completed.
- Instrumental for phase 2: [RSS Phase 2] DIRAC legacy integration (#889). After this it will actually be used. 3 tasks written.
- Jobs’ match-making (matching) mechanism for DiracX: issues and plans
- an “epic” was created: https://github.com/DIRACGrid/diracx/issues/843
- User stories, followed by a more technical description and a work plan, with several tasks already written down
- The implementation approach starts with a prototype, begun in https://github.com/devlink42/poc-dirac-job-matchmaking
- Yan (LHCb intern) is working on it
- MP Jobs accounting
- User stories have been added to https://github.com/DIRACGrid/diracx/issues/294
- Federico has a plan, partly discussed among a few others: https://codimd.web.cern.ch/nyt-gRj-QnGyQ1SdXhorGw
- The idea is to implement the feature directly in DiracX, for MP Jobs accounting only. In the long run it could serve as the base for what we call Monitoring and Accounting (in DIRAC) for DiracX, replacing the initial plan of https://github.com/DIRACGrid/diracx/issues/562.
Communities issues and requests : roundtable
LHCb:
Federico+Christopher+Christophe+Alexandre+Ryun
- Kept installing latest release. NTR otherwise
Belle2
Hideki, Ueda
- Memory issues from JobMonitor
Juno+BES3:
Xiaomei
- Too long LFN: check the MySQL version; recent enough versions would at least avoid registering a truncated LFN
- monitoriFiles not working correctly for production transformations. Maybe buggy.
- Federico+Chris: in LHCb a different mechanism is used
- Luisa: in CTAO also a different solution
EGI+IN2P3
Andrei, Mazen
- EGI: NTR
- The new IN2P3 service is operational now, for 1 specific community (which is using Rucio as DM)
CTAO
Luisa, Natthan, Loris, Stella
- Use case of possibly 10k short transformations (out of 1 production)
CLIC
André
- NTR
GridPP:
Daniela, Simon
- Nothing new for production (still on 8.0.74)
- Pre-prod
- My attempts to address this comment by Chris B (https://github.com/DIRACGrid/diracx/pull/851#discussion_r2999473646) resulted in us upgrading the test server from v9.0.2 to v9.0.20. This threw up a bunch of issues, the most prominent (and possibly the most interesting to other people) being https://github.com/DIRACGrid/DIRAC/pull/8515: you do not need a storage management system to run DIRAC, and the code should allow for this, as it did in v9.0.2. The link points to Federico’s fix.
- We also see: WARNING: MYSQL_OPT_RECONNECT is deprecated and will be removed in a future version. Fix in https://github.com/DIRACGrid/DIRAC/pull/8507
- Second pass at using one OpenSearch server for production and 3 pre-prod servers. This is done via index prefixes.
- To re-iterate what we stated during the DIRAC workshop: We really can’t afford one OpenSearch server per DIRAC install.
- The first issue was that in v8 and v9, no matter what we did, OpenSearch indices were also created for WMS/RMS despite self.activityMonitoring being False (the rogue indices look e.g. like this: dirac00._rmsmonitoring-index-2026-03)
- This was fixed in https://github.com/DIRACGrid/DIRAC/pull/8490 by Federico.
- We force backported this to our v8 installs to be able to continue testing. While it would be nice to have this backported to v8, it’s not vital.
- We think https://github.com/DIRACGrid/DIRAC/issues/8489 : [Feature] Introduce the concept of a global prefix for OpenSearch indexes is going in the right direction, and could be of interest to other smaller DIRAC installations.
- v9.0 pre-prod server: the issue of ‘/tmp’ filling up with proxies (https://github.com/DIRACGrid/DIRAC/issues/8453) has been fixed, but unless the fix is backported to v9.0, we cannot use v9.0. The original issue was a refactor gone wrong, so could we put the refactor of the refactor back into the release, please?
- At the moment we cherry-picked the solution back to our v9.0 test server, but that would be an unfortunate approach to a production server.
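The index-prefix approach mentioned above (one shared OpenSearch server, several DIRAC installs) can be illustrated with a minimal sketch. This is not DIRAC's actual naming scheme: the `<prefix>_<base>-<YYYY-MM>` layout, the function names, and the `wmshistory-index` base name are all assumptions made for illustration.

```python
from datetime import date


def index_name(install_prefix: str, base: str, day: date) -> str:
    """Build a monthly index name namespaced by a per-install prefix (hypothetical scheme)."""
    return f"{install_prefix}_{base}-{day:%Y-%m}"


def indices_for_install(install_prefix: str, all_indices: list) -> list:
    """Select only the indices that belong to one install on a shared server."""
    # The trailing underscore keeps "prod" from matching "preprod1_...".
    return [name for name in all_indices if name.startswith(f"{install_prefix}_")]


# One production install and two pre-prod installs writing to the same server.
shared = [
    index_name("prod", "wmshistory-index", date(2026, 3, 1)),
    index_name("preprod1", "wmshistory-index", date(2026, 3, 1)),
    index_name("preprod2", "wmshistory-index", date(2026, 4, 1)),
]
print(indices_for_install("preprod1", shared))
```

With a scheme like this, each install only ever reads or deletes indices under its own prefix, which is what makes sharing one server workable.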
- diracos issues
- After upgrading to 2.60 we noticed the webapp using 100% CPU. Fixed in https://github.com/DIRACGrid/tornado_m2crypto/pull/7
- https://github.com/DIRACGrid/DIRACOS2/issues/174 ([Bug]: DIRACOS2 2.58+ requires $HOME): Resulted in failing jobs on the certification server that were attempting a “traditional” (as opposed to cvmfs) DIRAC install. The Imperial College grid site has not had $HOME for years, and I suspect we aren’t the only ones. The only reason this does not show up in the wild is that prod instances tend to use the cvmfs version.
- https://github.com/DIRACGrid/DIRACOS2/issues/169 ([Feature]: Include htcondor-25 in diracos2): We are running HTCondor 25 on our site and we see crashes that seem to be induced by jobs coming from the DIRAC certification server. The HTCondor mailing list suggests this might be a mismatch between the submitting and receiving condors. EGI is currently running a campaign to get sites to upgrade their condor installs. We think it would be a good idea to test this hypothesis before we hit a real problem.
- Web App
- While fixing the 100% CPU issue, Simon also introduced this fix: https://github.com/DIRACGrid/WebAppDIRAC/pull/791 Only auto-reload webapp in development mode
- Documentation:
- I made a first pass at documenting how to install DiracX in a container: https://github.com/DIRACGrid/diracx/pull/851. I would like to get this released (but got sidetracked testing fixes for 9.0.20), and I am also at in-person meetings for 5 of the next 6 working days. Maybe something to discuss in the ops part of this meeting (any possibility of releasing a first pass, in combination with an “improvement” issue?)
- https://github.com/DIRACGrid/DIRAC/issues/8513 (Documentation request for: NumberOfGPUs & AvailableRAM)
Releases announcements and reviews
DiracOS
- 2.61 (+ 2.60, 2.59)
- new version of VOMS https://conda-metadata-app.streamlit.app/?q=conda-forge%2Flinux-64%2Fvoms-2.1.3-hd035966_0.conda
- FIX: (#171) pin setuptools to <82.0
- NEW: (#168) add singularity
- (#176) Depend on tornado_m2crypto >=0.1.4
- effectively, 2.59 and 2.60 should be considered “buggy”. The latest release is always installed by default.
- 2 issues opened by Daniela
- dependencies https://github.com/DIRACGrid/DIRACOS2/issues/173
- include latest htcondor: https://github.com/DIRACGrid/DIRACOS2/issues/169
DIRAC
- v9.1.6 (+ v9.1.5, 4, 3)
- v9.1.5 is buggy, and has been yanked on PyPI
- Core NEW: (#8484) Add DIRAC_FAST_PROCESS_POOL as experimental feature to speed up the REA
- WMS CHANGE: (#8479) get cpu work left from a single source of truth
v9.1.7 is waiting for a DiracX release first (see below)
DiracX
- v0.0.13 (+ v0.0.12, v0.0.11)
- implemented authdb tables cleanup (#815)
- replace container base images with pixi-managed environments (#810)
- what does it mean for https://github.com/DIRACGrid/container-images ?
- add diracx-tasks (#842) (63d3a01)
- This should have been logically in v0.1.0, but for technical reasons (chicken-and-egg issue??) could not be done
- add task to clean sandbox store (#883) (ab38f04)
- core: strict UTC datetime validation for pydantic models (#477) (ed3d934)
-
- not yet there, but first PRs merged for it
?? A “proper” release is waiting for diracx-charts
New documentation:
- how to make a release (tested during last hackathon)
- advanced tutorial (tested during last hackathon)
- deploy in containers: https://github.com/DIRACGrid/diracx/pull/851 (draft)
- this is for DiracX services, tasks will be done later on
Reminder: PR titles should match the conventional commits spec. This is enforced via https://github.com/amannn/action-semantic-pull-request and determines how releases are numbered.
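As a rough illustration of why the title format matters, here is a minimal sketch of how a conventional-commit style title can be mapped to a version bump. It follows the general mapping described in the Conventional Commits spec (`fix` is a patch, `feat` is a minor, `!` marks a breaking change); the regular expression, the function name, and the example titles are hypothetical, and the actual DiracX release tooling may be configured differently.

```python
import re
from typing import Optional

# type, optional (scope), optional "!" for a breaking change, then ": description"
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def bump_for_title(title: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch' or 'none'; return None if the title does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None  # such a title would be rejected by a title check
    if match["breaking"]:
        return "major"
    # Types other than feat/fix (docs, chore, ...) imply no bump in this sketch.
    return {"feat": "minor", "fix": "patch"}.get(match["type"], "none")


for title in (
    "feat(tasks): add cleanup task",
    "fix: handle empty input",
    "feat!: drop old endpoint",
    "docs: update tutorial",
    "Update stuff",
):
    print(f"{title!r} -> {bump_for_title(title)}")
```

A title such as "Update stuff" carries no type, so nothing can be inferred from it about the next release number.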
Dirac-CWL
- Ryun: CWL is coming to DiracX, and the new “hints”: https://codimd.web.cern.ch/SllN13jAQNSG25MjHB8Swg?both
DiracX-web
- ?? Waiting for https://github.com/DIRACGrid/diracx-web/pull/484 before creating the first non-alpha release
Pilot
- Completed the removal of py2 support.
- If you really need python2, it is still possible by using the py2_eol tag: https://github.com/DIRACGrid/Pilot/tree/py2_eol
- But it will be rendered useless by https://github.com/DIRACGrid/DIRAC/pull/8339
Feature requests, and developers’ issues: inputs and prioritizations from communities
Nothing specific.
Prioritized backlog: communities input
https://github.com/orgs/DIRACGrid/projects/30/views/3 contains the prioritized backlog.
- objections?
- something from https://github.com/orgs/DIRACGrid/projects/30/views/7 ?
AOB
- CMS and DiracX:
- CMS wants to use DiracX as their Workflow Management System (basically, the production system). Their review concluded that it’s feasible. Technical work and contributions from CMS start now.
- 3 developers added to the DiracGrid organization in github, to the diracproject-users ML, and to the DiracX mattermost:
- Valentin Kuznetsov
- Alan Malta
- Todor Ivanov
- Certification machines
- All working as expected
- DIRAC as an “HSF affiliated project” : https://hepsoftwarefoundation.org/projects/affiliated.html
- Andrei sent out an updated answer
- CHEP is in about 40 days
Next appointments
- Few changes to some of the next meetings
- Federico will host the Ddev on April 30th
- There won’t be a Ddev on Thursday 14th of May (holiday at CERN), so it is moved forward to Wednesday 13th of May, same time
- The next DOps will be on Wednesday 20th of May, same time
- Federico will host the Ddev on May 28th (CHEP week)
- WS/hackathons/conferences:
- DiracX hackathon: 1st and 2nd of July
- registrations are open
- will also organize a social dinner this time, on the 1st of July
- 12th DUW: 13th-16th October
- registrations not yet open: waiting for Jiri