BiLD-Dev
Bi-weekly "Loyal" DIRAC developers meeting, followed by the LHCbDIRAC developers meeting.
Zoom: BiLD
https://cern.zoom.us/j/62504856418?pwd=TU1kb01SOFFpSDBJeWVBdU9qemVXQT09
Meeting ID: 62504856418
Passcode: 12345678
BiLD – 27/03/2025
At CERN: Federico, Christophe, Christopher, Alexandre, Ryunosuke, Robin, Theau
On Zoom: Simon, Janusz, Hideki, Henryk, Alexey, Jorge, Vladimir, Xiaomei, Daniela, Luisa, Cedric
Apologies: André
Follow-up from previous meetings
- Last BiLD was March 6th
- Last DIRAC certification hackathon on March 20th: https://github.com/orgs/DIRACGrid/projects/24
- the last for v9?
- pretty OK, only really minor issues found
- Daniela How are everyone’s CHEP reviews coming along?
DIRAC communities roundtable
LHCb:
Federico+Alexandre+Christophe+Christopher+Ryun+Vladimir+Alexey+Robin+Theau
- On Monday we will start the migration to DIRAC v9
- it will take a long time because we are taking the opportunity to do many database updates:
- update MySQL to 8.4
- update ROW_FORMAT to “Dynamic” (many tables were created when the default was “Compact”)
- update character set to utf8mb4
- defrag
- the many DB changes for DIRAC v9 (this is the only “obligatory” change)
- we’ll also deploy lhcbdiracx and lhcbdiracx-web
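The per-table steps above (ROW_FORMAT, character set, defrag) could be scripted roughly as below. This is a minimal sketch, not the LHCb procedure: the helper name and the database/table names are hypothetical, and it assumes InnoDB tables on MySQL 8.x, where OPTIMIZE TABLE is carried out as a table rebuild. It only builds the statements; it does not connect to any database.

```python
"""Hypothetical helper: build the MySQL statements for the per-table
maintenance listed above (ROW_FORMAT, character set, defragmentation)."""


def maintenance_statements(database: str, tables: list[str]) -> list[str]:
    """Return the ALTER/OPTIMIZE statements for each table, in order."""
    stmts = []
    for table in tables:
        name = f"`{database}`.`{table}`"
        # Switch rows to the DYNAMIC format (tables created when COMPACT was the default)
        stmts.append(f"ALTER TABLE {name} ROW_FORMAT=DYNAMIC;")
        # Convert the table default and its text columns to utf8mb4
        stmts.append(f"ALTER TABLE {name} CONVERT TO CHARACTER SET utf8mb4;")
        # For InnoDB, OPTIMIZE TABLE is mapped to a rebuild that reclaims fragmented space
        stmts.append(f"OPTIMIZE TABLE {name};")
    return stmts


if __name__ == "__main__":
    # Placeholder names, not the real LHCbDIRAC schema
    for stmt in maintenance_statements("ExampleDB", ["ExampleTable"]):
        print(stmt)
```

Without a COLLATE clause, MySQL uses the default collation for utf8mb4; pinning one explicitly may be safer if queries depend on case or accent sensitivity. Both ALTER statements already rebuild the table, so merging them into one statement would avoid repeated rebuilds on large tables.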
ILC/Calice/FCC
André
- NTR
Belle2
Hideki
- Smooth DIRAC operations, a few questions about preparing for the v9 migration
EGI
Andrei
- Some communities have problems with certificates and are asking to move to tokens instead, so they are looking to migrate soon to DIRAC v9 / DiracX. Their workflows are simple, so DiracX simple job management may be enough for this community.
- A young engineer will start on 1st May to work on GreenDIGIT/DIRAC
GridPP:
Daniela, Simon, Janusz
- Nothing to report on production.
Juno
Xiaomei
- NTR
CTAO
Natthan, Luisa
- Fully moved to IAM since the VOMS server at CC-IN2P3 is now stopped
- Issue with the ProxyManager -> we run it with Tornado until it is fixed (on the v9 instance)
- DiracX CA: which solution for a trusted CA?
- IGTF will not distribute Let’s Encrypt; maybe the Google one will be there by the end of the year
Topics from GitHub discussions and bots
- only unanswered DIRAC and DiracX topics with discussion updates:
- Opensearch question
- Safely pass secrets within jobs
- Federico some private communication
DIRAC releases
- v8.0.71
- FIX: (#8083) htcondor x509 unsupported version
- CHANGE: (#8072) conditionally reset the rlimit for xroot
- CHANGE: (#8070) Disable Bearer token for HTTPs unless upload/TPC
- NEW: (#8046) findFileByMetadata method for Rucio
DIRAC projects
DIRAC:
Issues by milestone:
- v8.0:
- Using cgroups to limit job resource usage
- Federico I was hoping Simon you would complete the job…
- v9.0:
- Left 2 bug reports in there
- After v9.0:
- NTR
- dirac-install-component can’t install Tornado service
- stompy errors in matcher
- DMS Error: Server error while serving listDirectory
PRs discussed:
- [8.0] fix: DISET and proxy location
- hotfixed in LHCb
- [8.0] Add PreferredURLPattern for URL sorting
- input from Belle2, which likes the idea of a global setting
- [9.0] feat: added foreign keys to PilotAgentsDB
- [8.0] Clear any non-UTF encodable environment variables in pilots
WebApp:
- from previous meeting One draft PR
Pilot:
- Robin started work on using the new pilot security model (DiracX) note
DIRACOS:
- New release https://github.com/DIRACGrid/DIRACOS2/releases/tag/2.52
- FIX: (#144) Remove setrlimit in XRootD
Documentation:
- NTR
OAuth2:
- NTR
management
- NTR
diraccfg
- NTR
DB12
- from previous meeting https://github.com/DIRACGrid/DIRAC/issues/7760#issuecomment-2482420604
- Federico proposed creating an “alternate” benchmark
Rucio
- NTR, apart from the fact that the Rucio metadata catalog PR was included in the last release.
Tests
- from previous meeting Federico started adding Rucio to DIRAC integration tests
- -> to Janusz
DiracX:
- Road Map : https://github.com/chaen/diracx/blob/roadmap/docs/ROADMAP.MD
- dependabot alert
- 4 alerts, but the last 3 are for the same issue
Issues
- Lack of foreign key use in databases could lead to orphan data
- Federico created DIRAC PR (would need a porting in DiracX if accepted)
- Open access and require auth not working inside a router
PRs discussed:
- feat: enable remote pilot logging system [MISSING AUTH]
- Federico last comment is “Waiting for the pilot authentication to be ready”, what does it mean exactly? Who is expecting it?
- Robin can work on it --> Adding pilot registrations and authentication
- chore: remove __all__ from non-__init__ files
- fromPreviousMeeting OTel proof of concept PR: https://github.com/DIRACGrid/diracx/pull/379
- This is a doc PR that is not yet mergeable. Christophe is giving instructions to Jorge on how to proceed
DiracX-charts:
- NTR
DiracX-web:
- Lots of “bump” auto PRs
Release planning, tests and certification
Certification machines
- NTR
Next hackathon(s)
- not sure…
- Federico We will tag DIRAC v9 (and diracx, web, etc.) in about a week.
Next appointments
Meetings:
- BiLD: in 2 weeks
WS/hackathons/conferences:
- DiracX hackathon: 5 and 6 May - https://indico.cern.ch/event/1501369/
- a few people registered already
- Dirac Users’ Workshop: 17th-20th September 2025 - https://indico.cern.ch/e/duw11
- registration is open, Xiaomei added some information about visas
AOB
- fromPreviousMeeting DIRAC was invited to be an “HSF affiliated project” : https://hepsoftwarefoundation.org/projects/affiliated.html
- Andrei, André, Federico met with Edoardo and Michel Jouvin for a few clarifications. Andrei will call a consortium meeting
LHCbDIRAC
- LHCbDiracX packages are now on PyPI (“release 0.0.1a3”)
- Alexandre posted an update to “Moving Job finalization step from the workflow to the JobWrapper: Transition Plan for Enhancing HPC Exploitation in DIRAC/LHCbDIRAC”, with connected draft PR “Draft: feat(wms): New LHCb workflows”
- Hardware replacement campaign 2025: OTG – instructions in https://clouddocs.web.cern.ch/using_openstack/resizing_a_vm.html
- “Resize implies a downtime depending on the amount of data on the disk and the load on the infrastructure during the resize. It may last from 15 minutes to 8 hours.”
- "On Linux VMs, it’s strongly recommended to execute fstrim -va to discard any unused filesystem blocks, making the resize operation quicker. "
- Affected machines:
VM Name             Recommended Action   Flavor       Compatible Flavor
diracxcertif        RESIZE               m2.large     m4.medium
lbcertifdirac70     RESIZE               m2.large     m4.medium
lbdiraclogstash01   RESIZE               m2.2xlarge   m4.xlarge
lbdiraclogstash02   RESIZE               m2.2xlarge   m4.xlarge
lbvobox300          RESIZE               m2.2xlarge   m4.xlarge
lbvobox401          RECREATE             m2.3xlarge   NONE
- last 4 in downtime next week?
Update plan: https://codimd.web.cern.ch/dnfwITCRRTSvhopGDjlHSA?both