BiLD-Dev
Bi-Weekly "Loyal" DIRAC developers meeting. And, following, the LHCbDIRAC developers meeting.
Zoom: BiLD
https://cern.zoom.us/j/62504856418?pwd=TU1kb01SOFFpSDBJeWVBdU9qemVXQT09
Meeting ID: 62504856418
Passcode: 12345678
BiLD (Bi-weekly DIRAC Development meeting) – 22/06/2023
At CERN: Federico, Alexandre, André, Christophe, Christopher, Simon M
On Zoom: Andrei, Cedric, Daniela, Igor, Michel, Simon F, Ueda, Hideki, Janusz, Xiaomei, Vladimir
Apologies:
Follow-up from previous meetings
- Last “standard” BiLD 4 weeks ago
- Trying to catch all updates below
- The BiLD of previous week was dedicated to DiracX, minutes added
- Next Thursday: BiLDx meeting
- In 2 weeks: DiracX hackathon
- Last hackathon long time ago
DIRAC communities roundtable
LHCb:
Federico+Alexandre+Simon+Christophe+Christopher+Alexey
- Installed (almost) latest DIRAC production release (v8.0.21)
- Got a request from DESY on moving to HTCondor 10 (only tokens)
- Alexandre being finalized, instructions on how to make this happen will be provided
- SAM/ETF tests had to be updated
- Issuer ID to be agreed, so ticket-ed all the HTCondor CEs
- Andrei add to VO card
- PRs needed
- Moved to use Singularity (Apptainer) “inner” CE everywhere
EGI
Andrei
- No updates on the production side
- Testing HTCondor and ARC with CheckIn tokens (multi-VO tokens), sites need to be configured. Not yet working with ARC.
- will be the same problem with Multi-VO IAM
CLIC/ILC/Calice
André
- NTR
Belle2
Hideki, Michel, Ueda
- Updating to py3
- ActivityMonitor stopped working
- Andrei what about removal of support for GSI?
- not yet fully decided
GridPP:
Daniela, Simon, Janusz
- Upgraded preprod server to v8.0.21
- Also done to test the fixes wrt dirac.cfg/pilot.cfg as noted here: https://github.com/DIRACGrid/Pilot/issues/188
- One bug found: https://github.com/DIRACGrid/DIRAC/pull/7067 (merged)
- Initial tests for AREX, HTCondor & Cloud worked (apart from the issues mentioned above).
- Expecting to upgrade main server within the next month. Current target date is July 5th. Users have been warned.
- A note on Operations: We finally found the root of our intermittent File Catalog overloads. A user had issued a malformed RMS request in January, and DIRAC didn’t give up even after 540000000+ (half a billion in case you don’t want to count the zeros) requests.
My poor server: http://www.hep.ph.ic.ac.uk/~dbauer/tmp/dirac01.png
In the end we had to hack the production database and set all the individual file requests to Failed by hand as just cancelling the request didn’t do anything:
update File set Status = 'Failed' where LFN like '/t2k.org/%' and Status = 'Waiting';
proved the solution to our troubles. Simon expects this to be re-implemented in ‘awesome’ (or possibly even in ‘configurable to not try eternally’) in diracX. Daniela is unimpressed. This seems to happen a lot to her lately.
- Christophe I am interested in the logs, because normally all the protections that you are asking for are already in place. “Cancel” is the nuclear option and I have never seen it fail
- André Is it using FTS? I have sometimes seen issues there
- Daniela Yes it is.
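The manual fix quoted above can be rehearsed safely before touching a production database. What follows is only a minimal sketch, not the procedure GridPP actually followed: it uses an in-memory SQLite database as a stand-in for the real RMS database, and the two-column `File` table as well as the helper names `fail_waiting_files` and `make_demo_db` are hypothetical. It counts the rows the statement would touch first (dry run) and only then applies the same update inside a transaction.

```python
import sqlite3


def fail_waiting_files(conn, lfn_prefix, dry_run=True):
    """Count 'Waiting' files under lfn_prefix; if not dry_run, set them to 'Failed'.

    Returns the number of rows that matched before any change was made.
    """
    pattern = lfn_prefix + "%"
    (matching,) = conn.execute(
        "SELECT COUNT(*) FROM File WHERE LFN LIKE ? AND Status = 'Waiting'",
        (pattern,),
    ).fetchone()
    if not dry_run:
        # 'with conn' commits on success and rolls back if the update raises.
        with conn:
            conn.execute(
                "UPDATE File SET Status = 'Failed' "
                "WHERE LFN LIKE ? AND Status = 'Waiting'",
                (pattern,),
            )
    return matching


def make_demo_db():
    """Build a tiny, hypothetical File table (assumed schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE File (LFN TEXT, Status TEXT)")
    conn.executemany(
        "INSERT INTO File VALUES (?, ?)",
        [
            ("/t2k.org/user/a.root", "Waiting"),
            ("/t2k.org/user/b.root", "Done"),
            ("/other.vo/c.root", "Waiting"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print("would update:", fail_waiting_files(db, "/t2k.org/"))
    print("updated:", fail_waiting_files(db, "/t2k.org/", dry_run=False))
```

Running the dry run first shows how many file entries would be affected, which is a cheap sanity check before a hand-edit of this kind.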
Juno
Xiaomei
- v8.0.21 on pre-production server.
- Later will try using tokens for CEs. Juno is using dedicated IAM instance.
JINR
Igor
- Trying to fix psutil issue reported in discussions (file busy). Some jobs had to be rescheduled.
- Can run JINR DIRAC from CVMFS.
- from previous meeting Pilots rather “large”, basically pointing to https://github.com/DIRACGrid/Pilot/issues/166
- a solution coded in https://github.com/DIRACGrid/Pilot/pull/187
Topics from GitHub/Discussions
only un-answered topics with discussion updates:
- Are you still using a HTCondor local schedd to submit pilots? And why?
- Still needed.
- AgentMonitoring and JobAgent entries in ES
- I guess it’s a bug…
DIRAC releases
- v7r3
- No patches created recently, but should be done
- v8r0
- v8r1
- NTR
DIRAC projects
DIRAC:
Issues by milestone:
v7r3:
- Only one issue left, about documentation. Christophe do you still want to take care?
- will close.
v8.0:
- 15+ open issues
- Closed some yesterday, some more of these might be closed/moved
- Mix of groups for ReportGeneratorHandler
- seen in LHCb production – might take care only on Web part
v8.1:
- 20+ open issues
- Some of these might be closed/moved
PRs discussed:
- [v7r3] Change and fix proxy renewal logic in AREX
- [8.0] fix the interactions between the Matcher and the PoolCE
- Tested and Merged
- [v8r0] Close DISET selector when we no longer need it
- Merged
- [8.0] fix: interacting with CEs using tokens
- Ready to be merged?
- [8.1] Validating the JDL format using pydantic
- OK to merge?
- [8.1] Parsing the jdl input by using the Job API
- To be kept or not? – not
WebApp:
Pilot:
- [integration] Remote Pilot Logger to Tornado
- (DIRAC PR)
- [devel] fully removed dirac-install and python2 DIRAC client installations
- OK to merge? (devel branch)
- master: add new command RegisterPilot
- Tested in Jenkins, OK?
- Setting up preinstalled DIRAC in a pilot
- Andrei going in the direction of what Chris is suggesting. Do we really need to install the other architectures?
- Christopher probably not, but there are ways to do that
- Andrei will update the PR. After maybe LHCb can use that. Mechanism for bashrc for different VOs will need to be agreed.
- fix: adding DIRACSYSCONFIG (for pilot.cfg) to local environment
- Merged
- Actions removed support for py2.7
- Should we try a workaround or we leave like this?
- Christopher coded a workaround during the meeting
DIRACOS2:
- Files not found when sourcing diracosrc with latest diracos release
- Fixed, waiting for release
- Updating DIRACOS2 to OpenSSL 3
- Fixed
- from previous meeting Python 3.11 seems to be fine
- Propose to update immediately after OpenSSL 3
- Should be OK to go: https://github.com/DIRACGrid/DIRACOS2/pull/78
Documentation:
- NTR
- Message from RTD for config file…?
- This was for the empty and obsolete "pilot" documentation project, which has now been deleted
OAuth2:
from previous meeting Andrei request from EGI to demonstrate that one VO can run with tokens only
- On that way. JobAgent should also be instrumented with that. User with only token can’t do that.
from previous meeting Check In is progressing: compute scopes available, they are accepting the idea of using client access tokens (possibility to associate a client to a given VO). They would probably not accept the same client dealing with both client and user access tokens (security concerns with the scopes available in the clients).
from previous meeting
- WLCG timeline document: https://zenodo.org/record/7014668#.YyLag9JBwQ9
- Meeting note from 8/12/2022: https://demo.hedgedoc.org/cWl2Y5MtTwSn4V5Uic_K5w
tornado/HTTPs
- from previous meeting Issue https://github.com/DIRACGrid/DIRAC/issues/6495 keeps track of what can and what will not be moved to https
- might not conclude this – might close it now!
management
- from previous meeting 3 issues left, still valid
diraccfg
- version 1.0? still tbd
COMDIRAC
Daniela
- I realize it’s fairly basic, but could we please merge https://github.com/DIRACGrid/DIRAC/pull/7024 before the next hackathon ?
- Done!
DB12
Alexandre
- NTR
Rucio
- NTR
Tests
- NTR
Release planning, tests and certification
Certification machines
- lbcertifdirac70 machine:
- NTR
- Federico no rush, but should we move to an Alma9 box?
- Outside of CERN would be better, in CC probably
- Andrei machine is already there, need to decide how to set this one up
- We could also use the new box to test the installation procedure
Next hackathon(s)
- in 2 weeks, “standard” v8.1.0aX one.
AOB
Next hackathon: in 2 weeks (just after the DiracX hackathon – either there or much later)
Next BiLD: July 27th
DIRAC+Rucio WS https://indico.cern.ch/event/1252369/
- Registration is open, do register, pay fee, take flights etc.
- block booking of few hotels
- note for VISA: https://indico.cern.ch/event/1252369/page/29513-visa-information
- poster prepared, I asked a few of you to make some publicity…
LHCbDIRAC
- v11.0: deploy board in https://trello.com/b/Ep0PAkbv/deploy-110
- Singularity CE everywhere
- Chris B “CTA apparently has something similar to the lbmcsubmit stuff so it might make sense to move some of my gitlab <-> dirac production system stuff to vanilla dirac.”
- Federico I talked with them yesterday, they will put on GitHub what they have
- LHCbDIRAC hackathon?
- no rush
- https://lhcb-auth.web.cern.ch/
- Used it in prod