BiLD (Bi-weekly DIRAC Development meeting) – 22/06/2023
At CERN: Federico, Alexandre, André, Christophe, Christopher, Simon M
On Zoom: Andrei, Cedric, Daniela, Igor, Michel, Simon F, Ueda, Hideki, Janusz, Xiaomei, Vladimir
Apologies:
Follow-up from previous meetings
- The last “standard” BiLD was 4 weeks ago
- We try to catch up on all updates below
- Last week's BiLD was dedicated to DiracX; minutes have been added
- Next Thursday: BiLDx meeting
- In 2 weeks: DiracX hackathon
- The last hackathon was a long time ago
DIRAC communities roundtable
LHCb:
Federico+Alexandre+Simon+Christophe+Christopher+Alexey
- Installed the (almost) latest DIRAC production release (v8.0.21)
- Received a request from DESY to move to HTCondor 10 (tokens only)
- Alexandre: being finalized; instructions on how to make this happen will be provided
- SAM/ETF tests had to be updated
- The issuer ID still has to be agreed, so all the HTCondor CEs were ticketed
- PRs needed
- Moved to using the Singularity (Apptainer) “inner” CE everywhere
EGI
Andrei
- No updates on the production side
- Testing HTCondor and ARC with Check-in tokens (multi-VO tokens); sites need to be configured. Not yet working with ARC.
- The same problem will arise with multi-VO IAM
CLIC/ILC/Calice
André
Belle2
Hideki, Michel, Ueda
- Updating to Python 3
- ActivityMonitor stopped working
- Andrei: what about removing support for GSI?
GridPP:
Daniela, Simon, Janusz
- Upgraded preprod server to v8.0.21
- Expecting to upgrade the main server within the next month. Current target date is July 5th. Users have been warned.
- A note on Operations: We finally found the root of our intermittent File Catalog overloads. A user had issued a malformed RMS request in January, and DIRAC didn’t give up even after 540000000+ requests (half a billion, in case you don’t want to count the zeros).
My poor server: http://www.hep.ph.ic.ac.uk/~dbauer/tmp/dirac01.png
In the end we had to hack the production database and set all the individual file requests to Failed by hand, as just cancelling the request didn’t do anything:
update File set Status = 'Failed' where LFN like '/t2k.org/%' and Status = 'Waiting';
proved to be the solution to our troubles. Simon expects this to be re-implemented in ‘awesome’ (or possibly even in ‘configurable to not try eternally’) in DiracX. Daniela is unimpressed. This seems to happen a lot to her lately.
- Christophe: I am interested in the logs, because normally all the protections that you are asking for are already in place. “Cancel” is the nuclear option and I have never seen it fail.
- André: Is it using FTS? I have sometimes seen issues there.
- Daniela: Yes, it is.
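To illustrate the “configurable to not try eternally” idea discussed above, here is a minimal Python sketch of a retry cap for a single file entry of a request. Everything in it is hypothetical: `FileRequest`, `process_with_cap`, the `max_attempts` parameter and the `Done` status are illustrative names, not the actual RMS or DiracX API; only the `Waiting`/`Failed` status values come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class FileRequest:
    """Hypothetical stand-in for one file entry of an RMS request."""
    lfn: str
    status: str = "Waiting"
    attempts: int = 0


def process_with_cap(file_req, operation, max_attempts=10):
    """Run one attempt of `operation` on a Waiting file entry.

    `operation` is a hypothetical callable returning True on success.
    Once `max_attempts` attempts have failed, the entry is marked Failed
    and later calls return immediately without retrying.
    """
    if file_req.status != "Waiting":
        return file_req.status
    file_req.attempts += 1
    if operation(file_req.lfn):
        file_req.status = "Done"
    elif file_req.attempts >= max_attempts:
        # Give up instead of retrying forever
        file_req.status = "Failed"
    return file_req.status


if __name__ == "__main__":
    # A permanently failing operation, e.g. caused by a malformed request
    req = FileRequest("/t2k.org/example/file")  # hypothetical LFN
    for _ in range(20):
        process_with_cap(req, lambda lfn: False, max_attempts=3)
    print(req.status, req.attempts)  # Failed 3
```

With a permanently failing operation, the entry stops being retried after `max_attempts` calls instead of being resubmitted indefinitely; where such a cap should live (per file, per operation, or per request) is a design choice left open here.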
Juno
Xiaomei
- v8.0.21 installed on the pre-production server.
- Will later try using tokens for CEs. Juno uses a dedicated IAM instance.
JINR
Igor
- Trying to fix the psutil issue reported in the Discussions (file busy). Some jobs had to be rescheduled.
- Can run JINR DIRAC from CVMFS.
- From the previous meeting: pilots are rather “large”, basically pointing to https://github.com/DIRACGrid/Pilot/issues/166
Topics from GitHub/Discussions
Only unanswered topics with discussion updates:
DIRAC releases
- v7r3
- No patches created recently, but one should be made
- v8r0
- v8r1
DIRAC projects
DIRAC:
Issues by milestone:
v7r3:
- Only one issue left, about documentation. Christophe, do you still want to take care of it?
v8.0:
v8.1:
- 20+ open issues
- Some of these might be closed/moved
Other issues:
PRs discussed:
WebApp:
Pilot:
DIRACOS2:
Documentation:
- NTR
- Message from RTD for config file…?
- This concerned the empty and obsolete "pilot" documentation project, which has now been deleted
OAuth2:
From the previous meeting: Andrei: request from EGI to demonstrate that one VO can run with tokens only
- On the way. The JobAgent should also be instrumented for this. A user with only a token can’t do that.
From the previous meeting: Check-in is progressing: compute scopes are available, and they accept the idea of using client access tokens (possibility to associate a client with a given VO). They would probably not accept the same client dealing with both client and user access tokens (security concerns with the scopes available in the clients).
from previous meeting
tornado/HTTPs
management
- From the previous meeting: 3 issues left, still valid
diraccfg
COMDIRAC
Daniela
DB12
Alexandre
Rucio
Tests
Release planning, tests and certification
Certification machines
- lbcertifdirac70 machine:
- Federico: no rush, but should we move to an Alma9 box?
- Outside CERN would be better, probably in CC
- Andrei: the machine is already there; we need to decide how to set it up
- We could also use the new box to test the installation procedure
Next hackathon(s)
- In 2 weeks, a “standard” v8.1.0aX one.
AOB
Next hackathon: in 2 weeks (just after the DiracX hackathon – either there or much later)
Next BiLD: July 27th
DIRAC+Rucio WS https://indico.cern.ch/event/1252369/
LHCbDIRAC
- v11.0: deploy board in https://trello.com/b/Ep0PAkbv/deploy-110
- Singularity CE everywhere
- Chris B: “CTA apparently has something similar to the lbmcsubmit stuff so it might make sense to move some of my gitlab <-> dirac production system stuff to vanilla dirac.”
- Federico: I talked with them yesterday; they will put what they have on GitHub
- LHCbDIRAC hackathon?
- https://lhcb-auth.web.cern.ch/