BiLD (Bi-weekly DIRAC Development meeting) – 28/10/2021
At CERN: Federico, Alexandre, Christophe, Christopher, André (in our offices…)
On Zoom: Andrii, Cedric, Hideki, Janusz, Simon, Xiaomei
Apologies: Ueda, Andrei
Follow-up from previous meetings
- Sweeping: Applied also on WebAppDIRAC
- Andrei: new RFC/Discussion to replace ?
- hackathon of 21st October, 8.0.0a5:
- second hackathon on v8.0 pre-release
- not many issues found, but also not much new code specific to 8.0 went in
DIRAC communities roundtable
GridPP:
Simon+Janusz
- from previous meeting: issue with vomses/vomsdir on client installations
- created PRs for management and for DIRACOS2 so that X509_VOMSES and X509_VOMS_DIR fall back to the DIRAC-provided files if the variables point to a bad location
- should be fine, awaiting final proof
- testing 7.3 on dev instance. py2 server + py3 client
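The fallback behaviour described above could look roughly like the sketch below. It is only an illustration, not the code from the PRs: the helper name resolve_voms_location and the default paths used in the usage note are hypothetical, and only the variable names X509_VOMSES and X509_VOMS_DIR come from these notes.

```python
import os


def resolve_voms_location(var_name, dirac_default, environ=None):
    """Return the location to use for var_name (hypothetical helper).

    Keep the value already set in the environment if it points to an
    existing path, otherwise fall back to the DIRAC-provided location.
    """
    environ = os.environ if environ is None else environ
    current = environ.get(var_name)
    if current and os.path.exists(current):
        # The variable is set and points somewhere real: leave it alone.
        return current
    # Unset, empty, or pointing to a bad location: use the DIRAC files.
    return dirac_default
```

A client setup script could then call, for example, resolve_voms_location("X509_VOMSES", "/path/to/dirac/etc/vomses"), where the second argument is a placeholder for wherever the installation keeps its own files.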
ILC/CLIC/FCC:
André
LHCb:
Federico+Christophe+Christopher+Alexandre
- issues with Optimizers getting stuck are becoming more and more serious. Several fixes applied, but no final resolution yet.
EGI:
Andrii
- from previous meeting: tried an HTTP service with a FormationFileCatalog. Found that the HTTP URL does not take a specific service name; instead it assumes the service name is the same as the handler name
- Christophe: please open an issue for that, it’s probably fixable.
- Users trying to use HTTP, trying to move to py3 on specific server
Belle2:
Hideki
BES:
Xiaomei
- from previous meeting: tried to install 7.3 in order to update to Python 3. Tried with the new install_site.sh, which is failing somehow
Topics from GitHub/Discussions or Google forum
DIRAC releases
- v7r2p30:
- Core (#5492) Do not return failures when sending tasks to executors
- Core (#5461) Sleep instead of dropping requests when services are overloaded
- FrameworkSystem (#5492) ComponentInstaller: look also for HTTPs services
- TransformationSystem (#5509) Allow GroupSize parameter to be mutable, add setGroupSize to command transformation CLI
- WorkloadManagementSystem (#5480) delay to 45 minutes the execution of failover requests
- RequestManagementSystem several fixes
- 7.3.7:
- Resources (#5523) Use PilotManagerClient to get CE status instead of interrogating PilotAgentsDB in HTCondorCE
- Resources (#5500) VMDIRAC Utilities - added possibility to specify extra yum installable packages in the configuration
- Resources (#5500) VMDIRAC - added possibility to specify ssh connection to the VM in the configuration
- 8.0.05:
DIRAC projects
DIRAC:
Issues by milestone:
- v7r1:
- only 2 issues still open, for documentation, for Andrei
- v7r2:
- v7r3:
- v8.0:
- Andrii: I made some progress, reflected in the PRs
Other issues:
PRs discussed:
- v7r1 PRs have been closed
- we went through the existing PRs, brief comments for each of them.
WebApp:
- from previous meetings: problem with empty ComponentMonitoring data - to be checked
- need to use the DIRAC service, not go directly to ElasticSearch
- from previous meetings: started to make PRs to use BaseRequestHandler for the WebApp handlers
- Christophe: more documentation is needed. I am looking at the PRs in DIRAC, and it is taking me quite some time.
- black and the pr-sweeper added here too
- Federico: please do not merge PRs without review. No exceptions.
- PRs for patches and notes, should they be there?
- they are always creating conflicts, you can just push directly to the branches
Pilot:
- NTR
- black should be added here too (no pr-sweeper, as we don’t make releases).
DIRACOS:
DIRACOS2:
- from previous meeting: fts-rest on py3?
- fts3-rest is added
- the “official” client will be worked on “soon”
- from previous meetings: still missing rucio
- from previous meeting: a few issues with gfal2 and gsoap, working to put them in conda-forge
- soon it should be available
- Christophe: gsoap is only for SRM, so maybe it won’t be needed…? It is only used for tape. I will get news about it today
Documentation:
OAuth2:
tornado/HTTPs
- from previous meeting: in 8.0 one of the main changes is in the base services. Andrii is working on HTTP/REST DIRAC services that would accept both tokens and proxies (the focus is on tokens, also because NGINX does not accept proxies). This will be in 8.0
- Still, users will need to upload their proxies once per year, because not all the resources behind will digest tokens (and the DISET DIRAC services will also need to accept them)
management
diraccfg
COMDIRAC
- Added preparations for deploying to PyPI
- Added some actions for testing
- Started moving to py3
DB12
Alexandre
- from previous meetings Port to python3:
- next is (i) merge the PR (ii) make the package (iii) use it in DIRAC (iv) apply the “fix” 1.16 - 1.19.
- Alexandre: Imane’s PR has been merged. I created another PR to add a function that corrects the norm according to the Python version used and the CPU model. I also added the analysis and all the files needed to run it, as this will probably be needed in the future.
other externals, including Rucio
- from previous meetings: Janusz: 2 PRs open against rucio, one will be useful for defining scope without a rucio config
- one PR merged, second still hanging
- Done some changes on the DIRAC side. Worked together with Simon on debugging the file uploads. The rucio side of DIRAC does not easily work in multi-VO. Rucio meeting today to discuss that.
- changes quite substantial
Release planning, tests and certification
Certification machines
- lbcertifdirac70 machine:
- lbcertifdiracoauth machine:
- Federico: are there token-accepting test resources to try, to add to the setup there?
Next hackathon(s)
AOB
Next hackathon on November 4th:
Next BiLD on November 11th:
LHCbDIRAC
- Federico: the BKK is a lot of work. There are plenty of opportunities for SQL injection; looking to use cx_Oracle bind variables, see https://cx-oracle.readthedocs.io/en/latest/user_guide/bind.html
- should evaluate whether using SQLAlchemy directly would make sense
- Federico: first, in any case, the tests should be expanded
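To illustrate why bind variables help here, the sketch below contrasts a query built by string formatting with one that uses a named bind variable. It uses the standard-library sqlite3 module as a stand-in so that it runs anywhere; cx_Oracle follows the same DB-API pattern with :name placeholders. The table, columns, and function names are invented for the example and are not the BKK schema.

```python
import sqlite3

# In-memory stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, run INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (:name, :run)",
    [{"name": "a.dst", "run": 1}, {"name": "b.dst", "run": 2}],
)


def files_for_run_unsafe(connection, run):
    """Vulnerable: the value is pasted into the SQL text itself."""
    query = "SELECT name FROM files WHERE run = %s" % run
    return sorted(row[0] for row in connection.execute(query))


def files_for_run(connection, run):
    """Safer: the value travels separately as a bind variable."""
    query = "SELECT name FROM files WHERE run = :run"
    return sorted(row[0] for row in connection.execute(query, {"run": run}))


malicious = "1 OR 1=1"
leaked = files_for_run_unsafe(conn, malicious)  # condition is rewritten, every row matches
safe = files_for_run(conn, malicious)           # treated as a plain value, nothing matches
```

With the bound version the malicious string is only ever compared as data, so it cannot change the structure of the query. Expanded tests could feed inputs like this one to each BKK query method.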
- Christopher: big PR for getting ready for the lb-prod-run stuff. Moved the submitAndMatch test to GitLab, also using a secret in GitLab CI.
- TECH starting next week, program of work being developed in https://codimd.web.cern.ch/0VG3xjEWR3KXnlWT8osSYA?both
- time to submit a project for an ISIMA student; agreed on a generic project for “WMS” performance improvements, including:
- improve SiteDirector to use processpool. This should be a first, ~easy task
- replace the executor framework with a proper message/task queue, e.g. using celery. This should be properly investigated first.
- Christophe: the RMS could benefit from the same
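As a rough idea of what the first task could involve, the sketch below fans per-queue work out to a pool of worker processes. It uses the standard-library concurrent.futures pool as a stand-in for whatever pool implementation the project ends up using. The function names, the queue names, and the shape of the per-queue work are all assumptions made for illustration; none of this is actual SiteDirector code.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def submit_pilots_to_queue(queue_name, n_pilots):
    """Placeholder for the per-queue work (hypothetical).

    A real implementation would talk to the CE here; this one only
    reports back what it was asked to submit.
    """
    return queue_name, n_pilots


def submit_to_all_queues(pilots_per_queue, executor_cls=ProcessPoolExecutor, max_workers=4):
    """Run the per-queue submission in a pool and collect the results."""
    submitted = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = [
            pool.submit(submit_pilots_to_queue, queue_name, n_pilots)
            for queue_name, n_pilots in pilots_per_queue.items()
        ]
        # Collect results as each worker finishes, in completion order.
        for future in as_completed(futures):
            queue_name, count = future.result()
            submitted[queue_name] = count
    return submitted
```

The executor class is a parameter so that the same code can be exercised with a thread pool in tests. When the default process pool is used, the call should be made from under an if __name__ == "__main__": guard or from an imported module, since worker processes need to import the submitted function. The message/task queue option (e.g. celery) is a different design and is not sketched here.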