BiLD-Dev
Bi-Weekly "Loyal" DIRAC developers meeting, followed by the LHCbDIRAC developers meeting.
Join Zoom Meeting
https://cern.zoom.us/j/91083694183?pwd=ZkdDY1I1YkJVc2o3UTdBY1BRZE15UT09
Meeting ID: 910 8369 4183
Passcode: 12345678
BiLD (Bi-weekly DIRAC Development meeting) – 24/06/2021
At CERN: Still, nobody :/
On Zoom: Federico, André, Andrei, Andrii, Cedric, Christophe, Christopher, Hideki, Simon, Vladimir
Apologies: Daniela
Follow-up from previous meetings
- Ran hackathon last week on 7.3.0a12:
- trello board here
- Usual few issues found, but nothing dramatically wrong
- Next week we’ll do it on py3 server
- PRs merging strategy?
- this is done, but some discussions emerged so not yet applied (see PR)
- Poll on using black or not: https://docs.google.com/forms/d/1c0B4l5AfZTg8UZXG4DPQw_OZ59YOTYd2leesnYiPhIY/
- 6 yes, 3 no
DIRAC communities roundtable
GridPP:
Daniela
- “we are testing v7r2 and I cannot get multicore jobs to work on HTCondorCEs. They get matched, but do not get assigned 8 cores by the batch system. It works for ARCs. I added a UKI-SOUTHGRID-RALPP to the certification server to be able to test this and it looks the same (e.g. 2978 & 2975).”
CLIC:
André
- CTA moving to production this week
LHCb:
Federico+Chris+Chris+Marko
- While running the first Real Data productions in a long time, found a couple of bugs
- The most important one concerns how TransformationManagerHandler discovers extensions of DBs
- This is NOT ONLY for TS, but potentially for all the services in DIRAC!
- PR discussed briefly, no objections
- looks like only Belle2 has DB extensions
- Installed ComponentSupervisionAgent on 2 machines, fix in PR (already merged) from Federico+André
- py3 client on CVMFS, advertised, not yet the default
- LHCbPilot (MR made, merged, functionality not yet used)
EGI (+ France Grilles):
Andrei
- from previous meeting M2Crypto and VOMS extension follow-up (issue)
- still ongoing
- from previous meeting ES with certificate access still slow?
- still ongoing
- Moved to using Pilot3
- Some users complain that the curl version in DIRACOS is very old (2009); can it be updated to something more recent?
- issue to be discussed on GitHub
Belle2
Hideki
- Migrated to v7r0. Currently working fine. Using DIRACOS (already from v6r22)
- Using Pilot3, also with extension (on CVMFS)
- Already starting to think about v7r1 migration
JINR
Igor
- working on JobLaunchPad, maybe could be useful also for vanilla DIRAC.
DIRAC releases
- v7r1p43:
- improved compatibility with v7r2
- v7r2p10 + v7r2p11:
- some py3 improvements
- (#5189) PoolComputingElement use concurrent.futures.ProcessPoolExecutor + adding a shutdown
- this fixes a rather important bug on Pool CE, that was exiting before the payloads were completed
- (#5199) remove CPUScalingFactor calls, should fix some TimeLeft calculations
- (#5185) Add DIRAC_FEWER_CFG_LOCKS environment variable to significantly improve multithreading performance in CS heavy workloads
- this should be removed “soon” anyway
- Christopher: still need DIRACOS for that.
- The integration test jobs seem to fail a bit more often than the others – should be checked
- v7r3-pre12 (7.3.0a12):
- (#5178) JobMonitoring does not need to look into TaskQueue
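For reference on (#5189) above, a minimal sketch of the pattern it names: submit payloads to a concurrent.futures executor, then call shutdown(wait=True) so the caller cannot return before the payloads finish. This is not the actual PoolComputingElement code; run_payload, run_all and the executor_cls parameter are hypothetical names used for illustration only.

```python
from concurrent.futures import ProcessPoolExecutor


def run_payload(job_id):
    """Stand-in for a payload (hypothetical); a real one would run a user job."""
    return job_id * job_id


def run_all(job_ids, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Submit every payload and wait for all of them before returning.

    executor_cls is an illustrative hook so the same logic can be exercised
    with a different concurrent.futures executor, e.g. a thread pool.
    """
    executor = executor_cls(max_workers=max_workers)
    try:
        futures = [executor.submit(run_payload, job_id) for job_id in job_ids]
    finally:
        # wait=True blocks until every submitted payload has completed,
        # so the caller cannot exit while payloads are still running.
        executor.shutdown(wait=True)
    return [future.result() for future in futures]
```

When used from a script, run_all([1, 2, 3]) would normally sit under an `if __name__ == "__main__":` guard, which process pools require on platforms that spawn rather than fork their workers.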
DIRAC projects
DIRAC:
Issues by milestone:
- v7r1:
- A couple of issues opened and already addressed
- v7r2:
- Not much movement there
- v7r3:
- Not much movement there either
PRs discussed:
- from previous meeting [v7r2] Helper functions to make working with errors easier
- several comments to it, deserves some discussion, but no updates
- py3:
- installation docs
- server installation improvements
- trying to respect as much as possible what’s the current way of doing things, so e.g. keeping the versions directory
- more fixes
WebApp:
- Issue: Distributing the WebApp for Python 3 installations
- Would need 2 more packages, proposals:
Pilot:
- NTR
DIRACOS:
- xroot5?
- NTR
- condor 9?
- NTR
DIRACOS2:
- from previous meeting fts-rest on py3?
- NTR
- Andrii: jwt + other token stuff need to be added here
VMDIRAC:
- from previous meeting Agreed to merge into DIRAC for v7r3
- Simon: not done yet
Documentation:
- PR with several updates, especially for py3 by Christopher + Federico
- André: I will commit fixes in there
OAuth2:
- from previous meeting Any news from issue WLCG and Token transition – reminders and requirements ?
- NTR
- from previous meeting Andrii forwarded https://docs.google.com/document/d/1ZQzElD866yV6t9AomeW6r-L9s-64pNB95OFDd1EZEbA/edit?ts=60814a09#heading=h.ur0csvxkk8tc which contains technical drawings about the proposed implementation.
- no news
- Discussed the py2/py3 support
- the agreement is that py2 does not need to be supported. As of now the py2 tests are failing because of the imports, and the integration tests because DIRACOS does not (and never will) contain the required libraries. DIRACOS2 (which is py3 only) will need to have the required libraries. The py2 tests should be shielded from running on the affected files.
- we will test these developments in py3 only, on the specific certification machine, but only after the first hackathon.
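One possible way to shield the py2 test runs from the affected files, sketched as a pytest conftest.py. This assumes the tests are collected with pytest; the file paths are hypothetical placeholders, not the real affected files.

```python
# conftest.py (sketch): keep Python 2 test runs away from py3-only modules.
import sys

# Hypothetical paths; the real list would name the test files that import
# the libraries missing from DIRACOS.
PY3_ONLY_TEST_FILES = [
    "tests/test_token_manager.py",
    "tests/test_id_provider.py",
]

# pytest reads the module-level name `collect_ignore` from conftest.py and
# does not collect the listed paths (relative to this conftest.py).
collect_ignore = PY3_ONLY_TEST_FILES if sys.version_info[0] < 3 else []
```

Under Python 3 the ignore list is empty, so the py3 runs still collect everything.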
tornado/HTTPs
- Federico opened PR for adding HTTPs Matcher
- moved to a draft after discussion, as Matcher isn’t the best example for moving to https (non-duplicatable). Many things in that PR still good, though
management
- from previous meeting 1 task left for deploying on CVMFS
- Not yet done, Marko left, Andrei should take over
- 3 PRs for py3 installations:
- install_site.sh
- WebApp compilation and distribution
- they should all be reviewed + merged before the hackathon next week
diraccfg
- one more release will need to go into DIRACOS
COMDIRAC
- NTR
other externals, including Rucio
- Cedric + Federico made the tests pass on the original PR targeting v7r0. PR was merged 2 weeks ago.
- all future work should be done on v7r2+
- merging in v7r2+ highlighted issues with rucio-clients package and python 3.9: https://github.com/rucio/rucio/issues/4670
- as of now, some workarounds have been added, to be reverted
- also asked for rucio on conda-forge
Release planning, tests and certification
- Certification machine(s)
- lbcertifdiracoath machine:
- installed test py3 server
- authcertif machine:
- NTR
- Next hackathon
- On py3, and on lbcertifdiracoath machine: https://github.com/DIRACGrid/DIRAC/wiki/Certifications#oauth-certification-hackathons
- will be a mini-hackathon for testing python3 server
- prepared a new board type: https://trello.com/b/rG7JLalo/py3-mini-hackathon
AOB and topics from Google forum
Next hackathon (py3) on July 1st: https://indico.cern.ch/event/1052653/
Next BiLD on July 8th: https://indico.cern.ch/event/1052655/
LHCbDIRAC
- https://trello.com/b/BQvEbteg/deploy-v10r2
- not much progress
- from previous meeting python3 client on CVMFS
- Ganga does not work --> now it does
- LHCbPilot would need to add a flag --> https://gitlab.cern.ch/lhcb-dirac/LHCbPilot/-/merge_requests/32
- SandboxStore: installed in prod
- 28% migrated - rsync-ing
- disabled the cleaning
- We need new boxes + duplicate the JobManager