BiLD (Bi-weekly DIRAC Development meeting) – 17/11/2022
On Zoom: Federico, André, Andrei, Alexey, Alexandre, Christopher, Daniela, Hideki, Igor, Michel, Simon, Janusz, Xiaomei, Vladimir
Apologies:
Follow-up from previous meetings
- hackathon November 10th, on 8.1.0a4:
- Mostly concentrated on running on all flavors of ARC
- some fixes to be done for AREX
- (Minor) issue with accounting
- Last BiLD 2 weeks ago
DIRAC communities roundtable
LHCb:
Federico+Alexandre+Christophe+Christopher+Alexey
- Running in production v7.3.3x
- LHCbDIRAC hackathon done on top of 8.0.4, no (big) issues found, so soon-ish we will deploy it in production
- Discussing migrating all ARC to ARC6, or maybe AREX
- Alexandre: tried to use AREXCE to push pilots to 10 ARC instances in the DIRAC certification environment
- Pilots fail in every instance
- In 9 of them, there is no output (same result if I push a simple ls command instead of a pilot, so I guess it comes from the ARC instances)
- In 1 of them (ce2.cis.gov.pl), I can get an output; the pilot does not seem to be able to contact DIRAC services, but I can see a non-empty user.proxy file. Need to investigate
- I also tried to push jobs with arc-ce02.pic.es (a special ARC instance leading to resources with no external connectivity) and there was no issue.
- Alexandre: conclusion: there are still many cases to investigate before using AREXCE everywhere. We will probably have to migrate at least a few instances to ARC6 before transitioning entirely to AREX
ILC/CLIC/FCC/Calice:
André
- Started setting up RHEL 9 servers for moving to DIRAC 7.3 and py3 servers
EGI:
Andrei
- Smooth running (v7r3, python3, Tornado for one server (catalog, TokenManager))
- One server with htcondor 9.11.2 for tokens testing.
- Asked the developer directly to backport to 9.0 on conda-forge. Should be re-compiled “properly”.
- Setting up Galaxy VO
- Preparing a PR for SiteDirector fixes being tested right now (for token-pilot submissions)
Belle2
Hideki
- Using v7r2
- Deployed ElasticSearch, issues with ComponentMonitoring
- Federico: ComponentMonitoring on ES never fully worked. What’s working:
- v7r2: WMSHistory
- v7r3: WMSHistory and JobParameters
- v8.0: “everything works” (it is documented), and ComponentMonitoring is split between Agent and Service Monitoring.
NICA
Igor
- Issues with the removal of MaxJobInFillMode. Thanks to Federico: #6549. Tried that instruction; the result will come in 3 days (default VOMS lifetime for NICA proxies).
- Rarely, the DIRAC Pilot install blocks during the conda.exe run: 100 processes are spawned and do nothing. The processes are related to /tmp/mamba***. Investigating.
IHEP/*
Xiaomei
- Using 7.3/py3
- Trying to configure central logging (following the guide)
GridPP:
Daniela+Simon+Janusz
- Production: v7.3.26
- Latest pre-prod: v7.3.33, still not fully happy (but then, are we ever?):
Topics from GitHub/Discussions or Google forum
Only unanswered topics below:
DIRAC releases
- v7r3
- v7.3.3
- Resources
- NEW: (#6526) add the ARCLogLevel option in ARC
- NEW: (#6526) add the ComputingInfoEndpoint in ARC6
- NEW: (#6502) Added network selection to CloudCE.
- FIX: (#6516) Support using ARC that was compiled against SWIG 4.1+
- A few more fixes for AREX added to the rel-v7r3 branch
- v8r0
- v8r1
- v8.1.0a4
- Framework
- NEW: (#6450) added TornadoComponentMonitoringHandler
- NEW: (#6450) added TornadoNotificationHandler
- NEW: (#6450) added TornadoUserProfileManagerHandler
DIRAC projects
DIRAC:
Issues by milestone:
- v7r3:
- v8.0:
- 18 open issues, 4 new since last time, 1 for Bdii2CSAgent might be solved already
- v8.1:
Other issues:
- Requirements and tasks for token-based pilot submissions
- much discussed/requested topic
- will hopefully be able to test in the hackathon next week
- will require v8.0+
- Simon: can we use the oidc-agent packaged with conda?
- Christopher: not really, its packaging “is a disaster”
- probably worth investigating why
PRs discussed:
WebApp:
- Some PRs with minor fixes merged in (v5.0 and v5.1)
Pilot:
- Updated devel version. Also tried in the hackathon; it does not seem to break anything existing.
- Updated devel to install py3 DIRAC by default
DIRACOS2:
Documentation:
OAuth2:
tornado/HTTPs
- from previous meeting v8.0.1 WebApp ssl issue
- PR not merged, maybe “not to merge”
- Federico: for a “full production” setup we are not there yet
- is nginx “mandatory”?
- can we run more instances?
- Andrei: there’s no “upload” solution for DIRAC SE
management
- from previous meeting 3 issues left, still valid
- Andrei: updated the script, should be uploaded here
diraccfg
- from previous meeting, Christopher: do we want to make a release that drops support for py2?
COMDIRAC
- Daniela: I made great progress sitting in the back of the WLCG workshop. However I will spend the next three weeks with what Simon calls derisively “management” and I call “making sure stuff gets funded”, so unless someone volunteers this project is on hold until then. As a reminder the pull request can be found under: https://github.com/DIRACGrid/DIRAC/pull/6403
DB12
Rucio
Tests
Release planning, tests and certification
Certification machines
- lbcertifdirac70 machine:
- lbcertifdiracoauth machine:
Next hackathon(s)
- Next week, on lbcertifdirac70 and with v8.1.0aX
AOB
CHEP2023
- abstracts submitted:
- Federico: going to do it today
- Daniela+Simon+Janusz: submitting a few abstracts
Confirmed next workshop as “DIRAC&Rucio workshop 2023”, KEK, 16-20 October 2023
- added to HSF calendar
- announced to Rucio WS last week
- no indico yet
Next hackathon: November 24th
Next BiLD: December 1st
LHCbDIRAC
- v10r4: deploy board in https://trello.com/b/kzUKdMts/deploy-v10r3
- https://lhcb-auth.web.cern.ch/
- from previous meeting, Andrei: this is not properly configured yet. Not usable for Pilot submission yet (compute scopes not added).
- Federico: for the certification setup, I have updated several CEs to be either ARC6 or AREX. Removed others, set dryRun=True for Bdii2CSAgent
- LHCbDIRAC hackathon