BiLD – 31/10/2024
At CERN: Federico, André, Christopher, Alexandre, Ryunosuke, Cedric
On Zoom: Simon, Daniela, Hideki, Andrei, Vladimir, Xiaomei, Janusz
Apologies:
Follow-up from previous meetings
- Last BiLD was October 3rd
- Last DIRAC certification hackathon on October 10th
- CHEP 2024 19th-25th Oct, Krakow
- Federico points I retained:
- There is a WLCG Token Trust and Traceability WG that provides the same recommendations that we always heard for proxies (Daniela: Well, it tried to solve the same problems…)
- IAM talk. Daniela: Development still driven by WLCG, technical debt, MFA now implemented. It looks like the developers would like to have a proper multi-VO IAM as they clearly get pressured by their own immediate employer to do so :-)
- Alice is doing interesting stuff with their full-node submission
- Alice implemented cgroups for subdivision of WNs (we should do the same)
- “The correlation between HS23 and DB12 (whole) is “almost” acceptable”
- CMS is overloading
- Daniela would like to note that LHCb should not be overloading at the Tier2s. If you look at the actual numbers, CMS efficiency even with overloading rarely tops 75% at the Tier2s, while e.g. here at Imperial LHCb hovers around 98%
- Federico: don’t worry, LHCb was not thinking about doing the same!
- I found only one talk from DUNE
- Yes, JustIN, by LHCb export A McNab. (Comment by Daniela)
- Daniela: Apart from Federico’s plenary, DIRAC was also mentioned in Xiaomei’s JUNO plenary, and twice during parallel sessions (CTA & HERD)
DIRAC communities roundtable
LHCb:
Federico+Alexandre+Christopher+Vladimir
- Stressing the Transformation System, applying patches
- having thousands of Transformations, some of them having 10M+ jobs
- TransformationAgents seem “fine”, WorkflowTaskAgent performance needs to improve
- Stressing also the WMS, some jobs running twice (same jobID picked up by 2 different pilots)
- Sending jobs to the message broker blocks the rest; if the broker is down, issues arise. A timeout was added, but the single lock around logging is problematic
- patched in latest release
- error message coming from M2Crypto, sending negative data
- also patched in latest release
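The remark about the single lock around logging can be illustrated with a minimal sketch. It is not the patch that went into the release. It only shows, using the Python standard library, one common way to keep callers from waiting on a slow broker: the caller enqueues the record and a separate listener thread does the sending. `SlowBrokerHandler`, `build_logger` and the logger name are hypothetical names introduced for this example, and the broker is simulated with a sleep.

```python
import logging
import logging.handlers
import queue
import time


class SlowBrokerHandler(logging.Handler):
    """Hypothetical stand-in for a handler that ships records to a message broker."""

    def __init__(self, delay):
        super().__init__()
        self.delay = delay
        self.sent = []

    def emit(self, record):
        # Simulate a broker that is slow or close to timing out.
        time.sleep(self.delay)
        self.sent.append(record.getMessage())


def build_logger(name, broker_handler):
    """Return (logger, listener).

    The logger only puts records on a queue; the listener thread is the
    only place where the (slow) broker handler is called.
    """
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, broker_handler)
    listener.start()
    return logger, listener


if __name__ == "__main__":
    broker = SlowBrokerHandler(delay=0.1)
    logger, listener = build_logger("sketch.wms", broker)
    start = time.monotonic()
    for i in range(5):
        logger.info("job %d matched", i)
    elapsed = time.monotonic() - start
    listener.stop()  # processes everything still queued, then joins the thread
    print(f"caller blocked for {elapsed:.3f}s, broker received {len(broker.sent)} records")
```

With a handler attached directly to the logger, each call would hold the handler lock for the full broker delay. In this sketch the cost moves to the listener thread, at the price of records being delivered later and being lost if the process dies before the queue is drained.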
ILC/Calice/FCC
André
EGI
Andrei
- Setting up token based pilot submission with the production instance of Check-In (biomed)
- Setting up the diracx test system with K3S in parallel with DIRAC 9.0. Struggling with setting up the basic setup (cert-manager, DEX, …)
Belle2
Hideki, Cedric
- Completed migration to EL9, and to OpenSearch (still v1)
GridPP:
Daniela, Simon, Janusz
- We would like to upgrade to the latest v8 version, but we have too many problems with unrenewed proxies. Not every VO went down the WLCG route of 7 day proxies, so we tend to see them more often: https://github.com/DIRACGrid/DIRAC/discussions/7842
- Andrei: the current CEs are not renewing the pilot proxy, so at the moment the only way is to have long pilot proxies
- Alexandre: ARC is doing it (upgrading the delegation), I do not think HTCondorCE does it. Anyway this would not help with the bundled proxy that we are using now
- proxy renewing itself? does not seem “correct”
- Andrei: proxy delegation thread in SiteDirector?
- or maybe a Dirac-only internal token/proxy (without VOMS) only for renewing the user proxy
- maybe re-using the same solution used for the CloudCE. Simon will have a look
Topics from GitHub/Discussions
only un-answered topics with discussion updates:
DIRAC releases
DIRAC projects
DIRAC:
Issues by milestone:
Other issues:
PRs discussed:
WebApp:
- Sencha request (how it developed)
- Sencha made an official request “to EGI” asking for license explanation. After some exchange, Sencha got convinced that a paid license is not needed
- Upgrade ExtJS to 6.6? (right now we are using 6.2)
- …maybe NOT! (and no volunteers!)
Pilot:
- (from previous meeting) Janusz: some doc to write
DIRACOS:
Documentation:
- (from previous meeting) Need to decide on strategy for DiracX documentation – André to take care?
OAuth2:
management
- (from previous meeting) Always upload releases to CVMFS
- still not working (did not work for 8.0.53)
- Andrei created a new script, so PR needed
- Daniela: We just had a problem for a different VO (comet) where cvmfs would not distribute newly created directories, but happily update new content in old ones - not sure if this could be related?
diraccfg
DB12
Rucio
Tests
DiracX:
Issues
PRs discussed:
DiracX-charts:
DiracX-web:
- Alexandre
- Improvements to JobMonitoring app in a large-ish
- Added the user’s details at login
- Some tests for the extension need to be added
- asked Ryunosuke to be a reviewer for the DiracX-web PRs
Release planning, tests and certification
Certification machines
- No updates, using the old setup at CERN:
- lbcertifdirac70 for DIRAC code
- DBs from CERN
- DiracX running on paas.cern.ch (OpenShift)
Next hackathon(s)
Next appointments
AOB
- Projects for ISIMA students
- Deadline: mid-November
LHCbDIRAC
- BKK:
- on
- Several queries taking way longer than they should
- Chris “won’t survive another year” in the current conditions
- ConsistencyChecks performance improved
- Ryun: possibility of running on a subset of a BkQuery for prescaling
- Ryun’s DAG graphs: we can run “DAGs” in productions (sequentially)
- https://gitlab.cern.ch/-/snippets/3303
- CWL, one day
- (from previous meeting) Some runs without luminosity. Total luminosity is there.
- Frédéric Hemmer to check it
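On the note above that “DAGs” can be run sequentially in productions: a minimal sketch of how a dependency graph can be flattened into one valid sequential order, using the standard-library `graphlib` module (Python 3.9+). This is not the code in the linked snippet. The function `sequential_order` and the step names in `example_dag` are hypothetical and only serve the illustration.

```python
from graphlib import TopologicalSorter


def sequential_order(dag):
    """Return one valid sequential execution order for a DAG.

    `dag` maps each step name to the set of steps it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


# Hypothetical production steps; the names are illustrative only.
example_dag = {
    "simulation": set(),
    "reconstruction": {"simulation"},
    "stripping": {"reconstruction"},
    "merge": {"stripping", "reconstruction"},
}

if __name__ == "__main__":
    for position, step in enumerate(sequential_order(example_dag), start=1):
        print(position, step)
```

Every step appears after all of its dependencies, so running the steps one after another in this order respects the graph. Independent branches lose their potential parallelism, which is the trade-off of a sequential execution.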