BiLD (Bi-weekly DIRAC Development meeting) – 28/05/2020
At CERN: Nobody, of course!
On Vidyo: Federico, Andrei, Andrii, André, Christopher, Christophe, Igor, Daniela, Hideki, Janusz, Alexandre, Simon, Cedric, Aymane, Vladimir
Apologies: Marko
Follow-up from previous meeting
NTR
DIRAC communities roundtable
GridPP:
- Working on v7r1 in the certification setup
- Not going very well
- Solved the issues with the proxy and M2Crypto
- Switched to HTCondorCEs at Imperial; submitting from DIRAC (v6r22) leaves many jobs in the “Held” state: is the proxy expiring?
- It is not clear whether this is due to DIRAC or to the CE
CLIC:
- Tried dirac-management for creating tarballs inside our CI
- Re-added the -D/--destination option for outputting tarballs
- Works
- Xroot5 in DIRACOS: Marko is working on it
LHCb:
- M2Crypto issues: all appear to be fixed
- One flag is still needed at high scale; it might become the default in one or two patches
- Pilot3 files: (also) served from S3-based web storage
- Running on HPCs: CINECA (all “standard”, but with fat KNL nodes) and SDumont (no CE, SLURM, fat nodes, limited CPU time)
- CINECA looks OK-ish, but there are some doubts about how DB12 behaves there. Also, several jobs are killed by the watchdog; to be investigated.
- One issue on SDumont: the computation of the CPU time left needs to be fixed (see the discussion in https://github.com/DIRACGrid/DIRAC/issues/4544)
France Grilles:
- Pilot3?
- There is a strong request to maintain the REST interface (RESTDirac extension)
- Maintaining it is a nuisance right now (it runs on a separate machine)
- Christophe: once the core of DIRAC talks HTTPS, this will be trivial
- Andrei: nevertheless, we have an operational issue right now.
EGI:
- Check-in being tested: resolved a few bugs on our side
- Running v7r0 in production
- Development machine is based on v7r1
Belle2:
- Migration to v6r22 ongoing
- Thinking about moving to Pilot3:
- Questions about how to provide the Pilot3 files and how the pilot wrapper works.
- Federico: we can’t put pilot files on CVMFS
- The dirac-distribution container does not work with the structure of BelleDIRAC, where the web and DIRAC extensions are merged
- Rucio and DIRAC:
- certification is starting in BelleDIRAC
- Mid-June: once validated, it will be committed to vanilla DIRAC
- Covers all the methods that were in the Lcg FC (in a way, this is specific to Belle2)
Nica:
- Updated DIRAC to v7r0p24
- Users need dirac-dms-* scripts in their jobs, which is still not possible (Pilot3)
- DIRACOS was being downloaded from lhcb-rpm.cern.ch, which effectively created a DDoS (Andrei should have sorted it out by uploading it to DIRACOS)
- New site in Mexico (SSH CE, Torque)
Juno:
- Moving from CREAM to HTCondor CEs (no issues)
- Planning to move to version 7
Current situation
DIRAC
- v6r22:
- v7r0:
- v7r1:
- v7r1p3 created, just inheriting changes from v6r22 and v7r0
- v7r2:
WebApp:
- Looking for a new person responsible for the WebApp, as Zoltan has left
- Tests fixed for v4r0, not v4r1 (prettier)
- Some tasks to look at
Pilot3:
DIRACOS:
- xroot5?
- Chris looking at it, taking over from Marko (who is on vacation). Not so easy. Evaluating the effort needed.
- Discussing how to evolve it into DIRACOSv2
VM:
- Igor: small fix in PR, to be merged.
Documentation:
OAuth2:
- Being tested in EGI framework
tornado and other externals
management
- All versions from releases.cfg uploaded to EGI CVMFS
- Andrei made a script to pick up all the packages from releases.cfg (stand-alone script)
- Still “private”, but it can and should be added to this repo
- Maybe it can be added to dirac-distribution
- https://github.com/DIRACGrid/DIRAC/issues/4604
- in general: we should be doing more automated deploys
diraccfg
Release planning, tests and certification
Weekly development(s) focus
NTR
DIRAC: current PRs and tasks being worked on, or topics from Google forum
PRs:
On issues:
AOB
Next BiLD in 2 weeks.
LHCbDIRAC
- Creation of releases: should be fine now (anyway, we’ll need more automation)
- M2Crypto:
- Looks fine; still running on only one machine, can be tried on more
- Client using M2Crypto: we should try it out ourselves first
- BKK
- Password also updated for the production instance
- Cert instance (INT12r) moved to 19c
- Need to update the Oracle Instant Client on the machine (in Puppet) [Chris]
- Somewhat more sensitive docs to go to https://lbdevops.web.cern.ch/lbdevops/
- Chris B: is it OK to use a certificate for downloading user proxies?
- A: there is no other solution; please document it in the docs above