Luigi-based Workflow Management with b2luigi/law

Europe/Berlin
Description

Connection to Zoom:

https://cern.zoom.us/j/68140792288?pwd=RjQ3OS9DZzlwSjdaNHVLbXJ0ckRSdz09

    • 13:00-13:10
      Introduction 10m
      Speaker: Artur Il Darovic Gottmann (KIT - Karlsruhe Institute of Technology (DE))

      Main goal: a software tool for submitting complex HEP analysis workflows

      Example of such a workflow:

      1. Reconstruction and/or simulation of additional dataset samples
      2. Skimming of the required datasets
      3. N-tuplization of the skimmed samples for further analysis
      4. Creation of additional quantities based on the n-tuples (including, for example, NN training)
      5. Histogram production
      6. Statistical inference & plots

      Essential & nice-to-have features for workflow management:

      • Take dependencies between workflow steps into account
      • Start subsequent workflow steps automatically after the previous ones have finished
      • Support different batch system backends (e.g. HTCondor, SLURM, qsub, etc.)
      • Support for WLCG grid backends and, most probably, their submission tools (gbasf2, crab, ...)
      • Support for local & remote file access & transfer (gridftp, xrootd, webdav, ...)
      • Support for sending the software environment to the batch system (tensorflow environment, CMSSW, basf2, ...)
      • Support for pipelining workflow steps (i.e. running the next step on those parts of the data for which the previous step has finished, while other parts are still being processed by that step)
      • ...

      A very suitable starting point for all this is luigi, and that's why we are here :)
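      The first two requirements (taking dependencies into account and starting subsequent steps automatically) are what luigi's task model provides: a task declares what it requires and what it outputs, and it counts as complete once its output exists. In luigi this is written by subclassing luigi.Task and implementing requires(), output() and run(). The sketch below only mimics that model with the standard library, so the Task class and the build() function are hypothetical stand-ins, not luigi's API; the three steps loosely follow the example workflow and write placeholder files.

```python
import os
import tempfile


class Task:
    """Hypothetical minimal task mimicking luigi's requires()/output()/run() model."""

    def requires(self):
        return []  # upstream tasks that must be complete first

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError

    def complete(self):
        # Same idea as luigi's default: a task is complete if its output exists.
        return os.path.exists(self.output())


def build(task, log):
    """Run all incomplete tasks depth-first, dependencies before dependents."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


WORKDIR = tempfile.mkdtemp()


class Skim(Task):
    def output(self):
        return os.path.join(WORKDIR, "skim.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("skimmed events\n")


class Ntuple(Task):
    def requires(self):
        return [Skim()]

    def output(self):
        return os.path.join(WORKDIR, "ntuple.txt")

    def run(self):
        with open(self.requires()[0].output()) as src, open(self.output(), "w") as dst:
            dst.write("ntuple from " + src.read())


class Histograms(Task):
    def requires(self):
        return [Ntuple()]

    def output(self):
        return os.path.join(WORKDIR, "hists.txt")

    def run(self):
        with open(self.requires()[0].output()) as src, open(self.output(), "w") as dst:
            dst.write("histograms of " + src.read())


first_run = []
build(Histograms(), first_run)   # runs Skim, then Ntuple, then Histograms
second_run = []
build(Histograms(), second_run)  # all outputs exist already: nothing is re-run
print(first_run, second_run)
```

      Because completeness is derived from the outputs, deleting only the final histogram file and calling build() again would re-run just the last step.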

    • 13:10-13:40
      b2luigi 30m
      Speaker: Michael Eliachevitch (University of Bonn)

      Comments & questions:

      • Matthias: How many schedulers are running? How does it communicate with the grid? A wrapper is a fine solution for the time being.
        • Michael: Normally only one scheduler, but this can be configured via parameters; the scheduler created via the parameters of the task can be re-used. This works with the batch system. Grid: b2luigi wraps the Belle 2 grid submission (gbasf2) and relies on the job management of the wrapped tool. Working with DIRAC directly would perhaps allow more control over grid jobs, but is more complicated.
      • Marcel:
        • Question 1: Submitting many parallel jobs (slide 4): is this a daemon?
          • Michael: No, it runs in a single process, but this is part of the core of the tool, so it is difficult to answer in detail (JobWorkerSubmissionWorker).
        • Question 2 (slide 5): How is a batch system selected?
          • Michael: Via a task parameter: batch_system.
        • Question 3: Concept of dispatchable tasks: is it "just" Popen()?
          • Michael: Essentially yes, but instead of running a script you can write a Python function that is executed within it.
      • Artur: How is an interruption via CTRL+C handled?
        • Michael:
          • usual/local: all subtasks are killed.
          • grid: grid jobs are not killed, but their status is resumed.
          • HTCondor: jobs are not killed; jobs will be resubmitted if their output is not available.
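      The "essentially Popen()" answer and the local CTRL+C behaviour can be pictured with a small sketch: the work is started in a separate process with subprocess.Popen, its output goes to a log file, and if the controlling process is interrupted the child is killed. The dispatch() helper and the file names are hypothetical; this is not b2luigi's actual implementation.

```python
import os
import subprocess
import sys
import tempfile


def dispatch(cmd, log_path):
    """Run cmd in a separate process, writing stdout/stderr to log_path.

    Hypothetical helper: raises RuntimeError on a non-zero exit code and
    kills the child if the parent is interrupted (e.g. by CTRL+C).
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait()
        except KeyboardInterrupt:
            proc.kill()  # local mode: an interrupted parent kills its subtasks
            proc.wait()
            raise
    if returncode != 0:
        raise RuntimeError(f"dispatched command failed with exit code {returncode}")
    return returncode


log_file = os.path.join(tempfile.mkdtemp(), "job.log")
# The payload is a Python snippet here; it could equally be a shell script.
dispatch([sys.executable, "-c", "print('processing chunk 0')"], log_file)
with open(log_file) as f:
    print(f.read().strip())
```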


    • 13:40-14:10
      law 30m
      Speaker: Marcel Rieger (CERN)

      Comments & questions:


      • Michael & Marcel: The mapping of job IDs to tasks is stored in JSON files (for tracking finished & pending jobs).
      • Moritz & Michael: What about resubmitting jobs whose outputs were removed/deleted?
        • These are picked up properly.
      • Artur: What about HTCondor clusters of jobs?
        • Marcel: Not done yet, but planned ---> faster submission
      • Matthias: Input sandboxes are created per job ---> too many sandboxes are produced. This can be solved by HTCondor clustering.
        • Marcel: Bypassing the HTCondor input sandbox: a job manager and one job file factory per submission, creating only a shell script & a JDL file.
        • Marcel: Instead of creating a sandbox, create the environment yourself (within a task), upload it to a remote SE, and download it within the job.
      • Moritz: Are there examples of complicated, large-scale law workflows?
        • Marcel is sending them in the chat.
        • Matthias: available at ETP
      • Michael: Both b2luigi & law profit nicely from upstreaming parts to luigi.
        • Marcel: luigi tries to keep its scope limited; law is too large to become a contrib package (e.g. because of its storage part).
      • Moritz: What about documentation?
        • Marcel: wants to push it as soon as it is ready; 2 papers are in the pipeline.
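      The bookkeeping discussed in this session (a JSON file mapping job IDs to tasks, and resubmission of jobs whose outputs were deleted) can be sketched as follows. This is not law's file format or API: the function names, the notion of a "branch" as the index of one independent piece of work, and the job IDs are all hypothetical.

```python
import json
import os
import tempfile


def save_submission(path, job_map):
    """Persist {branch: job_id} so that a later invocation finds submitted jobs."""
    with open(path, "w") as f:
        json.dump(job_map, f, indent=2, sort_keys=True)


def load_submission(path):
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        # JSON object keys are always strings: convert branch numbers back to int
        return {int(branch): job_id for branch, job_id in json.load(f).items()}


def branches_to_submit(n_branches, job_map, finished_ids, output_exists):
    """Return the branches that need a (re)submission.

    A branch is skipped if its output exists, or if its job is known and has
    not ended yet (still pending/running). A branch whose job has ended but
    whose output is missing (e.g. deleted afterwards) is resubmitted.
    """
    todo = []
    for branch in range(n_branches):
        if output_exists(branch):
            continue
        job_id = job_map.get(branch)
        if job_id is not None and job_id not in finished_ids:
            continue  # still in the queue: do not submit twice
        todo.append(branch)
    return todo


workdir = tempfile.mkdtemp()
submission_file = os.path.join(workdir, "submission.json")
save_submission(submission_file, {0: "1234.0", 1: "1234.1", 2: "1234.2"})
job_map = load_submission(submission_file)
# Branch 0 is done, branch 1 still runs, branch 2 ended but its output was deleted,
# branch 3 was never submitted.
existing_outputs = {0}
todo = branches_to_submit(4, job_map, finished_ids={"1234.0", "1234.2"},
                          output_exists=lambda b: b in existing_outputs)
print(todo)  # [2, 3]
```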
    • 14:10-15:00
      Discussion 50m

      Questions as guideline:

      • What is the basic idea of the tool? What is its largest benefit?
      • A simple, minimal example of how to use it?
      • How is the integration of batch systems solved in general?
      • An example of an HTCondor-based task (since, as far as I can tell, this is where most of the overlap lies)?
      • Support of grid file-transfer tools/protocols?
      • Is it possible to outsource at least the batch-system integration (with extended support of grid file-transfer tools) into a common framework, which could then be used as a module by b2luigi and/or law?
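      For the HTCondor questions, and the job-clustering point raised in the law session, here is a sketch that generates a single submit description queuing all jobs of a task at once; HTCondor then places them in one cluster, and each job can pick its piece of work via the $(Process) macro. The helper name, the executable run_task.sh and the file names are hypothetical placeholders, not the JDL generated by b2luigi or law.

```python
def make_submit_description(executable, n_jobs, input_files=()):
    """Return an HTCondor submit description that queues n_jobs in one cluster.

    Hypothetical helper: each job receives its process number (0..n_jobs-1)
    as argument, so the executable can select the corresponding chunk of work.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",
        "output = job_$(Cluster)_$(Process).out",
        "error = job_$(Cluster)_$(Process).err",
        "log = job_$(Cluster).log",
    ]
    if input_files:
        lines += [
            "should_transfer_files = YES",
            "when_to_transfer_output = ON_EXIT",
            "transfer_input_files = " + ", ".join(input_files),
        ]
    # A single queue statement: all jobs end up in the same cluster.
    lines.append(f"queue {n_jobs}")
    return "\n".join(lines) + "\n"


print(make_submit_description("run_task.sh", 100, ["env.tar.gz"]))
```

      Written to a file, such a description would be passed to condor_submit once, instead of submitting 100 separate jobs.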

      The discussion covered:

      • including parts of law in b2luigi and vice versa
        • should be possible:
          • law parts are independent of each other and could be used within b2luigi
          • In addition, b2luigi could be used in place of luigi within law, so that law profits from b2luigi developments
          • Could some Belle 2-specific parts be put into law as a package?
          • What about crab and gbasf2 as grid backends for law?
      • gbasf2 vs. crab: what is the difference?
        • both can be used for analysis jobs
        • both offer some possibilities to ship additional software
        • but: with gbasf2 it is not possible to break out of basf2 analysis paths ---> maybe discuss this with the gbasf2 developers? (@Matthias)