Main goal: a software tool for submitting complex (HEP) analysis workflows
Example of such a workflow:
- Reconstruction and/or Simulation of additional dataset samples
- Skimming of required datasets
- N-tuplization of the skimmed samples for further analysis
- Creation of additional quantities based on the n-tuples (including for example NN training)
- Histogram production
- Statistical Inference & Plots
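The example workflow above can be viewed as a dependency graph. A minimal sketch using only the Python standard library (not luigi itself); the step names are hypothetical labels for the bullets above, and the strictly linear chain is an assumption:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps it depends on.
# Step names are hypothetical labels for the bullets above.
WORKFLOW = {
    "simulation": set(),
    "skimming": {"simulation"},
    "ntuplization": {"skimming"},
    "derived_quantities": {"ntuplization"},
    "histograms": {"derived_quantities"},
    "inference_and_plots": {"histograms"},
}


def execution_order(workflow):
    """Return the steps in an order that respects all dependencies."""
    return list(TopologicalSorter(workflow).static_order())


order = execution_order(WORKFLOW)
print(" -> ".join(order))
```

In luigi, the same relations would be declared per task via its requires() method rather than in one central dictionary.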
Essential & nice-to-have features for the workflow management:
- Take dependencies of different workflow steps into account
- Start subsequent workflow steps automatically once the previous ones have finished
- Support different batch system backends (e.g. HTCondor, SLURM, qsub, etc.)
- Support of WLCG grid backends & most probably their tools (gbasf2, crab, ...)
- Support of local & remote file access & transfer (gridftp, xrootd, webdav, ...)
- Support for sending the software environment to the batch system (tensorflow environment, CMSSW, basf2, ...)
- Support of pipelining of different steps of the workflow (i.e. executing the next step on the parts of the data for which the previous step has finished, while other parts of the data are still being processed by that step)
- ...
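Dependency handling, automatic start of subsequent steps, and pipelining can be illustrated together. A minimal sketch using only the Python standard library (not luigi itself); the step names, chunk names, and the rule "a job depends only on the previous step for the same chunk" are assumptions made for illustration:

```python
from graphlib import TopologicalSorter  # Python 3.9+

STEPS = ["skim", "ntuple", "histogram"]  # hypothetical step names
CHUNKS = ["part0", "part1"]              # hypothetical data chunks


def build_pipelined_graph(steps, chunks):
    """Job (step, chunk) depends only on (previous step, same chunk)."""
    graph = {}
    for chunk in chunks:
        for i, step in enumerate(steps):
            deps = {(steps[i - 1], chunk)} if i > 0 else set()
            graph[(step, chunk)] = deps
    return graph


sorter = TopologicalSorter(build_pipelined_graph(STEPS, CHUNKS))
sorter.prepare()

# Initially only the first step is ready, once per chunk.
ready = set(sorter.get_ready())

# Pretend "skim" has finished on part0 but is still running on part1.
sorter.done(("skim", "part0"))

# The next step for part0 becomes ready without waiting for part1.
newly_ready = set(sorter.get_ready())

print(sorted(ready))
print(sorted(newly_ready))
```

A real scheduler would submit each ready job to a batch system and call done() when the job reports success; here that is only simulated.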
A very suitable starting point for all this is luigi, and that's why we are here :)