6th Open Science Practitioners Forum: Analysis Workflows
Discussion from the Chat
1. Training & Tutorials (ATLAS, RECAST, Snakemake, REANA)
Whether to include this topic in the analysis software trainings is still under discussion.
There used to be some RECAST-specific sessions, e.g.:
https://alexschuy.github.io/2020-08-27-usatlas-recast-tutorial/index.html
Clarification: the reference was to the week-long ATLAS software tutorials:
https://atlas-software.docs.cern.ch/analysis/
Additional training material and events also exist.
Request for Tutorials
Question:
Could you provide the link to Snakemake/REANA tutorials?
Response:
Snakemake tutorial:
https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html
REANA tutorial:
https://hsf-training.github.io/hsf-training-reana-webpage/
(Note: The REANA tutorial mostly uses Yadage; Snakemake parts still need to be finalized.)
2. REANA Usage (LHCb Example)
REANA usage in LHCb is currently limited.
One example involved running an analysis on REANA using Snakemake underneath, which made setup relatively smooth:
https://indico.cern.ch/event/1380367/contributions/5880485/attachments/2831210/4946726/M_Sarpis_LHCb_Analysis_with_Snakemake.pdf
3. Snakemake with HTCondor
Snakemake works on lxplus via HTCondor using the following profile:
https://github.com/Snakemake-Profiles/htcondor
Executor plugin repositories:
- https://github.com/jannisspeer/snakemake-executor-plugin-htcondor
- https://github.com/htcondor/snakemake-executor-plugin-htcondor
Identified Issue
The HTCondor executor plugin does not allow specifying JobFlavour.
Jobs are submitted with the default 20-minute time limit, causing longer jobs to be aborted.
Possible Solution
Supporting the HTCondor ClassAd needed for JobFlavour should be straightforward; it may only require a documentation or implementation update.
According to the plugin README, custom job resources must be defined with a classad_ prefix, e.g.:
classad_JobFlavour
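To illustrate the naming convention, here is a minimal Python sketch of how classad_-prefixed resources could be turned into custom attribute lines of an HTCondor submit description. The helper classad_submit_lines is hypothetical and is not the plugin's actual code; how the plugin processes these resources internally is an assumption. The flavour value "longlunch" is used only as an example of a flavour longer than the default.

```python
# Hypothetical illustration only: this is NOT the executor plugin's code.
# It shows the idea behind the classad_ prefix: resources carrying the prefix
# become custom job attributes ("+Name = value") in a submit description.

CLASSAD_PREFIX = "classad_"


def classad_submit_lines(resources: dict) -> list[str]:
    """Return '+Attribute = value' lines for every classad_-prefixed resource."""
    lines = []
    for key, value in resources.items():
        if not key.startswith(CLASSAD_PREFIX):
            continue  # ordinary resources (memory, threads, ...) are skipped here
        attribute = key[len(CLASSAD_PREFIX):]
        # String values are quoted in ClassAd syntax; numbers are written as-is.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"+{attribute} = {rendered}")
    return lines


# Example resources as they might be attached to a long-running rule.
# "longlunch" is an assumed example flavour name; check the batch docs.
resources = {"mem_mb": 2000, "classad_JobFlavour": "longlunch"}
print("\n".join(classad_submit_lines(resources)))
```

Under this assumption, a rule that requests classad_JobFlavour would no longer be bound by the default 20-minute limit.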
Usability Feedback
It can be confusing that there are both:
- an HTCondor cluster plugin, and
- a profile that uses the generic-cluster plugin under the hood.
Some participants expressed willingness to help improve the plugin.
Additionally, some changes appear to have been submitted upstream:
https://github.com/jannisspeer/snakemake-executor-plugin-htcondor/pull/16
4. SLURM, MPI & GPU Support
Question:
Is Snakemake more modern now? Can we run SLURM/MPI jobs?
Response:
Yes, there is a SLURM executor plugin:
https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html
The SLURM plugin supports:
- MPI jobs
- GPU jobs
Extended SLURM feature support is on the roadmap.
5. Grid Executors (ATLAS, CMS, LHCb)
A current limitation of Snakemake is the lack of grid executor support.
Potential improvements:
- ATLAS: PanDA executor plugin
- CMS: CRAB integration
- LHCb: Possibly DIRAC
Such integrations would significantly increase adoption.
6. Workflow Visualization
Question:
Are there tools to draw Snakemake workflows as diagrams?
Response:
Workflow visualization is an integral part of Snakemake.
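As a concrete sketch: Snakemake can print the job graph in Graphviz dot format with --dag, or the coarser rule graph with --rulegraph, and the output can be piped into dot to produce an image. The Python wrapper below is only an illustration; the function names dag_command and render_svg are hypothetical, and it assumes that both snakemake and Graphviz (dot) are installed and on the PATH.

```python
import shlex
import subprocess
from pathlib import Path


def dag_command(snakefile: str = "Snakefile", rulegraph: bool = False) -> list[str]:
    """Build the Snakemake command that prints the workflow graph as dot."""
    flag = "--rulegraph" if rulegraph else "--dag"
    return ["snakemake", "--snakefile", snakefile, flag]


def render_svg(output: str = "dag.svg", **kwargs) -> Path:
    """Run Snakemake, pipe its dot output through Graphviz, and write an SVG.

    Assumes the 'snakemake' and 'dot' executables are available.
    """
    dot_source = subprocess.run(
        dag_command(**kwargs), check=True, capture_output=True, text=True
    ).stdout
    svg = subprocess.run(
        ["dot", "-Tsvg"], input=dot_source, check=True, capture_output=True, text=True
    ).stdout
    path = Path(output)
    path.write_text(svg)
    return path


# Show the command that would be run; call render_svg() to produce the file.
print(shlex.join(dag_command()))
```

The rule graph is usually the more readable choice for workflows with many jobs, since it shows one node per rule rather than one per job.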
7. Cross-Community Similarities & Computing Strategy
There are strong similarities between:
- Particle physics
- Astronomy
- Other scientific communities
Computational workflow needs are largely similar.
Sociological factors often play as large a role as technical ones.
There are also plans to unify approaches to scientific computing in overlapping areas (e.g., within the ESCAPE project), including tools such as:
- Rucio
- REANA
User interface design and well-structured examples were highlighted as important adoption factors.
8. Software Provisioning (EESSI)
There is a long-term plan to provide Snakemake via the European Environment for Scientific Software Installations (EESSI).
Development is ongoing, but progress is slow.