Hybrid analysis pipelines in the REANA reproducible analysis platform

Diego Rodriguez Rodriguez (CERN)


In this paper we introduce hybrid analysis pipelines in the REANA reproducible analysis platform and study the feasibility of running them. The REANA platform allows researchers to specify the computational workflow steps of an analysis declaratively and to execute the resulting workflow pipelines on remote containerised compute clouds orchestrated by Kubernetes. We have designed an abstract job controller component that allows different parts of an analysis workflow to be executed on different compute backends, such as HTCondor and SLURM. We have prototyped this design, including job execution, job monitoring, and the input/output file transfer mechanism between the various backends used in a computational workflow. We have tested the prototype using several model particle physics analyses. The present work paves the way towards supporting hybrid analysis workflows in the REANA reusable analysis platform and studies the reproducibility challenges inherent to using hybrid analysis patterns in particle physics data analyses.
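
The idea of an abstract job controller can be illustrated with a minimal Python sketch. It is not the REANA implementation: the class names (JobController, InMemoryController, Step), the backend labels, and the file names are all hypothetical, and the in-memory backend merely stands in for a real HTCondor or SLURM client. The sketch only shows how a common interface for staging files, submitting jobs, and monitoring them might let consecutive workflow steps run on different backends.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Step:
    """One workflow step; every field here is a hypothetical illustration."""
    name: str
    backend: str  # label selecting a controller, e.g. "htcondor" or "slurm"
    command: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


class JobController(ABC):
    """Backend-agnostic interface: stage in, submit, monitor, stage out."""

    @abstractmethod
    def stage_in(self, workspace: dict, files: list) -> None: ...

    @abstractmethod
    def submit(self, step: Step) -> str: ...

    @abstractmethod
    def status(self, job_id: str) -> str: ...

    @abstractmethod
    def stage_out(self, workspace: dict, files: list) -> None: ...


class InMemoryController(JobController):
    """Stand-in for a real backend; it only records what it was asked to do."""

    def __init__(self, label):
        self.label = label
        self.files = {}  # files visible on this backend
        self.jobs = {}   # job id -> status string

    def stage_in(self, workspace, files):
        for name in files:
            self.files[name] = workspace[name]

    def submit(self, step):
        job_id = f"{self.label}-{len(self.jobs) + 1}"
        # Pretend the job ran instantly and produced its declared outputs.
        for name in step.outputs:
            self.files[name] = f"produced by {step.name} on {self.label}"
        self.jobs[job_id] = "finished"
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]

    def stage_out(self, workspace, files):
        for name in files:
            workspace[name] = self.files[name]


def run_workflow(steps, controllers, workspace):
    """Run steps in order, moving files between the shared workspace and
    whichever backend each step names."""
    log = []
    for step in steps:
        controller = controllers[step.backend]
        controller.stage_in(workspace, step.inputs)
        job_id = controller.submit(step)
        if controller.status(job_id) != "finished":
            raise RuntimeError(f"step {step.name} did not finish")
        controller.stage_out(workspace, step.outputs)
        log.append((step.name, job_id))
    return log


controllers = {
    "htcondor": InMemoryController("htcondor"),
    "slurm": InMemoryController("slurm"),
}
workspace = {"data.root": "raw events"}
steps = [
    Step("skim", "htcondor", "skim.sh", ["data.root"], ["skim.root"]),
    Step("fit", "slurm", "fit.sh", ["skim.root"], ["plot.png"]),
]
log = run_workflow(steps, controllers, workspace)
```

In this sketch the output of the first step is copied back to a shared workspace and then staged in to the second backend, which is one simple way to picture the input/output file transfer between backends mentioned above.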

Diego Rodriguez Rodriguez (CERN), Jan Okraska (University of Warsaw (PL)), Rokas Maciulaitis (Ministere des affaires etrangeres et europeennes (FR)), Tibor Simko (CERN)

