25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

Aligning DIRAC Workflows with CWL: A Unified and Reproducible Workflow Model for Grid-Scale Computing

27 May 2026, 14:03
18m
MHMK 202


Oral Presentation
Track 4 - Distributed computing

Speaker

Ryunosuke O'Neil (CERN)

Description

Delivering reproducible computational workflows across heterogeneous and distributed computing infrastructures remains a significant challenge for many scientific communities. Workflow standards such as the Common Workflow Language (CWL) offer a portable, declarative way to describe complex pipelines, but integrating them into large-scale, data-driven workload management systems is still an open and evolving area.

DIRAC is a workload and workflow management system used by scientific collaborations to operate distributed computing resources spanning grids, clouds, and high-performance computing systems. While DIRAC provides mature mechanisms for job scheduling, data management, and large-scale productions, it has historically relied on a combination of DIRAC-specific workflow descriptions expressed through Python APIs, XML payloads, and Job Description Language (JDL) files. This fragmentation complicates interoperability with external workflow tools and limits the reuse of workflows outside the DIRAC ecosystem.

In this paper, we present the current status of integrating CWL into DIRAC as a unified workflow specification. Rather than using CWL as a simple submission or translation layer, we progressively align the DIRAC workflow model with CWL semantics. CWL is used to express workflow structure, execution steps, resource requirements, and containerized execution environments, while DIRAC retains responsibility for data handling and large-scale execution on distributed resources. This work is conducted in the context of DiracX, the next-generation evolution of DIRAC, and builds on early technical exchanges with CWL maintainers.
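As an illustration of this split, the sketch below holds a CWL v1.2 CommandLineTool as a Python dict (the structure a YAML `.cwl` file loads into) that declares a container image and resource needs. A hypothetical helper then reads those requirements the way a workload manager might when scheduling. Python is used because DIRAC is written in Python. The container image, script name, helper functions, and output key names are all illustrative assumptions. They are not DIRAC or DiracX APIs and do not describe the authors' implementation.

```python
"""Sketch: a CWL CommandLineTool as a Python dict, plus a hypothetical helper
that extracts scheduler-facing hints from its requirements."""
from __future__ import annotations

from typing import Any

# CWL v1.2 CommandLineTool; image and script names are hypothetical placeholders.
simulation_tool: dict[str, Any] = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["python", "simulate.py"],
    "requirements": [
        {"class": "DockerRequirement", "dockerPull": "example.org/simulation:1.0"},
        # ramMin is expressed in mebibytes in CWL.
        {"class": "ResourceRequirement", "coresMin": 4, "ramMin": 2048},
    ],
    "inputs": {
        "n_events": {"type": "int", "inputBinding": {"prefix": "--events"}},
    },
    "outputs": {
        "events": {"type": "File", "outputBinding": {"glob": "events.root"}},
    },
}


def find_requirement(tool: dict[str, Any], cls: str) -> dict[str, Any] | None:
    """Return the first requirement of class `cls`, accepting both the list
    form and the map form that CWL allows for `requirements`."""
    reqs = tool.get("requirements", [])
    if isinstance(reqs, dict):  # map form: {"DockerRequirement": {...}}
        body = reqs.get(cls)
        return {"class": cls, **body} if body is not None else None
    for req in reqs:  # list form: [{"class": "DockerRequirement", ...}]
        if req.get("class") == cls:
            return req
    return None


def scheduling_hints(tool: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping from CWL requirements to scheduling hints.
    Key names and fallback values (1 core, 256 MiB) are illustrative only."""
    resources = find_requirement(tool, "ResourceRequirement") or {}
    docker = find_requirement(tool, "DockerRequirement") or {}
    return {
        "cores": resources.get("coresMin", 1),
        "ram_mib": resources.get("ramMin", 256),
        "container": docker.get("dockerPull"),
    }


if __name__ == "__main__":
    print(scheduling_hints(simulation_tool))
```

In this sketch the tool description only states *what* the step needs. Deciding *where* it runs and how its input and output data are handled is left to the workload manager, which mirrors the division of responsibility described above.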

We report on the implementation status and initial feedback from scientific communities experimenting with CWL-based workflows within DIRAC. These early results highlight both the benefits and the remaining challenges of operating CWL workflows at grid scale. They also illustrate how adopting a standard workflow language can improve portability, interoperability, and reproducibility across distributed computing environments.


Co-authors

Jorge Lisa Laborda (IFIC)
Mr Loris Van Katwijk (LUPM IN2P3/CNRS)
Luisa ARRABITO (LUPM IN2P3/CNRS)
Dr Mykhailo Dalchenko (University of Geneva)
Natthan PIGOUX
Ryunosuke O'Neil (CERN)
Stella-Maria Renucci (LUPM IN2P3/CNRS)
