Description
ALICE has undergone a substantial software transformation between Run 2 and Run 3, adopting a message-passing, distributed-computing paradigm that unifies online and offline processing. Building on this shift, we present the Monte Carlo (MC) production framework developed within the O2DPG environment, which orchestrates complete Run 3 and Run 4 simulation workflows across the heterogeneous computing resources of the Worldwide LHC Computing Grid (WLCG).
The system is designed to fully exploit multicore sites while respecting the memory constraints common in large-scale MC campaigns. Its workflow engine represents algorithms as tasks in a dependency graph and executes them asynchronously, enabling fine-grained concurrency, automatic checkpointing, and stage-wise construction of simulation artifacts. Runtime monitoring allows the framework to adapt its scheduling dynamically and to prune temporary data aggressively, reducing storage pressure without user intervention. Additional features, such as background “hole-filling” jobs, built-in support for distributed execution, and the ability to mix programming languages or software versions at the task level, provide further flexibility for large-volume production.
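The abstract does not detail the scheduler's internals, so the following is only a minimal sketch of the general idea, under stated assumptions: tasks declare their upstream dependencies plus a CPU and memory estimate, a task starts only once its dependencies are finished and it fits into the remaining budget, and an intermediate file is deleted once its last consumer has run. The `Task` fields, the round-based execution model, and all task and file names are hypothetical and do not reflect the O2DPG workflow format or API. Checkpointing and runtime monitoring are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task description; not the O2DPG workflow schema.
    name: str
    needs: list = field(default_factory=list)    # names of upstream tasks
    cpu: int = 1                                 # requested cores
    mem_mb: int = 1000                           # estimated peak memory (MB)
    outputs: list = field(default_factory=list)  # files the task writes
    keep: bool = False                           # final product: never prune


def run_graph(tasks, max_cpu, max_mem_mb):
    """Simulate resource-aware dependency-graph execution in discrete rounds.

    In each round, every task whose dependencies are complete is started if it
    still fits into the remaining CPU and memory budget. All tasks started in a
    round are assumed to finish before the next round begins (a simplification
    of truly asynchronous execution). Returns (rounds, pruned_files).
    """
    by_name = {t.name: t for t in tasks}
    for t in tasks:
        if t.cpu > max_cpu or t.mem_mb > max_mem_mb:
            raise ValueError(f"task {t.name!r} exceeds the slot budget")
        for dep in t.needs:
            if dep not in by_name:
                raise ValueError(f"task {t.name!r} needs unknown task {dep!r}")

    # Count downstream consumers so that outputs can be pruned once unused.
    consumers = {t.name: 0 for t in tasks}
    for t in tasks:
        for dep in t.needs:
            consumers[dep] += 1

    done, rounds, pruned = set(), [], []
    pending = list(tasks)
    while pending:
        cpu_left, mem_left = max_cpu, max_mem_mb
        started = []
        for t in pending:
            ready = all(dep in done for dep in t.needs)
            if ready and t.cpu <= cpu_left and t.mem_mb <= mem_left:
                started.append(t)
                cpu_left -= t.cpu
                mem_left -= t.mem_mb
        if not started:
            raise RuntimeError("dependency cycle: no task can ever start")
        for t in started:
            pending.remove(t)
            done.add(t.name)
            for dep in t.needs:
                consumers[dep] -= 1
                upstream = by_name[dep]
                # Last consumer finished: the intermediate files can go.
                if consumers[dep] == 0 and not upstream.keep:
                    pruned.extend(upstream.outputs)
        rounds.append([t.name for t in started])
    return rounds, pruned


if __name__ == "__main__":
    # Toy MC chain with hypothetical task names and resource estimates.
    chain = [
        Task("gen", cpu=1, mem_mb=500, outputs=["gen.root"]),
        Task("sim", needs=["gen"], cpu=8, mem_mb=8000, outputs=["hits.root"]),
        Task("digi_tpc", needs=["sim"], cpu=4, mem_mb=6000, outputs=["tpcdigits.root"]),
        Task("digi_its", needs=["sim"], cpu=1, mem_mb=2000, outputs=["itsdigits.root"]),
        Task("reco", needs=["digi_tpc", "digi_its"], cpu=8, mem_mb=8000, outputs=["reco.root"]),
        Task("aod", needs=["reco"], cpu=1, mem_mb=1000, outputs=["AO2D.root"], keep=True),
    ]
    rounds, pruned = run_graph(chain, max_cpu=8, max_mem_mb=16000)
    print(rounds)
    print(pruned)
```

In this toy chain the two digitization tasks share a round because both fit into the budget together, and each intermediate file is released as soon as its last consumer has finished, while the final product is kept. A production engine would instead launch new tasks as soon as individual running tasks complete, rather than advancing in lockstep rounds.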
This contribution highlights the architecture, operational experience, and advantages of this system, demonstrating how it enables efficient, resilient, and scalable MC production for ALICE in the Run 3/4 era.