19–25 Oct 2024
Europe/Zurich timezone

Whole-node scheduling in the ALICE Grid: Initial experiences and evolution opportunities

23 Oct 2024, 13:48
18m
Room 2.B (Conference Room)

Talk Track 4 - Distributed Computing Parallel (Track 4)

Speaker

Marta Bertran Ferrer (CERN)

Description

JAliEn, the ALICE experiment's Grid middleware, uses whole-node scheduling to maximize utilization of the resources provided by participating sites. This approach offers flexibility in resource allocation and partitioning, allowing customized configurations that adapt to the evolving needs of the experiment. The scheduling model is gaining traction among Grid sites because of its initial performance benefits. In addition, understanding common execution patterns for different workloads enables more efficient scheduling and resource allocation strategies.

However, managing the entire set of resources on a node requires careful orchestration. JAliEn employs custom mechanisms to dynamically allocate idle resources to running workloads, ensuring overall resource usage stays within the node's capacity.
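As a rough illustration of the idea of handing idle resources to running workloads without exceeding the node's capacity, the sketch below distributes spare CPU cores round-robin among jobs that can use more. It is not JAliEn's actual mechanism, which the abstract does not detail; the `Job` class, its fields, the `allocate` function, and the example job names and core counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job description; not a JAliEn data structure.
    name: str
    requested_cores: int  # cores the job was scheduled with
    max_cores: int        # most cores the job could make use of
    granted_cores: int = 0


def allocate(node_cores: int, jobs: list[Job]) -> int:
    """Grant each job its request, then share idle cores among jobs
    that can use more. Returns the number of cores left idle."""
    total_requested = sum(j.requested_cores for j in jobs)
    if total_requested > node_cores:
        raise ValueError("requests exceed node capacity")

    for j in jobs:
        j.granted_cores = j.requested_cores
    idle = node_cores - total_requested

    # Hand out one core per job per round until nothing is idle
    # or no job can use an extra core.
    progress = True
    while idle > 0 and progress:
        progress = False
        for j in jobs:
            if idle == 0:
                break
            if j.granted_cores < j.max_cores:
                j.granted_cores += 1
                idle -= 1
                progress = True
    return idle


# Example: a 16-core node running three jobs of different sizes.
jobs = [
    Job("simulation", requested_cores=8, max_cores=8),
    Job("reconstruction", requested_cores=4, max_cores=8),
    Job("analysis", requested_cores=1, max_cores=2),
]
idle_cores = allocate(16, jobs)
granted = {j.name: j.granted_cores for j in jobs}
```

Because every grant is taken from the idle pool, the sum of granted cores can never exceed `node_cores`. A real implementation would also have to track memory and react to jobs starting and finishing, which this sketch leaves out.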

This paper evaluates the experiences of the first sites using whole-node scheduling and highlights the model's suitability for accommodating jobs with varying resource demands, particularly those with high memory requirements.
