Description
ALICE Grid sites employ heterogeneous resource allocation policies, each tailored to the site's specific conditions, user communities, and local scheduling preferences. JAliEn has been designed and implemented to be flexible and adaptable to these varied configurations and execution systems, so that it can use the allocated resources as efficiently as possible.
In the typical Grid scheduling scenario, sites have traditionally run with 8-core slots. However, a significant number of sites have begun migrating to larger allocations, ranging from 16-core slots up to whole nodes, the latter being typical of HPC resources. Currently, half of the ALICE Grid CPU cores are still allocated in 8-core slots, while another third have already transitioned to whole-node scheduling. The remaining minority of resources are configured as 16-, 64-, or 96-core slots. Some recent nodes offer up to 640 cores in a single batch slot. Given its flexibility and its potential to mix workloads with differing resource requirements and usage patterns, whole-node scheduling has become our preferred resource allocation scheme. It also proves highly beneficial for the efficient management of heterogeneous computing resources, such as GPUs. Consequently, we are focusing our major development efforts on improving resource management within this scheduling model.
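The benefit of mixing workloads inside a single large slot can be illustrated with a minimal sketch. It is not JAliEn's scheduling logic: the `Slot` class, the `pack` function, the first-fit-decreasing heuristic, and the job mix are all hypothetical, chosen only to show why fixed 8-core slots cannot host jobs that need more cores, while one whole node of the same total size can.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A batch slot with a fixed number of CPU cores (hypothetical model)."""
    cores: int
    jobs: list = field(default_factory=list)

    @property
    def free(self) -> int:
        # Cores not yet claimed by any job placed in this slot.
        return self.cores - sum(self.jobs)


def pack(job_cores, slot_cores, n_slots):
    """Place jobs into identical slots using first-fit decreasing.

    Returns (slots, unplaced); unplaced lists the jobs that fit in no slot.
    """
    slots = [Slot(slot_cores) for _ in range(n_slots)]
    unplaced = []
    for need in sorted(job_cores, reverse=True):
        # Pick the first slot with enough free cores, if any.
        target = next((s for s in slots if s.free >= need), None)
        if target is None:
            unplaced.append(need)
        else:
            target.jobs.append(need)
    return slots, unplaced


# Assumed workload mix: cores requested per job (64 cores in total).
jobs = [32, 16, 8, 4, 2, 1, 1]

# The same 64 cores, offered as eight 8-core slots or as one whole node.
small_slots, small_left = pack(jobs, slot_cores=8, n_slots=8)
whole_node, whole_left = pack(jobs, slot_cores=64, n_slots=1)

print("8-core slots, unplaced jobs:", small_left)
print("whole node, unplaced jobs:", whole_left)
print("whole node, free cores:", whole_node[0].free)
```

In this toy case the 32-core and 16-core jobs cannot run at all in 8-core slots, whereas the whole node hosts every job with no idle cores. A production scheduler also has to account for memory, GPUs, and job duration, which this sketch ignores.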
This contribution describes how ALICE became one of the first LHC experiments to transition to whole-node scheduling and shares the experience accumulated over five years of operation, from the initial implementation targeting High-Performance Computing (HPC) facilities to the current large-scale deployment. It presents the key strengths of this scheduling model, highlighting the use cases where it has been essential for processing physics workloads critical to the experiment's scientific goals.