Description
Dynamic resource provisioning in the WLCG is commonly based on meta-scheduling and the pilot model. For a given set of workflows, a meta-scheduler computes the ideal set of resources; so-called pilot jobs integrate these resources into an overlay batch system, which then processes the original workflows. While this offers a high level of control and precision, the strong coupling between components limits scalability, flexibility and robustness. These shortcomings are more severe when workflows and resources are under limited control, such as when a WLCG site executes anonymous pilot jobs on external, non-WLCG resources.
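To make the coupling concrete, here is a minimal sketch of the meta-scheduling step. It assumes that each queued job is described only by its core request and that every pilot provides a fixed number of cores. The names `Job` and `plan_pilots` are hypothetical, and first-fit decreasing bin packing is a stand-in heuristic, not the algorithm of any particular WLCG meta-scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical workflow job: only its core request matters for planning.
    name: str
    cores: int


def plan_pilots(queue: list[Job], pilot_cores: int) -> list[list[Job]]:
    """Hypothetical meta-scheduler: pack queued jobs into pilots (first-fit decreasing).

    Each inner list is one pilot to submit, together with the jobs it is meant to run.
    The planner needs the complete job queue as input, which couples it to the workflows.
    """
    pilots: list[list[Job]] = []
    free: list[int] = []  # remaining free cores of each planned pilot
    for job in sorted(queue, key=lambda j: j.cores, reverse=True):
        if job.cores > pilot_cores:
            raise ValueError(f"{job.name} needs more cores than one pilot provides")
        for i, cores_left in enumerate(free):
            if job.cores <= cores_left:
                # Fits into an already planned pilot.
                pilots[i].append(job)
                free[i] -= job.cores
                break
        else:
            # No planned pilot has room: request an additional pilot.
            pilots.append([job])
            free.append(pilot_cores - job.cores)
    return pilots
```

For example, with `pilot_cores=8` and jobs requesting 8, 1, 4, 4 and 1 cores, this planner requests three pilots. Any change to the queue or to pilot availability invalidates the plan, which illustrates how tightly the meta-scheduler is tied to both workflows and resources.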
In order to integrate dynamic resources, the GridKa Tier 1 centre has developed a new approach to dynamic provisioning that is suitable for the WLCG and beyond. By design, our approach decouples the distinct responsibilities of workflow scheduling, resource provisioning and meta-scheduling. Instead of seeking an optimal solution for a coupled scheduling and meta-scheduling problem, we divide the task into composable but isolated, self-balancing domains. Not only does this naturally provide scalability, flexibility and robustness, it also allows us to manage a variety of resources and situations in a common way. We have successfully used our approach to provision HPC and Cloud resources to the WLCG, as well as to manage abstract resources in the form of Multi-Core and Single-Core allocations.
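A minimal sketch of what one isolated, self-balancing domain could look like, assuming the domain sees only aggregate metrics of its resource pool. The `Pool` and `LinearBalancer` classes, the metric names `utilisation` and `allocation`, and the threshold values are hypothetical illustrations, not the GridKa implementation. The controller never inspects individual workflows; it only raises the pool's demand when nearly all resources are claimed and lowers it when most of them sit idle.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical resource pool: the controller sees only these aggregate metrics."""
    demand: float = 0.0       # resources this domain would like to have
    supply: float = 0.0       # resources currently available
    utilisation: float = 0.0  # fraction of supplied resources doing useful work (0..1)
    allocation: float = 0.0   # fraction of supplied resources claimed by any work (0..1)


@dataclass
class LinearBalancer:
    """Hypothetical feedback controller for one self-balancing domain."""
    low_utilisation: float = 0.5
    high_allocation: float = 0.9
    rate: float = 1.0  # resources added or removed per regulation step

    def regulate(self, pool: Pool) -> None:
        # Nearly everything is claimed: ask for more resources.
        if pool.allocation > self.high_allocation:
            pool.demand += self.rate
        # Otherwise, if much of the pool idles: release resources, never below zero.
        elif pool.utilisation < self.low_utilisation:
            pool.demand = max(0.0, pool.demand - self.rate)
        # In between, the domain is balanced and demand stays unchanged.
```

Because each balancer reads and writes only its own pool, several such domains (for example, one per HPC or Cloud provider) can run side by side without a global optimiser. This trades the optimality of a coupled meta-scheduler for independence between domains.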
This contribution discusses the benefits and limitations of our new approach to dynamic resource provisioning and compares it to competing approaches.