Clusters running differently sized jobs can easily suffer from fragmentation: large jobs require large contiguous chunks of free resources, but smaller jobs can occupy parts of these chunks, leaving the remainder too small to use. For example, clusters in the WLCG must provide space for 8-core jobs while facing constant pressure from 1-core jobs. Common approaches to this issue are HTCondor's defrag daemon, custom scheduling order, and delays that protect free chunks.
At the GridKa Tier 1 cluster, which provides roughly 30,000 cores and is still growing, we have developed a new approach to stay responsive and efficient at large scale. By tagging new jobs during submission, we manage job groups using HTCondor's built-in ConcurrencyLimits feature. So far, we have successfully used this to enforce fragmentation limits for small jobs in our production environment.
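As a minimal sketch of how such tagging can look in HTCondor, a schedd-side job transform can attach a concurrency limit to matching jobs, and the negotiator enforces a pool-wide cap on that limit. The limit name (`small_jobs`), the cap value, and the 1-core matching rule below are illustrative assumptions, not the actual GridKa configuration:

```
# Negotiator configuration (assumed cap value for illustration):
# at most 5000 concurrently running jobs may hold the "small_jobs" limit.
SMALL_JOBS_LIMIT = 5000

# Schedd-side job transform (hypothetical name "TagSmall"):
# tag every 1-core job at submission with the concurrency limit.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) TagSmall
JOB_TRANSFORM_TagSmall @=end
    REQUIREMENTS RequestCpus == 1
    SET ConcurrencyLimits "small_jobs"
@end
```

Equivalently, users or submission tooling can set `concurrency_limits = small_jobs` directly in the submit file; the transform approach keeps tagging centralised and out of users' hands.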
This contribution highlights the challenges of fragmentation in large-scale clusters. Our focus is on scalability and responsiveness on the one hand, and maintainability and configuration overhead on the other. We show how our approach integrates with regular scheduling policies, and how we achieve proper utilisation without micromanaging individual resources.