European HTCondor Workshop 2018

Name: European HTCondor Workshop 2018
Start: 2018-09-04T12:30:00+01:00
End: 2018-09-07T14:00:00+01:00
Location: RAL

4–7 Sept 2018

Europe/London timezone

Support

hepix-2018condorworkshop-support@hepix.org

Managing Cluster Fragmentation using ConcurrencyLimits

4 Sept 2018, 15:10

25m

CR12, R68 (RAL)

Science and Technology Facilities Council Rutherford Appleton Laboratory Harwell Campus Didcot OX11 0QX United Kingdom Tel: +44 (0)1235 445 000 Fax: +44 (0)1235 445 808 N 51° 34' 27.6" W 1° 18' 52.6" (51.57433,-1.31462)

HTCondor presentations and tutorials Workshop presentations

Speaker: Max Fischer (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))

Clusters running differently sized jobs can easily suffer from fragmentation: larger jobs need large chunks of free resources, but smaller jobs can occupy parts of these chunks, leaving the remainder too small. For example, clusters in the WLCG must provide space for 8-core jobs while under constant pressure from 1-core jobs. Common approaches to this issue are the DEFRAG daemon, custom scheduling order, and delays that protect free chunks.
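The fragmentation effect can be illustrated with a small arithmetic sketch. The cluster layout below (four nodes and their free core counts) is a hypothetical example, not data from GridKa; it only shows that idle cores in total do not imply room for an 8-core job.

```python
def can_place(free_cores_per_node, job_cores):
    """Return True if any single node has enough free cores for the job."""
    return any(free >= job_cores for free in free_cores_per_node)


# Hypothetical cluster: four 8-core nodes, each partly filled by 1-core jobs.
free_cores = [5, 6, 4, 7]

total_free = sum(free_cores)     # idle cores summed over all nodes
fits = can_place(free_cores, 8)  # whether one node alone can host 8 cores

print(total_free, fits)  # 22 False
```

Although 22 cores are idle, no single node offers 8 of them, so the 8-core job cannot start.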

At the GridKa Tier 1 cluster, which provides roughly 30,000 cores and is growing, we have developed a new approach that stays responsive and efficient at large scale. By tagging new jobs during submission, we can manage job groups using HTCondor's built-in ConcurrencyLimits feature. So far, we have successfully used this to enforce fragmentation limits for small jobs in our production environment.
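The idea of limiting a tagged job group can be sketched as a toy model. In HTCondor, a job names a limit through the `concurrency_limits` submit command and the pool sets the cap with a `<NAME>_LIMIT` configuration entry; the Python below only imitates that counting logic and is not the authors' configuration. The tag name `single_core`, the limit of 2, and the `admit` helper are assumptions made for illustration.

```python
from collections import Counter


def admit(jobs, limits):
    """Admit jobs in queue order, skipping any whose tag has reached its limit.

    jobs:   list of (job_id, tag) pairs; tag is None for untagged jobs
    limits: dict mapping a tag to the maximum number of running jobs
    """
    running = Counter()
    admitted = []
    for job_id, tag in jobs:
        if tag in limits and running[tag] >= limits[tag]:
            continue  # limit reached: this job stays idle
        if tag is not None:
            running[tag] += 1
        admitted.append(job_id)
    return admitted


# Hypothetical tag and limit, chosen only for illustration.
limits = {"single_core": 2}
queue = [(1, "single_core"), (2, "single_core"), (3, "single_core"), (4, None)]

admitted = admit(queue, limits)
print(admitted)  # [1, 2, 4]
```

The third tagged job is held back once two are running, while the untagged job is unaffected. This simplified model ignores that real limits are re-evaluated as jobs finish.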

This contribution highlights the challenges of fragmentation in large-scale clusters. Our focus is on scalability and responsiveness on the one hand, and on maintainability and configuration overhead on the other. We show how our approach integrates with regular scheduling policies, and how we achieve proper utilisation without micromanaging individual resources.

Author: Max Fischer (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))

Co-authors: Manfred Alef (Karlsruhe Institute of Technology (KIT)), Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))

Presentation materials: gridka_fragmentation.pdf
