Scheduling multicore workload on shared multipurpose clusters

14 Apr 2015, 14:30
15m
C209 (C209)

oral presentation Track6: Facilities, Infrastructure, Network Track 6 Session

Speaker

Jeff Templon (NIKHEF (NL))

Description

With the advent of workloads containing explicit requests for multiple cores in a single grid job, grid sites faced a new set of challenges in workload scheduling. The most common batch schedulers deployed at HEP computing sites do a poor job of multicore scheduling when using only their native capabilities. This talk describes how efficient multicore scheduling was achieved at the three sites represented in the author list by implementing dynamically sized multicore partitions via a minimalistic addition to the Torque/Maui batch system already in use at those sites. The first part of the talk covers the theory related to this particular problem, which is also applicable to, e.g., the scheduling of large-memory jobs or data-aware jobs that similarly form part of a highly heterogeneous workload. The system design is also presented and linked to previous work at Nikhef on grid-cluster scheduling. The second part of the talk presents an evaluation of several months of production operation at the three sites.

Primary author

Jeff Templon (NIKHEF (NL))

Co-authors

Dr Alessandra Forti (University of Manchester)
Dr Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES))
Carlos Acosta Silva (Universitat Autònoma de Barcelona (ES))
Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES))
Ronald Starink (Nikhef)
