Speaker
Jeff Templon
(NIKHEF (NL))
Description
With the advent of workloads containing explicit requests for multiple
cores in a single grid job, grid sites faced a new set of challenges
in workload scheduling. The most common batch schedulers deployed at
HEP computing sites do a poor job of multicore scheduling when using
only the native capabilities of those schedulers. This talk describes
how efficient multicore scheduling was achieved at the three sites
represented in the author list, by implementing dynamically-sized
multicore partitions via a minimalistic addition to the Torque/Maui
batch system already in use at those sites.
The first part of the talk covers the theory behind this particular
problem, which also applies to, for example, the scheduling of
large-memory or data-aware jobs that likewise form part of a
highly heterogeneous workload. The system design is also presented,
linking it to previous work at Nikhef on grid-cluster
scheduling. The second part of the talk presents an evaluation of
several months of production operation at the three sites.
Author
Jeff Templon
(NIKHEF (NL))
Co-authors
Dr Alessandra Forti
(University of Manchester)
Dr Antonio Perez-Calero Yzquierdo
(Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES))
Carlos Acosta Silva
(Universitat Autònoma de Barcelona (ES))
Jose Flix Molina
(Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES))
Ronald Starink
(Nikhef)