Efficient provisioning for multicore applications with LSF

Not scheduled
15m
OIST


1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495
poster presentation Track6: Facilities, Infrastructure, Network

Speaker

Stefano Dal Pra (INFN)

Description

Tier-1 sites providing computing power for HEP experiments are usually tightly designed for high-throughput performance. This is pursued by reducing the variety of supported use cases and tuning performance for those that remain, the most important of which has been single-core jobs. Moreover, the usual workload is saturation: every available core in the farm is in use, and queued jobs are waiting for their turn to run. Enabling multicore jobs thus requires dedicating a number of hosts to them and waiting for those hosts to free the needed number of cores. This drain time introduces a loss of computing power driven by the number of cores that sit empty and unusable. As demand for multicore-capable resources has increased, a Task Force has been constituted in WLCG with the goal of defining a simple and efficient multicore resource provisioning model. This paper details the work done at the INFN Tier-1 to enable multicore support for the LSF batch system, with the intent of minimizing the average number of unused cores. The adopted strategy is to dedicate to multicore jobs a dynamic set of nodes, whose size is mainly driven by the number of pending multicore requests and the fairshare priority of the submitting user. The node status transition, from single-core to multicore and vice versa, is driven by a finite state machine implemented in a custom multicore director script running in the cluster. After describing and motivating both the implementation and the details specific to the LSF batch system, results on performance are reported. Factors with positive and negative impact on the overall efficiency are discussed, and solutions to reduce the negative ones as much as possible are proposed.
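
The node status transition described above can be illustrated with a minimal Python sketch. It is not the director script used at the INFN Tier-1: the state names, the `next_state` function, and its parameters are hypothetical, and the fairshare priority that the abstract mentions is left out for brevity. The sketch only shows how a finite state machine might move one node between single-core and multicore use.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical states for one worker node (names are assumptions)."""
    SINGLE = "single-core"   # node accepts single-core jobs
    DRAINING = "draining"    # no new single-core jobs; waiting for cores to free
    MULTI = "multicore"      # node accepts multicore jobs


def next_state(state, pending_multicore, free_cores, cores_needed,
               running_multicore):
    """Return the next state of one node.

    pending_multicore: number of multicore jobs waiting in the queue
    free_cores:        cores currently empty on this node
    cores_needed:      cores requested by one multicore job
    running_multicore: multicore jobs currently running on this node
    """
    if state is NodeState.SINGLE:
        # Start draining only when there is multicore demand.
        if pending_multicore > 0:
            return NodeState.DRAINING
        return NodeState.SINGLE

    if state is NodeState.DRAINING:
        # Demand vanished while draining: give the node back.
        if pending_multicore == 0:
            return NodeState.SINGLE
        # Enough cores have been freed to host a multicore job.
        if free_cores >= cores_needed:
            return NodeState.MULTI
        return NodeState.DRAINING

    # state is NodeState.MULTI
    # Return to single-core use once no multicore work is left.
    if pending_multicore == 0 and running_multicore == 0:
        return NodeState.SINGLE
    return NodeState.MULTI
```

In this sketch, the cores counted by `free_cores` while a node is in the `DRAINING` state are the unused cores that the strategy tries to keep to a minimum. A periodic loop would call `next_state` for each node and apply the result through the batch system.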
