9-13 July 2018
Sofia, Bulgaria
Europe/Sofia timezone

Fair Share Scheduler for OpenNebula (FaSS): implementation and performance tests

10 Jul 2018, 16:00
1h
National Culture Palace, Boulevard "Bulgaria", 1463 NDK, Sofia, Bulgaria
Poster Track 7 – Clouds, virtualization and containers Posters

Speakers

Dr Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino) Sara Vallero (Universita e INFN Torino (IT)) Valentina Zaccolo (Universita e INFN Torino (IT))

Description

A small Cloud infrastructure for scientific computing likely operates in a saturated regime, which constrains the free auto-scaling of applications. Tenants typically pay a priori for a fraction of the overall resources. Within this business model, an advanced scheduling strategy is needed to optimize data centre occupancy.
FaSS, a Fair Share Scheduler service for OpenNebula (ONE), addresses this issue by satisfying resource requests according to an algorithm that prioritizes tasks based on an initial weight and on the historical resource usage of the project. In this contribution, we describe the implementation of FaSS Version 1.0, released in March 2017 as a product of the INDIGO-DATACLOUD project. The software was designed to be as unintrusive as possible in the ONE code, and it interacts with ONE exclusively through its XML-RPC interface. The native ONE scheduler is preserved for matching requests to available resources.
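As an illustration of how an initial weight and historical usage can be combined into a queue order, the following is a minimal sketch and not the FaSS code. It assumes the classic fair-share factor documented for Slurm's multifactor priority plugin, F = 2^(-U/S), where S is the project's normalized share and U its normalized past usage. The names `PendingVM`, `fair_share_factor` and `reorder` are hypothetical, and a full multifactor priority would normally include further terms (such as job age) that are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PendingVM:
    """Hypothetical stand-in for a pending VM request."""
    vm_id: int
    project: str


def fair_share_factor(usage_fraction: float, share_fraction: float) -> float:
    """Classic fair-share factor F = 2 ** (-U / S).

    Returns 1.0 for a project with no past usage, 0.5 for a project that
    has used exactly its share, and tends to 0 for heavy over-use.
    """
    if share_fraction <= 0:
        return 0.0  # assumption: projects without a share get lowest priority
    return 2.0 ** (-usage_fraction / share_fraction)


def reorder(pending: List[PendingVM],
            shares: Dict[str, float],
            usage: Dict[str, float]) -> List[PendingVM]:
    """Return the pending VMs sorted by decreasing fair-share factor."""
    total_shares = sum(shares.values()) or 1.0
    total_usage = sum(usage.values()) or 1.0

    def priority(vm: PendingVM) -> float:
        s = shares.get(vm.project, 0.0) / total_shares
        u = usage.get(vm.project, 0.0) / total_usage
        return fair_share_factor(u, s)

    # sorted() is stable, so VMs with equal priority keep submission order.
    return sorted(pending, key=priority, reverse=True)
```

With two projects holding equal shares, the one with less accumulated usage moves to the front of the queue, while requests from the same project keep their original relative order.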
FaSS consists of five functional components: the Priority Manager (PM), a set of fair-share algorithms, the Terminator (TM), the XML-RPC interface and the database. The main module, the PM, periodically requests the list of pending Virtual Machines (VMs) from ONE and recalculates the priorities in the queue by interacting with an algorithm module of choice. In FaSS 1.0, the default algorithm is Slurm's MultiFactor. The TM module runs asynchronously with respect to the PM and is responsible for removing from the queue VMs that have been pending for too long, as well as for terminating or suspending running VMs after a configurable Time-to-Live. The XML-RPC server of FaSS intercepts the calls from the First-In-First-Out scheduler of ONE and sends back the reordered VM queue. The FaSS database is InfluxDB. It stores the initial and recalculated VM priorities, plus some additional information for accounting purposes. No information already present in the ONE DB is duplicated in FaSS.
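The decision logic of the Terminator described above can be sketched as a single pass over the known VMs. This is a hedged illustration only: the record type `VMRecord`, the function `terminator_pass`, the action labels and the assumption that ages are measured from the moment a VM entered its current state are all hypothetical and are not taken from the FaSS code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VMRecord:
    """Hypothetical, simplified view of a VM as seen by the Terminator."""
    vm_id: int
    state: str    # "pending" or "running" (other states are ignored)
    since: float  # time in seconds at which the VM entered this state


def terminator_pass(vms: List[VMRecord],
                    now: float,
                    max_pending_s: float,
                    ttl_s: float,
                    on_expiry: str = "terminate") -> List[Tuple[int, str]]:
    """Return (vm_id, action) pairs for VMs that exceeded their limits.

    Pending VMs older than max_pending_s are removed from the queue;
    running VMs older than ttl_s are terminated or suspended, depending
    on on_expiry.
    """
    if on_expiry not in ("terminate", "suspend"):
        raise ValueError("on_expiry must be 'terminate' or 'suspend'")

    actions: List[Tuple[int, str]] = []
    for vm in vms:
        age = now - vm.since
        if vm.state == "pending" and age > max_pending_s:
            actions.append((vm.vm_id, "remove_from_queue"))
        elif vm.state == "running" and age > ttl_s:
            actions.append((vm.vm_id, on_expiry))
    return actions
```

Because the function only computes the list of actions, a loop running independently of the Priority Manager could call it periodically and then apply the actions through the ONE XML-RPC interface.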
In this contribution we are also going to show the results of FaSS functional and stress tests performed at the Cloud infrastructure of the INFN-Torino computing centre.

Primary authors

Dr Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino) Sara Vallero (Universita e INFN Torino (IT)) Valentina Zaccolo (Universita e INFN Torino (IT))
