4–8 Nov 2019
Adelaide Convention Centre
Australia/Adelaide timezone

CloudScheduler V2: Distributed Cloud Computing in the 21st century

4 Nov 2019, 14:15
15m
Riverbank R7 (Adelaide Convention Centre)

Oral, Track 7 – Facilities, Clouds and Containers

Speaker

Randall Sobie (University of Victoria (CA))

Description

The cloudscheduler VM provisioning service has been running production jobs for ATLAS and Belle II for many years using commercial and private clouds in Europe, North America and Australia. Initially released in 2009, version 1 is a single Python 2 module that uses multiple threads to poll resources and jobs, and to create and destroy virtual machines. The code is difficult to scale, maintain or extend, and it lacks many desirable features, such as status displays, multiple user/project management, robust error analysis and handling, and time series plotting, to name just a few examples. To address these shortcomings, our team has re-engineered the cloudscheduler VM provisioning service from the ground up. The new version, dubbed cloudscheduler version 2 or CSV2, is written in Python 3, runs on any modern Linux distribution, and uses current supporting applications and libraries. The system is composed of multiple independent Python 3 modules that communicate with each other through a central MariaDB (version 10) database. It features both graphical (web browser) and command line user interfaces and supports multiple users/projects with ease. Users can manage and monitor their own cloud resources without the intervention of a system administrator. The system is scalable, extensible, and maintainable. It is also far easier to use and more flexible than its predecessor. We present the design, highlight the development process, which utilizes unit tests, and show encouraging results from our operational experience with thousands of jobs and worker nodes. We also present our experience with containers for running workloads, code development and software distribution.
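
As a rough illustration of the architecture described above, in which independent modules exchange state only through a central database, the following minimal sketch may help. It is not CSV2 code: the table layout, the function names (open_db, job_poller, vm_scheduler), the job states, and the max_vms limit are all hypothetical, and Python's built-in sqlite3 stands in for MariaDB so that the example is self-contained.

```python
import sqlite3


def open_db():
    """Create the shared state store (sqlite3 here, as a stand-in for MariaDB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE vms (vm_id INTEGER PRIMARY KEY AUTOINCREMENT, cloud TEXT NOT NULL)"
    )
    return conn


def job_poller(conn, queue_snapshot):
    """Hypothetical poller: mirror the batch queue's current jobs into the database."""
    with conn:  # commit the refresh as one transaction
        conn.execute("DELETE FROM jobs")
        conn.executemany(
            "INSERT INTO jobs (job_id, state) VALUES (?, ?)", queue_snapshot.items()
        )


def vm_scheduler(conn, cloud, max_vms):
    """Hypothetical scheduler: reads only the database and adjusts the VM count.

    Returns the change in VM count (positive = booted, negative = retired).
    """
    demand = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE state IN ('idle', 'running')"
    ).fetchone()[0]
    running = conn.execute(
        "SELECT COUNT(*) FROM vms WHERE cloud = ?", (cloud,)
    ).fetchone()[0]
    delta = min(demand, max_vms) - running
    with conn:
        if delta > 0:
            # Record new VMs; a real module would also call the cloud API here.
            conn.executemany("INSERT INTO vms (cloud) VALUES (?)", [(cloud,)] * delta)
        elif delta < 0:
            # Retire the most recently recorded VMs first.
            conn.execute(
                "DELETE FROM vms WHERE vm_id IN ("
                "SELECT vm_id FROM vms WHERE cloud = ? ORDER BY vm_id DESC LIMIT ?)",
                (cloud, -delta),
            )
    return delta
```

Because the poller and the scheduler never call each other, either one could in principle be restarted, replaced, or run on another host without the other noticing, which is one plausible reading of why a database-centred design is easier to extend than a single multi-threaded module.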

Consider for promotion Yes

Primary authors

Colin Roy Leavett-Brown (University of Victoria (CA))
Colson Driemel (University of Victoria (CA))
Danika MacDonell (University of Victoria (CA))
Fernando Fernandez Galindo (TRIUMF (CA))
Frank Berghaus (University of Victoria (CA))
Marcus Ebert (University of Victoria)
Michael Paterson (U)
Randall Sobie (University of Victoria (CA))
Rolf Seuster (University of Victoria (CA))
Shaelyn Tolkamp (University of Victoria)
Tahya Weiss-Gibbons (University of Victoria)
