Speaker
Description
The cloudscheduler VM provisioning service has been running production jobs for ATLAS and Belle II for many years using commercial and private clouds in Europe, North America and Australia. Initially released in 2009, version 1 is a single Python 2 module implementing multiple threads to poll resources and jobs, and to create and destroy virtual machine. The code is difficult to scale, maintain or extend and lacks many desirable features, such as status displays, multiple user/project management, robust error analysis and handling, and time series plotting, to name just a few examples. To address these shortcomings, our team has re-engineered the cloudscheduler VM provisioning service from the ground up. The new version, dubbed cloudscheduler version 2 or CSV2, is written in Python 3 runs on any modern Linux distribution, and uses current supporting applications and libraries. The system is composed of multiple, independent Python 3 modules communicating with each other through a central MariaDB (version 10) database. It features both graphical (web browser), and command line user interfaces and supports multiples users/projects with ease. Users have the ability to manage and monitor their own cloud resources without the intervention of a system administrator. The system is scalable, extensible, and maintainable. It is also far easier to use and is more flexible than its predecessor. We present the design, highlight the development process which utilizes unit tests, and show encouraging results from our operational experience with thousands of jobs and workernodes. We also present our experience with containers for running workloads, code development and software distribution.
Consider for promotion | Yes |
---|