Speaker
Andrew McNab
(University of Manchester (GB))
Description
The Vacuum model provides a method for managing the lifecycle of virtual machines based on their observed success or failure in finding work to do for their experiment. In contrast to centrally managed grid job
submission and cloud VM instantiation systems, the Vacuum model gives resource providers direct control over which experiments' VMs or jobs are created and in what proportion. This model also leads to a highly
decentralised, feedback-based infrastructure, in which the responsibility of providing VMs for the same experiment can be undertaken by a mix of sites, local groups, national teams, and the experiment's central
operations staff. This mixture is well matched to the variety of entities that need to act as virtual resource providers, because they differ in available funding and objectives. We present three implementations of this model developed by GridPP: Vac, which manages VMs on autonomous hypervisor machines; Vcycle, which manages VMs on IaaS systems such as OpenStack; and a system for managing VMs at an HTCondor site.
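The feedback idea described above can be illustrated with a minimal Python sketch. It is not the algorithm used by Vac, Vcycle, or the HTCondor system: the class names, the back-off rule, and the numeric thresholds are all hypothetical, chosen only to show how a resource provider might apply its own target shares and stop creating VMs for an experiment whose last VM exited without finding work.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExperimentState:
    target_share: float         # relative share set by the resource provider (must be > 0)
    running: int = 0            # VMs currently running for this experiment
    backoff_until: float = 0.0  # no new VMs are created before this time


class VacuumFactory:
    """Hypothetical VM factory: illustrative only, not the Vac or Vcycle logic."""

    def __init__(self, shares: Dict[str, float],
                 backoff_seconds: float = 600.0,
                 min_useful_seconds: float = 300.0) -> None:
        # Both thresholds are assumed values, not taken from the source.
        self.experiments = {name: ExperimentState(share)
                            for name, share in shares.items()}
        self.backoff_seconds = backoff_seconds
        self.min_useful_seconds = min_useful_seconds

    def choose_experiment(self, now: float) -> Optional[str]:
        """Pick the eligible experiment furthest below its target share."""
        eligible = [(state.running / state.target_share, name)
                    for name, state in self.experiments.items()
                    if now >= state.backoff_until]
        if not eligible:
            return None  # every experiment is backing off
        # Ties are broken alphabetically by experiment name.
        return min(eligible)[1]

    def vm_started(self, name: str) -> None:
        self.experiments[name].running += 1

    def vm_finished(self, name: str, runtime_seconds: float, now: float) -> None:
        """Record the outcome of a VM; a short-lived VM is treated as 'no work found'."""
        state = self.experiments[name]
        state.running = max(0, state.running - 1)
        if runtime_seconds < self.min_useful_seconds:
            state.backoff_until = now + self.backoff_seconds
```

In this sketch the provider alone decides the proportions through `shares`, and the only signal from the experiment is how long each VM survived, which is the decentralised, feedback-based behaviour the abstract describes.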
The Pilot VM architecture originally developed by LHCb is particularly well suited to the lightweight Vacuum approach: an environment similar to that of a conventional WLCG batch worker node is configured within the VM, and the pilot framework client is then executed as it would be in a conventional grid job. We present Pilot VM designs used to run production jobs for LHCb, ATLAS, CMS, and GridPP DIRAC, in which the same VM design is able to run on all three implementations of the Vacuum model for VM management.
Author
Andrew McNab
(University of Manchester (GB))
Co-author
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))