Mar 23 – 27, 2015
Physics Department, Oxford University
Europe/London timezone

The Vacuum model for running jobs in VMs

Mar 26, 2015, 3:15 PM
25m
Martin Wood Lecture Theatre, Parks Road (Physics Department, Oxford University)

Grids, Clouds and Virtualisation

Speaker

Andrew McNab (University of Manchester (GB))

Description

The Vacuum model provides a method for managing the lifecycle of virtual machines based on their observed success or failure in finding work to do for their experiment. In contrast to centrally managed grid job submission and cloud VM instantiation systems, the Vacuum model gives resource providers direct control over which experiments' VMs or jobs are created and in what proportion. The model also leads to a highly decentralised, feedback-based infrastructure, in which the responsibility for providing VMs for the same experiment can be shared among a mix of sites, local groups, national teams, and the experiment's central operations staff. This mixture is well matched to the variety of entities that need to act as virtual resource providers, which differ in their available funding and objectives. We present three implementations of this model developed by GridPP: Vac, which manages VMs on autonomous hypervisor machines; Vcycle, which manages VMs on IaaS systems such as OpenStack; and a system for managing VMs at an HTCondor site. The Pilot VM architecture originally developed by LHCb is particularly well suited to the lightweight Vacuum approach: an environment similar to that of conventional WLCG batch worker nodes is configured within the VM, and the pilot framework client is then executed as it would be in a conventional grid job. We present Pilot VM designs used to run production jobs for LHCb, ATLAS, CMS, and GridPP DIRAC, where the same VM design is able to run on all three implementations of the Vacuum model for VM management.
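
The feedback idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from Vac, Vcycle, or the HTCondor system; the names `VMType`, `target_share`, `backoff_seconds`, `record_outcome`, and `choose_next` are hypothetical, and the back-off rule is an assumption chosen only to show how a resource provider could combine its own target proportions with the observed success or failure of VMs in finding work.

```python
from dataclasses import dataclass


@dataclass
class VMType:
    """One experiment's VM flavour, as seen by a resource provider (hypothetical)."""
    name: str
    target_share: float                   # proportion set by the resource provider
    backoff_seconds: float = 600.0        # assumed wait after a VM found no work
    running: int = 0                      # VMs of this type currently running
    last_no_work: float = float("-inf")   # time the last VM stopped without work


def record_outcome(vmtype: VMType, found_work: bool, now: float) -> None:
    """Feedback step: called when a VM of this type shuts down."""
    vmtype.running -= 1
    if not found_work:
        # Remember the failure so this type is not restarted immediately.
        vmtype.last_no_work = now


def choose_next(vmtypes: list, now: float):
    """Pick the VM type to start in a free slot, or None if all are backed off."""
    eligible = [v for v in vmtypes if now - v.last_no_work >= v.backoff_seconds]
    if not eligible:
        return None
    # Favour the type furthest below its target share.
    return min(eligible, key=lambda v: v.running / v.target_share)
```

In this sketch, a type whose VMs recently found no work is skipped until its back-off expires, so free slots go to experiments that currently have work, while `target_share` keeps the long-run mix under the provider's control.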

Primary author

Andrew McNab (University of Manchester (GB))

Co-author

Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
