Speaker
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))
Description
The recently introduced vacuum model offers an alternative to the traditional methods that virtual organisations (VOs) use to run computing tasks at sites, where they either submit jobs using grid middleware or create virtual machines (VMs) using cloud APIs. In the vacuum model, VMs are created and contextualized by the site itself, and they start the appropriate pilot job framework, which fetches real jobs. This allows sites to avoid the effort required for running grid middleware or a cloud. Here we present an implementation of the vacuum model based entirely on HTCondor, making use of HTCondor's ability to manage VMs. Extensive use is made of job hooks, including for preparing fast local storage for use in the VMs, carrying out contextualization, and updating job ClassAds with the status of the VMs and their payloads. VMs for each supported VO are created at regular intervals. If there is no work or there are fatal errors, no additional VMs are created. On the other hand, if there is real work running, further VMs can be created. Since the HTCondor negotiator decides whether to run the VMs or not, fairshares are naturally respected. Normal grid or locally submitted jobs can run on the same resources, sharing the physical worker nodes that also serve as hypervisors for the VMs.
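The VM creation rule described above can be illustrated with a minimal Python sketch. It is not the implementation presented here: the names `Outcome`, `VMRecord`, `vms_to_create` and the `max_new` limit are hypothetical, and submitting a single probe VM when a VO has no recent history is an assumption added so that the example has a starting point. The sketch covers only the decision of how many VMs to submit per VO in one cycle; whether those VMs actually start is left to the HTCondor negotiator.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical summary of what a recently created VM ended up doing."""
    RUNNING_WORK = "running_work"  # the pilot fetched a real job
    NO_WORK = "no_work"            # the pilot found nothing to run
    FATAL_ERROR = "fatal_error"    # the VM or its pilot failed


@dataclass
class VMRecord:
    vo: str            # virtual organisation the VM was created for
    outcome: Outcome   # status reported back for that VM


def vms_to_create(vo, recent, max_new=5):
    """Return how many VMs to submit for `vo` in this cycle.

    `recent` holds records for VMs created in some recent time window
    (the window length is an assumption and is not modelled here).
    """
    mine = [r for r in recent if r.vo == vo]
    if not mine:
        # Assumption: with no history, submit one VM to probe for work.
        return 1
    if any(r.outcome in (Outcome.NO_WORK, Outcome.FATAL_ERROR) for r in mine):
        # No work or fatal errors: create no additional VMs.
        return 0
    # Every recent VM for this VO is running real work: allow more.
    return max_new
```

In such a sketch, the function would be called once per supported VO at each regular interval, with `recent` built from whatever status information the VMs report back.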
Primary author
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))