Speaker
Andrew David Lahiff
(STFC - Science & Technology Facilities Council (GB))
Description
While migration from the grid to the cloud has gained momentum recently, WLCG sites are still expected to accept grid job submission, and this is likely to continue for the foreseeable future. Furthermore, sites that support multiple experiments may need to provide both cloud-based and grid-based access to resources for some time, as not all experiments may be ready to move to the cloud at the same time. To make optimal use of resources, a site with both a traditional batch system and a cloud resource may want to use its cloud opportunistically, extending the batch system into the cloud whenever there are idle jobs and all worker nodes in the batch system are busy. We present two implementations, one using HTCondor and the other using SLURM as the batch system, in which virtualized worker nodes are provisioned on demand using a StratusLab cloud.
Authors
Andrew David Lahiff
(STFC - Science & Technology Facilities Council (GB))
Ian Collier
(UK Tier1 Centre)