Speaker
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))
Description
It is becoming increasingly common for WLCG sites to provide both grid and cloud compute resources. To avoid the inefficiencies caused by static partitioning of resources, it is necessary to integrate grid and cloud resources. There are two options to consider when doing this. The simplest option is to have the cloud manage all the physical hardware and use entirely virtualised worker nodes in the batch system. The downside of this is that everything is virtualised, whereas it might be useful to be able to run jobs in the batch system directly on hardware in order to achieve the best performance. The second option is therefore to have the batch system and the cloud share the same physical machines, with batch jobs running directly on the hardware alongside virtual machines. In such a configuration it is essential that batch jobs cannot interfere with the virtual machines running on the same node or adversely affect the hypervisors; hence containerisation of batch jobs is particularly important. It is also important to consider the implications of having both a batch system and a cloud scheduling jobs and virtual machines on the same machines. We present an investigation, carried out at the RAL Tier-1, into the feasibility of having a batch system and a cloud share the same set of physical resources.
Primary author
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))
Co-authors
Frazer Barnsley
(STFC - Rutherford Appleton Lab. (GB))
George Ryall
(STFC)
Ian Peter Collier
(STFC - Rutherford Appleton Lab. (GB))