Speaker
Dr
Ulrich Schwickerath
(CERN)
Description
As part of CERN's Agile Infrastructure project, large parts of the CERN batch farm have been moved to virtual machines running on CERN's private IaaS cloud. During this process, a large fraction of the resources previously used as physical batch worker nodes were converted into hypervisors. Because the farm is heterogeneous, per-core performance (rated in HEP-SPEC06, HS06) varies widely across it, so the performance of each virtual machine must be well known. This information is used for both scheduling and accounting. In the previous setup, worker nodes were classified and benchmarked according to their purchase order number. For virtual batch worker nodes this is no longer possible, because that information is now either hidden or hard to retrieve. We have therefore developed a new scheme to classify worker nodes by their performance. The scheme is flexible enough to be used for both virtual and physical machines in the batch farm. It should also be applicable to public clouds and to more dynamic future batch farms, in which worker nodes come and go at a high rate.
The scheme, the experience gained with it and the lessons learned will be presented. Possible extensions will also be covered, as will application to a more general case, for example accounting within WLCG.
Author
Dr
Ulrich Schwickerath
(CERN)
Co-authors
Janos Daniel Pek
(CERN)
Jerome Belleman
(CERN)