15–19 Oct 2012
Institute of High Energy Physics
Asia/Shanghai timezone

Testing SLURM batch system for a grid farm: functionalities, scalability, performance and how it works in a GRID environment

17 Oct 2012, 16:30
30m
C305 (Institute of High Energy Physics)

19B YuquanLu Shijingshan Beijing China
Presentation Computing & Batch Services Computing

Speaker

Dr Giacinto Donvito (INFN-Bari)

Description

We will present the work done to install and configure the batch system itself, together with the required security configuration. In this presentation we will show the results of the in-depth testing that we carried out on SLURM to verify that it covers all the required functionalities, such as priorities, fairshare, limits, QoS, failover capabilities and others. We will also report on the possibility of using this batch system in a complex mixed farm environment, where grid jobs, local jobs and interactive activities are all managed by the same batch system. Regarding scalability, we will show how the SLURM batch system copes with an increasing number of nodes, CPUs and jobs served. We will also show the performance achieved with several clients accessing the same batch server, and compare SLURM with other available open source batch systems in terms of both performance and functionalities. We will also provide feedback on a mixed configuration that uses SLURM together with MAUI for job scheduling. Finally, we will describe the work done to support SLURM in an EGI grid environment.

Summary

As grid computing farms grow in size, in terms of nodes but even more in terms of available CPU slots, it becomes of great interest to have a scheduler solution that can scale up to tens of thousands of CPU slots and hundreds of nodes. To keep the Total Cost of Ownership as low as possible, an easy-to-use, open source solution is preferred. SLURM is able to fulfil all these requirements, and it also looks promising in terms of the community supporting it, as it is used on several of the TOP500 supercomputers. For this reason we tested the SLURM batch system in depth to establish whether it could be a suitable solution. In this work we will present the results of all the tests executed on the SLURM batch system, and the results of the development activity carried out to make it possible to use SLURM in a grid computing environment.

Primary author

Dr Giacinto Donvito (INFN-Bari)
