CHEP 2018 Conference, Sofia, Bulgaria

Name: CHEP 2018 Conference, Sofia, Bulgaria
Start: 2018-07-09T08:00:00+03:00
End: 2018-07-13T13:00:00+03:00
Location: Sofia, Bulgaria

9–13 Jul 2018

A Feasibility Study about Integrating HTCondor Cluster Workload with SLURM Cluster Workload

12 Jul 2018, 11:30

15m

Hall 10 (National Palace of Culture)

Presentation, Track 8 (T8) – Networks and facilities

Speaker: Ran Du

Two production clusters coexist at the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager; the other is a High Performance Computing (HPC) cluster with SLURM as the workload manager. The resources of the HTCondor cluster are provided by multiple experiments, and its resource utilization has exceeded 90% thanks to a dynamic resource-sharing mechanism. Nevertheless, a bottleneck arises when several experiments request more resources at the same time. On the other hand, parallel jobs running on the SLURM cluster have distinctive characteristics: a high degree of parallelism, small job counts, and long wall times. These characteristics tend to leave free resource slots that are well suited to jobs from the HTCondor cluster. A mechanism that transparently schedules jobs from the HTCondor cluster onto the SLURM cluster would therefore improve resource utilization on both clusters. HTCondor provides HTCondor-C to schedule jobs to other clusters managed by different workload managers, such as SLURM. However, this alone is not enough if we want to control which jobs may be scheduled by SLURM, and when and where they run. Managing the rescheduled jobs once they are running on the SLURM cluster is a further problem. Furthermore, HTCondor and SLURM differ in design philosophy and application scenarios, so a large number of jobs arriving in a short period may place extra scheduling load on SLURM. In this paper, after a brief background introduction, we describe the problems of integrating the two cluster workloads and present possible solutions to them.
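To make the "which, when and where" question concrete, the following is a minimal Python sketch of one possible forwarding policy. It is not the authors' design and does not call any HTCondor or SLURM API. Every name in it (`HtcJob`, `SlurmGap`, `select_jobs_to_forward`, `allowed_experiments`, `max_per_cycle`) is hypothetical. It assumes that each HTC job carries a wall-time estimate and that the free slots on the SLURM side are known along with how long they are expected to stay free. The per-cycle cap stands in for the concern that a burst of small jobs could overload the SLURM scheduler.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class HtcJob:
    """A queued HTCondor job (hypothetical, simplified view)."""
    job_id: str
    cores: int
    est_walltime_s: int  # assumed to be supplied by the user or the experiment
    experiment: str


@dataclass
class SlurmGap:
    """A free slot on a SLURM node (hypothetical, simplified view)."""
    node: str
    free_cores: int
    free_for_s: int  # seconds until an HPC job is expected to need the node


def select_jobs_to_forward(
    queue: Iterable[HtcJob],
    gaps: List[SlurmGap],
    allowed_experiments: Set[str],
    max_per_cycle: int,
) -> List[Tuple[str, str]]:
    """Greedy first-fit placement of HTC jobs into SLURM gaps.

    "Which": only jobs from allowed experiments are considered.
    "When":  a job is placed only if its estimated wall time fits the gap.
    "Where": the first node with enough free cores is chosen.
    At most max_per_cycle jobs are forwarded per call, to limit the
    extra scheduling load on SLURM.
    """
    remaining = {gap.node: gap.free_cores for gap in gaps}
    plan: List[Tuple[str, str]] = []
    for job in queue:
        if len(plan) >= max_per_cycle:
            break  # throttle: stop once the per-cycle cap is reached
        if job.experiment not in allowed_experiments:
            continue
        for gap in gaps:
            fits_cores = remaining[gap.node] >= job.cores
            fits_time = gap.free_for_s >= job.est_walltime_s
            if fits_cores and fits_time:
                remaining[gap.node] -= job.cores
                plan.append((job.job_id, gap.node))
                break
    return plan
```

A real integration would also have to track the forwarded jobs and report their state back to the HTCondor side, which this sketch leaves out.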

Authors: Ran Du; Jingyan Shi (IHEP); Mr Xiaowei Jiang (IHEP, Institute of High Energy Physics, Chinese Academy of Sciences); Jiaheng Zou (IHEP); Mr Zhenyu Sun; Ms Hongnan Tan

feasibility_study_htcondor_slurm_final.pdf

feasibility_study_htcondor_slurm_final.pptx
