Jul 9 – 13, 2018
Sofia, Bulgaria
Europe/Sofia timezone

Multicore workload scheduling in JUNO

Jul 12, 2018, 12:00 PM
15m
Hall 7 (National Palace of Culture)

Presentation, Track 3 - Distributed computing

Speaker

Xiaomei Zhang (Chinese Academy of Sciences (CN))

Description

The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment which will start in 2020. To speed up JUNO data processing on multicore hardware, the JUNO software framework is introducing parallelization based on TBB. To support JUNO multicore simulation and reconstruction jobs in the near future, a new workload scheduling model has to be explored and implemented in the JUNO distributed computing system, which is built on DIRAC. Within this model, the evolution of the pilot model from single-core to multicore is the key issue. Two multicore pilot strategies are presented and evaluated in this paper. One uses customized pilots whose size varies with the resource requirements of the payloads in the Task Queue. The other uses common pilots of equal size, which schedule internally within their allocated resources so that a single pilot can accept more than one payload with differing core requirements. Both strategies have been tested on a SLURM and cloud testbed built for this purpose, to evaluate them and study their efficiency in JUNO use cases. The paper also presents an algorithm designed to prevent starvation of “big” jobs and to improve efficiency when a mix of jobs with different core counts is submitted.
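The two pilot strategies described above can be illustrated with a minimal Python sketch. It is not the DIRAC or JUNO implementation: the `Payload` class, the function names, and the first-fit policy are all assumptions made for illustration, since the abstract does not specify how payloads are matched to pilots.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Payload:
    """Hypothetical stand-in for a job waiting in the Task Queue."""
    name: str
    cores: int


def customized_pilot_size(queue: List[Payload]) -> int:
    """Strategy 1 (assumed form): size the pilot to the payload at the
    head of the queue. Returns 0 when nothing is waiting."""
    return queue[0].cores if queue else 0


def fill_common_pilot(
    queue: List[Payload], pilot_cores: int
) -> Tuple[List[Payload], List[Payload]]:
    """Strategy 2 (assumed form): a fixed-size pilot walks the queue in
    order and accepts every payload that still fits in its free cores
    (first-fit). Returns (accepted, still_waiting)."""
    free = pilot_cores
    accepted: List[Payload] = []
    waiting: List[Payload] = []
    for payload in queue:
        if payload.cores <= free:
            accepted.append(payload)
            free -= payload.cores
        else:
            waiting.append(payload)
    return accepted, waiting


# Illustrative queue with mixed core requirements (made-up job names).
queue = [
    Payload("reco", 4),
    Payload("sim-a", 1),
    Payload("big", 8),
    Payload("sim-b", 2),
    Payload("sim-c", 1),
]

accepted, waiting = fill_common_pilot(queue, pilot_cores=8)
print([p.name for p in accepted])    # payloads packed into one 8-core pilot
print([p.name for p in waiting])     # payloads left in the queue
print(customized_pilot_size(queue))  # cores a customized pilot would request
```

In this example the 8-core pilot is filled by four smaller payloads while the 8-core payload is passed over. With a steady supply of small jobs, a naive first-fit policy like this one could keep skipping the large job indefinitely, which is the kind of “big” job starvation the abstract refers to. The algorithm the paper proposes to address it is not described here and is not reproduced in this sketch.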

Primary authors

Xiaomei Zhang (Chinese Academy of Sciences (CN))
Mr Kang Li (SooChow University)
Andrei Tsaregorodtsev (Marseille)
Dr Xianghu Zhao (Institute of High Energy Physics)
