Description
In the near future, many new high energy physics (HEP) experiments with challenging data volumes will come into operation or are planned at IHEP, China. The DIRAC-based distributed computing system has been set up to support these experiments. To make better use of the available distributed computing resources, it is important to provide experiment users with handy tools for managing their tasks in a grid environment. In this talk, we present the design and development of JSUB, a common task submission and management infrastructure that aims to simplify the handling of massive numbers of jobs in a distributed environment. The framework covers task creation, splitting and submission, run-time workflow control, task monitoring and management, failure recovery, and dataset management. Support for the DIRAC parametric jobs feature has been implemented and greatly improves the submission rate, and task monitoring and management is implemented as a DIRAC service so that users can track and operate on tasks through a web portal. JSUB provides a flexible task description interface in YAML, allowing physics users to conveniently customize their computing tasks.
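To illustrate the task splitting and parametric submission mentioned above, here is a minimal Python sketch. It is not JSUB code and does not call DIRAC: the task fields (`total_events`, `events_per_job`, `base_seed`) and the helper names `split_task` and `to_parameter_sequences` are hypothetical, chosen only for illustration. It assumes a task is split by event count, and that a parametric submission takes one job description plus one list of values per parameter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubJob:
    """One unit of work produced by splitting a task (hypothetical structure)."""
    index: int
    parameters: Dict[str, int]


def split_task(task: dict) -> List[SubJob]:
    """Split a task into sub-jobs of at most `events_per_job` events each."""
    total = task["total_events"]
    per_job = task["events_per_job"]
    if total < 0 or per_job <= 0:
        raise ValueError("total_events must be >= 0 and events_per_job > 0")

    subjobs = []
    for index, first in enumerate(range(0, total, per_job)):
        # The last sub-job may hold fewer events than the others.
        count = min(per_job, total - first)
        subjobs.append(SubJob(index, {
            "first_event": first,
            "event_count": count,
            # A distinct seed per sub-job (assumed convention).
            "seed": task["base_seed"] + index,
        }))
    return subjobs


def to_parameter_sequences(subjobs: List[SubJob]) -> Dict[str, List[int]]:
    """Transpose per-job parameters into one list per parameter name,
    so that all sub-jobs can be described in a single bulk request."""
    sequences: Dict[str, List[int]] = {}
    for job in subjobs:
        for name, value in job.parameters.items():
            sequences.setdefault(name, []).append(value)
    return sequences


if __name__ == "__main__":
    # Hypothetical task description; in JSUB this would be written in YAML.
    task = {"name": "example_sim", "total_events": 2500,
            "events_per_job": 1000, "base_seed": 42}
    for name, values in to_parameter_sequences(split_task(task)).items():
        print(name, values)
```

In this sketch, 2500 events at 1000 events per job yield three sub-jobs, the last with 500 events. Grouping the values by parameter name means the whole task can be sent in one request rather than one request per sub-job, which is the general reason bulk submission improves the submission rate.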
Currently, JSUB has been developed and put into use for the JUNO and CEPC experiments. The software is also readily extensible to other HEP experiments. Its design and techniques may be of interest to other experiments that also use DIRAC as their workload management system.
This topic has so far been presented only within the JUNO collaboration; this would be the first time it is presented to a wider audience.
| Speaker time zone | Compatible with Asia |
|---|---|