15–19 Apr 2024
Laboratoire Astroparticule et Cosmologie (APC) de l'Université Paris-Cité
Europe/Paris timezone

The intelligent operation and maintenance system of the IHEP computing platform

19 Apr 2024, 10:15
25m
Amphithéatre Buffon (Laboratoire Astroparticule et Cosmologie (APC) de l'Université Paris-Cité)

15 rue Hélène Brion 75013 Paris France
Basic and End-User IT Services

Speaker

Yaosong Cheng (Institute of High Energy Physics Chinese Academy of Sciences, IHEP)

Description

Based on extensive experience in system maintenance and advanced artificial intelligence technology, we have designed the IHEP computing platform's intelligent operations and maintenance system. Its primary goal is to ensure optimal utilization and efficiency of computing resources.
This system automatically detects user jobs that cause anomalies in computing services and dynamically adjusts their available resources in real time.
Using AI algorithms, it analyzes the file system's operational status and logs in near real time, identifying the users and process names that may be triggering anomalies.
The system then queries the job scheduler for the computing node on which the suspected abnormal job is running and uses AI algorithms to analyze the job in real time, determining whether its behavior is causing excessive system load. Once this is confirmed, the system notifies the job scheduler and the file system to limit the number of the user's job operations and the total I/O volume.
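The detect, locate, confirm, and throttle steps described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which AI algorithms are used, so a simple median-based outlier rule stands in for them, and every name here (`FsSample`, `suspect_processes`, `handle_anomalies`, and the `locate_job`, `job_is_overloading`, and `throttle` callbacks) is hypothetical rather than part of the IHEP system.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class FsSample:
    """One hypothetical file-system monitoring record."""
    user: str
    process: str
    iops: float  # file-system operations per second attributed to this process


def suspect_processes(samples: Iterable[FsSample],
                      factor: float = 10.0,
                      min_iops: float = 1.0) -> List[Tuple[str, str]]:
    """Flag (user, process) pairs whose I/O rate far exceeds the median.

    Stand-in for the AI-based analysis: a sample is suspicious when its
    rate is above both `factor` times the median and the `min_iops` floor.
    """
    samples = list(samples)
    if not samples:
        return []
    threshold = max(factor * median(s.iops for s in samples), min_iops)
    return [(s.user, s.process) for s in samples if s.iops > threshold]


def handle_anomalies(samples: Iterable[FsSample],
                     locate_job: Callable[[str, str], Optional[str]],
                     job_is_overloading: Callable[[str, str, str], bool],
                     throttle: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Run one pass of the pipeline and return the (user, node) pairs throttled."""
    throttled = []
    for user, process in suspect_processes(samples):
        node = locate_job(user, process)  # assumed job-scheduler query
        if node is None:
            continue  # the job has already finished or cannot be located
        if job_is_overloading(node, user, process):  # assumed per-job analysis
            throttle(user)  # assumed call that limits operations and I/O volume
            throttled.append((user, node))
    return throttled
```

Passing the scheduler query, the per-job check, and the throttling action as callbacks keeps the sketch independent of any particular scheduler or file system.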
The system provides comprehensive monitoring and intelligent operations management for the computing platform. It dynamically adjusts the resources available to each user according to the platform's overall state, ensuring fair and efficient data processing for all users.
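One simple way to picture this dynamic adjustment is a proportional rule that shrinks every user's cap by the same factor when the platform is overloaded. The abstract does not describe the actual policy, so the function `adjust_limits`, the `load` and `target` parameters, and the notion of a per-user job-slot cap are all assumptions made for illustration.

```python
from typing import Dict


def adjust_limits(limits: Dict[str, int],
                  load: float,
                  target: float = 0.8,
                  floor: int = 1) -> Dict[str, int]:
    """Return new per-user job-slot caps for the current platform load.

    `load` and `target` are fractions of platform capacity (hypothetical
    metrics). At or below the target, caps are returned unchanged. Above it,
    every cap is scaled by target / load, so all users are reduced in the
    same proportion, and no cap drops below `floor`.
    """
    if load <= target:
        return dict(limits)
    scale = target / load
    return {user: max(floor, int(cap * scale)) for user, cap in limits.items()}
```

Scaling all caps by a common factor is only one possible reading of "fair"; a production policy could instead weight users by priority or historical usage.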

Speaker release Yes

Authors

Yaosong Cheng (Institute of High Energy Physics Chinese Academy of Sciences, IHEP)
石京燕 (shijy)
