Various cluster monitoring tools have been adapted or developed at IHEP, each of which shows the health status of a single device or aspect of the IHEP computing platform. For example, Ganglia shows machine load, Nagios monitors service status, and the Job-monitor tool developed by IHEP tracks the job success rate. However, the monitoring data from these tools are kept separately and are not easy to analyze in relation to one another. Integrating and analyzing the monitoring data from multiple sources can provide more valuable information, such as health trends and potential errors.
Integrated Monitoring Tools are now deployed at IHEP; they collect Ganglia, Nagios, Syslog and other monitoring metrics. Several cluster monitoring projects based on these Integrated Monitoring Tools have been applied at IHEP.
Desired length: 20 minutes