Speaker
Dr John Kennedy (LMU Munich)
Description
The ATLAS production system is responsible for the distribution of
O(100,000) jobs per day to over 100 sites worldwide.
The tracking and correlation of errors and resource usage within such a
large distributed system is of extreme importance.
The monitoring system presented here is designed to abstract the
monitoring information away from the central database of jobs. This
approach ensures that monitoring does not interfere with production
itself and provides faster responses to monitoring queries.
The design and functionality of the system are discussed, and the possible
future development of monitoring tools for the ATLAS Production System is
explored.
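
The idea of keeping monitoring queries away from the central job database can be illustrated with a minimal sketch. It is not the ATLAS implementation: the table names (`jobs`, `site_summary`), the columns, the site labels, and the functions below are all hypothetical, and in-memory SQLite stands in for both databases. The sketch assumes a periodic refresh that reads the production database once with a single aggregate query and copies the result into a separate monitoring store, which then answers all monitoring queries.

```python
import sqlite3


def make_production_db():
    """Hypothetical stand-in for the central database of jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "job_id INTEGER PRIMARY KEY, site TEXT, status TEXT, cpu_seconds REAL)"
    )
    # Illustrative rows only; not real production data.
    conn.executemany(
        "INSERT INTO jobs (site, status, cpu_seconds) VALUES (?, ?, ?)",
        [
            ("SITE_A", "finished", 3600.0),
            ("SITE_A", "failed", 120.0),
            ("SITE_B", "finished", 1800.0),
        ],
    )
    conn.commit()
    return conn


def make_monitoring_db():
    """Hypothetical separate store that serves monitoring queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE site_summary ("
        "site TEXT, status TEXT, n_jobs INTEGER, cpu_seconds REAL, "
        "PRIMARY KEY (site, status))"
    )
    return conn


def refresh_summary(prod, mon):
    """Read the production database once and replace the monitoring summary."""
    rows = prod.execute(
        "SELECT site, status, COUNT(*), SUM(cpu_seconds) "
        "FROM jobs GROUP BY site, status"
    ).fetchall()
    with mon:  # one transaction: readers never see a half-written summary
        mon.execute("DELETE FROM site_summary")
        mon.executemany("INSERT INTO site_summary VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def failure_counts(mon):
    """Monitoring query answered from the summary store only."""
    cursor = mon.execute(
        "SELECT site, n_jobs FROM site_summary WHERE status = 'failed'"
    )
    return dict(cursor.fetchall())
```

In this sketch the production database is touched only inside `refresh_summary`, so the load that monitoring places on it is set by the refresh interval rather than by the number of monitoring queries. The price is that the summaries can be stale by up to one refresh interval.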
Submitted on behalf of Collaboration (ex, BaBar, ATLAS)
ATLAS
Author
Dr John Kennedy (LMU Munich)