Speaker
Dr Conrad Steenberg (CALIFORNIA INSTITUTE OF TECHNOLOGY)
Description
We present the architecture and implementation of a bi-directional system for
monitoring long-running jobs on large computational clusters. JobMon comprises an
asynchronous intra-cluster communication server and a Clarens web service on a head
node, coupled with a job wrapper for each monitored job to provide monitoring
information both periodically and upon request. The Clarens web service provides
authentication, encryption and access control for any external interaction with
individual job wrappers.
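
The two reporting paths described above (periodic and upon request) can be illustrated with a minimal in-process sketch. It is not JobMon's actual code or API: the class `JobWrapper`, its methods, the `publish` callback standing in for the head-node communication server, and the record fields are all hypothetical names chosen for illustration. Authentication, encryption and access control, which the abstract attributes to the Clarens web service, are deliberately left out.

```python
import threading
import time


class JobWrapper:
    """Hypothetical wrapper around one monitored job (illustration only)."""

    def __init__(self, job_id, publish, interval=0.05):
        self.job_id = job_id
        self._publish = publish      # assumed callback standing in for the head-node server
        self._interval = interval    # seconds between periodic reports
        self._started = time.time()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def status(self):
        """Build one monitoring record; shared by the push and pull paths."""
        return {"job_id": self.job_id, "elapsed": time.time() - self._started}

    def _run(self):
        # Periodic path: push a record every `interval` seconds until stopped.
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._publish(self.status())

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def handle_request(self, request):
        """On-request path: answer a query relayed from outside the cluster."""
        if request == "status":
            return self.status()
        raise ValueError(f"unsupported request: {request!r}")


# Usage: collect periodic reports in a list, then issue one explicit query.
reports = []
wrapper = JobWrapper("job-42", reports.append)
wrapper.start()
time.sleep(0.2)                      # let a few periodic reports arrive
reply = wrapper.handle_request("status")
wrapper.stop()
```

Both paths reuse the same `status()` record, so a consumer sees the same fields whether the information was pushed on a timer or pulled by a query. In a real deployment the callback and the request handler would be replaced by network communication with the head node.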
Primary author
Dr Conrad Steenberg (CALIFORNIA INSTITUTE OF TECHNOLOGY)
Co-authors
Dr Elliot Lipeles (University of California San Diego)
Dr Frank Wuerthwein (University of California San Diego)
Mr Shih-Chieh Hsu (University of California San Diego)