Speaker
Iosif Legrand
(CALTECH)
Description
The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) framework provides a set of distributed services for the monitoring, control, management and global optimization of large-scale distributed systems. It is based on an ensemble of autonomous, multi-threaded, agent-based subsystems that are registered as dynamic services and can be automatically discovered and used by other services or clients. The distributed agents can collaborate and cooperate in performing a wide range of management, control and global optimization tasks using real-time monitoring information.
An essential part of managing global-scale systems is a monitoring system that can track, in real time, many site facilities, networks, and tasks in progress. The monitoring information gathered is essential for developing the required higher-level services, that is, the components that provide decision support and some degree of automated decision-making, and for maintaining and optimizing workflow in large-scale distributed systems. These management and global optimization functions are performed by higher-level agent-based services. Current applications of MonALISA's higher-level services include optimized dynamic routing, control and optimization of large-scale data transfers on dedicated circuits, data transfer scheduling, distributed job scheduling, and automated management of remote services among a large set of grid facilities. MonALISA is currently used around the clock in several major projects and has proven to be both highly scalable and reliable. More than 320 services are running at sites around the world, collecting information about computing facilities, local and wide area network traffic, and the state and progress of the many thousands of concurrently running jobs.
Authors
Catalin Cirstoiu
(UPB)
Ciprian Dobre
(UPB)
Costin Grigoras
(CERN)
Harvey Newman
(CALTECH)
Iosif Legrand
(CALTECH)
Ramiro Voicu
(CALTECH)