Speaker
Dr Iosif Legrand (CALTECH)
Description
MonALISA (Monitoring Agents in A Large Integrated Services Architecture) provides a
distributed service for monitoring, control and global optimization of complex
systems, including the grids and networks used by the LHC experiments. MonALISA is
based on an ensemble of autonomous, multi-threaded, agent-based subsystems which are
able to collaborate and cooperate to perform a wide range of monitoring and decision
tasks in large-scale distributed applications, and to be discovered and used by other
services or clients that require such information. It is a fully distributed system
with no single point of failure.
The system is now deployed at more than 340 sites, serving several large Grid
communities (ALICE, CMS, OSG, UltraLight, LCG-Russia…), and it monitors around
one million parameters in near real-time (complete information for computing nodes,
jobs, end-to-end connectivity, accounting, different grid services, network traffic
and topology). MonALISA and its APIs are currently used by different tools in High
Energy Physics (CMS job submission systems, AliEn, Xrootd, Ganga, Diane…) to collect
specific monitoring data, which is fed back automatically to the different user
communities to help them understand how these complex systems are used and to detect
problems.
The system is able to react to alarm conditions and to automatically select an
appropriate action in response. MonALISA is also used for optimizing global workflows
in distributed systems.
Primary authors
Adrian Muraru (CERN)
Catalin Cirstoiu (CERN)
Ciprian Dobre (Polytechnic University of Bucharest)
Costin Grigoras (CERN)
Prof. Harvey Newman (CALTECH)
Dr Iosif Legrand (CALTECH)
Lucian Musat (Polytechnic University of Bucharest)
Ramiro Voicu (CALTECH)