Dr Iosif Legrand (CALTECH)
MonALISA (Monitoring Agents in A Large Integrated Services Architecture) provides a distributed service for monitoring, control and global optimization of complex systems, such as the grids and networks used by the LHC experiments. MonALISA is based on an ensemble of autonomous, multi-threaded, agent-based subsystems. These subsystems collaborate and cooperate to perform a wide range of monitoring and decision tasks in large-scale distributed applications, and they can be discovered and used by other services or clients that require such information. It is a fully distributed system with no single point of failure. The system is now deployed at more than 340 sites, serving several large Grid communities (ALICE, CMS, OSG, Ultralight, LCG-Russia…). It monitors around one million parameters in near real time, including complete information on computing nodes, jobs, end-to-end connectivity, accounting, grid services, network traffic and topology. MonALISA and its APIs are currently used by several High Energy Physics tools (CMS job submission systems, AliEn, Xrootd, Ganga, Diane…) to collect specific monitoring data. This data provides automatic feedback to user communities, helping them understand how these complex systems are used and detect problems. The system can also react to specific alarm conditions and automatically select an appropriate action. MonALISA is also used to optimize global workflows in distributed systems.