27 September 2004 to 1 October 2004
Interlaken, Switzerland
Europe/Zurich timezone

Development and use of MonALISA high level monitoring services for the STAR Unified Meta-Scheduler

30 Sept 2004, 17:50
20m
Theatersaal (Interlaken, Switzerland)

oral presentation Track 4 - Distributed Computing Services

Speaker

E. Efstathiadis (BROOKHAVEN NATIONAL LABORATORY)

Description

As a PPDG cross-team joint project, we proposed to study, develop, implement and evaluate a set of tools that allow Meta-Schedulers to take advantage of consistent information (such as the information needed for complex decision-making mechanisms) across both local and Grid Resource Management Systems (RMS). We will present and define the requirements and schema by which one can consistently provide queue attributes for the most common batch systems (PBS, LSF, Condor, SGE, etc.). We evaluate the most scalable and lightweight approach to accessing the monitored parameters from a client perspective and, in particular, the feasibility of accessing real-time and aggregate information using the MonALISA monitoring framework. Client programs are envisioned to function in a non-centralized, fault-tolerant fashion. The inherent delays and scalability issues of each approach when implemented at a large number of sites will be discussed. The MonALISA monitoring framework is a natural choice for such a project: it is an ensemble of autonomous, multi-threaded, agent-based systems that are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks in large-scale distributed applications. MonALISA is designed to easily integrate existing monitoring tools and procedures and to provide information in a dynamic, self-describing way to any other service or client. We intend to demonstrate the usefulness of this consistent approach to queue monitoring by implementing a monitoring agent within the STAR Unified Meta-Scheduler (SUMS) framework. We believe that such developments could greatly benefit Grid laboratory efforts such as Grid3+ and the Open Science Grid (OSG).
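
The idea of a consistent queue-attribute schema can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `QueueAttributes` fields, the raw key names in `FIELD_MAPS`, and the `pick_least_loaded` rule are hypothetical. They are not the schema defined in this contribution, not the actual output of PBS or LSF, and not part of the MonALISA or SUMS APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QueueAttributes:
    """Hypothetical batch-system-independent view of one queue."""
    site: str
    batch_system: str
    queue: str
    running_jobs: int
    pending_jobs: int
    max_wallclock_minutes: int


# Hypothetical mapping from unified field names to the raw keys a
# per-batch-system monitoring module might report. The raw keys are
# invented for this sketch.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "PBS": {
        "running_jobs": "n_run",
        "pending_jobs": "n_queued",
        "max_wallclock_minutes": "wall_min",
    },
    "LSF": {
        "running_jobs": "running",
        "pending_jobs": "pending",
        "max_wallclock_minutes": "limit_min",
    },
}


def normalize(site: str, batch_system: str, queue: str,
              raw: Dict[str, int]) -> QueueAttributes:
    """Translate raw per-system values into the unified schema."""
    mapping = FIELD_MAPS[batch_system]
    unified = {field: int(raw[key]) for field, key in mapping.items()}
    return QueueAttributes(site=site, batch_system=batch_system,
                           queue=queue, **unified)


def pick_least_loaded(queues: List[QueueAttributes]) -> QueueAttributes:
    """Toy decision rule: fewest pending jobs, then fewest running jobs."""
    return min(queues, key=lambda q: (q.pending_jobs, q.running_jobs))


if __name__ == "__main__":
    a = normalize("siteA", "PBS", "short",
                  {"n_run": 10, "n_queued": 5, "wall_min": 60})
    b = normalize("siteB", "LSF", "medium",
                  {"running": 3, "pending": 1, "limit_min": 120})
    print(pick_least_loaded([a, b]).site)
```

Once every site reports the same fields, a client such as a meta-scheduler can compare queues without knowing which batch system produced the numbers; only the mapping table would need to change when a new batch system is added.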

Primary authors

E. Efstathiadis (BROOKHAVEN NATIONAL LABORATORY), I. Legrand (California Institute of Technology), L. Hajdu (BNL)
