22–26 Sept 2008
Harbiye Askeri Museum
Europe/Zurich timezone

Monitoring Grid Jobs with L&B Notifications in GridView and Experiment Dashboard

23 Sept 2008, 16:43
Harbiye Askeri Museum

Istanbul
Poster, Demos and Posters

Speakers

Ales Krenek (CESNET) James Casey (CERN) Julia Andreeva (CERN)

Describe the added value of the grid for your activity, or the value your tool or service adds for other grid users. This should include the scale of the activity and of the potential user community, and the relevance for other scientific or business applications.

Logging and Bookkeeping (L&B) is a service which tracks gLite Grid jobs (and
eventually jobs of other systems such as CREAM and Condor) and provides
aggregate information on their state. Monitoring tools would have to query
job state frequently and in bulk. Such queries are costly, while genuinely
new information typically makes up only a fraction of each result set.

In contrast, L&B notification messages are triggered only when a changed job
state matches the criteria of some subscription. The notification messages
are delivered by the same mechanism as the proven L&B event delivery.
Altogether, there is no need to poll for status changes.

The criteria for triggering a notification (the matching set of jobs and
state transitions) are quite rich, so the system is highly customizable
and able to follow specific community needs.
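
The matching of state changes against subscription criteria can be sketched
as follows. This is a minimal illustration only, not the L&B API: the class
names, the field names, and the choice of filtering on VO and target state
are assumptions made for the example, and the VO and state strings are
sample labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class StateChange:
    """One job state transition (all field names are illustrative)."""
    job_id: str
    vo: str
    old_state: str
    new_state: str


@dataclass
class Subscription:
    """Hypothetical subscription: which jobs and which transitions to report."""
    callback: Callable[[StateChange], None]
    vo: Optional[str] = None                # None matches any VO
    new_states: Optional[Set[str]] = None   # None matches any target state

    def matches(self, change: StateChange) -> bool:
        if self.vo is not None and change.vo != self.vo:
            return False
        if self.new_states is not None and change.new_state not in self.new_states:
            return False
        return True


class Notifier:
    """Pushes each state change to every subscription whose criteria match."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def subscribe(self, subscription: Subscription) -> None:
        self.subscriptions.append(subscription)

    def publish(self, change: StateChange) -> int:
        """Deliver the change and return the number of subscribers notified."""
        delivered = 0
        for subscription in self.subscriptions:
            if subscription.matches(change):
                subscription.callback(change)
                delivered += 1
        return delivered


# Usage: a monitoring tool asks only for finished or aborted jobs of one VO.
received: List[StateChange] = []
notifier = Notifier()
notifier.subscribe(
    Subscription(received.append, vo="atlas", new_states={"Done", "Aborted"})
)
notifier.publish(StateChange("job-1", "atlas", "Running", "Done"))       # matches
notifier.publish(StateChange("job-2", "cms", "Running", "Done"))         # other VO
notifier.publish(StateChange("job-3", "atlas", "Scheduled", "Running"))  # other state
```

The subscriber never issues a query: it receives only the one transition
that satisfies its criteria, which is the point of replacing polling with
notifications.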

In our work, the monitoring systems (Experiment Dashboard, GridView) become
subscribers to L&B notifications and benefit from the functionality offered.

Describe the activity, tool or service using or enhancing the EGEE infrastructure or results. A high-level description is needed here (Neither a detailed specialist report nor a list of references is required).

Monitoring systems currently used on EGEE, such as GridView and Experiment
Dashboard, provide job monitoring functionality. To give an accurate
view of the Grid, they need a reliable mechanism for obtaining information
on job status changes.

The notification subsystem of Logging and Bookkeeping provides the necessary
functionality, and through integration with these monitoring systems it is
becoming an important component of the EGEE monitoring infrastructure.

Report on the impact of the activity, tool or service. This should include a description of how grid technology enabled or enhanced the result, or how you have enabled or enhanced the infrastructure for other users.

Due to its direct interaction with various Grid services, L&B is a valuable
source of job processing data. Using L&B, the monitoring system can offer more
accurate and richer information on jobs (e.g. failure reasons) to be displayed
to the users, while respecting the necessary security constraints. Views on
the jobs can be easily customized (per VO, Resource Broker, site, selected
status transitions only, etc.).

The system is reliable with respect to network outages and even machine
crashes: L&B notification messages are queued on disk and delivered when the
client reconnects, updating the view on job status accordingly. The same
mechanism is leveraged for client mobility: once a new delivery endpoint is
announced, messages queued in the meantime are redirected to it.
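
The queue-on-disk and redirect behaviour can be sketched as a small
store-and-forward loop. This is an illustration under stated assumptions,
not the L&B implementation: the `DiskQueue` and `Deliverer` classes, the
JSON-lines spool file, the endpoint strings, and the `fake_send` transport
are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

# Transport callback (hypothetical): returns True once the endpoint has
# acknowledged the message, False if the endpoint is unreachable.
Send = Callable[[str, dict], bool]


class DiskQueue:
    """Message spool kept in a file so that it survives a process restart."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, message: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the message to disk before returning

    def load(self) -> List[dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def rewrite(self, messages: List[dict]) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for message in messages:
                f.write(json.dumps(message) + "\n")
        os.replace(tmp, self.path)  # swap the spool file in one step


class Deliverer:
    """Queues every message first, then sends in order to the current endpoint."""

    def __init__(self, queue: DiskQueue, endpoint: str, send: Send) -> None:
        self.queue = queue
        self.endpoint = endpoint
        self.send = send

    def notify(self, message: dict) -> None:
        self.queue.append(message)
        self.flush()

    def set_endpoint(self, endpoint: str) -> None:
        """Client announced a new address: redirect whatever is still queued."""
        self.endpoint = endpoint
        self.flush()

    def flush(self) -> int:
        pending = self.queue.load()
        sent = 0
        for message in pending:
            if not self.send(self.endpoint, message):
                break  # endpoint down: keep this and all later messages
            sent += 1
        if sent:
            # A crash before this line resends messages later (at-least-once).
            self.queue.rewrite(pending[sent:])
        return sent


# Usage: the first endpoint is unreachable, then the client moves.
delivered: List[Tuple[str, str]] = []
reachable = {"host-a:9000": False, "host-b:9000": True}


def fake_send(endpoint: str, message: dict) -> bool:
    if not reachable[endpoint]:
        return False
    delivered.append((endpoint, message["job"]))
    return True


with tempfile.TemporaryDirectory() as spool_dir:
    queue = DiskQueue(os.path.join(spool_dir, "notifications.jsonl"))
    deliverer = Deliverer(queue, "host-a:9000", fake_send)
    deliverer.notify({"job": "job-1", "state": "Running"})
    deliverer.notify({"job": "job-2", "state": "Done"})
    pending_while_down = len(queue.load())
    deliverer.set_endpoint("host-b:9000")
    pending_after_move = len(queue.load())
```

In this sketch a message leaves the spool only after the endpoint has
acknowledged it, so an outage or a change of endpoint delays delivery but
does not lose messages. A receiver may occasionally see a duplicate.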

Altogether, tighter integration of L&B with grid monitoring systems,
with specific emphasis on using L&B notifications, provides their users
with a better view of Grid jobs while keeping the requirements on the
infrastructure manageable.
