25–29 Sept 2006
CICG
Europe/Zurich timezone

Monitoring and Ranking of Grid Failures using FailRank

26 Sept 2006, 14:00
5h 30m
CICG

CICG, 17 rue de Varembé, CH - 1211 Geneva 20 Switzerland
Board: 28
Poster | Users & Applications | Poster session

Speaker

Mr Kyriacos Neocleous (University of Cyprus)

Description

Detecting and managing failures automatically is an important step toward a dependable grid. Today this is an extremely complex task that relies on over-provisioning of resources, ad hoc monitoring, and user intervention. We present the FailRank architecture, a simple yet powerful framework for integrating and ranking information sources that characterize failures in a grid system. In FailRank, feedback sources (e.g., websites, LDAP queries, and representative low-level measurements) are continuously coalesced into an array of numeric vectors, the FailShot Matrix (FSM). The FSM is then continuously ranked using efficient top-k query processing algorithms to identify the k sites most likely to exhibit a failure. This lets system administrators focus their attention on those sites and lets resource brokers divert jobs away from them. We identify challenges and preliminary solutions for a variety of complementary tasks, including exploratory data analysis and prediction.
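
The ranking step described above can be illustrated with a minimal sketch. It is not the FailRank implementation: the site names, source names, weights, and the weighted-sum scoring function are all hypothetical, the values are assumed to be normalized to [0, 1] with higher meaning "more failure-prone", and a plain full scan with `heapq.nlargest` stands in for the efficient top-k query processing algorithms the abstract refers to.

```python
import heapq

# Hypothetical FailShot Matrix (FSM): one row per grid site, one column per
# feedback source. Values are ASSUMED to be normalized to [0, 1], where a
# higher value suggests a higher likelihood of failure.
SOURCES = ["cpu_load", "ldap_timeouts", "job_failures"]  # hypothetical names
fsm = {
    "site-a": [0.20, 0.10, 0.05],
    "site-b": [0.90, 0.70, 0.80],
    "site-c": [0.40, 0.95, 0.30],
    "site-d": [0.10, 0.00, 0.60],
}

# Hypothetical per-source weights; the abstract does not specify a scoring function.
weights = [0.5, 0.2, 0.3]


def failure_score(row, weights):
    """Aggregate one site's vector into a single score (assumed weighted sum)."""
    return sum(w * v for w, v in zip(weights, row))


def top_k_sites(fsm, weights, k):
    """Return the k sites with the highest aggregate score, highest first.

    This scans every row; a real top-k algorithm would try to avoid that.
    """
    scored = ((failure_score(row, weights), site) for site, row in fsm.items())
    return [(site, score) for score, site in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    for site, score in top_k_sites(fsm, weights, k=2):
        print(f"{site}: {score:.3f}")
```

In a continuously updated setting, the matrix rows would be refreshed as new measurements arrive and the ranking recomputed; the sketch shows only a single snapshot.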

Authors

Dr Chryssis Georgiou (University of Cyprus), Dr Demetrios Zeinalipour-Yazti (University of Cyprus), Mr Kyriacos Neocleous (University of Cyprus), Prof. Marios Dikaiakos (University of Cyprus)
