Speaker
Mr Kyriacos Neocleous (University of Cyprus)
Description
Detecting and managing failures in an automated way is an important step toward the goal of a dependable grid. Currently, this is an extremely complex task that relies on over-provisioning of resources, ad-hoc monitoring and user intervention. We present the FailRank architecture, a simple yet powerful framework for integrating and ranking information sources that characterize failures in a grid system. In the FailRank architecture, feedback sources (e.g. websites, LDAP queries, representative low-level measurements, etc.) are continuously coalesced into a representative array of numeric vectors, the FailShot Matrix (FSM). The FSM is then continuously ranked using efficient top-k query processing algorithms in order to identify the k sites with the highest potential to experience a failure. This allows system administrators to focus their attention on the sites most likely to run into failures, and allows resource brokers to divert jobs away from those sites. We identify challenges and preliminary solutions for a variety of complementary tasks, including exploratory data analysis and prediction.
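
The ranking step described above can be illustrated with a minimal Python sketch. It is not the FailRank implementation: the site names, the three attribute columns, the weights and the weighted-sum scoring function are all hypothetical, chosen only to show how a matrix of per-site numeric vectors could be reduced to a top-k list. It also scans every row, whereas the abstract refers to efficient top-k query processing algorithms, which would avoid a full scan.

```python
import heapq

def top_k_sites(fsm, weights, k):
    """Return the k (site, score) pairs with the highest weighted score.

    fsm     -- dict mapping a site name to its vector of failure indicators
    weights -- one weight per indicator (hypothetical scoring function)
    k       -- number of sites to report
    """
    def score(row):
        # Weighted sum of the indicators; an assumed scoring function.
        return sum(w * v for w, v in zip(weights, row))

    scored = ((site, score(row)) for site, row in fsm.items())
    # Naive selection over all rows; not an optimized top-k algorithm.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical FailShot Matrix: one row per site, one column per
# feedback source, each value normalized to [0, 1] (higher = worse).
fsm = {
    "site-a": [0.9, 0.2, 0.7],
    "site-b": [0.1, 0.1, 0.3],
    "site-c": [0.6, 0.8, 0.9],
    "site-d": [0.4, 0.5, 0.2],
}
weights = [0.5, 0.25, 0.25]

ranked = top_k_sites(fsm, weights, k=2)
for site, site_score in ranked:
    print(f"{site}: {site_score:.3f}")
```

With these made-up values, the two highest-scoring sites are the ones an administrator would inspect first and a resource broker would avoid when placing jobs.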
Authors
Dr Chryssis Georgiou (University of Cyprus)
Dr Demetrios Zeinalipour-Yazti (University of Cyprus)
Mr Kyriacos Neocleous (University of Cyprus)
Prof. Marios Dikaiakos (University of Cyprus)