21-25 September 2009
Hotel Barcelo Sants
Europe/Zurich timezone

Towards fault-tolerant grids with virtualization technology

Not scheduled



Radosław Januszewski (PSNC)

The growing size of Grid infrastructures makes them more and more prone to
failures. The mean time between failures of petaflop systems is counted in
hundreds of hours at most. Ensuring proper fault tolerance for long-running
applications is one of the problems that Pl-Grid, the Polish NGI, is trying
to solve. Traditionally, the problem was attacked by
utilizing different implementations of checkpoint and restart services
which, unfortunately, are inherently specific to an application or an
operating system. The growing popularity of virtualization software
available for modern operating systems makes it possible to employ its
freeze-and-resume features to obtain checkpoint/restart functionality. We
aim to integrate virtualization with the Grid middleware so that the two
technologies blend seamlessly. The goal is to provide an automatic or
grid-workflow-driven mechanism that allows for more fault-tolerant and
flexible computing environments.

