11–14 Feb 2008
<a href="http://www.polydome.org">Le Polydôme</a>, Clermont-Ferrand, FRANCE
Europe/Zurich timezone

Simple, fault tolerant, lightweight grid computing approach for bag-of-tasks applications

12 Feb 2008, 14:20
20m
Champagne (<a href="http://www.polydome.org">Le Polydôme</a>, Clermont-Ferrand, FRANCE)


Speaker

Yiannis Georgiou (LIG Laboratory)

Description

Our approach consists of a resource management system (RMS), OAR, responsible for the efficient allocation of local cluster resources, and a lightweight grid service, CIGRI, that uses only idle cluster resources and does not interfere with the normal operation of the interconnected clusters. The approach is based on the concept of "best-effort" tasks, introduced by OAR. Jobs of this type have the lowest execution priority and are submitted only when a resource is idle. If a local cluster user requests the resource while such a job is running, the local RMS kills the grid "best-effort" job. The CIGRI fault-treatment mechanism can resubmit the killed jobs and thus guarantee successful completion of the whole calculation. Features such as a web portal for grid monitoring, checkpoint/restart, results collection, support for diskless PC environments (ComputeMode) and application data transfer are implemented and provide ease of use and quality of service to the user.
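
The kill-and-resubmit cycle described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not call OAR or CIGRI, and the names `Task`, `run_campaign`, `preempted_attempts` and `max_attempts` are hypothetical, introduced here for illustration. Preemption by a local cluster user is modelled as a fixed set of (task, attempt) pairs so that the run is reproducible.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    """One independent task of a bag-of-tasks campaign (hypothetical model)."""
    task_id: int
    attempts: int = 0
    done: bool = False


def run_campaign(task_ids, preempted_attempts, max_attempts=10):
    """Run every task to completion, resubmitting the ones that get killed.

    preempted_attempts is a set of (task_id, attempt_number) pairs marking
    the attempts that the local RMS kills because a local user requested
    the resource. It is an assumption standing in for real cluster activity.
    """
    queue = deque(Task(t) for t in task_ids)
    finished = []
    while queue:
        task = queue.popleft()
        task.attempts += 1
        if (task.task_id, task.attempts) in preempted_attempts:
            # The best-effort job was killed by the local RMS.
            if task.attempts >= max_attempts:
                raise RuntimeError(
                    f"task {task.task_id} exceeded {max_attempts} attempts"
                )
            # Resubmit: put the task back at the end of the queue.
            queue.append(task)
            continue
        task.done = True
        finished.append(task)
    return finished


if __name__ == "__main__":
    # Task 1 is killed twice and task 3 once before they complete.
    result = run_campaign(range(4), {(1, 1), (1, 2), (3, 1)})
    for t in sorted(result, key=lambda t: t.task_id):
        print(f"task {t.task_id}: completed after {t.attempts} attempt(s)")
```

Because the tasks of a BoT application are independent, a killed task can simply be queued again without affecting the others; the campaign finishes once every task has completed at least one uninterrupted attempt.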

URL for further information:

oar.imag.fr/, cigri.imag.fr/, computemode.imag.fr/, ciment.ujf-grenoble.fr/, grid5000.fr/

Keywords

alternative lightweight grid, fault-tolerance, scheduling, BoT applications, best-effort tasks

4. Conclusions / Future plans

The CIGRI and OAR software packages have been active research projects since 2002. In one of the contexts where they are used (CIMENT), users can draw on the power of 6 different clusters, totalling more than 700 processors on heterogeneous machines, to execute large-scale scientific applications.
The CIGRI grid service is studied, and its new functionalities evaluated, through experiments on the Grid5000 experimental platform. OAR is the official RMS of the Grid5000 platform.

1. Short overview

An alternative grid computing approach for large-scale computation is the exploitation of idle resources.
We present a simple, scalable and fault-tolerant grid service that transparently harnesses idle cluster resources and idle diskless desktop workstations to execute large-scale scientific "bag-of-tasks" (BoT) applications.

3. Impact

The mainstream grid computing approach, Globus, combines security, resource discovery and resource access in grid environments. It provides standardized services for constructing computational grids. However, installing, configuring and maintaining this system is a rather complicated task that requires a highly skilled support team, which few laboratories can afford.
Lower-cost solutions were introduced by technologies such as desktop grids (SETI@home), which are based on the idea of harvesting the computing power of individual desktop PCs sitting idle on the Internet. Where multiple distinct administrative domains want to share their resources, similar approaches are provided by the OurGrid and Condor platforms.
Our lightweight approach shares similarities with the above projects. However, its limited security measures and its support for simple BoT applications only make CIGRI lighter and simpler than both OurGrid and Condor.

Primary author

Yiannis Georgiou (LIG Laboratory)

Co-authors

Bruno Bzeznik (Projet CIMENT), Nicolas Capit (LIG Laboratory), Olivier Richard (LIG Laboratory)
