4th EGEE User Forum/OGF 25 and OGF Europe's 2nd International Event

Name: 4th EGEE User Forum/OGF 25 and OGF Europe's 2nd International Event
Start: 2009-03-02T09:00:00+01:00
End: 2009-03-06T17:30:00+01:00
Location: Le Ciminiere, Catania, Sicily, Italy

A new approach to High Availability

4 Mar 2009, 11:00

25m

Galilei (120) (Le Ciminiere, Catania, Sicily, Italy)

Viale Africa 95100 Catania

Oral; Emerging Technologies within the EGEE infrastructure; Novel Services and Technologies

Speaker: Dr Federico Calzolari (Scuola Normale Superiore - INFN Pisa)

High availability has always been one of the main problems for a data center. Until now it has been achieved through host-by-host redundancy, a method that is very expensive in both hardware and staff effort. Virtualization now offers a new approach to the problem: with it, redundancy can be provided for all the services running in a data center.

Conclusions and Future Work

By exploiting virtualization and the ability to install a host from scratch fully automatically, it is possible to provide a kind of host on demand, in which a backup virtual machine is started only when a disaster occurs. As an extension of the 3RC architecture, several storage solutions, from NAS to SAN, will be tested for storing and centralizing all the virtual disks, in order to ensure data safety and access from anywhere.

URL for further information

http://3rc.sns.it

Detailed analysis

This new approach to high availability distributes the running virtual machines over only those servers that are up and running, by exploiting the features of the virtualization layer: starting, stopping and moving virtual machines between physical hosts. The system (3RC) is based on a finite state machine, which makes it possible to restart each virtual machine on any physical host or to reinstall it from scratch. A complete infrastructure has been developed to install the operating system and middleware in a few minutes. When a physical host is damaged, the virtual machines it hosts are automatically moved to other physical hosts, chosen by selecting the least loaded of the available servers.
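The placement rule described above (move the virtual machines of a damaged host to the least loaded of the available servers) can be illustrated with a minimal sketch. It is not the 3RC implementation: the `Host` class, the `evacuate` function, the host and VM names, and the load metric (number of hosted virtual machines) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a physical host running virtual machines."""
    name: str
    up: bool = True
    vms: list = field(default_factory=list)

    @property
    def load(self) -> int:
        # Assumed load metric: the number of hosted virtual machines.
        return len(self.vms)


def evacuate(failed: Host, hosts: list) -> dict:
    """Move each VM off `failed` to the least loaded host that is still up."""
    failed.up = False
    candidates = [h for h in hosts if h.up and h is not failed]
    if not candidates:
        raise RuntimeError("no running host available")
    placement = {}
    while failed.vms:
        vm = failed.vms.pop(0)
        # Re-evaluate the load after every move so VMs spread out.
        target = min(candidates, key=lambda h: h.load)
        target.vms.append(vm)
        placement[vm] = target.name
    return placement


hosts = [
    Host("host-a", vms=["ce", "se", "wn"]),
    Host("host-b", vms=["ui"]),
    Host("host-c", vms=["bdii", "mon"]),
]
placement = evacuate(hosts[0], hosts)
print(placement)
```

Because the load is recomputed after each move, the virtual machines of the failed host are spread over the surviving hosts rather than all landing on the one that was least loaded at the moment of failure. A real system would likely use a richer load measure (CPU, memory, I/O) than the VM count assumed here.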

Impact

This new approach to high availability needs only one remote controller, while every physical machine with a virtual environment on board can act as a backup server to recover the services that were running on a crashed physical host. The remote controller itself can also be shared among the physical servers of the computing center, reducing the risk of a single point of failure in the event of a disaster.
The system can recover a crashed machine (e.g. one that is overloaded) in less than 3 minutes, and a critically damaged host (e.g. one whose /boot partition has been deleted) in less than 7 minutes. To virtualize the main servers of a data center, a new procedure has been developed to migrate physical hosts to virtual ones. The whole SNS-PISA Grid data center, part of the EGEE/LCG CERN production Grid, is currently running in a virtual environment under the high availability system.
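The two recovery cases above (restart a crashed machine, reinstall a critically damaged one) suggest how a finite state machine might drive recovery. The sketch below is only an illustration under assumed names: the states, the `check_ok` and `check_failed` events, and the escalation order (restart first, reinstall from scratch if the restart does not help) are hypothetical and are not taken from the 3RC code.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    RESTARTING = auto()
    REINSTALLING = auto()
    FAILED = auto()


# Transition table: (current state, event) -> next state.
# Both the states and the event names are assumptions for this sketch.
TRANSITIONS = {
    (State.RUNNING, "check_failed"): State.RESTARTING,
    (State.RESTARTING, "check_ok"): State.RUNNING,
    (State.RESTARTING, "check_failed"): State.REINSTALLING,
    (State.REINSTALLING, "check_ok"): State.RUNNING,
    (State.REINSTALLING, "check_failed"): State.FAILED,
}


def step(state: State, event: str) -> State:
    """Return the next state; unknown (state, event) pairs leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.RUNNING):
    """Feed a sequence of events to the machine and return the visited states."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


# An overloaded machine that comes back after a restart.
crashed = run(["check_failed", "check_ok"])
# A damaged host whose restart fails and which is then reinstalled.
damaged = run(["check_failed", "check_failed", "check_ok"])
print([s.name for s in crashed])
print([s.name for s in damaged])
```

Keeping the transitions in a table makes the recovery policy easy to inspect and extend, for example by adding a state for moving the virtual machine to another physical host before a reinstall is attempted.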

Keywords

High Availability, Virtualization, Host on demand, PXE

Author: Dr Federico Calzolari (Scuola Normale Superiore - INFN Pisa)

Co-authors: Dr Alberto Ciampa (INFN Pisa), Dr Enrico Mazzoni (INFN Pisa), Dr Silvia Arezzini (INFN Pisa)

Slides

Calzolari_3RC.pdf

Calzolari_3RC.ppt
