2–6 Mar 2009
Le Ciminiere, Catania, Sicily, Italy
Europe/Rome timezone

A new approach to High Availability

4 Mar 2009, 11:00
25m
Galilei (120) (Le Ciminiere, Catania, Sicily, Italy)

Viale Africa 95100 Catania
Oral
Emerging Technologies within the EGEE infrastructure
Novel Services and Technologies

Speaker

Dr Federico Calzolari (Scuola Normale Superiore - INFN Pisa)

Description

High availability has always been one of the main problems for a data center. Until now it has been achieved through host-by-host redundancy, a method that is very expensive in terms of both hardware and staff effort. Virtualization now offers a new approach to the problem: it makes it possible to build a single redundancy system covering all the services running in a data center.

Keywords

High Availability, Virtualization, Host on demand, PXE

Impact

This new approach to high availability needs only one remote controller. Each physical machine that hosts a virtual environment can act as a backup server, recovering the services that were running on a crashed physical host. The remote controller itself can be shared among the physical servers of the computing center, reducing the risk of a single point of failure in case of disaster.
The system is able to recover a crashed machine (e.g. an overloaded one) in less than 3 minutes, and a critically damaged host (e.g. one whose /boot partition has been deleted) in less than 7 minutes.
To virtualize the main servers of a data center, a new procedure has been developed to migrate physical hosts to virtual ones. The whole SNS-PISA Grid data center, part of the EGEE/LCG CERN production Grid, currently runs in a virtual environment under the high availability system.

URL for further information

http://3rc.sns.it

Detailed analysis

This new approach to high availability makes it possible to distribute the running virtual machines over only those servers that are up and running, by exploiting the features of the virtualization layer: starting, stopping and moving virtual machines between physical hosts. The system (3RC) is based on a finite state machine, which makes it possible to restart each virtual machine on any physical host, or to reinstall it from scratch. A complete infrastructure has been developed to install the operating system and middleware in a few minutes. When the damage affects a physical host, the hosted virtual machines are automatically moved to other physical hosts, selecting the least loaded among the available servers.
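
The selection of the least loaded server described above can be illustrated with a short Python sketch. It is not the 3RC implementation: the `Host` class, the host and virtual machine names, and the load metric (the number of hosted virtual machines) are assumptions made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    up: bool = True
    vms: list = field(default_factory=list)

    @property
    def load(self) -> int:
        # Assumed load metric: number of hosted virtual machines.
        return len(self.vms)


def least_loaded(hosts):
    """Return the running host with the lowest load, or None if none is up."""
    candidates = [h for h in hosts if h.up]
    return min(candidates, key=lambda h: h.load) if candidates else None


def evacuate(failed, hosts):
    """Mark a host as down and move its VMs, one at a time, to the least loaded host."""
    failed.up = False
    moves = []
    while failed.vms:
        target = least_loaded(hosts)
        if target is None:
            raise RuntimeError("no running host available")
        vm = failed.vms.pop(0)
        target.vms.append(vm)
        moves.append((vm, target.name))
    return moves


# Hypothetical data center: node1 fails while hosting two virtual machines.
hosts = [
    Host("node1", vms=["ce", "se"]),
    Host("node2", vms=["bdii"]),
    Host("node3"),
]
moves = evacuate(hosts[0], hosts)
print(moves)
```

Re-evaluating the load after every move spreads the recovered machines across the surviving hosts instead of placing them all on the server that was least loaded at the moment of the failure.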

Conclusions and Future Work

By exploiting virtualization and the ability to install a host from scratch in a completely automatic manner, it is possible to achieve a sort of host on demand, where a backup virtual machine is started only when a disaster occurs. As an extension of the 3RC architecture, several storage solutions, from NAS to SAN, will be tested to store and centralize all the virtual disks, in order to guarantee data safety and access from anywhere.
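
The host-on-demand idea, in which recovery is attempted only after a failure and falls back to a reinstallation from scratch, can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the state names and the `restart` and `reinstall` hooks are hypothetical and do not reproduce the actual 3RC states or interfaces.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    RESTARTING = auto()
    REINSTALLING = auto()
    FAILED = auto()


def recover(vm, restart, reinstall):
    """Drive one VM through the recovery states and return the visited states.

    `restart` and `reinstall` are caller-supplied callables that take the VM
    name and return True on success (hypothetical hooks, not the 3RC API).
    """
    trace = [State.RESTARTING]
    if restart(vm):
        # A plain restart was enough to bring the VM back.
        trace.append(State.RUNNING)
        return trace
    # Restart failed: fall back to a reinstallation from scratch.
    trace.append(State.REINSTALLING)
    trace.append(State.RUNNING if reinstall(vm) else State.FAILED)
    return trace


# Example: the restart fails, the reinstallation succeeds.
trace = recover("ce", restart=lambda vm: False, reinstall=lambda vm: True)
print([s.name for s in trace])
```

Trying the cheaper restart first and reinstalling only when it fails is one plausible way to obtain different recovery times for a crashed machine and for a critically damaged host.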

Author

Dr Federico Calzolari (Scuola Normale Superiore - INFN Pisa)

Co-authors

Dr Alberto Ciampa (INFN Pisa)
Dr Enrico Mazzoni (INFN Pisa)
Dr Silvia Arezzini (INFN Pisa)
