25–29 Sept 2006
CICG
Europe/Zurich timezone

Building gLite based scalable Grid environment with HP SFS

26 Sept 2006, 14:00
5h 30m
CICG, 17 rue de Varembé, CH - 1211 Geneva 20 Switzerland
Board: 19
Type: Poster; Track: Users & Applications; Session: Poster session

Speaker

Mr Péter Dóbé (BME)

Description

At BME we have assembled a site consisting of a Grid Gate (GG), a Storage Element (SE) and over thirty Worker Nodes (WN). The GG and SE, as well as many of the WNs, are HP ProLiant G2 servers with two Intel Xeon 3.00 GHz processors and 2 GB RAM. Sites of this kind are usually difficult to maintain and administer, and expanding them with new nodes is a time-consuming task that requires considerable attention. We have created an NFS based solution which allows nodes to be added in a matter of minutes without prior installation of gLite software. The worker nodes are nearly diskless: most of each node's file system is an NFS root served from the GG host (a boot-configuration sketch follows this description). This supplies all the applications and configuration files necessary to operate in the Grid environment. Only temporary data files and the host-specific private keys and certificates are stored locally; the latter two are required by the PKI based authentication mechanism commonly used in Grids such as the EGEE Grid. Completely diskless nodes can also be deployed, in which case the keys and certificates are downloaded over a secure network. The hosts may contain large-capacity disks, which can be used not only as temporary storage for the worker nodes but also as storage disks in a Disk Pool Manager (DPM) architecture. The operating system is still served over NFS; alternatively, a complete file system image can be downloaded to a local hard disk. This arrangement has given us an easily manageable site.

The hosts are connected by a Gigabit Ethernet network, which also connects them to an HP Scalable File Share (SFS) storage system of approximately 3 Terabytes capacity. To make both the GG and the SE accessible from outside, each of them has a second network interface connected to the Internet. The site is part of the EGEE infrastructure and hence runs the gLite middleware. We have also established a Virtual Organization called “egeebme” and a local Certificate Authority for testing and educational purposes. This allows students and researchers to become acquainted with gLite without applying for a globally accepted certificate and VO membership.

The homogeneous set of HP servers makes it possible to use one common kernel image on every host without per-host configuration, so adding another identical HP ProLiant G2 server needs no special action. For other hardware configurations only a different kernel image is required, while the same NFS root can be used. User authentication on the site is provided by Kerberos and LDAP running on the GG machine.

The SE runs the client tools for the HP SFS and is configured to make the storage space available to the Grid infrastructure. The SFS is based on the Lustre File System, and provides an efficient administration environment and a single point of management. The flexible Lustre technology permits a huge variety of configurations: metadata and object data are stored on different disks, providing separate scalability for both, which allows grid administrators to fine-tune the system to the specific needs of different types of applications. Files are stored on Object Storage Targets (OST) and administrative data is handled by the Metadata Server (MDS). File data can be striped across multiple OSTs, allowing extreme file sizes and multiplying file I/O performance.
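To illustrate the NFS-root boot described above, the following is a minimal sketch rather than our exact configuration; the host names, IP addresses and paths are placeholders. The GG exports the shared root read-only, and the worker nodes receive it through the standard Linux NFS-root kernel parameters:

    # /etc/exports on the GG host: the shared gLite root, exported read-only
    /export/wn-root  192.168.0.0/24(ro,no_root_squash,sync)

    # PXE boot entry (pxelinux.cfg/default) handed to the worker nodes
    DEFAULT glite-wn
    LABEL glite-wn
      KERNEL vmlinuz-wn
      APPEND root=/dev/nfs nfsroot=192.168.0.1:/export/wn-root,ro ip=dhcp

Host-specific data such as the private key and certificate would then live on the small local disk, or be fetched over a secure network at boot, as described above.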
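The striping mentioned above could be controlled with the standard Lustre lfs utility. A hypothetical sketch follows; the mount point, host name, file system name and stripe count are placeholders, and the exact option and mount syntax varies between Lustre versions:

    # Mount the SFS/Lustre file system on a client node
    mount -t lustre mds1@tcp0:/sfs /mnt/sfs

    # Stripe files created under this directory across 4 OSTs
    lfs setstripe -c 4 /mnt/sfs/data

    # Inspect the resulting layout of a file
    lfs getstripe /mnt/sfs/data/largefile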
SFS provides a network-independent solution with high network performance, redundancy, and higher availability and transaction rates than standard solutions, while remaining compatible across different distributions and architectures; it can also be reached through conventional network file systems. Network bandwidth and latency are improving rapidly, rendering current storage technologies obsolete, so HP SFS supports the most recent network systems as interconnects: Gigabit Ethernet is the slowest option, while InfiniBand or Myrinet can provide 770 Mbyte/s of network performance. Robust computing nodes require effective and reliable access to the main file system. While using NFS we experienced file I/O malfunctions, so we looked for a more sophisticated solution; upgrading from NFS to an SFS based centralized file system eliminates these random failures and network dropouts.

In addition to being part of the European Grid, the site serves as a computing cluster for computations at the BME Faculty of Architecture. The problem being solved is calculating the prestressing strength of reinforced concrete bars used in bridges. This can be modeled as a Boundary Value Problem (BVP) that is easy to parallelize by parameter sweeping, i.e. dividing the parameter domain into smaller subdomains that each node can work on separately.
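The decomposition itself is straightforward. The following minimal Python sketch is illustrative only: the interval bounds are arbitrary and solve_bvp is a placeholder standing in for the actual BVP solver. It shows how a one-dimensional parameter domain can be cut into per-node subdomains:

    # Minimal sketch of the parameter-sweep decomposition (illustrative only).

    def subdomains(start, stop, nodes):
        """Split the parameter interval [start, stop) into equal subdomains, one per node."""
        width = (stop - start) / nodes
        return [(start + i * width, start + (i + 1) * width) for i in range(nodes)]

    def solve_bvp(lo, hi):
        """Placeholder for the real boundary value problem solver run on one node."""
        return max(lo, hi)  # hypothetical result

    # Example: sweep a prestressing parameter over [0.0, 30.0) on 30 worker nodes.
    chunks = subdomains(0.0, 30.0, 30)
    # On the real site each (lo, hi) pair would be submitted as a separate Grid job;
    # here we simply loop over the subdomains.
    results = [solve_bvp(lo, hi) for lo, hi in chunks]
    print(len(results), "subdomain results collected")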

Authors

Mr Dénes Németh (BME), Mr Péter Dóbé (BME)

Co-author

Dr Imre Szeberényi (BME)

Presentation materials

There are no materials yet.