Speaker
Mr Péter Dóbé (BME)
Description
At BME we have assembled a site consisting of a Grid Gate (GG), a
Storage Element (SE) and over thirty Worker Nodes (WN). The GG and SE,
as well as many of the WNs, are HP ProLiant G2 servers with two Intel
Xeon 3.00 GHz processors and 2 GB of RAM.
Such sites are usually difficult to maintain and administer: expanding
them with new nodes is a time-consuming task that requires considerable
administrator attention.
We have created an NFS-based solution which allows nodes to be added in
a matter of minutes without prior installation of the gLite software.
The worker nodes are nearly diskless: most of each node's file system
is served from an NFS root located on the GG host. This supplies all
the applications and configuration files necessary to operate in the
Grid environment. Only temporary data files and the host-specific
private keys and certificates are stored locally; the latter two are
required by the PKI-based authentication mechanism commonly used in
Grids such as the EGEE Grid. It is also possible to deploy completely
diskless nodes, in which case the keys and certificates are downloaded
over a secure network. The hosts may contain large-capacity disks,
which can be used not only as temporary storage for the worker nodes
but also as storage disks in a Disk Pool Manager (DPM) architecture. In
that case the operating system is still served via NFS; alternatively,
a complete file system image can be downloaded to a local hard disk.
This arrangement yields an easily manageable site.
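
As an illustration, the following minimal Python sketch shows how a new
node could be registered on the GG under this scheme; the paths, export
options and hostname are assumptions, not the site's actual
configuration.

    #!/usr/bin/env python
    # Hypothetical sketch: register a new nearly-diskless worker node.
    # NFS_ROOT and the export options are assumed, not the real layout.

    EXPORTS = "/etc/exports"
    NFS_ROOT = "/export/wn-root"  # shared gLite root served to the nodes

    def register_node(hostname):
        # One export line per node suffices: temporary files and the
        # host-specific keys/certificates stay on the node's local disk.
        with open(EXPORTS, "a") as f:
            f.write("%s %s(ro,no_root_squash,sync)\n" % (NFS_ROOT, hostname))

    if __name__ == "__main__":
        register_node("wn31.example.bme.hu")  # hypothetical hostname
        # Afterwards run "exportfs -ra" to re-read /etc/exports.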
The hosts are connected by a Gigabit Ethernet network, which also
connects them to an HP Scalable File Share (SFS) storage system with
approximately 3 terabytes of capacity.
To make both the GG and the SE accessible from outside, each of them
has a second network interface connecting it to the Internet. The site
is part of the EGEE infrastructure and hence runs the gLite middleware.
We have also established a Virtual Organization called "egeebme" and a
local Certificate Authority for testing and educational purposes. These
allow students and researchers to become acquainted with gLite without
applying for a globally accepted certificate and VO membership.
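
A local CA of this kind can be bootstrapped with standard OpenSSL
tooling; the sketch below, with an illustrative subject name and
validity period, shows one possible way and is not necessarily the
site's actual procedure.

    #!/usr/bin/env python
    # Hypothetical sketch: create a self-signed test CA certificate.
    # Subject fields, key size and lifetime are illustrative assumptions.
    import subprocess

    subprocess.check_call([
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-days", "365",
        "-subj", "/C=HU/O=BME/CN=BME Test CA",
        "-keyout", "ca-key.pem", "-out", "ca-cert.pem",
    ])
    # User and host certificates for the testbed would then be signed
    # with this CA certificate instead of a globally accepted one.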
The homogeneous set of HP servers makes it possible to use one common
kernel image on every host, without configuring the hosts separately.
Adding another identical HP ProLiant G2 server therefore requires no
special action. For other hardware configurations only a different
kernel image is required, while the same NFS root can be used. User
authentication on the site is provided by Kerberos and LDAP running on
the GG machine. The system on the SE contains the client tools for the
HP SFS and is configured to make the storage space available to the
Grid infrastructure.
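
To make the single-kernel-image idea concrete, the sketch below writes
a PXELINUX boot entry that points every node at the common kernel and
the NFS root; the TFTP path, kernel file name and nfsroot location are
assumptions.

    #!/usr/bin/env python
    # Hypothetical sketch: one PXELINUX entry shared by all identical
    # nodes. The kernel mounts its root over NFS from the GG host.

    ENTRY = """\
    DEFAULT wn
    LABEL wn
        KERNEL vmlinuz-wn
        APPEND root=/dev/nfs nfsroot=gg.example.bme.hu:/export/wn-root ip=dhcp
    """

    with open("/tftpboot/pxelinux.cfg/default", "w") as f:
        f.write(ENTRY)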
The SFS is based on the Lustre file system and provides an efficient
administration environment with a single point of management. The
flexible Lustre technology permits a wide variety of configurations.
Metadata and object data are stored on separate disks, so each can be
scaled independently. This allows grid administrators to fine-tune the
system to meet the specific needs of different types of applications.
Files are stored on Object Storage Targets (OSTs), while administrative
data is handled by the Metadata Server (MDS). File data can be striped
across multiple OSTs, allowing very large files and multiplying file
I/O performance.
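
For illustration, striping can be controlled per directory with the
standard Lustre lfs tool; the mount point, stripe count and stripe size
below are assumptions.

    #!/usr/bin/env python
    # Hypothetical sketch: stripe new files in a directory across
    # several OSTs so large files gain parallel I/O bandwidth.
    import subprocess

    # Stripe across 4 OSTs (-c) with 1 MB stripes (-s, as in Lustre 1.x).
    subprocess.check_call(
        ["lfs", "setstripe", "-c", "4", "-s", "1048576", "/mnt/sfs/data"])

    # Show the layout that newly created files will inherit.
    subprocess.check_call(["lfs", "getstripe", "/mnt/sfs/data"])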
SFS provides an interconnect-independent solution with high network
performance and redundancy, offering higher availability and
transaction rates than standard solutions, and it remains compatible
across different distributions and architectures. It can also be
reached through conventional network file systems. Network bandwidth
and latency are improving rapidly, which makes current storage
technologies obsolete; HP SFS therefore supports the most recent types
of interconnect. Gigabit Ethernet is only the slowest option:
InfiniBand or Myrinet can provide 770 MB/s of network throughput.
Robust computing nodes require effective and reliable access to the
main file system. While using NFS we experienced file I/O malfunctions,
so we looked for a more sophisticated solution. By upgrading from NFS
to an SFS-based centralized file system, random failures and network
dropouts can be eliminated.
In addition to being part of the European Grid, the site serves as a
computing cluster for computations at the BME Faculty of Architecture.
The problem being solved is calculating the prestressing strength of
reinforced concrete bars used in bridges. This can be modeled as a
Boundary Value Problem (BVP) that is easy to parallelize by parameter
sweeping, i.e. dividing the parameter domain into smaller subdomains
that each node can work on separately.
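
A minimal sketch of this decomposition follows, assuming a
one-dimensional parameter interval and a placeholder solver; on the
real site each subdomain would be submitted as a separate Grid job.

    #!/usr/bin/env python
    # Minimal sketch of the parameter-sweep decomposition described
    # above. Interval, subdomain count and solver are assumptions.

    def split_domain(lo, hi, parts):
        """Divide the parameter interval [lo, hi) into equal subdomains."""
        step = (hi - lo) / float(parts)
        return [(lo + i * step, lo + (i + 1) * step) for i in range(parts)]

    def solve_bvp(sub_lo, sub_hi):
        # Placeholder for the boundary value problem solved on one node.
        pass

    if __name__ == "__main__":
        # One subdomain per worker node; over thirty nodes are available.
        for sub in split_domain(0.0, 1.0, 30):
            solve_bvp(*sub)  # in practice submitted as a gLite job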
Authors
Mr Dénes Németh (BME)
Mr Péter Dóbé (BME)
Co-author
Dr Imre Szeberényi (BME)