
CERN Computing Colloquium

High Performance AFS

by Hartmut Reuter (Max Planck Institut, Garching, Germany)

Europe/Zurich
500/1-001 - Main Auditorium (CERN)

Description
It is known that AFS is rather slow compared to shared file systems such as Sanergy, GPFS, and others, but also compared to NFS. In a high-performance computing environment such as the large Regatta system at RZG this could be a knock-out criterion for the use of AFS.
Looking at the way Sanergy works, I got the idea that something similar should also be possible for AFS. Sanergy exports file systems via NFS to client machines. The client machines, however, have direct SAN access to the disks, so NFS is used primarily for metadata operations and access control, while the data are read or written directly.
My approach for AFS is similar, but slightly easier to realize because it does not have to bother with low-level disk access. Instead it uses an underlying high-performance shared file system such as GPFS.
The fileserver's vicep-partitions are made visible to the client machines in the Regatta cluster. This, of course, should only be done in a trusted environment because, unlike on normal AFS clients, the root user would have no difficulty accessing AFS data stored in these partitions.

The modifications to AFS to achieve this goal are rather moderate (the resulting client-side flow is sketched after the list):
1. The client has to identify visible vicep-partitions and find the exporting fileserver. This is done by storing the fileserver's sysid-file in the partition and transferring the uuid to kernel memory by means of a new subcall in the AFS system call.
2. Volumes with instances on fileservers with visible vicep-partitions are flagged so that they can be accessed directly.
3. When a file in such a volume is opened, the client issues a new, special RPC to the fileserver to get the path of the file in the vicep-partition. This RPC, of course, also performs all the access-rights and quota checking.
4. On success, the client opens the vnode/dentry of the vicep-file and all further I/O is done directly. This avoids not only the RPC traffic over the network but also all of the AFS cache I/O.
5. A close after a write requires a dummy store RPC to the fileserver in order to update the file's length in the metadata.
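
The client-side control flow of steps 1-5 can be illustrated by the following user-space sketch. It is only an illustration under stated assumptions: the real code is kernel code in the MR-AFS client and fileserver, and every function name below (register_visible_partition, rpc_get_vicep_path, rpc_dummy_store) as well as the example paths are placeholders invented for this sketch, not actual AFS or OpenAFS interfaces.

/*
 * User-space sketch of the client-side flow described in steps 1-5.
 * All functions marked "placeholder" are stand-ins invented for this
 * illustration, not real AFS/OpenAFS interfaces.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Placeholder for the new subcall in the AFS system call that hands the
 * fileserver's uuid (read from the sysid-file found in a visible
 * vicep-partition) over to kernel memory. */
static int register_visible_partition(const char *vicep, const void *uuid, size_t len)
{
    (void)uuid;
    printf("register %s with the kernel (uuid blob, %zu bytes)\n", vicep, len);
    return 0;
}

/* Placeholder for the new, special RPC: the fileserver checks access
 * rights and quota and, on success, returns the file's path inside the
 * vicep-partition (the path shown is purely an example). */
static int rpc_get_vicep_path(const char *afs_file, char *out, size_t outlen)
{
    snprintf(out, outlen, "/vicepa/AFSIDat/example/%s", afs_file);
    return 0;
}

/* Placeholder for the dummy store RPC issued after a write so that the
 * fileserver can update the file's length in its metadata. */
static int rpc_dummy_store(const char *afs_file)
{
    printf("dummy store RPC for %s (metadata only, no data shipped)\n", afs_file);
    return 0;
}

int main(void)
{
    /* Steps 1 and 2: identify a visible vicep-partition through its
     * sysid-file (treated here as an opaque blob) and register the
     * exporting fileserver's uuid with the kernel. */
    unsigned char uuid_blob[256];
    FILE *sysid = fopen("/vicepa/sysid", "rb");
    if (sysid != NULL) {
        size_t n = fread(uuid_blob, 1, sizeof uuid_blob, sysid);
        fclose(sysid);
        register_visible_partition("/vicepa", uuid_blob, n);
    }

    /* Step 3: on open, ask the fileserver where the file lives. */
    char vicep_path[4096];
    if (rpc_get_vicep_path("myfile", vicep_path, sizeof vicep_path) != 0)
        return 1;

    /* Step 4: all further I/O goes directly to the shared file system,
     * bypassing both the RPC data traffic and the AFS cache. */
    int fd = open(vicep_path, O_RDWR);
    if (fd >= 0) {
        const char buf[] = "written straight into the vicep-partition\n";
        if (write(fd, buf, sizeof buf - 1) < 0)
            perror("write");
        close(fd);

        /* Step 5: close after write triggers the dummy store RPC. */
        rpc_dummy_store("myfile");
    }
    return 0;
}

The essential point is step 4 of the sketch: once the fileserver has returned the path, reads and writes go straight to the shared file system, so the data never pass through the network RPCs or the AFS cache.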

We are running such a configuration with a GPFS vicep-partition on our Regatta systems and see the "native" GPFS read and write performance through AFS. This feature is not yet in real production at RZG.

Presently these modifications are available only for MR-AFS, but it should be easy to add these features to the OpenAFS fileserver as well.
It is planned to use this technique in the DEISA project as well, to provide shared file access between the different European HPC centers. In this case one can take advantage of the fact that AFS can serve as a secure protocol layer for accessing data in shared GPFS systems in a transparent way, without running into problems with the local uids at the different sites.

Biography
Hartmut Reuter has been working since 1981 at RZG, the supercomputing center of the Max-Planck-Society in Garching, Germany. Since 1995 he has been active in the further development of MR-AFS, the HSM version of AFS. After OpenAFS was created in 2000 he contributed large-file support and other features to OpenAFS and did the port to AIX 5.1 and 5.2.
Hartmut received his Dr. rer. nat. in physics from the University of Heidelberg in 1972 and has since worked in Heidelberg and Garching in the field of HSM systems and operating systems (HADES).
Video in CDS