Speaker
O. Tatebe
(Grid Technology Research Center, AIST)
Description
Gfarm v2 is designed to facilitate reliable file sharing and
high-performance distributed and parallel data computing in a Grid
across administrative domains by providing a Grid file system. A Grid
file system is a virtual file system that federates multiple file
systems. Files and data can be shared by mounting the virtual file
system. This paper discusses the design and implementation of a
secure, robust, scalable and high-performance Grid file system.
The most time-consuming, and also the most typical, task in data
computing in fields such as high energy physics, astronomy, space
exploration, and human genome analysis is to process a set of files in
the same way. Such processing can typically be performed independently
on every file in parallel, or at least has good locality. Gfarm v2
supports high-performance distributed and parallel computing for such
processing by introducing a "Gfarm file", a new "file-affinity"
process scheduling based on file locations, and new parallel file
access semantics. An arbitrary group of files, possibly dispersed
across administrative domains, can be managed as a single Gfarm file.
Each member file is accessed in parallel through a new file view
called the "local file view" by a parallel process, which may be
allocated by file-affinity scheduling based on the replica locations
of the member files. File-affinity scheduling and the new file view
enable the "owner computes" strategy, or "move the computation to the
data" approach, for parallel and distributed data computing on the
member files of a Gfarm file in a single system image.
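
The idea behind file-affinity scheduling and the local file view can be
illustrated with a minimal sketch. It is not the Gfarm API: the function
names, the host and file names, and the tie-breaking rule (prefer the
replica host with the fewest files assigned so far) are all assumptions
made for illustration only.

```python
from collections import defaultdict


def file_affinity_schedule(replicas):
    """Assign each member file to a host that already holds a replica.

    replicas: dict mapping member file name -> list of hosts with a replica.
    Returns a dict mapping member file name -> chosen host.
    Tie-breaking by assignment count is an assumption of this sketch.
    """
    assigned = defaultdict(int)  # number of files given to each host so far
    schedule = {}
    for name in sorted(replicas):
        hosts = replicas[name]
        if not hosts:
            raise ValueError(f"no replica recorded for {name}")
        # Choose the least-loaded replica holder; break ties by host name.
        host = min(hosts, key=lambda h: (assigned[h], h))
        schedule[name] = host
        assigned[host] += 1
    return schedule


def local_file_view(schedule, host):
    """Return the member files that the process on `host` sees and handles."""
    return sorted(name for name, h in schedule.items() if h == host)


if __name__ == "__main__":
    # Hypothetical replica catalogue for one "Gfarm file" with three members.
    replicas = {
        "part0": ["hostA", "hostB"],
        "part1": ["hostB"],
        "part2": ["hostA", "hostC"],
    }
    schedule = file_affinity_schedule(replicas)
    for host in sorted(set(schedule.values())):
        print(host, local_file_view(schedule, host))
```

Each process is placed on a host that stores a replica of its member
file, so it reads local data rather than pulling it across the network,
which is the "owner computes" strategy in miniature.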
Primary authors
N. Soda
(SRA)
O. Tatebe
(Grid Technology Research Center, AIST)
S. Matsuoka
(Tokyo Institute of Technology/National Institute of Informatics)
S. Sekiguchi
(Grid Technology Research Center, AIST)
Y. Morita
(KEK)