25–29 Sept 2006
CICG
Europe/Zurich timezone

Enabling bioinformatics applications to access files over the grid via a GFAL plugin to Parrot

26 Sept 2006, 14:15
10m
Conf. Room 2 (CICG)

CICG, 17 rue de Varembé, CH - 1211 Geneva 20 Switzerland
Oral Users & Applications Life Sciences (NA4)

Speaker

Dr Giacinto Donvito (INFN-BARI)

Description

One of the problems encountered while porting bioinformatics applications to the grid, within the framework of the BIOINFOGRID EU project, concerns their input and output. Many widely used bioinformatics tools were developed before grid technology was established, so they read their input from, and write their output to, the computer's local disk. To port such an application to the grid, one has to handle the case in which the local disk of the Worker Node (WN) is too small to hold the job's input and output files, as well as the case in which the WN is completely diskless. In such cases it is difficult to change the application code, which generally was not developed by the researcher, and it is sometimes also difficult to use specific libraries that would allow remote file access. Parrot offers a possible solution, since it allows file-system calls to be mapped onto a variety of protocols, for example GridFTP and HTTP.

We will report on a development that further improves the versatility of Parrot and its usability on the EGEE infrastructure. In particular, we have developed a plugin for Parrot that uses the GFAL API directly. With it, the application can run using only the Logical File Name (LFN) of the required input and output files and does not have to deal with the details of the underlying storage system used in the EGEE grid. Along the same lines, we have also developed a similar file system that uses the GFAL API and FUSE, since FUSE is the most widely used system for presenting remote file services as local files on Linux. FUSE has also been integrated into the Linux kernel, which makes it easier to deploy in a Linux environment. We are now intensively testing both implementations. The most important grid services involved in this work are the Storage Services and the File Catalog Services.

References:
1) Parrot: http://www.cse.nd.edu/~ccl/software/parrot/
2) FUSE: http://fuse.sourceforge.net/
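The idea of letting an unmodified tool work from a Logical File Name alone can be illustrated with a small sketch. This is not the Parrot plugin or the GFAL API: the `ToyCatalog` class, the `make_redirecting_open` helper, the `lfn:` path and the FASTA-counting "legacy tool" are all hypothetical names introduced for illustration. The sketch also passes the redirecting opener explicitly, whereas Parrot intercepts the application's file-system calls so that the application itself needs no change.

```python
import os
import tempfile


class ToyCatalog:
    """Hypothetical stand-in for a file catalog: maps an LFN to one replica.

    On a real grid this lookup is done by the catalog and storage services,
    not by an in-memory dictionary.
    """

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, replica_path):
        self._replicas[lfn] = replica_path

    def resolve(self, lfn):
        try:
            return self._replicas[lfn]
        except KeyError:
            raise FileNotFoundError(f"no replica registered for {lfn}")


def make_redirecting_open(catalog, real_open=open):
    """Return an open()-like function that resolves 'lfn:' paths first."""

    def redirecting_open(path, mode="r", *args, **kwargs):
        if isinstance(path, str) and path.startswith("lfn:"):
            path = catalog.resolve(path)
        return real_open(path, mode, *args, **kwargs)

    return redirecting_open


def count_fasta_records(path, opener=open):
    """Stand-in for a legacy tool: it only knows how to read from a path."""
    with opener(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def demo():
    # A small local FASTA file plays the role of a remote replica.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">seq1\nACGT\n>seq2\nGGCC\n")
        replica = tmp.name
    try:
        catalog = ToyCatalog()
        # Hypothetical LFN, chosen only for this example.
        catalog.register("lfn:/grid/demo/input.fasta", replica)
        opener = make_redirecting_open(catalog)
        return count_fasta_records("lfn:/grid/demo/input.fasta", opener)
    finally:
        os.unlink(replica)
```

Calling `demo()` runs the "legacy tool" against the logical name only; the location of the replica is known to the catalog stand-in, not to the tool.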

Author

Dr Giacinto Donvito (INFN-BARI)

Co-authors

Prof. Giorgio MAGGI (INFN-BARI and Politecnico Bari)
Dr Vihang DUDALKAR (University of Bari)
