25–29 Sept 2006
CICG
Europe/Zurich timezone

A GRID PLATFORM FOR ITALIAN BIOINFORMATICS

26 Sept 2006, 15:15 (10 min)
Conf. Room 2 (CICG)

CICG, 17 rue de Varembé, CH - 1211 Geneva 20 Switzerland
Oral | Users & Applications | Life Sciences (NA4)

Speaker

Dr Giacinto Donvito (INFN-BARI)

Description

LIBI (International Laboratory for Bioinformatics) is a project led by Prof. Cecilia Saccone of the Institute of Biomedical Technologies of the Italian National Research Council (CNR) and supported by the Italian Ministry for Research. It brings together leading Italian bioinformatics institutes and technology partners with the aim of building a virtual laboratory with a modern infrastructure to support life-science research in Italy. For its high-throughput (HT) applications, LIBI has adopted the gLite middleware and the EGEE infrastructure: a 28-CPU farm is already part of the branch of the EGEE grid managed by the Italian ROC (Regional Operations Centre).

LIBI's use of grid technology is driven by the enormous computational resources that some applications require. For example, the GenoMiner application (Castrignanò et al., 2006) performs cross-genome comparisons to detect highly conserved sequences, which are likely involved in coding or regulatory activity. A complex procedure has been adopted to speed up the comparison and to restrict the search to selected parts of the genomes. However, validating the entire procedure requires the grid, so a grid application has been set up to compare each tract of the human genome with the whole rat genome. In total, more than 2 billion (2,000 million) sequence comparisons are required, each of which can take up to 2-3 seconds; at 2-3 seconds each, 2 billion comparisons correspond to roughly 130 to 190 CPU-years of computation.

To keep track of which comparisons have been executed correctly, a "task-queue" scheme backed by a suitably designed database server has been implemented. The task queue can manage multiple dependencies between tasks and records the grid job ID to which the execution of each task is assigned. It also counts the attempts made to execute each task, so that jobs that "always" fail are not resubmitted.
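The task-queue bookkeeping described above (dependencies, grid job IDs, attempt counting, plus task priorities and status monitoring) can be sketched roughly as follows. This is a minimal illustration, not the LIBI implementation: it assumes SQLite via Python's built-in `sqlite3` module in place of the unspecified database server, and the table layout, the retry limit `MAX_ATTEMPTS`, and the class and method names are all hypothetical.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical retry limit; the abstract gives no value

# Hypothetical schema: one row per task, plus a table of task dependencies.
SCHEMA = """
CREATE TABLE tasks (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,                   -- e.g. a human-genome tract
    priority    INTEGER NOT NULL DEFAULT 0,      -- higher runs first
    status      TEXT NOT NULL DEFAULT 'pending', -- pending|running|done|failed
    attempts    INTEGER NOT NULL DEFAULT 0,
    grid_job_id TEXT                             -- last grid job assigned
);
CREATE TABLE deps (
    task_id    INTEGER NOT NULL REFERENCES tasks(id),
    depends_on INTEGER NOT NULL REFERENCES tasks(id)
);
"""


class TaskQueue:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def add_task(self, name, priority=0, depends_on=()):
        cur = self.db.execute(
            "INSERT INTO tasks (name, priority) VALUES (?, ?)", (name, priority))
        task_id = cur.lastrowid
        self.db.executemany(
            "INSERT INTO deps (task_id, depends_on) VALUES (?, ?)",
            [(task_id, d) for d in depends_on])
        self.db.commit()
        return task_id

    def next_task(self):
        """Highest-priority pending task whose dependencies are all done."""
        row = self.db.execute(
            """
            SELECT t.id FROM tasks t
            WHERE t.status = 'pending'
              AND NOT EXISTS (
                  SELECT 1 FROM deps d JOIN tasks p ON p.id = d.depends_on
                  WHERE d.task_id = t.id AND p.status != 'done')
            ORDER BY t.priority DESC, t.id
            LIMIT 1
            """).fetchone()
        return row[0] if row else None

    def start(self, task_id, grid_job_id):
        # Record which grid job runs the task and count this attempt.
        self.db.execute(
            "UPDATE tasks SET status = 'running', grid_job_id = ?, "
            "attempts = attempts + 1 WHERE id = ?", (grid_job_id, task_id))
        self.db.commit()

    def finish(self, task_id, ok):
        if ok:
            status = "done"
        else:
            # Requeue the task unless it has used up its attempts.
            (attempts,) = self.db.execute(
                "SELECT attempts FROM tasks WHERE id = ?", (task_id,)).fetchone()
            status = "failed" if attempts >= MAX_ATTEMPTS else "pending"
        self.db.execute(
            "UPDATE tasks SET status = ? WHERE id = ?", (status, task_id))
        self.db.commit()

    def summary(self):
        """Monitoring view: number of tasks in each state."""
        return dict(self.db.execute(
            "SELECT status, COUNT(*) FROM tasks GROUP BY status"))
```

In this sketch all state lives in the database, so a restarted submitter loses no bookkeeping, and the same tables answer monitoring queries such as `summary()`. A task that fails `MAX_ATTEMPTS` times is marked `failed` and is never handed out again.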
Task priorities can also be set so that tasks run in the correct order. Because the task queue is built on a database server, the same database can be used to monitor the application, giving users an easy way to check the status of all running and completed jobs.

The main grid services used by the application are the Storage Services, which collect the huge amount of output data it produces, and the Workload Management System (WMS), which selects the best farms with free CPUs to run the application and reliably transfers the essential files the jobs need to the worker nodes (WNs). The challenge is currently running on the Italian gLite infrastructure (INFN-GRID): more than 450,000 sequence comparisons have been performed in one month.

References:
1) Castrignanò T, De Meo PD, Grillo G, Liuni S, Mignone F, Talamo IG, Pesole G. GenoMiner: a tool for genome-wide search of coding and non-coding conserved sequence tags. Bioinformatics. 2006;22(4):497-9.
2) LIBI: http://www.libi.it/
3) CSTminer: http://nar.oxfordjournals.org/cgi/content/full/32/suppl_2/W624

Author

Dr Giacinto Donvito (INFN-BARI)

Co-authors

Dr Flavio Mignone (University of Milan)
Dr Giorgio Grillo (ITB CNR Bari)
Prof. Giorgio Maggi (INFN-BARI and Politecnico di Bari)
Prof. Graziano Pesole (University of Milan)
Dr Sabino Liuni (ITB CNR Bari)
Dr Vito Flavio Licciulli (ITB CNR Bari)
