Speaker
Dr
Giacinto Donvito
(INFN-BARI)
Description
LIBI (International Laboratory for Informatics) is a
project led by Prof. Cecilia Saccone of the Institute of
Biomedical Technologies of the Italian National Research
Council (CNR) and supported by the Italian Ministry for
Research. It brings together leading Italian bioinformatics
institutes and technological partners with the aim of
building a virtual laboratory with a modern infrastructure
that supports life science research in Italy.
For its high-throughput (HT) applications, LIBI has adopted
the gLite middleware and the EGEE infrastructure: a farm
with 28 CPUs is already part of the EGEE grid branch managed
by the Italian ROC.
LIBI's use of grid technology is dictated by the enormous
computational resources that some applications require. For
example, the GenoMiner application (Castrignanò et al., 2006)
carries out cross-genome comparisons in order to detect
highly conserved sequences that are likely to be involved in
coding or regulatory activity. A complex procedure has been
adopted to speed up the comparison and to limit the search
to selected parts of the genomes. However, the grid is needed
to validate the entire procedure. For this reason, a grid
application has been set up that compares each tract of the
human genome with the whole rat genome.
In total, more than 2000 million sequence comparisons are
required, each of which can take up to 2-3 seconds.
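
To give a feel for the scale, the figures quoted above can be turned into a rough upper-bound estimate of the CPU time involved. The sketch below is only back-of-the-envelope arithmetic: it assumes every comparison takes the full 2 or 3 seconds, and it uses the 28-CPU farm mentioned earlier purely as an illustrative pool size, not as the resources actually used for the run.

```python
SECONDS_PER_DAY = 24 * 3600


def total_cpu_days(n_comparisons: int, seconds_each: float) -> float:
    """CPU time, in days, if every comparison takes `seconds_each` seconds."""
    return n_comparisons * seconds_each / SECONDS_PER_DAY


def wall_clock_days(n_comparisons: int, seconds_each: float, n_cpus: int) -> float:
    """Elapsed days on `n_cpus` CPUs, assuming perfect parallel scaling."""
    return total_cpu_days(n_comparisons, seconds_each) / n_cpus


if __name__ == "__main__":
    n = 2_000_000_000  # "more than 2000 million" comparisons (lower bound)
    for secs in (2, 3):  # per-comparison upper bounds quoted in the abstract
        cpu_days = total_cpu_days(n, secs)
        print(f"{secs} s each: {cpu_days:,.0f} CPU-days "
              f"(~{cpu_days / 365:.0f} CPU-years), "
              f"{wall_clock_days(n, secs, 28):,.0f} days on 28 CPUs")
```

Even under these simplified assumptions the workload amounts to more than a hundred CPU-years, which is why a single farm is not sufficient.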
To keep track of the comparisons that have been executed
correctly, a "task-queue" schema based on a database server
has been implemented. The task queue can manage multiple
dependencies between tasks, and it records the grid job ID
to which the execution of each task has been assigned. It
also records the number of attempts made to execute each
task, in order to avoid resubmitting jobs that "always" fail.
Priorities can also be assigned to tasks so that they run in
the correct sequence.
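
The task-queue idea can be sketched with a small relational schema. The example below is a minimal illustration and not the LIBI implementation: it uses an in-memory SQLite database in place of the database server, and the table names, column names, status values, the `MAX_ATTEMPTS` limit and the helper functions are all hypothetical. It shows the four properties described in the text: dependencies between tasks, the grid job ID attached to each task, an attempt counter that stops jobs that "always" fail, and priorities.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical retry limit; the abstract does not state one

SCHEMA = """
CREATE TABLE tasks (
    id          INTEGER PRIMARY KEY,
    payload     TEXT    NOT NULL,                    -- what to compute
    priority    INTEGER NOT NULL DEFAULT 0,          -- higher runs first
    status      TEXT    NOT NULL DEFAULT 'pending',  -- pending/running/done
    attempts    INTEGER NOT NULL DEFAULT 0,          -- executions so far
    grid_job_id TEXT                                 -- last grid job assigned
);
CREATE TABLE task_deps (
    task_id    INTEGER NOT NULL REFERENCES tasks(id),
    depends_on INTEGER NOT NULL REFERENCES tasks(id)
);
"""


def create_queue():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_task(conn, payload, priority=0, depends_on=()):
    cur = conn.execute(
        "INSERT INTO tasks (payload, priority) VALUES (?, ?)", (payload, priority))
    task_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO task_deps (task_id, depends_on) VALUES (?, ?)",
        [(task_id, dep) for dep in depends_on])
    conn.commit()
    return task_id


def claim_next(conn, grid_job_id):
    """Assign the best runnable task to a grid job; return its id or None."""
    row = conn.execute(
        """
        SELECT t.id FROM tasks t
        WHERE t.status = 'pending'
          AND t.attempts < ?                 -- skip tasks that keep failing
          AND NOT EXISTS (                   -- every dependency must be done
              SELECT 1 FROM task_deps d
              JOIN tasks p ON p.id = d.depends_on
              WHERE d.task_id = t.id AND p.status <> 'done')
        ORDER BY t.priority DESC, t.id
        LIMIT 1
        """, (MAX_ATTEMPTS,)).fetchone()
    if row is None:
        return None
    # With several concurrent workers this select-then-update would need a
    # transaction or row lock on a real database server.
    conn.execute(
        "UPDATE tasks SET status = 'running', attempts = attempts + 1, "
        "grid_job_id = ? WHERE id = ?", (grid_job_id, row[0]))
    conn.commit()
    return row[0]


def report(conn, task_id, ok):
    """Mark a task done, or put it back in the queue for another attempt."""
    conn.execute("UPDATE tasks SET status = ? WHERE id = ?",
                 ("done" if ok else "pending", task_id))
    conn.commit()
```

In this sketch a grid job would call `claim_next` when it starts on a worker node and `report` when it finishes. A failed task returns to the queue until it has been tried `MAX_ATTEMPTS` times, after which it is no longer handed out.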
The database server used by the task queue can also be used
to monitor the status of the application. This lets the user
check the status of all running and completed jobs very
easily.
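
Because the queue lives in a database, monitoring can be reduced to a couple of queries. The following is a hedged sketch of that idea, again using SQLite as a stand-in for the database server. The `tasks` table layout, the status values and the rows inserted by `demo_db` are invented for illustration and are not taken from the LIBI system.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory task table filled with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
        "attempts INTEGER NOT NULL, grid_job_id TEXT)")
    conn.executemany(
        "INSERT INTO tasks (status, attempts, grid_job_id) VALUES (?, ?, ?)",
        [("done", 1, "job-001"),
         ("done", 2, "job-002"),
         ("running", 1, "job-003"),
         ("pending", 0, None),
         ("pending", 3, "job-004")])
    conn.commit()
    return conn


def status_summary(conn):
    """Return the number of tasks in each status as a dict."""
    return dict(conn.execute(
        "SELECT status, COUNT(*) FROM tasks GROUP BY status"))


def exhausted_tasks(conn, max_attempts=3):
    """Ids of unfinished tasks that have used up their attempts."""
    rows = conn.execute(
        "SELECT id FROM tasks WHERE status <> 'done' AND attempts >= ? "
        "ORDER BY id", (max_attempts,))
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = demo_db()
    print(status_summary(db))
    print(exhausted_tasks(db))
```

The first query gives the overall progress of the application; the second lists the tasks that would need manual attention because they have repeatedly failed.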
The main grid services used by this application are the
Storage Services, which collect the huge amount of output
data produced by the application, and the Workload
Management System, which chooses the best farms with free
CPUs to run the application and reliably transports the
crucial files needed on the worker nodes (WN).
The challenge is currently running on the Italian gLite
infrastructure (INFN-GRID): more than 450 thousand sequence
comparisons have been performed in one month.
References:
1) Castrignanò T, De Meo PD, Grillo G, Liuni S, Mignone F,
Talamo IG, Pesole G. GenoMiner: a tool for genome-wide
search of coding and non-coding conserved sequence tags.
Bioinformatics. 2006;22(4):497-9.
2) LIBI: http://www.libi.it/
3) CSTminer:
http://nar.oxfordjournals.org/cgi/content/full/32/suppl_2/W624
Author
Dr
Giacinto Donvito
(INFN-BARI)
Co-authors
Dr
Flavio Mignone
(University of Milan)
Dr
Giorgio Grillo
(ITB CNR Bari)
Prof.
Giorgio Maggi
(INFN-BARI and Politecnico Bari)
Prof.
Graziano Pesole
(University of Milan)
Dr
Sabino Liuni
(ITB CNR Bari)
Dr
Vito Flavio Licciulli
(ITB CNR Bari)