25–29 Sept 2006
CICG
Europe/Zurich timezone

GPS@, Web interface for Protein Sequence Analysis on Grid

26 Sept 2006, 17:00
2h 30m
CICG

CICG, 17 rue de Varembé, CH - 1211 Geneva 20 Switzerland
Board: 4
Demo EGEE Activity Meetings Demo session

Speaker

Dr Christophe Blanchet (Institut de Biologie et Chimie des Protéines (IBCP UMR 5086); CNRS; Univ. Lyon 1)

Description

Bioinformatics analysis of data produced by high-throughput biology, for instance genome projects [1], is one of the major challenges for the coming years. This analysis requires access to up-to-date databanks (of sequences, patterns, 3D structures, etc.) and to relevant algorithms (for sequence similarity, multiple alignment, pattern scanning, etc.) [2]. Since 1998 we have been developing the NPS@ Web server (Network Protein Sequence Analysis [3]), which provides biologists with many of the most common resources for protein sequence analysis, integrated into a common workflow. These methods and data can be accessed through simple Web browsing over HTTP, or through high-level bioinformatics interfaces such as the MPSA program [4] or AntheProt [5]. Today, the computing resources available behind the NPS@ Web portal limit the capabilities of our server, as is also the case for other genomics and proteomics Web portals: some methods are very demanding in computing time and memory. Our NPS@ portal faces both an increasing demand for CPU and disk resources and the burden of managing numerous bioinformatics resources (algorithms, databanks).
NPS@ [3] provides biologists with a Web form to input their data (protein sequences) in order to run a BLAST analysis against a given protein sequence database. The user pastes a protein sequence into the corresponding field, then chooses the database that will be scanned with the query sequence. All the protein databases available on NPS@ can be selected through a multi-valued list in the form. The GPS@ grid Web portal (Grid Protein Sequence Analysis, http://gpsa-pbil.ibcp.fr) is the grid release of the NPS@ bioinformatics portal. GPS@ hides the mechanisms required for submitting bioinformatics analyses to the grid infrastructure: selecting the “EGEE” check-box and then clicking the “submit” button schedules the submission of the BLAST job on the EGEE grid [6].
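As a rough illustration of the form-to-grid step, the sketch below checks a pasted protein sequence and produces the kind of JDL (Job Description Language) text that the EGEE workload management system accepts. It is a minimal sketch, not GPS@ code: the function names, the sandbox file names, the accepted residue alphabet and the use of the legacy NCBI `blastall` program with its `-p/-d/-i/-o` options are all assumptions.

```python
import re

# Residue letters accepted in a pasted protein sequence (one-letter codes,
# ambiguity codes, stop "*" and gap "-"). An assumption, not GPS@'s rule.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZJXUO*-")


def clean_protein_sequence(pasted: str) -> str:
    """Drop an optional FASTA header line and all whitespace, then check residues."""
    lines = pasted.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    seq = re.sub(r"\s+", "", "".join(lines)).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = set(seq) - ALLOWED_RESIDUES
    if bad:
        raise ValueError("invalid residues: " + "".join(sorted(bad)))
    return seq


def build_blast_jdl(database: str,
                    query_file: str = "query.fasta",
                    result_file: str = "result.blast") -> str:
    """Return a JDL job description that runs blastp against `database`.

    Assumes the legacy `blastall` executable and the chosen database are
    installed on the worker node; a real deployment may need an absolute
    path or a wrapper script instead.
    """
    arguments = f"-p blastp -d {database} -i {query_file} -o {result_file}"
    return "\n".join([
        "[",
        '  Executable = "blastall";',
        f'  Arguments = "{arguments}";',
        '  StdOutput = "std.out";',
        '  StdError = "std.err";',
        f'  InputSandbox = {{"{query_file}"}};',
        f'  OutputSandbox = {{"std.out", "std.err", "{result_file}"}};',
        "]",
    ])


if __name__ == "__main__":
    print(clean_protein_sequence(">my_query\nMKTAYIAKQR\nQISFVKSHFS\n"))
    print(build_blast_jdl("swissprot"))
```

The query file travels in the input sandbox and the BLAST output comes back in the output sandbox, which mirrors the portal downloading the result file once the job is done.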
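After submission, GPS@ follows the job until it finishes and reports each step to the user with its time and duration. A minimal polling loop with that behaviour might look like this. Here `query_status` is a hypothetical stand-in for the real resource-broker status command, the status names only imitate EGEE job states, and `print` stands in for updating the submission Web page. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, List, Optional, Tuple

# Statuses after which polling stops (assumed names).
TERMINAL = {"Done", "Aborted", "Cancelled"}


def follow_job(query_status: Callable[[], str],
               poll_interval: float = 30.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep,
               ) -> List[Tuple[str, float, float]]:
    """Poll until a terminal status; return (status, start, duration) per step."""
    steps: List[Tuple[str, float, float]] = []
    current: Optional[str] = None
    started = 0.0
    while True:
        status = query_status()
        now = clock()
        if status != current:
            # Close the previous step and open a new one.
            if current is not None:
                steps.append((current, started, now - started))
            current, started = status, now
            print(f"step {status} entered at t={now:.0f}s")  # user notification stand-in
        if status in TERMINAL:
            steps.append((status, now, 0.0))
            return steps
        sleep(poll_interval)
```

Polling keeps the portal independent of any callback mechanism on the grid side, at the cost of a delay of up to one `poll_interval` before a status change is noticed.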
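One of the large-scale uses mentioned for GPS@ is clustering a sequence set by comparing each sequence against all the others with BLAST or SSEARCH. The sketch splits such a run into one grid job per query and then groups sequences by single linkage over the collected hits. The job dictionary layout, the `(query, subject, E-value)` hit format and the default E-value cut-off are all assumptions, and single linkage is just one possible grouping rule, not necessarily the one GPS@ uses.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, E-value)


def one_job_per_sequence(ids: List[str], database: str) -> List[Dict[str, str]]:
    """Split an all-against-all run into one grid job per query sequence."""
    return [{"query": seq_id, "database": database} for seq_id in ids]


def single_linkage(ids: Iterable[str], hits: Iterable[Hit],
                   max_evalue: float = 1e-5) -> List[List[str]]:
    """Group sequences linked by at least one hit with E-value <= max_evalue.

    Every id appearing in `hits` must also be in `ids`.
    """
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x: str) -> str:
        # Walk up to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query != subject and evalue <= max_evalue:
            parent[find(query)] = find(subject)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for seq_id in parent:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())
```

Because the per-query jobs are independent, they can run in parallel on the grid. Single linkage chains sequences: hits A-B and B-C put A, B and C in one cluster even if A and C are not directly similar, which may be too permissive for some data sets.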
The bioinformatics algorithms and databases available on GPS@ have been distributed and registered on the grid, and GPS@ runs its own EGEE interface to the grid. First, the job description entered in the Web form is converted into a JDL (Job Description Language) file, which can then be submitted to the EGEE workload management system. The GPS@ sub-process that submitted the job also periodically checks its status by querying the resource broker with the appropriate commands. Every step is reported to the user on the Web page of the submission, together with the time and duration of the current step. Once the job reaches the “Done” step, the GPS@ automaton downloads the BLAST result file. This raw result file in BLAST format is then processed and converted into an HTML page showing, in a colored and graphical way, the list of similar protein sequences, together with a graph and their pairwise alignments. This formatting process is directly inherited from the original NPS@ portal, providing biologists with a familiar interface and way of displaying results. The GPS@ portal makes bioinformatics job submission on the grid easier and gives biologists the benefit of the EGEE grid infrastructure for analyzing large biological datasets: e.g. including several protein secondary structure predictions into a multiple alignment, or clustering a sequence set by analyzing each sequence against the others with BLAST or SSEARCH, …
Acknowledgements
This work has been funded by the GriPPS project (ACI GRID PPL02-05), the EGEE project (EU FP6, ref. INFSO-508833) and the EMBRACE Network of Excellence (EU FP6, LHSG-CT-2004-512092).
References
[1] Bernal, A., Ear, U., Kyrpides, N.: Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res. 29 (2001) 126-127.
[2] Perrière, G., Combet, C., Penel, S., Blanchet, C., Thioulouse, J., Geourjon, C., Grassot, J., Charavay, C., Gouy, M., Duret, L., Deléage, G.: Integrated databanks access and sequence/structure analysis services at the PBIL. Nucleic Acids Res. 31 (2003) 3393-3399.
[3] Combet, C., Blanchet, C., Geourjon, C., Deléage, G.: NPS@: Network Protein Sequence Analysis. Trends Biochem. Sci. 25 (2000) 147-150.
[4] Blanchet, C., Combet, C., Geourjon, C., Deléage, G.: MPSA: Integrated System for Multiple Protein Sequence Analysis with client/server capabilities. Bioinformatics 16 (2000) 286-287.
[5] Deléage, G., Combet, C., Blanchet, C., Geourjon, C.: ANTHEPROT: an integrated protein sequence analysis software with client/server capabilities. Comput. Biol. Med. 31 (2001) 259-267.
[6] EGEE – Enabling Grids for E-science in Europe, http://www.eu-egee.org

Primary author

Dr Christophe Blanchet (Institut de Biologie et Chimie des Protéines (IBCP UMR 5086); CNRS; Univ. Lyon 1)

Co-authors

Dr Christophe Combet (Institut de Biologie et Chimie des Protéines (IBCP UMR 5086); CNRS; Univ. Lyon 1)
Prof. Gilbert Deleage (Institut de Biologie et Chimie des Protéines (IBCP UMR 5086); CNRS; Univ. Lyon 1)
Mr Rémi Mollon (Institut de Biologie et Chimie des Protéines (IBCP UMR 5086); CNRS; Univ. Lyon 1)