Description
3. Impact
The GO and GOA repositories are updated monthly, which improves the annotation quality and also increases the number of annotated gene products.
The results show that most gene products from non-model organisms are poorly annotated and were therefore either not considered in this search or produced only low-level information. For that reason, the algorithm depends heavily on new releases of GO and GOA, and the search for functionally analogous gene products needs to be updated as frequently as possible. Only by using grid technology can we meet this need and offer the best results to the scientific community, by recalculating the complete search results with each new monthly release of GO and GOA.
Provide a set of generic keywords that define your contribution (e.g. Data Management, Workflows, High Energy Physics)
bioinformatics, life science, temporal data distribution
4. Conclusions / Future plans
The algorithm is a highly data-intensive and data-access-intensive application. To avoid the problem of concurrent accesses to the data, the system temporally distributes both the analysis tool and the data to the worker nodes (WNs) where the tool has to operate. The jobs were distributed over the EGEE grid infrastructure within the biomed VO, using about 300 WNs. The input data is about 600 MB in size and the results are on the order of 2 GB. The process completed within a day, instead of about 60 days on a single CPU.
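As a rough check on these figures, the following sketch estimates the ideal wall-clock time when about 60 CPU-days of work are spread over about 300 WNs, and shows one simple way to split a workload into near-equal jobs. It assumes the work divides evenly and ignores data staging, queueing, and scheduling overhead, none of which are quantified here. The function names and the item count of 1000 are hypothetical and serve only as illustration; this is not the partitioning scheme actually used.

```python
def ideal_wall_time_days(single_cpu_days: float, workers: int) -> float:
    """Ideal wall-clock time, assuming an even split and no overhead."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return single_cpu_days / workers


def split_into_chunks(n_items: int, n_jobs: int) -> list[range]:
    """Split indices 0..n_items-1 into n_jobs contiguous, near-equal ranges."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    base, extra = divmod(n_items, n_jobs)
    chunks = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional item each.
        size = base + (1 if job < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


# Figures from the text: about 60 days on one CPU, about 300 WNs.
ideal = ideal_wall_time_days(60, 300)
print(f"ideal wall time: {ideal * 24:.1f} hours")

# Hypothetical workload of 1000 items, one job per WN.
chunks = split_into_chunks(1000, 300)
print(len(chunks), len(chunks[0]), len(chunks[-1]))
```

Under these assumptions the ideal run time is a few hours, so the reported completion within a day is plausible once transfer of the input data to each WN and job queueing are taken into account.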
1. Short overview
Up to now, researchers have compared genes by looking at their sequence similarity. However, the correlation between sequence and function is only partially applicable. Descriptive annotations, such as those provided by the Gene Ontology (GO) and its associations with gene products (GOA), offer a way of comparing genes according to their functional description.
The application consists of an algorithm that uses GO and GOA data to find functionally analogous gene products, i.e. gene