Understanding the evolutionary history of genes, proteins, genomes, and whole
organisms is an important prerequisite for medical, biological, and
bioinformatics research. The essential tool for gaining insight into this
history is the reconstruction of evolutionary trees, so-called phylogenies,
which has become a common task in biological and bioinformatics research.
Biological sequence data such as DNA or protein sequences usually serve as
input for phylogenetic analyses, since they preserve traces of the processes
of mutation and selection that shaped them.
Many problems in phylogenetic analysis are known to be NP-complete or NP-hard.
Thus, virtually all methods used today for more than about ten sequences are
in fact heuristics. However, even these heuristics struggle with the
ever-growing amount of data in public databases, because the number of
possible phylogenies grows super-exponentially with the number of sequences or
species in the analysis.
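To put this growth in perspective, the number of distinct unrooted, strictly
bifurcating tree topologies for $n$ taxa is given by the standard double
factorial formula
\[
T(n) = (2n-5)!! = \prod_{i=3}^{n}(2i-5) = \frac{(2n-4)!}{2^{\,n-2}\,(n-2)!},
\qquad n \ge 3,
\]
so that already $T(10) = 2{,}027{,}025$, and $T(50) \approx 2.8 \times 10^{74}$,
ruling out any exhaustive evaluation of all topologies.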
Hence, since the mid-1990s, parallel and distributed computing has entered the
field of phylogenetic analysis as a means to reduce the running time of such
analyses. Among the most reliable methods currently in use are those based on
statistical principles such as maximum likelihood or Bayesian statistics.
However, these methods are also among the computationally most demanding.
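A brief sketch of where this computational burden comes from: under the common
assumption that the $m$ alignment columns evolve independently, the
log-likelihood of a candidate tree $T$ with branch lengths and model
parameters $\theta$ decomposes into per-site terms,
\[
\ln L(T,\theta) = \sum_{s=1}^{m} \ln P(D_s \mid T,\theta),
\]
where each term is computed by a post-order traversal of the tree
(Felsenstein's pruning algorithm) and the sum must be re-evaluated for every
candidate topology and parameter change. Its per-site structure, however, is
also what makes fine-grained parallelization over alignment columns possible.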
Here we address a number of typical problems from the field of evolutionary
bioinformatics and phylogenetics and exemplify what has been achieved by
parallelizing maximum likelihood methods. In this context we highlight the
impact of different levels of granularity. Furthermore, we present potential
applications and current achievements in using grid technologies to further
improve performance.