“Phylogenies represent our attempts to reconstruct the evolutionary history of life” (Huelsenbeck and Ronquist, 2001), and as computers have grown in speed and capacity, more methods for inferring phylogenetic trees have become available, both distance-based and character-based (Rizzo and Rouchka, 2007). Maximum Parsimony (MP) (Fitch, 1971), Maximum Likelihood (ML) (Felsenstein, 1981) and Bayesian Inference (IB) are three character-based methods in wide use today, but MP is the only one that does not estimate branch lengths, so it does not specify the amount of change (Egan and Crandall, 2006). ML, on the other hand, estimates branch lengths but assumes that the model used is accurate; if the model does not accurately reflect the data set, the method is inconsistent, which causes problems even though the method is designed to be robust. The extensive computation required, and new evidence suggesting multiple maximum likelihood points for a given phylogenetic tree, are further disadvantages (Rizzo and Rouchka, 2007).
Here I focus on IB, a method whose advanced techniques make it possible to analyze more than 350 sequences successfully with moderate computational effort, and to implement evolutionary models that are more complex and realistic than before (Huelsenbeck et al., 2001; Ronquist and Huelsenbeck, 2003). For this reason, in this post I evaluate the effect of changes in matrix size and in the number of taxa used in a phylogenetic analysis with IB. My hypothesis is that the resolution of the topologies is affected more by the size of the matrix than by the number of taxa.
To do this, I simulated 3 topologies, which I used to simulate 54 matrices in total under the HKY model, with three matrix sizes (100, 500 and 2500 characters) and two numbers of taxa (12 and 48); each type of data had 3 replicates. Each matrix was then analyzed using Bayesian inference, and I counted the number of resolved nodes in each resulting topology in order to compare them. The scripts, with the methodology, software specifications and parameters that I used, are detailed in the following link
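The design described above can be enumerated in a few lines of Python. This is only an illustrative sketch, not the scripts referred to above: it assumes that the 54 matrices come from fully crossing 3 topologies, 2 numbers of taxa, 3 matrix sizes and 3 replicates, and all variable and function names are hypothetical.

```python
from itertools import product

# Values taken from the text; names are illustrative assumptions.
TOPOLOGIES = (1, 2, 3)           # the 3 simulated topologies
N_TAXA = (12, 48)                # two numbers of taxa
N_CHARACTERS = (100, 500, 2500)  # three matrix sizes
REPLICATES = (1, 2, 3)           # 3 replicates per type of data


def build_design():
    """Return one record per simulated matrix (assumed fully crossed design)."""
    return [
        {"topology": t, "n_taxa": n, "n_characters": c, "replicate": r}
        for t, n, c, r in product(TOPOLOGIES, N_TAXA, N_CHARACTERS, REPLICATES)
    ]


design = build_design()
print(len(design))  # 54
```

Under this assumption, each combination of number of taxa and matrix size (6 in total) is represented by 9 matrices.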
None of the topologies was completely resolved; nevertheless, the best resolution was found in topologies whose matrices had 2500 characters. The number of resolved nodes increased with the number of characters, but the change was most evident in the analyses with 48 taxa when the matrix size changed between 100 and 500 characters. In contrast, the resolution of topologies with 12 taxa changed very little across the three numbers of characters (Figure 1).
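One simple way to count resolved nodes in a consensus tree is sketched below. It is an assumption about how the counting could be done, not necessarily the procedure used for this post: it treats every internal clade other than the root as a resolved node, and it expects a plain Newick string without quoted labels or bracketed comments. The function name is hypothetical.

```python
def count_resolved_nodes(newick: str) -> int:
    """Count internal clades other than the root in a plain Newick string.

    Assumes no quoted labels and no bracketed comments, so every '('
    opens exactly one internal node. A polytomy collapses clades and
    therefore lowers the count.
    """
    text = newick.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    n_open = text.count("(")
    if n_open != text.count(")"):
        raise ValueError("unbalanced parentheses")
    # Subtract the root, which is present even in a star tree.
    return max(n_open - 1, 0)


print(count_resolved_nodes("(A,B,C,D);"))      # 0: completely unresolved
print(count_resolved_nodes("((A,B),C,D);"))    # 1: one clade recovered
print(count_resolved_nodes("(((A,B),C),D);"))  # 2: fully resolved, rooted
```

Averaging these counts over the replicates of each combination of number of taxa and matrix size would give values comparable to the means described above.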
In this case, the size of the matrix had more impact on the resolution of the topology than the number of terminals only for matrices of less than 500 characters. This indicates that, in a phylogenetic analysis with Bayesian inference, increasing the number of characters helps to increase precision and the number of resolved nodes, whereas increasing the number of taxa has a lower impact. In other words, we can say that the estimator is consistent.
So if I could choose a method for phylogenetic reconstruction, I would prefer IB, because it has several advantages. One is the ability to incorporate prior knowledge: in this way, the Bayesian framework offers a more direct expression of uncertainty, including complete ignorance, which is well suited to building cumulative knowledge. It also has computational advantages, such as the capacity to handle highly complex models efficiently, and it does not assume or require normal distributions for the parameters of a model (Schoot et al., 2014; Huelsenbeck and Ronquist, 2001). Of course, the success of a Bayesian analysis depends on the objectivity and precision with which the priors are assigned; however, assuming that this is done correctly (as it should be), the advantages of a Bayesian analysis make it the best option for obtaining a well-made phylogeny.
Fig. 1. Mean number of resolved nodes in the 6 types of topologies. The x axis shows the different matrix sizes and numbers of taxa.
References
-Egan, A.N. and Crandall, K.A. (2006). Theory of Phylogenetic Estimation, in Evolutionary Genetics: Concepts and Case Studies. 1 ed. Oxford University Press, 426-436.
-Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17(6), 368-376.
-Fitch, W. (1971). Toward defining the course of evolution: minimum change for a specified tree topology. Syst. Zool., 20, 406-416.
-Huelsenbeck, J.P., Ronquist, F., Nielsen, R. and Bollback, J.P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294, 2310-2314.
-Huelsenbeck, J.P. and Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8), 754-755.
-Rizzo, J. and Rouchka, E.C. (2007). Review of Phylogenetic Tree Construction. University of Louisville Bioinformatics Laboratory Technical Report Series, 2-7.
-Ronquist, F. and Huelsenbeck, J.P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19(12), 1572-1574.
-Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J. and Aken, M. A. (2014). A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research. Child Dev, 85, 842-860.