When we perform a phylogenetic analysis, we find many papers arguing for or against particular phylogenetic software and methods. Here I discuss three of them, parsimony, maximum likelihood (ML) and Bayesian inference (BI), with the aim of evaluating empirically which is the most appropriate for phylogenetic analysis based on its statistical consistency, assessed through distances between trees.
One argument used to justify a phylogenetic method is statistical consistency, a property debated between advocates of parsimony and ML (Brower, 2017). Felsenstein (1978) stated that ML is consistent, and that parsimony is consistent too when the probabilities of evolutionary change are small (in which case parsimony and ML behave the same). It is not consistent, however, when one set of character states at the tips is more probable than the others, in which case parsimony converges on the wrong tree as more data are added (Felsenstein, 1973). For BI, consistency concerns the posterior distribution, which can be consistent at every point of the parameter space depending on the priors (Goshal, 1998).
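For reference, the two notions of consistency above can be written as follows. These are standard textbook statements, not formulas quoted from the cited papers: T is the true tree, T̂_n the tree estimated from n characters, θ₀ the true parameter value, U any neighbourhood of θ₀, and Π(· | X₁, …, X_n) the posterior after n observations.

```latex
% Consistency of a tree estimator (parsimony, ML):
\lim_{n \to \infty} \Pr\left(\hat{T}_n = T\right) = 1

% Posterior consistency at \theta_0 (BI): for every neighbourhood U of \theta_0,
\Pi\left(U \mid X_1, \dots, X_n\right) \longrightarrow 1
  \quad \text{as } n \to \infty, \text{ almost surely under } \theta_0
```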
To test these assumptions, I simulated two matrices of five taxa, each with 1000, 2000 and 4000 characters, using Seq-Gen (Rambaut & Grassly, 1997) in R v3.5.2 under the JC model and two different rooted trees (one for each matrix) with all branch lengths fixed to 0.5. The parsimony analyses were run in TNT v1.1 (Goloboff et al., 2008) with 30 replicates of SPR search; the ML analyses were run in PhyML v3.0 (Guindon & Gascuel, 2003) under the JC model; and BI was performed in MrBayes v3.2.6 (Ronquist et al., 2012) under the JC model, with all other parameters left at their defaults. Topological distances were calculated as Robinson-Foulds (RF) distances (Robinson & Foulds, 1981), and branch score distances with the algorithm of Kuhner and Felsenstein (1994), both in R.
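Seq-Gen did the actual simulation. To make concrete what simulating under JC with every branch length fixed to 0.5 involves, here is a minimal Python sketch (not Seq-Gen and not the author's R script). It assumes branch lengths are expected substitutions per site, uses the JC probability that a site stays unchanged along a branch of length t, 1/4 + 3/4·exp(-4t/3), and evolves a random root sequence down a hypothetical five-taxon rooted tree. The tree shape, the seed and the names TREE, evolve and simulate are illustrative assumptions.

```python
import math
import random

BASES = "ACGT"

def jc_probs(t):
    """JC69 transition probabilities for a branch of length t
    (assumed to be in expected substitutions per site).
    Returns (P(base unchanged), P(change to one specific other base))."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay

def evolve(base, t, rng):
    """Evolve a single site along a branch of length t under JC69."""
    p_same, _ = jc_probs(t)
    if rng.random() < p_same:
        return base
    # Under JC69 the three alternative bases are equally likely.
    return rng.choice([b for b in BASES if b != base])

# Hypothetical five-taxon rooted tree with every branch length fixed to 0.5.
# A node is (name, branch_length, children); leaves have no children.
TREE = (None, 0.0, [
    (None, 0.5, [("A", 0.5, []), ("B", 0.5, [])]),
    (None, 0.5, [("C", 0.5, []),
                 (None, 0.5, [("D", 0.5, []), ("E", 0.5, [])])]),
])

def simulate(tree, n_sites, seed=1):
    """Return {taxon: sequence} simulated down `tree` under JC69."""
    rng = random.Random(seed)
    alignment = {}

    def walk(node, parent_seq):
        name, t, children = node
        seq = [evolve(b, t, rng) for b in parent_seq]
        if not children:
            alignment[name] = "".join(seq)
        for child in children:
            walk(child, seq)

    # Root sequence drawn from JC69's equal base frequencies.
    root_seq = [rng.choice(BASES) for _ in range(n_sites)]
    walk(tree, root_seq)
    return alignment

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        matrix = simulate(TREE, n)
        print(n, {taxon: seq[:10] for taxon, seq in matrix.items()})
```

Running it prints the first ten sites of each taxon for 1000, 2000 and 4000 characters. Exporting the alignment to the input formats of TNT, PhyML or MrBayes is not shown.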
The RF distance was 0 among all trees inferred from the same DNA matrix, meaning that all methods recovered the same relationships (Fig. 1). Branch score distances between ML and BI trees, by contrast, showed a pattern: among the ML phylogenies, trees inferred from fewer characters had the largest distances compared with trees inferred from more characters (Fig. 2). The ML tree from the second matrix with 4000 characters (ML2_4000 in Fig. 2) showed short distances to the other trees (see also the branch length distances in the GitHub repository), especially compared with ML1_1000, which had the longest distances. For the BI trees, branch lengths became closer as more data were added, corroborating the precision of this method (Wiens & Moen, 2008), but not its accuracy.
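Both distances were computed in R (the scripts are in the GitHub repository). As a language-neutral illustration of why an RF distance of 0 can coexist with nonzero branch score distances, here is a minimal Python sketch, not the author's code. The RF distance counts the non-trivial bipartitions present in only one of the two trees; the Kuhner-Felsenstein branch score compares the lengths of matching bipartitions, treating a bipartition missing from one tree as a branch of length 0. Trees are treated as unrooted, so the two branches at the root merge into one edge. Implementations differ on whether they report the sum of squared differences or its square root; this sketch returns the square root. The tree encoding, helper names and example trees are hypothetical.

```python
import math

# Hypothetical tree encoding: a node is (name, branch_length, children).
def leaf(name, t):
    return (name, t, [])

def clade(t, *children):
    return (None, t, list(children))

def splits(tree):
    """Return (taxa, {bipartition: branch length}), treating the tree as unrooted.

    Each bipartition is stored as the side that does not contain the
    alphabetically first taxon, so the two branches at the root of a rooted
    tree collapse into one unrooted edge whose length is their sum.
    """
    clades = []

    def walk(node):
        name, t, children = node
        if children:
            below = frozenset().union(*(walk(c) for c in children))
        else:
            below = frozenset([name])
        clades.append((below, t))
        return below

    taxa = walk(tree)
    ref = min(taxa)
    lengths = {}
    for below, t in clades:
        side = below if ref not in below else taxa - below
        if side:  # skips the root, which has every taxon below it
            lengths[side] = lengths.get(side, 0.0) + t
    return taxa, lengths

def _informative(lengths, n):
    """Non-trivial bipartitions: at least two taxa on each side."""
    return {b for b in lengths if 2 <= len(b) <= n - 2}

def robinson_foulds(tree1, tree2):
    """Number of non-trivial bipartitions found in only one of the two trees."""
    taxa1, s1 = splits(tree1)
    taxa2, s2 = splits(tree2)
    if taxa1 != taxa2:
        raise ValueError("trees must have the same taxa")
    n = len(taxa1)
    return len(_informative(s1, n) ^ _informative(s2, n))

def branch_score(tree1, tree2):
    """Kuhner-Felsenstein branch score, reported here as a square root."""
    _, s1 = splits(tree1)
    _, s2 = splits(tree2)
    total = sum((s1.get(b, 0.0) - s2.get(b, 0.0)) ** 2 for b in set(s1) | set(s2))
    return math.sqrt(total)

T_TRUE = clade(0.0, clade(0.5, leaf("A", 0.5), leaf("B", 0.5)),
               clade(0.5, leaf("C", 0.5),
                     clade(0.5, leaf("D", 0.5), leaf("E", 0.5))))
# Same topology, two terminal branch lengths changed.
T_EST = clade(0.0, clade(0.5, leaf("A", 0.45), leaf("B", 0.5)),
              clade(0.5, leaf("C", 0.5),
                    clade(0.5, leaf("D", 0.6), leaf("E", 0.5))))
# Different topology: B and C swapped.
T_ALT = clade(0.0, clade(0.5, leaf("A", 0.5), leaf("C", 0.5)),
              clade(0.5, leaf("B", 0.5),
                    clade(0.5, leaf("D", 0.5), leaf("E", 0.5))))

print(robinson_foulds(T_TRUE, T_EST))          # 0
print(robinson_foulds(T_TRUE, T_ALT))          # 2
print(round(branch_score(T_TRUE, T_EST), 4))   # 0.1118
```

T_TRUE and T_EST share every bipartition, so their RF distance is 0, yet their branch score is positive because two branch lengths differ. This is the same situation as trees that agree in topology but differ in branch lengths.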
Distances between trees are not the best way to estimate consistency in BI, because they do not compare posterior probabilities, and in this case all methods recovered the same relationships among taxa anyway. A test based on posterior probabilities will be considered in future work.
Thus, BI, ML and parsimony recovered the same relationships among taxa regardless of the number of characters added, but if we judge them by one property associated with consistency (precision), BI performs better than the other two methods.
Figure 1. Resulting trees: A) BI, first matrix; B) BI, second matrix; C) ML, first matrix; D) ML, second matrix; E) Parsimony, first matrix; and F) Parsimony, second matrix.
Figure 2. Sum of branch score distances for each method and number of characters.
References
Brower, A. (2017). Statistical consistency and phylogenetic inference: a brief review. Cladistics, 0: 1-6.
Felsenstein, J. (1973). Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25:471-492.
Felsenstein, J. (1978). Cases in which Parsimony or Compatibility Methods Will be Positively Misleading. Systematic Zoology, 27(4): 401-410.
Goloboff, P. A., Farris, J. S., & Nixon, K. C. (2008). TNT, a free program for phylogenetic analysis. Cladistics, 24(5): 774-786.
Goshal, S. (1998). A review of consistency and convergence of posterior distribution. Indian Statistical Institute, Calcutta, India.
Guindon, S., & Gascuel, O. (2003). A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology, 52(5): 696-704.
Kuhner, M. K., & Felsenstein, J. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11(3): 459-468.
Rambaut, A., & Grassly, N. C. (1997). Seq-Gen: An Application for the Monte Carlo Simulation of DNA Sequence Evolution along Phylogenetic Trees. Computer Applications in the Biosciences, 13(3): 235-238.
Robinson, D., & Foulds, L. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53(1): 131-147.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M., & Huelsenbeck, J. (2012). MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Systematic Biology, 61(3): 539-542.
Wiens, J., & Moen, D. (2008). Missing data and the accuracy of Bayesian phylogenetics. Journal of Systematics and Evolution, 46(3): 307-314.
Script and data: https://github.com/DanielaP10/Post-2