Sunday, April 7, 2019

Unequal branch lengths in simulated DNA data: effects on phylogenetic reconstruction

Obtaining well-resolved phylogenies is one of the objectives of any phylogenetic analysis, whatever its purpose (Wortley et al., 2005; Swenson, 2009). In some cases, resolution is affected by the type and/or amount of data (Scotland et al., 2003; Huelsenbeck & Hillis, 1993), by the method (Huelsenbeck & Hillis, 1993), or by the number of taxa used in the analysis (Hedtke et al., 2006). Some DNA data, however, are obtained from simulations, which require reference topologies whose branch lengths (BL) determine the probability of change at a site (p_ij) (Rambaut & Grassly, 1997). Here I evaluate cases in which one branch of the reference tree used to simulate DNA data is longer than the others (autapomorphies), and the effects this has on topologies reconstructed with two phylogenetic methods, Maximum Parsimony (MP) and Maximum Likelihood (ML). The expectation was that MP would fail to recover monophyly in the resulting topologies.
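To make the link between branch length and p_ij concrete, here is a minimal sketch under the Jukes-Cantor (JC69) model, chosen only because it has a simple closed form. The simulations in this post used HKY, where these probabilities also depend on the transition/transversion ratio and the base frequencies. The function name is hypothetical, and the branch length `t` is assumed to be measured in expected substitutions per site.

```python
import math


def jc69_transition_probs(t):
    """Return (p_same, p_diff) for a branch of length t under JC69.

    p_same: probability that a site shows the same base at both ends of the branch.
    p_diff: probability that it ends in one *specific* different base.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


for bl in (0.5, 1.0):
    p_same, p_diff = jc69_transition_probs(bl)
    # 3 * p_diff is the probability that the site changed to any other base.
    print(f"BL={bl}: P(no visible change)={p_same:.3f}, P(any change)={3 * p_diff:.3f}")
```

Going from BL = 0.5 to BL = 1 raises the probability that a site shows a change from about 0.36 to about 0.55, so a single long terminal branch is expected to carry more autapomorphies than a short one.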

For this work, I simulated DNA data with Seq-Gen (Rambaut & Grassly, 1997) in R v3.5.2 under the HKY model, for four terminals and 2000 DNA characters, using three different reference trees: with equal branch lengths (br = 0.5), or with one of the six branches at a time set to a different length (br = 1). Three replicate DNA matrices were simulated for each tree. I analyzed the matrices in TNT v1.1 (Goloboff et al., 2008) with 100 replicates of SPR search, and in PhyML v3.0 (Guindon & Gascuel, 2003) with default settings. To compare relationships among the resulting trees, I computed Robinson-Foulds (RF) distances (Robinson & Foulds, 1981) in R among the ML and MP trees, taking into account the branch lengths of their reference trees. For more details:
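The simulation design can be sketched in plain Python. This is not Seq-Gen and not the R code used here: it swaps HKY for JC69 for brevity, and it assumes the six branches are those of the rooted four-taxon tree ((A,B),(C,D)), each named after its child node. All names (`simulate_four_taxa`, `autapomorphies`, the branch labels) are hypothetical.

```python
import math
import random

BASES = "ACGT"
# Branches of the rooted tree ((A,B),(C,D)), each named after its child node.
BRANCHES = ("AB", "CD", "A", "B", "C", "D")


def evolve(seq, t, rng):
    """Evolve a sequence along one branch of length t under JC69."""
    # Probability that a site ends in a different base after time t.
    p_change = 0.75 - 0.75 * math.exp(-4.0 * t / 3.0)
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Under JC69 each of the three other bases is equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_four_taxa(n_sites, long_branch=None, base_bl=0.5, long_bl=1.0, seed=None):
    """Simulate an alignment with every branch at base_bl except long_branch."""
    if long_branch is not None and long_branch not in BRANCHES:
        raise ValueError(f"unknown branch: {long_branch}")
    rng = random.Random(seed)
    bl = {name: (long_bl if name == long_branch else base_bl) for name in BRANCHES}
    root = "".join(rng.choice(BASES) for _ in range(n_sites))  # JC69 stationary root
    ab = evolve(root, bl["AB"], rng)
    cd = evolve(root, bl["CD"], rng)
    return {
        "A": evolve(ab, bl["A"], rng),
        "B": evolve(ab, bl["B"], rng),
        "C": evolve(cd, bl["C"], rng),
        "D": evolve(cd, bl["D"], rng),
    }


def autapomorphies(alignment, taxon):
    """Count sites where `taxon` differs from three otherwise identical taxa."""
    others = [seq for name, seq in alignment.items() if name != taxon]
    return sum(
        1
        for i, base in enumerate(alignment[taxon])
        if others[0][i] == others[1][i] == others[2][i] != base
    )


# Three replicate matrices of 2000 sites with the terminal branch to A set to 1.
replicates = [simulate_four_taxa(2000, long_branch="A", seed=rep) for rep in range(3)]
for rep, aln in enumerate(replicates):
    print(rep, {t: autapomorphies(aln, t) for t in "ABCD"})
```

On average, the taxon on the long branch shows the largest autapomorphy count, although the margin over the other terminals is moderate at BL = 1.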

Results and discussion
In simulation studies, branch lengths are among the most important determinants of how well methods perform (Huelsenbeck & Hillis, 1993), but the effects of long-branch attraction (LBA) on phylogenies (Felsenstein, 1978; Bergsten, 2005) are studied far more often than the effect of changing the length of a single branch. Under parsimony, LBA biases the inferred relationships among taxa. In this case, however, parsimony reconstructed all topologies without polytomies, whereas every ML tree contained polytomies (Fig. 1), regardless of whether BLs were equal or different. This is a surprising result, contrary to what was expected given that parsimony treats branch length as an amount of non-informative characters (Felsenstein, 1978). It may be because parsimony, as a simple method, is geared toward selecting the simplest hypothesis, which in this case allowed better resolution of the phylogeny, in contrast to ML, which treats BL as a parameter (Goloboff, 2003).

However, the RF distances show that taxon relationships in the ML trees differ strongly from those in the other trees (Fig. 2A): ML trees have the highest distances, mainly those from reference trees with a BL of 1, which is consistent with the previous result. Distances between methods nonetheless vary considerably (Fig. 2B), possibly because the compared trees were inferred from different data matrices.
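The RF distance counts the bipartitions (splits) present in one unrooted tree but not in the other. The sketch below writes trees as nested Python tuples, a representation assumed here for illustration, and returns the raw (unnormalized) count; the post does not state whether raw or normalized RF values were used in R. It also shows why polytomies matter with four taxa: a resolved tree has exactly one non-trivial split, so a fully resolved tree and an unresolved star tree differ by 1, while two conflicting resolved trees differ by 2.

```python
def _clades(tree):
    """Return (all leaves, list of leaf sets below each internal node)."""
    groups = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        groups.append(leaves)
        return leaves

    return walk(tree), groups


def splits(tree):
    """Non-trivial bipartitions of the tree, ignoring where it is rooted.

    Each split is stored as the side that excludes the alphabetically first
    taxon, so the same unrooted tree always yields the same set.
    """
    leaves, groups = _clades(tree)
    anchor = min(leaves)
    result = set()
    for group in groups:
        side = group if anchor not in group else leaves - group
        # Skip trivial splits: empty, a single taxon, or all but one taxon.
        if 1 < len(side) < len(leaves) - 1:
            result.add(side)
    return leaves, result


def rf_distance(tree1, tree2):
    """Unnormalized Robinson-Foulds distance between two trees on the same taxa."""
    leaves1, splits1 = splits(tree1)
    leaves2, splits2 = splits(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must contain the same taxa")
    return len(splits1 ^ splits2)


mp_tree = (("A", "B"), ("C", "D"))
print(rf_distance(mp_tree, (("A", "C"), ("B", "D"))))  # 2: conflicting resolutions
print(rf_distance(mp_tree, ("A", "B", "C", "D")))      # 1: star tree has no split
print(rf_distance(mp_tree, ("A", "B", ("C", "D"))))    # 0: same unrooted tree
```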

Parsimony, as the simplest phylogenetic method, can in some cases reconstruct well-resolved phylogenies from DNA data simulated on topologies with equal or different branch lengths, contrary to the expected result. This may suggest that parsimony, unlike Maximum Likelihood, resolves phylogenies better when the data matrix contains autapomorphies.
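The premise that autapomorphies are uninformative for parsimony can be checked with Fitch's algorithm on the three possible unrooted four-taxon topologies. This toy sketch scores every topology exhaustively, which is not how TNT searches larger problems with SPR; the function and topology names are illustrative assumptions. A singleton site costs one step on every topology, so only sites of the form xxyy change which tree is shortest.

```python
# The three unrooted four-taxon topologies, each written with an arbitrary root.
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_site(tree, states):
    """Minimum number of changes for one site on a binary tree (Fitch's algorithm)."""

    def walk(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (left, left_cost), (right, right_cost) = walk(node[0]), walk(node[1])
        common = left & right
        if common:
            return common, left_cost + right_cost
        # No shared state: take the union and count one change.
        return left | right, left_cost + right_cost + 1

    return walk(tree)[1]


def tree_length(tree, alignment):
    """Sum the Fitch cost over all sites of an alignment {taxon: sequence}."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Site 1: autapomorphy of A (G vs T). Site 2: informative pattern grouping A+B.
alignment = {"A": "GA", "B": "TA", "C": "TG", "D": "TG"}
for name, tree in TOPOLOGIES.items():
    print(name, tree_length(tree, alignment))
# AB|CD scores 2, AC|BD and AD|BC score 3: only the informative site separates them.
```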

Figure 1. One of the resulting ML (A) and MP (B) topologies, showing polytomies in the ML tree and resolved monophyletic groups in the MP tree.

Figure 2. A) Comparison of RF distances between ML and MP trees according to the BL of their reference trees; BLs of 0.5 and 1 are coded as 1 and 2, respectively. B) Boxplot of RF distances between methods, showing little difference between ML and MP distances.


Bergsten, J. (2005). A review of long-branch attraction. Cladistics, 21: 163-193.

Brower, A. (2017). Statistical consistency and phylogenetic inference: a brief review. Cladistics, 0: 1-6.

Felsenstein, J. (1978). Cases in which Parsimony or Compatibility Methods Will be Positively Misleading. Systematic Zoology, 27(4): 401-410. 

Felsenstein, J. (1988). Phylogenies from Molecular Sequences: Inference and Reliability. Annual Review of Genetics, 22: 521-565.

Goloboff, P. (2003). Parsimony, likelihood, and simplicity. Cladistics, 19: 91-103.

Goloboff, P. A., Farris, J. S., & Nixon, K. C. (2008). TNT, a free program for phylogenetic analysis. Cladistics, 24(5): 774-786.

Guindon, S., & Gascuel, O. (2003). A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology, 52(5): 696-704.

Hedtke, S., Townsend, T., & Hillis, D. (2006). Resolution of Phylogenetic Conflict in Large Data Sets by Increased Taxon Sampling. Systematic Biology, 55(3): 522-529.

Huelsenbeck, J., & Hillis, D. (1993). Success of Phylogenetic Methods in the Four-taxon Case. Systematic Biology, 42(3): 247-264.

Rambaut, A., & Grassly, N. C. (1997). Seq-Gen: An Application for the Monte Carlo Simulation of DNA Sequence Evolution along Phylogenetic Trees. Computer Applications in the Biosciences, 13(3): 235-238.

Robinson, D., & Foulds, L. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53(1): 131-147.

Scotland, R., Olmstead, R., & Bennett, J. (2003). Phylogeny Reconstruction: The Role of Morphology. Systematic Biology, 52(4): 539-548.

Wiens, J., Kuczynski, C., Smith, S., Mulcahy, D., Sites, J., Townsend, T., & Reeder, T. (2008). Branch Lengths, Support, and Congruence: Testing the Phylogenomic Approach with 20 Nuclear Loci in Snakes. Systematic Biology, 57(3): 420-431.

Wortley, A., Rudall, P., Harris, D., & Scotland, R. (2005). How Much Data are Needed to Resolve a Difficult Phylogeny? Case Study in Lamiales. Systematic Biology, 54(5): 697-709.