Sunday, December 17, 2017

Criteria for phylogenetic analyses

When performing a phylogenetic analysis, we encounter several obstacles. The first is the size of tree space: we cannot be sure whether the tree obtained is the true tree or merely the best-fitting one. The second is rate heterogeneity among branches, since different characters change at different rates within and among evolutionary lineages (Gaut et al., 1992). The third is character convergence, which can suggest evolutionary affinity between taxa that are not phylogenetically related (Felsenstein, 1988). The fourth is missing data, for terminals or for characters of genealogical importance. For these reasons, researchers have proposed different criteria for reconstructing phylogenetic relationships among taxa: parsimony, genetic distance, likelihood and Bayesian inference.


Distance methods calculate the total distance between each pair of taxa from the differences between their sequences (Sourdis & Nei, 1987). The parsimony criterion is based on Occam's razor (Steel and Penny, 2000), which states that simpler explanations (requiring fewer assumptions) should be preferred over more complex, ad hoc ones. Accordingly, the tree that requires the fewest evolutionary events is taken as the best explanation of the observed data (Steel and Penny, 2000). Likelihood and Bayesian inference are statistical inference criteria. Likelihood seeks the tree topology that confers the highest probability on the observed character states of the tip species (Sober, 2004). Maximum likelihood evaluates the fit between a model of the evolutionary process, the data and each candidate phylogenetic tree in order to find the best tree (Salemi et al., 2009). Bayesian inference, on the other hand, uses probability distributions to describe the uncertainty in all unknown parameters, including the model parameters. In phylogenetics, the tree topology and the substitution model together specify the statistical model of the data (Nascimento et al., 2017); substitution-model parameters include base frequencies, exchange rates and rate heterogeneity among sites.
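
To make the likelihood criterion concrete, here is a minimal sketch for the simplest possible case: estimating the distance between two aligned DNA sequences under the Jukes-Cantor (JC69) substitution model, which assumes equal base frequencies and a single exchange rate. The model choice, the function names and the counts (100 sites, 20 differences) are assumptions for illustration only; real maximum-likelihood programs optimize topology, branch lengths and model parameters together over many taxa.

```python
import math

# Assumption for illustration: the Jukes-Cantor (JC69) model, i.e. equal base
# frequencies (1/4 each) and one exchange rate shared by all pairs of bases.


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (expected substitutions per site) for two
    aligned DNA sequences that differ at n_diff of n_sites sites."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(same base at the other tip | starting base)
    p_other = 0.25 - 0.25 * e  # P(one specific different base | starting base)
    return (n_sites * math.log(0.25)  # stationary frequency of the starting base
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_other))


def ml_distance_grid(n_sites, n_diff, d_max=3.0, step=1e-4):
    """Maximize the likelihood by brute-force search over a grid of d values."""
    best_d, best_ll = None, -math.inf
    for i in range(1, int(d_max / step) + 1):
        d = i * step
        ll = jc69_log_likelihood(d, n_sites, n_diff)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d


def jc69_distance(n_sites, n_diff):
    """Closed-form JC69 estimate d = -3/4 * ln(1 - 4p/3), valid for p < 3/4."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical example: 100 aligned sites, 20 of which differ (p = 0.2).
d_grid = ml_distance_grid(100, 20)
d_exact = jc69_distance(100, 20)
print(f"grid ML estimate: {d_grid:.4f}, closed form: {d_exact:.4f}")
```

For JC69 the maximum of this likelihood has a known closed form, so the grid search and the formula should agree to within the grid step. The estimate is larger than the raw proportion of differences (0.2) because the model accounts for multiple substitutions at the same site, which a plain count of differences cannot see.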

Each of these approaches has advantages and disadvantages, and each introduces its own biases into the reconstruction. Choosing the most appropriate criterion therefore remains an open question in phylogenetic analysis.

Distance methods assume that the most similar organisms are the most closely related, so they can be misled by convergence. One of the most common distance-based methods is UPGMA, which can produce reconstruction errors, such as grouping species that are not closely related, when the rate of evolutionary change varies among lineages (Nei, 1991).
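
The failure described by Nei (1991) can be reproduced with a minimal UPGMA sketch. The distances below are hypothetical and were derived from an unrooted tree in which A is sister to B and C is sister to D, with B and D evolving much faster than A and C (branch lengths A:1, B:5, C:1, D:4, internal branch 1). Because UPGMA assumes a constant rate (it places each new node at half the distance between the clusters it joins), it pairs the two slowly evolving taxa first. The function name and the dictionary format for distances are illustrative assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA: repeatedly merge the two closest clusters, place the new node at
    half their distance (a molecular-clock assumption) and average distances
    weighted by cluster size. Returns a Newick string with branch lengths."""
    # cluster label -> (Newick subtree, number of taxa, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = dict(dist)  # frozenset({label1, label2}) -> distance
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2.0
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        merged = f"{a}|{b}"
        # Size-weighted average distance from the new cluster to every other one.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        subtree = f"({tree_a}:{height - h_a:.3f},{tree_b}:{height - h_b:.3f})"
        clusters[merged] = (subtree, size_a + size_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical additive distances from the tree ((A:1,B:5):1,(C:1,D:4)).
raw = {("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 6,
       ("B", "C"): 7, ("B", "D"): 10, ("C", "D"): 5}
dist = {frozenset(pair): float(v) for pair, v in raw.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

The resulting tree groups A with C rather than with B, even though the distances fit the generating tree exactly: the unequal rates, not noise in the data, cause the error.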

Parsimony is biased when the amount of change per branch is high: it tends to reconstruct the wrong tree because of long-branch attraction, whereas likelihood (given an adequate model) does not suffer from this problem (Clemente et al., 2009). Conversely, when the characters under study evolve at rates that are not uniform over time, maximum likelihood and Bayesian methods have been shown to be inconsistent and to perform worse than parsimony (Clemente et al., 2009).
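
Parsimony's scoring rule can be sketched with Fitch's algorithm, which counts the minimum number of changes a tree requires at each site (it gives the same score as Sankoff parsimony, studied by Clemente et al. (2009), when every change costs 1). The four-taxon alignment is a hypothetical toy: suppose the true tree is ((A,B),(C,D)) and A and C sit on long branches, so that the states they share at sites 4 to 6 arose by convergence rather than common descent. The nested-tuple tree format and the function names are illustrative assumptions.

```python
# Hypothetical tree format: nested 2-tuples whose leaves are taxon names.


def fitch(tree, site):
    """Fitch parsimony for one site: return (candidate state set,
    minimum number of changes) for the given subtree."""
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one extra change is needed on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Toy alignment (hypothetical). Sites 2-3 are shared by A+B and by C+D;
# sites 4-6 are shared by A+C, standing in for convergence on long branches.
alignment = {
    "A": "AGTCGAT",
    "B": "AGTATGT",
    "C": "ACACGAT",
    "D": "ACAATGC",
}
# The three unrooted four-taxon topologies (Fitch scores do not depend on the root).
topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(t, alignment) for name, t in topologies.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Parsimony picks ((A,C),(B,D)), the topology that needs the fewest changes, because counting changes cannot distinguish convergence from shared ancestry: the three convergent sites outvote the two genuine ones. This is the mechanism behind long-branch attraction.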

Likewise, when data are ambiguous or missing, whether for characters or for terminals, reconstructions by Bayesian inference and maximum likelihood can produce errors such as misleading estimates of topology and branch lengths (Lemmon et al., 2009). In Bayesian inference in particular, with incomplete data the priors on branch lengths and on rate-heterogeneity parameters can yield misleadingly high posterior probabilities (Lemmon et al., 2009).
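
Why priors matter more when data are scarce can be illustrated on a single branch. The sketch below computes, by grid approximation, the posterior mean of the JC69 distance between two sequences under an exponential prior on branch length, for two prior means and for two alignment lengths with the same proportion of differences. The JC69 model, the exponential prior, the function names and all numbers are assumptions chosen for illustration; the sketch does not reproduce the topology results of Lemmon et al. (2009), only the general mechanism of prior influence.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """JC69 log-likelihood of distance d for a pair of aligned DNA sequences."""
    e = math.exp(-4.0 * d / 3.0)
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(0.25 + 0.75 * e)
            + n_diff * math.log(0.25 - 0.25 * e))


def posterior_mean_distance(n_sites, n_diff, prior_mean, d_max=10.0, step=1e-3):
    """Posterior mean of d under an exponential prior with the given mean,
    approximated on a uniform grid of d values."""
    grid = [i * step for i in range(1, int(d_max / step) + 1)]
    # log posterior = log likelihood + log prior (constant terms dropped)
    log_post = [jc69_log_likelihood(d, n_sites, n_diff) - d / prior_mean for d in grid]
    top = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(d * w for d, w in zip(grid, weights)) / sum(weights)


# Same proportion of differences (p = 0.2): little data versus much data.
results = {}
for n_sites, n_diff in [(10, 2), (1000, 200)]:
    results[n_sites] = {mu: posterior_mean_distance(n_sites, n_diff, mu) for mu in (0.1, 1.0)}
    print(n_sites, "sites:", {mu: round(m, 3) for mu, m in results[n_sites].items()})
```

With 10 sites, changing the prior mean from 0.1 to 1.0 moves the posterior mean substantially; with 1000 sites the data dominate, both priors give nearly the same answer, and it lies close to the maximum-likelihood value.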

These are just some of the arguments in the literature for and against the different criteria for phylogenetic reconstruction. Given the above, there is no magic recipe or single recommended criterion for a good phylogenetic analysis, since none of the criteria proposed so far addresses or resolves all of the common problems encountered when reconstructing the phylogenetic relationships among taxa.

Bibliography

  • Clemente J, Ikeo K, Valiente G, et al. (2009). Optimized ancestral state reconstruction using Sankoff parsimony. BMC Bioinformatics 10(1): 51.
  • Felsenstein J (1988). Phylogenies from molecular sequences: inferences and reliability. Annual Review of Genetics 22: 521-565.
  • Gaut BS, Muse SV, Clark WD, Clegg MT (1992). Relative rates of nucleotide substitution at rbcL locus in monocotyledonous plants. Journal of Molecular Evolution 35: 292-303.
  • Goloboff PA (1999). Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.
  • Lemmon AR, Brown JM, Stanger-Hall K, Moriarty Lemmon E (2009). The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Systematic Biology 58(1): 130-145. https://doi.org/10.1093/sysbio/syp017
  • Nascimento F, Reis M, Yang Z (2017). A biologist's guide to Bayesian phylogenetic analysis. Nature Ecology & Evolution. DOI: 10.1038/s41559-017-0280-x
  • Nei M (1991). Efficiencies of different tree-making methods for molecular data. In: Miyamoto MM, Cracraft J (eds.), Phylogenetic Analysis of DNA Sequences. Oxford, New York, pp. 90-128.
  • Salemi M, Vandamme A-M, Lemey P (2009). The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Cambridge University Press.
  • Sober E (2004). The contest between parsimony and likelihood. Systematic Biology 53(4): 644-653.
  • Sourdis J, Nei M (1987). Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Center for Demographic and Population Genetics, The University of Texas Health Science Center at Houston. 14 pp.
  • Steel M, Penny D (2000). Parsimony, likelihood, and the role of models in molecular phylogenetics. Molecular Biology and Evolution 17(6): 839-850.