Saturday, December 9, 2017

Phylogenetic analyses.

Phylogenetic analysis is the means of estimating evolutionary relationships, which are usually represented as a diagram known as a phylogenetic tree. Several methodologies have been proposed for carrying out phylogenetic analyses, each with its own advantages and disadvantages. But the important question is: how do we carry out a phylogenetic analysis well? The amount of data available for phylogenetic analyses is growing at an exponential rate, and as data sets have become larger, it has become increasingly critical to understand the advantages and disadvantages of the various methods of phylogenetic inference (Rosenberg & Kumar, 2001).

The most widely used methods are maximum parsimony, distance methods, maximum likelihood and Bayesian inference. The parsimony method seeks the tree (cladogram) that requires the smallest number of genetic changes, relative to the sequence of a common ancestor, to explain the data. In principle it evaluates all possible trees that could represent the evolutionary history and selects the optimal one; because the number of possible trees grows explosively with the number of taxa, large data sets require heuristic searches instead. Distance-based methods first calculate a distance between every pair of taxa from the differences between their sequences, and then build a tree from those distances (Sourdis & Nei, 1987). Maximum likelihood (ML) and Bayesian inference (BI) are model-based statistical methods; BI additionally takes prior knowledge about the parameters into account. ML estimates how probable the observed character matrix is given a phylogenetic tree and a model of evolution (Felsenstein, 2004), whereas BI estimates the probability of each phylogenetic tree given the data, that is, its posterior probability (Huelsenbeck et al., 2001; Brooks et al., 2007).

However, these widely used methods have also been criticized. Maximum parsimony is affected by long-branch attraction (Felsenstein, 1978), which causes trees to show spurious phylogenetic relationships when homoplastic characters outnumber homologous ones (Bergsten, 2005). Maximum likelihood and BI, in turn, can be affected by the "repulsion" of sister groups when they are located on long branches of the tree (Siddall, 1998).
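To make the parsimony and distance ideas above more concrete, here is a minimal Python sketch. It scores a fixed tree with Fitch's small-parsimony algorithm (one standard way of counting the minimum number of changes; this post does not prescribe a particular algorithm), counts the (2n - 5)!! possible unrooted topologies to show why evaluating every tree quickly becomes infeasible, and computes a Jukes-Cantor (JC69) corrected distance of the kind a distance method would start from. The four-taxon alignment and all function names are hypothetical illustrations, not taken from any cited study or program.

```python
import math

# Trees are nested tuples of taxon names, e.g. (("A", "B"), ("C", "D")).
# This alignment is a made-up example for illustration only.
ALIGNMENT = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}


def fitch_site(tree, site):
    """Fitch small parsimony for one column: return (state set, number of changes)."""
    if isinstance(tree, str):                  # leaf: the observed state
        return {site[tree]}, 0
    left, right = tree
    s_left, c_left = fitch_site(left, site)
    s_right, c_right = fitch_site(right, site)
    common = s_left & s_right
    if common:                                 # children share a state: no extra change
        return common, c_left + c_right
    return s_left | s_right, c_left + c_right + 1   # disagreement costs one change


def parsimony_length(tree, alignment):
    """Tree length: Fitch changes summed over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


def n_unrooted_trees(n_taxa):
    """Number of unrooted binary topologies for n_taxa >= 3, i.e. (2n - 5)!!."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def jc69_distance(seq1, seq2):
    """JC69 corrected distance from the proportion p of differing sites."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    tree1 = (("A", "B"), ("C", "D"))
    tree2 = (("A", "D"), ("B", "C"))
    print(parsimony_length(tree1, ALIGNMENT))   # 4 changes
    print(parsimony_length(tree2, ALIGNMENT))   # 5 changes, so tree1 is more parsimonious
    print(n_unrooted_trees(10))                 # 2027025 possible trees for 10 taxa
    print(round(jc69_distance(ALIGNMENT["A"], ALIGNMENT["B"]), 3))  # 0.304
```

With only four taxa there are just three unrooted trees to compare, but the count passes two million at ten taxa, which is why parsimony programs rely on heuristic searches for realistic data sets.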

Personally, I consider Bayesian inference (BI) the best method for phylogenetic reconstruction, since it is consistent regardless of the region of tree space (such as the "Felsenstein zone" or the "Farris zone") or the branch lengths (Steel, 2013). It also offers other advantages, such as computational efficiency (Ronquist et al., 2009), the ability to integrate complex evolutionary models and to incorporate priors, and a straightforward interpretation of posterior probabilities (Yang, 2006).

The steps for a "good" phylogenetic analysis would be the following:

1. Obtain the data (molecular or morphological).
2. Align the sequences (for molecular data).
3. Select the best substitution model. For nucleotide sequences, the models range from the simple JC69 to the complex GTR; for morphological data, the Mk model is used (Nascimento et al., 2017).
4. Select the priors. The prior should summarize the biologist's best knowledge about the model or its parameters before the data are analyzed. In practice, correlations between parameters are usually ignored and independent priors are assigned to each parameter (Nascimento et al., 2017).
5. Reconstruct the phylogeny by Bayesian inference. Tree space is explored by Markov chain Monte Carlo (MCMC), which visits many combinations of parameter values on different trees. Starting from a tree T1, each new random step is governed by the decision rules of the algorithm. In each step (generation), the values of some parameters are modified, and the posterior probability of the resulting tree T2 is calculated under the new combination of parameters. In this way the chain "visits" another probable combination of parameter values, associated with another tree, and evaluates whether its posterior probability is lower than, equal to or higher than that of the previous tree (De Luna et al., 2005). If the chains are run for many generations and several chains are run, the analysis eventually stabilizes. Ideally, the MCMC should run long enough to obtain a reliable estimate of the posterior distribution, but not so long that computational resources are wasted. Once it reaches equilibrium, the MCMC visits trees with a frequency proportional to their posterior probability (Huelsenbeck et al., 2002).
6. Visualize and interpret the results.
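The MCMC step described above can be illustrated with a deliberately simplified Python sketch. Instead of moving through tree space, it samples a single parameter, the JC69 distance d between two sequences (in effect a fixed two-taxon tree), with the Metropolis algorithm: propose a new value, compare posterior probabilities, then accept or reject. The data (500 sites with 50 differences), the exponential prior with mean 0.2, the window size and the function names are all hypothetical choices for illustration; programs such as MrBayes also propose changes to the topology and to many other parameters, and use more elaborate proposals.

```python
import math
import random

# Hypothetical data: two aligned sequences of 500 sites that differ at 50 sites.
N_SITES, N_DIFFS = 500, 50
PRIOR_MEAN = 0.2   # hypothetical mean of an exponential prior on the distance d


def log_likelihood(d, n_sites, n_diffs):
    """JC69 log-likelihood of distance d (terms that do not depend on d are dropped)."""
    e = math.exp(-4.0 * d / 3.0)
    p_diff = 0.75 - 0.75 * e          # probability that a site differs
    p_same = 0.25 + 0.75 * e          # probability that a site is identical
    if p_diff <= 0.0:                 # guard against log(0) for d extremely close to 0
        return -math.inf if n_diffs > 0 else 0.0
    return n_diffs * math.log(p_diff) + (n_sites - n_diffs) * math.log(p_same)


def log_posterior(d, n_sites, n_diffs, prior_mean=PRIOR_MEAN):
    """Unnormalized log posterior: log likelihood plus log exponential prior."""
    if d <= 0.0:
        return -math.inf
    return log_likelihood(d, n_sites, n_diffs) - d / prior_mean - math.log(prior_mean)


def metropolis(n_sites, n_diffs, n_gen=20000, burn_in=2000, window=0.05, seed=1):
    """Metropolis MCMC with a sliding-window proposal reflected at zero."""
    rng = random.Random(seed)
    d = 0.5                                              # arbitrary starting value
    log_post = log_posterior(d, n_sites, n_diffs)
    samples = []
    for generation in range(n_gen):
        proposal = abs(d + rng.uniform(-window, window))     # symmetric proposal
        log_post_new = log_posterior(proposal, n_sites, n_diffs)
        # Accept with probability min(1, posterior ratio); otherwise keep the old value.
        if rng.random() < math.exp(min(0.0, log_post_new - log_post)):
            d, log_post = proposal, log_post_new
        if generation >= burn_in:                        # discard burn-in generations
            samples.append(d)
    return samples


if __name__ == "__main__":
    samples = sorted(metropolis(N_SITES, N_DIFFS))
    mean = sum(samples) / len(samples)
    lower = samples[int(0.025 * len(samples))]
    upper = samples[int(0.975 * len(samples))]
    print(f"posterior mean d = {mean:.3f}, 95% credible interval = ({lower:.3f}, {upper:.3f})")
```

Because the retained samples are drawn in proportion to the posterior, sorting them gives a credible interval directly. In a real analysis one would also run several independent chains and check that they have converged before summarizing.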

Bibliography.
Bergsten J. 2005. A review of long-branch attraction. Cladistics, 21, 163–193.

Brooks D.R., Bilewitch J., Condy C., et al. 2007. Quantitative phylogenetic analysis in the 21st century. Revista Mexicana de Biodiversidad, 78, 225–252.

De Luna E., Guerrero J.A., Chew-Taracena T. 2005. Sistemática biológica: avances y direcciones en la teoría y los métodos de la reconstrucción filogenética. Hidrobiológica, 15(3), 351–370.

Felsenstein J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Systematic Zoology, 27, 401–410.

Felsenstein J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts.

Huelsenbeck J.P., Ronquist F., Nielsen R., Bollback J.P. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294(5550), 2310–2314.

Huelsenbeck J.P., Larget B., Miller R.E., Ronquist F. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Systematic Biology, 51(5), 673–688.

Nascimento F., Reis M., Yang Z. 2017. A biologist's guide to Bayesian phylogenetic analysis. Nature Ecology & Evolution. DOI: 10.1038/s41559-017-0280-x.

Ronquist F., van der Mark P., Huelsenbeck J.P. 2009. Bayesian phylogenetic analysis using MrBayes. In: Lemey P., Salemi M., Vandamme A.-M. (eds.) The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Cambridge University Press, 219–236.

Rosenberg M.S., Kumar S. 2001. Traditional phylogenetic reconstruction methods reconstruct shallow and deep evolutionary relationships equally well. Molecular Biology and Evolution, 18(9), 1823–1827.

Siddall M.E. 1998. Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris zone. Cladistics, 14, 209–220.

Sourdis J., Nei M. 1987. Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Center for Demographic and Population Genetics, The University of Texas Health Science Center at Houston. 14 pp.

Steel M. 2013. Consistency of Bayesian inference of resolved phylogenetic trees. Journal of Theoretical Biology, 336, 246–249.

Yang Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford, England.