Wednesday, June 24, 2015

Divergence time estimation under autocorrelated and independent rate models as a function of tree size, using MULTIDIVTIME and BEAST.

Introduction

Divergence times can be estimated with relaxed-clock methods that allow substitution rates to vary across the tree. The autocorrelated-rate (AR) method assumes that rates are correlated between ancestor and descendant: a descendant's rate can follow a log-normal distribution whose mean equals the parent's rate (Thorne, Kishino and Painter, 1998; Kishino, Thorne and Bruno, 2001; Thorne and Kishino, 2002), or it can be modeled with a non-central χ² distribution (Lepage et al., 2006). Another method used in molecular clock analysis is the random-rate (RR) method, which assigns each lineage an independent rate drawn from a single underlying parametric distribution, such as an exponential or a log-normal (Drummond et al., 2006; Rannala and Yang, 2007; Lepage et al., 2007). The RR method is implemented in BEAST, whereas the AR method is implemented in MULTIDIVTIME. The two programs also differ in their priors on node ages: MULTIDIVTIME uses a Dirichlet-based prior without an explicit assumption about the underlying biological process (Kishino, Thorne and Bruno, 2001; Thorne and Kishino, 2002), whereas BEAST offers Yule and birth-death priors (Yule, 1925; Rannala and Yang, 1996; Yang and Rannala, 1997). Both methods have been widely used to date phylogenies, and using several calibration points is recommended; however, fossils are rarely available for more than a small fraction of the nodes in a topology. The minimum number of calibration points needed to obtain accurate divergence times has not yet been established.
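To make the contrast between the two rate models concrete, the following Python sketch draws rates on a small hypothetical four-tip tree (root age 31 Myr) under each model. It is an illustration under stated assumptions, not the MULTIDIVTIME or BEAST implementation: the tree, the root rate and the variance parameters `nu` and `sigma` are invented, and the sketch assigns one rate per node, ignoring how each program turns node rates into branch rates. Under AR, a child's log-rate is normal with mean log(parent rate) - νt/2 and variance νt, which makes the expected child rate equal to the parent rate; under RR, every rate is an independent draw from one log-normal with a fixed mean.

```python
import math
import random

# Hypothetical rooted, ultrametric tree (root age 31 Myr):
# parent -> list of (child, branch duration in Myr).
TREE = {
    "root": [("n1", 10.0), ("n2", 15.0)],
    "n1": [("A", 21.0), ("B", 21.0)],
    "n2": [("C", 16.0), ("D", 16.0)],
}

def autocorrelated_rates(tree, root_rate, nu, rng):
    """AR sketch: each child's rate is log-normal with mean equal to its parent's rate.

    log(r_child) ~ Normal(log(r_parent) - nu*t/2, nu*t), so E[r_child] = r_parent.
    """
    rates = {"root": root_rate}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):  # tips have no entry, so no children
            var = nu * t  # variance grows with the time separating parent and child
            mu = math.log(rates[parent]) - var / 2.0
            rates[child] = rng.lognormvariate(mu, math.sqrt(var))
            stack.append(child)
    return rates

def independent_rates(tree, mean_rate, sigma, rng):
    """RR sketch: every non-root node gets an i.i.d. log-normal rate with mean mean_rate."""
    mu = math.log(mean_rate) - sigma ** 2 / 2.0
    rates = {}
    for children in tree.values():
        for child, _duration in children:
            rates[child] = rng.lognormvariate(mu, sigma)
    return rates

rng = random.Random(1)
ar = autocorrelated_rates(TREE, root_rate=0.01, nu=0.01, rng=rng)
rr = independent_rates(TREE, mean_rate=0.01, sigma=0.5, rng=rng)
```

In the AR draw, sister lineages start from the same parental rate and drift apart with time, whereas in the RR draw sister lineages are no more similar than distant ones.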

Objectives

General objective

  • To determine the minimum number of calibration points required, relative to the number of tips of a topology (tree size), under the autocorrelated-rate and random-rate methods.

Specific objectives
  • To correlate the estimated divergence times with the simulated divergence times.
  • To quantify how much the estimates change as the number of calibration points increases.
  • To assess how well each program reconstructs the correct phylogeny.
  • To assess whether posterior probabilities are high at nodes whose estimated divergence time approximately matches the simulated one.

Methods

1. Simulations

Trees will be simulated for four tree sizes (10, 25, 50 and 100 tips) with a root age of 31 Mya, using the package phytools v.0.4-56 (Revell, 2012) in the R language (R Core Team, 2014); each tree size will be replicated 10 times. Sequences will then be simulated along these trees with Seq-Gen v1.3.3 (Rambaut and Grassly, 1997) under the HKY model, with base frequencies 0.30 [T], 0.25 [C], 0.30 [A] and 0.15 [G], a transition/transversion rate ratio κ = 5, a gamma shape parameter α = 0.5 (sensu Brown and Yang, 2010) and a sequence length of 1000 bp.
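The substitution model in this design can be written out explicitly. The Python sketch below builds the HKY instantaneous rate matrix from the base frequencies above, assuming that K = 5 denotes the HKY rate ratio κ, and rescales it to one expected substitution per site per unit time. It also computes the expected transition/transversion ratio implied by κ and the frequencies; whether K is meant as κ or as that ratio matters when the value is passed to a simulator, so this reading is an assumption. Among-site rate variation (α = 0.5) is not included.

```python
# States in the order listed in the text: T, C, A, G.
STATES = ("T", "C", "A", "G")
FREQS = {"T": 0.30, "C": 0.25, "A": 0.30, "G": 0.15}
# Transitions: purine <-> purine (A <-> G) and pyrimidine <-> pyrimidine (C <-> T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def hky_rate_matrix(freqs, kappa):
    """Return the HKY Q matrix as nested dicts, scaled to a mean substitution rate of 1."""
    q = {}
    for i in STATES:
        row = {}
        for j in STATES:
            if i != j:
                # Off-diagonal rate: target-base frequency, multiplied by kappa for transitions.
                row[j] = freqs[j] * (kappa if frozenset((i, j)) in TRANSITIONS else 1.0)
        row[i] = -sum(row.values())  # each row of a rate matrix sums to zero
        q[i] = row
    mean_rate = -sum(freqs[i] * q[i][i] for i in STATES)
    return {i: {j: q[i][j] / mean_rate for j in STATES} for i in STATES}

def expected_ts_tv(freqs, kappa):
    """Expected ratio of transitions to transversions implied by kappa and the frequencies."""
    purines = freqs["A"] + freqs["G"]
    pyrimidines = freqs["C"] + freqs["T"]
    return kappa * (freqs["A"] * freqs["G"] + freqs["C"] * freqs["T"]) / (purines * pyrimidines)

Q = hky_rate_matrix(FREQS, kappa=5.0)
```

With the frequencies above, κ = 5 corresponds to an expected transition/transversion ratio of about 2.42 (5 × 0.12 / 0.2475).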


2. Molecular clock analysis and calibration

The analyses will be performed with the AR and RR methods in MULTIDIVTIME v9.25.03 (Thorne and Kishino, 2002) (MDT) and BEAST v.1.8.2 (Drummond and Rambaut, 2007; Drummond, Suchard, Xie, and Rambaut, 2012), respectively. In MDT, divergence times will be estimated from 10⁶ generations under a correlated relaxed lognormal clock. In BEAST, the analyses will use an uncorrelated relaxed lognormal clock, a Yule speciation process, a normal prior on the TMRCA and a number of generations that scales with the number of tips, following the authors' recommendations. For both methods, calibration points will be placed on randomly chosen nodes, while the basal node of the ingroup will always be fixed at 30 Mya. Each number of calibration points will be applied to every simulated tree.
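The random placement of calibrations can be sketched as follows. This is a hypothetical helper, not part of MULTIDIVTIME or BEAST: it assumes internal nodes are identified by labels, that the ingroup's basal node is always calibrated, and that the sets for increasing numbers of points are nested (each set adds one randomly chosen node to the previous one), so that only the number of calibrations changes between runs. Seeding the generator with the replicate number makes each tree's draw reproducible.

```python
import random

def calibration_sets(internal_nodes, fixed_node, max_points, seed):
    """Hypothetical helper: nested calibration sets of size 1..max_points.

    fixed_node (the ingroup's basal node, fixed at 30 Mya in this design) is always
    included; the remaining calibrated nodes are drawn at random without replacement.
    """
    if fixed_node not in internal_nodes:
        raise ValueError("fixed_node must be one of the internal nodes")
    others = [n for n in internal_nodes if n != fixed_node]
    if not 1 <= max_points <= len(others) + 1:
        raise ValueError("max_points must be between 1 and the number of internal nodes")
    rng = random.Random(seed)  # one seed per simulated tree replicate
    order = rng.sample(others, max_points - 1)
    # Set k keeps the fixed node plus the first k - 1 random draws, so the sets are nested.
    return {k: [fixed_node] + order[: k - 1] for k in range(1, max_points + 1)}

# Hypothetical 10-tip tree: 9 internal nodes labelled n1..n9, with n2 standing in
# for the ingroup's basal node; one calibration scheme per replicate.
NODES = [f"n{i}" for i in range(1, 10)]
schemes = {rep: calibration_sets(NODES, "n2", max_points=5, seed=rep) for rep in range(10)}
```

A rooted binary tree with n tips has n - 1 internal nodes, so requesting more calibrations than that raises an error instead of silently reusing nodes. Independent (non-nested) draws for each number of points would be an equally valid design; nesting is only the choice made in this sketch.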


3. Correlation and additional comparisons

The simulated times and the estimates from these analyses will be compared using a Pearson correlation for each tree size and replicate, correlating the ages of the nodes shared by the simulated and estimated trees. Because BEAST also estimates the topology, its topological error will be calculated for each reconstruction as the Robinson-Foulds distance between the estimated and the simulated tree, using the package phangorn (Schliep, 2011) in R. Finally, I will compare and plot the posterior probabilities of correctly recovered nodes against those of incorrectly recovered nodes.
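These comparisons can be sketched in Python, assuming (only for this illustration) that each tree is summarized as a set of clades, each clade a set of tip labels, with node ages and posterior probabilities keyed by clade. `pearson` is the ordinary sample correlation, computed only over clades present in both trees. `robinson_foulds` follows the textbook unrooted definition (the number of non-trivial bipartitions found in one tree but not the other); phangorn's own function may differ in defaults such as rooting or normalization, so its documentation should be checked before comparing numbers. All values in the toy example are invented.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shared_node_ages(true_ages, est_ages):
    """Pair simulated and estimated ages for clades present in both trees."""
    common = [clade for clade in true_ages if clade in est_ages]
    return [true_ages[c] for c in common], [est_ages[c] for c in common]

def splits(clades, taxa):
    """Non-trivial bipartitions, each stored as the side that excludes a reference tip."""
    taxa = frozenset(taxa)
    ref = min(taxa)
    out = set()
    for clade in clades:
        side = frozenset(clade)
        if ref in side:
            side = taxa - side
        if 2 <= len(side) <= len(taxa) - 2:  # drop single tips and the whole-tree clade
            out.add(side)
    return out

def robinson_foulds(clades_a, clades_b, taxa):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(clades_a, taxa) ^ splits(clades_b, taxa))

def posteriors_by_correctness(support, true_clades):
    """Split posterior probabilities into those of correct and incorrect clades."""
    correct = [p for clade, p in support.items() if clade in true_clades]
    incorrect = [p for clade, p in support.items() if clade not in true_clades]
    return correct, incorrect

# Toy example with five tips: the estimate swaps B and C.
TAXA = {"A", "B", "C", "D", "E"}
c = frozenset
true_ages = {c("AB"): 10.0, c("ABC"): 20.0, c("DE"): 15.0, c("ABCDE"): 31.0}
est_ages = {c("AC"): 12.0, c("ABC"): 22.0, c("DE"): 14.0, c("ABCDE"): 30.0}
support = {c("AC"): 0.6, c("ABC"): 0.95, c("DE"): 1.0}

x, y = shared_node_ages(true_ages, est_ages)
r = pearson(x, y)
rf = robinson_foulds(true_ages.keys(), est_ages.keys(), TAXA)
good, bad = posteriors_by_correctness(support, set(true_ages))
```

In this toy example each tree has one split the other lacks, so the Robinson-Foulds distance is 2, and only the three shared clades enter the correlation.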

Bibliography

Brown, R. P., & Yang, Z. (2010). Bayesian dating of shallow phylogenies with a relaxed clock. Systematic Biology, 59(2), 119–131. http://doi.org/10.1093/sysbio/syp082

Drummond, A. J., Ho, S. Y. W., Phillips, M. J., & Rambaut, A. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology, 4(5), e88. http://doi.org/10.1371/journal.pbio.0040088

Drummond, A. J., & Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7(1), 214. http://doi.org/10.1186/1471-2148-7-214

Drummond, A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29(8), 1969–1973. http://doi.org/10.1093/molbev/mss075

Kishino, H., Thorne, J. L., & Bruno, W. J. (2001). Performance of a Divergence Time Estimation Method under a Probabilistic Model of Rate Evolution. Molecular Biology and Evolution, 18(3), 352–361. http://doi.org/10.1093/oxfordjournals.molbev.a003811

Lepage, T., Bryant, D., Philippe, H., & Lartillot, N. (2007). A general comparison of relaxed molecular clock models. Molecular Biology and Evolution, 24(12), 2669–2680. http://doi.org/10.1093/molbev/msm193

Lepage, T., Lawi, S., Tupper, P., & Bryant, D. (2006). Continuous and tractable models for the variation of evolutionary rates. Mathematical Biosciences, 199(2), 216–233. http://doi.org/10.1016/j.mbs.2005.11.002

R Core Team. (2014). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from http://www.r-project.org/

Rambaut, A., & Grassly, N. C. (1997). Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences, 13(3), 235–238. http://doi.org/10.1093/bioinformatics/13.3.235

Rannala, B., & Yang, Z. (1996). Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. Journal of Molecular Evolution, 43(3), 304–311. http://doi.org/10.1007/BF02338839

Rannala, B., & Yang, Z. (2007). Inferring speciation times under an episodic molecular clock. Systematic Biology, 56(3), 453–466. http://doi.org/10.1080/10635150701420643

Revell, L. J. (2012). phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution, 3(2), 217–223. http://doi.org/10.1111/j.2041-210X.2011.00169.x

Schliep, K. P. (2011). phangorn: phylogenetic analysis in R. Bioinformatics, 27(4), 592–593. http://doi.org/10.1093/bioinformatics/btq706

Thorne, J. L., & Kishino, H. (2002). Divergence time and evolutionary rate estimation with multilocus data. Systematic Biology, 51(5), 689–702. http://doi.org/10.1080/10635150290102456

Thorne, J. L., Kishino, H., & Painter, I. S. (1998). Estimating the rate of evolution of the rate of molecular evolution. Molecular Biology and Evolution, 15(12), 1647–1657. http://doi.org/10.1093/oxfordjournals.molbev.a025892

Yang, Z., & Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Molecular Biology and Evolution, 14(7), 717–724. http://doi.org/10.1093/oxfordjournals.molbev.a025811

Yule, G. U. (1925). A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character, 213, 21–87. Retrieved from http://www.jstor.org/stable/92117