Wednesday, March 10, 2021

How the number of fossils and the type of molecular clock change the estimation of ages

 


DATE: 10/03/2021

María Cristina Navas Serrano 2170058.


INTRODUCTION

In recent years, DNA sequences have been increasingly used to date relevant evolutionary events, especially the divergence times of clades and species (Rutschmann, 2006). Zuckerkandl and Pauling (1965) compared the differences in the hemoglobin protein sequences of different species against the estimated ages of those species' fossils and postulated that the differences between the sequences of two species are a function of the time since they diverged. Under this hypothesis, sequences evolve at a rate that is constant over time, which is consistent with the postulates of Motoo Kimura's neutral theory (Rutschmann, 2006; Bromham and Penny, 2003).
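The clock hypothesis can be illustrated with a small worked example. This is a minimal sketch, assuming the simplest strict clock, in which the distance d between two sequences accumulates along both lineages so that d = 2rt. The function names and all numeric values are hypothetical and are not taken from this study.

```python
def clock_rate(distance, calibration_age):
    """Per-lineage rate implied by a pair whose divergence age is known (d = 2 * r * t)."""
    return distance / (2.0 * calibration_age)


def divergence_time(distance, rate):
    """Divergence age of another pair under the same constant rate (t = d / (2 * r))."""
    return distance / (2.0 * rate)


# Hypothetical calibration: two species differ by 0.12 substitutions per site
# and their fossil-based divergence age is 60 Ma.
rate = clock_rate(distance=0.12, calibration_age=60.0)

# Hypothetical query pair that differs by 0.05 substitutions per site.
age = divergence_time(distance=0.05, rate=rate)
print(f"rate = {rate:.4f} substitutions/site/Myr, age = {age:.1f} Ma")
```

If the rate is not constant among lineages, the same distance no longer maps to a single age, which is the motivation for the methods that relax this assumption.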

Since the molecular clock was proposed, various alternatives to a constant-rate clock have been developed. Langley and Fitch (1974) reported that evolutionary rates in primates differed from those of other mammals, so a molecular clock with a strict rate is imprecise in some cases. Over time, it has been shown that constant rates of evolution may be the exception rather than the rule and that lineages therefore evolve at different rates; clocks that allow variable rates are called "relaxed clocks" (Welch and Bromham, 2005).

Various techniques have been developed to estimate divergence times under strict (constant-rate) and relaxed clocks, often using geological events or the estimated ages of fossils to calibrate the topologies. One of the simplest methods for estimating divergence times with a single rate of change is the Langley-Fitch method (Langley and Fitch, 1974; Sanderson, 2003), which uses maximum likelihood to optimize the substitution rate on a phylogeny with known branch lengths, re-estimating the branch lengths and calculating the divergence times (Rutschmann, 2006). Among the relaxed-clock methods, nonparametric rate smoothing (NPRS) estimates the unknown divergence times while smoothing the change of rates along the phylogeny, using a nonparametric function that penalizes rates that change too quickly from branch to branch (Sanderson, 1997, 2003; Rutschmann, 2006). Penalized likelihood (PL) combines the two approaches: it is a semiparametric method that weights the penalty with a smoothing value, which can be estimated from the data. A high smoothing value leads to clock-like models, whereas a low value leaves rate variation almost unconstrained, as in NPRS (Sanderson, 2002, 2003; Rutschmann, 2006). Other techniques also estimate divergence times with heterogeneous rates, such as heuristic rate smoothing (AHRS) or Bayesian implementations such as PhyBayes (Rutschmann, 2006).
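The trade-off that the smoothing value controls in penalized likelihood can be sketched in a few lines. This is a simplified illustration and not the r8s implementation: it assumes a Poisson model for the substitutions on each branch and a penalty equal to the sum of squared rate differences between each branch and its parent branch. The tree, the counts and the function names are hypothetical.

```python
import math


def poisson_log_likelihood(substitutions, rates, durations):
    """Log-likelihood of observed substitution counts, one Poisson term per branch."""
    total = 0.0
    for x, r, t in zip(substitutions, rates, durations):
        expected = r * t  # expected number of substitutions on the branch
        total += x * math.log(expected) - expected - math.lgamma(x + 1)
    return total


def roughness_penalty(rates, parent):
    """Sum of squared rate differences between each branch and its parent branch."""
    return sum((rates[child] - rates[par]) ** 2 for child, par in parent.items())


def penalized_log_likelihood(substitutions, rates, durations, parent, smoothing):
    """Log-likelihood minus the smoothing value times the roughness penalty."""
    fit = poisson_log_likelihood(substitutions, rates, durations)
    return fit - smoothing * roughness_penalty(rates, parent)


# Hypothetical three-branch tree: branch 0 is the parent of branches 1 and 2.
parent = {1: 0, 2: 0}
substitutions = [10, 4, 16]
durations = [10.0, 10.0, 10.0]

clock_like = [1.0, 1.0, 1.0]  # one shared rate, as in the Langley-Fitch method
variable = [1.0, 0.4, 1.6]    # each branch gets its own rate

for smoothing in (0.01, 100.0):
    a = penalized_log_likelihood(substitutions, clock_like, durations, parent, smoothing)
    b = penalized_log_likelihood(substitutions, variable, durations, parent, smoothing)
    print(smoothing, "clock-like preferred" if a > b else "variable rates preferred")
```

With the low smoothing value the variable rates score higher because they fit the counts better, and with the high value the penalty dominates and the clock-like rates score higher, which mirrors the behavior described for PL.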

The growing amount of molecular information and DNA sequences, together with improvements in technology, has allowed the development of programs that implement these techniques. This project therefore tests how sensitive divergence time estimates are to the number of fossil tips and to the type of molecular clock used, using the program r8s v1.81 (Sanderson, 2003) under the Langley-Fitch method (Langley and Fitch, 1974; Sanderson, 2003) for 6 topologies of turtles of the clade Pelomedusoides, in the suborder Pleurodira.

 

 

METHODOLOGY

Sampling

The topologies were built from the largest matrix of morphological characters of extant and extinct turtles of the suborder Pleurodira currently known, compiled by Ferreira et al. (2018) (101 taxa x 245 characters). The matrix was edited to keep only 17 terminals, with representatives of the 3 families of Pelomedusoides: Podocnemididae, Pelomedusidae and Bothremydidae. Seven fossil terminals and 10 extant terminals were chosen randomly, ensuring that the extant terminals had sufficient molecular information. The terminals include 2 outgroups: Chelus fimbriata, an extant outgroup of Pelomedusoides, and Proganochelys quenstedti, a fossil outgroup of Pleurodira.

For the extant terminals, molecular information was sampled from the public database GenBank (Benson et al., 2015) for 8 loci, including the mitochondrial genes CYTB, COI and 12S and the nuclear genes RAG1, RAG2 and R35, because sequences of these genes were available for most terminals. When multiple sequences were available for a given species, the longest sequence was used. The gene sequences were aligned individually with the ClustalW algorithm (Larkin et al., 2007) in the program MEGA-X v11.0 (Kumar et al., 2018), and all gene alignments were concatenated into a single molecular matrix using Mesquite v3.61 (Maddison and Maddison, 2019). The program jModelTest v2.1.10 (Darriba et al., 2012) was used to select an adequate DNA substitution model for the set of genes, using the Akaike information criterion (AIC).
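Model selection by AIC can be illustrated as follows. This is a minimal sketch of the criterion itself, AIC = 2k - 2 ln L, and not of jModelTest: the log-likelihoods are hypothetical, and k counts only the free parameters of the substitution model, on the assumption that all candidates are scored on the same tree so that branch lengths add the same amount to every model.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical log-likelihoods; k counts only the free substitution-model parameters.
candidates = {
    "JC": (-5200.0, 0),
    "HKY+G": (-5050.0, 5),
    "GTR+I+G": (-5000.0, 10),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("selected model:", best)
```

The criterion rewards a higher likelihood but charges 2 units per extra parameter, so a richer model is selected only when its gain in fit outweighs its added complexity.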

The matrix of morphological characters was concatenated by hand with the molecular matrix into a single total-evidence matrix consisting of 8493 morphological and molecular characters.
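Although the concatenation here was done by hand, the operation can be sketched programmatically. This is a minimal illustration under the assumption that terminals without one of the partitions (for example, fossils without sequences) are coded as missing data with "?"; the report does not state how they were coded. The character strings are hypothetical and the function name is illustrative.

```python
def concatenate(morphology, molecular, missing="?"):
    """Join two aligned partitions per taxon, padding absent partitions with `missing`."""
    n_morph = len(next(iter(morphology.values())))
    n_mol = len(next(iter(molecular.values())))
    combined = {}
    for taxon in sorted(set(morphology) | set(molecular)):
        morph_part = morphology.get(taxon, missing * n_morph)
        mol_part = molecular.get(taxon, missing * n_mol)
        combined[taxon] = morph_part + mol_part
    return combined


# Hypothetical character strings: an extant terminal with both partitions
# and a fossil terminal with morphology only.
morphology = {"Podocnemis_expansa": "0110", "Bairdemys_thalassica": "01?1"}
molecular = {"Podocnemis_expansa": "ACGTAC"}

matrix = concatenate(morphology, molecular)
for taxon, characters in matrix.items():
    print(taxon, characters)
```

Every row of the combined matrix has the same length, which is what phylogenetic programs require of a total-evidence matrix.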

 

Phylogenetic Analyses

To evaluate the effect of the number of fossil terminals, the matrix with all 7 fossils was edited by hand to generate 2 more matrices, with 5 and 3 fossils respectively. The fossil terminals Bothremys maghrebiana and Bairdemys thalassica were removed for the 5-fossil matrix, and Ummulisani rutgersensis and Caninemys tridentata were additionally removed for the 3-fossil matrix. The fossil terminals to remove were chosen randomly.

Each matrix was then analyzed under maximum likelihood in PAUP* v4.0a (Swofford, 2003), using a heuristic search with 100 replicates and the tree bisection and reconnection (TBR) algorithm, under the parameters of the model that best fit the molecular data, GTR + I + G. Each search was run both without enforcing a clock (relaxed clock) and enforcing a clock (strict clock), and each search yielded a single best tree with branch lengths.


Divergence Time Analyses

To estimate the divergence times, all the topologies with branch lengths were analyzed in r8s v1.81 (Sanderson, 2003). The trees built with a clock were treated as ultrametric topologies using the program's ultrametric command, without applying an algorithm that changes the branch lengths: in this mode the uncalibrated ages are immediately available for the tree, and r8s scales the times to the absolute age of one specific node (Sanderson, 2003). The trees built with the relaxed clock were edited to remove the root branch, which had a length of 0, and the branch of the outgroup Proganochelys quenstedti was used as the root. This was necessary because PAUP roots the tree at the closest sister group and leaves a root of length 0, which forces a basal trichotomy. To fix this, Proganochelys quenstedti was used as an extra outgroup, following the recommendations of the program. The relaxed-clock topologies were then analyzed under the Langley-Fitch method (Langley and Fitch, 1974; Sanderson, 2003), trying 3 initial points.

All topologies built with either clock were calibrated by fixing the node of the fossil Acleistochelys maliensis at an approximate age of 60 Ma. The relaxed-clock topologies were also given a root age constrained between 140 Ma and 160 Ma, following the age estimates for the Pelomedusoides clade of Ferreira et al. (2018).
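The scaling applied to the ultrametric (strict-clock) trees can be sketched as a single proportional conversion. This is an illustration of the idea and not the r8s code: on an ultrametric tree every node has one depth (its distance to the tips), and fixing one node at a known age converts all depths into ages with the same factor. The node names and depths are hypothetical; only the 60 Ma calibration age comes from the text.

```python
def scale_to_calibration(node_depths, calibration_node, calibration_age):
    """Convert node depths (substitutions per site above the tips) into ages."""
    factor = calibration_age / node_depths[calibration_node]
    return {node: depth * factor for node, depth in node_depths.items()}


# Hypothetical depths on an ultrametric tree, in substitutions per site.
node_depths = {"root": 0.21, "calibration_node": 0.10, "young_node": 0.03}

ages = scale_to_calibration(node_depths, "calibration_node", 60.0)
for node, age in ages.items():
    print(f"{node}: {age:.1f} Ma")
```

Because a single factor is applied to the whole tree, every age depends entirely on the one calibration node, and no second constraint such as a root age range can be honored at the same time.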

All the trees resulting from the analyses were visualized and edited in FigTree v1.4.4 and with the ape package (Paradis et al., 2004) in R (R Core Team, 2020).

 

 

RESULTS

The best molecular model selected by jModelTest v2.1.10 (Darriba et al., 2012) using the Akaike information criterion (AIC) was GTR with invariant sites and gamma-distributed rates (GTR + I + G).

The maximum likelihood analyses yielded one best tree for each of the 6 searches (3 matrices under each of the 2 clocks).

In the strict-clock topologies, Bairdemys thalassica is the sister taxon of the clade that spans the taxa from Carbonemys cofrinii to Podocnemis lewyana (Figure 1). In the relaxed-clock topologies, Bairdemys thalassica is the sister taxon of Caninemys tridentata, and together they form a single clade that is sister to Podocnemis expansa (Figure 2). The relationships within the family Podocnemididae also differ. In the strict-clock topologies, the clade composed of Podocnemis expansa and Podocnemis unifilis is the sister clade of the clade composed of Podocnemis erythrocephala and Podocnemis lewyana. Erymnochelys madagascariensis and Peltocephalus dumerilianus also form a single clade.

The divergence time analysis of the strict-clock trees yielded different time intervals depending on the number of fossils used.

The topologies built enforcing a strict clock show time intervals from 19.22 Ma to 126.09 Ma with 7 fossils, from 18.31 Ma to 120.37 Ma with 5 fossils and from 14.84 Ma to 97.97 Ma with 3 fossils (Figure 1). These intervals are narrower than those of the relaxed-clock topologies, and the age estimates of all the nodes differ among the analyses. All the estimates keep the age of 60 Ma for the node of Acleistochelys maliensis.

The relaxed-clock topologies show time intervals from 11.49 Ma to 157.83 Ma with 7 fossils, from 13.78 Ma to 163.1 Ma with 5 fossils, and from 12.57 Ma to 163.48 Ma with 3 fossils (Figure 2), with similar ages for the root, the node of Chelus fimbriata, and the nodes of the clade of the family Pelomedusidae: Pelomedusa subrufa, Pelusios castanoides and Pelusios castaneus. All the estimates keep the age of 60 Ma for the node of Acleistochelys maliensis and the age range of 140 Ma to 160 Ma for the root.
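The comparison of interval widths can be checked with simple arithmetic on the ages reported in the text. The snippet below only subtracts the youngest age from the oldest age of each analysis; the dictionary layout and variable names are illustrative.

```python
# Youngest and oldest node ages (Ma) as reported in the Results text.
intervals = {
    "strict": {7: (19.22, 126.09), 5: (18.31, 120.37), 3: (14.84, 97.97)},
    "relaxed": {7: (11.49, 157.83), 5: (13.78, 163.1), 3: (12.57, 163.48)},
}

# Width of each interval, rounded to the precision of the reported ages.
widths = {
    clock: {n: round(oldest - youngest, 2) for n, (youngest, oldest) in by_fossils.items()}
    for clock, by_fossils in intervals.items()
}

for clock, by_fossils in widths.items():
    for n_fossils, width in by_fossils.items():
        print(f"{clock} clock, {n_fossils} fossils: {width:.2f} Myr")
```

Every strict-clock interval is narrower than every relaxed-clock interval. Under the strict clock the width shrinks as fossils are removed, whereas under the relaxed clock it stays close to 150 Myr.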


Figure 1. Divergence times estimated from the strict-clock topologies with different numbers of fossils.


Figure 2. Divergence times estimated from the relaxed-clock topologies with different numbers of fossils.

 


DISCUSSION

The estimates made with a greater number of fossils are those with the highest values in the age interval of the divergence times (Figures 1 and 2). This can be explained by the extra information that fossils provide to these analyses: a greater number of fossils, as in the estimates with 7 fossils, allows a more precise result than in the estimates with 3 fossils, because additional fossils increase the total branch length of the trees (Schwartz and Mueller, 2010).

Of all the topologies, those built with a relaxed clock had maximum estimated ages closest to the midpoint of the 140 Ma to 160 Ma interval given to the root (Figure 2). This suggests that r8s behaves better with relaxed-clock, that is, non-ultrametric, topologies. With these topologies the program permits a more complete calibration: whole nodes can be constrained, as was done with the root node, and more information on the ages of fossil nodes can be supplied for dating. In addition, the Langley-Fitch model (Langley and Fitch, 1974; Sanderson, 2003) re-estimates the branch lengths and ages, which gives the age estimates a parametric basis and a more precise calculation than the one r8s performs for the ultrametric trees (Figure 1). For ultrametric trees r8s does not use a model to estimate divergence ages; it only scales the existing branch lengths of the given topology into ages, relative to the single age supplied for calibration (Sanderson, 2002, 2003).

However, the ways in which fossil evidence is used to estimate divergence times, and how fossils affect the estimates, have long been controversial. In the estimates made here, the node selected for calibration, Acleistochelys maliensis, may have influenced the results (Ho and Phillips, 2009; Lukoschek et al., 2012; Saladin et al., 2017). The position of this fossil node in each topology and the absence of information from certain fossils could have changed the calibrations, since this fossil may in reality not represent a specific node but rather a point along a branch. In addition, certain fossils that allowed more precise estimates may have been eliminated, such as Ummulisani rutgersensis and Caninemys tridentata in the strict-clock topologies (Figure 1). The results may also have been affected by the precision of the estimation method: the relaxed-clock estimates with 5 and 3 fossils have similar time intervals, which suggests that the fossils removed between these two estimates, Ummulisani rutgersensis and Caninemys tridentata, did not represent informative nodes (Figure 2) (Lukoschek et al., 2012; Saladin et al., 2017).

Divergence time analyses are also subject to various errors that can affect the estimates, such as poor selection of the evolutionary model, poor molecular and morphological sampling and, as mentioned above, errors in the selection of calibration nodes (Ho and Phillips, 2009; Lukoschek et al., 2012; Tamura et al., 2012).

In conclusion, the divergence time estimates are sensitive to the number of fossils contained in the topology and to the type of clock with which the topology is built. Different numbers of fossils generate different age intervals: the topologies with the highest number of fossils tend to be the most precise, and the topologies that subsequently underwent a divergence time analysis under the Langley-Fitch model (Langley and Fitch, 1974; Sanderson, 2003) were even more precise (Figures 1 and 2).

 

 

BIBLIOGRAPHY

· Bromham, L., and Penny, D. (2003). The modern molecular clock. Nature Reviews Genetics, 4, 216–224.

· Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nature Methods, 9(8), 772.

· Ferreira, G., Bronzati, M., Langer, M., and Sterli, J. (2018). Phylogeny, biogeography and diversification patterns of side-necked turtles (Testudines: Pleurodira). Royal Society Open Science, 5, 171773.

· Ho, S. Y. W., and Duchêne, S. (2014). Molecular-clock methods for estimating evolutionary rates and timescales. Molecular Ecology, 23, 5947–5965.

· Ho, S. Y. W., and Phillips, M. J. (2009). Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. Systematic Biology, 58(3), 367–380.

· Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Molecular Biology and Evolution, 35(6), 1547–1549.

· Langley, C. H., and Fitch, W. M. (1974). An examination of the constancy of the rate of molecular evolution. Journal of Molecular Evolution, 3, 161–177.

· Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J., and Higgins, D. G. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947–2948.

· Lukoschek, V., Keogh, J. S., and Avise, J. (2012). Evaluating fossil calibrations for dating phylogenies in light of rates of molecular evolution: a comparison of three approaches. Systematic Biology, 61(1), 22.

· Maddison, W. P., and Maddison, D. R. (2019). Mesquite: a modular system for evolutionary analysis. Version 3.61. http://www.mesquiteproject.org.

· Paradis, E., Claude, J., and Strimmer, K. (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289–290.

· R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

· Rutschmann, F. (2006). Molecular dating of phylogenetic trees: a brief review of current methods that estimate divergence times. Diversity and Distributions, 12, 35–48.

· Saladin, B., Leslie, A. B., Wüest, R. O., et al. (2017). Fossils matter: improved estimates of divergence times in Pinus reveal older diversification. BMC Evolutionary Biology, 17, 95.

· Sanderson, M. J. (1997). A nonparametric approach to estimating divergence times in the absence of rate constancy. Molecular Biology and Evolution, 14, 1218–1231.

· Sanderson, M. J. (2002). Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution, 19, 101–109.

· Sanderson, M. J. (2003). r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics, 19, 301–302.

· Sauquet, H. (2013). A practical guide to molecular dating. Comptes Rendus Palevol, 12(6), 355–367.

· Schwartz, R. S., and Mueller, R. L. (2010). Branch length estimation and divergence dating: estimates of error in Bayesian and maximum likelihood frameworks. BMC Evolutionary Biology, 10, 5.

· Swofford, D. L. (2003). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods) [Computer program]. Version 4. Sunderland: Sinauer Associates.

· Tamura, K., Battistuzzi, F. U., Billing-Ross, P., Murillo, O., Filipski, A., and Kumar, S. (2012). Estimating divergence times in large molecular phylogenies. Proceedings of the National Academy of Sciences, 109(47), 19333–19338.

· Welch, J. J., and Bromham, L. (2005). Molecular dating when rates vary. Trends in Ecology and Evolution, 20(6), 320–327.

· Zuckerkandl, E., and Pauling, L. (1965). Evolutionary divergence and convergence in proteins. In V. Bryson and H. Vogel (eds.), Evolving genes and proteins, pp. 97–166. Academic Press, New York.
