HOW DO THE NUMBER OF FOSSILS AND THE TYPE OF MOLECULAR
CLOCK CHANGE THE ESTIMATION OF AGES?
DATE:
10/03/2021
María Cristina Navas Serrano – 2170058.
INTRODUCTION
In recent years, DNA sequences have increasingly been used
to date evolutionarily relevant events, especially the divergence times of
clades and species (Rutschmann, 2006). Zuckerkandl and Pauling (1965), after
comparing the differences in the hemoglobin protein sequences of different
species against the estimated ages of those species' fossils, postulated that
the differences between the sequences of two species are a function of the time
since they diverged, that is, that sequences evolve at a rate that is constant
over time, a proposal consistent with the postulates of Motoo Kimura's neutral
theory (Rutschmann, 2006; Bromham and Penny, 2003).
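The clock hypothesis can be made concrete with a little arithmetic. Under a
constant rate r (substitutions per site per million years, per lineage), two
lineages that split t million years ago are expected to differ by d = 2rt,
because both lineages accumulate changes. The sketch below is only an
illustration: the function names and all numeric values are hypothetical and do
not come from this study.

```python
def clock_rate(distance, calibration_age):
    """Per-lineage rate implied by a pairwise distance and a known split age.

    distance: substitutions per site between two taxa.
    calibration_age: age of their split in Myr (for example, from a fossil).
    """
    if calibration_age <= 0:
        raise ValueError("calibration_age must be positive")
    return distance / (2.0 * calibration_age)


def divergence_time(distance, rate):
    """Split age in Myr implied by a pairwise distance under a strict clock."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical values, for illustration only.
rate = clock_rate(distance=0.12, calibration_age=60.0)
age = divergence_time(distance=0.30, rate=rate)
print(f"rate = {rate:.4f} subst/site/Myr, estimated split = {age:.1f} Ma")
```

If rates differ among lineages, the same distance no longer maps to a single
age, which is the motivation for the relaxed clocks discussed next.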
After the molecular clock was proposed, various
alternatives to a constant-rate clock have been put forward. Langley and
Fitch (1974) said that the evolutionary rates in primates were different from
those of other mammals, so a molecular clock with a strict rate would be
imprecise in some cases. Over time, it has been shown that constant rates of
evolution may be the exception rather than the rule, and that lineages therefore
have different evolutionary rates; clocks that allow variable rates are called
“relaxed clocks” (Welch and Bromham, 2005).
Various techniques have been developed to estimate
divergence times under strict (constant-rate) and relaxed clocks, often using
the ages of geological events or the estimated ages of fossils to calibrate the
topologies. One of the simplest methods for estimating divergence times with a
single rate of change is the Langley-Fitch method (Langley and Fitch, 1974;
Sanderson, 2003), which uses maximum likelihood to optimize the substitution
rate on a phylogeny with known branch lengths, re-estimating the branch lengths
and calculating the divergence times (Rutschmann, 2006). Among relaxed-clock
approaches, one method that incorporates rate heterogeneity is nonparametric
rate smoothing (NPRS), which estimates the unknown divergence times while
smoothing the changes in rate along the phylogeny, using a nonparametric
function that penalizes rates that change too quickly from one branch to the
next (Sanderson, 1997, 2003; Rutschmann, 2006). Penalized likelihood (PL)
combines the two methods: it is a semiparametric method that uses a smoothing
penalty whose value can be estimated from the data. A high smoothing value
leads to a strict, clock-like model, whereas a low value leaves the rates
almost unconstrained, as in NPRS (Sanderson, 2002, 2003; Rutschmann, 2006).
Other techniques also estimate divergence times while allowing heterogeneous
rates, such as heuristic rate smoothing (AHRS) or Bayesian implementations such
as PHYBAYES (Rutschmann, 2006).
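The role of the smoothing value in PL can be written schematically. The notation
here is illustrative rather than taken from r8s: r denotes the branch rates, t
the node times, X the data, lambda the smoothing value, and Phi a roughness
penalty that grows when the rates of neighboring branches differ. Following the
general form described by Sanderson (2002), the estimates maximize:

```latex
\Psi(\mathbf{r}, \mathbf{t}) = \log L(\mathbf{r}, \mathbf{t} \mid X) - \lambda\, \Phi(\mathbf{r}),
\qquad
\Phi(\mathbf{r}) \approx \sum_{k} \left( r_k - r_{\mathrm{anc}(k)} \right)^2
```

With a large lambda any rate change is costly, so the solution approaches a
single-rate (clock-like) model; as lambda approaches 0 the penalty vanishes and
each branch rate is fitted almost freely. The sum runs over the branches k that
have an ancestral branch anc(k); the treatment of the branches at the root is
omitted in this simplified form.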
Over time, the growth of molecular information
and DNA sequence data and the improvement of technology have allowed the
development of programs that implement these techniques. Therefore, this
project tests the sensitivity of divergence time estimates to the number of
fossil tips and to the type of molecular clock used, using the program r8s
v1.81 (Sanderson, 2003) under the Langley-Fitch method (Langley and Fitch,
1974; Sanderson, 2003), for 6 topologies of turtles of the Pelomedusoides
clade, in the suborder Pleurodira.
METHODOLOGY
Sampling
To build the topologies, the largest matrix of morphological
characters of extant and extinct turtles of the suborder Pleurodira currently
known, compiled by Ferreira et al. (2018) (101 taxa x 245 characters), was used.
The matrix was edited to keep only 17 terminals, with representatives of the 3
families of Pelomedusoides: Podocnemididae, Pelomedusidae and Bothremydidae.
Seven fossil terminals and 10 extant terminals were chosen randomly, ensuring
that the extant terminals had sufficient molecular information. The terminals
include 2 outgroups: Chelus fimbriata, an extant outgroup of Pelomedusoides,
and Proganochelys quenstedti, a fossil outgroup of Pleurodira.
For the extant terminals, molecular information was sampled
from the public database GenBank (Benson et al., 2015) for 8 loci, including
the mitochondrial genes CYTB, COI, 12S and the nuclear genes RAG1, RAG2, R35,
because sequences of these genes could be found for most terminals. When
multiple sequences were available for a given species, the longest sequence was
used. The gene sequences were individually aligned with the ClustalW algorithm
(Larkin et al., 2007) in the program MEGA-X v11.0 (Kumar et al., 2018), and all
the gene sequences were concatenated into a single molecular matrix using
MESQUITE v3.61 (Maddison and Maddison, 2019). The program jModelTest v2.1.10
(Darriba et al., 2012) was used to select an adequate DNA substitution model
for the set of genes, using the Akaike information criterion (AIC).
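The AIC comparison behind the model selection can be sketched as follows. AIC
is computed as 2k - 2 ln L, where k is the number of free parameters and ln L
the maximized log-likelihood, and the model with the lowest AIC is preferred.
This is a minimal sketch, not the jModelTest implementation: the
log-likelihoods are hypothetical, and k counts only substitution-model
parameters (an assumption made for brevity; branch lengths are left out because
they add the same amount to every model on a fixed topology).

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood


# (log-likelihood, number of free substitution-model parameters).
# The log-likelihoods are hypothetical, for illustration only.
candidates = {
    "JC": (-25310.4, 0),        # equal rates and equal base frequencies
    "HKY+G": (-24105.7, 5),     # kappa + 3 frequencies + gamma shape
    "GTR+I+G": (-23950.2, 10),  # 5 rates + 3 frequencies + pinv + gamma shape
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:8s} AIC = {score:.1f}")
print("selected model:", best)
```

The penalty term 2k is what keeps a parameter-rich model such as GTR + I + G
from being chosen unless its gain in likelihood outweighs the extra parameters.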
The matrix of morphological characters was
concatenated by hand with the molecular matrix into a single total-evidence
matrix consisting of 8493 morphological and molecular characters.
Phylogenetic Analyses
To see the effect of the number of fossil
terminals, the matrix with a total of 7 fossils was edited by hand to generate
another 2 matrices, with 5 and 3 fossils respectively, removing the fossil
terminals Bothremys maghrebiana and Bairdemys thalassica for the matrix of 5
fossils, and additionally Ummulisani rutgersensis and Caninemys tridentata for
the matrix of 3 fossils. The fossil terminals removed were chosen randomly.
Subsequently, a maximum likelihood analysis was performed on each matrix
in PAUP* v4.0a (Swofford, 2003), using a heuristic search of 100 replicates
with the tree bisection and reconnection (TBR) algorithm, under the parameters
of the model that best fit the molecular data, GTR + I + G, both without
enforcing a clock (relaxed clock) and enforcing a clock (strict clock). Each
search yielded a single best tree with branch lengths.
Divergence Time Analyses
For the estimation of divergence times, all the
topologies with branch lengths were analyzed in r8s v1.81 (Sanderson, 2003).
The trees built with a clock were treated as ultrametric topologies using the
ultrametric command, without applying any algorithm that changes the branch
lengths, because in this way the uncalibrated ages are immediately available
for the tree and the times are scaled to the absolute age of one specific node
(Sanderson, 2003). The trees built with the relaxed clock were edited to remove
the root branch, which had a length of 0, and the branch of the outgroup
Proganochelys quenstedti was used as the root, because PAUP roots the trees at
the closest sister group, leaving a root branch of length 0 and forcing a basal
trichotomy. To fix this, Proganochelys quenstedti was used as an extra
outgroup, following the recommendations of the program. The topologies built
with the relaxed clock were analyzed under the Langley-Fitch method (Langley
and Fitch, 1974; Sanderson, 2003), trying 3 initial points.
All topologies, under both clocks, were calibrated
with the fossil node of Acleistochelys maliensis, with an approximate age of
60 Ma, and the topologies built with the relaxed clock were also given a root
age constrained between 140 Ma and 160 Ma, according to the age estimates for
the Pelomedusoides clade made by Ferreira et al. (2018).
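For the ultrametric (strict-clock) trees, calibrating with a single node of
known age amounts to a proportional rescaling of all node heights. The sketch
below illustrates that idea only; it is not r8s code. The function name, the
node labels and the node heights are hypothetical, and only the 60 Ma
calibration age is taken from the text.

```python
def scale_to_calibration(node_heights, calibrated_node, calibrated_age):
    """Convert relative node heights of an ultrametric tree into ages.

    node_heights: dict mapping node label to its height above the tips,
        in substitutions per site.
    calibrated_node: label of the node whose absolute age is known.
    calibrated_age: age of that node in Ma.
    """
    reference = node_heights[calibrated_node]
    if reference <= 0:
        raise ValueError("the calibrated node must have a positive height")
    factor = calibrated_age / reference  # Myr per unit of branch length
    return {node: height * factor for node, height in node_heights.items()}


# Hypothetical node heights, for illustration only.
heights = {"root": 0.25, "calibrated_node": 0.10, "young_node": 0.05}
ages = scale_to_calibration(heights, "calibrated_node", 60.0)

for node, node_age in ages.items():
    print(f"{node}: {node_age:.1f} Ma")
```

Because every age is a fixed multiple of its node height, any error in the
branch lengths or in the single calibration age propagates to all nodes.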
All the trees resulting from the analyses were
visualized and edited in FigTree v1.4.4 and with the ape package
(Paradis et al., 2004) in R (R Core Team, 2020).
RESULTS
The best molecular model selected by jModelTest
v2.1.10 (Darriba et al., 2012) using the Akaike information criterion (AIC) was
GTR with invariant sites and gamma-distributed rates (GTR + I + G).
The maximum likelihood analyses yielded one best tree
for each of the 6 searches (3 matrices under each of the 2 clocks).
In the topologies built with the strict clock,
Bairdemys thalassica is the sister taxon of the clade comprising the taxa from
Carbonemys cofrinii to Podocnemis lewyana (Figure 1). In the topologies built
with the relaxed clock, Bairdemys thalassica is the sister taxon of Caninemys
tridentata, and together they form a single clade, sister to Podocnemis expansa
(Figure 2). The relationships within the family Podocnemididae also differ. In
the topologies with the strict clock, the clade formed by Podocnemis expansa
and Podocnemis unifilis is the sister clade of the clade formed by Podocnemis
erythrocephala and Podocnemis lewyana. Erymnochelys madagascariensis and
Peltocephalus dumerilianus also form a single clade.
The divergence time analysis of the trees built
with a strict clock showed different time intervals depending on the number of
fossils used.
The topologies built enforcing a strict clock show
time intervals from 19.22 Ma to 126.09 Ma with 7 fossils, from 18.31 Ma to
120.37 Ma with 5 fossils and from 14.84 Ma to 97.97 Ma with 3 fossils (Figure 1).
These intervals are narrower than those of the topologies built with the
relaxed clock, and the age estimates differ for all the nodes. All the
estimates keep the age of 60 Ma for the node of Acleistochelys maliensis.
The topologies built with the relaxed clock show time
intervals from 11.49 Ma to 157.83 Ma with 7 fossils, from 13.78 Ma to 163.1 Ma
with 5 fossils, and from 12.57 Ma to 163.48 Ma with 3 fossils (Figure 2),
with similar ages for the root, the node of Chelus fimbriata, and the nodes of
the clade of the family Pelomedusidae: Pelomedusa subrufa, Pelusios castanoides
and Pelusios castaneus. All the estimates keep the age of 60 Ma for the node of
Acleistochelys maliensis, and the age range for the root of 140 Ma to 160 Ma.
Figure 1. Divergence times resulting from topologies built with the strict clock and different numbers of fossils.
Figure 2. Divergence times resulting from topologies built with the relaxed clock and different numbers of fossils.
DISCUSSION
The estimates made with a greater number of fossils
are those with the highest values in the interval of estimated divergence
times (Figures 1 and 2). This can be explained by the extra information that
the fossils provide to these analyses: a greater number of fossils, as in the
estimates with 7 fossils, allows a more precise result than in both estimates
with 3 fossils, because additional fossils increase the total branch length of
the trees (Schwartz and Mueller, 2010).
Of all the topologies, those built with the relaxed clock
showed an estimated maximum age closest to the midpoint of the interval given
to the root, from 140 Ma to 160 Ma (Figure 2). This suggests that r8s behaves
better with relaxed-clock topologies, that is, non-ultrametric topologies.
First, they permit a more complete calibration, since whole nodes can be
constrained, as was done with the root node, so the analysis receives more
information on the ages of the fossil nodes for dating. Second, the estimation
is complemented by the Langley-Fitch model (Langley and Fitch, 1974; Sanderson,
2003), which re-estimates the branch lengths and ages, giving a parametric
basis to the age estimates and a more precise calculation than the one
performed by r8s for the ultrametric trees (Figure 1). For the latter, r8s does
not use a model to estimate divergence ages; it just scales the existing branch
lengths of the given topologies into ages, relative to the single age given for
calibration (Sanderson, 2002, 2003).
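The comparison with the midpoint of the root interval can be checked directly
from the maximum ages reported in the Results. The sketch below is only a
bookkeeping aid; the dictionary layout and variable names are illustrative, and
the ages are the upper bounds of the intervals given for Figures 1 and 2.

```python
# Maximum estimated age (Ma) for each clock type and number of fossils,
# as reported in the Results.
max_age = {
    ("strict", 7): 126.09,
    ("strict", 5): 120.37,
    ("strict", 3): 97.97,
    ("relaxed", 7): 157.83,
    ("relaxed", 5): 163.1,
    ("relaxed", 3): 163.48,
}

# Root age constraint applied to the relaxed-clock topologies (Ma).
root_min, root_max = 140.0, 160.0
midpoint = (root_min + root_max) / 2

# Absolute distance of each maximum age from the midpoint of the interval.
deviation = {key: abs(age - midpoint) for key, age in max_age.items()}
closest = min(deviation, key=deviation.get)

for (clock, n_fossils), dev in sorted(deviation.items(), key=lambda item: item[1]):
    print(f"{clock:8s} {n_fossils} fossils: {dev:6.2f} Myr from {midpoint:.0f} Ma")
print("closest to the midpoint:", closest)
```

Every relaxed-clock estimate lies closer to the midpoint than any strict-clock
estimate, which is the pattern described in the paragraph above.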
However, the ways in which fossil evidence is used
for divergence estimates, and how fossils affect those estimates, have long
been controversial. In the estimates made here, the node selected for
calibration, Acleistochelys maliensis, may have influenced the results (Ho and
Phillips, 2009; Lukoschek et al., 2012; Saladin et al., 2017). The position of
this fossil node in each topology, and the absence of the information from
certain fossils, could have changed the calibrations, since this fossil may in
reality not represent a specific node but rather a point along a branch. In
addition, fossils that allowed more precise estimates may have been eliminated,
such as Ummulisani rutgersensis and Caninemys tridentata in the topologies
built with the strict clock (Figure 1). Alternatively, the results may have
been affected by the precision of the estimation method: the estimates made
with 5 and 3 fossils in the relaxed-clock topologies have close time intervals,
which suggests that the fossils removed from one estimate to the other,
Ummulisani rutgersensis and Caninemys tridentata, did not represent informative
nodes (Figure 2) (Lukoschek et al., 2012; Saladin et al., 2017).
In addition, divergence time analyses are subject to
various errors that can affect the estimates, such as poor selection of the
evolutionary model, poor molecular and morphological sampling and, as mentioned
before, errors in the selection of calibration nodes (Ho and Phillips, 2009;
Lukoschek et al., 2012; Tamura et al., 2012).
In conclusion, the divergence time estimates
are sensitive to the number of fossils contained in the topology and to the
type of clock with which the topology is built. Different numbers of fossils
generate different age intervals; the topologies with the highest number of
fossils tend to be the most precise in the analysis, and the topologies that
subsequently underwent a divergence time analysis under the Langley-Fitch model
(Langley and Fitch, 1974; Sanderson, 2003) were even more precise (Figures 1
and 2).
BIBLIOGRAPHY
· Bromham, L., and Penny, D. (2003). The modern molecular clock. Nature Reviews
Genetics, 4, 216–224.
· Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2012). jModelTest 2:
more models, new heuristics and parallel computing. Nature Methods, 9(8), 772.
· Ferreira, G., Bronzati, M., Langer, M., and Sterli, J. (2018). Phylogeny,
biogeography and diversification patterns of side-necked turtles (Testudines:
Pleurodira). Royal Society Open Science, 5, 171773.
· Ho, S. Y. W., and Duchêne, S. (2014). Molecular-clock methods for
estimating evolutionary rates and timescales. Molecular Ecology, 23, 5947–5965.
· Ho, S., and Phillips, M. J. (2009). Accounting for Calibration
Uncertainty in Phylogenetic Estimation of Evolutionary Divergence Times.
Systematic Biology, 58(3), 367–380.
· Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA X:
Molecular Evolutionary Genetics Analysis across Computing Platforms. Molecular
Biology and Evolution, 35(6), 1547–1549.
· Langley, C. H., and Fitch, W. M. (1974). An examination of the constancy
of the rate of molecular evolution. Journal of Molecular Evolution, 3, 161–177.
· Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P.
A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson,
J. D., Gibson, T. J., and Higgins, D. G. (2007). Clustal W and Clustal X version
2.0. Bioinformatics, 23, 2947–2948.
· Lukoschek, V., Keogh, J. S., and Avise, J. (2012). Evaluating Fossil
Calibrations for Dating Phylogenies in Light of Rates of Molecular Evolution: A
Comparison of Three Approaches. Systematic Biology, 61(1), 22.
· Maddison, W. P., and Maddison, D. R. (2019). Mesquite: a modular system for
evolutionary analysis. Version 3.61. http://www.mesquiteproject.org.
· Paradis, E., Claude, J., and Strimmer, K. (2004). APE: analyses of
phylogenetics and evolution in R language. Bioinformatics, 20, 289–290.
· R Core Team (2020). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. URL
https://www.R-project.org/.
· Rutschmann, F. (2006). Molecular dating of phylogenetic trees: A brief
review of current methods that estimate divergence times. Diversity and
Distributions, 12, 35–48.
· Saladin, B., Leslie, A. B., Wüest, R. O. et al. (2017). Fossils matter:
improved estimates of divergence times in Pinus reveal older diversification.
BMC Evolutionary Biology, 17, 95.
· Sanderson, M. J. (1997). A nonparametric approach to estimating
divergence times in the absence of rate constancy. Molecular Biology and
Evolution, 14, 1218–1231.
· Sanderson, M. J. (2002). Estimating absolute rates of molecular evolution
and divergence times: a penalized likelihood approach. Molecular Biology and
Evolution, 19, 101–109.
· Sanderson, M. J. (2003). r8s: inferring absolute rates of molecular
evolution and divergence times in the absence of a molecular clock.
Bioinformatics, 19, 301–302.
· Sauquet, H. (2013). A practical guide to molecular dating. Comptes Rendus
Palevol, 12(6), 355–367.
· Schwartz, R. S., and Mueller, R. L. (2010). Branch length estimation and
divergence dating: estimates of error in Bayesian and maximum likelihood
frameworks. BMC Evolutionary Biology, 10, 5.
· Swofford, D. L. (2003). PAUP*: Phylogenetic Analysis Using Parsimony
(*and Other Methods) [Computer Program]. Version 4. Sunderland: Sinauer
Associates.
· Tamura, K., Battistuzzi, F. U., Billing-Ross, P., Murillo, O., Filipski,
A., and Kumar, S. (2012). Estimating divergence times in large molecular
phylogenies. Proceedings of the National Academy of Sciences, 109(47),
19333–19338.
· Welch, J. J., and Bromham, L. (2005). Molecular dating when rates vary.
Trends in Ecology & Evolution, 20(6), 320–327.
· Zuckerkandl, E., and Pauling, L. (1965). Evolutionary divergence and
convergence in proteins. Evolving genes and proteins (ed. by V. Bryson and H.
Vogel), pp. 97–166. Academic Press, New York.