Introduction
Estimating
divergence times can be performed by methods based on variation of
substitution rates. The AR method establishes
that the rates of change between ancestor and descendant are
autocorrelated and the rates can follow a log-normal distribution
with a mean equal to the parent rate (Thorne, Kishino and Painter,
1998; Kishino, Thorne and Bruno, 2001; Thorne and Kishino, 2002) or
can be determined by a non-central χ 2 distribution (Lepage et al.,
2007).
Another method used in molecular clock analysis is the RR method
which assigns random independent rates for each lineage and these
rates are drawn from a single underlying parametric distribution such
as an exponential or log-normal (Drummond et al., 2006; Rannala and
Yang, 2007; Lepage et al., 2007).
RR method has been implemented in
BEAST
the AR method has been implemented in MULTIDIVTIME.
In
terms of the programs, other difference is the priors on node ages,
where MULTIDIVTIME
uses
a dirichlet distribution without an explicit assumption about the
biological processes (Kishino, Thorne and Bruno, 2001; Thorne and
Kishino, 2002) and in the other hand, BEAST
uses a Yule prior and the Birth-Death prior (Yule, 1924; Rannala and
Yang, 1996; Yang and Rannala, 1997).
These
two methods have been widely used to dating phylogenies and it is
recommended to use several calibration points; however, it is not
common to find a number of fossil equivalent to the number of nodes
in the topology. At the moment, it has not been established what is
the minimum number of constraints to achieve correctness divergence
times.
Figure 1. Number of citations. Comparison between Bayesian inference and Maximum Likelihood programs.
Objectives
General
objective
To
determine what are the minimum number of calibration points related
to the number of tips of a topology using the Autocorrelated rate and
Random rate methods.
Specific
objectives
To
correlate the divergence time estimated with the divergence time
simulated.
To
determine the delta of variation when the number of points increase.
Methods
1.
Simulations
The
trees will be simulated considering four different number of tips:
10, 25, 50 and 100, plus an age of 31Mya at the root, with the
package phytools v.0.4-56 (Revell, 2012) in the R language and it
will be replicated 3
times. Based on the these trees, the sequences will be simulated
using the program Seq-gen v1.3.3 (Rambaut and Grass, 1997), under the
model HKY, with base frequencies 0.30[T], 0.25[C], 0.30[A], 0.15[G];
a transition-transversion rate K=5, gamma parameter alpha = 0.5 and a
length of 1000 bp.
2.
Molecular clock analysis and calibration
The
analyzes will be performed using the AR and RR methods respectively
in MULTIDIVTIME
v9.25.03 (Thorne and Kishino, 2002) (MDT) and BEAST
v.1.8.2 (Drummond and Rambaut, 2007; Drummond et al., 2012). In MDT
the divergence time will be calculated after 10⁶ generations
employing a correlated relaxed lognormal clock. In BEAST
the analyzes will be done under an uncorrelated relaxed lognormal
clock, Yule speciation process, a normal distribution for the tmrca
and different number of generations depending on the authors's
recommendation about the relation between it and the number of tips.
For both methods the calibration points will be chosen randomly,
leaving the base node of the ingroup fixed to 30 Mya. The number of
points will be replicated for every tree simulation.
3.
Correlation and additional comparisons
The
times simulated and the outcomes of these analyzes will be compared
and related by a Pearson correlation for each topology of different
size and replica. The standard
deviation among replicas was calculated and the delta between number
of points, with the aim of observe variance in the results.
Results
and Discussion
Accurate
in dating with AR and RR methods varies according to the number of
tips in the phylogeny; by increasing the number of tips the
correlation coefficient (R2)
decrease, thus the number of calibration points must be in proportion
to the number of tips.
It
is evidenced when for the two methods there is a point from which the
correlation decrease (Figure 2, a, b).
B |
Figura 2. Variation of the correlation coefficient after increase the number of calibrated nodes. A. Autocorrelated rate method. B. Random rate method. |
The
addition of the second point of calibration does not improve the
dating of the phylogeny in
approximately 90% of
the cases reviewed (Figure
2).
Similar
results were obtained by Battistuzi et
al., 2010,
they assess the effect of multiples calibrations into the credibility
Intervals (CrIs) size, and
they do not found differences in dating with one or two points. On
the other hand, the use of exact divergence times can generate this
behavior, in which differences between one and two points are not
perceived (Benton et
al., 2009).
Consequently,
the minimum number of calibration points is one, because
with just one point the value of the correlation coefficient is
greater than 0.89, in all cases. However, when I add three points,
the coeficient (R2)
was high and in almost 70% of the cases was the maximum value.
Standard
deviation among
replicas
was of 10⁻¹,
although cases of standard
deviation
of 10⁻³ were
recorded , it
demonstrated that there is no changes in
using different points and multiples
runs
of the programs. This is consistent with Hedges et
al., 2004,
who found that along many methods if the data and times are constant
the estimations will be similar. Both
methods evaluated in this work are based on Bayesian
Inference and the
priors play
a key role in the estimations of divergence time and in all the
analyses, but Brown and Yang in 2010 tested the influence of the
priors and
conclude that it affect the estimation, however, one can obtain
similar results.
Figure 3. Standard deviation among replicas for each number of tips. A. MultiDivTime. B. BEAST.
The
delta of increasing the number of points varied in the order of 10⁻¹
until 10⁻³. Thus,
demonstrating, that the differences among number of calibrations vary
in a little range.
Then,
when the position of the unique point was assessed, the position
imports, is not the same dating towards the tips that dating the base
node of the ingroup, but
the value of the coefficient of correlation is not below 0.75
in both cases (Figure 4). According to Battistuzi et
al., 2010,
there is no significant differences in the relationship of the
calibration nodes and the estimated nodes, “this suggests that the
relative positions of the two nodes do not affect the time
estimations as long as the two nodes are closely related (Battistuzi
et
al., 2010).”
In
contrast, with empirical data, the estimations depends on the
position, the method, and the dataset (Hug and Roger, 2007), idea
discussed by many other authors who assess this problem via
simulations and empirical data and found similar results among
different methods, or the parameters of the analysis.
Figure 4. Variation of the correlation coefficient according to different positions of the calibrated node. |
In
addition, to reduce the error by 5% and increase the coefficient of
correlation in larger topologies compared to that obtained in small
topologies, with a single point calibration, a number of points four
time major than the points evaluated are required. However, a
solution would be dated by small separate clades within the topology
in order to obtain a high coefficient correlation for all phylogeny.
Conclusions
Finally,
the number
of tips
influence
the estimation,
regardless the method used in estimating divergence times. Although
a point is the minimum number, when
the number
of tips
increase
with
the
error occurs
exactly the same.
References
Battistuzzi,
F. U., Filipski, A., Hedges, S. B., & Kumar, S. (2010).
Performance of relaxed-clock methods in estimating evolutionary
divergence times and their credibility intervals. Molecular
biology and evolution, 27(6),
1289-1300.
Brown,
R. P., & Yang, Z. (2010). Bayesian dating of shallow phylogenies
with a relaxed clock. Systematic
biology, 59(2),
119-131.
Donoghue,
P., Benton, M., Ziheng, Y., & Inoue, J. (2009). Calibrating
and constraining the molecular clock. Journal
of vertebrate paleontology, 29,
89A-89A.
Drummond,
A. J., Ho, S. Y., Phillips, M. J., & Rambaut, A. (2006). Relaxed
phylogenetics and dating with confidence. PLoS
biology, 4(5),
699.
Drummond,
A. J., & Rambaut, A. (2007). BEAST: Bayesian evolutionary
analysis by sampling trees. BMC
evolutionary biology, 7(1),
214.
Drummond,
A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian
phylogenetics with BEAUti and the BEAST 1.7. Molecular
biology and evolution,29(8),
1969-1973.
Hedges,
S. B., Blair, J. E., Venturi, M. L., & Shoe, J. L. (2004). A
molecular timescale of eukaryote evolution and the rise of complex
multicellular life. BMC
evolutionary biology, 4(1),
2.
Hug,
L. A., & Roger, A. J. (2007). The impact of fossils and taxon
sampling on ancient molecular dating analyses. Molecular
biology and evolution, 24(8),
1889-1897.
Kishino,
H., Thorne, J. L., & Bruno, W. J. (2001). Performance of a
divergence time estimation method under a probabilistic model of rate
evolution.Molecular
Biology and Evolution, 18(3),
352-361.
Lepage,
T., Bryant, D., Philippe, H., & Lartillot, N. (2007). A general
comparison of relaxed molecular clock models. Molecular
biology and evolution,24(12),
2669-2680.
Rambaut,
A., & Grass, N. C. (1997). Seq-Gen: an application for the Monte
Carlo simulation of DNA sequence evolution along phylogenetic
trees.Computer
applications in the biosciences: CABIOS, 13(3),
235-238.
Rannala,
B., & Yang, Z. (1996). Probability distribution of molecular
evolutionary trees: a new method of phylogenetic inference. Journal
of molecular evolution, 43(3),
304-311.
Rannala,
B., & Yang, Z. (2007). Inferring speciation times under an
episodic molecular clock. Systematic
Biology, 56(3),
453-466.
Revell,
L. J. (2012). phytools: an R package for phylogenetic comparative
biology (and other things). Methods
in Ecology and Evolution, 3(2),
217-223.
Thorne,
J. L., Kishino, H., & Painter, I. S. (1998). Estimating the rate
of evolution of the rate of molecular evolution. Molecular
Biology and Evolution,15(12),
1647-1657.
Thorne,
J. L., & Kishino, H. (2002). Divergence time and evolutionary
rate estimation with multilocus data.Systematic
Biology, 51(5),
689-702.
Yang,
Z., & Rannala, B. (1997). Bayesian phylogenetic inference using
DNA sequences: a Markov chain Monte Carlo method. Molecular
biology and evolution, 14(7),
717-724.
Yule,
G. U. (1924). An introduction to the theory of statistics. Bull.
Amer. Math. Soc. 30 (1924), 465-466
No hay comentarios:
Publicar un comentario