Bootstrap versus Posterior Probability… Are their values related?
Phylogenetic methods used to
reconstruct the historical relationships among organisms and molecular entities
have been widely explored. The relationships in the data are summarized in the
resulting topology. However, a
researcher cannot affirm that the relationships in the topology are correct
just by looking at the tree. To avoid this subjectivity,
statistics offer the best way to assess whether the inferences are
well founded, in terms of confidence, reliability, support, or robustness. This
step is important because it measures how well the inference is supported given
the model and the data. One of these measures is the nonparametric bootstrap,
a technique that resamples the data to estimate support values. The proportion
of times a particular phylogenetic relationship is observed can be interpreted
as a probability and used as a measure of “confidence in” or “support for” that relationship (Cummings et al., 2003). Another measure is the posterior
probability, the result of a Bayesian analysis coupled with Markov chain
Monte Carlo. This approach determines the probability given the prior probability,
the likelihood function, and the data. Both measures are widely used as support for
the relationships obtained in the topology (Cummings et al., 2003). Although the theory behind each measure
is independent, some authors claim that the two are equivalent (Efron, Halloran, and Holmes, 1996). Previous work has
shown that posterior probabilities tend to be overestimated (Erixon, Svennblad, Britton, & Oxelman, 2003). The purpose of this work was to compare
bootstrap values and posterior probabilities when the number of characters and
the number of taxa differ.
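The resampling idea behind the nonparametric bootstrap can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alignment is a made-up toy, and the "tree search" is replaced by a trivial stand-in (picking the pair of taxa with the fewest differences) instead of the ML search used in this work. The function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def resample_columns(alignment, rng):
    """Build one pseudo-replicate by drawing alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for a tree search: the two taxa with the fewest differences."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    pairs = combinations(sorted(alignment), 2)
    return min(pairs, key=lambda p: hamming(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, n_replicates=100, seed=1):
    """Proportion of pseudo-replicates in which each grouping is recovered."""
    rng = random.Random(seed)
    counts = Counter(
        closest_pair(resample_columns(alignment, rng)) for _ in range(n_replicates)
    )
    return {pair: n / n_replicates for pair, n in counts.items()}


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TCGAACGTTC",
    "D": "TCGAACCTTC",
}

support = bootstrap_support(toy_alignment)
for pair, value in sorted(support.items()):
    print(pair, value)
```

The values printed are the bootstrap proportions for each grouping: the fraction of the 100 resampled alignments in which that grouping was recovered.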
MATERIALS AND METHODS
The reference trees were designed
with three different numbers of taxa (6, 12 and 24 terminals). From
these initial topologies, I simulated DNA sequences of two
lengths (1000 and 4000) using the software Seq-Gen (Rambaut and Grassly, 1997),
each under the GTR evolutionary model. The tree search was done using
two different analyses: Maximum Likelihood (ML) and Bayesian analysis. The
ML analysis was performed in PhyML (Guindon et al., 2009) under the model
used to simulate the sequences, with NNI and SPR swapping and 100 bootstrap
replicates. The Bayesian analysis was performed in MrBayes v.3.2.4 (Ronquist et al., 2012) with two Markov chain Monte Carlo chains,
1.00.000 generations (p reached a value
below 10^-3), the GTR model and a burn-in of 0.5. Three replicates of the
analysis were performed. The correlation and the linear regression between pairs
of topologies with the same sequence length and number of taxa, using their common nodes, were computed in
R 3.1.1 (R Core Team, 2014). The graphs were produced in the
same software. To run the experiment, I wrote scripts in
Bash for Ubuntu 14.04 (see supplementary material).
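The comparison relies on nodes that are common to the ML and the Bayesian topologies. A minimal Python sketch of that matching step is shown here. It is not the script from the supplementary material. It assumes each tree has already been reduced to a mapping from a clade (the set of its terminals) to its support value, and all taxon names and values are hypothetical.

```python
def common_node_supports(bootstrap_by_clade, posterior_by_clade):
    """Pair bootstrap and posterior values for clades present in both trees."""
    shared = set(bootstrap_by_clade) & set(posterior_by_clade)
    ordered = sorted(shared, key=lambda clade: (len(clade), sorted(clade)))
    return [
        (sorted(clade), bootstrap_by_clade[clade], posterior_by_clade[clade])
        for clade in ordered
    ]


# Hypothetical clades and support values, both expressed as proportions.
ml_tree = {
    frozenset({"t1", "t2"}): 0.98,
    frozenset({"t3", "t4"}): 0.61,
    frozenset({"t1", "t2", "t3", "t4"}): 0.75,
}
bayes_tree = {
    frozenset({"t1", "t2"}): 1.00,
    frozenset({"t3", "t4"}): 0.97,
    frozenset({"t4", "t5"}): 0.55,
}

pairs = common_node_supports(ml_tree, bayes_tree)
for taxa, bs, pp in pairs:
    print(taxa, bs, pp)
```

Only clades found in both trees are kept, so nodes recovered by just one method do not enter the correlation. For unrooted trees the same idea would apply to bipartitions instead of clades.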
RESULTS AND DISCUSSION
Figure 1. Topologies
obtained in each analysis. Upper part: the trees with 6 taxa and the
variation of the support in the common nodes (in red). Lower part: the trees with 12 taxa and the
support of the common nodes (in yellow).
Figure 2. Linear
regression of bootstrap and posterior probability for the common nodes in one of the
three replicates. r2 = 0.3667 and p-value = 2.88 × 10^-7.
After comparing the topologies
across the three replicates, support in general increases when more data are added
to the analysis, which might suggest that the method or the result is consistent. However,
when the number of taxa increases, the support for the basal nodes decreases.
These results are similar to those of Cummings et al. (2003), who stated that “Given sufficient data, the appropriate
likelihood model, and the appropriate search strategies, the values of Pboot and P(τ |
D) are similar”. An interesting finding is the
apparent overestimation of the posterior probabilities in all trees and
replicates. Maximum likelihood did not reconstruct the same topology as the
Bayesian analysis, and both made errors in reconstructing the original tree.
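For reference, the quantity P(τ | D) in the quotation above is the posterior probability of a topology τ given the data D. Under the usual formulation of Bayes' theorem, and assuming the likelihood of each topology has already been integrated over branch lengths and substitution parameters, it can be written as:

```latex
P(\tau \mid D) = \frac{P(D \mid \tau)\, P(\tau)}{\sum_{\tau'} P(D \mid \tau')\, P(\tau')}
```

Here P(τ) is the prior probability of the topology, P(D | τ) is its likelihood, and the sum in the denominator runs over all candidate topologies τ'. Markov chain Monte Carlo approximates this value by the frequency with which a topology, or a clade, appears among the sampled trees.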
The linear regression showed
a significant relationship between the two support measures. The coefficient of
determination was 0.366, a value considered in the literature to indicate a moderate
relationship. In most cases, when the bootstrap (BS) was high, near 100, the posterior
probability (PP) was high too. But in some cases, when the BS was low, the PP
remained high. The same was found by Erixon et al. (2003), who demonstrated the overestimation
of the posterior probability, which rejects a universal equivalence with
the bootstrap value. These findings run against the theory that affirms the
similarity and equivalence of both measures. However, there are those who support the idea that the PP of Bayesian methods is a
better measure of support given its high
sensitivity on short internodes (Alfaro, Zoller, and
Lutzoni, 2003), and who claim that the bootstrap requires more data to reach the same
goal.
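The regression reported above was run in R. As an illustration only, the same kind of calculation (ordinary least squares of PP on BS, plus the coefficient of determination) can be sketched in Python. The support values below are hypothetical and are not the data of this study. The function name is an assumption.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least squares fit of y on x; returns slope, intercept and r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r * r


# Hypothetical support values for six common nodes, as proportions.
bootstrap = [0.35, 0.52, 0.61, 0.80, 0.95, 1.00]
posterior = [0.90, 0.88, 0.97, 0.99, 1.00, 1.00]

slope, intercept, r_squared = linear_fit(bootstrap, posterior)
print(f"PP = {intercept:.3f} + {slope:.3f} * BS, r2 = {r_squared:.3f}")
```

An r2 well below 1 together with a positive intercept is the pattern one would expect when PP stays high for nodes with low BS.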
REFERENCES
Alfaro, M. E., Zoller, S., & Lutzoni, F. (2003). Bayes or
bootstrap? A simulation study comparing the performance of Bayesian Markov
chain Monte Carlo sampling and bootstrapping in assessing phylogenetic
confidence. Molecular Biology and Evolution, 20(2), 255–266.
Cummings, M. P., Handley, S. A., Myers, D. S., Reed, D. L., Rokas, A.,
& Winka, K. (2003). Comparing bootstrap and posterior probability values in
the four-taxon case. Systematic Biology, 52(4), 477–487.
Efron, B., Halloran, E., & Holmes, S. (1996). Bootstrap confidence
levels for phylogenetic trees. Proceedings of the National Academy of
Sciences, 93(23), 13429.
Erixon, P., Svennblad, B., Britton, T., & Oxelman, B. (2003).
Reliability of Bayesian posterior probabilities and bootstrap frequencies in
phylogenetics. Systematic Biology, 52(5), 665–673.
R Core Team. (2014). R: A Language and Environment for Statistical
Computing. Vienna, Austria.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A.,
Höhna, S., … Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian
phylogenetic inference and model choice across a large model space. Systematic
Biology, 61(3), 539–42. doi:10.1093/sysbio/sys029
1 comment:
Then, may I read your findings as a strong support for bootstrap?