Introduction
Phylogenetic reconstruction through Bayesian Inference (BI) represents a significant advance, with advantages such as the possibility of incorporating prior information, easy interpretation of the results, computational efficiency and the ability to use complex evolutionary models (Huelsenbeck et al., 2001; Ronquist & Huelsenbeck, 2003). Also, because it uses the likelihood function, it shares the efficiency and consistency of Maximum Likelihood (Huelsenbeck et al., 2002).
Nevertheless, BI has a weak point: its inferences always remain within the support of its priors. Therefore, what we would call "the truth" must be partially believed before it can be made known, so appropriate prior specification becomes crucial in this type of analysis (Gelman & Shalizi, 2012; Schoot et al., 2014).
For this reason, I estimate the sensitivity of BI, in terms of accuracy and precision, to changes in the base frequencies prior. The hypotheses that I test are: (1) using fixed frequencies as the prior, BI is more precise than when a Dirichlet prior is used for the base frequencies, and (2) BI estimates using fixed frequencies are more accurate than estimates using a Dirichlet prior.
Methodology
Using RStudio I simulated 3 topologies each of 12 and 48 terminals, for a total of 6 different topologies, which I used to simulate 108 matrices in Seq-Gen with the HKY and JC models and 3 sequence lengths (100, 500 and 2500 characters). For each particular matrix I made three replicates. Then I analysed all matrices with BI using MrBayes, once with fixed frequencies and once with a flat Dirichlet(1,1,1,1) prior. Because my interest was focused on the topology and not on the branch lengths, I calculated the Robinson-Foulds (RF) distance to indirectly estimate precision and accuracy in the recovery of trees under the two cases that I analysed.
Scripts with the detailed methodology, the parameters that I used and the software specifications can be found at the following link: https://github.com/IndiraMG/Trabajo_final_BioComparada-I
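As an illustration of the Robinson-Foulds distance used here, the following is a minimal sketch and is not taken from the scripts linked above. It assumes trees are written as nested tuples of taxon names, treats them as unrooted, and counts the bipartitions (splits) present in one tree but not in the other. The function names and the six-taxon example trees are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) of an unrooted tree."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed reference leaf used to normalise each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Represent each split by the side that does NOT contain the anchor leaf.
        side = all_leaves - clade if anchor in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical example: a reference tree and an estimate that misplaces one taxon.
reference = ((("A", "B"), "C"), ("D", ("E", "F")))
estimate = ((("A", "C"), "B"), ("D", ("E", "F")))
print(rf_distance(reference, estimate))
```

A distance of 0 means that the two topologies are identical; larger values mean that more splits differ. Branch lengths play no role, which matches the focus on topology.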
Results and Discussion
Figures 1 and 2 show very similar trends in the mean and standard deviation of RF in all the scenarios observed. This probably happens because of the flat Dirichlet(1,1,1,1) prior, which is appropriate if we want to estimate these parameters from the data, assuming that we have no prior knowledge about their values (Ronquist et al., 2011). Therefore, it is logical to think that the program estimated the base frequencies from the data and that these estimates were very close to the values that had been set as fixed frequencies.
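To make the word "flat" concrete, the sketch below evaluates the Dirichlet density at two sets of base frequencies. It is only an illustration: the function name `dirichlet_density` and the example frequency vectors are hypothetical and come from neither MrBayes nor this study. With parameters (1,1,1,1) every valid set of frequencies receives the same density, so the prior itself favours no particular value.

```python
import math


def dirichlet_density(freqs, alpha):
    """Dirichlet probability density at a vector of frequencies that sums to 1."""
    if len(freqs) != len(alpha):
        raise ValueError("freqs and alpha must have the same length")
    if any(f <= 0 for f in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must be positive and sum to 1")
    # Work on the log scale: normalising constant plus the kernel.
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1.0) * math.log(f) for a, f in zip(alpha, freqs))
    return math.exp(log_norm + log_kernel)


flat = (1.0, 1.0, 1.0, 1.0)          # the flat prior discussed in the text
peaked = (10.0, 10.0, 10.0, 10.0)    # hypothetical informative prior, for contrast

equal_freqs = (0.25, 0.25, 0.25, 0.25)
skewed_freqs = (0.40, 0.10, 0.10, 0.40)  # made-up frequencies

# Under the flat prior both vectors receive the same density.
print(dirichlet_density(equal_freqs, flat), dirichlet_density(skewed_freqs, flat))
# Under the peaked prior, equal frequencies are favoured over skewed ones.
print(dirichlet_density(equal_freqs, peaked) > dirichlet_density(skewed_freqs, peaked))
```

The peaked prior is included only for contrast. It shows what a prior that does favour particular frequencies would look like, which is not what was used here.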
On the other hand, the size of the matrix influenced the precision with which both approaches recovered the topologies. When the model used was HKY, precision increased from 500 characters onward, in contrast to the JC model, which was more variable and therefore less precise. This probably happens because JC is the simplest model (Matthew et al., 2005), which could result in a poorer fit to the data and so decrease its precision. Regarding the increase in precision as the number of characters increases, it is important to highlight that changes in the number of taxa did not have any impact; however, to conclude something more general, more taxon-number settings would be needed for contrast.
The Oxford dictionary defines accuracy as "the degree to which the result of a measurement, calculation, or specification conforms to the correct value or a standard", and the problem with determining the accuracy of an estimator is that the reference value against which a result should be compared is unknown (Schoot et al., 2014). In this case, it is not possible to determine the accuracy of BI because, with such a small number of replicates, there is no way to guarantee that the topologies estimated in either case (Dirichlet or fixed frequencies) are not influenced by the generation of the matrices in Seq-Gen. For this reason, I cannot test my second hypothesis without introducing a large bias when comparing the recovered topologies against the initial topologies.
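The distinction between the two quantities can be sketched as follows. The sketch assumes that accuracy is read from the mean RF distance between the recovered trees and the simulated tree, and precision from the standard deviation of those distances among replicates. This mapping is an assumption made for illustration; the function name and the three input values are hypothetical placeholders, not results from this study.

```python
from statistics import mean, stdev


def summarise_rf(rf_values):
    """Return (mean, sample standard deviation) of RF distances across replicates."""
    if len(rf_values) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")
    return mean(rf_values), stdev(rf_values)


# Placeholder RF distances for three replicates; NOT data from this study.
example_rf = [2, 4, 6]
average, spread = summarise_rf(example_rf)
print(average, spread)
```

With only three replicates the sample standard deviation rests on two degrees of freedom, which is one way to see why so few replicates limit what can be concluded.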
Conclusion
With this work I could not determine which prior is more accurate or precise, since with the available data there was no difference between the estimates using Dirichlet or fixed frequencies. However, it is clear that both are consistent, because their precision increases when we work with larger data sets.
References
- Gelman, A., & Shalizi, C. R. (2012). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8-38.
- Huelsenbeck, J. P., Larget, B., Miller, R. E., & Ronquist, F. (2002). Potential applications and pitfalls of Bayesian inference of phylogeny. Systematic Biology, 51(5), 673-688.
- Huelsenbeck, J. P., Ronquist, F., Nielsen, R., & Bollback, J. P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294, 2310-2314.
- Matthew, S., Edward, S., & Andrew, J. R. (2005). Likelihood, parsimony, and heterogeneous evolution. Molecular Biology and Evolution, 22(5), 1161-1164.
- Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & Aken, M. A. (2014). A gentle introduction to Bayesian analysis: Applications to developmental research. Child Development, 85, 842-860.
- Ronquist, F., Huelsenbeck, J., & Teslenko, M. (2011). Draft MrBayes version 3.2 manual: Tutorials and model summaries. Distributed with the software from http://brahms.biology.rochester.edu/software.html.
- The Oxford Dictionary. Taken from: https://en.oxforddictionaries.com