## Monday, 3 March 2008

### On Bayesian Inference and Support

Bayesian Inference

Bayesian inference (BI) in systematics (Rannala & Yang, 1996; Yang & Rannala, 1997) has been one of the most frequently used methods in phylogenetic analysis over the last decade. One reason is the speed of BI in comparison with other methods such as parsimony and maximum likelihood (ML). Further, the posterior probability is an attractive way to express certainty about the results of an analysis. However, BI poses several problems in phylogenetic systematics (Alfaro et al., 2003; Erixon et al., 2003; Wheeler & Pickett, 2008).

One problem in BI is the lack of convergence in the results when new data are added to the data set. This property, consistency (the same groups recovered across different runs and data sets), is an adequate criterion for estimating the "efficiency" of a phylogenetic method. "Efficiency" is measured as the number of nodes recovered as more data are added to a statistical analysis.

Likewise, the a priori assumptions in BI (priors) have been debated because of their subjectivity and their influence on the results (Huelsenbeck et al., 2001; Rannala, 2002). Priors are parameters whose probabilities and characteristics are assigned a priori. These assumptions are treated as random variables. Thus the substitution rate, the substitution model, the rate of evolution, and the prior probability of the initial trees are assigned before the analysis.

In common practice, priors are frequently estimated from previous studies or chosen subjectively. As a consequence, the posterior probability may be influenced by the a priori assumptions; that is, the Bayesian calculation is sensitive to the choice of priors.
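
As a minimal illustration of this sensitivity, the sketch below applies Bayes' theorem to three candidate topologies. The likelihood values and both priors are hypothetical numbers chosen for the example, and the function name `posterior` is my own; a real analysis would estimate these quantities by MCMC rather than enumerate them.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of hypotheses (here, tree topologies)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # marginal probability of the data
    return [j / total for j in joint]


# Hypothetical likelihoods P(data | topology) for three topologies.
likelihoods = [0.020, 0.015, 0.005]

# Two hypothetical priors over the same three topologies.
flat_prior = [1 / 3, 1 / 3, 1 / 3]
informative_prior = [0.10, 0.80, 0.10]

post_flat = posterior(flat_prior, likelihoods)
post_informative = posterior(informative_prior, likelihoods)

# Index of the topology with the highest posterior under each prior.
best_flat = post_flat.index(max(post_flat))
best_informative = post_informative.index(max(post_informative))

print(post_flat, best_flat)
print(post_informative, best_informative)
```

With the same data, the preferred topology changes from the first to the second once the informative prior is used, which is the kind of dependence on priors discussed above.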

Finally, the result of BI is a "topology" that supposedly shows groupings of clades. Nevertheless, these "topologies" are not phylogenetic trees; they are representations of groups with a "high probability" of being sampled. So phylogenetic relationships are not recovered in a Bayesian analysis. Also of interest is the frequentist view of Bayesian inference, in which the most probable grouping of taxa is taken to be the correct one. This approach is inappropriate because hypotheses in phylogenetic systematics must be corroborated (in the Popperian sense), an approach better suited to the scientific objective of systematics.

Support

Some authors (Alfaro et al., 2003; Douady et al., 2003; Cummings et al., 2003) claim that the results of BI are equivalent to those of parametric bootstrapping (however, this claim is not very strong!). Other authors state that BI has advantages over the parametric bootstrap because of its easy interpretation and speed. Parametric bootstrapping (one kind of support in phylogenetic systematics) generates new data from the initial topology and runs a new search using the new data and the model, whereas the posterior probability of a Bayesian topology is based on the frequencies with which nodes are recovered. So BI is not similar to parametric bootstrapping, even though their results may be equal in some studies.
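
To make the frequency-based nature of posterior probabilities concrete, here is a small sketch that estimates the support of each node as the fraction of sampled trees containing it. The sample of ten trees is invented for the example, each tree is simplified to a set of clades, and the name `clade_frequencies` is hypothetical; it is not taken from any phylogenetics package.

```python
from collections import Counter


def clade_frequencies(sampled_trees):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))  # count each clade once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}


# Clades are represented as frozensets of taxon names.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
CD = frozenset({"C", "D"})
ABC = frozenset({"A", "B", "C"})

# Hypothetical MCMC sample: ten trees, each reduced to its set of clades.
sample = [{AB, ABC}] * 6 + [{AB, CD}] * 2 + [{AC, ABC}] * 2

freqs = clade_frequencies(sample)
for clade, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), freq)
```

Nothing is simulated here: the value for a node is simply how often it was sampled. A parametric bootstrap would instead generate new data sets under the model and repeat the tree search on each one.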

Other support methods, such as the nonparametric bootstrap (Felsenstein, 1985), Bremer support (Bremer, 1994), the jackknife (Farris et al., 1996), and relative Bremer support sensu Goloboff & Farris (2001), are considered approaches to estimating the support of nodes in a topology. However, an ideal measure of support must be able to estimate support using the evidence both for and against the resulting groups (nodes). So I believe that the most adequate measure of support in phylogenetic systematics is Bremer support sensu Goloboff & Farris (2001). Furthermore, measures of support that use only favorable evidence can be considered frequentist, in the same way as BI.
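
The difference between absolute and relative Bremer support can be sketched as follows. Bremer support is the extra length of the shortest tree lacking a group compared with the shortest tree containing it. For the relative version I assume the form (F - C) / F, where F is the evidence favorable to the group and C the evidence against it, which is how I read Goloboff & Farris (2001). The function names and all numbers are hypothetical.

```python
def bremer_support(length_with_group, length_without_group):
    """Extra steps needed by the shortest tree that lacks the group."""
    return length_without_group - length_with_group


def relative_fit_difference(favorable, contradictory):
    """Assumed form of relative Bremer support: (F - C) / F."""
    if favorable <= 0:
        raise ValueError("favorable evidence must be positive")
    return (favorable - contradictory) / favorable


# Two hypothetical groups with the same absolute Bremer support (F - C = 3).
bs = bremer_support(length_with_group=100, length_without_group=103)

# Group 1: 5 steps in favor, 2 against. Group 2: 20 in favor, 17 against.
rfd_group1 = relative_fit_difference(favorable=5, contradictory=2)
rfd_group2 = relative_fit_difference(favorable=20, contradictory=17)

print(bs, rfd_group1, rfd_group2)
```

Both groups have the same absolute support, but the second is contradicted by almost as much evidence as favors it, and only the relative measure shows that.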

Bibliography

• Alfaro, M. E., Zoller, S., & Lutzoni, F. (2003) Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol., 20, 255-266.

• Bremer, K. (1994) Branch support and tree stability. Cladistics, 10, 295-304.

• Cummings, M. P., Handley, S. A., Myers, D. S., Reed, D. L., Rokas, A., & Winka, K. (2003) Comparing bootstrap and posterior probability values in the four-taxon case. Systematic Biology, 52, 477-487.

• Douady, C. J., Delsuc, F., Boucher, Y., Doolittle, W. F., & Douzery, E. J. P. (2003) Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol., 20, 248-254.

• Erixon, P., Svennblad, B., Britton, T., & Oxelman, B. (2003) Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology, 52, 665-673.

• Farris, J. S., Albert, V. A., Kallersjo, M., Lipscomb, D., & Kluge, A. G. (1996) Parsimony jackknifing outperforms neighbor-joining. Cladistics, 12, 99-124.

• Goloboff, P. A., & Farris, J. S. (2001) Methods for quick consensus estimation. Cladistics, 17, 26-34.

• Huelsenbeck, J. P., Ronquist, F., Nielsen, R., & Bollback, J. P. (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294, 2310-2314.

• Rannala, B. (2002) Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, 51, 754-760.

• Rannala, B., & Yang, Z. (1996) Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference.

• Wheeler, W. C., & Pickett, K. M. (2008) Topology-Bayes versus clade-Bayes in phylogenetic analysis. Mol. Biol. Evol., 25, 447-453.

• Yang, Z., & Rannala, B. (1997) Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol., 14, 717-724.

#### 2 comments:

must said...

It is very difficult to study Markov chains. There are not many good reference textbooks on the topic.

I use Markov Chains and Stochastic Stability to study. It is a good reference textbook.

Do you have any other good textbooks on Markov chains to recommend?

Andy ^_^

Salva said...

Hi Sergio

I want to make a clarification. I don't know whether I misread the first sentence, but Bayesian analysis is NOT faster than parsimony. Parsimony programs such as TNT evaluate many more rearrangements than Bayesian programs (such as MrBayes) in less time, and they also 'converge' faster on optimal values. Of course, if the point of comparison is PAUP, things are different...

It has always been a mystery to me how a simulated data set can show the support of a real data set... that is what the parametric bootstrap does, and I think it is one of the worst ideas that research on support in phylogenetics has seen.

More on the idea of looking at evidence for and against can be found in Goloboff et al. 2003 (Cladistics 19: 324), one of the best papers on resampling-based support measures.