Saturday, December 9, 2017

How to do a good phylogenetic analysis?

Concepts such as homology, synapomorphy, phylogenetic relationships, and homoplasy have always been at the center of almost every discussion about phylogeny. But the eternal debate over which method is best, and why, also deserves attention. These heated debates always end in divided opinions and even conflicts among researchers. This is why it is important to look closely at the three most widely used methods: Parsimony, Maximum Likelihood, and Bayesian Inference.

Currently there are few articles that compare the effectiveness of the three most commonly used methods (Merl et al., 2005). That does not prevent the conflict from coming up in conversation, where opinions are usually grounded in the philosophy behind each method, and methods are even defended according to when each was proposed and how long it has been in use.

So, judging by the results, which of the three performs best? According to Merl et al. (2005), there is very little consistency among the results of the three methods. In their analysis, Maximum Likelihood and Bayesian Inference do not differ much. However, their comparison covers not only the methods but also the software used for the runs, so their results may reflect the software more than the method.

Because a parsimony tree is usually presented as a cladogram, with every branch drawn at the same length and no estimate of branch length (Kim, 1996), it is not useful when the amount of change or time must be taken into account. Likewise, it should be noted that, despite the equal branch lengths, parsimony is not exempt from long-branch attraction (Swofford et al., 2001). And since the branches are all drawn the same length, it is even harder to determine which groups are affected by this phenomenon. Still, the method's ease of use and speed are worth highlighting.
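To make the parsimony criterion concrete, here is a minimal sketch of Fitch's small-parsimony count on a fixed rooted binary tree. The toy alignment, the two candidate trees, and the function names are all hypothetical and chosen only for illustration. Note that the output is just a number of changes per tree, not a set of model-based branch lengths.

```python
def fitch_score(tree, tip_states):
    """Return (state_set, changes) for one character on a rooted binary tree.

    tree is a nested tuple of tip names, e.g. (("A", "B"), ("C", "D")).
    tip_states maps each tip name to its observed state.
    """
    if isinstance(tree, str):  # a tip: its state set is the observed state
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_score(left, tip_states)
    right_set, right_changes = fitch_score(right, tip_states)
    shared = left_set & right_set
    if shared:  # children agree on at least one state: no extra change
        return shared, left_changes + right_changes
    # children disagree: keep the union and count one change
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over all columns of {tip: sequence}."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {tip: seq[i] for tip, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, four aligned sites.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print("tree_1 length:", parsimony_length(tree_1, alignment))
print("tree_2 length:", parsimony_length(tree_2, alignment))
```

Under parsimony the tree with the smaller length is preferred. A real analysis would also search over many topologies, which this sketch does not attempt.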

In the case of Maximum Likelihood and Bayesian Inference, Beerli (2006) proposes that Bayesian inference should be preferred over Maximum Likelihood because, in his analyses of population size, Maximum Likelihood has difficulty recovering the expected values when the data set is highly variable, whereas Bayesian inference on the same data set comes closer to the "true" values in all scenarios (PP > 95%). It should be noted, though, that Beerli (2006) suggests other methods should still be taken into account despite the better results obtained with Bayesian Inference.

Lemmon et al. (2009), for their part, compare the two methods with respect to ambiguous or missing data and propose that both are affected in the same way by this type of data. Smith et al. (1987) propose, based on their results, that Bayesian estimation should be preferred over Maximum Likelihood because it is less sensitive to changes in its parameters. They nevertheless stress the importance of well-defined priors, since these can affect not only the "veracity" of the analysis but also the computational time it requires.
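The point about priors can be illustrated with a deliberately simple, non-phylogenetic sketch: estimating a proportion from a small sample, once by Maximum Likelihood and once as a Bayesian posterior mean under two different Beta priors. The data, the prior parameters, and the function names are assumptions made up for this example. It is not the analysis of any of the cited papers.

```python
def ml_estimate(successes, trials):
    """Maximum Likelihood estimate of a binomial proportion."""
    return successes / trials


def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of the proportion under a Beta(prior_a, prior_b) prior."""
    return (successes + prior_a) / (trials + prior_a + prior_b)


# Hypothetical small data set: 3 successes in 10 trials.
successes, trials = 3, 10

ml = ml_estimate(successes, trials)
flat = posterior_mean(successes, trials, 1.0, 1.0)     # flat Beta(1, 1) prior
strong = posterior_mean(successes, trials, 20.0, 5.0)  # strong, poorly chosen prior

print(f"ML estimate:            {ml:.3f}")
print(f"Posterior mean (flat):  {flat:.3f}")
print(f"Posterior mean (strong): {strong:.3f}")
```

With so few observations, the flat prior stays close to the Maximum Likelihood value while the strong prior pulls the estimate far away from it, which is the kind of sensitivity to priors that the paragraph above warns about.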

REFERENCES:

- Beerli, P. (2006). Comparison of Bayesian and maximum-likelihood inference of population genetic parameters. Bioinformatics 22(3):341–345.
- Kim, J. (1996). General inconsistency conditions for maximum parsimony: Effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363–374.
- Lemmon et al. (2009). The Effect of Ambiguous Data on Phylogenetic Estimates Obtained by Maximum Likelihood and Bayesian Inference. Syst. Biol. 58(1):130–145.
- Merl et al. (2005). Comparison of Bayesian, maximum likelihood and parsimony methods for detecting positive selection. University of California Santa Cruz.
- Smith et al. (1987). A Comparison of Maximum Likelihood and Bayesian Estimators for the Three-parameter Weibull Distribution. Appl. Statist. 36(3):358–369.
- Swofford et al. (2001). Bias in Phylogenetic Estimation and Its Relevance to the Choice between Parsimony and Likelihood Methods. Syst. Biol. 50(4):525–539.

