Tuesday, February 22, 2011

MAXIMUM LIKELIHOOD AND MAXIMUM PARSIMONY UNDER A SIMPLE MODEL.

Jiménez-Silva C. L.
Universidad Industrial de Santander.
Laboratorio de Sistemática y Biogeografía

INTRODUCTION

Stochastic models for nucleotide substitution are becoming increasingly important as a foundation for inferring phylogenetic trees from genetic sequence data. Such models allow for tree reconstruction through either maximum likelihood-based approaches or the fitting of transformed functions of the data to trees (see Swofford et al., 1996, for a survey). The models are also useful for analysing the performance of other, more conventional tree reconstruction methods that are not explicitly based on such models, such as maximum parsimony (Fitch, 1971). Maximum parsimony (MP) is a popular technique for phylogeny reconstruction. However, MP is often criticized as being a statistically unsound method and one that fails to make explicit an underlying "model" of evolution (Steel and Penny, 2000). Parsimony does not make explicit assumptions about the evolutionary process. Some authors argue that parsimony makes no assumptions at all and that, furthermore, phylogenies should ideally be inferred without invoking any assumptions about the evolutionary process (Wiley 1981). Others point out that it is impossible to make any inference without a model; that a lack of explicit assumptions does not mean that the method is "assumption-free", as the assumptions may be merely implicit; and that the requirement for explicit specification of the assumed model is a strength rather than a weakness of the model-based approach, since the fit of the model to the data can then be evaluated and improved (e.g. Felsenstein 1973).

METHODS

Sequences of 1000 bp were simulated under the JC69 model with Seq-Gen v1.3.2 (Rambaut & Grassly, 1997), assuming that all branches of the tree have the same length, with 10 replicates per simulation. Simulations were run for topologies of 3, 4, 6 and 12 taxa. The simulated sequences were analyzed under parsimony with TNT (Goloboff et al., 2008) and, for the 3-taxon data sets, with Winclada and NONA (Goloboff, 1999; Nixon, 1999). The same sequences were analyzed under maximum likelihood with PhyML (Guindon & Gascuel, 2005) using the JC69 model, in order to check for equivalence between parsimony and likelihood under that particular model. The same procedure was applied to every simulation, and the resulting topologies were then compared with Tree C (Arias & Miranda-Esquivel, 2007) on the basis of exactly equal nodes. In total, 200 comparisons were made, and the process was automated with bash scripts.
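The simulation step can be illustrated with a minimal Python sketch. It is not Seq-Gen and does not reproduce its options; it only assumes the standard JC69 result that, after a branch of length d (expected substitutions per site), a site differs from its starting base with probability 3/4 (1 - exp(-4d/3)), with the three alternative bases equally likely. The function names, the rooted three-taxon tree ((A,B),C) and the branch length of 0.1 are hypothetical choices made for illustration.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_change_probability(branch_length):
    """Probability that a site ends on a different base after a branch of the
    given length (expected substitutions per site) under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


def evolve(sequence, branch_length, rng):
    """Evolve a sequence along one branch; sites change independently."""
    p_change = jc69_change_probability(branch_length)
    evolved = []
    for base in sequence:
        if rng.random() < p_change:
            # Under JC69 the three other bases are equally likely.
            evolved.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            evolved.append(base)
    return "".join(evolved)


def simulate_three_taxa(length=1000, branch_length=0.1, seed=0):
    """Simulate sequences on the rooted tree ((A,B),C) with equal branch lengths.

    The tree shape and the default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    root = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
    internal = evolve(root, branch_length, rng)  # ancestor of A and B
    return {
        "A": evolve(internal, branch_length, rng),
        "B": evolve(internal, branch_length, rng),
        "C": evolve(root, branch_length, rng),
    }
```

Calling simulate_three_taxa() with different seeds would give independent replicates, in the spirit of the 10 replicates per simulation described above.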

RESULTS

When I compared the phylogenetic reconstructions, the data showed equivalence between parsimony and likelihood under the JC69 model. Equivalence here means that the most parsimonious tree and the ML tree under that model are identical for every possible data set. However, this result was obtained only for the data sets with few terminals, i.e. 3 and 4 taxa.
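As an illustration of what "identical trees" and "equal nodes" can mean in practice, the following Python sketch compares two trees by the sets of taxa under each internal node. It is not the algorithm used by Tree C, which is not described here. The representation (rooted trees written as nested tuples with taxon names as leaves) and the function names are assumptions made for the example; unrooted trees would need to be rooted consistently first.

```python
def clades(tree):
    """Return the set of clades in a rooted tree given as nested tuples.

    Each clade is a frozenset of the taxon names below one internal node,
    so the order in which children are written does not matter.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):  # a leaf is just a taxon name
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


def shared_nodes(tree_a, tree_b):
    """Count internal nodes (clades) present in both trees."""
    return len(clades(tree_a) & clades(tree_b))


def same_topology(tree_a, tree_b):
    """True when both trees contain exactly the same clades."""
    return clades(tree_a) == clades(tree_b)


# Hypothetical 4-taxon example: the first two trees differ only in how they
# are written; the third groups the taxa differently.
mp_tree = (("A", "B"), ("C", "D"))
ml_tree = (("D", "C"), ("B", "A"))
other_tree = (("A", "C"), ("B", "D"))
```

Here same_topology(mp_tree, ml_tree) holds, while mp_tree and other_tree share only the root node, which contains all four taxa.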

The JC model was assumed for the comparison because it is the fully symmetric model: it makes no distinction between any of the character states. Each sequence was 1000 nt long. As a first approximation, there is no selection at any of the sites, and it is therefore more "parsimonious" to assume one common mechanism for all sites rather than 1000 different mechanisms, one for each site.
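The symmetry of the model can be made explicit with the standard JC69 transition probabilities. In the block below, the symbol d (branch length in expected substitutions per site) and the indices i and j (nucleotide states) are notation introduced here for illustration and do not appear in the original analysis.

```latex
% JC69 transition probabilities along a branch of length d
% (d = expected number of substitutions per site; i, j are nucleotide states)
\begin{align}
  P_{ii}(d) &= \frac{1}{4} + \frac{3}{4}\, e^{-4d/3}, \\
  P_{ij}(d) &= \frac{1}{4} - \frac{1}{4}\, e^{-4d/3}, \qquad i \neq j .
\end{align}
```

Because P_{ij}(d) takes the same value for every pair of distinct states, all substitutions are treated alike, and the four probabilities in each row sum to 1.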

The parsimony and likelihood trees were obtained working with the JC model, sometimes called the Neyman model with four states. In the form considered here, the rate of evolution on each branch of the tree may vary freely from site to site. In this case, there are some underlying constraints on the type of substitution model (i.e., Jukes-Cantor type), but no constraints on the edge parameters from site to site. This is even more general than the type of approach considered by Olsen (see Swofford et al. 1996, p. 443), in which the rate at which a site evolves can vary freely from site to site, but the ratios of the edge lengths are equal across the sites (Steel and Penny, 2000). In this work, a free-parameter model was assumed. For this purpose, the custom model option in PhyML was used. With the custom model option selected, it is also possible to give the program a user-defined equilibrium nucleotide frequency distribution, with the parameters calculated from the data. This type of underlying model is almost certainly too flexible, because it allows many new parameters for each edge. It might be regarded as the model one might start with if one knew virtually nothing about any common underlying mechanism linking the evolution of different characters on a tree (Steel and Penny, 2000).

For the data sets of 6 and 12 taxa the results were different: the trees obtained with the two methods shared fewer than three equal nodes. This dependence of the number of equal nodes on the number of terminals is in agreement with Hendy and Penny (1989), who showed that with four species and binary characters evolving under the clock, parsimony is always consistent in recovering the tree, although ML and parsimony do not appear to be equivalent under the model. With five or more species evolving under the clock, it is known that parsimony can be inconsistent in estimating the tree (Hendy and Penny 1989; Zharkikh and Li 1993), and thus it is not equivalent to likelihood.

Although parsimony and likelihood can be considered equivalent under the JC69 model, the studies supporting this have often used small trees of three to six taxa. The case of much larger trees is not known. However, it appears easier to identify cases of inconsistency of parsimony on large trees than on small trees (Kim 1996; Huelsenbeck and Lander 2003), suggesting that likelihood and parsimony are in general not equivalent on large trees.

REFERENCES

Arias J. S., Miranda-Esquivel D. M. 2007. Tree C.

Fitch, W. M. 1971. Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20:406–416.

Swofford, D. L., Olsen, G. J., Waddell, P. J. and Hillis, D. M. 1996. Phylogenetic inference. In D. M. Hillis, C. Moritz and B. K. Mable, editors, Molecular Systematics, 2nd edition, chapter 11, pages 407–514. Sinauer Associates.

Goloboff, P. 1999. NONA (No Name) ver. 2. Published by the author, Tucumán, Argentina.

Felsenstein, J. 1973b. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst. Zool. 22:240–249.

Hendy, M. D. and Penny, D. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309.

Huelsenbeck, J. P. and Lander, K. M. 2003. Frequent inconsistency of parsimony under a simple model of cladogenesis. Syst Biol 52:641–648.

Kim, J. 1996. General inconsistency conditions for maximum parsimony: effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363–374.

Nixon, K. C. 1999. Winclada (BETA) ver. 0.9.9. Published by the author, Ithaca, NY.

Steel, M. and Penny, D. 2000. Parsimony, likelihood, and the role of models in molecular phylogenetics. Mol. Biol. Evol. 17:839–850.

Wiley, E. O. 1981. Phylogenetics. The Theory and Practice of Phylogenetic Systematics. John Wiley & Sons, New York.

Zharkikh, A. and Li,W. -H. 1993. Inconsistency of the maximum parsimony method: the case of five taxa with a molecular clock. Syst. Biol. 42:113–125.