Sunday, 28 August 2011
Phylogeny of Tabaninae: A critique of Abu El-Hassan et al. (2010)
Materials and methods
under implicit weights
Concavity value | Average of the shared consensus nodes |
1 | 0.5789 |
2 | 0.5789 |
3 | 0.6316 |
4 | 0.6842 |
5 | 0.6316 |
6 | 0.6316 |
7 | 0.6316 |
8 | 0.7895 |
9 | 0.8421 |
10 | 0.7895 |
EVALUATION OF THE GEOGRAPHIC STRUCTURE IN DENGUE VIRUS TYPE 1 FROM A PHYLOGENETIC AND BIOGEOGRAPHIC APPROACH
METHODS
Phylogenetic relationships among 50 DENV-1 E gene sequences were assessed by Bayesian inference using BEAST v1.6.2 (Drummond & Rambaut, 2007). A General Time Reversible model of nucleotide substitution (Rodriguez et al., 1990) with gamma-distributed rate variation and a proportion of invariable sites (GTR + G + I) was selected, and two runs of 4 chains were run for ten million generations. Sequences were sampled in American countries, including islands in the Atlantic and Pacific Oceans.
Finally, the geographic patterns were evaluated following the track compatibility method of Craw (1988a, 1989a). The areas used were those postulated in this work and the biotic components of Latin America and the Caribbean compiled by Morrone (2004), at the level of large regions and provinces.
RESULTS AND DISCUSSION
The phylogenetic relationships among the American sequences seem to be structured by geographic patterns. Accordingly, five areas were proposed, corresponding to the Pacific, the Caribbean, southern South America, Central America and northern South America. These components were determined from the geographic information available for each viral isolate. Intuitively, Central America and northern South America were treated as independent units.
Figure 1. Maximum clade credibility tree in Bayesian analysis of E gene sequences representing Latin America strains. Posterior probabilities are shown for key nodes.
Figure 2. Phylogeographic patterns between genotypes and postulated areas in Dengue virus type 1
Track compatibility analysis resulted in a clique (based on regions) representing a pattern that relates the Mexican transition area to the Neotropical region, which is congruent with the relationship between the SAN and CA areas in the phylogeographic analysis. This is probably due to the size of the areas, which include a higher proportion of distributions and strains that are distributed in intermediate areas. The areas delimited as provinces by Morrone (2004) and the phylogeographic areas delimited here did not show compatible tracks.
Figure 3. Track compatibility analysis. a) Areas proposed in this study. Biotic components of Latin America and the Caribbean: b) Provinces, c) Regions.
CONCLUSION
Phylogenetic and biogeographic analyses of dengue virus can reflect a similar geographic pattern; however, it is necessary to know the level at which both approaches can be congruent. In this study, Central America and northern South America form a large unit that corresponds to the clique found in the track compatibility analysis, which supports the close relationship between the Mexican transition area and the Neotropical region. Clearly, geopolitical units are not the most accurate basis for assessing how geographic structure shapes the phylogenetic relationships of dengue virus, and dengue virus strains behave as a large dispersive population connecting large areas in America.
REFERENCES
Carvalho SE, Martin DP, Oliveira LM, Ribeiro BM, Nagata T (2010) Comparative analysis of American Dengue virus type 1 full-genome sequences. Virus Genes 40: 60–66.
CRAW, R. C. 1988. Continuing the synthesis between panbiogeography, phylogenetic systematics and geology as illustrated by empirical studies on the biogeography of New Zealand and the Chatham Islands. Systematic Zoology 37: 291-310.
CRAW, R. C. 1989a. New Zealand biogeography: A panbiogeographic approach. New Zealand Journal of Zoology 16: 527-547
Drummond AJ & Rambaut A (2007) "BEAST: Bayesian evolutionary analysis by sampling trees." BMC Evolutionary Biology 7, 214
Samuel, A. R., Knowles, N. J. 2001. Foot-and-mouth disease type O viruses exhibit genetically and geographically distinct evolutionary lineages (topotypes). Journal of General Virology 74, 2281-2285.
Sunday, 22 May 2011
Measuring branch support
Gualdrón-Diaz J. C.
Once cladograms have been obtained, it is important to know how strong the evidence supporting each node is. There are different ways to interpret support (stability, confidence levels and reliability) and different methods to assess it; the most popular are resampling methods such as the bootstrap and the jackknife, and methods linked to relative optimality values such as Bremer support (Wheeler, 2010). This requires a clear distinction between some terms. According to Goloboff et al. (2003) and Brower (2006, 2010), support and stability are logically different: support for a given branch in a tree is a measure of the net amount of evidence that favors the appearance of that branch in a most parsimonious topology, whereas stability is the persistence of a given branch in the face of the addition, deletion, or reweighting of characters, taxa, or both from the data matrix, as in bootstrap and jackknife approaches. Likewise, strong statistical assumptions are necessary to interpret jackknife or bootstrap values as confidence levels (Felsenstein, 1985). Another way to measure the support for individual branches of a cladogram is Bremer support, also referred to as the “decay index” (Bremer, 1994). It is measured by comparing the fit of the data to optimal and suboptimal trees. Its two variants measure different aspects of group support: absolute Bremer support estimates the amount of favorable evidence (Bremer, 1994), and relative Bremer support (Goloboff and Farris, 2001) estimates the ratio between favorable and contradictory evidence (Goloboff et al., 2003). Both support and stability are attributes that have proven particularly tricky to measure directly, owing to the complexity of character interactions in homoplastic data (Goloboff and Farris, 2001). Nevertheless, these measures serve as a means to distinguish plausible groups from dubious ones, and can guide the generation of additional data to refine and improve the hypothesis (Brower, 2006).
Jackknifing and bootstrapping sometimes produce incoherent results. Uninformative characters and characters irrelevant to the monophyly of a group can influence jackknife and bootstrap support values; to solve this, Farris et al. (1996) proposed assigning equal probabilities of deletion to individual characters. Similarly, Goloboff et al. (2003) suggested a Poisson-based sampling regime for bootstrapping that also alleviates this problem. One clear advantage of the jackknife over the bootstrap is that the values on branches are less affected by characters with homoplasy (Freudenstein and Davis, 2010). Jackknife and bootstrap values can also be misleading when characters have different weights or costs, which produces either under- or overestimation of the actual support (Goloboff et al., 2003). This influence of the weights can be eliminated by symmetric resampling, in which the probability of increasing the weight of a character equals the probability of decreasing it (Goloboff et al., 2003). Together, these points explain the differences in the error produced by the jackknife and the bootstrap.
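The three resampling schemes discussed above differ only in how they assign a weight to each character in a pseudoreplicate. The following is a minimal sketch of that idea, not the implementation used by any particular program. The function names, the jackknife deletion probability of e^-1, the change probability of 0.33 and the choice of "delete" and "double" as the down- and up-weighting operations are all assumptions made for illustration.

```python
import math
import random


def bootstrap_weights(n_chars, rng):
    """Sample n_chars characters with replacement; return per-character counts."""
    weights = [0] * n_chars
    for _ in range(n_chars):
        weights[rng.randrange(n_chars)] += 1
    return weights


def jackknife_weights(n_chars, rng, p_delete=math.exp(-1)):
    """Delete each character independently with the same probability.

    The default deletion probability is an assumption of this sketch.
    """
    return [0 if rng.random() < p_delete else 1 for _ in range(n_chars)]


def symmetric_weights(n_chars, rng, p_change=0.33):
    """Down-weight and up-weight each character with equal probability.

    Here down-weighting means deletion and up-weighting means doubling;
    both choices are illustrative assumptions.
    """
    weights = []
    for _ in range(n_chars):
        u = rng.random()
        if u < p_change:
            weights.append(0)  # down-weighted
        elif u < 2 * p_change:
            weights.append(2)  # up-weighted
        else:
            weights.append(1)  # unchanged
    return weights


rng = random.Random(2011)
boot = bootstrap_weights(20, rng)
jack = jackknife_weights(20, rng)
symm = symmetric_weights(20, rng)
print(boot, jack, symm, sep="\n")
```

Each weight vector would then be passed to a tree search, and the frequency of a group across many pseudoreplicates would be reported as its resampling value.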
Bremer support, rather than being an estimate based on pseudoreplicated subsamples of the data (like bootstrapping and jackknifing), is a statistical parameter of a particular data set and thus does not depend on the data matching a particular assumed distribution. An advantage of Bremer support is that it never hits a maximum value (such as 100%) and continues to increase as character support for a particular branch in the tree accumulates (Brower, 2006). A defect of the method is that it does not always take into account the relative amounts of evidence contradictory and favorable to the group. This problem is diminished if the support for the group is calculated as the ratio between the amounts of favorable and contradictory evidence (Goloboff and Farris, 2001). This method is known as relative Bremer support; its potential advantages are that its values vary between 0 and 1 and that they provide an approximate measure of the amount of favorable relative to contradictory evidence. Under weighting methods the Bremer supports may be hard to interpret, but the relative supports for different weighting strengths are directly comparable (Goloboff and Farris, 2001). A disadvantage of the relative supports is that the values in different pairs of trees must be calculated carefully.
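The arithmetic behind both measures can be sketched in a few lines. The sketch assumes the per-character step counts on the optimal tree and on the best tree lacking the group are already known, and it assumes the relative measure takes the form (F - C) / F, where F is the favorable and C the contradictory evidence. The function name and the example step counts are hypothetical.

```python
def bremer_supports(len_optimal, len_without_group):
    """Return (absolute Bremer, relative Bremer) from per-character steps.

    len_optimal[i]       : steps of character i on the optimal tree
    len_without_group[i] : steps of character i on the best tree lacking the group
    """
    favorable = 0      # F: extra steps required when the group is absent
    contradictory = 0  # C: steps saved when the group is absent
    for opt, sub in zip(len_optimal, len_without_group):
        diff = sub - opt
        if diff > 0:
            favorable += diff
        else:
            contradictory += -diff
    absolute = favorable - contradictory
    # Assumed form of the relative measure: (F - C) / F
    relative = absolute / favorable if favorable > 0 else 0.0
    return absolute, relative


# Hypothetical step counts for five characters
steps_opt = [1, 1, 2, 1, 3]
steps_sub = [2, 2, 2, 1, 2]
absolute, relative = bremer_supports(steps_opt, steps_sub)
print(absolute, relative)  # 1 0.5
```

In this toy case two steps favor the group and one contradicts it, so the absolute support is 1 while the relative support shows that half of the favorable evidence is offset by conflict.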
An important extension of Bremer support is Partitioned Branch Support (PBS), introduced by Baker and DeSalle (1997). The PBS value for a particular branch and a given data partition is determined by subtracting the length of the data partition on the MP tree(s) from the length of the data partition on the MP anticonstraint tree(s) for that branch (Brower, 2006). Thus, a given partition may contribute positively to, be neutral toward, or conflict with the weight of the evidence that supports a particular branch in the combined analysis. PBS allows exploration of partition incongruence within a total evidence framework (Brower et al., 1996). This ability to localize incongruence to a single partition for a single branch has the potential to reveal interesting evolutionary processes, such as selection on a particular gene. Partitioning data is a potentially useful way to explore incongruence of signal among characters from different sources (Brower, 2006). PBS has the advantage that its values are calculated using the complete data matrix and may be obtained for any combination of partitions. One of the problems with PBS is that it is sensitive to missing data, and can shift dramatically among partitions as missing data are filled into the matrix (Brower, 2006). Much of the criticism of support measures focuses on their reanalysis of data subsets or partitions as though they were separate sources of evidence but, as Goloboff et al. (2003) have pointed out, no measure of clade quality yet developed is immune to certain conceivable cases.
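The subtraction that defines PBS can be illustrated with a small sketch. It assumes the per-character lengths on the MP tree and on the MP anticonstraint tree for one branch are already available. The function name, the partition names and the step counts are hypothetical.

```python
def partitioned_branch_support(len_mp, len_anticonstraint, partitions):
    """Return the PBS value of each partition for one branch.

    len_mp[i]             : steps of character i on the MP tree
    len_anticonstraint[i] : steps of character i on the MP anticonstraint tree
    partitions            : dict mapping partition name -> list of character indices
    """
    pbs = {}
    for name, chars in partitions.items():
        on_mp = sum(len_mp[i] for i in chars)
        on_anti = sum(len_anticonstraint[i] for i in chars)
        pbs[name] = on_anti - on_mp
    return pbs


# Hypothetical step counts for five characters in two partitions
len_mp = [1, 1, 2, 1, 3]
len_anti = [2, 2, 2, 1, 2]
partitions = {"gene_A": [0, 1, 2], "morphology": [3, 4]}
pbs = partitioned_branch_support(len_mp, len_anti, partitions)
print(pbs)                # {'gene_A': 2, 'morphology': -1}
print(sum(pbs.values()))  # 1
```

The positive value for the first partition and the negative value for the second show one partition supporting the branch and the other conflicting with it; their sum equals the length difference between the two trees for the combined data.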
References
R. H. Baker and R. DeSalle. Multiple sources of character information and the phylogeny of Hawaiian drosophilids. 1997.
Kare Bremer. Branch support and tree stability. Cladistics, 10(3):295–304, 1994. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1994.tb00179.x. URL http://dx.doi.org/10.1111/j.1096-0031.1994.tb00179.x.
A. V. Z. Brower, R. DeSalle, and A. Vogler. Gene trees, species trees, and systematics: A cladistic perspective. Annual Review of Ecology and Systematics, 27(1):423–450, 1996. doi: 10.1146/annurev.ecolsys.27.1.423. URL http://www.annualreviews.org/doi/abs/10.1146/annurev.ecolsys.27.1.423.
Andrew V. Z. Brower. The how and why of branch support and partitioned branch support, with a new index to assess partition incongruence. Cladistics, 22(4):378–386, 2006. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2006.00113.x. URL http://dx.doi.org/10.1111/j.1096-0031.2006.00113.x.
Andrew V. Z. Brower. Stability, replication, pseudoreplication, support and consensus a reply to brower. Cladistics, 26(1):112–113, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00319.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00319.x.
James S. Farris, Victor A. Albert, Mari Källersjö, Diana Lipscomb, and Arnold G. Kluge. Parsimony jackknifing outperforms neighbor-joining. Cladistics, 12(2):99–124, 1996. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1996.tb00196.x. URL http://dx.doi.org/10.1111/j.1096-0031.1996.tb00196.x.
Joseph Felsenstein. Confidence limits on phylogenies: An approach using the bootstrap. Evolution, 39(4):783–791, 1985. ISSN 00143820. doi: 10.2307/2408678. URL http://dx.doi.org/10.2307/2408678.
John V. Freudenstein and Jerrold I. Davis. Branch support via resampling: an empirical study. Cladistics, 26(6):643–656, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00304.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00304.x.
Pablo A. Goloboff and James S. Farris. Methods for quick consensus estimation. Cladistics, 17(1):S26–S34, 2001. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2001.tb00102.x. URL http://dx.doi.org/10.1111/j.1096-0031.2001.tb00102.x.
Pablo A. Goloboff, James S. Farris, Mari Källersjö, Bengt Oxelman, M. J. Ramirez, and Claudia A. Szumik. Improvements to resampling measures of group support. Cladistics, 19(4):324–332, 2003. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2003.tb00376.x. URL http://dx.doi.org/10.1111/j.1096-0031.2003.tb00376.x.
Ward C. Wheeler. Distinctions between optimal and expected support. Cladistics, 26(6):657–663, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00308.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00308.x.
Branch Support: confidence, stability, credibility?
One way of assessing whether a clade present in a phylogenetic reconstruction really is part of the true phylogeny is to evaluate its support, which may be established by estimating confidence intervals based on resampling methods (bootstrap and jackknife), or by Bremer support, which uses the length difference between trees as a stability measure. Even so, these approaches are not independent of the search strategy, given that they are sensitive to its effectiveness (Freudenstein and Davis, 2010). Therefore a highly supported clade is not necessarily real; it may just be the kind of answer that fits the resources used (e.g. search strategy). Posterior probabilities in Bayesian analyses have been used as a probabilistic measure of support (e.g. Goloboff et al., 2003; Pickett and Randle, 2005), because they quantify credibility: how likely a certain clade is to be correct, given the data, model and priors (Huelsenbeck et al., 2002). A comparison between Bayesian and nonparametric bootstrap approaches was proposed by Efron et al. (1996), in which the bootstrap confidence level can be thought of as an assessment of error for the estimated tree. However, posterior probabilities are sensitive to the prior for internal branch lengths (Yang and Rannala, 2005), and are significantly higher than the corresponding nonparametric bootstrap frequencies when the models used for analysis are underparameterized (Goloboff et al., 2003). Although there have been several attempts to bring the different approaches together under certain conditions, these approaches cannot be freely applied under all phylogenetic criteria, given restrictions that are not only methodological but also conceptual.
On the last point, the stopping criteria can recommend very different numbers of replicates for different datasets of comparable size. In the same way, the above does not mean that a clade is or is not monophyletic depending on its support; it only indicates the certainty with which a particular node can be found in the topology. If a node is not in the bootstrap consensus, this could mean that there is a polytomy caused by multiple resolutions of the node, perhaps owing to incongruence between characters. Mort et al. (2000) compared the bootstrap and the jackknife; their findings show the relation between the bootstrap values and the deletion proportion chosen in the jackknife. In favor of the jackknife, however, it has been proposed as a rapid and efficient method to identify strongly supported clades (Farris et al., 1996), and the assignment of equal deletion probabilities to characters reduces the problem of competition between informative and noninformative characters (Freudenstein and Davis, 2010).
References
- Efron B., Halloran E., and Holmes S. 1996. Bootstrap confidence levels for phylogenetic trees. Recherche, 93(14):7085–7090.
- Erixon, P., B. Svennblad, T. Britton and B. Oxelman. 2003. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology 52: 665-673
- Farris, J.S., 1996. Jac. Computer Program Distributed by the Author. Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm, Sweden.
- Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
- Freudenstein J. V., Davis J., I. 2010 Branch support via resampling: an empirical study. Cladistics, 26:1–14.
- Goloboff P. A., Farris J. S., Källersjö M., Oxelman B., Ramirez M. J., and Szumik C. A. 2003. Improvements to resampling measures of group support. Cladistics, 19:324–332. doi: 10.1016/S0748-3007(03)00060-4.
- Grant, T., Kluge, A. G. 2003. Data exploration in phylogenetic inference: scientific, heuristic, or neither. Cladistics 19, 379–418.
- Grant, T., Kluge A. G. 2008. Clade support measures and their adequacy. Cladistics, 24:1051–1064.
- Holmes S. 2003. Bootstrapping Phylogenetic Trees :. October, 18(2):241–255, 2003.
- Huelsenbeck, J. P., B. Larget, R. E. Miller, and F. Ronquist. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673–688.
- Mort, M.E., Soltis, P.S., Soltis, D.E., Mabry, M.L., 2000. Comparison of three methods for estimating internal support on phylogenetic trees. Syst. Biol. 49, 160–171.
- Pattengale N. D., Masoud Alipour, Olaf R. P. Bininda-emonds, Bernard Memoret, and Alexandros Stamatakis. 2009. How Many Bootstrap Replicates Are Necessary ? (i):184–200.
- Pickett, K. M., Randle, C. P. 2005. Strange Bayes indeed: uniform topological priors imply non-uniform clade priors. Molecular Phylogenetics and Evolution 34: 203-211.
- Sanderson, M.J., 1995. Objections to bootstrapping: a critique. Syst. Biol. 44, 299–320.
- Wilkinson, M., Lapointe, F.-J., Gower, D.J., 2003. Branch lengths and support. Syst. Biol. 52, 127–130.
- Yang Z., Rannala B. 2005. Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol 54(3), 455-70.
Friday, 22 April 2011
Bayesian Inference in Phylogenetic Analysis
The rise of Bayesian methods in phylogenetic inference over the last two decades is the result of the implementation of Markov chain Monte Carlo (MCMC) algorithms, including the Metropolis-Hastings algorithm, for estimating posterior probability distributions and exploring more parameter-rich evolutionary models (Nylander et al., 2004). Statistically, Bayesian inference calculates the probability that a hypothesis is true, the posterior probability, from the prior probabilities and the likelihood under each hypothesis. Here probability is used to represent uncertainty about the phylogeny and the parameters of the model, and not an expected frequency of occurrence as in classical or frequentist statistics (Yang, 2006). From a philosophical standpoint, the Bayesian and maximum likelihood approaches are inference methods based on calculating the probabilities of evolutionary transformations of characters (for example, according to a nucleotide substitution model) instead of evaluating possible synapomorphies. In addition, any homologous characters (apomorphic and plesiomorphic) can be used for inferring phylogenies, and these methods assume that evolution may be reticulate and not always dichotomous (Lukhtanov, 2010).
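For a small, fixed set of competing hypotheses, the calculation of posterior probabilities from priors and likelihoods can be written out directly. The sketch below assumes three candidate topologies with a flat prior and hypothetical log-likelihoods; it works in log space because real phylogenetic likelihoods are far too small to exponentiate directly. In practice the set of trees is too large to enumerate, which is why MCMC is used instead.

```python
import math


def posterior(priors, log_likelihoods):
    """Posterior probability of each hypothesis: prior x likelihood, normalized."""
    log_joint = [math.log(p) + ll for p, ll in zip(priors, log_likelihoods)]
    shift = max(log_joint)  # subtract the maximum to avoid numerical underflow
    unnormalized = [math.exp(x - shift) for x in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical values for three topologies
priors = [1 / 3, 1 / 3, 1 / 3]
log_liks = [-1000.0, -1002.0, -1005.0]
post = posterior(priors, log_liks)
print([round(p, 4) for p in post])
```

With a flat prior the ranking of the topologies follows the likelihoods; changing the priors shifts the posterior accordingly.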
At present, it is possible to identify six key topics related to the Bayesian approach in phylogenetic reconstruction: (1) integration of complex evolutionary models, (2) heterogeneity across sites, (3) heterogeneity across partitions in the analysis of combined data (Huelsenbeck et al., 2001), (4) computational efficiency (Ronquist & Huelsenbeck, 2003), (5) the easy interpretation of the posterior probability of a tree or clade (Yang, 2006) and (6) incorporation of prior values. The development of Bayesian MCMC algorithms is the cause of the increase in computational efficiency that makes it possible to analyze more complex and realistic evolutionary models. This does not mean that parameter-rich models are appropriate for every data set, because we may make the mistake of overparameterization, and more complex evolutionary models are associated with more topological uncertainty (Nylander et al., 2004). What is interesting, however, is that it opens up the possibility of exploring more realistic and complex models (although a realistic model can also be simple) that recognize heterogeneity across sites and across combined data.
In terms of computational efficiency, maximum likelihood analysis has gained ground with the improvement and development of new algorithms implemented in software such as PhyML (Guindon and Gascuel, 2003) and RAxML (Stamatakis, 2007), and it is now possible to perform moderately fast and accurate bootstrapping to determine confidence. In Bayesian inference, however, the interpretation of the posterior probability is easier: it is the probability that the tree or clade is correct given the data, model and priors (Yang, 2006). The interpretation of the bootstrap, although it tends to be more conservative, has been controversial, especially under model misspecification and when the signal is only detectable at some sites (Ronquist and Deans, 2009). Despite the above, according to Yang (2006) Bayesian posterior probabilities can be spuriously high owing to lack of convergence, poor mixing, misspecification of the substitution model used in the likelihood, and misspecification of, and sensitivity to, the priors. Regarding priors, the Bayesian approach allows prior knowledge about a particular hypothesis to be incorporated, or vague or uninformative priors to be used when little information is available or when one does not want to build the analysis on any previous knowledge (Ronquist et al., 2008). Since priors can be subjective, it is advisable to assess their influence on the posterior probability. Pickett and Randle (2005) pointed out that with a uniform prior on topologies the prior probability of a clade, and hence its posterior probability, depends on both the size of the clade and the number of species.
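The dependence of clade priors on clade size can be checked by counting trees. The sketch below assumes rooted, fully binary trees on labelled tips, for which the number of topologies on n tips is (2n - 3)!!; analyses on unrooted trees would need a different count. Under a uniform prior on topologies, the prior of a specific clade is the fraction of trees that contain it. The function names are hypothetical.

```python
from fractions import Fraction


def rooted_trees(n):
    """Number of rooted binary trees on n labelled tips: (2n - 3)!!"""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # 3, 5, ..., 2n - 3
        count *= k
    return count


def clade_prior(n_taxa, clade_size):
    """Prior probability of one specific clade under a uniform topology prior.

    Trees containing the clade = (arrangements inside the clade) x
    (arrangements of the remaining tips with the clade collapsed to one tip).
    """
    inside = rooted_trees(clade_size)
    outside = rooted_trees(n_taxa - clade_size + 1)
    return Fraction(inside * outside, rooted_trees(n_taxa))


for size in (2, 3, 4):
    print(size, clade_prior(5, size))
# 2 1/7
# 3 3/35
# 4 1/7
```

Even with only five taxa, a specific three-taxon clade has a lower prior than a specific pair, although every topology was given the same prior probability.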
Lack of convergence and poor mixing in the Bayesian approach lead to spuriously high support for the trees visited by the chain (Yang & Rannala, 2005); running multiple long chains from different starting points could be a possible solution when the data set is very large. However, Bayesian inference faces another problem: inconsistency. Under the Bayesian criterion, tree topologies are estimated without estimating branch lengths, that is, by integrating the branch lengths of a given tree topology over a distribution of possible values (Goloboff & Pol, 2005). Statistical consistency can then be lost, and Bayesian inference is biased in favor of topologies that group long branches together, even when the true model and the prior distributions of evolutionary parameters over a group of phylogenies are known. According to Kolaczkowski and Thornton (2009), this bias becomes more severe as more data are analyzed and as sequence sites evolve heterogeneously, and is relatively weak when the true model is simple. Bayesian inference is therefore less efficient and less robust to the use of an incorrect evolutionary model than ML. Despite the above, Bayesian MCMC inference is a promising approach to phylogenetic analysis and computational biology that has advanced with the development of algorithms, methods and software. Nevertheless, more research is needed on how to incorporate previous knowledge into appropriate informative priors and on how to deal with complex models and large data sets without falling into inconsistency.
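The idea of running several chains from different starting points and checking that they agree can be shown on a toy problem. The sketch below is a Metropolis-Hastings sampler over three abstract states standing in for topologies, with a known target distribution; it is not a phylogenetic sampler, and the target values, chain length and seeds are assumptions chosen for illustration.

```python
import random


def metropolis_hastings(target, start, n_steps, rng):
    """Sample states in proportion to `target`; return visit frequencies."""
    n_states = len(target)
    state = start
    visits = [0] * n_states
    for _ in range(n_steps):
        # Symmetric proposal: pick one of the other states uniformly
        proposal = rng.choice([s for s in range(n_states) if s != state])
        # Accept with probability min(1, target ratio)
        if rng.random() < min(1.0, target[proposal] / target[state]):
            state = proposal
        visits[state] += 1
    return [v / n_steps for v in visits]


target = [0.7, 0.2, 0.1]  # hypothetical posterior over three topologies
chain_a = metropolis_hastings(target, 0, 20000, random.Random(1))
chain_b = metropolis_hastings(target, 2, 20000, random.Random(2))
max_gap = max(abs(a - b) for a, b in zip(chain_a, chain_b))
print(chain_a, chain_b, max_gap)
```

A small gap between the two chains is consistent with convergence but does not prove it; a large gap is a clear sign that the chains have not been run long enough or are mixing poorly.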
References
Goloboff P., Pol D. Parsimony and Bayesian phylogenetics. 2005. In: Albert V. (ed). Parsimony, phylogeny, and genomics. Oxford University Press. 210-266.
Guindon S., Gascuel O. 2003. PhyML: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 52(5):696-704.
Huelsenbeck J. P., Ronquist F., Rasmus N., Bollback J. P. 2001. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science. 294 (5550) : 2310-2314
Kolaczkowski B., Thornton J. W. 2009. Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics. PLoS One. 4:e7891. doi:10.1371/journal.pone.0007891
Lukhtanov V. A. 2010. From Haeckel’s Phylogenetics and Hennig’s Cladistics to the Method of Maximum Likelihood: Advantages and Limitations of Modern and Traditional Approaches to Phylogeny Reconstruction. Entomological Review. Vol. 90(3) : 299-310
Nylander J. A. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. 2004. Bayesian Phylogenetic Analysis of Combined Data. Syst Biol. 53(1) : 47-67
Pickett K. M., Randle C. P. 2005. Strange Bayes indeed: Uniform topological priors imply non-uniform clade priors. Mol. Phyl. Evol. 34:203-211
Ronquist F., Deans A. R. 2009. Bayesian Phylogenetics and Its Influence on Insect Systematics. Systematic Entomology 35 (3): 349-378.
Ronquist F., van der Mark P., Huelsenbeck J. P. 2009. Bayesian phylogenetic analysis using MrBayes. In: Lemey P., Salemi M., and Anne-Mieke V. (eds.) The Phylogenetic Handbook: a Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Cambridge University Press. 219-236.
Stamatakis, A., Blagojevic, F., Nikolopoulos, D., Antonopoulos, C. 2007 Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell. The Journal of VLSI Signal Processing. 48 : 271–286
Yang Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford, England
Yang, Z., and B. Rannala. 2005. Branch-length prior influences Bayesian posterior probability of phylogeny. Systematic Biology 54: 455-470.
Tuesday, 22 February 2011
MAXIMUM LIKELIHOOD AND MAXIMUM PARSIMONY UNDER A SIMPLE MODEL.
Jiménez-Silva C. L.
Universidad Industrial de Santander.
Laboratorio de Sistemática y Biogeografía
INTRODUCTION
Stochastic models for nucleotide substitution are becoming increasingly important as a foundation for inferring phylogenetic trees from genetic sequence data. Such models allow for tree reconstruction through either maximum likelihood-based approaches or the fitting of transformed functions of the data to trees (see Swofford et al., 1996, for a recent survey). The models are also useful for analysing the performance of other, more conventional tree reconstruction methods, which are not explicitly based on such models, such as the popular maximum parsimony method (Fitch, 1971). Maximum parsimony (MP) is a popular technique for phylogeny reconstruction. However, MP is often criticized as being a statistically unsound method and one that fails to make explicit an underlying ‘‘model’’ of evolution (Steel and Penny, 2000). Parsimony does not make explicit assumptions about the evolutionary process. Some authors argue that parsimony makes no assumptions at all and that, furthermore, phylogenies should ideally be inferred without invoking any assumptions about the evolutionary process (Wiley 1981). Others point out that it is impossible to make any inference without a model; that a lack of explicit assumptions does not mean that the method is ‘assumption-free’ as the assumptions may be merely implicit; that the requirement for explicit specification of the assumed model is a strength rather than weakness of the model-based approach since then the fit of the model to data can be evaluated and improved (e.g. Felsenstein 1973).
METHODS
Sequences of 1000 bp were simulated under the JC69 model, with all branches of the tree assumed to have the same length, using Seq-Gen v1.3.2 (Rambaut & Grassly, 1997); 10 replicates were generated for each simulation. Topologies of 3, 4, 6 and 12 taxa were used. The generated sequences were analyzed under parsimony using TNT (Goloboff et al., 2008), and using Winclada and NONA (Goloboff, 1999; Nixon, 1999) for the 3-taxon case. The sequences were also analyzed under maximum likelihood using PhyML (Guindon & Gascuel, 2005), to check for equivalence between parsimony and likelihood under a particular model, namely JC69, which was the nucleotide model evaluated in PhyML. The same procedure was performed for each simulation, and finally the resulting topologies were compared with the Tree C program (Arias & Miranda-Esquivel), counting only exactly equal nodes as shared. In total, 200 comparisons were made, and the process was automated with bash scripts.
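The comparison step, counting the nodes that two topologies share exactly, can be sketched as follows. This is not the Tree C program; it is a minimal illustration that assumes rooted trees written as nested tuples of tip names, treats a node as the set of tips below it, and ignores the root node that contains every tip. The function names and the two example trees are hypothetical.

```python
def clades(tree):
    """Return (tips, set of clades) for a tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips = frozenset()
    found = set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)  # the clade defined by this internal node
    return tips, found


def shared_nodes(tree_a, tree_b):
    """Count internal nodes (excluding the root) present in both trees."""
    tips_a, clades_a = clades(tree_a)
    tips_b, clades_b = clades(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must have the same terminals")
    return len((clades_a & clades_b) - {tips_a})


# Hypothetical parsimony and likelihood trees for six taxa
mp_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
ml_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
print(shared_nodes(mp_tree, ml_tree))  # 3
```

The two example trees share the nodes A+B, A+B+C and D+E+F and differ only in how D, E and F are resolved.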
RESULTS
When I compared the phylogenetic reconstructions, the data showed equivalence between parsimony and likelihood under the JC69 model. Equivalence here means that the most parsimonious tree and the ML tree under that model are identical in every possible data set. However, this result was found only for the data sets with few terminals, i.e. 3 and 4 taxa.
The JC model was chosen for the comparison because it is the fully symmetric model: it makes no distinction between any of the character states. Each sequence was 1000 nt long. As a first approximation, there is no selection at any of the sites, and therefore it is more ‘‘parsimonious’’ to assume one common mechanism for all sites rather than 1000 different mechanisms, one for each site.
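The symmetry of the JC69 model shows up directly in its transition probabilities: after a branch of length d (expected substitutions per site), a site keeps its state with probability 1/4 + 3/4 e^(-4d/3) and changes to each of the other three states with probability 1/4 - 1/4 e^(-4d/3). The sketch below uses these standard formulas to evolve a 1000 nt sequence along a single branch. It is not Seq-Gen; the branch length of 0.1, the seed and the function names are assumptions made for illustration.

```python
import math
import random


def jc69_probs(d):
    """Return (P(same state), P(one specific different state)) for branch length d."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def evolve(seq, d, rng):
    """Evolve a nucleotide sequence along one branch under JC69."""
    p_same, _ = jc69_probs(d)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            # All three alternative states are equally likely under JC69
            out.append(rng.choice([b for b in "ACGT" if b != base]))
    return "".join(out)


rng = random.Random(69)
ancestor = "".join(rng.choice("ACGT") for _ in range(1000))
descendant = evolve(ancestor, 0.1, rng)
differences = sum(a != b for a, b in zip(ancestor, descendant))
print(differences)
```

Because every state and every site is treated identically, a single parameter, the branch length, describes the whole process along a branch.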
The parsimony and likelihood trees were compared under the JC model, sometimes called the Neyman model with four states. It assumes that the rates of evolution on each branch of the tree vary freely from site to site. In this case, we have some underlying constraints on the type of substitution model (i.e., Jukes-Cantor type), but no constraints on the edge parameters from site to site. This is even more general than the type of approach considered by Olsen (see Swofford et al. 1996, p. 443), in which the rate at which a site evolves can vary freely from site to site, but the ratios of the edge lengths are equal across the sites (Steel and Penny, 2000). In the methodology used in this work, a model with free parameters was assumed. For this purpose, the custom model option in PhyML was selected. With the custom model option, it is also possible to give the program a user-defined nucleotide frequency distribution at equilibrium; here the parameters were estimated from the data. On this basis, this type of underlying model is almost certainly too flexible, because it allows many new parameters for each edge. It might be regarded as the model one might start with if one knew virtually nothing about any common underlying mechanism linking the evolution of different characters on a tree (Steel and Penny, 2000).
For the data sets of 6 and 12 taxa the results were different: the two methods shared fewer than three equal nodes. This dependence of the number of equal nodes on the number of terminals is in concordance with Hendy and Penny (1989), who showed that with four species and binary characters evolving under the clock, parsimony is always consistent in recovering the tree, although ML and parsimony do not appear to be equivalent under the model. With five or more species evolving under the clock, it is known that parsimony can be inconsistent in estimating the trees (Hendy and Penny 1989; Zharkikh and Li 1993). Thus it is not equivalent to likelihood.
Although parsimony and likelihood can be considered equivalent under the JC69 model, the studies showing this often used small trees with three to six taxa. The cases for much larger trees are not known. However, it appears easier to identify cases of inconsistency of parsimony on large trees than on small trees (Kim 1996; Huelsenbeck and Lander 2003), suggesting that likelihood and parsimony are in general not equivalent on large trees.
REFERENCES
Arias J. S., Miranda-Esquivel D. M. 2007. Tree C.
W. M. Fitch. Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Zoology, 20:406–416, 1971.
D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis. Phylogenetic inference. In D. M. Hillis, C. Moritz, and B. K. Mable, editors, Molecular Systematics, chapter 11, pages 407–514. Sinauer Associates, 2nd edition, 1996.
Goloboff, P., 1999. NONA (No Name) ver. 2. Published by the author, Tucuman, Argentina
Felsenstein, J. 1973b. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst. Zool. 22:240–249.
Hendy, M. D. and Penny, D. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309.
Huelsenbeck, J. P. and Lander, K. M. 2003. Frequent inconsistency of parsimony under a simple model of cladogenesis. Syst Biol 52:641–648.
Kim, J. 1996. General inconsistency conditions for maximum parsimony: effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363–374.
Nixon, K. C. 1999. Winclada (BETA) ver. 0.9.9 PUBLISHED BY THE AUTHOR, ITHACA, NY. I have become weary of Clados generated trees being published without citation. Please cite the program
M. Steel and D. Penny. Parsimony, likelihood and the role of models in molecular phylogenetics. Molecular Biology and Evolution 17: 839–850 (2000).
Wiley, E. O. 1981. Phylogenetics. The Theory and Practice of Phylogenetic Systematics. John Wiley & Sons, New York.
Zharkikh, A. and Li,W. -H. 1993. Inconsistency of the maximum parsimony method: the case of five taxa with a molecular clock. Syst. Biol. 42:113–125.