Filosofía, especie y sistemática: 2011

Sunday, 28 August 2011

A geometrical approach to the distributional pattern/structure of the Neotropical species of Staphylinidae: Plochionocerus Dejean & Agrodes Nordmann

Daniel Felipe Silva Tavera

Introduction

The beetle species of the genera Plochionocerus and Agrodes have recently been the subject of phylogenetic and biogeographic analyses[1][2]. Species of these genera are characterized by large body size and metallic coloration. Their systematic revision detected several synonyms, mainly among species of Plochionocerus, which currently comprises 18 species, while Agrodes comprises 2[1]. A track analysis of these sister taxa was carried out using Croizat's manual reconstruction[2], and three generalized tracks were identified from 15 individual tracks. That track analysis adds further species in support of the primary biogeographic homology of the three generalized tracks, which correspond to three major biotic components: two in the Caribbean subregion and a third in the Amazonian subregion[3]. To avoid the ambiguity and subjectivity of traditional track analysis[4], I apply here a geometrical approach to the distributional pattern/structure of the species of Plochionocerus and Agrodes, and ask: do the generalized tracks represent the general patterns of distribution of the Neotropical species of Plochionocerus and Agrodes?

Methods

Distributional information for 13 of the 18 Plochionocerus species and both Agrodes species is considered here. A total of 279 records were used to build the input file of distributional data for MartiTracks[4]. Of these, 38 records come from GBIF (accessed through the GBIF data portal, Entomology Collection, http://data.gbif.org/datasets/resource/7911) and 2 from CENTO-UIS; the rest come from the revision by Asiain et al. (2007)[1] and the catalog by Herman (2001)[5]. The parameter values used are shown in the commands1.txt file (below).

Results & Discussion

From 14 original (individual) tracks, a primary hypothesis of biogeographic homology was proposed, represented by 6 generalized tracks (fig. 1). Four are in the Amazonian subregion and two are in the Caribbean subregion (one in the Mesoamerican domain and the other in the Northeast South American domain). The four Amazonian generalized tracks are based on the individual tracks of A. conicicollis, A. elegans, P. janthinus, P. igneus, P. fulgens and P. splendens, and the two Caribbean generalized tracks are based on the individual tracks of P. discedens, P. simplicicollis, P. ashei, P. humeralis, P. impressipennis, P. marquezi, P. puncticeps and A. elegans. The geographical distributions of P. newtonorum and P. pronotalis do not coincide with any of the generalized tracks obtained. In light of this geometrical approach to the distributional pattern of these staphylinids, the primary hypothesis of biogeographic homology proposed by Asiain et al. (3 generalized tracks) is reevaluated, considering the six generalized tracks proposed above. Nine species have been recorded exclusively from South America, 2 exclusively from Central America, and 4 are shared between both areas. These results nevertheless corroborate previous biogeographic hypotheses about the Mesoamerican and South American tracks based on other components of the staphylinid biota[6].

Conclusion

The implementation of a geometrical tool provides an unambiguous panbiogeographic approach to the distributional pattern of these taxa.

[1] Asiain, J., J. Márquez and J. J. Morrone. 2007. Phylogenetic systematics of the genera Plochionocerus Dejean and Agrodes Nordmann (Coleoptera: Staphylinidae: Xantholinini). Zootaxa 1584: 1-53.

[2] Asiain, J., J. Márquez and J. J. Morrone. 2010. Track analysis of the species of Agrodes and Plochionocerus (Coleoptera: Staphylinidae). Revista Mexicana de Biodiversidad 81: 177-181.

[3] Morrone, J. J. 2006. Biogeographic areas and transition zones of Latin America and the Caribbean islands based on panbiogeographic and cladistic analyses of the entomofauna. Annual Review of Entomology 51: 467-494.

[4] Echeverría-Londoño, S. and D. R. Miranda-Esquivel. 2011. MartiTracks: a geometrical approach for identifying geographical patterns of distribution. PLoS ONE 6(4): e18460.

[5] Herman, L. 2001. Catalog of the Staphylinidae (Insecta: Coleoptera). 1758 to the end of the second millennium. VI. Staphylinine Group (Part 3). Staphylininae: Staphylinini (Quediina, Staphylinina, Tanygnathinina, Xanthopygina), Xantholinini. Staphylinidae Incerta Sedis Fossils, Protactinae. Bulletin of the American Museum of Natural History 265: 3021–3840.

[6] Márquez, J. and J. J. Morrone. 2003. Análisis panbiogeográfico de las especies de Heterolinus y Homalolinus (Coleoptera, Staphylinidae, Xantholinini). Acta Zoológica Mexicana (nueva serie) 90: 15-25.

commands1.txt

set cv 0.25

set lmin 0.5

set lmax 0.75

set maxline 1

set ci 0.8

kmlgen

croizat0

bash: croizat0.sh

#!/bin/bash

wine mt05-win32.exe test1.dat test1.dat.kml commands1.txt

Phylogeny of Tabaninae: A critique of Abu El-Hassan et al. (2010)

Introduction

Tabanidae is a family of Diptera whose monophyly has been recognized on the basis of molecular information (Wiegmann et al. 2000; Morita, 2008). However, relationships within the family have not been resolved. Abu El-Hassan et al. (2010) proposed a phylogeny of this family based on morphological characters. They did not present a formal phylogenetic analysis, their characters are ambiguous, and the way the analysis was performed is not adequate. The objective of this study is to evaluate the results obtained by Abu El-Hassan et al. (2010) and compare them with a phylogenetic analysis under the parsimony criterion.

Materials and methods

The phylogenetic analysis used 20 terminal taxa and 91 morphological characters recoded from the matrix proposed by Abu El-Hassan et al. (2010), all based on adult morphology. The cladistic analysis was run under implied weights (Goloboff 1993), with concavity values from one to ten, using TNT version 1.0 (Goloboff et al. 2004). The tree search strategy was a traditional search using tree bisection and reconnection (TBR), randomizing the addition sequence 100 times. A tree search was then run after jackknife resampling (37%), and the number of initial groups (those found without resampling) recovered after jackknifing was calculated (Goloboff 1997). Character distributions were analyzed with WINCLADA 1.00.08 (Nixon 2002).
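
As a rough illustration of how the concavity constant k enters an implied-weights score, the sketch below uses the commonly cited form of the Goloboff (1993) fit function, k / (k + e), where e is the number of extra (homoplastic) steps of a character. The function names and the example step counts are hypothetical, and TNT's own reporting conventions may differ (it may report the complementary distortion, e / (k + e)).

```python
def character_fit(extra_steps: int, k: float) -> float:
    """Fit of one character under implied weights: k / (k + e)."""
    if k <= 0 or extra_steps < 0:
        raise ValueError("k must be positive and extra_steps non-negative")
    return k / (k + extra_steps)


def total_fit(extra_steps_per_character, k: float) -> float:
    """Sum of character fits for one tree; higher means less homoplasy."""
    return sum(character_fit(e, k) for e in extra_steps_per_character)


# Hypothetical extra-step counts for four characters on one tree.
example_steps = [0, 0, 1, 3]

for k in (1, 9):
    # Low k penalizes homoplastic characters strongly; high k is milder.
    print(k, round(total_fit(example_steps, k), 3))
```

With a low k the two homoplastic characters contribute little to the total, while with k = 9 they count almost as much as the perfectly fitting ones, which is why results are usually compared across several concavity values.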

Results and discussion

All characters presented by Abu El-Hassan et al. (2010) were binary, and many of them were ambiguously coded. Most were recoded from binary to multistate characters, for example antennal scape color, antennal pedicel and antennal shape. The largest number of groups was recovered by the implied-weights search with a concavity value of nine. Using this concavity value, we obtained 1 tree (fit, k = 9: 8.533). The nodes recovered with each concavity value are shown in figure 1.

Figure 1. Average number of recovered nodes, based on ten repeated runs, after jackknife resampling with integer concavity values from one to ten under implied weights.

Concavity value	Average of the shared consensus nodes
1	0.5789
2	0.5789
3	0.6316
4	0.6842
5	0.6316
6	0.6316
7	0.6316
8	0.7895
9	0.8421
10	0.7895
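
Choosing the concavity value from this table amounts to picking the row with the highest average. A minimal sketch, using the values from the table (with decimal points) and a hypothetical variable name for the lookup:

```python
# Average proportion of recovered nodes per concavity value (from the table).
recovered_by_k = {
    1: 0.5789, 2: 0.5789, 3: 0.6316, 4: 0.6842, 5: 0.6316,
    6: 0.6316, 7: 0.6316, 8: 0.7895, 9: 0.8421, 10: 0.7895,
}

# The concavity value whose searches recover the most nodes after jackknifing.
best_k = max(recovered_by_k, key=recovered_by_k.get)
print(best_k, recovered_by_k[best_k])  # 9 0.8421
```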

The phylogenetic analysis supports the monophyly of Tabaninae; however, the internal relationships are not resolved. The genus Atylotus appears as monophyletic, contrary to the results presented by Abu El-Hassan et al. (2010). This relationship is supported by one character: upper and middle calli separated. The character distribution is shown in figure 2. Finally, we recommend repeating the analysis with an expanded sample of taxa (ingroup and outgroup) and characters, and with the characters reviewed and recoded.

Figure 2. Character distribution analyzed with WINCLADA 1.00.08; jackknife 37%, k = 9.

References

Abu El-Hassan, Gawhara M. M, Haitham B. M. Badrawy, Salwa K. Mohammad and Hassan H. Fadl (2010). Cladistic analysis of Egyptian horse flies (Diptera: Tabanidae) based on morphological data. Egypt. Acad. J. biolog. Sci., 3 (2): 51- 62.

Goloboff, P. A. (1993) Estimating character weights during tree search. Cladistics 9: 83–92.

Goloboff, P. A. (1997) Self-weighted optimization: tree searches and character state reconstructions under implied transformation cost. Cladistics 13: 225-245.

Goloboff, P. A., Farris, J. S. & Nixon, K. (2004) T. N. T:Tree Analysis Using New Technology, Version 1.0. Program and documentation, available from www.zmuck.dk/public/phylogeny/TNT

Morita, S.I. 2008. A phylogeny of long-tongued horse flies (Philoliche, Diptera: Tabanidae) with the first cladistic evaluation of higher relationships within the family. Invertebrate Systematics, 22(3): 311-327.

Nixon, K. C. (2002) WinClada Version 1.008. Software implementation. Published by the author. Ithaca, New York. Available from www.cladistics.com

EVALUATION OF THE GEOGRAPHIC STRUCTURE IN DENGUE VIRUS TYPE 1 FROM A PHYLOGENETIC AND BIOGEOGRAPHIC APPROACH

INTRODUCTION

Phylogenetic relationships among strains of dengue virus often show a strong structure associated with geography and time (Gray et al. 2011; Carvalho et al. 2010); however, geography seems to be the main component shaping these phylogenetic reconstructions. Likewise, global comparisons of lineages and their geographic locations have allowed isolates of the same serotype to be further classified into genotypes known as topotypes (Samuel and Knowles, 2001). However, because isolates in public databases are poorly georeferenced, inferences about geographic patterns are sometimes difficult and doubtful: the political divisions of countries can be biogeographically inadequate and lack detail. The aim of this work was therefore to assess the congruence between the geographic patterns found with phylogenetic and biogeographic approaches in dengue virus type 1 circulating in America.

METHODS

A phylogenetic analysis of 50 DENV-1 E gene sequences was performed by Bayesian inference using BEAST v1.6.2 (Drummond & Rambaut, 2007). A General Time Reversible model of nucleotide substitution (Rodriguez et al., 1990) with gamma-distributed rate variation and a proportion of invariable sites (GTR + G + I) was selected, and two runs of 4 chains were run for ten million generations. Sequences were sampled in American countries, including islands in the Atlantic and Pacific Oceans.

From the topology obtained (maximum clade credibility tree), the phylogeographic analysis identified possible genotypes according to five areas postulated intuitively on the basis of the geographic information contained in each clade. The criteria used were monophyletic clades and posterior probability values above 0.80. Results were contrasted with the subclusters found by Carvalho et al. (2010).

Finally, the geographic patterns were evaluated following the track compatibility method of Craw (1988a, 1989a). The areas used were those postulated in this work and the biotic components of Latin America and the Caribbean compiled by Morrone (2004), at the level of large regions and provinces.
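
To make the idea of a compatibility clique concrete, the sketch below assumes the pairwise rule usually attributed to Craw's method: two tracks are compatible when the areas of one are identical to, or nested within, the areas of the other. The track names, the areas A to D and the brute-force search are all hypothetical illustrations, not the procedure or data used in this study.

```python
from itertools import combinations


def compatible(a: frozenset, b: frozenset) -> bool:
    """Assumed rule: tracks are compatible if one is nested in (or equals) the other."""
    return a <= b or b <= a


def largest_cliques(tracks: dict) -> list:
    """Brute-force search for the largest sets of mutually compatible tracks."""
    names = sorted(tracks)
    for size in range(len(names), 0, -1):
        found = [
            group for group in combinations(names, size)
            if all(compatible(tracks[x], tracks[y])
                   for x, y in combinations(group, 2))
        ]
        if found:
            return found
    return []


# Hypothetical tracks, each given as the set of areas it occupies.
tracks = {
    "t1": frozenset({"A", "B", "C"}),
    "t2": frozenset({"A", "B"}),
    "t3": frozenset({"B", "C", "D"}),
    "t4": frozenset({"A"}),
}

print(largest_cliques(tracks))  # [('t1', 't2', 't4')]
```

The exhaustive search is only practical for a handful of tracks; it is used here because it makes the definition of a clique explicit.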

RESULTS AND DISCUSSION

The phylogenetic relationships among American sequences seem to be structured by geographic patterns. Accordingly, five areas were proposed: Pacific, Caribbean, southern South America, Central America and northern South America. These components were determined from the geographic information available for each viral isolate. Intuitively, Central America and northern South America were treated as independent units.

Figure 1. Maximum clade credibility tree from the Bayesian analysis of E gene sequences representing Latin American strains. Posterior probabilities are shown for key nodes.

The phylogeographic analysis showed the same pattern as the phylogenetic analysis; in addition, northern South America (SAN) and Central America (CA) were closely related. The strong geography-associated structure possibly indicates continuous viral movement between countries and in different directions. On the other hand, viral exchange seems to be limited and uneven among areas even when they are geographically close, as with the Caribbean and Central America.

Figure 2. Phylogeographic patterns between genotypes and postulated areas in Dengue virus type 1

The track compatibility analysis resulted in a clique (based on regions) representing a pattern that relates the Mexican transition area to the Neotropical region, which is congruent with the relationship between the SAN and CA areas in the phylogeographic analysis. This is probably due to the size of the areas, which include a higher proportion of distributions and of strains distributed in intermediate areas. The areas delimited as provinces by Morrone (2004) and the phylogeographic areas delimited here did not show compatible tracks.

Figure 3. Track compatibility analysis. a) Areas proposed in this study. Biotic components of Latin America and the Caribbean: b) provinces, c) regions.

CONCLUSION

Phylogenetic and biogeographic analyses of dengue virus can reflect a similar geographic pattern; however, it is necessary to know the level at which the two approaches are congruent. In this study, Central America and northern South America form a large unit that corresponds to the clique found in the track compatibility analysis, which supports the close relationship between the Mexican transition area and the Neotropical region. The use of geopolitical units to assess the geographic structure shaping the phylogenetic relationships of dengue is clearly not the most accurate choice, and dengue virus strains behave as a large dispersive population connecting large areas in America.

REFERENCES

Carvalho SE, Martin DP, Oliveira LM, Ribeiro BM, Nagata T (2010) Comparative analysis of American Dengue virus type 1 full-genome sequences. Virus Genes 40: 60–66.

CRAW, R. C. 1988. Continuing the synthesis between panbiogeography, phylogenetic systematics and geology as illustrated by empirical studies on the biogeography of New Zealand and the Chatham Islands. Systematic Zoology 37: 291-310.

CRAW, R. C. 1989a. New Zealand biogeography: A panbiogeographic approach. New Zealand Journal of Zoology 16: 527-547

Drummond AJ & Rambaut A (2007) "BEAST: Bayesian evolutionary analysis by sampling trees." BMC Evolutionary Biology 7, 214

Gray, R. R., Pybus, O. G. and Salemi, M. (2011), Measuring the temporal structure in serially sampled phylogenies. Methods in Ecology and Evolution. doi: 10.1111/j.2041-210X.2011.00102.x

Morrone, Juan J. 2004. Panbiogeografía, componentes bióticos y zonas de transición. Revista Brasileira de Entomologia 48(2): 149-162.

Samuel, A. R., Knowles, N. J. 2001. Foot-and-mouth disease type O viruses exhibit genetically and geographically distinct evolutionary lineages (topotypes). Journal of General Virology 74, 2281-2285.

Sunday, 22 May 2011

Measuring branch support

Gualdrón-Diaz J. C.

Once cladograms have been obtained, it is important to know how strong the evidence supporting a node is. Support can be interpreted in different ways (stability, confidence levels and reliability) and assessed by different methods; the most popular are resampling methods such as the bootstrap and the jackknife, and those linked to relative optimality values, such as Bremer support (Wheeler, 2010). Some terms must first be clearly distinguished. According to Goloboff et al. (2003) and Brower (2006, 2010), support and stability are logically different: support for a given branch in a tree is a measure of the net amount of evidence that favors the appearance of that branch in a most parsimonious topology, whereas stability is the persistence of a given branch in the face of the addition, deletion, or reweighting of characters, taxa, or both in the data matrix, as in bootstrap and jackknife approaches. Likewise, strong statistical assumptions are necessary to interpret jackknife or bootstrap values as confidence levels (Felsenstein, 1985). Another way to measure the support for individual branches of a cladogram is Bremer support, also referred to as the “decay index” (Bremer, 1994). It is measured by comparing the fit of the data to optimal and suboptimal trees. Two different aspects of group support can be measured in this way: absolute Bremer support (Bremer, 1994) estimates the amount of favorable evidence, and relative Bremer support (Goloboff and Farris, 2001) estimates the ratio between favorable and contradictory evidence (Goloboff et al., 2003). Both support and stability have proven particularly tricky to measure directly, owing to the complexity of character interactions in homoplastic data (Goloboff and Farris, 2001). Nevertheless, these measures serve as a means to discern plausible groups from dubious ones, and can guide the generation of additional data to refine and improve the hypothesis (Brower, 2006).

Jackknifing and bootstrapping sometimes produce incoherent results. Uninformative characters and characters irrelevant to the monophyly of a group can influence jackknife and bootstrap support values; to solve this, Farris et al. (1996) proposed assigning equal probabilities of deletion to individual characters. Similarly, Goloboff et al. (2003) suggest a Poisson-based sampling regime for bootstrapping that also alleviates this problem. One clear advantage of the jackknife over the bootstrap is that the values on branches are less affected when there are characters with homoplasy (Freudenstein and Davis, 2010). Another source of wrong conclusions about support, for both the jackknife and the bootstrap, arises when characters have different weights or costs, which produces under- or overestimates of the actual support (Goloboff et al., 2003). This influence of weights can be eliminated by symmetric resampling, in which the probability of increasing the weight of a character equals the probability of decreasing it (Goloboff et al., 2003). Together, these factors explain the differences in the error produced by the jackknife and the bootstrap.
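
The two resampling schemes mentioned above can be sketched as weight vectors over the characters of a matrix. This is a minimal illustration under stated assumptions: the function names are hypothetical, the jackknife deletion probability is set to e^-1 (about 0.37), and "increasing the weight" is modeled as doubling it, with a per-character change probability of 0.33. None of these choices is taken from the papers cited here.

```python
import math
import random


def jackknife_weights(n_chars, p_delete=math.exp(-1), rng=None):
    """Delete each character independently with the same probability."""
    rng = rng or random.Random()
    return [0 if rng.random() < p_delete else 1 for _ in range(n_chars)]


def symmetric_weights(n_chars, p=0.33, rng=None):
    """Delete (0) or upweight (2) each character with equal probability p."""
    if not 0 <= p <= 0.5:
        raise ValueError("p must lie between 0 and 0.5")
    rng = rng or random.Random()
    weights = []
    for _ in range(n_chars):
        u = rng.random()
        if u < p:
            weights.append(0)      # weight decreased (character removed)
        elif u < 2 * p:
            weights.append(2)      # weight increased (assumed: doubled)
        else:
            weights.append(1)      # weight unchanged
    return weights


rng = random.Random(1)             # fixed seed so the run is repeatable
print(jackknife_weights(10, rng=rng))
print(symmetric_weights(10, rng=rng))
```

Each weight vector defines one pseudoreplicate; a tree search on the reweighted matrix, repeated many times, gives the group frequencies reported as support.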

Rather than being an estimate based on pseudoreplicated subsamples of the data (like bootstrapping and jackknifing), Bremer support is a statistical parameter of a particular data set and thus does not depend on the data matching a particular assumed distribution. An advantage of Bremer support is that it never hits a maximum value (such as 100%) and continues to increase as character support for a particular branch in the tree accumulates (Brower, 2006). A defect of the method is that it does not always take into account the relative amounts of evidence contradictory and favorable to the group. This problem is diminished if the support for the group is calculated as the ratio between the amounts of favorable and contradictory evidence (Goloboff and Farris, 2001). This method is known as relative Bremer support; its potential advantages are that its values vary between 0 and 1 and that they provide an approximate measure of the amount of favorable relative to contradictory evidence. Under weighting methods, Bremer supports may be hard to interpret, but the relative supports for different weighting strengths are directly comparable (Goloboff and Farris, 2001). A disadvantage of relative supports is that the values for different pairs of trees must be calculated carefully.
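
A small worked example may help separate the two measures. The sketch assumes that absolute Bremer support is the length difference between the shortest tree lacking the group and the optimal tree, and that relative support takes the form (F - C) / F, with F the favorable and C the contradictory evidence, as the relative fit difference is usually written. The function names and all numbers are hypothetical.

```python
def bremer_support(length_without_group: float, optimal_length: float) -> float:
    """Extra steps needed to lose the group: suboptimal minus optimal length."""
    return length_without_group - optimal_length


def relative_support(favorable: float, contradictory: float) -> float:
    """Assumed form (F - C) / F; ranges from 0 to 1 when 0 <= C <= F."""
    if favorable <= 0 or not 0 <= contradictory <= favorable:
        raise ValueError("expected 0 <= contradictory <= favorable, favorable > 0")
    return (favorable - contradictory) / favorable


# Hypothetical group: the optimal tree has 120 steps, and the shortest tree
# without the group has 125 steps.
print(bremer_support(125, 120))    # 5

# Two hypothetical groups with the same absolute support (5) but different
# amounts of conflicting evidence.
print(relative_support(5, 0))      # 1.0
print(relative_support(8, 3))      # 0.625
```

The last two lines show why the two measures can vary independently: both groups have an absolute support of 5, but only the first is free of contradictory evidence.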

An important extension of Bremer support is Partitioned Branch Support (PBS), introduced by Baker and DeSalle (1997). The PBS value of a given data partition for a particular branch is determined by subtracting the length of the data partition on the MP tree(s) from the length of the data partition on the MP anticonstraint tree(s) for that branch (Brower, 2006). Thus, a given partition may contribute positively, be neutral, or conflict with the weight of the evidence that supports a particular branch in the combined analysis. PBS allows exploration of partition incongruence within a total evidence framework (Brower et al., 1996). This ability to localize incongruence to a single partition for a single branch has the potential to reveal interesting evolutionary processes, such as selection on a particular gene. Partitioning data is a potentially useful way to explore incongruence of signal among characters from different sources (Brower, 2006). PBS has the advantage that its values are calculated using the complete data matrix and can be obtained for any combination of partitions. One of the problems with PBS is that it is sensitive to missing data, and can shift dramatically among partitions as missing data are filled into the matrix (Brower, 2006). Much of the criticism of support measures focuses on their reanalysis of data subsets or partitions as though they were separate sources of evidence but, as Goloboff et al. (2003) have pointed out, no measure of clade quality yet developed is immune to certain conceivable cases.
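
The subtraction described above can be written out directly. In this sketch the partition names and tree lengths are hypothetical; the only rule taken from the text is that the PBS of a partition is its length on the anticonstraint tree minus its length on the MP tree.

```python
def partitioned_branch_support(length_on_mp: dict, length_on_anticonstraint: dict) -> dict:
    """PBS per partition: anticonstraint-tree length minus MP-tree length."""
    return {
        partition: length_on_anticonstraint[partition] - length_on_mp[partition]
        for partition in length_on_mp
    }


# Hypothetical lengths (in steps) of three partitions for one branch.
on_mp_tree = {"morphology": 40, "COI": 80, "28S": 60}
on_anticonstraint_tree = {"morphology": 43, "COI": 80, "28S": 59}

pbs = partitioned_branch_support(on_mp_tree, on_anticonstraint_tree)
print(pbs)                 # {'morphology': 3, 'COI': 0, '28S': -1}
print(sum(pbs.values()))   # 2
```

Here one partition supports the branch, one is neutral and one conflicts with it. Because the partition lengths add up to the total tree lengths, the PBS values sum to the Bremer support of the branch in the combined analysis (2 in this example).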

According to Brower (2006), there are no objective means to set a criterion for rejecting the support or stability of a particular branch in a particular cladogram. Moreover, support from the current data does not necessarily imply that a branch will be robust to the addition of taxa and characters: support today is no guarantee of stability in the future. For this reason, measurements that imply a confidence interval, like bootstrap values, are potentially misleading. By contrast, Bremer support, because it has no upper bound, is a more direct way to document the accumulation of character support for a particular branch as additional data are incorporated into a particular phylogenetic hypothesis (Brower, 2006).

References

R. H. Baker and R. DeSalle. Multiple sources of character information and the phylogeny of Hawaiian drosophilids. 1997.

Kare Bremer. Branch support and tree stability. Cladistics, 10(3):295–304, 1994. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1994.tb00179.x. URL http://dx.doi.org/10.1111/j.1096-0031.1994.tb00179.x.

A. V. Z. Brower, R. DeSalle, and A. Vogler. Gene trees, species trees, and systematics: A cladistic perspective. Annual Review of Ecology and Systematics, 27(1):423–450, 1996. doi: 10.1146/annurev.ecolsys.27.1.423. URL http://www.annualreviews.org/doi/abs/10.1146/annurev.ecolsys.27.1.423.

Andrew V. Z. Brower. The how and why of branch support and partitioned branch support, with a new index to assess partition incongruence. Cladistics, 22(4):378–386, 2006. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2006.00113.x. URL http://dx.doi.org/10.1111/j.1096-0031.2006.00113.x.

Andrew V. Z. Brower. Stability, replication, pseudoreplication, support and consensus a reply to brower. Cladistics, 26(1):112–113, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00319.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00319.x.

James S. Farris, Victor A. Albert, Mari Källersjö, Diana Lipscomb, and Arnold G. Kluge. Parsimony jackknifing outperforms neighbor-joining. Cladistics, 12(2):99–124, 1996. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1996.tb00196.x. URL http://dx.doi.org/10.1111/j.1096-0031.1996.tb00196.x.

Joseph Felsenstein. Confidence limits on phylogenies: An approach using the bootstrap. Evolution, 39(4):783–791, 1985. ISSN 00143820. doi: 10.2307/2408678. URL http://dx.doi.org/10.2307/2408678.

John V. Freudenstein and Jerrold I. Davis. Branch support via resampling: an empirical study. Cladistics, 26(6):643–656, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00304.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00304.x.

Pablo A. Goloboff and James S. Farris. Methods for quick consensus estimation. Cladistics, 17(1):S26–S34, 2001. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2001.tb00102.x. URL http://dx.doi.org/10.1111/j.1096-0031.2001.tb00102.x.

Pablo A. Goloboff, James S. Farris, Mari Källersjö, Bengt Oxelman, M. J. Ramirez, and Claudia A. Szumik. Improvements to resampling measures of group support. Cladistics, 19(4):324–332, 2003. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2003.tb00376.x. URL http://dx.doi.org/10.1111/j.1096-0031.2003.tb00376.x.

Ward C. Wheeler. Distinctions between optimal and expected support. Cladistics, 26(6):657–663, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00308.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00308.x.

Branch Support: confidence, stability, credibility?

By Susana Ortiz

One way of assessing whether a clade present in a phylogenetic reconstruction really is part of the true phylogeny is to evaluate its support, which may be established by estimating confidence intervals with resampling methods (bootstrap and jackknife), or with Bremer support, which uses the length difference between trees as a stability measure. These approaches are, however, not independent of the search strategy, given that they are sensitive to its effectiveness (Freudenstein and Davis, 2010). A highly supported clade is therefore not necessarily real; it may just be the kind of answer that fits the resources used (e.g. the search strategy). Posterior probabilities in Bayesian analyses have been used as a probabilistic measure of support (e.g. Goloboff et al., 2003; Pickett and Randle, 2005), because they quantify credibility: how likely a certain clade is to be correct, given the data, model and priors (Huelsenbeck et al., 2002). A comparison between Bayesian posterior probabilities and nonparametric bootstrapping was proposed by Efron et al. (1996), in which the bootstrap confidence level can be thought of as an assessment of error for the estimated tree. However, posterior probabilities are sensitive to the prior for internal branch lengths (Yang and Rannala, 2005), and are significantly higher than the corresponding nonparametric bootstrap frequencies when the models used for analysis are underparameterized (Goloboff et al., 2003). Although there have been several attempts to reconcile the different approaches under certain conditions, these approaches are not freely applicable under all phylogenetic criteria, given restrictions that are not only methodological but also conceptual.

The bootstrap and the jackknife are techniques that resample the original data to infer the variability of the estimate, in this case the phylogeny. The variation among trees provides an adequate indication of the uncertainty (Felsenstein, 1985). The bootstrap has also been proposed as a tool to assess robustness with regard to small changes in the data (Holmes, 2003); it is not a test of how accurate a topology is, but it provides information about its stability and helps assess whether the data are adequate to validate the topology (Berry and Gascuel, 1996). As for repeatability, unless the data set is perfectly Hennigian (Felsenstein, 1985), variation between replicates is expected, so one might think that more replicates would mean greater precision about which groups are monophyletic. According to Pattengale et al. (2009), however, a rather small number of bootstrap replicates (typically 100–500) produces support values that correlate at better than 99.5% with the reference values on the best ML trees.

That said, the stopping criteria can recommend very different numbers of replicates for different datasets of comparable size. Likewise, the above does not mean that a clade is or is not monophyletic depending on its support; support only indicates the certainty with which a particular node can be found in the topology. If a node is not in the bootstrap consensus, this could mean there is a polytomy due to multiple resolutions of the node, perhaps caused by incongruence between characters. Mort et al. (2000) compared the bootstrap and the jackknife; their findings show the relation between bootstrap values and the deletion proportion chosen for the jackknife. In favor of the jackknife, it has been proposed as a rapid and efficient method to identify strongly supported clades (Farris et al. 1996), and assigning equal deletion probabilities to characters reduces the problem of competition between informative and noninformative characters (Freudenstein and Davis, 2010).
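
The resampling step and the resulting support values can be sketched as follows. The tree search itself is left out: the replicate trees are supplied by hand as sets of clades, and the matrix, taxon names and function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_columns(matrix: dict, rng: random.Random) -> dict:
    """Resample characters (columns) with replacement, keeping the matrix width."""
    n_chars = len(next(iter(matrix.values())))
    picked = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[i] for i in picked] for taxon, states in matrix.items()}


def clade_frequencies(replicate_trees: list) -> dict:
    """Proportion of replicate trees in which each clade appears."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    return {clade: n / len(replicate_trees) for clade, n in counts.items()}


# Hypothetical 4-taxon, 4-character matrix and one pseudoreplicate of it.
matrix = {"A": [0, 1, 1, 0], "B": [0, 1, 0, 0], "C": [1, 0, 0, 1], "D": [1, 0, 1, 1]}
pseudoreplicate = bootstrap_columns(matrix, random.Random(0))

# Hypothetical result of searching four pseudoreplicates: each tree is the
# set of clades it contains.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
replicates = [{AB, CD}, {AB, CD}, {AB}, {AC}]

support = clade_frequencies(replicates)
print(support[AB], support[CD], support[AC])  # 0.75 0.5 0.25
```

A jackknife version would differ only in the resampling function, which would drop columns with a fixed probability instead of drawing them with replacement.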

Bremer support (Bremer, 1984) is another alternative for measuring support, although only under the parsimony criterion. This method measures the difference between the most parsimonious cladogram and the best suboptimal cladogram that lacks the clade of interest (Grant and Kluge, 2008). A strongly supported branch therefore means a large increase in the length of the suboptimal trees. Absolute (Bremer, 1984) and relative Bremer support (Goloboff and Farris, 2001) are variants that differ in the type of evidence they take into account. The first measures the absolute amount of favorable evidence, and the second the ratio between evidence favorable and contradictory to the group; they represent two aspects of support that can vary independently (Goloboff et al., 2003). Bremer support has been interpreted as a stability measure, and so as free of the problems that affect resampling, such as the influence of autapomorphies and lower frequencies for better supported groups. However, objections have been raised to this view, since stability depends on the specific scenario. As Goloboff et al. (2003) noted, “a group stable under additions of characters may be very unstable under addition of taxa or under recoding of some characters”, whereas Bremer support is based only on the available evidence.

Homoplasy is another factor affecting the estimation of support: clades delimited by “unique and unreversed” or relatively less homoplastic character states are often considered more strongly supported (Grant and Kluge, 2008), although not all support approaches are equally sensitive. According to Freudenstein and Davis (2010), the values on branches not affected by homoplasy are slightly higher for the bootstrap than for the jackknife, but the addition of homoplastic characters caused support on branches affected by homoplasy to drop substantially more as measured by the bootstrap than as measured by the jackknife. Bremer support differs in that it takes the distribution of homoplasy into account (Sanderson, 1995). Incongruence between characters, the proportion of homoplastic versus homologous characters, additivity, and character weighting (in the bootstrap) are key topics in the evaluation of support. The number of nonhomoplastic synapomorphies supporting a clade provides a numerical estimate of the support for a hypothesis, but it may not provide evidence that favors that hypothesis over some alternative (Wilkinson et al. 2003). I agree with Grant and Kluge (2003) that support measures do not test phylogenetic hypotheses; they evaluate the relative degree or strength of evidence.

References

- Berry, V. and Gascuel, O. (1996). On the interpretation of bootstrap trees: Appropriate threshold of clade selection and induced gain. Molecular Biology and Evolution 13 999–1011.
- Efron B., Halloran E., and Holmes S. 1996. Bootstrap confidence levels for phylogenetic trees. Recherche, 93(14):7085–7090.
- Erixon, P B. Svennblad, T. Britton y B. Oxelman. 2003. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylo- genetics. Systematic Biology 52: 665-673
- Farris, J.S., 1996. Jac. Computer Program Distributed by the Author. Moleky-larsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm, Sweden.
- Felsestein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
- Freudenstein J. V., Davis J., I. 2010 Branch support via resampling: an empirical study. Cladistics, 26:1–14.
- Goloboff P. A., Farris J. S., K Mari, J Ram, and C. A. Szumik. Cladistics Improvements to resampling measures of group support. Cladistics, 19:324–332, 2003. doi: 10.1016/S0748-3007(03)00060-4.
- Grant, T., Kluge, A. G. 2003. Data exploration in phylogenetic inference: scientific, heuristic, or neither. Cladistics 19, 379–418.
- Grant, T., Kluge, A. G. 2008. Clade support measures and their adequacy. Cladistics, 24:1051–1064.
- Holmes S. 2003. Bootstrapping phylogenetic trees: theory and methods. Statistical Science, 18(2):241–255.
- Huelsenbeck, J. P., B. Larget, R. E. Miller, and F. Ronquist. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673–688.
- Mort, M.E., Soltis, P.S., Soltis, D.E., Mabry, M.L., 2000. Comparison of three methods for estimating internal support on phylogenetic trees. Syst. Biol. 49, 160–171.
- Pattengale N. D., Alipour M., Bininda-Emonds O. R. P., Moret B. M. E., and Stamatakis A. 2009. How many bootstrap replicates are necessary? Lecture Notes in Computer Science 5541:184–200.
- Pickett, K. M., Randle, C. P. 2005. Strange Bayes indeed: uniform topological priors imply non-uniform clade priors. Molecular Phylogenetics and Evolution 34: 203–211.
- Sanderson, M.J., 1995. Objections to bootstrapping: a critique. Syst. Biol. 44, 299–320.
- Wilkinson, M., Lapointe, F.-J., Gower, D.J., 2003. Branch lengths and support. Syst. Biol. 52, 127–130.
- Yang Z., Rannala B. 2005. Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol 54(3), 455-70.

Friday, April 22, 2011

Bayesian Inference in Phylogenetic Analysis

The rise of Bayesian methods in phylogenetic inference over the last two decades is the result of the implementation of Markov chain Monte Carlo (MCMC) algorithms, such as the Metropolis-Hastings algorithm, for the estimation of posterior probability distributions and the exploration of more parameter-rich evolutionary models (Nylander et al., 2004). Statistically, Bayesian inference calculates the posterior probability of a hypothesis, that is, the probability that the hypothesis is true given the data, from the prior probabilities and the likelihood of the data under each hypothesis. Here probability is used to represent uncertainty about the phylogeny and the parameters of the model, and not the expected frequency of occurrence as in classical or frequentist statistics (Yang, 2006). From a philosophical standpoint, the Bayesian and Maximum Likelihood approaches are inference methods based on calculating the probabilities of evolutionary transformations of characters (for example, according to a nucleotide substitution model) instead of evaluating possible synapomorphies. In addition, any homologous characters (apomorphic and plesiomorphic) can be used for inferring phylogenies, and these approaches assume that evolution may be reticulated and not always dichotomous (Lukhtanov, 2010).
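The calculation described above can be sketched in a few lines. This is a minimal illustration of Bayes' theorem over a discrete set of candidate topologies, not the procedure of any particular program; the three tree labels, the flat prior and the log-likelihood values are invented for the example.

```python
import math

def posterior_probabilities(log_likelihoods, priors):
    """Posterior probability of each hypothesis via Bayes' theorem.

    log_likelihoods: dict mapping hypothesis -> log P(data | hypothesis)
    priors:          dict mapping hypothesis -> P(hypothesis), summing to 1
    """
    # Work in log space and subtract the maximum to avoid numerical underflow.
    log_joint = {h: log_likelihoods[h] + math.log(priors[h]) for h in priors}
    max_log = max(log_joint.values())
    unnormalized = {h: math.exp(v - max_log) for h, v in log_joint.items()}
    total = sum(unnormalized.values())  # proportional to P(data)
    return {h: v / total for h, v in unnormalized.items()}

# Hypothetical example: three rooted topologies for taxa A, B and C.
log_lik = {"((A,B),C)": -1000.0, "((A,C),B)": -1002.0, "((B,C),A)": -1003.0}
flat_prior = {tree: 1.0 / 3.0 for tree in log_lik}
posterior = posterior_probabilities(log_lik, flat_prior)

for tree, prob in posterior.items():
    print(f"{tree}: {prob:.3f}")
```

With a flat prior the posterior is simply the normalized likelihood; replacing `flat_prior` with unequal values shows how the prior shifts the result.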

At present, six key topics can be identified in the Bayesian approach to phylogenetic reconstruction: (1) the integration of complex evolutionary models, (2) heterogeneity across sites, (3) heterogeneity across data partitions in analyses of combined data (Huelsenbeck et al., 2001), (4) computational efficiency (Ronquist & Huelsenbeck, 2003), (5) the easy interpretation of the posterior probability of a tree or clade (Yang, 2006) and (6) the incorporation of prior values. The development of Bayesian MCMC algorithms is the cause of the increase in computational efficiency, which makes it possible to analyze more complex and realistic evolutionary models. This does not mean that parameter-rich models are appropriate for every data set: overparameterization is a risk, and more complex evolutionary models are associated with more topological uncertainty (Nylander et al., 2004). What is interesting, however, is that it opens up the possibility of exploring more realistic and complex models (although a realistic model can also be simple) that recognize heterogeneity across sites and across combined data.

In terms of computational efficiency, Maximum Likelihood analysis has gained ground with the development and improvement of algorithms implemented in software such as PhyML (Guindon and Gascuel, 2003) and RAxML (Stamatakis et al., 2007), and it is now possible to perform moderately fast and accurate bootstrapping to assess confidence. In Bayesian inference, however, the posterior probability is easier to interpret: it is the probability that the tree or clade is correct given the data, model and priors (Yang, 2006). The interpretation of the bootstrap, although it tends to be more conservative, has been controversial, especially under model misspecification and when the signal is detectable only at some sites (Ronquist and Deans, 2009). Nevertheless, according to Yang (2006), Bayesian posterior probabilities can be spuriously high because of lack of convergence, poor mixing, misspecification of the substitution model used in the likelihood, and misspecification of, or sensitivity to, the priors. Regarding priors, the Bayesian approach allows prior knowledge about a particular hypothesis to be incorporated, or vague or uninformative priors to be used when little information is available or when the analysis should not be built on any previous result (Ronquist et al., 2009). Since priors can be subjective, it is advisable to assess their influence on the posterior probability. Pickett and Randle (2005) pointed out that with a uniform prior on topologies the prior probability of a clade, and hence its posterior probability, depends on both the size of the clade and the number of species.
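The last point can be checked by counting trees. The sketch below assumes rooted, binary, labelled topologies (an assumption made here for simplicity; the function names are hypothetical) and uses the standard count of (2n - 3)!! such trees for n taxa. Under a uniform prior on topologies, the prior of one specific clade is the fraction of trees that contain it.

```python
from fractions import Fraction

def num_rooted_trees(n_taxa):
    """Number of rooted, binary, labelled trees: (2n - 3)!! (equal to 1 for n <= 2)."""
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count

def clade_prior(n_taxa, clade_size):
    """Prior probability of one specific clade under a uniform prior on rooted topologies."""
    if not 2 <= clade_size < n_taxa:
        raise ValueError("clade_size must satisfy 2 <= clade_size < n_taxa")
    # Trees containing the clade: resolve the clade internally, then treat it
    # as a single tip among the remaining taxa.
    with_clade = num_rooted_trees(clade_size) * num_rooted_trees(n_taxa - clade_size + 1)
    return Fraction(with_clade, num_rooted_trees(n_taxa))

# Clade priors for every clade size in a hypothetical 12-taxon analysis.
for size in range(2, 12):
    print(size, float(clade_prior(12, size)))
```

Although every topology has the same prior, clades of different sizes do not: in this sketch the smallest and largest clades receive the highest prior probability and intermediate sizes the lowest.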

Lack of convergence and poor mixing in the Bayesian approach lead to spuriously high support for the trees visited by the chain (Yang & Rannala, 2005); running multiple long chains from different starting points is a possible solution when the data set is very large. However, Bayesian inference faces another problem: inconsistency. Under the Bayesian criterion, tree topologies are estimated without estimating branch lengths; instead, the branch lengths of a given topology are integrated over a distribution of possible values (Goloboff & Pol, 2005). As a result, statistical consistency can be lost, and Bayesian inference is biased in favor of topologies that group long branches together, even when the true model and the prior distributions of the evolutionary parameters over a group of phylogenies are known. According to Kolaczkowski and Thornton (2009), this bias becomes more severe as more data are analyzed and as sequence sites evolve heterogeneously, and it is relatively weak when the true model is simple. Bayesian inference is therefore less efficient and less robust to the use of an incorrect evolutionary model than ML. Despite the above, Bayesian MCMC inference is a promising approach to phylogenetic analysis and computational biology that continues to advance with the development of algorithms, methods and software. More research is needed, however, on how to incorporate previous knowledge into appropriate informative priors and on how to deal with complex models and large data sets without falling into inconsistency.
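The idea of running several chains from different starting points can be illustrated with a toy Metropolis sampler. This is only a sketch under strong simplifications: the state space is three hypothetical topologies (`T1`, `T2`, `T3`) with invented unnormalized posterior weights, the proposal is uniform over all states, and the "diagnostic" is just the largest difference in sampled frequencies between chains. Real phylogenetic MCMC proposes changes to topologies, branch lengths and model parameters.

```python
import random
from collections import Counter

def metropolis_chain(weights, start, n_steps, rng):
    """Sample states with probability proportional to `weights` (Metropolis rule)."""
    states = list(weights)
    current = start
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(states)  # symmetric proposal over all states
        # Accept with probability min(1, weight(proposal) / weight(current)).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        visits[current] += 1
    return {state: visits[state] / n_steps for state in states}

# Invented unnormalized posterior weights (target frequencies 0.6, 0.3, 0.1).
weights = {"T1": 6.0, "T2": 3.0, "T3": 1.0}
rng = random.Random(2011)  # fixed seed so the sketch is reproducible

# One chain per starting topology.
chains = [metropolis_chain(weights, start, 20000, rng) for start in weights]

# Crude convergence check: chains started from different points should agree.
max_discrepancy = max(
    abs(a[state] - b[state]) for a in chains for b in chains for state in weights
)
print(chains)
print("largest between-chain difference:", max_discrepancy)
```

If the chains disagree markedly, the posterior frequencies from any single chain should not be trusted, which is the situation in which support can appear spuriously high.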

References

Goloboff P., Pol D. 2005. Parsimony and Bayesian phylogenetics. In: Albert V. (ed). Parsimony, phylogeny, and genomics. Oxford University Press. 210-266.

Guindon S., Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 52(5):696-704.

Huelsenbeck J. P., Ronquist F., Nielsen R., Bollback J. P. 2001. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science. 294 (5550) : 2310-2314

Kolaczkowski B., Thornton J. W. 2009. Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics. PLoS One. 4:e7891. doi:10.1371/journal.pone.0007891

Lukhtanov V. A. 2010. From Haeckel’s Phylogenetics and Hennig’s Cladistics to the Method of Maximum Likelihood: Advantages and Limitations of Modern and Traditional Approaches to Phylogeny Reconstruction. Entomological Review. Vol. 90(3) : 299-310

Nylander J. A. A., Ronquist F., Huelsenbeck J. P., Nieves-Aldrey J. L. 2004. Bayesian Phylogenetic Analysis of Combined Data. Syst Biol. 53(1) : 47-67

Pickett K. M., Randle C. P. 2005. Strange Bayes indeed: Uniform topological priors imply non-uniform clade priors. Mol. Phyl. Evol. 34:203-211

Ronquist F., Deans A. R. 2009. Bayesian Phylogenetics and Its Influence on Insect Systematics. Systematic Entomology 35 (3): 349-378.

Ronquist F., van der Mark P., Huelsenbeck J. P. 2009. Bayesian phylogenetic analysis using MrBayes. In: Lemey P., Salemi M., and Vandamme A.-M. (eds.) The Phylogenetic Handbook: a Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Cambridge University Press. 219-236.

Stamatakis, A., Blagojevic, F., Nikolopoulos, D., Antonopoulos, C. 2007 Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell. The Journal of VLSI Signal Processing. 48 : 271–286

Yang Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford, England

Yang, Z., and B. Rannala. 2005. Branch-length prior influences Bayesian posterior probability of phylogeny. Systematic Biology 54: 455-470.

Tuesday, February 22, 2011

MAXIMUM LIKELIHOOD AND MAXIMUM PARSIMONY UNDER A SIMPLE MODEL.

Jiménez-Silva C. L.
Universidad Industrial de Santander.
Laboratorio de Sistemática y Biogeografía

INTRODUCTION

Stochastic models for nucleotide substitution are becoming increasingly important as a foundation for inferring phylogenetic trees from genetic sequence data. Such models allow for tree reconstruction through either maximum likelihood-based approaches or the fitting of transformed functions of the data to trees (see Swofford et al., 1996, for a survey). The models are also useful for analysing the performance of other, more conventional tree reconstruction methods that are not explicitly based on such models, such as the popular maximum parsimony method (Fitch, 1971). Maximum parsimony (MP) is a popular technique for phylogeny reconstruction. However, MP is often criticized as being a statistically unsound method and one that fails to make explicit an underlying ‘‘model’’ of evolution (Steel and Penny, 2000). Parsimony does not make explicit assumptions about the evolutionary process. Some authors argue that parsimony makes no assumptions at all and that, furthermore, phylogenies should ideally be inferred without invoking any assumptions about the evolutionary process (Wiley 1981). Others point out that it is impossible to make any inference without a model; that a lack of explicit assumptions does not mean that the method is ‘assumption-free’, as the assumptions may be merely implicit; and that the requirement for explicit specification of the assumed model is a strength rather than a weakness of the model-based approach, since the fit of the model to data can then be evaluated and improved (e.g. Felsenstein 1973).

METHODS

Sequences of 1000 bp were simulated under the JC69 model with the Seq-Gen v1.3.2 program (Rambaut & Grassly, 1997), assuming that all branches of the tree have the same length. Ten replicates of each simulation were generated for topologies of 3, 4, 6 and 12 taxa. The simulated sequences were analyzed under parsimony with the TNT program (Goloboff et al., 2008) and, for 3 taxa, with Winclada and NONA (Goloboff, 1999; Nixon, 1999). The sequences were also analyzed under maximum likelihood with PhyML (Guindon & Gascuel, 2005), in order to check for equivalence between parsimony and likelihood under a particular model, here JC69, which was the only nucleotide model evaluated in PhyML. The same procedure was applied to each simulation, and the resulting topologies were finally compared with the Tree C program (Arias & Miranda-Esquivel), counting nodes as shared only when they were exactly equal. In total, 200 comparisons were made, and the process was automated with bash scripts.
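To make the simulation step concrete, the following sketch evolves sequences under JC69 on a three-taxon tree with equal branch lengths. It is not Seq-Gen and does not reproduce its options; the function names, the tree ((A,B),C) and the branch length of 0.1 expected substitutions per site are assumptions chosen for illustration. Under JC69 the probability that a site shows the same base at both ends of a branch of length t is 1/4 + 3/4 exp(-4t/3).

```python
import math
import random

NUCLEOTIDES = "ACGT"

def jc69_same_probability(branch_length):
    """P(same base at both ends of a branch); length in expected substitutions per site."""
    return 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)

def evolve(sequence, branch_length, rng):
    """Evolve a sequence along one branch under JC69 (all changes equally likely)."""
    p_same = jc69_same_probability(branch_length)
    evolved = []
    for base in sequence:
        if rng.random() < p_same:
            evolved.append(base)
        else:
            evolved.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(evolved)

def simulate_three_taxa(n_sites, branch_length, rng):
    """Simulate an alignment on the rooted tree ((A,B),C) with equal branch lengths."""
    root = "".join(rng.choice(NUCLEOTIDES) for _ in range(n_sites))
    internal = evolve(root, branch_length, rng)  # ancestor of A and B
    return {
        "A": evolve(internal, branch_length, rng),
        "B": evolve(internal, branch_length, rng),
        "C": evolve(root, branch_length, rng),
    }

rng = random.Random(69)  # fixed seed for reproducibility
# Ten replicates of 1000 sites, mirroring the design described above.
replicates = [simulate_three_taxa(1000, 0.1, rng) for _ in range(10)]
print(replicates[0]["A"][:60])
```

Each replicate could then be written to a file and passed to the parsimony and likelihood programs; that step is omitted here because it depends on the programs' input formats.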

RESULTS

When I compared the phylogenetic reconstructions, the data showed equivalence between parsimony and likelihood under the JC69 model. Equivalence here means that the most parsimonious tree and the ML tree under that model are identical in every possible data set. However, this result was obtained only for the data sets with few terminals, i.e., 3 and 4 taxa.
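The notion of identical trees and equal nodes can be made explicit by comparing the sets of clades of two topologies. The sketch below is not the algorithm of the Tree C program; it assumes rooted trees written as nested tuples of taxon names, and the two example trees are invented.

```python
def clades(tree):
    """Return (leaf_set, clade_set) for a rooted tree given as nested tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade defined by this internal node
    return leaves, found

def shared_nodes(tree_a, tree_b):
    """Clades present in both trees, excluding the trivial clade of all taxa."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return (clades_a & clades_b) - {leaves_a}

# Hypothetical parsimony and likelihood trees for four taxa.
parsimony_tree = ((("A", "B"), "C"), "D")
likelihood_tree = ((("A", "B"), "D"), "C")

common = shared_nodes(parsimony_tree, likelihood_tree)
print(len(common), "shared node(s):", [sorted(c) for c in common])
```

Two trees are identical in this sense when every non-trivial clade of one is also found in the other.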

The JC model was assumed for the comparison because it is the fully symmetric model: it makes no distinction between any of the character states. Each sequence was 1000 nt long. As a first approximation, there is no selection at any of the sites, and therefore it is more ‘‘parsimonious’’ to assume one common mechanism for all sites rather than 1000 different mechanisms, one for each site.

The parsimony and likelihood trees were obtained under the JC model, sometimes called the Neyman model with four states. The model considered here assumes that the rate of evolution on each branch of the tree can vary freely from site to site. In this case, there are some underlying constraints on the type of substitution model (i.e., Jukes-Cantor type), but no constraints on the edge parameters from site to site. This is even more general than the type of approach considered by Olsen (see Swofford et al. 1996, p. 443), in which the rate at which a site evolves can vary freely from site to site but the ratios of the edge lengths are equal across the sites (Steel and Penny, 2000). In the methodology used in this work, a free-parameter model was assumed. For this purpose, the custom model option was selected in PhyML. With the custom model option, it is also possible to give the program a user-defined equilibrium nucleotide frequency distribution, with the parameters calculated from the data. On this basis, it is proposed that this type of underlying model is almost certainly too flexible, because it allows many new parameters for each edge. It might be regarded as the model one might start with if one knew virtually nothing about any common underlying mechanism linking the evolution of different characters on a tree (Steel and Penny, 2000).

For the data sets of 6 and 12 taxa the results were different: the trees obtained by the two methods shared fewer than three equal nodes. This difference in the number of equal nodes as a function of the number of terminals is in agreement with Hendy and Penny (1989), who showed that with four species and binary characters evolving under the clock, parsimony is always consistent in recovering the tree, although ML and parsimony do not appear to be equivalent under the model. With five or more species evolving under the clock, it is known that parsimony can be inconsistent in estimating the tree (Hendy and Penny 1989; Zharkikh and Li 1993). Thus, it is not equivalent to likelihood.

Although parsimony and likelihood can be considered equivalent under the JC69 model, the studies supporting this have often used small trees with three to six taxa. The cases for much larger trees are not known. However, it appears easier to identify cases of inconsistency of parsimony on large trees than on small trees (Kim 1996; Huelsenbeck and Lander 2003), suggesting that likelihood and parsimony are in general not equivalent on large trees.

REFERENCES

Arias J. S., Miranda-Esquivel D. M. 2007. Tree C.

W. M. Fitch. Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Zoology, 20:406–416, 1971.

D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis. Phylogenetic inference. In D. M. Hillis, C. Moritz, and B. K. Mable, editors, Molecular Systematics, chapter 11, pages 407–514. Sinauer Associates, 2nd edition, 1996.

Goloboff, P., 1999. NONA (No Name) ver. 2. Published by the author, Tucuman, Argentina

Felsenstein, J. 1973b. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst. Zool. 22:240–249.

Hendy, M. D. and Penny, D. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309.

Huelsenbeck, J. P. and Lander, K. M. 2003. Frequent inconsistency of parsimony under a simple model of cladogenesis. Syst Biol 52:641–648.

Kim, J. 1996. General inconsistency conditions for maximum parsimony: effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363–374.

Nixon, K. C. 1999. Winclada (BETA) ver. 0.9.9. Published by the author, Ithaca, NY.

M. Steel and D. Penny. Parsimony, likelihood and the role of models in molecular phylogenetics. Molecular Biology and Evolution 17: 839–850 (2000).

Wiley, E. O. 1981. Phylogenetics. The Theory and Practice of Phylogenetic Systematics. John Wiley & Sons, New York.

Zharkikh, A. and Li, W.-H. 1993. Inconsistency of the maximum parsimony method: the case of five taxa with a molecular clock. Syst. Biol. 42:113–125.