Sunday, 22 May 2011

Measuring branch support

Gualdrón-Diaz J. C.

Once cladograms have been obtained, it is important to know how strong the evidence supporting each node is. Support can be interpreted in different ways (stability, confidence levels, and reliability), and there are different methods to assess it; the most popular are resampling methods, such as the bootstrap and the jackknife, and those linked to relative optimality values, such as Bremer support (Wheeler, 2010). Some terms must first be clearly distinguished. According to Goloboff et al. (2003) and Brower (2006, 2010), support and stability are logically different: support for a given branch in a tree is a measure of the net amount of evidence that favors the appearance of that branch in a most parsimonious topology, whereas stability is the persistence of a given branch in the face of the addition, deletion, or reweighting of characters, taxa, or both in the data matrix, as in bootstrap and jackknife approaches. Likewise, strong statistical assumptions are necessary to interpret jackknife or bootstrap values as confidence levels (Felsenstein, 1985). Another way to measure the support for individual branches of a cladogram is Bremer support, also referred to as the “decay index” (Bremer, 1994). It is measured by comparing the fit of the data to optimal and suboptimal trees, and it can capture two different aspects of group support: absolute Bremer support (Bremer, 1994) estimates the amount of favorable evidence, and relative Bremer support (Goloboff and Farris, 2001) estimates the ratio between favorable and contradictory evidence (Goloboff et al., 2003). Both support and stability are attributes that have proven particularly tricky to measure directly, owing to the complexity of character interactions in homoplastic data (Goloboff and Farris, 2001). Nevertheless, these measures serve as a means to discern groups that are plausible from those that are dubious, and can act as a guide to the generation of additional data to refine and improve the hypothesis (Brower, 2006).

Jackknifing and bootstrapping sometimes produce incoherent results. Uninformative characters and characters irrelevant to the monophyly of a group can influence jackknife and bootstrap support values; to solve this, Farris et al. (1996) proposed assigning equal probabilities of deletion to individual characters. Similarly, Goloboff et al. (2003) suggest a Poisson-based sampling regime for bootstrapping that also alleviates this problem. One clear advantage of the jackknife over the bootstrap is that its branch values are less affected by homoplastic characters (Freudenstein and Davis, 2010). Jackknife and bootstrap values can also be misleading when characters have different weights or costs, which produces either under- or overestimates of the actual support (Goloboff et al., 2003). This influence of weights can be eliminated by symmetric resampling, in which the probability of increasing the weight of a character equals the probability of decreasing it (Goloboff et al., 2003). Together, these factors explain the differences in the errors produced by the jackknife and the bootstrap.
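The resampling regimes mentioned above differ only in how each pseudoreplicate assigns weights to the characters. The following Python sketch is an illustration, not the implementation of any particular program: the function names are hypothetical, the default deletion probability of 1/e and the change probability of 0.33 are illustrative values, and treating "increase the weight" as doubling and "decrease the weight" as deletion is an assumption made here to keep the example concrete.

```python
import math
import random


def bootstrap_weights(n_chars, rng):
    """Standard bootstrap: draw n_chars characters with replacement.

    Returns how many times each character was drawn; counts sum to n_chars.
    """
    weights = [0] * n_chars
    for _ in range(n_chars):
        weights[rng.randrange(n_chars)] += 1
    return weights


def poisson_weights(n_chars, rng, mean=1.0):
    """Poisson-based bootstrap: each character gets an independent Poisson count."""
    threshold = math.exp(-mean)
    weights = []
    for _ in range(n_chars):
        count, product = 0, rng.random()
        # Knuth's method: multiply uniforms until the product drops to e^-mean.
        while product > threshold:
            count += 1
            product *= rng.random()
        weights.append(count)
    return weights


def jackknife_weights(n_chars, rng, p_delete=math.exp(-1)):
    """Jackknife: each character is deleted independently with probability p_delete."""
    return [0 if rng.random() < p_delete else 1 for _ in range(n_chars)]


def symmetric_weights(prior_weights, rng, p_change=0.33):
    """Symmetric resampling: up-weighting and down-weighting are equally likely.

    Assumption for this sketch: up-weighting doubles the prior weight and
    down-weighting deletes the character (weight 0).
    """
    if not 0.0 <= p_change <= 0.5:
        raise ValueError("p_change must lie between 0 and 0.5")
    resampled = []
    for weight in prior_weights:
        draw = rng.random()
        if draw < p_change:
            resampled.append(2 * weight)   # weight increased
        elif draw < 2 * p_change:
            resampled.append(0)            # weight decreased (deleted)
        else:
            resampled.append(weight)       # weight unchanged
    return resampled


if __name__ == "__main__":
    rng = random.Random(2011)
    print(bootstrap_weights(10, rng))
    print(poisson_weights(10, rng))
    print(jackknife_weights(10, rng))
    print(symmetric_weights([1, 1, 2, 2, 3], rng))
```

Each weight vector would then be passed to a tree search, and the frequency with which a group appears across pseudoreplicates is reported as its resampling value. Note that in the jackknife, Poisson, and symmetric schemes each character is treated independently of the others, whereas in the standard bootstrap the draws compete for a fixed total of n_chars.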


Bremer support, rather than being an estimate based on pseudoreplicated subsamples of the data (like bootstrapping and jackknifing), is a statistical parameter of a particular data set and thus does not depend on the data matching a particular assumed distribution. An advantage of Bremer support is that it never hits a maximum value (such as 100%) and continues to increase as character support for a particular branch in the tree accumulates (Brower, 2006). A defect of the method is that it does not always take into account the relative amounts of evidence contradictory and favorable to the group. This problem is diminished if the support for the group is calculated as the ratio between the amounts of favorable and contradictory evidence (Goloboff and Farris, 2001). This method is known as relative Bremer support; its potential advantages are that its values vary between 0 and 1 and that they provide an approximate measure of the balance between favorable and contradictory evidence. Under weighting methods, Bremer supports may be hard to interpret, but the relative supports for different weighting strengths are directly comparable (Goloboff and Farris, 2001). A disadvantage of relative supports is that the values must be calculated carefully across different pairs of trees.
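As a worked illustration of the two quantities, the Python sketch below takes the per-character lengths on a most parsimonious tree that contains the group and on the best tree that lacks it. The function names and the numbers are hypothetical, and the relative value is computed as (F - C) / F, which is one formulation that stays between 0 and 1; whether this matches the exact definition in Goloboff and Farris (2001) is an assumption here and should be checked against that paper.

```python
def favorable_and_contradictory(lengths_with_group, lengths_without_group):
    """Split per-character length differences into favorable (F) and contradictory (C).

    A character favors the group if it needs more steps on the tree lacking
    the group, and contradicts the group if it needs fewer steps there.
    """
    favorable = 0
    contradictory = 0
    for with_group, without_group in zip(lengths_with_group, lengths_without_group):
        difference = without_group - with_group
        if difference > 0:
            favorable += difference
        else:
            contradictory -= difference
    return favorable, contradictory


def bremer_support(favorable, contradictory):
    """Absolute Bremer support: extra steps required to lose the group (F - C)."""
    return favorable - contradictory


def relative_bremer_support(favorable, contradictory):
    """Relative support as (F - C) / F; assumed formulation, see the text above."""
    if favorable == 0:
        return 0.0  # convention adopted in this sketch: no favorable evidence
    return (favorable - contradictory) / favorable


if __name__ == "__main__":
    # Hypothetical step counts for five characters.
    with_group = [1, 1, 2, 3, 2]       # lengths on the most parsimonious tree
    without_group = [2, 2, 2, 2, 3]    # lengths on the best tree lacking the group
    f, c = favorable_and_contradictory(with_group, without_group)
    print(f, c, bremer_support(f, c), relative_bremer_support(f, c))
```

In this made-up case three steps favor the group and one contradicts it, so the absolute support is 2 while the relative support is 2/3. A second group with F = 20 and C = 18 would have the same absolute support of 2 but a relative support of only 0.1, which is the distinction the relative measure is meant to expose.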

An important extension of Bremer support is Partitioned Branch Support (PBS), introduced by Baker and DeSalle (1997). The PBS value for a particular branch and a given data partition is determined by subtracting the length of the data partition on the most parsimonious (MP) tree(s) from the length of the data partition on the MP anticonstraint tree(s) for that branch (Brower, 2006). Thus, a given partition may contribute positively to, be neutral toward, or conflict with the weight of the evidence that supports a particular branch in the combined analysis. PBS allows exploration of partition incongruence within a total evidence framework (Brower et al., 1996). This ability to localize incongruence to a single partition for a single branch has the potential to reveal interesting evolutionary processes, such as selection on a particular gene. Partitioning data is a potentially useful way to explore incongruence of signal among characters from different sources (Brower, 2006). PBS has the advantage that its values are calculated using the complete data matrix and may be computed for any combination of partitions. One problem with PBS is that it is sensitive to missing data and can shift dramatically among partitions as missing data are filled into the matrix (Brower, 2006). Much of the criticism of support measures focuses on their use of reanalyses of data subsets or partitions as though they were separate sources of evidence but, as Goloboff et al. (2003) have pointed out, no measure of clade quality yet developed is immune to certain conceivable cases.
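The subtraction described above can be sketched in a few lines of Python. The partition names and tree lengths are invented for illustration, the function name is hypothetical, and the sketch assumes a single MP tree and a single anticonstraint tree (when there are several, the lengths are usually summarized across trees first, which is not handled here).

```python
def partitioned_branch_support(lengths_on_mp_tree, lengths_on_anticonstraint_tree):
    """PBS per partition: length on the anticonstraint tree minus length on the MP tree.

    Both arguments map a partition name to the number of steps that
    partition requires on the corresponding tree.
    """
    return {
        partition: lengths_on_anticonstraint_tree[partition] - mp_length
        for partition, mp_length in lengths_on_mp_tree.items()
    }


if __name__ == "__main__":
    # Hypothetical step counts for three partitions and one branch.
    mp_tree = {"morphology": 40, "COI": 120, "28S": 75}
    anticonstraint_tree = {"morphology": 43, "COI": 119, "28S": 75}

    pbs = partitioned_branch_support(mp_tree, anticonstraint_tree)
    print(pbs)                 # positive, negative, and neutral partitions
    print(sum(pbs.values()))   # equals the Bremer support of the branch
```

Here the morphology partition supports the branch (+3), the COI partition conflicts with it (-1), and the 28S partition is neutral (0). Because the partitions together make up the complete matrix, the PBS values sum to the Bremer support of the branch in the combined analysis (2 in this example).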

According to Brower (2006), there are no objective means to set a criterion for rejecting the support or stability of a particular branch in a particular cladogram. Moreover, support from the current data does not necessarily imply that a branch will be robust to the addition of taxa and characters: support today is no guarantee of stability in the future. For this reason, measures that imply a confidence interval, like bootstrap values, are potentially misleading. By contrast, Bremer support, because it has no upper bound, is a more direct way to document the accumulation of character support for a particular branch as additional data are incorporated into a particular phylogenetic hypothesis (Brower, 2006).

References

R. H. Baker and R. DeSalle. Multiple sources of character information and the phylogeny of Hawaiian drosophilids. 1997.

Kåre Bremer. Branch support and tree stability. Cladistics, 10(3):295–304, 1994. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1994.tb00179.x. URL http://dx.doi.org/10.1111/j.1096-0031.1994.tb00179.x.

A. V. Z. Brower, R. DeSalle, and A. Vogler. Gene trees, species trees, and systematics: A cladistic perspective. Annual Review of Ecology and Systematics, 27(1):423–450, 1996. doi: 10.1146/annurev.ecolsys.27.1.423. URL http://www.annualreviews.org/doi/abs/10.1146/annurev.ecolsys.27.1.423.

Andrew V. Z. Brower. The how and why of branch support and partitioned branch support, with a new index to assess partition incongruence. Cladistics, 22(4):378–386, 2006. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2006.00113.x. URL http://dx.doi.org/10.1111/j.1096-0031.2006.00113.x.

Andrew V. Z. Brower. Stability, replication, pseudoreplication, support and consensus a reply to Brower. Cladistics, 26(1):112–113, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00319.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00319.x.

James S. Farris, Victor A. Albert, Mari Källersjö, Diana Lipscomb, and Arnold G. Kluge. Parsimony jackknifing outperforms neighbor-joining. Cladistics, 12(2):99–124, 1996. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1996.tb00196.x. URL http://dx.doi.org/10.1111/j.1096-0031.1996.tb00196.x.

Joseph Felsenstein. Confidence limits on phylogenies: An approach using the bootstrap. Evolution, 39(4):783–791, 1985. ISSN 00143820. doi: 10.2307/2408678. URL http://dx.doi.org/10.2307/2408678.

John V. Freudenstein and Jerrold I. Davis. Branch support via resampling: an empirical study. Cladistics, 26(6):643–656, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00304.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00304.x.

Pablo A. Goloboff and James S. Farris. Methods for quick consensus estimation. Cladistics, 17(1):S26–S34, 2001. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2001.tb00102.x. URL http://dx.doi.org/10.1111/j.1096-0031.2001.tb00102.x.

Pablo A. Goloboff, James S. Farris, Mari Källersjö, Bengt Oxelman, M. J. Ramirez, and Claudia A. Szumik. Improvements to resampling measures of group support. Cladistics, 19(4):324–332, 2003. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2003.tb00376.x. URL http://dx.doi.org/10.1111/j.1096-0031.2003.tb00376.x.

Ward C. Wheeler. Distinctions between optimal and expected support. Cladistics, 26(6):657–663, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00308.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00308.x.


1 comment:

Joe Felsenstein said...

Just a few comments:

* Bremer support is an interesting, and possibly useful, descriptive measure, but it has no statistical theory.

* Bootstrap and delete-half-jackknife measures have some statistical meaning under the assumption of statistically independent evolution of characters (sites). A bootstrap (or delete-half-jackknife) support of P indicates that the probability that this much support would occur without the group being there is less than or equal to 1-P (it is often much less than that as these methods are very conservative). See the Kishino-Felsenstein 1993 paper or my book.

* The effect of invariant characters on the bootstrap is very slight, contrary to the impression given here. John Harshman's 1994 paper showed that it scarcely matters whether or not you include invariant characters. I know this seems counterintuitive but consider: if you add invariant characters you create the need to sample the characters more times, but the fraction of times you choose the original non-invariant characters also goes down. So if you have 100 characters, and add another 100 invariant ones, then you need to sample 200 times, only half of which are likely to be from the original characters. They still get sampled about 100 times.

* J.S. Farris's suggestion of using a jackknife that deletes a fraction 1/e of the characters deletes too few. Of course the jackknife then shows too much support for groups, and therefore people like it. They would be even happier if they deleted only 1% of the characters, for then almost all groups would be supported. In my book (in the bootstrap chapter) I provide evidence that 1/2 is a much better fraction to delete than 1/e.