Thursday, 13 March 2008

Physically linked genes and “dubium - signatum clade” (Enallagma – Zygoptera) relationships

Enallagma is a genus of damselflies (Zygoptera: Coenagrionidae) with a worldwide distribution. It has been recognized by shared plesiomorphies and by the absence of the characters that distinguish related genera (May, 1997), so species relationships have been difficult to discern. Even so, the “dubium - signatum clade” is a stable and well-supported group across different analyses using different kinds of evidence (Brown et al., 2000; May, 2000). Previous analyses resolve the relationships within the group as ((pollutum,(dubium,signatum)),(sulcatum,vesperum)).

The objective of this work is to evaluate how analysing physically linked genes separately or in combination affects the phylogenetic reconstruction of the “dubium - signatum clade” (Enallagma – Zygoptera) under parsimony and maximum likelihood.

For this purpose, sequences of a mitochondrial region containing the COI, tRNA and COII genes were taken from GenBank (AF064995, AF064992, AF065038, AF065034, AF065033, AF065028, AF65013), and two morphological data matrices were checked (Brown et al., 2000; May, 2002). Characters shared by both matrices were included only once. Some characters were split, either because they referred to two characters (i.e. the presence of a structure and its form; see May, 2002, character 19) or because their states combined different independent characters recognisable by topological correspondence (Rieppel & Kearney, 2002) (see Brown et al., 2000, characters 1, 20 and 22). The recoding of the characters did not affect the relationships among the species of the group suggested in previous analyses. The data were analysed by partition and in different combinations, in order to evaluate the influence on the groups and on the evolutionary model, under the parsimony and maximum likelihood (ML) criteria. The parsimony analysis was done with NONA (Goloboff, 1999) and the characters were mapped with WinClada (Nixon, 2002). The evolutionary models were selected using the hierarchical likelihood ratio test (hLRT) as implemented in Modeltest (Posada & Crandall, 2001), and the ML searches were done with PAUP* (Swofford, 2002).
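The single comparison behind a hierarchical likelihood ratio test can be sketched as follows. This is only an illustration and not the Modeltest procedure itself: the log-likelihood values are hypothetical, and the function names (chi2_sf, likelihood_ratio_test) are invented for this sketch. The test statistic is twice the difference in log-likelihood between the nested models, compared against a chi-square distribution whose degrees of freedom equal the number of extra free parameters.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series
    total = math.erfc(math.sqrt(half))
    term = 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi) * math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for a simple model nested in a richer one."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: K80 (null) against HKY (alternative).
# HKY adds three free base frequencies, so df = 3.
stat, p = likelihood_ratio_test(-1500.0, -1492.0, df=3)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

In a hierarchical scheme this comparison is repeated along a fixed sequence of nested models, keeping the richer model whenever the simpler one is rejected.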

To check the performance of different models of evolution within the same sequence, with a known topology as reference, data matrices were generated under different models and with different lengths using Seq-Gen version 1.3.2 (Rambaut, 2007). The simulated matrices were analysed with parsimony and ML as in the previous part.
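To give an idea of what such a simulation does, here is a minimal sketch that evolves one sequence along a single branch under the Jukes-Cantor (JC) model. It is not Seq-Gen and does not reproduce its options; the function names and the example sequence are assumptions made for illustration. Under JC a site keeps its state with probability 1/4 + 3/4 exp(-4d/3), where d is the branch length in expected substitutions per site, and otherwise changes to one of the other three bases with equal probability.

```python
import math
import random

BASES = "ACGT"

def jc69_p_same(d):
    """Probability that a site shows the same base after branch length d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

def evolve_jc69(seq, d, rng):
    """Return a descendant of seq after a branch of length d under JC."""
    p_same = jc69_p_same(d)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)

rng = random.Random(2008)             # fixed seed so the run is repeatable
ancestor = "ACGT" * 15                # hypothetical 60-site sequence
descendant = evolve_jc69(ancestor, 0.1, rng)
differences = sum(a != b for a, b in zip(ancestor, descendant))
print(f"{differences} of {len(ancestor)} sites differ")
```

A full simulator applies the same step to every branch of the true tree, starting from the root, and richer models (F81, HKY, gamma rates, invariant sites) replace the single probability with a transition matrix per site.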



Results

Real data

Gene             Model      Base rate                Alpha (gamma distribution)
COI              HKY + G    0.4322 0.2264 0.0721     0.1220
COII             TrN + G    0.3729 0.1737 0.1521     0.1582
tRNA             K80        equal                    -
COI-tRNA         HKY + G    0.3856 0.2315 0.1138     0.0930
tRNA-COII        TrN + G    0.3652 0.1783 0.1584     0.1553
COI-tRNA-COII    TrN + G    0.3762 0.1846 0.1467     0.1630

Simulations

Matrix       Length            Simulated model      Calculated model   Nodes recovered (ML)   Nodes recovered (parsimony)
1            200               HKY                  K2P                all                    all
2            600               HKY                  HKY                all                    all
3            600               F81                  F81                all                    all
1 + 2        200 + 600         -                    HKY                all                    all
1 + 3        200 + 600         -                    HKY                all                    all
4            120               F81 + G (0.0075)     F81 + G + I        none                   none
5            680               HKY + G (0.0591)     HKY + G + I        all                    all
6            60                JC                   F81                all                    all
7            60                JC + I               JC                 two                    all
4 + 5        120 + 680         -                    HKY + G + I        all                    all
4 + 5 + 6    120 + 680 + 60    -                    TrN + G + I        all                    all
4 + 5 + 7    120 + 680 + 60    -                    HKY + G + I        all                    all


As other authors have shown (Brown et al., 2002; May, 2002), the group was monophyletic under both phylogenetic inferences (parsimony and ML), with high jackknife support values, which shows that the amount of favorable evidence is greater than the amount of contradictory evidence (Goloboff et al., 2003). The internal relationships of the group were the same under both inference methods, but they differ from the previous hypothesis in two nodes. In the present analysis the clade (pollutum,(dubium,signatum)) was recovered only in the morphological analysis (not shown); in the other analyses dubium appears as the sister group of the rest of the “dubium – signatum clade”, and pollutum and signatum form a group. This result could be due to the complex morphology within the genus Enallagma, and especially to the lack of clear character definitions (i.e. states that could be anything, such as “other colour”).


Although the different partitions gave different results, these were not contradictory (except for the morphology, which was congruent with the previous hypothesis), and resolution and support increased with the number of characters (the length of the sequences). This behaviour is not a rule for real data, because the clades recovered by two genes can differ; in this case, where the linkage is physical, it was expected. The support values increased because, in this particular case, the added information increased the number of synapomorphies while the number of (self-congruent) contradictory characters remained low.


In both real and simulated data the general behaviour of model selection was the same: the combined data took the most parameter-rich model found among the partitions. In the simulated data, when the simulation included rate variation among sites, the model was not recovered by the hierarchical test, which is expected because with different rates of evolution the estimate can become inconsistent (Yang, 2006). In most cases ML and parsimony recovered the true topology; ML had problems when the simulated rate was very slow and other parameters, such as invariant sites, were involved.


Even though the different portions of the region evolve under different models and rates, the groups are not sensitive to this variation. The “dubium – signatum clade” is a very stable group: partitions do not compromise it as a unit, and its support and resolution improve as the number of characters increases.




Estimating Branch lengths: A Bayesian approach


Villabona-Arenas, C. J.
Laboratorio de Sistemática y Biogeografía, Escuela de Biología
Universidad Industrial de Santander


Introduction

Bayes' theorem is used in Bayesian inference (BI) to calculate the posterior distribution of the parameter, that is, the conditional distribution of the parameter given the data (Holder and Lewis, 2003). Bayesian Evolutionary Analysis Sampling Trees (BEAST) is a program for Bayesian MCMC analysis of molecular sequences which, in contrast to other programs, is oriented towards rooted, time-measured phylogenies. RNA viruses have broadly similar substitution rates despite their different genome organizations and biological properties, which implies that both the error rate associated with RNA polymerase and the rate of viral replication are roughly constant (Holmes, 2003); RNA viruses are therefore suitable for the BEAST framework. In this work I explored the change in the branch length estimates obtained with Maximum Likelihood and Bayesian methods using simulated and real data sets.


Methods

Seq-Gen version 1.3.2 (Rambaut, 2007; http://tree.bio.ed.ac.uk/software/seqgen/) was used to simulate aligned sequences (sequence length 1000 nucleotides; Hasegawa-Kishino-Yano 85 (HKY85) model), producing three replicate alignments for each of the three sets of simulation parameters (Figure 1). Table 1 presents the taxa used for the analyses with real data. The data set included 11 published partial nucleoprotein gene sequences of rabies viruses (RABV) isolated in Colombia during 1994-2005 and deposited in GenBank, with the CTN-181 reference RABV strain as outgroup. The RABV nucleotide sequences were aligned with MUSCLE 3.6 (Edgar, 2005) using default parameters. The alignments were used to reconstruct a Maximum Likelihood (ML) tree with PhyML 2.4.4 (Guindon and Gascuel, 2003) and a Bayesian Inference (BI) tree with BEAST (Drummond and Rambaut, 2007). Bootstrapping with 1000 replicates was used to place confidence values on groupings within the ML tree. For BI two approaches were used: one specifying the sampling dates of the sequences and another without them. The MCMC search was run for 1,000,000 generations, sampling the Markov chain every 1000 generations and using a coalescent tree prior that assumes a constant population size back through time. The first 25% of the trees were discarded as “burn-in”; the remaining trees were used to summarize the posterior distribution of tree topologies and branch lengths, finding the maximum credibility tree and the mean node height for each of the clades. Each BI analysis was performed three times.
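The arithmetic of the sampling scheme can be checked with a short sketch. It assumes that samples are taken at generations 1000, 2000, and so on (whether the starting state is also logged is not stated above, so that is an assumption), and the function name is invented for illustration.

```python
def post_burnin_samples(chain_length, sample_every, burnin_fraction):
    """Return (total samples, discarded as burn-in, retained samples)."""
    n_samples = chain_length // sample_every
    n_burnin = int(n_samples * burnin_fraction)
    return n_samples, n_burnin, n_samples - n_burnin

total, discarded, kept = post_burnin_samples(1_000_000, 1000, 0.25)
print(total, discarded, kept)
```

Under these assumptions each run contributes 1000 sampled trees, of which the first 250 are discarded and 750 are summarized.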

Results

Figure 2 and Table 2 present the topologies and branch lengths, respectively, obtained with ML and BI for each set of simulations. Maximum likelihood recovered the true topology in all nine simulations, while Bayesian Inference did so in all replicates of cases A and B but in only one of the three replicates of case C. Both methods obtained similar branch lengths, close to the initial ones, for simulations A and B; ML also recovered the branch lengths for simulation C, but BI did not. The ML tree for RABV is presented in Figure 3; the BI trees are presented in Figures 4 and 5. The three trees have the same groupings. BI recovers the same branches as ML when years are not specified; when sampling dates are specified, the branch lengths change according to them.

Discussion


BI presented the rapidly evolving sequences in simulation C as closely related regardless of their true relationships, which suggests that the method can suffer from long-branch attraction. Because MCMC is a stochastic algorithm that produces sample-based estimates of a target distribution, and the BEAST implementation assumes calibrated trees, the method interprets this similarity as a relationship by descent, increasing the probability that both taxa are sampled as sisters.
As for the branch lengths in that simulation, BEAST uses a strict or relaxed molecular clock as its basic model for rates among branches. Because of the strong assumption that the rate of evolutionary change of the sequences is approximately constant over time, the method does not recover the branches well: it tries to fit the observed differences to a scheme in which strong rate variation among the branches of the tree is uncommon. On the other hand, BI works well when there is no such variation in rates, as seen in closely related species or within populations.
As Figures 3 and 4 show, when the data meet the main assumptions, BI behaves like ML. When dates are incorporated into the model they provide a source of information about the overall rate of evolutionary change, which in this case changes the estimated branch lengths relative to the analysis where no years were specified. As presented here, there are scenarios where BI does not work well; in general, when working within the coherent framework of the method, BI can be used for evolutionary parameter estimation. Even though ML methods do not allow time data to be incorporated, ML recovered the branch lengths and the correct topologies in all the evaluated scenarios, which shows it to be a method that accurately describes molecular sequence variation.

Wednesday, 12 March 2008

The phylogeny of Falconidae: morphological or molecular? A view from PBS

Introduction

Phylogenetic analyses are subject to inherent factors related to the nature of the data. Among them is the incongruence found among the resulting topologies when different kinds of data are used in the analysis. Regarding clade quality, two aspects are frequently mentioned: support and stability (Brower, 2006).

A frequently used measure of node support is Bremer support or BS (Bremer, 1994). BS is a statistical parameter of a particular data set, quantified as the extra length needed to lose a branch in the consensus of near-most-parsimonious trees. This approach is based solely on the original data, as opposed to the data permutation used in bootstrap procedures (Bremer, 1994).

There are two ways to calculate BS. The first is to find the most parsimonious tree(s) for a given data set and then examine sets of trees of increasing length (referred to as the “tree decay” method). The second uses anticonstraint trees (Bremer, 1994).
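Both routes arrive at the same quantity: the length of the shortest tree that lacks the group minus the length of the most parsimonious tree. A minimal sketch of that calculation, assuming the candidate trees have already been found and scored by a parsimony program, could look like this. The tree lengths, the taxa A to D and the function name are hypothetical.

```python
def bremer_support(trees, clade):
    """Bremer support of clade from a list of (length, clades) pairs.

    Each tree is given as its length and the set of groups it contains,
    each group being a frozenset of taxon names.
    """
    clade = frozenset(clade)
    best = min(length for length, _ in trees)
    without = [length for length, clades in trees if clade not in clades]
    if not without:
        return None  # no examined tree lacks the clade; support is undetermined
    return min(without) - best

# Hypothetical optimal and suboptimal trees for four taxa
trees = [
    (45, {frozenset("AB"), frozenset("ABC")}),
    (46, {frozenset("AB"), frozenset("CD")}),
    (48, {frozenset("BC"), frozenset("ABC")}),
]
print(bremer_support(trees, "AB"))   # shortest tree without (A,B) has length 48
print(bremer_support(trees, "ABC"))  # shortest tree without (A,B,C) has length 46
```

The result is only as good as the set of suboptimal trees examined, which is why anticonstraint searches are often preferred for large data sets.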

An extension of Bremer's method is Partitioned Branch Support or PBS (Baker & DeSalle, 1997; Baker et al., 1998), which is used when a data set is divided into partitions (morphological-molecular, gene-gene). PBS first estimates the support contributed by each partition to the combined analysis, which then makes it possible to estimate incongruence between partitions. The overall BS for a given branch is thus the sum of the BS values derived from each of the data partitions for the most parsimonious tree(s).
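The additive property described above can be illustrated with a small sketch. For one node, each partition is scored on the most parsimonious tree of the combined data and on the shortest tree that lacks the node; the difference is that partition's PBS. The lengths below are hypothetical and the function name is invented for illustration.

```python
def partitioned_bremer_support(len_on_mpt, len_without_node):
    """PBS per partition for one node.

    len_on_mpt: partition name -> length of that partition on the combined
        most parsimonious tree.
    len_without_node: partition name -> length of that partition on the
        shortest combined tree that lacks the node.
    """
    return {p: len_without_node[p] - len_on_mpt[p] for p in len_on_mpt}

# Hypothetical lengths for a single node
on_mpt = {"morphology": 50, "molecular": 900}
without = {"morphology": 48, "molecular": 912}

pbs = partitioned_bremer_support(on_mpt, without)
print(pbs)                # a negative value means the partition contradicts the node
print(sum(pbs.values()))  # equals the Bremer support of the combined data
```

In this made-up case the morphological partition is two steps shorter on the tree without the node (PBS of -2), the molecular partition is twelve steps longer (PBS of 12), and the two add up to the combined Bremer support of 10.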

Falconidae (diurnal raptors) is a family within Falconiformes. The phylogenetic relationships of Falconidae have long been debated because morphological and molecular characters give different results (Griffiths, 1994, 1997, 1999; Griffiths et al., 2004). The objective of this study is to elucidate the phylogenetic relationships within Falconidae using PBS to estimate the support provided by the different data (morphological and molecular).


Methods

Fifteen species representing Falconidae and two outgroups (Pelecanus onocrotalus and Gampsonyx swainsonii) were chosen. The morphological data were taken from Griffiths (1994, 1999) and the molecular sequences from Griffiths (1999) and Griffiths et al. (2004). The RAG-1 sequences were downloaded from GenBank (AY461396 – AY461410, DQ881819 and EF078725). The sequences were aligned using MUSCLE 3.6 (Edgar, 2004). The cost matrices used in the alignments were generated using TTG version 1.0 (Villabona-Arenas, 2008), available from the author.

The phylogenetic analyses were carried out using T.N.T. version 1.1 (Goloboff, Farris & Nixon, 2001). Heuristic searches, Bremer Support and Partitioned Bremer Support followed the methodologies of Hovenkamp (2005) and Arias et al. (2007). Partitioned Bremer Support was calculated using a T.N.T. macro created by Pablo Goloboff (available at http://tux.uis.edu.co/labsist/intro.html). TreeView version 1.6.6 (Page, 1996) was used to view the resulting trees.


Results

The phylogenetic analysis using morphological data generated 60 most parsimonious trees of length 45 (Fig. 1); the strict consensus tree is shown in Fig. 2. The nodes that define the various morphological species groups are generally supported by low Bremer support values.



The molecular data (RAG-1) yielded one tree (Fig. 3). The resulting groups of this topology were monophyletic and were supported by BS. The nodes in the molecular topology differed from the results of Griffiths et al. (2004) because different outgroup species were used in this analysis. However, the Falco group is recovered in the topology (F. sparverius is not within the Falco group).


The combined analysis generated one most parsimonious tree (Fig. 4). Here the groups are supported by high Bremer Support. The molecular and combined analyses recovered the same nodes.

There is high incongruence between these two data partitions (Table 1). The PBS results indicate that the nodes generated by the morphological data were not supported (the PBS values of the morphological partition were negative at all nodes), so the Falconid group collapses completely. On the other hand, the nodes have high support in the molecular topology.


Discussion

Measures of support in phylogenetic systematics are appropriate for estimating the fit of different kinds of data in a phylogenetic analysis. There are several methods to estimate support, Bremer Support among them. An advantage of BS is that it is a statistical parameter of a particular data set, rather than an estimate based on pseudoreplicated subsamples of the data (like bootstrapping and jackknifing), and thus it does not depend on the data matching a particular assumed distribution (Brower, 2006).

The poor support of the nodes in the morphological analysis shows that the syringeal data do not carry sufficient phylogenetic signal: the number of characters that support a node is not high relative to the number of characters that contradict it, so each node is supported by few characters in the matrix. I disagree with Griffiths (1994), who stated that the syringeal characters “can be used to resolve phylogenetic questions at the generic and family levels of the Falconidae”. Also, syringeal morphology is relatively conservative within genera and there may not be enough variation within speciose genera to resolve relationships (Griffiths, 1994). So these characters were not highly informative in this study.

In the molecular and combined analyses the recovered nodes show high BS. The PBS for the molecular partition is also high (1-812). The very high PBS values at the nodes (Milvago chimachima, Polihierax semitorquatus) and (Daptrius americanus, Falco sparverius) are due to the strong phylogenetic signal of the molecular data relative to the morphological data.

The use of PBS in phylogenetic analysis is widespread in the literature (Baker & DeSalle, 1997; DeSalle & Brower, 1997; Baker et al., 1998; Gatesy et al., 1999). Brower (2006) reviewed the advantages and disadvantages of Bremer Support (BS) and Partitioned Bremer Support (PBS). PBS has some disadvantages, for example when the partitions differ in size. In this study the morphological matrix contains 23 characters and the molecular matrix 2936 characters, so the difference in size could influence the results of the analysis. PBS also appears to be sensitive to missing data, and can shift dramatically among partitions as missing data are filled into the matrix. In the morphological analysis Falco vespertinus has no syringeal characters because this taxon was not sampled. However, preliminary runs without F. vespertinus did not change the results.

I agree with Brower (2006), who stated that PBS is an efficient tool to estimate the degree of support in data sets because it is a more direct and less sophisticated way to document the accumulation of character support for a particular branch in a particular phylogenetic hypothesis. Likewise, the phylogenetic relationships within Falconidae are better supported by the molecular data than by the morphological data. An interesting next step may be to study more morphological characters (osteological).


Bibliography

Arias, J. S., Garzón, I. J., & Miranda, R. D. (2007) Sistemática Filogenética: Introducción a la práctica. División Editorial y de Publicaciones UIS. Colombia.

Baker, R. H., & DeSalle, R. (1997) Multiple sources of character information and the phylogeny of Hawaiian Drosophila. Systematic Biology, 46, 654–673.

Baker, R. H., Yu, X., & DeSalle, R. (1998) Assessing the relative contribution of molecular and morphological characters in simultaneous analysis trees. Mol. Phyl. Evol., 9, 427-436.

Bremer, K. (1994) Branch support and tree stability. Cladistics, 10, 295-304.

Brower, A. V. Z. (2006) The how and why of branch support and partitioned branch support, with a new index to assess partition incongruence. Cladistics, 22, 378-386.

Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792-1797.

Gatesy, J., O’Grady, P., & Baker, R. H. (1999) Corroboration among data sets in simultaneous analysis: hidden support for phylogenetic relationships among higher level artiodactyl taxa. Cladistics, 15, 271-313.

Griffiths, C. S. (1994) Monophyly of the Falconiformes based on syringeal morphology. Auk, 111, 787-805.

Griffiths, C. S. (1997) Correlation of functional domains and rates of nucleotide substitution in cytochrome b. Mol. Phyl. Evol., 7, 353-365.

Griffiths, C. S. (1999) Phylogeny of the Falconidae inferred from molecular and morphological data. Auk, 116, 116-130.

Griffiths, C. S., Barrowclough, G. F., Groth, J. G. & Mertz, L. (2004) Phylogeny of the Falconidae (Aves): a comparison of the efficacy of morphological, mitochondrial, and nuclear data. Mol. Phyl. Evol., 32, 101-109.

Hovenkamp, P. (2005) Branch Support. (Available in http://www.nationaalherbarium.nl/taskforcemolecular/PDF/branch%20supports.pdf).

Lambkin, C. L., Lee, M. S. Y., Winterton, S. L., & Yeates, D. K. (2002) Partitioned Bremer support and multiple trees. Cladistics, 18, 436-444.

Lee, M. S. Y., & Hugall, A. F. (2003) Partitioned likelihood support and the evaluation of data set conflict. Systematic Biology, 52, 15-22.

Page, R. D. M. (1996) TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences, 12, 357-358.

Monday, 3 March 2008

On Support

Nowadays, the production of a phylogenetic hypothesis typically involves two steps: (I) a method of phylogenetic inference and (II) the calculation of internal support measures to discriminate between groups with a clear phylogenetic signal given the actual data and those with none (Salamin et al., 2003).

Although “support” has been interpreted in different ways, from a statistical measure of stability, confidence or probability of recovering a true phylogenetic group (Felsenstein, 1985) to a measure of the favouring evidence (Bremer, 1988; 1994), support is a measure of the relation between the evidence in favor of and the evidence against a node (Goloboff & Farris, 2001; Goloboff et al., 2003; Ramirez, 2005). There are different ways to assess support: resampling techniques such as jackknife and bootstrap, or relative measures such as relative Bremer support. Although resampling techniques have been used to estimate stability, they can also be interpreted as support measures because “the frequency with which replicates display a given group will be determined by the relative amounts of favorable and contradictory evidence” (Goloboff et al., 2003; see also Ramirez, 2005). The other support measure, relative Bremer support, has the advantage that it varies between 0 and 1, so it provides a directly comparable ratio between the favorable and contradictory evidence (Goloboff & Farris, 2001). Resampling measures can be compared with the strict consensus in order to detect problems or artifacts of the method (Ramirez, 2005), such as underestimation (Simmons et al., 2004).
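As a rough illustration of why a relative measure stays between 0 and 1, the sketch below uses the relative fit difference, taken here as (F - C) / F, where F is the amount of evidence favorable to a group and C the amount contradicting it. This formula is my reading of Goloboff & Farris (2001) and should be checked against the original paper; the numbers and the function name are hypothetical.

```python
def relative_fit_difference(favorable, contradictory):
    """Relative support for a group, assumed here to be (F - C) / F.

    For a group present in the optimal tree, C does not exceed F,
    so the value lies between 0 (fully contradicted) and 1 (uncontradicted).
    """
    if favorable <= 0:
        raise ValueError("favorable evidence must be positive")
    return (favorable - contradictory) / favorable

print(relative_fit_difference(10, 2))  # mostly favorable evidence
print(relative_fit_difference(5, 5))   # evidence evenly split
print(relative_fit_difference(4, 0))   # no contradictory evidence
```

Two groups with the same absolute Bremer support (F - C) can therefore differ in relative support when one of them rests on much more contradicted evidence than the other.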

Other measures have been said to measure probability or even support, such as the posterior probability of Bayesian inference. However, this measure should not be interpreted as a probability of truth (Simmons et al., 2004), nor even as support, because the method can recover groups that do not agree with the data with high “support” (Simmons et al., 2004), owing to problems such as the impossibility of uniform priors on clades (Steel & Pickett, 2006). The Bayesian posterior is just the probability of recovering a branch given the prior, the model and the data. Additionally, the statistical view of resampling methods requires the data to meet a series of assumptions that biological data simply do not meet, so it is simpler to see resampling techniques as an indirect way to evaluate the relative amounts of favorable and contradictory evidence based on the actual data (Goloboff et al., 2003; Ramirez, 2005).

On Bayesian Inference (BI)

Huelsenbeck et al. (2001) said that Bayesian inference of phylogeny is a powerful tool for addressing a number of long-standing, complex questions in evolutionary biology. The power they talk about lies in the possibility of fixing prior distributions for the parameters and, based on those prior probabilities and the likelihood of the data, inferring the posterior probabilities of a tree. The posterior probability of each clade is estimated from the frequency at which that clade is recovered among the sampled trees once a stationary log-likelihood has been reached under an MCMC algorithm. The numbers on the branches are said to represent the probability that the clade is correct or true (Huelsenbeck et al., 2002). The MCMC chain also adds another virtue to Bayesian inference: it makes it fast.
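The frequency count described above can be written in a few lines. This is a minimal sketch that assumes the post-burn-in trees have already been reduced to the sets of groups they contain; the four sampled trees and the taxa A to D are hypothetical, and the function name is invented for illustration.

```python
from collections import Counter

def clade_frequencies(sampled_trees):
    """Fraction of sampled trees that contain each clade.

    Each sampled tree is a collection of clades; each clade is a
    frozenset of taxon names.
    """
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))
    n = len(sampled_trees)
    return {clade: count / n for clade, count in counts.items()}

# Four hypothetical post-burn-in samples
samples = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("BC"), frozenset("ABC")},
]
freqs = clade_frequencies(samples)
print(freqs[frozenset("AB")])   # (A,B) appears in 3 of 4 samples
print(freqs[frozenset("ABC")])  # (A,B,C) appears in 3 of 4 samples
```

The value reported on a branch is exactly this proportion, so it depends on which trees the chain visited and for how long.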

But are those virtues real, or are they only a hope that attracts pushovers? First, the prior problem: how do we calculate the probability of something we have not seen (Sober, 2002)? Someone could say that the prior distribution of trees is not a problem, since we can assign the same probability to all of them (flat priors) and consider all possibilities, but this removes one of the magnificent virtues of Bayesian inference, which is to bring into the calculation the plausibility of an event occurring. Moreover, Steel & Pickett (2006) proved that, except in a special case, tree priors do not induce a uniform distribution on clades, which makes the evaluation of support for particular clades impossible when the probability can be influenced by clade size.

One of the most attractive features of Bayesian inference is its speed, but when we need to be sure that the chains converge, the time increases with the complexity and size of the data sets (Goloboff & Pol, 2005). The possibility of estimating the posterior probability of each clade rests on chain convergence, so a wrong implementation of the method leads to mistaken estimates. Moreover, even granting that the posterior probability has been well estimated, the way the MCMC sample is summarized can give self-inconsistent answers (the topology against the data), because the majority-rule consensus may not recognize certain similarities among trees and may be a poor summary (Yang, 2006). Besides, the posterior probability cannot be seen as a universal probability of truth, because it is conditional on the data, the model and the prior, so it is just a “local” probability (Simmons et al., 2004; Yang, 2006). Last but not least, Bayesian inference in practice inflates the probabilities of correct clades and gives high probabilities to incorrect nodes (e.g. Douady et al., 2003; Simmons et al., 2004; Goloboff & Pol, 2005).

On BI

Bayesian methods deal with the notion of a probability distribution for the parameter; the distribution of the parameter before the data are analyzed is called the prior distribution. Bayes' theorem is used in Bayesian inference (BI) to calculate the posterior distribution of the parameter, that is, the conditional distribution of the parameter given the data (Holder and Lewis, 2003).

The general idea behind the “tree inference” done by BI is to construct a Markov chain that has the parameters of the statistical model as its state space and the posterior probability distribution of the parameters as its stationary distribution, and to run the sampling chain for a long enough time; the sampled trees are then sorted in order of probability and picked until the desired cumulative probability is reached (Yang, 2006). Clearly it is conceived not as a search mechanism but as a sampling mechanism, and therefore it probably will not find the individual trees of maximum a posteriori probability. There is still the difficulty of knowing whether the chain has run long enough and when the method has converged, facts that are ignored in many publications.
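The last step, sorting the sampled trees and accumulating probability, can be sketched as below. The sketch assumes the chain has already been run and that each sample is reduced to a label for its topology; the labels T1 to T3, the 80% level and the function name are hypothetical.

```python
from collections import Counter

def credible_set(sampled_topologies, level=0.95):
    """Smallest set of most frequent topologies whose sampled frequency
    adds up to at least `level`. Returns (topologies, cumulative frequency)."""
    counts = Counter(sampled_topologies)
    n = len(sampled_topologies)
    chosen, running = [], 0
    for topology, count in counts.most_common():
        chosen.append(topology)
        running += count
        if running >= level * n:
            break
    return chosen, running / n

# Ten hypothetical samples: topology labels stand in for full trees
samples = ["T1"] * 6 + ["T2"] * 3 + ["T3"]
topologies, probability = credible_set(samples, level=0.8)
print(topologies, probability)
```

Nothing in this summary checks that the chain converged; a short or poorly mixed run produces a credible set just as readily as a good one.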

There is an idea that BI provides measures of support faster than ML bootstrapping. Bayesian inference produces both a tree estimate and measures of uncertainty for the groups on the tree (Holder and Lewis, 2003). But the problem is that it attributes a high probability to false groups that should at least be recognized as ambiguous (Albert, 2005) and, indeed, when recognizing a monophyletic group BI does it in a frequentist way: it checks how many sampled trees claim that a particular group is monophyletic, and this is the probability of our group being monophyletic!

The optimal hypothesis under BI is the one that maximizes the posterior probability (Holder and Lewis, 2003). The posterior probability of a hypothesis is proportional to the likelihood multiplied by the prior probability of that hypothesis. In many publications prior probabilities are ignored, or a uniform distribution over the range of the parameters (flat priors) is used. It has been suggested that priors can be specified by using either an objective assessment of prior evidence concerning the parameter or the researcher's subjective opinion! (Yang, 2006), but when no information about the parameter is available, it is unclear which prior is more reasonable. On the other hand, it has been shown that uniform priors are not non-informative (no prior represents total ignorance), and it is generally accepted that personal prejudices influence statistical inference.
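How much the choice of prior can matter is easy to see with a toy calculation over three candidate trees. The likelihoods and priors below are made-up numbers chosen only to show the effect, and the function name is invented for illustration.

```python
def posterior(likelihoods, priors):
    """Posterior over discrete hypotheses: likelihood x prior, normalized."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Hypothetical likelihoods of the data under three trees
likelihoods = {"T1": 0.02, "T2": 0.01, "T3": 0.01}

flat = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}
skewed = {"T1": 0.1, "T2": 0.8, "T3": 0.1}   # a strong prior opinion for T2

print(posterior(likelihoods, flat))    # T1 has the highest posterior
print(posterior(likelihoods, skewed))  # T2 now has the highest posterior
```

With the flat prior the ranking follows the likelihood alone; with the skewed prior the tree with half the likelihood becomes the preferred hypothesis.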


Note on Support:

If a data set contains homoplasy then different characters support different trees, hence which tree (or trees) a given data set supports will depend on which characters have been sampled (Page and Holmes, 1998). I consider support to be a measure of how a perturbation in the data changes the result, given that repeated sampling from the population is difficult and that we are sometimes interested in what we call repeatability: the probability that another such sample shares the groups with the original one.

Estimates of phylogeny based on samples will be accompanied by sampling error. One way to measure sampling error is to take multiple resamples (pseudoreplicates) from our sample and build a tree from each. The variation among the estimates derived from the pseudoreplicates is a measure of the sampling error associated with our sample. The simple bootstrap can therefore be applied as a perturbation tool to assess the stability of the estimator (in the sense of continuity: a small perturbation in the data produces only a small perturbation in the estimate) (Holmes, 2003). Bootstrapping and jackknifing (and Bayesian methods based on Markov chain Monte Carlo as well) essentially make confidence statements about the trees. The other approach, Bremer support, examines how many extra steps are needed to lose a branch in the consensus tree of near-most-parsimonious trees. This method explores suboptimal solutions and determines how much worse a solution must be for a hypothesized group not to be recovered, that is, the amount of contradictory evidence required to refute a group (Bremer, 1994).
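Building one bootstrap pseudoreplicate amounts to drawing alignment columns with replacement until the replicate has as many columns as the original. A minimal sketch, with a hypothetical three-taxon alignment and an invented function name, follows; the tree search on each replicate is left to whichever program is used.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns resampled with replacement.

    alignment maps taxon name -> aligned sequence (all the same length).
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}

rng = random.Random(42)  # fixed seed so the run is repeatable
alignment = {            # hypothetical sequences
    "dubium":   "ACGTACGTAC",
    "signatum": "ACGTTCGTAC",
    "pollutum": "AGGTTCGAAC",
}
replicate = bootstrap_alignment(alignment, rng)
for taxon, seq in replicate.items():
    print(taxon, seq)
```

The jackknife differs only in the sampling step: columns are deleted at random instead of being drawn with replacement, so each replicate is shorter than the original matrix.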

On Bayesian Inference and Support

Bayesian Inference


Bayesian inference (BI) in systematics (Rannala & Yang, 1996; Yang & Rannala, 1997) has been one of the most frequently used methods in phylogenetic analysis in the last decade. One reason is the speed of BI in comparison with other methods such as parsimony and Maximum Likelihood (ML). Further, the posterior probability is an attractive concept for showing the certainty of the results of an analysis. However, BI has several problems in phylogenetic systematics (Alfaro et al., 2003; Erixon et al., 2003; Wheeler & Pickett, 2008).


One problem in BI is the lack of convergence in the results when new data are added to the data set. This property, consistency (the same groups recovered in different runs and data sets), is an adequate criterion for estimating the “efficiency” of a phylogenetic method. The “efficiency” is measured as the number of nodes recovered as more data are added to a statistical analysis.


Likewise, the a priori assumptions in BI (priors) have been debated because of their subjectivity and influence on the results (Huelsenbeck et al., 2001; Rannala, 2002). The priors are probability distributions assigned to the parameters a priori; the parameters are treated as random variables. So the rate of substitution, the substitution model, the rate of evolution and the prior probability of the initial trees are assigned before the analysis.


In common practice, priors are frequently estimated from previous studies or chosen subjectively. Consequently, the posterior probability may be influenced by the a priori assumptions; that is, the Bayesian calculation is sensitive to the choice of priors.


Finally, the result of BI is a "topology" that supposedly shows groupings of clades. Nevertheless, these "topologies" are not phylogenetic trees; they are representations of groups with a "high probability" of being sampled. Thus, phylogenetic relationships are not recovered in a Bayesian analysis. Also of interest is the frequentist view of Bayesian inference, in which the most probable grouping of taxa is taken to be the correct one. This approach is inappropriate because hypotheses in phylogenetic systematics must be corroborated (in the Popperian sense), an approach better suited to the scientific objective of systematics.


Support


Some authors (Alfaro et al., 2003; Douady et al., 2003; Cummings et al., 2003) claim that the results of BI are equivalent to those of parametric bootstrapping (although this claim is not very strong!). Other authors state that BI has advantages over the parametric bootstrap because of its easy interpretation and speed. Parametric bootstrapping (one kind of support in phylogenetic systematics) generates new data from the initial topology and performs a new search using the new data and the model, whereas the posterior probability of a Bayesian topology is based on the frequencies with which nodes are recovered among the sampled trees. So BI is not the same as parametric bootstrapping, even though their results are equal in some studies.
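
To make the contrast concrete, the node frequencies behind a posterior probability can be sketched in Python as below. This is a minimal sketch under simple assumptions: each sampled tree is represented only by the set of clades it contains, and the ten "sampled" trees are hypothetical rather than the output of a real MCMC run. No new data are simulated at any point, which is what separates this from parametric bootstrapping.

```python
from collections import Counter


def clade_frequencies(sampled_trees):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(clades)  # each clade counted once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}


# Clades as sets of taxon names (four hypothetical taxa A, B, C, D).
AB = frozenset("AB")
AC = frozenset("AC")
CD = frozenset("CD")
ABC = frozenset("ABC")

# Hypothetical sample of 10 trees, each given as the set of its clades.
sampled_trees = (
    [frozenset({AB, ABC})] * 7
    + [frozenset({AC, ABC})] * 2
    + [frozenset({AB, CD})]
)

freqs = clade_frequencies(sampled_trees)
```

Here the clade (A,B) appears in 8 of the 10 sampled trees, so its frequency in freqs is 0.8.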


Other measures of support, such as the nonparametric bootstrap (Felsenstein, 1985), Bremer support (Bremer, 1994), the jackknife (Farris et al., 1996), and relative Bremer support sensu Goloboff & Farris (2001), are considered approaches to estimating the support of nodes in a topology. However, an ideal measure of support should use the evidence both for and against the resulting groups (nodes). So I believe that the most adequate measure of support in phylogenetic systematics is Bremer support sensu Goloboff & Farris (2001). Furthermore, measures of support that use only the evidence in favor can be considered frequentist, in the same way as BI.
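
The difference between absolute and relative Bremer support can be sketched in Python. This sketch assumes that the relative measure takes the form (F - C) / F, where F is the evidence favorable to a group and C the evidence contradicting it; the function names and all numbers are hypothetical, and in practice the tree lengths would come from constrained searches in a parsimony program.

```python
def bremer_support(best_length, best_length_without_clade):
    """Extra steps needed to lose the clade (absolute Bremer support)."""
    return best_length_without_clade - best_length


def relative_fit_difference(favorable, contradictory):
    """Relative support, assuming the form (F - C) / F."""
    if favorable <= 0:
        raise ValueError("favorable evidence must be positive")
    return (favorable - contradictory) / favorable


# Hypothetical tree lengths: optimal tree vs. shortest tree lacking the clade.
support = bremer_support(best_length=120, best_length_without_clade=124)

# Two hypothetical groups with the same absolute difference (F - C = 4).
rfd_contested = relative_fit_difference(favorable=10, contradictory=6)
rfd_uncontested = relative_fit_difference(favorable=4, contradictory=0)
```

Both hypothetical groups have an absolute support of 4 steps, but the first is contradicted by a good deal of evidence and the second by none, so only the relative value tells them apart.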


Bibliography


  • Alfaro, M. E., Zoller, S., & Lutzoni, F. (2003) Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol., 20, 255-266.

  • Bremer, K. (1994) Branch support and tree stability. Cladistics, 10, 295-304.

  • Cummings, M. P., Handley, S. A., Myers, D. S., Reed, D. L., Rokas, A., & Winka, K. (2003) Comparing Bootstrap and Posterior Probability Values in the Four-Taxon Case. Systematic Biology, 52, 477-487.

  • Douady, C. J., Delsuc, F., Boucher, Y., Doolittle, W. F., & Douzery, E. J. P. (2003) Comparison of Bayesian and Maximum Likelihood Bootstrap Measures of Phylogenetic Reliability. Mol. Biol. Evol., 20, 248-254.

  • Erixon, P., Svennblad, B., Britton, T., & Oxelman, B. (2003) Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology, 52, 665-673.

  • Farris, J. S., Albert, V. A., Kallersjo, M., Lipscomb, D., & Kluge, A. G. (1996) Parsimony jackknifing outperforms neighbor-joining. Cladistics, 12, 99-124.

  • Goloboff, P. A., & Farris, J. S. (2001) Methods for quick consensus estimation. Cladistics, 17, 26-34.

  • Huelsenbeck, J. P., Ronquist, F., Nielsen, R., & Bollback, J. P. (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294, 2310–2314.

  • Rannala, B. (2002) Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, 51, 754-760.

  • Rannala, B., & Yang, Z. (1996) Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. J. Mol. Evol., 43, 304-311.

  • Wheeler, W. C., & Pickett, K. M. (2008) Topology-Bayes versus Clade-Bayes in Phylogenetic Analysis. Mol. Biol. Evol., 25, 447-453.

  • Yang, Z., & Rannala, B. (1997) Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo Method. Mol. Biol. Evol., 14, 717-724.