Tuesday, 25 November 2014

Bayesianism and the Popperian spirit... sounds like a contradiction.

Hypotheses and theories are part of science in the broad sense; both are the basis for its development, and both are essential to understanding Popper's ideas.

According to various dictionaries, science is
“A systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.”

It is important to know the meaning of this word to follow the discussion below, but the meanings of the words theory and hypothesis must also be clear in our minds. So, a theory
“Is a contemplative and rational type of abstract or generalizing thinking or the results of such thinking.”

“In modern science, the term "theory" refers to scientific theories, a well-confirmed type of explanation of nature, made in a way consistent with scientific method.”

And a hypothesis is,
            “A proposed explanation for a phenomenon… and one can test it.”

I agree with some ideas of Sir Karl R. Popper, which can be applied in scientific life. However, I do not think it is necessary to hold them as the only way of thinking: even though there is a different stream of thought considered opposite to Popper's ideas, the two can work well together in the evaluation of hypotheses and evidence.

First, the principal idea that many people associate with Popper is ‘falsifiability’ or refutability: the logical possibility that a statement could be shown false by a particular observation or experiment. That something is “falsifiable” does not mean it is false. The idea is fairly easy to grasp, because any statement that can be tested is falsifiable. It applies to the relation between singular and universal statements. So, if you have formulated a theory or a hypothesis, it must have some degree of falsifiability; otherwise, you are facing something taken to be totally true, or an artifact.


Based on that, Popper concluded that a hypothesis, proposition, or theory is "scientific" if it is, among other things, falsifiable. That is, falsifiability is a necessary criterion for scientific ideas, but it is not sufficient. Things that cannot be tested are hard to understand and would require a term such as ‘faith’. In addition, what contribution to science would someone make by answering a problem whose solution is already known, or by proposing a theory adorned with hypotheses that prevent its falsification (ad hoc hypotheses)?


Popper's view is not equivalent to confirmation and does not guarantee that a theory is true or even partially true. I think that if a test fails to falsify a statement, you should not conclude that the statement is true; perhaps the test was applied in the wrong way. Nor is it evidence that the statement has been confirmed.


People are used to inductive thinking, arriving at general ideas from particular ones. This kind of thinking is appropriate for educating the scientific mind of children or of people who want to stay in science, because you can generate a global idea from many singular statements, and this capacity is recognized in many scientists. However, induction has a problem, and Popper proposed falsification as a solution to it. The point is that although a singular existential statement cannot be used to affirm a universal statement, it can be used to show that one is false. This rule of inference is known as modus tollens.


The famous swan example applies here:


    The singular observation of a ‘white swan’ cannot be used to affirm the universal statement ‘all swans are white’.

    The singular observation of a black swan shows that ‘all swans are white’ is false.


Karl Popper's philosophy of science uses modus tollens as the central method of disconfirming, or falsifying, scientific hypotheses; it is a useful tool that assists in discerning which hypotheses are really remarkable in science.
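The asymmetry in the swan example can be sketched in a few lines of Python. This is only an illustration: the function name `find_counterexample` and the list of observations are made up here, and the sketch assumes each observation is simply a colour label.

```python
def find_counterexample(observations, predicate):
    """Return the first observation that contradicts the universal
    statement 'every observation satisfies predicate', or None."""
    for obs in observations:
        if not predicate(obs):
            return obs  # a single contrary case falsifies the statement
    return None  # nothing falsified it, but that does not prove it true


def is_white(colour):
    return colour == "white"


# Hypothetical field notes: many white swans, one black swan.
swans = ["white", "white", "black", "white"]

print(find_counterexample(swans, is_white))       # black
print(find_counterexample(swans[:2], is_white))   # None
```

Returning `None` only means that the statement survived these observations; no number of white swans turns the result into a proof.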


In addition, because of the inverse relationship between falsifiability and probability proposed by Popper, it is necessary to formulate improbable theories in science; this makes more sense than searching for theories that already have some degree of confirmation.


It is relevant to cite Helfenbein & DeSalle (2005), who say, “The popperian spirit or critical attitude toward hypotheses is fundamental to all science”.


But, as I said before, there is another way of thinking that in many cases contradicts Popper's ideas, because Bayesianism assigns ‘degrees of belief’, which resemble confirmation. Bayesian inference is an evidential-relationship, or confirmationist, approach, whereas Popper's corroboration is a non-Bayesian test for the evaluation of hypotheses (Mayo, 1996). Bayesianism also allows informative priors: prior knowledge or the results of a previous model can be used to inform the current model.

"The Bayesian approach delivers the answer to the right question in the sense that Bayesian inference provides answers conditional on the observed data and not based on the distribution of estimators or test statistics over imaginary samples not observed" (Rossi et al., 2005). This is remarkable and one of the most interesting ideas of Bayesianism: the use of priors together with the likelihood in the formula is significant; moreover, it yields degrees of belief and has a decision-theoretic foundation (Bernardo & Smith, 2000; Robert, 2007).


The purpose of most statistical inference is to facilitate decision-making (Robert, 2007). The optimal decision is the Bayesian decision.


The likelihood principle, by itself, is not sufficient to build a method of inference, but it should be regarded as a minimum requirement of any viable form of inference (Rossi et al., 2005).



So, Bayesianism is a complete method of inference: it starts from prior probabilities and integrates the likelihood principle, and with it you obtain posterior probabilities that express a degree of belief... then you can make an optimal decision about your data and hypothesis.


In conclusion, I think that Popper's ideas are not wrong and are useful in some aspects of science, but Bayesianism, even where it contradicts Popper's ideas, is a relevant method of inference, and I would say that it is the best method for phylogenetic analysis at the moment.
__________________________________________________________________

Bernardo J, Smith A (2000). Bayesian Theory. John Wiley & Sons, West Sussex, England.
 
Helfenbein, K. G., & DeSalle, R. (2005). Falsifications and corroborations: Karl Popper’s influence on systematics. Molecular phylogenetics and evolution, 35(1), 271-280.

Mayo, D. G. (1996). Error and the growth of experimental knowledge. University of Chicago Press.

Robert, C. (2007). The Bayesian Choice. 2nd edition. Springer, Paris, France.


Rossi, P, Allenby, G, McCulloch, R. (2005). Bayesian Statistics and Marketing. John Wiley & Sons, West Sussex, England.

Friday, 14 November 2014

Choose one...

At every moment of life we find ourselves having to choose between two or more options. Perhaps for social reasons, in every area we are told that we must decide: what religion to believe in, whether we are pro-Yankee or not, what political party we belong to. Even in Biology we must choose whether we are botanists, primatologists, ornithologists, etc. In my Comparative Biology class, my professor asked me to choose one philosophical current to use in framing and approaching my questions. I could choose among these philosophical currents: Bayesianism, Likelihoodism, Frequentism and Popperian Falsificationism.

We are going to talk about each of the three statistical streams. Although Bayesianism has a very old origin, only in recent years has it had a major resurgence in science. This stream deals with the probability that some event occurs given certain observations, starting from prior probabilities (priors) that are changed or updated as evidence arrives. The probability is denoted Pr(H|O): the probability of the hypothesis given the data, observations or evidence (Sober, 2008; Kruschke, 2011). On the other hand there is Likelihoodism, which lacks priors and whose logic runs opposite to Bayesianism: it looks at how well the hypothesis fits the data, Pr(O|H) (Sober, 2008; Royall, 1997). Both branches are extremely powerful for making inferences and contrasting hypotheses; the only difference is the concept used for making comparisons (Sober, 2008; Kruschke, 2011). Likelihoodism uses the concept of favoring to show what the evidence says regarding the comparison of two hypotheses, while Bayesianism adopts the concept of confirmation to show what the evidence says regarding a hypothesis and its negation (Sober, 2008, p. 34). Finally, Frequentism, which dominated the last century, is based on the probability that an event occurs over a set of repeated experiments of size N (Johansson, 2011); this gives a ratio, and a P value is calculated and compared against a null model or null hypothesis (Sober, 2008; Johansson, 2011; Wagenmakers, 2007). The main difference (I think) is their philosophical perspective on the comparison of hypotheses and the use of priors in Bayesianism (Sober, 2008); in other respects they are very similar. As a first approach to this claim I offer the following exercise. Suppose that we have our hypothesis (H) and data (T). In a Bayesian analysis we seek the probability of H given T, Pr(H|T); in contrast, under Likelihood we want to look at how well H fits our data T, Pr(T|H). When we apply Bayes' theorem to our example we have: Pr(T|H)Pr(H) = Pr(H|T)Pr(T).
Then Pr(H|T) = [Pr(T|H)Pr(H)]/Pr(T). From a likelihood value we can get the posterior values. The example is somewhat crude and simplistic but conveys the idea. I don't intend to fill this post with formulas and derivations that even I can't explain, but Branden Fitelson, from page 7 of his article "Likelihoodism, Bayesianism, and Relational Confirmation", shows some examples of how some Bayesian measures are more Likelihoodist than Bayesian and vice versa, if someone wants to go deeper into the topic.
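To make the exercise concrete, here is a minimal Python sketch of going from a likelihood to a posterior with Bayes' theorem. The numbers are invented for illustration, and the sketch assumes only two competing hypotheses, H and its negation, so that Pr(T) can be written out in full.

```python
def posterior(prior_h, lik_h, lik_not_h):
    """Pr(H|T) from Pr(H), Pr(T|H) and Pr(T|not H)."""
    # Pr(T): total probability of the data under both hypotheses
    evidence = lik_h * prior_h + lik_not_h * (1.0 - prior_h)
    return lik_h * prior_h / evidence


# Hypothetical likelihoods: the data are four times more probable under H.
lik_h, lik_not_h = 0.8, 0.2
likelihood_ratio = lik_h / lik_not_h            # what a Likelihoodist reports

flat = posterior(0.5, lik_h, lik_not_h)         # flat prior
sceptical = posterior(0.1, lik_h, lik_not_h)    # prior against H

print(likelihood_ratio, round(flat, 3), round(sceptical, 3))
```

The likelihood ratio is the same in both calls; only the prior changes the posterior, which is the difference between 'favoring' and 'confirmation' discussed above.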
One of the main problems in Bayesianism and Frequentism is the limited objectivity with which the data are handled (Ayacaguer, 2000). On one side are the priors of the Bayesian analysis, which strongly influence the result and whose values can be altered to benefit any particular hypothesis. This is one reason why many people argue that the method is unreliable in daily life or in government agencies: anyone can manipulate the priors to their own benefit. 'Flat priors' have been used as a solution to this problem, which makes the entire analysis rest on the likelihood. But Frequentism is not far behind, because you can manipulate the P values or the rates of false positives and false negatives (the famous alpha and beta) to favor a particular result; the use of a null hypothesis is criticized as well (Ayacaguer, 2000; Johansson, 2011). Similarly, Frequentism has the N problem: P values are influenced by the size of the sample used, so we can know beforehand what the result would be with a small N or a large N, and the P value can also be influenced subjectively through the choice of N (Ayacaguer, 2000; Wagenmakers, 2007; Johansson, 2011). So if the issue is subjectivity, we have a winner: Likelihoodism! But let's not get too excited, because Likelihoodism also has critics, and one criticism is its restriction to some cases (Sober, 2005).
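The N problem can be illustrated with a small sketch. It assumes a one-sided exact binomial test against a null proportion of 0.5; the observed proportion (60%) and the two sample sizes are hypothetical, chosen only to show the effect.

```python
from math import comb


def binom_tail(k, n, p0=0.5):
    """One-sided exact P value: Pr(X >= k) when X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Same observed proportion (60% successes), two different sample sizes.
p_small = binom_tail(6, 10)     # N = 10
p_large = binom_tail(60, 100)   # N = 100

print(round(p_small, 4), round(p_large, 4))
```

With the same observed proportion, the small sample gives a P value far above the usual 0.05 threshold and the large sample gives one below it, so the choice of N alone can move the conclusion.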
Given everything written above, it appears that Likelihoodism is the best stream and that I would therefore choose it, but no. It may sound crazy, but after all this time exploring the topic I conclude that one cannot choose any one stream in particular. All of them have good and bad points, and for that reason I consider that they complement each other and can all be used in a Bayesian analysis (obviously without declaring oneself a Bayesian). To understand this idea we must keep in mind the main components of a Bayesian analysis: the priors and the likelihood. I have already shown the relationship between Likelihood and Bayes using the theorem. The priors, on the other hand, are where Frequentism would come in: we can take the results of a Frequentist analysis as priors in the Bayesian analysis. Let me give you an example. Suppose you arrive in a new city and want to know whether that month is rainy or not. Throughout the month you note on which days it rains and on which it does not; say it rains on 25 of 30 days. From that ratio, and by calculating the P value, you will know whether this month is rainy. But what could you say from this about the following months? Really nothing. From these observations, however, you could infer how likely it is that it will rain in the next month, because throughout that month we have noticed that, in general, the sky clouds over before the rain comes. So if the next day we see that the sky is clouded (O), we know that the probability of rain (H) is going to be high, all thanks to the prior value we obtained from our frequentist observations. To better explain my idea, I would like to give an example that occurred to me while I was playing Xbox. Suppose we are going to fight the Final Boss. At the beginning we know nothing about his attacks or his moves, and we have to spend one or more lives to defeat him. That is the moment where our Bayesian, Likelihoodist and Frequentist analysis arises!
At first we do not know how to approach the enemy (flat priors), and defeating him (H) with our initial strategy (T) is very unlikely (low likelihood, Pr(T|H)), which ultimately leaves a low probability of beating the game (Bayesian posterior, Pr(H|T)). As the fight goes on we notice that the enemy uses certain attacks with a particular frequency, so we learn the probability of each attack; these estimates improve as we fight more and make more observations of the enemy's movements. This, I believe, is a thoroughly frequentist analysis (we accumulate knowledge and improve our priors). Once we know these frequencies we change our strategy (T), and with it the probability of defeating the enemy given the new strategy (Pr(T|H)), so that eventually the probability of beating the game increases (Pr(H|T)).
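The idea of feeding observed frequencies into the prior can be sketched with a Beta-Binomial update, which is one standard way (not necessarily the only one) of turning counts into an updated belief. The 25 rainy days out of 30 come from the rain example above; the flat Beta(1, 1) starting prior and the second batch of observations are assumptions made for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with binomial counts."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Expected probability of the event under Beta(alpha, beta)."""
    return alpha / (alpha + beta)


# Start with no knowledge: flat prior.
a, b = 1.0, 1.0

# First month: rain on 25 of 30 days (counts from the example above).
a, b = update_beta(a, b, successes=25, failures=5)
after_first_month = beta_mean(a, b)

# Hypothetical first week of the next month: rain on 3 of 7 days.
a, b = update_beta(a, b, successes=3, failures=4)
after_next_week = beta_mean(a, b)

print(round(after_first_month, 4), round(after_next_week, 4))
```

Each batch of observations becomes the prior for the next one, which is the same loop as learning the boss's attack pattern life after life.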
So, you will ask, where is Popperian Falsificationism? Well, I think it is the least critical of the four currents. Basically Popper tells us: in science we must reject some theories and hypotheses in order to corroborate others (which does not mean that the latter are true), something like modus tollens (if A implies B and B turns out to be false, then A is false; but finding B true does not prove A). In this way the three currents use Popper's logic: Likelihoodism 'favors' one hypothesis over another, Bayesianism 'confirms' one hypothesis with respect to its own negation, and Frequentism compares one hypothesis against the null hypothesis, but we never establish that the hypotheses are true.
I think that choosing among these three currents is like choosing a single phylogenetic search method as the best: all three have good and bad points, and often what really matters is the data, not the method. I consider more appropriate, for example, what Morrone and Crisci do with the two methods of historical biogeography (panbiogeography and cladistic biogeography): they show how the methods complement each other and that both are necessary steps for a good biogeographic analysis (Morrone & Crisci, 1995; Morrone, 2001). This (I think) allows us to look at the problem in multiple ways and to find multiple good solutions, avoiding bias. I have always been told that extremes are not good, so why don't we avoid the extremes, find an intersection between them, and take the best of each one? Just imagine how the world would change if religious zealots found a middle point. In the end these are only methods, and what seems more crucial to me is the objectivity with which the researcher analyzes the results.

___________________________________________________________________________________

References.

  • Branden Fitelson. Likelihoodism, Bayesianism, and Relational Confirmation. Synthese (2007).
  • Tobias Johansson. Hail the impossible: p-values, evidence and likelihood. Scandinavian Journal of Psychology (2011).
  • L.C. Silva Ayacaguer & A. Muñoz Villegas. Debate sobre métodos frecuentistas vs bayesianos. Gac Sanit (2000).
  • Eric-Jan Wagenmakers. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review (2007).
  • Silvio Pinto. El Bayesianismo y la Justificación de la inducción. Principia (2002).
  • Royall, R. Statistical Evidence: A Likelihood Paradigm. Boca Raton, Fla.: Chapman and Hall (1997).
  • Elliot Sober. Evidence and Evolution: The Logic Behind The Science. Cambridge University Press, United States of America (2008).
  • Kruschke, J.K. Bayesian data analysis: A tutorial with R and BUGS. Amsterdam: Elsevier (2011).
  • Juan J. Morrone. Homology, biogeography and areas of endemism. Diversity and Distributions (2001).
  • Sober, E. ‘Is Drift a Serious Alternative to Natural Selection as an Explanation of Complex Adaptive Traits?’. In: A. O’Hear (ed.): Philosophy, Biology and Life. Cambridge: Cambridge University Press (2005).

Sunday, 28 August 2011


A geometrical approach to the distributional pattern/structure of the Neotropical species of Staphylinidae: Plochionocerus Dejean & Agrodes Nordmann.

Daniel Felipe Silva Tavera

Introduction

The beetle species of the genera Plochionocerus and Agrodes have recently been the subject of phylogenetic and biogeographic analyses[1][2]. The species of these genera have a characteristically large body size and metallic coloration. As a result of their systematic revision, several synonyms were detected, mainly for species of Plochionocerus, which currently comprises 18 species, while Agrodes comprises 2 species[1]. A track analysis of these sister taxa was carried out using Croizat's manual reconstruction[2]. Three generalized tracks were identified from 15 individual tracks. That track analysis provides further species supporting the primary biogeographic homology of the 3 detected generalized tracks, which correspond to 3 major biotic components. Two of the generalized tracks are in the Caribbean subregion and a third in the Amazonian subregion[3]. In order to avoid the ambiguity and subjectivity inherent in traditional track analysis[4], here I implement a geometrical approach to the distributional pattern/structure of the species of Plochionocerus and Agrodes, to answer the question: do the generalized tracks represent the general patterns of distribution of the Neotropical species of Plochionocerus and Agrodes?

Methods
The distributional information of 13 of the 18 Plochionocerus species and the 2 species of Agrodes is considered here. A total of 279 records were used to build the input file with the distributional data for MartiTracks[4]. Of these records, 38 come from GBIF (accessed through the GBIF data portal, Entomology Collection, http://data.gbif.org/datasets/resource/7911), 2 from CENTO-UIS, and the rest from the revision by Asiain et al. (2007)[1] and the catalog by Herman (2001)[5]. The parameter values used are shown in the commands1.txt file (below).

Results & Discussion

From 14 original (individual) tracks, a primary hypothesis of biogeographic homology was proposed, represented by 6 generalized tracks (fig. 1). Four are in the Amazonian subregion and two are in the Caribbean subregion (one in the Mesoamerican domain and the other in the Northeast South American domain). The 4 Amazonian generalized tracks are based on the individual tracks of A. conicicollis, A. elegans, P. janthinus, P. igneus, P. fulgens and P. splendens, and the 2 Caribbean generalized tracks are based on the individual tracks of P. discedens, P. simplicicollis, P. ashei, P. humeralis, P. impressipennis, P. marquezi, P. puncticeps and A. elegans. The geographical distribution of P. newtonorum and P. pronotalis does not coincide with any of the generalized tracks obtained. With this geometrical approach to the distributional pattern of these staphylinids, the primary hypothesis of biogeographic homology proposed by Asiain et al. (3 generalized tracks) is reevaluated in light of the six generalized tracks proposed above. Nine species have been recorded exclusively from South America, 2 exclusively from Central America, and 4 are shared between both areas. These results nevertheless corroborate previous biogeographic hypotheses about the Mesoamerican and South American tracks based on other components of the staphylinid biota[6].

Conclusion.
The implementation of a geometrical tool represents an unambiguous panbiogeographic approach to the distributional pattern of these taxa.

[1] Asiain, J., J. Márquez and J. J. Morrone. 2007. Phylogenetic systematics of the genera Plochionocerus Dejean and Agrodes Nordmann (Coleoptera: Staphylinidae: Xantholinini). Zootaxa 1584: 1-53.

[2] Asiain, J., J. Márquez and J. J. Morrone. 2010. Track analysis of the species of Agrodes and Plochionocerus (Coleoptera: Staphylinidae). Revista Mexicana de Biodiversidad 81: 177-181.

[3] Morrone, J. J. 2006. Biogeographic areas and transition zones of Latin America and the Caribbean islands based on panbiogeographic and cladistic analyses of the entomofauna. Annual Review of Entomology 51: 467-494.

[4] Echeverría-Londoño, S. & Miranda-Esquivel, D. R. 2011. MartiTracks: a geometrical approach for identifying geographical patterns of distribution. PLoS ONE, 6(4), 0018460.

[5] Herman, L. 2001. Catalog of the Staphylinidae (Insecta: Coleoptera). 1758 to the end of the second millennium. VI. Staphylinine Group (Part 3). Staphylininae: Staphylinini (Quediina, Staphylinina, Tanygnathinina, Xanthopygina), Xantholinini. Staphylinidae Incerta Sedis Fossils, Protactinae. Bulletin of the American Museum of Natural History, 265, 3021–3840.

[6] Márquez, J. and J. J. Morrone. 2003. Análisis panbiogeográfico de las especies de Heterolinus y Homalolinus (Coleoptera, Staphylinidae, Xantholinini). Acta Zoológica Mexicana (nueva serie) 90: 15-25.

commands1.txt

set cv 0.25
set lmin 0.5
set lmax 0.75
set maxline 1
set ci 0.8
kmlgen
croizat0

bash: croizat0.sh
#!/bin/bash
wine mt05-win32.exe test1.dat test1.dat.kml commands1.txt


Phylogeny of Tabaninae: A critique to Abu El-Hassan et al. (2010)


Introduction

Tabanidae is a family of Diptera whose monophyly has been recognized on the basis of molecular information (Wiegmann et al. 2000; Morita, 2008). However, relationships within the family have not been resolved. Abu El-Hassan et al. (2010) proposed a phylogeny of this family based on morphological characters. They did not present a formal phylogenetic analysis, their characters are ambiguous, and the way the analysis was performed is not adequate. The objective of this study is to evaluate the results obtained by Abu El-Hassan et al. (2010) and to compare them with a phylogenetic analysis under the parsimony criterion.

Materials and methods

The phylogenetic analysis used 20 terminal taxa and 91 morphological characters recoded from the matrix proposed by Abu El-Hassan et al. (2010), all based on adult morphology. The cladistic analyses were implied-weights searches (Goloboff 1993) with integer concavity values from one to ten, using TNT version 1.0 (Goloboff et al. 2004). The tree search strategy was a traditional search using tree bisection reconnection, randomizing the addition sequence 100 times. Then a tree search was made after jackknife resampling at 37%, and the number of initial groups (those found without resampling) recovered after jackknife was calculated (Goloboff 1997). The character distribution was analyzed with WINCLADA 1.00.08 (Nixon 2002).
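As a rough illustration of what the concavity value k does under implied weights, the sketch below uses the concave fit function usually attributed to Goloboff (1993), in which a character with h extra steps (homoplasy) contributes k / (k + h). The list of extra steps is hypothetical, and the score that TNT actually reports may be defined differently (for instance as the complementary distortion), so this should be read as a sketch of the weighting idea, not of the program's output.

```python
def character_fit(extra_steps, k):
    """Concave fit of one character: 1.0 with no homoplasy, lower with more."""
    return k / (k + extra_steps)


def total_fit(extra_steps_per_character, k):
    """Sum of character fits for one tree under concavity k."""
    return sum(character_fit(h, k) for h in extra_steps_per_character)


# Hypothetical extra steps for five characters on some tree.
homoplasy = [0, 0, 1, 3, 6]

for k in (1, 3, 9):
    print(k, round(total_fit(homoplasy, k), 3))
```

Low k penalizes homoplastic characters strongly; as k grows, the fits converge toward 1 and the analysis approaches equal weighting, which is why results are compared across a range of k values.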

Results and discussion

All characters presented by Abu El-Hassan et al. (2010) were binary, and many of them had ambiguous coding. Most of the characters were recoded from binary to multistate, such as antennal scape color, antennal pedicel and antennal shape. The largest number of groups was recovered by the implied-weights search with a concavity value of nine. Using this concavity value, we obtained 1 tree (fit for k=9: 8.533). The nodes recovered with each concavity value are shown in figure 1.


Figure 1. Average number of recovered nodes over ten repeated runs, after jackknife resampling with integer concavity values from one to ten under implied weights.

Concavity value    Average of the shared consensus nodes
1                  0.5789
2                  0.5789
3                  0.6316
4                  0.6842
5                  0.6316
6                  0.6316
7                  0.6316
8                  0.7895
9                  0.8421
10                 0.7895
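The choice of concavity value can be reproduced from the values reported in figure 1 with a short sketch. The dictionary simply transcribes those values (with decimal points); the variable names are illustrative.

```python
# Average of shared consensus nodes per concavity value, as reported in figure 1.
recovered = {
    1: 0.5789, 2: 0.5789, 3: 0.6316, 4: 0.6842, 5: 0.6316,
    6: 0.6316, 7: 0.6316, 8: 0.7895, 9: 0.8421, 10: 0.7895,
}

# Concavity value that recovers the largest share of the original groups.
best_k = max(recovered, key=recovered.get)

print(best_k, recovered[best_k])
```

This only picks the maximum of the reported averages; it does not say whether the difference between neighbouring k values is meaningful.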


The phylogenetic analysis supports the monophyly of Tabaninae; however, the internal relationships are not resolved. The genus Atylotus appears as monophyletic, contrary to the results presented by Abu El-Hassan et al. (2010). This relationship is supported by one character: upper and middle calli separated. The character distribution is shown in figure 2. Finally, it is recommended to repeat the analysis after expanding the number of taxa (ingroup and outgroup) and characters, as well as reviewing the character coding.




Figure 2. Character distribution analyzed with WINCLADA 1.00.08 (jackknife 37%, k=9).

References

Abu El-Hassan, Gawhara M. M, Haitham B. M. Badrawy, Salwa K. Mohammad and Hassan H. Fadl (2010). Cladistic analysis of Egyptian horse flies (Diptera: Tabanidae) based on morphological data. Egypt. Acad. J. biolog. Sci., 3 (2): 51- 62.

Goloboff, P. A. (1993) Estimating character weights during tree search. Cladistics 9: 83–92.

Goloboff, P. A. (1997) Self-weighted optimization: tree searches and character state reconstructions under implied transformation cost. Cladistics 13: 225-245.

Goloboff, P. A., Farris, J. S. & Nixon, K. (2004) T. N. T:Tree Analysis Using New Technology, Version 1.0. Program and documentation, available from www.zmuck.dk/public/phylogeny/TNT

Morita, S.I. 2008. A phylogeny of long-tongued horse flies (Philoliche, Diptera: Tabanidae) with the first cladistic evaluation of higher relationships within the family. Invertebrate Systematics, 22(3): 311-327.

Nixon, K. C. (2002) WinClada Version 1.008. Software implementation. Published by the author. Ithaca. New York. Available from www.cladistics.com




EVALUATION OF THE GEOGRAPHIC STRUCTURE IN DENGUE VIRUS TYPE 1 FROM A PHYLOGENETIC AND BIOGEOGRAPHIC APPROACH


INTRODUCTION

Phylogenetic relationships among strains of dengue virus often show a strong structure associated with geography and time (Gray et al. 2011; Carvalho et al. 2009); however, geography seems to be the main component shaping these phylogenetic reconstructions. Likewise, global comparisons of lineages and their geographic locations have allowed isolates of the same serotype to be further classified into new genotypes known as topotypes (Samuel and Knowles, 2001). However, because the isolates in public databases are poorly georeferenced, inferences about geographic patterns are sometimes difficult and doubtful: working with countries' political divisions can be biogeographically inadequate and insufficiently detailed. Based on the above, the aim of this work was to assess the congruence between the geographic patterns found by phylogenetic and biogeographic approaches in dengue virus type 1 circulating in America.

METHODS

Phylogenetic analysis of 50 DENV-1 E gene sequences was carried out under the Bayesian inference criterion using BEAST v1.6.2 (Drummond & Rambaut, 2007). A General Time Reversible model of nucleotide substitution (Rodriguez et al., 1990) with gamma-distributed rate variation and a proportion of invariable sites (GTR + G + I) was selected, and two runs of 4 chains were run for ten million generations. Sequences were sampled in American countries, including islands in the Atlantic and Pacific Oceans.

From the topology obtained (maximum clade credibility tree), possible genotypes were identified in the phylogeographic analysis according to five areas intuitively postulated on the basis of the geographic information contained in each clade. The criteria used were monophyletic clades and posterior probability values above 0.80. Results were contrasted with the subclusters found by Carvalho et al. (2010).

Finally, the geographic patterns were evaluated following the track compatibility method of Craw (1988a, 1989a). The areas used were those postulated in this work and the biotic components of Latin America and the Caribbean compiled by Morrone (2004), at the level of large regions and provinces.

RESULTS AND DISCUSSION


The phylogenetic relationships among American sequences seem to be structured by geographic patterns. Accordingly, five areas were proposed, corresponding to the Pacific, the Caribbean, southern South America, Central America and northern South America. These components were determined from the geographic information available for each viral isolate. Intuitively, Central America and northern South America were taken as independent units.

Figure 1. Maximum clade credibility tree from the Bayesian analysis of E gene sequences representing Latin American strains. Posterior probabilities are shown for key nodes.






The phylogeographic analysis showed the same pattern as the phylogenetic analysis; in addition, SAN (northern South America) and CA (Central America) were closely related. The strong geography-associated structure possibly indicates continuous viral movement between different countries and in different directions. On the other hand, viral exchange seems to be limited and uneven among areas, even when they are geographically close, as with the Caribbean and Central America.

Figure 2. Phylogeographic patterns between genotypes and postulated areas in Dengue virus type 1

The track compatibility analysis resulted in a clique (based on regions) representing a pattern that relates the Mexican transition area to the Neotropical Region, which is congruent with the relationship between the SAN and CA areas in the phylogeographic analysis. This is probably due to the size of the areas, which include a higher proportion of distributions and strains that are distributed in intermediate areas. The areas delimited as provinces by Morrone (2004) and the phylogeographic areas delimited here did not show compatible tracks.

Figure 3. Track compatibility analysis. a) Areas proposed in this study. Biotic components of Latin America and the Caribbean: b) Provinces, c) Regions.

CONCLUSION

Phylogenetic and biogeographic analyses of dengue virus can reflect a similar geographic pattern; however, it is necessary to know the level at which both approaches can be congruent. In this study, Central America and northern South America form a large unit that corresponds to the clique found in the track compatibility analysis, which supports the close relationship between the Mexican transition area and the Neotropical region. Clearly, the use of geopolitical units to assess the geographical structure shaping the phylogenetic relationships of dengue is not the most accurate approach, and dengue virus strains behave as a large dispersive population connecting large areas in America.

REFERENCES

Carvalho SE, Martin DP, Oliveira LM, Ribeiro BM, Nagata T (2010) Comparative analysis of American Dengue virus type 1 full-genome sequences. Virus Genes 40: 60–66.

CRAW, R. C. 1988. Continuing the synthesis between panbiogeography, phylogenetic systematics and geology as illustrated by empirical studies on the biogeography of New Zealand and the Chatham Islands. Systematic Zoology 37: 291-310.

CRAW, R. C. 1989a. New Zealand biogeography: A panbiogeographic approach. New Zealand Journal of Zoology 16: 527-547

Drummond AJ & Rambaut A (2007) "BEAST: Bayesian evolutionary analysis by sampling trees." BMC Evolutionary Biology 7, 214

Gray, R. R., Pybus, O. G. and Salemi, M. (2011), Measuring the temporal structure in serially sampled phylogenies. Methods in Ecology and Evolution. doi: 10.1111/j.2041-210X.2011.00102.x

Morrone, Juan J. 2004. Panbiogeografía, componentes bióticos y zonas de transición. Revista Brasileira de Entomologia 48(2): 149-162.

Samuel, A. R., Knowles, N. J. 2001. Foot-and-mouth disease type O viruses exhibit genetically and geographically distinct evolutionary lineages (topotypes). Journal of General Virology 74, 2281-2285.

Sunday, 22 May 2011

Measuring branch support

Gualdrón-Diaz J. C.

Once cladograms have been obtained, it is important to know how strong the evidence supporting each node is. There are different ways to interpret support (stability, confidence levels and reliability) and different methods to assess it; the most popular are resampling methods such as the bootstrap and the jackknife, and those linked to relative optimality values, such as Bremer support (Wheeler, 2010). This requires a clear distinction between some terms. According to Goloboff et al. (2003) and Brower (2006, 2010), support and stability are logically different: support for a given branch in a tree is a measure of the net amount of evidence that favors the appearance of that branch in a most parsimonious topology, whereas stability is the persistence of a given branch in the face of the addition, deletion, or reweighting of characters, taxa, or both in the data matrix, as in the bootstrap and jackknife approaches. Likewise, strong statistical assumptions are necessary to interpret jackknife or bootstrap values as confidence levels (Felsenstein, 1985). Another way to measure the support for individual branches of a cladogram is Bremer support, also referred to as the “decay index” (Bremer, 1994). It is measured by comparing the fit of the data to optimal and suboptimal trees. This measure captures two different aspects of group support: absolute Bremer support (Bremer, 1994) estimates the amount of favorable evidence, and relative Bremer support (Goloboff and Farris, 2001) estimates the ratio between favorable and contradictory evidence (Goloboff et al., 2003). Both support and stability are attributes that have proven particularly tricky to measure directly, owing to the complexity of character interactions in homoplastic data (Goloboff and Farris, 2001). Nevertheless, these measures serve as a means to distinguish groups that are plausible from those that are dubious, and can guide the generation of additional data to refine and improve the hypothesis (Brower, 2006).

Jackknifing and bootstrapping sometimes produce incoherent results. Uninformative characters and characters irrelevant to the monophyly of a group can influence jackknife and bootstrap support values; to solve this, Farris et al. (1996) proposed assigning equal probabilities of deletion to individual characters. Similarly, Goloboff et al. (2003) suggest a Poisson-based sampling regime for bootstrapping that also alleviates this problem. One clear advantage of the jackknife over the bootstrap is that the values on branches are less affected when there are characters with homoplasy (Freudenstein and Davis, 2010). Both the jackknife and the bootstrap can also mislead when characters have different weights or costs, producing either under- or overestimates of the actual support (Goloboff et al., 2003). This influence of weights can be eliminated by symmetric resampling, in which the probability of increasing the weight of a character equals the probability of decreasing it (Goloboff et al., 2003). Taken together, these points explain the differences in the error produced by the jackknife and the bootstrap.
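To make the difference between these resampling schemes concrete, here is a minimal Python sketch that draws one pseudoreplicate under each scheme and returns a weight multiplier per character. The function names are hypothetical, the jackknife deletion probability of e^-1 follows the usual parsimony jackknife convention, and the change probability of 0.33 for symmetric resampling is only an assumed illustrative value; none of this reproduces the code of any particular program.

```python
import math
import random


def bootstrap_multipliers(n_chars, rng):
    """Standard bootstrap: draw n_chars characters with replacement.

    Each character ends up with a multiplier of 0, 1, 2, ... and the
    multipliers always sum to n_chars.
    """
    multipliers = [0] * n_chars
    for _ in range(n_chars):
        multipliers[rng.randrange(n_chars)] += 1
    return multipliers


def jackknife_multipliers(n_chars, rng, p_delete=math.exp(-1)):
    """Jackknife with independent deletion: every character is dropped
    with the same probability, regardless of the other characters."""
    return [0 if rng.random() < p_delete else 1 for _ in range(n_chars)]


def symmetric_multipliers(n_chars, rng, p_change=0.33):
    """Symmetric resampling: a character is down-weighted (deleted) or
    up-weighted (doubled) with the same probability p_change."""
    if not 0.0 <= p_change <= 0.5:
        raise ValueError("p_change must lie between 0 and 0.5")
    multipliers = []
    for _ in range(n_chars):
        r = rng.random()
        if r < p_change:
            multipliers.append(0)   # weight decreased
        elif r < 2 * p_change:
            multipliers.append(2)   # weight increased
        else:
            multipliers.append(1)   # weight unchanged
    return multipliers


if __name__ == "__main__":
    rng = random.Random(42)
    base_weights = [1, 1, 3, 1, 2]  # assumed prior character weights
    for scheme in (bootstrap_multipliers, jackknife_multipliers,
                   symmetric_multipliers):
        mult = scheme(len(base_weights), rng)
        resampled = [w * m for w, m in zip(base_weights, mult)]
        print(scheme.__name__, resampled)
```

Each pseudoreplicate would then be analysed with the resampled weights, and the frequency of each group across replicates would be reported as its support.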


Bremer support, rather than being an estimate based on pseudoreplicated subsamples of the data (like bootstrapping and jackknifing), is a statistical parameter of a particular data set and thus does not depend on the data matching a particular assumed distribution. An advantage of Bremer support is that it never reaches a maximum value (such as 100%) and continues to increase as character support for a particular branch in the tree accumulates (Brower, 2006). A defect of the method is that it does not always take into account the relative amounts of evidence contradictory and favorable to the group. This problem is diminished if the support for the group is calculated as the ratio between the amounts of favorable and contradictory evidence (Goloboff and Farris, 2001). This method is known as relative Bremer support; its potential advantages are that its values vary between 0 and 1 and that they provide an approximate measure of the amount of favorable versus contradictory evidence. Under weighting methods, Bremer supports may be hard to interpret, but the relative supports for different weighting strengths are directly comparable (Goloboff and Farris, 2001). A disadvantage of relative supports is that the values in different pairs of trees must be calculated carefully.
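The arithmetic behind both measures is simple, and a small Python sketch may help. It assumes that absolute Bremer support is the length of the shortest tree lacking the group minus the length of the most parsimonious tree, and that relative support is computed as (F - C) / F, where F and C are the amounts of favorable and contradictory evidence; the function names and the numbers are invented for illustration.

```python
def bremer_support(length_without_group, length_optimal):
    """Absolute Bremer support: extra steps needed to lose the group."""
    return length_without_group - length_optimal


def relative_bremer_support(favorable, contradictory):
    """Relative support as (F - C) / F, assuming F > 0 and 0 <= C <= F.

    The result is 1 when nothing contradicts the group and approaches 0
    as the contradictory evidence approaches the favorable evidence.
    """
    if favorable <= 0:
        raise ValueError("favorable evidence must be positive")
    if not 0 <= contradictory <= favorable:
        raise ValueError("contradictory evidence must lie in [0, favorable]")
    return (favorable - contradictory) / favorable


if __name__ == "__main__":
    # Two hypothetical groups with the same absolute support of 4 steps.
    print(bremer_support(104, 100))        # uncontradicted group
    print(relative_bremer_support(4, 0))
    print(bremer_support(104, 100))        # heavily contradicted group
    print(relative_bremer_support(10, 6))
```

The two hypothetical groups share the same absolute value but differ in relative support, which is the situation the relative measure is meant to expose.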

An important extension of Bremer support is Partitioned Branch Support (PBS), introduced by Baker and DeSalle (1997). The PBS value for a particular branch and a given data partition is determined by subtracting the length of the data partition on the MP tree(s) from the length of the data partition on the MP anticonstraint tree(s) for that branch (Brower, 2006). Thus, a given partition may contribute positively to, be neutral toward, or conflict with the weight of the evidence that supports a particular branch in the combined analysis. PBS allows exploration of partition incongruence within a total evidence framework (Brower et al., 1996). This ability to localize incongruence to a single partition for a single branch has the potential to reveal interesting evolutionary processes, such as selection on a particular gene. Partitioning data is a potentially useful way to explore incongruence of signal among characters from different sources (Brower, 2006). PBS has the advantage that its values are calculated using the complete data matrix and can be obtained for any combination of partitions. One of the problems with PBS is that it is sensitive to missing data and can shift dramatically among partitions as missing data are filled into the matrix (Brower, 2006). Much of the criticism of support measures focuses on their reanalysis of data subsets or partitions as though they were separate sources of evidence, but, as Goloboff et al. (2003) have pointed out, no measure of clade quality yet developed is immune to certain conceivable cases.
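The PBS subtraction described above can be sketched in a few lines of Python. The partition names and step counts are invented for illustration, and the function name is hypothetical; the only assumption taken from the text is that each PBS value is the partition's length on the anticonstraint tree minus its length on the MP tree.

```python
def partitioned_branch_support(length_on_mp, length_on_anticonstraint):
    """PBS per partition for one branch.

    Both arguments map partition name -> number of steps that partition
    requires on the MP tree and on the MP anticonstraint tree.
    """
    if length_on_mp.keys() != length_on_anticonstraint.keys():
        raise ValueError("both trees must be scored for the same partitions")
    return {
        name: length_on_anticonstraint[name] - length_on_mp[name]
        for name in length_on_mp
    }


if __name__ == "__main__":
    # Hypothetical step counts for three partitions on the two trees.
    mp_tree = {"gene_A": 40, "gene_B": 30, "morphology": 30}
    anticonstraint_tree = {"gene_A": 45, "gene_B": 30, "morphology": 28}

    pbs = partitioned_branch_support(mp_tree, anticonstraint_tree)
    print(pbs)                 # positive, neutral and conflicting partitions
    print(sum(pbs.values()))   # total Bremer support for the branch
```

Because the partition lengths add up to the total tree length, the PBS values for a branch sum to its Bremer support in the combined analysis; here gene_A supports the branch, gene_B is neutral and morphology conflicts with it.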

According to Brower (2006), there are no objective means to set a rejection criterion for the support or stability of a particular branch in a particular cladogram. Moreover, support from the current data does not necessarily imply that a branch will be robust to the addition of taxa and characters: support today is no guarantee of stability in the future. For this reason, measures that imply a confidence interval, like bootstrap values, are potentially misleading. By contrast, Bremer support, because it has no upper bound, is a more direct way to document the accumulation of character support for a particular branch as additional data are incorporated into a particular phylogenetic hypothesis (Brower, 2006).

References

RH Baker and R DeSalle. Multiple sources of character information and the phylogeny of hawaiian drosophilids. 1997.

Kare Bremer. Branch support and tree stability. Cladistics, 10(3):295–304, 1994. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1994.tb00179.x. URL http://dx.doi.org/10.1111/j.1096-0031.1994.tb00179.x.

A. V. Z. Brower, R. DeSalle, and A. Vogler. Gene trees, species trees, and systematics: A cladistic perspective. Annual Review of Ecology and Systematics, 27(1):423–450, 1996. doi: 10.1146/annurev.ecolsys.27.1.423. URL http://www.annualreviews.org/doi/abs/10.1146/annurev.ecolsys.27.1.423.

Andrew V. Z. Brower. The how and why of branch support and partitioned branch support, with a new index to assess partition incongruence. Cladistics, 22(4):378–386, 2006. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2006.00113.x. URL http://dx.doi.org/10.1111/j.1096-0031.2006.00113.x.

Andrew V. Z. Brower. Stability, replication, pseudoreplication, support and consensus a reply to brower. Cladistics, 26(1):112–113, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00319.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00319.x.

James S. Farris, Victor A. Albert, Mari Källersjö, Diana Lipscomb, and Arnold G. Kluge. Parsimony jackknifing outperforms neighbor-joining. Cladistics, 12(2):99–124, 1996. ISSN 1096-0031. doi: 10.1111/j.1096-0031.1996.tb00196.x. URL http://dx.doi.org/10.1111/j.1096-0031.1996.tb00196.x.

Joseph Felsenstein. Confidence limits on phylogenies: An approach using the bootstrap. Evolution, 39(4):783–791, 1985. ISSN 00143820. doi: 10.2307/2408678. URL http://dx.doi.org/10.2307/2408678.

John V. Freudenstein and Jerrold I. Davis. Branch support via resampling: an empirical study. Cladistics, 26(6):643–656, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00304.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00304.x.

Pablo A. Goloboff and James S. Farris. Methods for quick consensus estimation. Cladistics, 17(1):S26–S34, 2001. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2001.tb00102.x. URL http://dx.doi.org/10.1111/j.1096-0031.2001.tb00102.x.

Pablo A Goloboff, James S Farris, Mari Källersjö, Bengt Oxelman, M J Ramirez, and Claudia A Szumik. Improvements to resampling measures of group support. Cladistics, 19(4):324–332, 2003. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2003.tb00376.x. URL http://dx.doi.org/10.1111/j.1096-0031.2003.tb00376.x.

Ward C. Wheeler. Distinctions between optimal and expected support. Cladistics, 26(6):657–663, 2010. ISSN 1096-0031. doi: 10.1111/j.1096-0031.2010.00308.x. URL http://dx.doi.org/10.1111/j.1096-0031.2010.00308.x.


Branch Support: confidence, stability, credibility?

By Susana Ortiz

One way of assessing whether a clade present in a phylogenetic reconstruction really is part of the true phylogeny is to evaluate its support, which may be established by estimating confidence intervals based on resampling methods (bootstrap and jackknife), or by Bremer support, which is based on the length difference between trees as a stability measure. Even so, these approaches are not independent of the search strategy, given that they are sensitive to its effectiveness (Freudenstein and Davis, 2010). Therefore, a highly supported clade is not necessarily real; it may just be the kind of result that fits the resources used (e.g. the search strategy). Posterior probabilities in Bayesian analyses have been used as a probabilistic measure of support (e.g. Goloboff et al., 2003; Pickett and Randle, 2005), because they quantify credibility: how likely a certain clade is to be correct, given the data, model and priors (Huelsenbeck et al., 2002). A comparison between Bayesian and nonparametric bootstrapping was proposed by Efron et al. (1996), in which the bootstrap confidence level can be thought of as an assessment of error for the estimated tree. However, posterior probabilities are sensitive to the prior for internal branch lengths (Yang and Rannala, 2005), and are significantly higher than the corresponding nonparametric bootstrap frequencies when the models used for the analyses are underparameterized (Goloboff et al., 2003). Although there have been several attempts to reconcile the different approaches under certain conditions, these approaches cannot be freely evaluated under all phylogenetic criteria, given restrictions that are not only methodological but also conceptual.

The bootstrap and the jackknife are techniques that resample the original data to infer the variability of the estimate, in this case the phylogeny. The variation among trees provides an adequate indication of the uncertainty (Felsenstein, 1985). The bootstrap has also been proposed as a tool to assess robustness with regard to small changes in the data (Holmes, 2003); it is not a test of how accurate a topology is, but it provides information about its stability and helps to assess whether the data are adequate to validate the topology (Berry and Gascuel, 1996). As for repeatability, unless the data set is perfectly Hennigian (Felsenstein, 1985), variation between replicates is expected, so one might think that many replicates would give greater precision about which groups are monophyletic. According to Pattengale et al. (2009), however, a rather small number of bootstrap replicates (typically 100–500) is enough to produce support values that correlate at better than 99.5% with the reference values on the best ML trees.

This holds even though the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes. Likewise, the above does not mean that a clade is or is not monophyletic depending on its support; support only indicates the certainty with which a particular node can be found in the topology. If a node is not in the bootstrap consensus, this could mean that there is a polytomy due to multiple resolutions of the node, perhaps caused by incongruence between characters. Mort et al. (2000) compared the bootstrap and the jackknife, and their findings show the relationship between bootstrap values and the deletion proportion chosen for the jackknife. In favor of the jackknife, it has been proposed as a rapid and efficient method to identify strongly supported clades (Farris et al., 1996), and assigning equal deletion probabilities to characters reduces the problem of competition between informative and noninformative characters (Freudenstein and Davis, 2010).
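How a node ends up in, or drops out of, the bootstrap consensus can be illustrated with a short Python sketch. It assumes each replicate tree has already been reduced to the set of clades it contains, each clade being a frozenset of taxon names; the function names and the toy replicates are hypothetical, and the 50% threshold corresponds to a majority-rule consensus.

```python
from collections import Counter


def clade_frequencies(replicate_trees):
    """Fraction of replicate trees in which each clade appears."""
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    counts = Counter()
    for clades in replicate_trees:
        counts.update(set(clades))  # count each clade once per replicate
    total = len(replicate_trees)
    return {clade: n / total for clade, n in counts.items()}


def majority_rule_clades(replicate_trees, threshold=0.5):
    """Clades found in more than `threshold` of the replicates."""
    freqs = clade_frequencies(replicate_trees)
    return {clade for clade, f in freqs.items() if f > threshold}


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    ac = frozenset({"A", "C"})
    abc = frozenset({"A", "B", "C"})
    abd = frozenset({"A", "B", "D"})

    # Four toy replicates; real analyses would use hundreds.
    replicates = [{ab, abc}, {ab, abc}, {ab, abd}, {ac, abc}]

    for clade, freq in clade_frequencies(replicates).items():
        print(sorted(clade), freq)
    print([sorted(c) for c in majority_rule_clades(replicates)])
```

Clades that fall below the threshold are simply absent from the consensus, which appears as a polytomy at that point of the tree.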

Bremer support (Bremer, 1984) is another alternative for measuring support, although only under the parsimony criterion. This method measures the difference between the most parsimonious cladogram and the suboptimal one that lacks the clade of interest (Grant and Kluge, 2008). So, in Bremer support, a strongly supported branch means a large increase in the length of the suboptimal trees. Absolute (Bremer, 1984) and relative Bremer support (Goloboff and Farris, 2001) are variants that differ in the type of evidence taken into account. The first measures the absolute amount of favorable evidence and the second the ratio between evidence favorable and contradictory to the group; they represent two aspects of support that can vary independently (Goloboff et al., 2003). Bremer support has been interpreted as a stability measure, and thus as independent of the influence of autapomorphies and of lower frequencies for better supported groups. However, objections have been raised to this view, since stability depends on the specific scenario; as Goloboff et al. (2003) noted, “a group stable under additions of characters may be very unstable under addition of taxa or under recoding of some characters”, whereas Bremer support as a measure of support is based only on the available evidence.

Homoplasy is another factor affecting the estimation of support: clades delimited by “unique and unreversed” or relatively less homoplastic character states are often considered more strongly supported (Grant and Kluge, 2008), although not all support approaches are equally sensitive to it. According to Freudenstein and Davis (2010), the values on branches not affected by homoplasy are slightly higher for the bootstrap than for the jackknife, but the addition of homoplastic characters caused support on branches affected by homoplasy to drop substantially more as measured by the bootstrap than as measured by the jackknife. This differs from Bremer support, which takes the distribution of homoplasy into account (Sanderson, 1995). Incongruence between characters, the proportion of homoplastic versus homologous characters, additivity, and character weighting (in the bootstrap) are key topics in the evaluation of support. The number of nonhomoplastic synapomorphies supporting a clade provides a numerical estimate of the support for a hypothesis, but it may not provide evidence that favors the hypothesis over some alternative (Wilkinson et al., 2003). I agree with Grant and Kluge (2003) that support measures do not test phylogenetic hypotheses; they evaluate the relative degree or strength of evidence.


References

- Berry, V. and Gascuel, O. 1996. On the interpretation of bootstrap trees: Appropriate threshold of clade selection and induced gain. Molecular Biology and Evolution 13: 999–1011.
- Efron, B., Halloran, E., and Holmes, S. 1996. Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences 93(14): 7085–7090.
- Erixon, P., Svennblad, B., Britton, T., and Oxelman, B. 2003. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology 52: 665–673.
- Farris, J. S. 1996. Jac. Computer program distributed by the author. Molekylärsystematiska laboratoriet, Naturhistoriska riksmuseet, Stockholm, Sweden.
- Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783–791.
- Freudenstein, J. V. and Davis, J. I. 2010. Branch support via resampling: an empirical study. Cladistics 26: 1–14.
- Goloboff, P. A., Farris, J. S., Källersjö, M., Oxelman, B., Ramírez, M. J., and Szumik, C. A. 2003. Improvements to resampling measures of group support. Cladistics 19: 324–332. doi: 10.1016/S0748-3007(03)00060-4.
- Grant, T. and Kluge, A. G. 2003. Data exploration in phylogenetic inference: scientific, heuristic, or neither. Cladistics 19: 379–418.
- Grant, T. and Kluge, A. G. 2008. Clade support measures and their adequacy. Cladistics 24: 1051–1064.
- Holmes, S. 2003. Bootstrapping phylogenetic trees: theory and methods. Statistical Science 18(2): 241–255.
- Huelsenbeck, J. P., Larget, B., Miller, R. E., and Ronquist, F. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Systematic Biology 51: 673–688.
- Mort, M. E., Soltis, P. S., Soltis, D. E., and Mabry, M. L. 2000. Comparison of three methods for estimating internal support on phylogenetic trees. Systematic Biology 49: 160–171.
- Pattengale, N. D., Alipour, M., Bininda-Emonds, O. R. P., Moret, B. M. E., and Stamatakis, A. 2009. How many bootstrap replicates are necessary? In: Research in Computational Molecular Biology (RECOMB 2009), pp. 184–200.
- Pickett, K. M. and Randle, C. P. 2005. Strange Bayes indeed: uniform topological priors imply non-uniform clade priors. Molecular Phylogenetics and Evolution 34.
- Sanderson, M. J. 1995. Objections to bootstrapping: a critique. Systematic Biology 44: 299–320.
- Wilkinson, M., Lapointe, F.-J., and Gower, D. J. 2003. Branch lengths and support. Systematic Biology 52: 127–130.
- Yang, Z. and Rannala, B. 2005. Branch-length prior influences Bayesian posterior probability of phylogeny. Systematic Biology 54(3): 455–470.