Sunday, 29 November 2015

Phylogenetic inference and philosophy, different approaches for the same purpose


************** This post was updated on 27.01.2016***************


Phylogenetic inference attempts to elucidate the evolutionary relationships among organisms.
Various approaches have been developed for this purpose, and they differ in their rationale for addressing the problem (De Queiroz & Poe, 2001). Among the best known are parsimony, Bayesian inference, and likelihoodism. Below I discuss some of their characteristics and basic assumptions, and finally state which one, in my opinion, is best suited to a phylogenetic analysis.

Parsimony is grounded in the principle of simplicity. Its proponents argue that the approach is justified by the ideas of Karl Popper: a phylogenetic hypothesis must be falsifiable and must survive rigorous tests in order to be corroborated. From this perspective, the hypotheses that require the fewest changes should be preferred, thus minimizing the number of ad hoc explanations (Grupe & Harbeck, 2015). But a preference for simpler explanations does not mean that nature behaves simply; evolution does not have to be parsimonious. This is difficult to understand and, in fact, sounds contradictory.
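To make "the least amount of changes" concrete, here is a minimal sketch of how a parsimony score can be counted for a single character with Fitch's algorithm. The four taxa, the character states, and the two candidate trees are hypothetical, and the nested-tuple tree format is only a convenience chosen for this illustration.

```python
def fitch_score(tree, states):
    """Return the minimum number of state changes for one character.

    tree   : nested tuples of tip names, e.g. (("A", "B"), ("C", "D"))
    states : dict mapping each tip name to its observed character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):      # a tip: its state set is the observation
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                     # the children agree on at least one state
            return shared
        changes += 1                   # a disagreement costs one change
        return left | right

    visit(tree)
    return changes


# Hypothetical character: state "1" in A and B, state "0" in C and D.
states = {"A": "1", "B": "1", "C": "0", "D": "0"}
tree_1 = (("A", "B"), ("C", "D"))      # groups the taxa that share a state
tree_2 = (("A", "C"), ("B", "D"))      # splits them apart

print(fitch_score(tree_1, states))     # 1
print(fitch_score(tree_2, states))     # 2
```

Under parsimony, tree_1 would be preferred for this character because it needs one change instead of two. Note that every change costs the same here, whatever the states involved.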

Statistical approaches such as maximum likelihood and Bayesian inference share the use of evolutionary models that take into account the probability of changes between character states and the base frequencies (Archibald, Mort, & Crawford, 2003). In a parsimony approach, by contrast, it seems that all these changes are treated as equally likely.

The likelihood of a hypothesis is the probability of the data given that hypothesis, not the probability of the hypothesis given the data. The method seeks the hypothesis that maximizes the probability of observing the data (characters) that were actually observed.
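As an illustration of "the hypothesis that maximizes the probability of the data", the sketch below treats the evolutionary distance between two sequences as the hypothesis and evaluates Pr(data | distance) under the Jukes-Cantor model. The two sequences are hypothetical, the grid search is a deliberately naive optimizer chosen for readability, and none of this is meant to reproduce what a phylogenetics program does internally.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, distance):
    """Log Pr(data | distance) for two aligned sequences under Jukes-Cantor.

    distance is the expected number of substitutions per site.
    """
    decay = math.exp(-4.0 * distance / 3.0)
    p_same = 0.25 + 0.75 * decay       # the site shows the same base in both
    p_diff = 0.25 - 0.25 * decay       # the site shows one specific other base
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the base in the first sequence
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


# Hypothetical aligned sequences that differ at 2 of 20 sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGT"

# Naive search: try distances from 0.001 to 1.000 and keep the most likely one.
candidates = [i / 1000 for i in range(1, 1001)]
best = max(candidates, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))

# Closed-form Jukes-Cantor estimate, for comparison with the grid search.
p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
analytic = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(best, round(analytic, 4))
```

The grid search and the closed-form estimate should agree to within the grid spacing. No prior probability for the distance appears anywhere in the calculation.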

Bayesian inference differs from likelihood in that it takes into account knowledge available before the observations: it is possible to assign probabilities to hypotheses (topologies) before the observations are made. The main problem with Bayesian inference is precisely this distinguishing feature. Priors can be a double-edged sword: on the one hand they allow prior knowledge to be incorporated into phylogenetic inference, but on the other they are difficult to estimate accurately (Velasco, 2008). It is therefore common practice for several authors to assign equal priors, but this means that the main advantage of the method is wasted. Additionally, if the priors are estimated incorrectly, the results of the analysis may be biased.
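The effect of the priors can be seen in a small numerical sketch. The three topologies, their likelihoods, and both sets of priors below are hypothetical numbers chosen only to show how the same data can lead to different posterior rankings.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a finite set of hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())        # Pr(data), the normalizing constant
    return {h: joint[h] / total for h in joint}


# Hypothetical likelihoods Pr(data | topology) for three topologies.
likelihoods = {"T1": 0.020, "T2": 0.010, "T3": 0.010}

flat_priors = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}
informative_priors = {"T1": 0.10, "T2": 0.80, "T3": 0.10}

post_flat = posterior(flat_priors, likelihoods)
post_informative = posterior(informative_priors, likelihoods)

print(max(post_flat, key=post_flat.get))                 # T1
print(max(post_informative, key=post_informative.get))   # T2
```

With equal priors the posterior ranking is the same as the likelihood ranking, so the prior adds nothing. With the informative priors the preferred topology changes, which is useful if the priors are well founded and a source of bias if they are not.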

The advantage of statistical methods seems obvious: they make their assumptions explicit in a given model. Parsimony, however, does not assume any explicit evolutionary model. Or does this mean that all changes are equally likely? That does not seem logical, especially if we speak of continuous characters. Given this weakness of parsimony and the difficulties with priors, in my opinion a likelihoodist approach is the most appropriate for phylogenetic inference.

References


  • Grupe, G., & Harbeck, M. (2015). Taphonomic and Diagenetic Processes. In W. Henke & I. Tattersall (Eds.), Handbook of Paleoanthropology (pp. 417–439).
  • De Queiroz, K., & Poe, S. (2001). Philosophy and phylogenetic inference: a comparison of likelihood and parsimony methods in the context of Karl Popper’s writings on corroboration. Systematic Biology, 50(3), 305–321.
  • Archibald, J. K., Mort, M. E., & Crawford, D. J. (2003). Bayesian inference of phylogeny: a non-technical primer. Taxon, 187–191.
  • Velasco, J. D. (2008). Philosophy and the Tree of Life (Doctoral dissertation). University of Wisconsin-Madison.

  




Maximum Likelihood versus Parsimony and Bayesian inference

Many authors defend the parsimony method by appealing to realism and to the simplicity of its assumptions (Goloboff, 2003), while others say that parsimony under specific models is equivalent to likelihood (Farris, 1983). Below I discuss some arguments that weaken the case for parsimony as a method to clarify the evolutionary relationships among organisms.

First, the fact that parsimony offers simplicity in its assumptions does not mean that those assumptions are clear or that they are the best. Many wonder what assumptions about the evolutionary process the parsimony method really makes. Does it assume only that offspring are modified with respect to their ancestors? The assumptions of parsimony leave many doubts.

Second, parsimony does not discriminate whether changes along the branches are more or less probable. It does not assume that one branch is more likely to change than another (Sober, 2004). That is, under parsimony there is no selection for one character over another (Sober, 2002). It is at this point that the maximum likelihood method has its advantages. Using evolutionary models allows us to start from explicit propositions and calculate the probability of the data given the hypothesis (Goloboff, 2003). In addition, with the likelihood ratio we can measure the strength of the statistical evidence and so choose the topology with the highest likelihood (Royall, 1999).

On the other hand, there is Bayesian inference, a probabilistic method that, like likelihood, uses evolutionary models. This method uses priors as the basis for calculating the posterior probability of the hypothesis given the data. One risk of using priors is that they can become subjective and condition the calculation of the posterior probabilities. I personally think that Bayesian inference is a modification of likelihood, but with more potential for bias because of the priors.


Andrea Lizeth Silva Cala 

References

Goloboff, P. A. (2003). Parsimony, likelihood, and simplicity. Cladistics, 19(2), 91-103.

Farris, J. (1983). The logical basis of phylogenetic analysis (pp. 7-36). na.

Sober, E. (2004). The contest between parsimony and likelihood. Systematic biology, 53(4), 644-653.

Sober, E. (2002): “Reconstructing Ancestral Character States – A Likelihood Perspective on Cladistic Parsimony.” The Monist 85: 156-176.

Royall, R. (1999): The Strength of Statistical Evidence. Johns Hopkins University, Department of Biostatistics, 615 North Wolfe Street, Baltimore, MD 21205, USA.

Bayesianism and likelihoodism


Bayesianism or Likelihoodism?

Let me start with Royall's three questions:

1. What does the present evidence say?
2. What should you believe?
3. What should you do?

Although likelihoodists and Bayesians share both the likelihood principle and the law of likelihood, which are important in the philosophy of scientific method, they disagree on several points.

It is necessary to highlight that the most remarkable difference between them is that Bayesians use prior probabilities: they compute posterior probability distributions, which require prior probability distributions and likelihood functions, whereas likelihoodists do not.

Another difference concerns the meaning of evidence. Likelihoodists characterize data as evidence, but they do not use that characterization to guide beliefs or actions, and they maintain that it is valuable in itself (Royall 1997, Ch. 1). Consequently, a likelihoodist cannot answer Royall's second or third question, because likelihoodism says nothing about what you should believe after receiving the evidence without taking into account what you believed before receiving it. For Bayesians, on the other hand, the prior is updated in the light of new data, which is the evidence (Sober, 2008); from this perspective all three questions can be answered.

Regarding the second question, about your degree of belief, Bayesians answer it with the concept of confirmation: the observation (O) confirms hypothesis 1 (H1) when H1 has a higher likelihood than its own negation (Gandenberger, 2013). Likelihoodists, in contrast, do not use this concept of confirmation; they do not consider whether the evidence raises, lowers, or leaves unchanged the probability of the hypothesis. Instead, they compare hypotheses with each other, each with its own likelihood, and use the law of likelihood to interpret the data: the observation (O) favors hypothesis 1 (H1) over hypothesis 2 (H2) if Pr(O | H1) > Pr(O | H2), and the degree to which O favors H1 over H2 is given by the likelihood ratio Pr(O | H1) / Pr(O | H2). So, for likelihoodists, the likelihood ratio is enough as a measure of the degree to which the evidence favors one hypothesis over another (Sober, 2008). For Bayesians this is not enough, which is why they use posterior probabilities (see below).

In Bayesian inference you assign a probability to the hypothesis (H) before making an observation; in other words, the prior probability is the distribution of the parameters before the data are analyzed. After the analysis, the probability assigned to H is reallocated; the probability in the light of the evidence is known as the posterior probability and is denoted Pr(H | O), the probability of the hypothesis given the observation. In maximum likelihood, in contrast, the likelihood of the hypothesis is the probability that H confers on O, Pr(O | H) (Sober, 2008).
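The contrast between the two quantities can be written compactly. As a sketch in the notation already used in this post (O for the observation, H1 and H2 for competing hypotheses), Bayes' theorem links the posterior to the likelihood, and dividing it for two hypotheses shows where the likelihood ratio sits:

```latex
\Pr(H \mid O) = \frac{\Pr(O \mid H)\,\Pr(H)}{\Pr(O)}
\qquad\Longrightarrow\qquad
\frac{\Pr(H_1 \mid O)}{\Pr(H_2 \mid O)}
  = \frac{\Pr(O \mid H_1)}{\Pr(O \mid H_2)} \times \frac{\Pr(H_1)}{\Pr(H_2)}
```

Read this way, the likelihoodist reports only the first factor on the right (the likelihood ratio), while the Bayesian multiplies it by the prior odds to obtain the posterior odds.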

Something the two have in common is that ML and BI use the same models of evolution, but they measure the support for relationships in the topology differently. ML uses bootstrap support (BS), a measure of confidence that is estimated by resampling the data (Cummings et al., 2003), whereas BI uses the posterior probability (PP), which is calculated from the prior probability, the likelihood function, and the data. Both measures have been controversial for several reasons. Some claim that the two measures are equivalent (Efron, Halloran, & Holmes, 1996), but studies such as Erixon et al. (2003) reject this claim, and others argue that PP is a better measure of support (Alfaro, Zoller, & Lutzoni, 2003).
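The resampling behind bootstrap support can be sketched as follows. The alignment columns are hypothetical, and `best_split` is a toy stand-in for a real tree search (it simply counts which four-taxon split most sites agree with); an actual analysis would rerun the ML search on every replicate.

```python
import random
from collections import Counter


def best_split(columns):
    """Toy stand-in for tree inference on four taxa A, B, C, D.

    Returns the split backed by the most sites, or None if no site is
    informative. Ties are broken arbitrarily (first split counted).
    """
    votes = Counter()
    for a, b, c, d in columns:
        if a == b and c == d and a != c:
            votes["AB|CD"] += 1
        elif a == c and b == d and a != b:
            votes["AC|BD"] += 1
        elif a == d and b == c and a != b:
            votes["AD|BC"] += 1
    return votes.most_common(1)[0][0] if votes else None


def bootstrap_support(columns, replicates=200, seed=1):
    """Resample columns with replacement and count how often each split wins."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        resampled = [rng.choice(columns) for _ in columns]
        counts[best_split(resampled)] += 1
    return {split: n / replicates for split, n in counts.items()}


# Hypothetical alignment: 6 sites favor AB|CD, 2 favor AC|BD, 4 are constant.
columns = ([("A", "A", "G", "G")] * 6
           + [("C", "T", "C", "T")] * 2
           + [("A", "A", "A", "A")] * 4)

support = bootstrap_support(columns)
print(support)
```

The support value for a split is simply the fraction of pseudo-replicates in which it is recovered, so it describes the stability of the result under resampling rather than the probability that the split is true.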

Given the similarities and differences between them I think that Bayesian inference is the best method of all.

Bibliography

Alfaro, M. E., Zoller, S., & Lutzoni, F. (2003). Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Molecular Biology and Evolution, 20(2), 255–266.

Cummings, M. P., Handley, S. A., Myers, D. S., Reed, D. L., Rokas, A., & Winka, K. (2003). Comparing bootstrap and posterior probability values in the four-taxon case. Systematic Biology, 52(4), 477–487.

Efron, B., Halloran, E., & Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences, 93(23), 13429.

Erixon, P., Svennblad, B., Britton, T., & Oxelman, B. (2003). Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology, 52(5), 665–673.

Gandenberger, G. (2013). Why I am not a likelihoodist.

Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Boca Raton, Fla.: Chapman and Hall.

Sober, E. (2008). Evidence and Evolution: The Logic Behind the Science. Cambridge University Press.


 

Philosophy in the biological world

The discussion about how knowledge is built is not new. From Plato and his proposed world of ideas to Kant in his critique of pure reason (1), it has been underlined that the construction of knowledge that we call science is not dogmatic and static; instead it is a non-volatile element, conditioned by a space-time paradigm proper to humanity, which reduces everything to a purely linguistic problem (2). Therefore, syncretism is not a symptom of intellectual immaturity or inferiority, but a prudent stand against the scholastic thought typical of religious processes, which some scholars confuse with scientific work.

Added to all this, it is important to clarify that, contrary to sciences like mathematics, physics, and chemistry, the theoretical corpus of the biological sciences completely lacks axiomatic systems that support the theories developed. While evolution is a fact, the causality of the phenomenon is highly debated because of the number of ad hoc theories and hypotheses (3, 4). It is in this environment that the phylogenetic theories that attempt to resolve the evolutionary relationships of organisms are developed. This essay therefore puts on the table Bayesian analysis, likelihood, and parsimony as "irreconcilable" philosophical positions, trying to approach the reality of evolutionary phenomena (a question that, to be clear, is not yet settled) in order to consider whether any of them is "the best".

Let's start with parsimony, in which methodologically the tree with the fewest transformations is selected; this derives from the philosophical principle that the simplest theory must be correct because it is the least complex (5). This is a strongly nominalist position that can result in multiple ad hoc theories that can never be applied to events other than the very cases they were built for. Following this logic, we could generate a separate theory for each type of phylogenetic relationship of every living form, which would lead to a number of hypotheses greater than the number of species (counting the species already extinct). Paradoxically, this ends up being contrary to the principle of parsimony.

In contrast to parsimony, the probabilistic models certainly have the advantage of building stochasticity into their methods, thus avoiding a possible fall into a phylogenetic Laplace's demon. Maximum likelihood analyzes the conditional probability of the observations given the hypothesis or, in more colloquial terms, how well the data fit a given hypothesis. Each tree is considered a hypothesis, and the one with the highest likelihood is chosen. However, likelihood ignores previously gathered evidence about the event, so in epistemological terms it can be assigned only a certain degree of truth value. By contrast, Bayesian analysis is an excellent tool for the analysis of phylogenetic hypotheses because it quantifies past evidence (the prior) and includes it in the phylogenetic analysis. It is therefore the most appropriate tool for elucidating the evolutionary relationships of living forms.

References

  1. Kant, Immanuel, and Norman Kemp Smith. 1929. Immanuel Kant's Critique of pure reason. Boston: Bedford.
  2. Wittgenstein, Ludwig. 1922. Tractatus logico-philosophicus. London: Routledge & Kegan Paul.
  3. Margulis, Lynn; Dorion Sagan (2003). Captando Genomas. Una teoría sobre el origen de las especies. Ernst Mayr (prólogo). David Sempau (trad.) (1ª edición). Barcelona: Editorial Kairós
  4. Darwin, Charles Robert. The Origin of Species. Vol. XI. The Harvard Classics. New York: P.F. Collier & Son, 1909–14.
  5. Robert Audi, ed., Ockham's razor, The Cambridge Dictionary of Philosophy (2nd Edition), Cambridge University Press.

Tuesday, 22 September 2015

Host structure of the phylogeny of West Nile virus (WNV): does it shape the spatiotemporal structure?

Introduction
West Nile virus (WNV) is a mosquito-borne flavivirus that causes neurologic diseases such as encephalitis, meningitis, and acute flaccid paralysis (Lim, Koraka, Osterhaus, & Martina, 2011). Like other flaviviruses, WNV is an enveloped virus with a single-stranded, positive-sense, ∼11-kb RNA genome, and its strains are grouped into at least 7 genetic lineages. WNV was first isolated in Uganda in 1937. Later, the first large outbreak of West Nile neuroinvasive disease (WNND) was recorded in Romania in 1996, with 393 confirmed cases (Tsai, Popovici, Cernescu, Campbell, & Nedelcu, 1998). Three years later, it became a global public health concern after its introduction into North America, and subsequently into Central and South America (Lanciotti et al., 1999). Since then, major outbreaks of WNV fever and encephalitis have taken place on all continents except Antarctica, causing human and animal deaths. Although its enzootic cycle is maintained mainly between mosquitoes and birds, it can occasionally infect horses, humans, and other vertebrates (Hayes et al., 2005). Despite this variety of hosts, studies on the host structure and its influence on the spatiotemporal structure are still scarce. Since host genetic factors have a significant influence on disease distribution patterns, the overall purpose of this study was to assess the host structure of the phylogenetic relationships of WNV in a phylogeographic context, taking the spatiotemporal structure into account.

Specific Objectives
To identify the lineages of each viral strain.
To infer the main phylogeographic events.
To determine the host shift events within spatiotemporal structure.

Methods
Sequence data: All available complete-genome sequences of WNV with collection times and geographic locations (453 sequences, from 25 countries and 79 host species) were retrieved from GenBank. To identify and delete recombinants, clones, and duplicates from the database, I used Uclust v1.2.22q with 99% identity (Edgar, 2010). A sequence of Japanese encephalitis virus (JEV) was used as the outgroup. Subsequently, all WNV sequences were aligned with the multiple sequence alignment algorithm implemented in MUSCLE v3.8.31 (Edgar, 2004). The substitution model for the envelope (E) gene sequences was selected using the Akaike information criterion with PhyML (Guindon, Dufayard, Lefort, & Anisimova, 2010), called from the function phymltest in the R package ape (R Core Team, 2014). Phylogenetic signal was calculated for both the complete coding sequence and the E gene sequence using TREE-PUZZLE v5.3 (Schmidt, Strimmer, Vingron, & von Haeseler, 2002).
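For readers unfamiliar with the criterion, the sketch below shows how the Akaike information criterion (AIC = 2k - 2 lnL) ranks substitution models. The log-likelihood values are hypothetical and are not results from this study; the parameter counts are the usual numbers of free substitution-model parameters (branch lengths excluded). It illustrates the criterion itself, not the internals of phymltest.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical fits: model name -> (log-likelihood, free model parameters).
fits = {
    "JC69": (-5230.4, 0),   # equal rates, equal base frequencies
    "HKY85": (-5101.7, 4),  # 1 ts/tv ratio + 3 free base frequencies
    "GTR": (-5099.9, 8),    # 5 free rates + 3 free base frequencies
}

scores = {model: aic(lnl, k) for model, (lnl, k) in fits.items()}
best_model = min(scores, key=scores.get)

for model, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{model:6s} AIC = {score:.1f}")
print("selected:", best_model)
```

In this made-up example GTR has the highest likelihood, but its extra parameters are not worth the small gain, so HKY85 obtains the lowest AIC.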

Lineage identification: A maximum likelihood (ML) inference with the complete coding sequence was performed using RAxML, with 20 searches and 100 bootstrap replicates, which are considered sufficient for large data sets (Kozlov, Aberer, & Stamatakis, 2015). Every lineage was assumed to be a monophyletic group, as suggested by MacKenzie & Williams (2009), and all the clades obtained were reviewed against previous studies.

Phylodynamics: Topologies, model parameters, evolutionary rates, the time to the most recent common ancestor (TMRCA), and the variation of viral population size over time were co-estimated for the E gene dataset, using an uncorrelated log-normal relaxed clock model (rate: 0.053; Subbotina & Loktev, 2014; rationale given in May, Davis, Tesh, & Barrett, 2011) and the MCMC method implemented in the BEAST package v1.8.2 (Drummond, Suchard, Xie, & Rambaut, 2012). I set up a phylogeographic Bayesian stochastic search variable selection (BSSVS) procedure with location and host as discrete traits. This approach assumes that exchange rates in the continuous-time Markov chain (CTMC) are zero with some prior probability, in order to find the most parsimonious set of rates explaining the diffusion process along the phylogenies. A Bayesian skyline plot was used as the coalescent prior to estimate the change over time in the effective population size scaled by generation time (Ne.g). The MCMC analysis was run twice for 50 million generations, sampling every 10,000 generations. MCMC convergence was assessed by estimating the effective sample size (ESS) with Tracer v1.5 (http://tree.bio.ed.ac.uk/software/tracer/). Uncertainty was expressed as 95% highest posterior density (95% HPD) intervals.
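Two numbers in this paragraph can be illustrated with a short sketch: how many states a run of this length stores, and how a 95% HPD interval can be obtained from posterior samples as the shortest interval containing 95% of them. The posterior samples below are simulated, hypothetical values, not output from the analysis, and this simple estimator is not necessarily the one Tracer uses.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains the requested fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))   # number of samples inside the interval
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


# 50 million generations sampled every 10,000 gives about 5,000 states per run.
samples_per_run = 50_000_000 // 10_000

# Hypothetical, right-skewed posterior sample of a node age (in years).
rng = random.Random(0)
ages = [rng.lognormvariate(4.0, 0.3) for _ in range(samples_per_run)]

low, high = hpd_interval(ages)
print(samples_per_run, round(low, 1), round(high, 1))
```

For a skewed posterior such as this one, the HPD interval is shorter than the equal-tailed interval that cuts 2.5% from each side, which is why it is the usual summary for node ages.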

Host-shift events: The results of the two runs were combined for the final analysis and for the Bayes factor (BF) support for host shifts. Transition rates supported by BF > 3 were considered significant support for a host shift between species. The topologies obtained were summarized in a maximum clade credibility (MCC) tree, annotated with TreeAnnotator (http://beast.bio.ed.ac.uk/treeannotator).
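A Bayes factor for a single transition rate can be understood as the ratio of posterior odds to prior odds that the rate is "switched on". The sketch below uses that general definition; the posterior and prior inclusion probabilities are hypothetical, and how the prior probability is derived in a particular BSSVS setup is not covered here.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds that a rate is included."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Hypothetical example: a host-shift rate is included in 60% of the sampled
# states, while the prior expected it to be included 20% of the time.
bf = bayes_factor(posterior_prob=0.60, prior_prob=0.20)
print(round(bf, 2), bf > 3)    # 6.0 True
```

A rate whose posterior inclusion probability equals its prior inclusion probability gets BF = 1, meaning the data did not change the odds at all.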

Phylogeographic analysis: I used the software SPREAD v1.7 (Bielejec, Rambaut, Suchard, & Lemey, 2011) to visualize the diffusion rates over time. Locations were coded with two capital letters using the ISO 3166-1 alpha-2 code, and coordinates corresponded to the centroid of each country. A Bayes factor (BF) test was run to assess the support for diffusion rates among localities (Carlo, Pagel, Meade, Pagel, & Meade, 2013; Lemey, Rambaut, Drummond, & Suchard, 2009).

Results and Discussion

WNV lineages

I obtained a final dataset of 52 sequences from 19 countries and 24 host species. The ML tree (Figure 1) showed the seven lineages previously proposed (MacKenzie & Williams, 2009; Mann, McMullen, Swetnam, & Barrett, 2013): Ia (20 sequences), II (9 sequences), IV (6 sequences), Ia2 (10 sequences), Ib (2 sequences), Ia3 (2 sequences), and Ia1 (3 sequences).


Figure 1. Phylogeny inferred using a maximum likelihood analysis of 53 sequences. Numbers correspond to bootstrap values.

Spatio-temporal structure

Lineages Ia and Ia1-3 contain isolates from Africa, Europe, the Middle East, Russia, and the Americas, and include isolates from all recent outbreaks. The phylogeny in Figure 2 shows the spatio-temporal history, in which WNV exists in an endemic cycle in certain areas such as Australia, whereas it is epidemic in Europe, being reintroduced regularly from Africa either directly (in western Europe) or via the Middle East (Pesko & Ebel, 2012). The TMRCA estimates for lineages Ia and Ia1-3 (Table 1) are supported by the records of outbreaks and introductions for the clade (Amore et al., 2010; Gubler, 2007; Jerzak, Bernard, Kramer, & Ebel, 2005). Significantly, introduction into other geographic areas has occurred on one occasion only in each region, leading to subsequent establishment and expansion of the virus in these areas. WNV was successfully introduced into the Americas in the 1990s and subsequently became endemic across most temperate regions of North America (Amore et al., 2010; May, Davis, Tesh, & Barrett, 2011).

Lineage II, on the other hand, has been associated with outbreaks of West Nile virus in Western and Eastern Europe, and appears to have established endemic cycles in Spain and Greece (Pesko & Ebel, 2012). The estimated TMRCA (112.2; year 1900) suggests that the origin of this lineage might be older than reported (Botha et al., 2008). Lineage IV groups numerous isolates from Russia; it was first detected in 1988 in a Dermacentor tick and has since been isolated from mosquitoes and frogs in 2002 and 2005 in Russia (May et al., 2011). The TMRCA estimate for this lineage (26.4; year 1985) is consistent with the reported dates of introduction in Western Europe.



Figure 2. WNV spatio-temporal structure. The topology corresponds to the maximum clade credibility (MCC) tree for all lineages through time. Values at nodes are posterior probabilities. In the legend, countries are represented by two capital letters using the ISO 3166-1 alpha-2 code.


Table 1. Estimates of the TMRCA for each lineage, with the corresponding calendar year in parentheses.

Lineage   TMRCA (year)    95% HPD
Ia        89.3 (1922)     [69.7, 114.5]
Ia2       74.1 (1938)     [32.3, 107.8]
Ia3       42.8 (1970)     [13.4, 91.6]
II        112.2 (1900)    [66.5, 176.7]
IV        26.4 (1985)     [14.1, 48.8]
Ia1       74.2 (1937)     [46.1, 107.3]


In general, the phylogeographic structure of the virus was recovered for the sampled lineages, and the TMRCA estimates were consistent with the reported dates of introduction. Distributions and spatial structure show that several lineages (all except Ia1) have been reintroduced into locations where they had occurred in the past.

Phylogeography

The discrete phylogeographic analysis shows major centers of spread of WNV in different regions. The interactive KML file can be found in the annexes. Rates with BF > 3 support major migration between Western Europe and the Middle East, the Mediterranean region, and the Americas, and between Africa and Western Europe. See the annexes for BF information.

Figure 3. Representation of the geographic diffusion process of WNV through time.

Host shift events
Host shifts between mosquitoes and other metazoans shape the phylogenetic structure of each clade (Figure 4). In addition, the BF analysis gives the support for shifts from one host to another, with remarkably strong evidence (BF > 3). See the annexes.

Figure 4. Phylogeny with host structure for WNV. The topology corresponds to the maximum clade credibility (MCC) tree for all lineages through time. Values on branches are posterior probabilities. Branch thickness represents the probability of a host shift at each node. Legend names are the abbreviations of the species names.

Conclusions

Host-shift events for WNV can be observed in the phylogeny. The rates support shifts from Culex mosquitoes to humans, birds, horses, and Uranotaenia mosquitoes, with BF values over 5.

Reasonably good estimates of WNV history can be obtained from a small dataset, such as the one used in this study, provided that the sample is representative in terms of host species information and spatio-temporal structure.



Annexes
A1. Phylogenetic signal from the downsampled dataset used to run analysis in this study, corresponding to: A. Envelope gene, B. complete coding sequence.

A2. Datasets used in the study.
A.2.1. Full
A.2.2. Downsampled 1
A.2.3. Downsampled 2

A3. KML file with the visualization of the probable migratory paths of WNV over time.

A4. Bayes factors for geographical diffusion rates.

A5. Bayes factors for host-shift rates.

A1-A5 can be accessed here
https://www.dropbox.com/sh/7djz0api54h7dnx/AAA-9i0OVWd1UR-q4Eiv8ASIa?dl=0

A6. Host species abbreviations


Host species    Abbreviation
Homo sapiens Hs
Culex annulirostris Ca
Sylvia nisoria Sn
Rousettus leschenaultii Rl
Anopheles maculipennis Am
Mus musculus Mm
Equus caballus Ec
Culex pipiens Cp
Culex univittatus Cu
Dermacentor marginatus Dm
Columba livia Cl
Phalacrocorax carbo Pc
Ochlerotatus sticticus Os
Aedes vexans Av
Culex nigripalpus Cn
Corvus corone Cc
Uranotaenia unguiculata Uu
Culex quinquefasciatus Cq
Mimus polyglottos Mp
Corvus brachyrhynchos Cb
Aquila chrysaetos Ac
Phoenicopterus ruber Pr
Culex tarsalis Ct
Accipiter gentilis Ag

References

Bielejec, F., Rambaut, A., Suchard, M. A., & Lemey, P. (2011). SPREAD: Spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics, 27(20), 2910–2912. doi:10.1093/bioinformatics/btr481.

Drummond, A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29(8), 1969–1973. doi:10.1093/molbev/mss075.

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460–2461. doi:10.1093/bioinformatics/btq461.

Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., & Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.

Hayes, E. B., Komar, N., Nasci, R. S., Montgomery, S. P., O’Leary, D. R., & Campbell, G. L. (2005). Epidemiology and transmission dynamics of West Nile virus disease. Emerging Infectious Diseases, 11(8), 1167–1173. doi:10.3201/eid1108.050289a.

Kozlov, A. M., Aberer, A. J., & Stamatakis, A. (2015). ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers. Bioinformatics, (March), 1–3. doi:10.1093/bioinformatics/btv184.

Lanciotti, R. S., Roehrig, J. T., Deubel, V., Smith, J., Parker, M., Steele, K., … Gubler, D. J. (1999). Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science (New York, N.Y.), 286(5448), 2333–2337. doi:10.1126/science.286.5448.2333.

Lim, S. M., Koraka, P., Osterhaus, A. D. M. E., & Martina, B. E. E. (2011). West Nile virus: Immunity and pathogenesis. Viruses, 3(6), 811–828. doi:10.3390/v3060811.

MacKenzie, J. S., & Williams, D. T. (2009). The zoonotic flaviviruses of southern, south-eastern and eastern Asia, and australasia: The potential for emergent viruses. Zoonoses and Public Health, 56(6-7), 338–356. doi:10.1111/j.1863-2378.2008.01208.x.

Mann, B. R., McMullen, A. R., Swetnam, D. M., & Barrett, A. D. T. (2013). Molecular epidemiology and evolution of West Nile virus in North America. International Journal of Environmental Research and Public Health, 10(10), 5111–5129. doi:10.3390/ijerph10105111.

May, F. J., Davis, C. T., Tesh, R. B., & Barrett, A. D. T. (2011). Phylogeography of West Nile virus: from the cradle of evolution in Africa to Eurasia, Australia, and the Americas. Journal of Virology, 85(6), 2964–2974. doi:10.1128/JVI.01963-10.

Schmidt, H. A., Strimmer, K., Vingron, M., & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (Oxford, England), 18(3), 502–504. doi:10.1093/bioinformatics/18.3.502.

Subbotina, E. L., & Loktev, V. B. (2014). Molecular evolution of West Nile virus. Molecular Genetics, Microbiology and Virology, 29(1), 34–41. doi:10.3103/S0891416814010054.

Tsai, T. F., Popovici, F., Cernescu, C., Campbell, G. L., & Nedelcu, N. I. (1998). West Nile encephalitis epidemic in southeastern Romania. Lancet, 352(9130), 767–771. doi:10.1016/S0140-6736(98)03538-7.



Total evidence, nuclear genes or non-nuclear genes? Here is the dilemma.

Introduction.

Dating methods for studies of evolutionary processes are widely used in different groups, such as Chiroptera (Teeling et al. 2005) or gymnosperms (Rydin & Korall 2009). Non-nuclear DNA is the most widely reported because of its characteristics: a high evolutionary rate, uniparental inheritance, and absence of recombination (Zardoya & Meyer 1996). Dating results obtained with nuclear and non-nuclear partitions can differ (Vawter & Brown 1986), but this issue has not been evaluated despite the wide use of the technique. The main goal of this work is to contrast the estimated divergence times and their standard deviations using nuclear genes, non-nuclear genes, and the combination of both.

_______________________________________________________________________________

Materials and Methods.

Data: Three partitions were created, non-nuclear genes, nuclear genes, and total evidence (nuclear + non-nuclear), for four taxa: Testudines, Picidae (Aves), Chiroptera (Mammalia), and gymnosperms.

Phylogeny: A maximum likelihood phylogenetic analysis under the Jukes-Cantor nucleotide model was run for each partition with PhyML v3.0 (2012-12-08) (Guindon et al. 2010). To avoid any influence of the nucleotide model on the estimates, the same model was used for all partitions.

Dating: The heuristic rate-smoothing algorithm (HRSA) implemented in the BASEML program of the PAML package (Yang 2004) was used to estimate the divergence times and their SD. Two clock models were used: strict clock (SC) and local clock (LC). For each group, at least three fossil calibration points were used in the analysis (Magallon et al. 2013).

Analysis: The difference in standard deviation (ΔSD) and the estimated divergence times were compared among the partitions within each group. A Spearman correlation was computed between ΔSD and the number of common nodes (Fig. 1) in each group.
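One way to count "common nodes" between two topologies is to compare the sets of tips that descend from each internal node. The sketch below does this for two hypothetical rooted trees written as nested tuples; the choice to ignore the root and to express the result relative to the first tree is an assumption made for the illustration, since the exact counting rule is not spelled out here.

```python
def clades(tree):
    """Return the clades of a nested-tuple tree as frozensets of tip names.

    The root clade (all tips) is left out because any two trees on the
    same tips share it trivially.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    root = visit(tree)
    found.discard(root)
    return found


def shared_node_fraction(tree_a, tree_b):
    """Fraction of the internal nodes of tree_a that also occur in tree_b."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b) / len(a)


# Hypothetical topologies for the same five terminals.
nuclear = ((("A", "B"), "C"), ("D", "E"))
non_nuclear = ((("A", "C"), "B"), ("D", "E"))

print(round(shared_node_fraction(nuclear, non_nuclear), 2))   # 0.67
```

Only nodes found in both trees have directly comparable age estimates, which is why the comparison of dates is restricted to them.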

________________________________________________________________________________

Results and Discussion.

On average, the topologies share 40% of their nodes (range: 33-47%), and for those nodes no differences were found in any group between the dates generated under SC and LC. Chiroptera and Picidae show a trend in the SD, with the nuclear and total evidence partitions having the lowest values, while gymnosperms and Testudines show no trend in the SD (Fig. 2). In terms of ΔSD, the nuclear and total evidence partitions generally had the lowest values. Only Testudines show a different result, with the non-nuclear and nuclear partitions having the lowest values (Fig. 2).

The Spearman correlation shows an inverse relationship between the number of common nodes and ΔSD. Although it was not significant (p > 0.05), the relationship is not discarded, because of the low number of taxonomic groups (< 5) (Fig. 3), and it could be associated with the reduced number of terminals in the phylogeny.

Compared with previous works (Teeling et al. 2005; Rydin & Korall 2009; Lourenco et al. 2012), the total evidence partition gives similar results for divergence times and SD. The combined partition is recommended for dating analyses, but nuclear genes are also a good choice. It remains necessary to evaluate the effect of the number of tips (terminals) in the phylogeny and of different models of nucleotide evolution.
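The rank correlation used here can be sketched in a few lines. The percentages of common nodes span the range reported above, but the pairing with ΔSD values is entirely hypothetical and serves only to show the calculation.

```python
def ranks(values):
    """Rank the values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for four taxonomic groups.
common_nodes_pct = [33, 38, 42, 47]
delta_sd = [0.9, 0.7, 0.8, 0.3]

rho = spearman_rho(common_nodes_pct, delta_sd)
print(round(rho, 2))   # -0.8
```

With only four groups there are just 24 possible rank orderings, so even a perfect correlation has an exact two-sided p-value of 2/24 (about 0.08). A non-significant result is therefore expected with so few groups, whatever the true relationship.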

________________________________________________________________________________

Figures. 
Figure 1. Two comparable nodes between two different topologies.

Figure 2. Standard deviation (SD) for all internal nodes of the partitions from each group.

Figure 3. ΔSD values for each partition in each group and the Spearman correlation with its rho value.


________________________________________________________________________________

Bibliography.
 
Catarina Rydin & Petra Korall. 2009: Evolutionary Relationships in Ephedra (Gnetales), with Implications for Seed Plant Phylogeny. Int. J. Plant. Sci. 170 (8):1031-1043.

Emma C. Teeling, Mark S. Springer, Ole Madsen, Paul Bates, Stephen J. O'Brien and William J. Murphy. 2005: A Molecular Phylogeny for Bats Illuminates Biogeography and the Fossil Record. Science Vol (307): 580-584.

Guindon S., Dufayard J. F., Lefort V., Anisimova M., Hordijk W., and Gascuel O. 2010: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology, 59 (3): 307-321.

Joao M. Lourenco, Julien Claude, Nicolas Galtier and Ylenia Chiari. 2012: Dating cryptodiran nodes: Origin and diversification of the turtle superfamily Testudinoidea. Molecular Phylogenetics and Evolution. 62: 496-507.

Lisa Vawter and Wesley M. Brown. 1986: Nuclear and Mitochondrial DNA Comparisons Reveal Extreme Rate Variation in the Molecular Clock. Science Vol (234): 194-195.

Susana Magallon, Khidir W. Hilu and Dietmar Quandt. 2013: Land plant evolutionary timeline: Gene effects are secondary to fossil constraints in relaxed clock estimation of age and substitution rates. American Journal of Botany 100 (3): 000-018.

Rafael Zardoya & Axel Meyer. 1996: Phylogenetic performance of mitochondrial protein-coding genes in resolving relationships among vertebrates. Molecular Biology and Evolution 13 (7): 933-942.

Ziheng Yang. 2004: A heuristic rate smoothing procedure for maximum likelihood estimation of species divergence times. Acta Zoologica Sinica. 50(4): 645-656.
 
_______________________________________________________________________________

Supplements.

1.
 
Genes: 


Picidae (Aves): COI, ND2, CYTB, Brahma protein (BRM), beta-fibrinogen (BFG), aconitase 1 (ACO1).

Chiroptera: 16s, COX, ND2, CYTB, RAG1, RAG2, ATP7a, BRCA1.

Gymnosperms: MATK, RBCL, ATPB, 18s, 26s, 5.8s.

Testudines: 12s, COX, CYTB, NAD4, RAG, brain-derived neurotrophic factor (BDNF), aryl hydrocarbon receptor 1 (AHR), nerve growth factor (NGF).

2.

Fossils.

Picidae (Aves). Colaptes 1.8 Mya. Pliopicus spp. 13.6 Mya. Paleonerpes shorti. 11 Mya.

Chiroptera (for more information see Teeling et al., 2005, doi:10.1126/science.1105113). Notonycteris spp. 30 Mya. Trachypteron franzeni (Emballonuridae) 37 Mya. Philisis spp. + Chamtwaria spp. + Chibanycteris spp. 37 Mya. Brachipposideros spp. + Pseudorhinolophus spp. 55 Mya.

Gymnosperms. Antarcticycas spp. 171 Mya. Rissikia spp. 245 Mya. Palaeognetaleana asupicia. 125 Mya.

Testudines. Chrysemys antiqua. 33.8 Mya. Cearachelys placidoi. 121 Mya. Proterochersis spp. 140 Mya. Hoplochelys spp 50 Mya.
 
_______________________________________________________________________________

This work was presented as a poster at the V Simposio Colombiano de Biología Evolutiva, and it is available here.