Filosofía, especie y sistemática: septiembre 2015

martes, 22 de septiembre de 2015

Host structure of the Phylogeny of West Nile Virus WNV: Does it shape the spatiotemporal structure?

Introduction
The West Nile Virus (WNV) is a mosquito-born flavivirus that causes neurologic diseases such as encephalitis, meningitis, and acute flaccid paralysis (Lim, Koraka, Osterhaus, & Martina, 2011)⁠⁠. Similar to other flaviviruses, WNV is an enveloped virus with a single-stranded, positive sense, ∼11-kb RNA genome whose strains are grouped into at least 7 genetic lineages. WNV was first isolated in Uganda in 1937. Posteriorly, the first large outbreak of West Nile neuroinvasive disease (WNND) was recorded in Romania in 1996, with 393 confirmed cases (Tsai, Popovici, Cernescu, Campbell, & Nedelcu, 1998)⁠⁠. Three years later, it became a global public health concern after its introduction into North America, and subsequently into Central and South America (Lanciotti et al., 1999)⁠⁠. Since then, major outbreaks of WNV fever and encephalitis took place in all continents, apart from Antarctica, causing human and animal deaths. Although its enzootic cycle is mainly maintained between mosquitoes and birds, it can eventually infect horses, humans, and other vertebrates (Hayes et al., 2005)⁠⁠. Despite this variety of hosts, studies on the host structure and its influence on the spatiotemporal structure are still scarce. Since host genetic factors have a significant influence on disease distribution patterns, the overall purpose of this study was to assess the host structure of the phylogenetic relationships of WNV in a phylogeographic context, taking the spatiotemporal structure into account.

Specific Objectives

To identify the lineages of each viral strain.

To infer the main phylogeographic events.
To determine the host shift events within spatiotemporal structure.

Methods
Sequence Data: All the available sequences of complete genome of WNV, with collection times, and geographic locations( 453 sequences, from 25 countries, and 79 hosts species) were retrieved from GenBank. In order to identify and delete recombinants, clones, and duplicates from the data base, I used Uclust v1.2.22q with 99 % of identity (Edgar, 2010)⁠. A sequence of Japanese encephalitis virus (JEV) was used as the outgroup. Subsequently, all the WNV sequences were aligned using the algorithm of multiple sequence alignment, implemented in MUSCLE v3.8.31 (Edgar, 2004). The substitution model for the Envelope gene sequences was selected using Akaike information criterion with PhyML (Guindon, Dufayard, Lefort, & Anisimova, 2010)⁠⁠, called from the function phymltest{ape} (R Core Team, 2014). Phylogenetic signal was calculated for both complete coding sequence and E gene sequence, using TreePuzzle v.5.3(Schmidt, Strimmer, Vingron, & von Haeseler, 2002)⁠.

Lineages identification: A Maximum likelihood (ML) inference with the complete coding sequence, was performed using RaxML, with 20 searches and 100 bootstrap replicates, which are considered as sufficient for large data sets (Kozlov, Aberer, & Stamatakis, 2015)⁠⁠. Every lineage was assumed as a monophyletic group as sugested by (MacKenzie & Williams, 2009)⁠, and all the obtained clades will be revised taking previous studies into account.

Phylodynamics: Topologies, model parameters, evolutionary rates, TMRCA, viral population size variation over time were co-estimated for the E gene sequences dataset, using an uncorrelated log-normal relaxed clock model (rate: 0.053(Subbotina & Loktev, 2014)⁠(rationale given in (May, Davis, Tesh, & Barrett, 2011)⁠, and the MCMC method implemented in the BEAST package v1.8.2 (Drummond, Suchard, Xie, & Rambaut, 2012)⁠. I set up a phylogeographic Bayesian stochastic search variable selection (BSSVS) procedure for location data, and host as discrete traits, for this approach assumes exchange rates in the continuous-time Markov Chain(CTMC) to be zero with the prior probability, in order to find the most parsimonious set of rates explaining the diffusion process along the phylogenies. Bayesian skyline plot was used as a coalescent prior during the estimation over time of the change in effective population size per generation, per year (Ne.g). The MCMC analysis was run twice for 50 million generations, with sampling every 10000. MCMC convergence was measured by estimating the effective sampling size (ESS), using Tracer software version 1.5 (http://tree.bio.ed.ac.uk/software/tracer/). Uncertainties were estimated as 95% high probability densities (95% HPD).

Host-Shift Events: The results for the two runs were combined for final analysis and BF support for host shift. Transition rates supported by a BF > 3 will be considered as significant support for a host shift between species. The obtained topologies will be summarized in a maximum clade credibility (MCC) tree, and annotated by the use of TreeAnnotator (http://beast.bio.ed.ac.uk/treeannotator).

Phylogeographic analysis: I used the software SPREAD v.1.7(Bielejec, Rambaut, Suchard, & Lemey, 2011)⁠, to visualize the diffusion rates over time. Locations were assigned by two capital letters using the ISO 31-66 alpha code, and coordinates corresponded to the centroids of each country. Bayes Factor (BF) test was run to get the support of diffusion rates among localities (Carlo, Pagel, Meade, Pagel, & Meade, 2013; Lemey, Rambaut, Drummond, & Suchard, 2009).

Results and Discussion

WNV lineages

I obtained a final dataset of 52 sequences, from 19 countries and 24 host species. The ML tree(Figure 1), showed the seven lineages previously proposed (MacKenzie & Williams, 2009; Mann, McMullen, Swetnam, & Barrett, 2013): Ia(20 sequences), II(9 sequences), IV(6 sequences), Ia2(10 sequences), Ib(2 sequences), Ia3(2 sequences), Ia1(3 sequences).⁠

Figure 1. Phylogeny inferred using a Maximum likelihood analysis of 53 sequences. Numbers correspond to bootstrapping values.

Spatio-temporal structure

Lineages 1a, 1a1-3 contain isolates from Africa, Europe, the Middle East, Russia, and the Americas, and includes isolates from all recent outbreaks. Phylogeny in (Figure 2) shows the spatio-temporal history, in which WNV exists in an endemic cycle for certain areas such as Australia, whereas it is epidemic in Europe, being reintroduced regularly from Africa either directly (in western Europe) or via the Middle East (Pesko & Ebel, 2012). Estimations of the TMRCA of lineages 1a,1-3(Table 1), are supported with the records of outbreaks and introductions for the clade (Amore et al., 2010; Gubler, 2007; Jerzak, Bernard, Kramer, & Ebel, 2005). Significantly, introduction into other geographic areas has occurred on one occasion only in each region, leading to subsequent establishment and expansion of the virus in these areas. WNV was successfully introduced in the Americas in 90's and subsequently became endemic across most temperate regions of North America (Amore et al., 2010; May, Davis, Tesh, & Barrett, 2011).

Lineage II, on the other hand, has been associated with outbreaks of West Nile virus in Western and Eastern Europe, and appears to have established endemic cycles in Spain and Greece (Pesko & Ebel, 2012). The estimated TMRCA (112.2 (1900)) shows the origins of this lineage might be older than reported (Botha et al., 2008). Lineage IV groups numerous isolates made in Russia, first detected in 1988 from a Dermacentor tick, and since isolated from mosquitoes and frogs in 2002 and 2005 in Russia (May et al., 2011). Estimation of TMRCA for this lineage (26.4 (1985)) is consistent to the reported dates of introduction in Western Europe.

Figure 2. WNV Spatio-temporal structural. Topology corresponds to Maximum Credibility Tree (MCC) for all lineages in time. Values in nodes are posterior probabilities. In legend, countries are represented by two capital letters using the ISO 3166-1 alpha code.

Table 1. Estimation of the TMRCA for each lineage.

Stat	Ia	Ia2	Ia3	II	IV	Ia1
TMCRA	89.3 (1922)	74.1 (1938)	42.8 (1970)	112.2 (1900)	26.4 (1985)	74.2 (1937)
95% HPD	[69.7,114.5]	[32.3, 107.8]	[13.4,91.6]	[66.5,176.7]	[14.1,48.8]	[46.1, 107.3]

In general, the phylogeographic structure of the virus was recovered for the sampled lineages, and estimations of the TMCRA were consistent to the reported dates of introductions. Distributions and spatial structure show that several lineages(except Ia1), have been reintroduced to locations of the past.

Phylogeography

The discrete phylogeography analysis shows major centers of spread of WNV in different regions. The interactive kml file can be found in anexes. BF>3 support major migration rates between Western Europe and the Middle East, Mediterranean region, and the Americas; and Africa with Western Europe. See anexes for BF information.

Figure 3. Representation of the geographic diffusion process of WNV throughout the time.

Host shift events
Host shift between mosquitoes and other metazoans shapes the phylogenetic structure of each clade (Figure 4). On the other hand, BF analysis indicates the probabilities to migrate from one host onto another, with remarkably strong evidence (BF>3). See anexes.

Figure 4. Phylogeny with Host structure for WNV. Topology corresponds to Maximum Credibility Tree (MCC) for all lineages in time. Values on branches represent the posterior probability. Thickness of branches represents the probability of host shift for each node. Legend names correspond to the abbreviations of species' names.

Conclusions

Host-shift events for the WNV can be observed in the phylogeny. Rates support shifts from Culex mosquitoes to humans, birds, horses, and Urotaenia fish, with BF values over 5.

Merely good estimations of WNV history can be done from a small dataset, as the one used for this study, when a representative sample is obtained in terms of host species information and spatio-temporal structure.

Anexes
A1. Phylogenetic signal from the downsampled dataset used to run analysis in this study, corresponding to: A. Envelope gene, B. complete coding sequence.

A2. Datasets used in the study.
A.2.1. Full
A.2.2. Downsampled 1
A.2.3. Downsampled 2

A3. kml file with the visualization of the probable migratory paths of WNV over time.

A4. Bayesian Factors for geographical diffusion rates.

A5. Bayesian Factors for host-shift rates.

A1-A5 can be accessed here
https://www.dropbox.com/sh/7djz0api54h7dnx/AAA-9i0OVWd1UR-q4Eiv8ASIa?dl=0

A6. Host species abbreviations

host	host abbreviation
Homo sapiens	Hs
Culex annulirostris	Ca
Sylvia nisoria	Sn
Rousettus leschenaultii	Rl
Anopheles maculipennis	Am
Mus musculus	Mm
Equus caballus	Ec
Culex pipiens	Cp
Culex univittatus	Cu
Dermacentor marginatus	Dm
Columba livia	Cl
Phalacrocorax carbo	Pc
Ochlerotatus sticticus	Os
Aedes vexans	Av
Culex nigripalpus	Cn
Corvus corone	Cc
Uranotaenia unguiculata	Uu
Culex quinquefasciatus	Cq
Mimus polyglottos	Mp
Corvus brachyrhynchos	Cb
Aquila chrysaetos	Ac
Phoenicopterus ruber	Pr
Culex tarsalis	Ct
Accipiter gentilis	Ag

References

Bielejec, F., Rambaut, A., Suchard, M. a., & Lemey, P. (2011). SPREAD: Spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics, 27(20), 2910–2912. doi:10.1093/bioinformatics/btr481.

Drummond, A. J., Suchard, M. a, Xie, D., & Rambaut, A. (2012). Bayesian P hylogenetics with BEAUti and the BEAST 1 . 7. Molecular Biology and Evolution, 29(8), 1969–1973. doi:10.1093/molbev/mss075.

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460–2461. doi:10.1093/bioinformatics/btq461.

Guindon, S., Dufayard, J.-F., Lefort, V., & Anisimova, M. (2010). New Alogrithms and Methods to Estimate Maximum- Likelihoods Phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.

Hayes, E. B., Komar, N., Nasci, R. S., Montgomery, S. P., O’Leary, D. R., & Campbell, G. L. (2005). Epidemiology and transmission dynamics of West Nile virus disease. Emerging Infectious Diseases, 11(8), 1167–1173. doi:10.3201/eid1108.050289a.

Kozlov, a. M., Aberer, a. J., & Stamatakis, a. (2015). ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers. Bioinformatics, (March), 1–3. doi:10.1093/bioinformatics/btv184.

Lanciotti, R. S., Roehrig, J. T., Deubel, V., Smith, J., Parker, M., Steele, K., … Gubler, D. J. (1999). Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science (New York, N.Y.), 286(5448), 2333–2337. doi:10.1126/science.286.5448.2333.

Lim, S. M., Koraka, P., Osterhaus, A. D. M. E., & Martina, B. E. E. (2011). West Nile virus: Immunity and pathogenesis. Viruses, 3(6), 811–828. doi:10.3390/v3060811.

MacKenzie, J. S., & Williams, D. T. (2009). The zoonotic flaviviruses of southern, south-eastern and eastern Asia, and australasia: The potential for emergent viruses. Zoonoses and Public Health, 56(6-7), 338–356. doi:10.1111/j.1863-2378.2008.01208.x.

Mann, B. R., McMullen, A. R., Swetnam, D. M., & Barrett, A. D. T. (2013). Molecular epidemiology and evolution of West Nile virus in North America. International Journal of Environmental Research and Public Health, 10(10), 5111–5129. doi:10.3390/ijerph10105111.

May, F. J., Davis, C. T., Tesh, R. B., & Barrett, A. D. T. (2011). Phylogeography of West Nile virus: from the cradle of evolution in Africa to Eurasia, Australia, and the Americas. Journal of Virology, 85(6), 2964–2974. doi:10.1128/JVI.01963-10.

Schmidt, H. a, Strimmer, K., Vingron, M., & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (Oxford, England), 18(3), 502–504. doi:10.1093/bioinformatics/18.3.502.

Subbotina, E. L., & Loktev, V. B. (2014). Molecular evolution of West Nile virus. Molecular Genetics, Microbiology and Virology, 29(1), 34–41. doi:10.3103/S0891416814010054.

Tsai, T. F., Popovici, F., Cernescu, C., Campbell, G. L., & Nedelcu, N. I. (1998). West Nile encephalitis epidemic in southeastern Romania. Lancet, 352(9130), 767–771. doi:10.1016/S0140-6736(98)03538-7.

¿Total evidence, nuclear genes or non-nuclear genes? Here is the dilemma.

Introduction.

Datation methods for evolutionary process studies are broad used in different groups like Chiroptera (Teeling et al. 2005) or Gimnosperms (Rydin & Petra. 2009). Even the non-nuclear DNA are the most widely reported by its own characteristics: high rate evolution, uniparental inheritance or absence of recombination (Zardoya & Meyer. 1996). The results on datation using nuclear partitions and non-nuclear partitions could be differentials (Vawter & Brown. 1986), but this subject had not been evaluated even though with the widely use of the technique. The main goal in this work is contrast the estimate datation times and the standard deviation using nuclear genes, non-nuclear genes and combination of both.

_______________________________________________________________________________

Materials and Methods.

Data: 3 different partitions were created: Non-nuclear genes, nuclear genes and total evidence (Nuclear + Non-nuclear) for three taxas: Testudines, Picidae (Aves), Chiroptera (Mammalia) and Gimnospermas.

Phylogeny: A phylogenetic analysis under Maximum Likelihood with the Jukes Cantor nucleotide model with PhyML v. 3.0 (2012-12-08)(Guidon et al. 2010) was run for each partition. To avoid nucleotide model influence in the estimation, the same model were implemented in all partitions.

Datation: The Heuristic rate smoothing algorithm (HSRA) implemented in the BaseML packages from the software PALM (Yang. 2004) was used for the SD and datation time estimation. Two clock model were used: Strict Clock (SC) and Local Clock (LC). For each group as minimum 3 fossils calibration points where implemented in the analysis (Magallon et al. 2013).

Analysis: The delta of standard deviation (ΔSD) and the datation time estimate were compared among each partition in a group. A Spearman correlation was made between the ΔSD and the number of common nodes (Fig. 1) in each group.

________________________________________________________________________________

Results and Discussion.

On average, the topologies present a 40% of common nodes (range: 33 - 47%), and compare those nodes no differences in the datations generate under SC and LC where found in all groups. Chiroptera and Picidae present a trend in the SD where nuclear and total evidence partitions had the lowest values; While Gimnosperms and Testudines no present any trend in the SD (Fig. 2). Using the ΔSD, in general the nuclear and total evidence were the partition with the lowest values. Just Testudines present a different resutl, where the non-nuclear and nuclear partitions had the lowest values (Fig. 2).

The Spearman correlation shows inversely proportional relationship between the common node number and the ΔSD, despiting that it wasn't significant (p > 0.05) the relation is not descarted because the low number of taxonomic groups (< 5)(Fig.3) and it could be associate to the reduce number of terminals in the phylogeny.

Comparing with other previous works (Teeling et al. 2005; Rydin & Perea. 2009; Lourenco et al. 2012), the total evidence paritition present similar results in the datation times and SD. The partition combined are recomended in datation analysis, but also nuclear genes are a good choice too. Is necessary evaluate the effect of the number of of tips (terminals) in the phylogeny, and the use of different nucleotide evolution models.

________________________________________________________________________________

Figures.

Figure 1. Two comparable nodes between two differents topologies.

Figure 2. Standard deviation (SD) for all internal nodes of partitions from each group.

Figure 3. ΔSD values for each partition in each group and the Spearman regression with its respective rho value.

________________________________________________________________________________

Bibliography.

Catarina Rydin & Petra Korall. 2009: Evolutionary Relationships in Ephedra (Gnetales), with Implications for Seed Plant Phylogeny. Int. J. Plant. Sci. 170 (8):1031-1043.

Emma C. Teeling, Mark S. Springer, Ole Madsen, Paul Bates, Stephen J. O'Brien and William J. Murphy. 2005: A Molecular Phylogeny for Bats Illuminates Biogeography and the Fossil Record. Science Vol (307): 580-584.

Guidon S., Dufayard J. F., Lefort V., Anisimova M., Hordijk W., and Gascuel O. 2010: New algorithm and methods to estimate maximum likelihood phylogenies: assessing the perfomance of PhyML 3.0. Systematic Biology, 59 (3): 307-321.

Joao M. Lourenco, Julien Claude, Nicolas Galtier and Ylenia Chiari. 2012: Dating cryptodiran nodes: Origin and diversification of the turtle superfamily Testudinoidea. Molecular Phylogenetics and Evolution. 62: 496-507.

Lisa Vawter and wesley M. Brown. 1986: Nuclear and Mitochondrial DNA Comparisons Reveal Extreme Rate Variation in the Molecular Clock. Science Vol (234): 194-195.

Susana Magallon, Khidir W. Hilu and Dietmar Quandt. 2013: Land plant evolutionary timeline: Gene effect are scondary to fossil constraints in relaxed clock estimation of age and substitution rates. American Journal of Botany 100 (3): 000-018.

Rafael Zardoya & Axel Meyer. 1996: Phylogenetic perfomance of mitochondrial protein-coding genes in resolving relationships among vertebrates. Molecular Biology and Evolution 13 (7): 933-942.

Ziheng Yang. 2004: A heuristic rate smoothing procedure for maximum likelihood estimation of species divergence times. Acta Zoologica Sinica. 50(4): 645-656.

_______________________________________________________________________________

Supplements.

1.

Genes:

Picidae (Aves): COI, ND2, CYTB, Brahama Protein (BRM), Beta-Fibrogen (BFG), Aconitasa 1 (ACO1).

Chiroptera: 16s, COX, ND2, CYTB, RAG1, RAG2, ATP7a, BRCA1.

Gimnosperms: MATK, RBCL, ATPB, 18s, 26s, 5.8s.

Testudines: 12s, COX, CYTB, NAD4, RAG, Brain-derived Neurotrophic Factor (BNDF), Aryl Hydrocarbon Receptor 1 (AHR), Nerve Growth Factor (NGF).

2.

Fossils.

Picidae (Aves). Colaptes 1.8 Mya. Pliopicus spp. 13.6 Mya. Paleonerpes shorti. 11 Mya.

Chiroptera. (mayor información: E.C.Teeling et al., 2005 10.1126/science.1105113). Notonycteris spp. 30 Mya. Trachypteron franzeni (Emballonuridae) 37 Mya. Philisis spp. + Chamtwaria spp. + Chibanycteris spp. 37 Mya. Brachipposideros spp. + Pseudorhinolophus spp. 55 Mya.

Gimnospermas. Antarcticycas spp. 171 Mya. Rissikia spp. 245 Mya. Palaeognetaleana asupicia. 125 Mya.

Testudines. Chrysemys antiqua. 33.8 Mya. Cearachelys placidoi. 121 Mya. Proterochersis spp. 140 Mya. Hoplochelys spp 50 Mya.

_______________________________________________________________________________

This work was presented at the V Simposio Colombiano de Biología Evolutiva, on poster presentation and is avaible here.

Estimating Evolutionary Rates and Times to the last Common Ancestor DEV-2, using partitions vs complete coding region

Introduction

Dengue Virus serotype 2(DENV-2) is one of the four serotypes of the arthropod borne viruses (arbovirus) that belongs to the Flaviridae family. As the other serotypes, it is responsible for causing Dengue fever and severe Dengue. The genome is formed by a single stranded positive-sense RNA of approximately 11kb long. This viral RNA serves as an mRNA for the translation of 10 viral proteins: 3 structural(Capside C, Membrane precursor preM, Envelope E), that assemble the nucleocapside and envelope of mature virions; and 7 non-structural(NS1, NS2a, NS2b, NS3, NS4a, NS4B, NS5), which are mainly responsible for viral replication and other cell functions.

Due to its considerable diversity, DENV-2, has been grouped into 6 lineages, designated as genotypes (Rico-hesse et al., 1997, 1998; S Susanna Twiddy et al., 2002)⁠. 1.) Asian I genotype, represents strains from Thailand, Malaysia, Cambodia, Myanmar, Vietnam and Australia. 2.) Asian II, represents strains from China, Indonesia, The Philippines, Taiwan, Sri Lanka, India, Honduras and Mexico. 3.) Southeast (SE) Asian/American genotype comprises all strains from Southeast Asia, and strains collected in Central and South America and the Caribbean over the last 30 years. 4.) The Cosmopolitan genotype, represents strains distributed in a wide geographic area, including East and West Africa, the Middle East, the Indian subcontinent, Indian and Pacific Ocean Islands and Australia. 5.) American genotype, representing strains from Central and South America, the Caribbean and older strains collected in the Indian subcontinent and the Pacific Islands. Lastly, there is the Sylvatic genotype, representing strains from humans, canopy-dwelling arboreal mosquitoes, and non-human primates collected in West Africa and Southeast Asia. Moreover, the great genetic diversity within the DENV-2 genotypes, reflect their continual divergence and diverse geographic distribution (Chen & Vasilakis, 2011)⁠.

Several studies have been aimed at understanding the epidemiology of DENV, rates and dates of evolution, and selection pressure in its different genes and genotypes (Rico-hesse & Rico-hesse, 1990; Rico-Hesse et al., 1997; S. S. Twiddy, Holmes, & Rambaut, 2003; Weaver & Vasilakis, 2009; Zanotto, Gould, Gao, Harvey, & Holmes, 1996)⁠. Further, a relatively recent review (Chen & Vasilakis, 2011)⁠ reports large scale phylogenetic studies of all dengue serotypes and discusses how factors such as rates of evolution, selection pressures, population sizes and evolutionary constraints influences DENV transmission, pathogenesis and emergence. New approaches reveal the patterns in the spread and epidemiology of viral diseases by phylogeographic studies. On a global scale, DENV-2 has been reported as the serotype with the highest effective population size (Costa, Voloch, & Schrago, 2012)⁠, and the time to the last common ancestor as 338.1 (227.1–462.9) (Patil et al., 2011)⁠. Nonetheless, all these studies have been based only in estimations made from the E gene sequences.

The objectives of this study were to infer the time to the Most recent common ancestor MRCA of the DENV2 genotypes, and estimate the evolutionary rates of the data set, using the complete genome and individual genes.

Materials and Methods

Sequence Data

Complete genome sequences of DENV-1 were retrieved from GenBank(1093 sequences with collection dates (1944 to 2014), from (48) distinct countries, database up to June 2015). In order to delete duplicates, recombinants, and chimeras, I used uclust v.4.0.38 (Edgar, 2010)⁠, with a cut-off id value of 90%. An statistical test for detection of significant recombination events in the dataset was run using SplitsTree v.4.2 (Bruen, Philippe, & Bryant, 2006)⁠. Sequences were aligned using MUSCLE v3.8.31(Edgar, 2004)⁠. From this alignment, I obtained 11 partitions corresponding to the individual genes(10), and the complete coding sequence(1). Substitution model was selected under the Akaike information criterion, using PhyML (Guindon, Dufayard, Lefort, & Anisimova, 2010)⁠⁠, called with the function phymltest(package ape, R v.3.2.1.(R Core Team 2014)).

Phylogenetic Signal

In order to test the phylogenetic signal, ergo, the topological resolution yielded by the data set of each partition, a Likelihood mapping analysis was run for 10 million random quartets using TreePuzzle v.5.3.(Schmidt, Strimmer, Vingron, & von Haeseler, 2002)⁠. The quartets were reconstructed following a ML approach under the selected substitution models for each partition. Thus, an independent analysis was run for each dataset. The results were plotted as a triangle with fully resolved quartets shown as dots concentrated near the vertices, and the unresolved ones are located in the center. (See anexes)

Genotypes Assignation

a Maximum likelihood (ML) tree was inferred using RaxML, using the complete coding sequence (CDS), with the model GTR+G+I.

Estimation of Evolutionary rates and TMCRA

Tree topologies, model parameters, evolutionary rates for each partition dataset, and times of the MCRA for each genotype were co-estimated, using BEAST v.8.1.2.(A. J. Drummond, Suchard, Xie, & Rambaut, 2012)⁠. Inferrences were performed using a relaxed molecular clock (uncorrelated lognormal[rate: 7.5*10^-4, variation: 0.4 (Costa et al., 2012)⁠]. A Bayesian skyline plot was used as a coalescent prior during the inference (a. J. Drummond, Rambaut, Shapiro, & Pybus, 2005; Ho & Shapiro, 2011)⁠, for it is an approach with no dependence on a pre-specified parametric model of demographic history, such as logistic, exponential. Analysis was run for twice for 50 million generations, sampling every 10000. Topology was rooted with a sequence of Japanese Encephalitis Virus(EF107523) from 1988.

Results and Discussion

The mean evolutionary rate for each gene ranged 5.3 10^-4 –1.5 × 10^-4 nucleotide substitutions per site per year, being the NS5, NS2B, and E, the ones with the highest substitution rates. This estimations can be compared to the ones obtained by(S. S. Twiddy et al., 2003)⁠. The complete coding sequence was not account for comparisons, because converge of this parameter is not guaranteed (ESS<100) for the partition. In general, evolutionary rates for the dataset obtained from the E gene sequence were similar compared to the non-structural genes NS2B, NS4A, NS4B, and NS5.

Table 1. Estimated Mean Nucleotide Substitution Rates per Site per Year for each partition of Dengue Virus Serotype 2.

	Rate(subs/site/year)	Rate 95%HPD
C	4.10E-004	[9.5E-4, 2.4E-4]
preM	4.20E-004	[8.6E-4, 2.5E-4]
E	2.30E-004	[4.9E-4, 1.4E-4]
NS1	3.80E-004	[6.4E-4, 1.1E-4]
NS2A	5.30E-004	[8.1E-4, 1.8E-4]
NS2B	1.40E-004	[4.1E-4, 7.3E-3]
NS3	2.80E-004	[5.2E-4, 1.1E-4]
NS4A	2.50E-004	[6.5E-4, 1.3E-4]
NS4B	2.60E-004	[6.5E-4, 2.4E-4]
NS5	1.50E-004	[7.5E-4, 3.1E-3]
CDS	4.30E-004	[9.4E-4, 1.2E-3]

Regarding the time the most recent common ancestor (MRCA), results are consistent to the times found by (Walimbe, Lotankar, Cecilia, & Cherian, 2014), and are comparable to the times of outbreaks and introductions summarized in (Chen & Vasilakis, 2011)

In average, the partitions whose TMCRA for Asian/American genotype were closest to the proposed time in other studies (using the E gene sequence: 42.757-*44.4), were NS1:46.4, NS4A:45.6, and NS4B:43.7. For genotype Asian1/Asian2, the reference value (using the E gene sequence: 64.189-*70.45), was nearly recovered by NS1: 70.7, and preM: 67.8. For the Cosmopolitan genotype, (using the E gene sequence: 59.8-*67.9), the TMCRA was roughly recovered by NS2B: 63.885, NS4B: 60.929, and the CDS: 77.779. The TMCRA of the sylvatic2 genotype (using the E gene sequence: 121.1-*158.4), was recovered by NS5:157.1, CDS:157.2, NS1:120.1, NS2A:135.3, and NS4B: 126.9. The only exception for this case, would be the American 2 genotype, whose TMCRA estimations were different (103.3*), from the obtained in this work (56.2). * Values obtained by (Patil et al., 2011; Walimbe et al., 2014).

Genes pre-M, NS1, NS2A, NS4B, showed the closest values to the references. Thence, they could be an alternative to use when estimating evolutionary processes in DENV2. Besides, regarding the potential resolution of phylogenetic relationships between clades, these genes have shown good phylogenetic signal (>90%). Also, they have relatively smaller sizes compared to the E gene (with exception of NS1), which may be even practical when running long analyses.

Table 2. Estimated time to the most recent common ancestor (MRCA) for every partition, estimated under a Bayesian skyline coalescent tree prior.

	root	95%HDP	AA	95%HDP	A1/A2	95%HDP	Cosmo	95%HDP	A2	95%HDP	Sylvatic	95%HDP
C	2174.6	[1100.8, 3381.1]	55.1	[38.5, 74.4]	57.2	[50, 72.1]	64.9	[46.2, 87.7]	54.9	[40.1, 77.1]	100.8	[56.7, 160.1]
preM	3268.7	[1866.7, 4821.1]	55.1	[39.9, 72.5]	67.8	[54.3, 83.4]	54.9	[43.9, 68.4]	46.8	[38.3, 59.2]	107.4	[64.3, 165.1]
E	6175.5	[3780, 7200.1]	42.8	[34.5, 52.1]	64.2	[53.8, 77.5]	59.8	[45.5, 76.4]	56.7	[42.3, 78.9]	121.1	[70.2, 179.5]
NS1	5452.1	[3371.3, 7790.7]	46.4	[34.9, 58.7]	70.7	[56.2, 87.3]	56.5	[44.9, 70.7]	59.5	[40.1, 71.5]	120.1	[69.9, 181.3]
NS2A	3362.9	[1944.8, 4966.4]	43.5	[34.9, 54.5]	67.1	[54.1, 83.7]	59.1	[46.6, 74.7]	53.3	[39.5, 72.4]	135.3	[72.4, 206.5]
NS2B	2712.5	[1497.1, 4033.4]	49.9	[35.9, 66.4]	62.9	[51.4, 78.7]	63.9	[45.3, 89.1]	57.2	[43.2, 74.8]	101.4	[61.2, 156.5]
NS3	2914.1	[1502.1, 4102.2]	56.9	[38.3, 73.5]	63.7	[54.1, 76.1]	59.1	[42.8, 83.9]	58.8	[46.4, 71.6]	119.1	[58.2, 215.6]
NS4A	3989.9	[2283.7, 5881.3]	45.6	[35.3, 57.8]	65.3	[52.4, 82.1]	55.1	[43.9, 66.5]	45.1	[38.1, 53.3]	118	[69.6, 187.7]
NS4B	2454.2	[1290.3, 3688.6]	43.7	[35.3, 54.2]	60.9	[51.6, 72.4]	60.9	[46.1, 79.8]	46.1	[37.9, 58.4]	126.9	[73.1, 187.9]
NS5	5675.6	[3371.4, 7790.7]	58.6	[39.9,72.5]	77.9	[56.2, 95.1]	67.3	[52.5, 83.5]	63.5	[49.3, 76.9]	157.1	[72.4, 206.5]
CDS	10944.7	[3459.8, 25368.5]	105	[36.6, 550.8]	117.6	[53.6, 337.7]	77.8	[43.9, 172.7]	76.6	[39.5, 145]	157.2	[75, 290.6]

AA:American/Asian. A1A2:Asian1/Asian2. Cosmo: Cosmopolitan. A2: American2.

One visible problem with the analysis was that the convergence of parameter for the inference based on complete coding region sequence was not reached. Although this problem was expectable, increasing the number of generations to the double of the used in this study, would probably yield better estimations in terms of ESS, and I consider that this is a worth-doing comparison.

Conclusion

In terms of the estimation of evolutionary rates and TMRCA, NS1, NS2A, NS4B, would be good candidates to infer evolutionary processes in DENV-2.

Anexes

A1.) List of accessions and details of the sequences used in the study. (Link to Dropbox)

https://www.dropbox.com/sh/svjbqupk3ydzeom/AABx2P67bR2q4hgd-9-l2P1ea?dl=0

A2.) Substitution models for each partition

Partition	Subs Model
C	GTR+G+I
preM	GTR+G+I
E	GTR+G+I
NS1	GTR+G+I
NS2A	GTR+G+I
NS2B	TN93+G+I
NS3	GTR+G+I
NS4A	TN93+G
NS4B	GTR+G+I
NS5	GTR+G+I
CDS	*

A3.) Phylogenetic signal for each partition.

A. Gene C; B. Gene preM; C. Gene E; D. Gene NS1; E. Gene NS2a; F. Gene NS2b; G. Gene NS3; H. Gene NS4a; I. Gene NS4b; J. Gen NS5; K. Complete CDS.
RQ= Resolved quartets.

References

Bruen, T. C., Philippe, H., & Bryant, D. (2006). A simple and robust statistical test for detecting the presence of recombination. Genetics, 172(4), 2665–2681. doi:10.1534/genetics.105.048975.

Chen, R., & Vasilakis, N. (2011). Dengue--quo tu et quo vadis? Viruses, 3(9), 1562–608. doi:10.3390/v3091562.

Costa, R. L., Voloch, C. M., & Schrago, C. G. (2012). Comparative evolutionary epidemiology of dengue virus serotypes. Infection, Genetics and Evolution, 12(2), 309–314. doi:10.1016/j.meegid.2011.12.011.

Drummond, a. J., Rambaut, a., Shapiro, B., & Pybus, O. G. (2005). Bayesian coalescent inference of past population dynamics from molecular sequences. Molecular Biology and Evolution, 22(5), 1185–1192. doi:10.1093/molbev/msi103.

Drummond, A. J., Suchard, M. a, Xie, D., & Rambaut, A. (2012). Bayesian P hylogenetics with BEAUti and the BEAST 1 . 7. Molecular Biology and Evolution, 29(8), 1969–1973. doi:10.1093/molbev/mss075.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. doi:10.1093/nar/gkh340.

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460–2461. doi:10.1093/bioinformatics/btq461.

Guindon, S., Dufayard, J.-F., Lefort, V., & Anisimova, M. (2010). New Alogrithms and Methods to Estimate Maximum- Likelihoods Phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.

Ho, S. Y. W., & Shapiro, B. (2011). Skyline-plot methods for estimating demographic history from nucleotide sequences. Molecular Ecology Resources, 11(3), 423–434. doi:10.1111/j.1755-0998.2011.02988.x.

Patil, J. a., Cherian, S., Walimbe, a. M., Patil, B. R., Sathe, P. S., Shah, P. S., & Cecilia, D. (2011). Evolutionary dynamics of the American African genotype of dengue type 1 virus in India (1962-2005). Infection, Genetics and Evolution, 11(6), 1443–1448. doi:10.1016/j.meegid.2011.05.011.

Rico-hesse, R., Harrison, L. M., Nisalak, A., Vaughn, D. W., Kalayanarooj, S., Green, S., … Ennis, F. A. (1998). MOLECULAR EVOLUTION OF DENGUE TYPE 2 VIRUS IN THAILAND, 58(1), 96–101.

Rico-Hesse, R., Harrison, L. M., Salas, R. a, Tovar, D., Nisalak, a, Ramos, C., … da Rosa, a T. (1997). Origins of dengue type 2 viruses associated with increased pathogenicity in the Americas. Virology,230(2), 244–251. doi:10.1128/CVI.00363-08.

Rico-hesse, R., Harrison, L. M., Salas, R. A., Tovar, D., Nisalak, A., Ramos, C., … Nogueira, R. M. R. (1997). Origins of Dengue Type 2 Viruses Associated with Increased Pathogenicity in the Americas, 251(230), 244–251.

Rico-hesse, R., & Rico-hesse, R. (1990). Molecular Evolution and Distribution of Dengue Viruses Type 1 and 2 in Nature, 493, 479–493.

Schmidt, H. a, Strimmer, K., Vingron, M., & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (Oxford, England), 18(3), 502–504. doi:10.1093/bioinformatics/18.3.502.

Twiddy, S. S., Farrar, J. J., Chau, N. V., Wills, B., Gould, E. A., Gritsun, T., … Holmes, E. C. (2002). Phylogenetic Relationships and Differential Selection Pressures among Genotypes of Dengue-2 Virus 1, 72, 63–72. doi:10.1006/viro.2002.1447.

Twiddy, S. S., Holmes, E. C., & Rambaut, a. (2003). Inferring the Rate and Time-Scale of Dengue Virus Evolution. Molecular Biology and Evolution, 20(1), 122–129. doi:10.1093/molbev/msg010.

Walimbe, A. M., Lotankar, M., Cecilia, D., & Cherian, S. S. (2014). Global phylogeography of Dengue type 1 and 2 viruses reveals the role of India. Infection, Genetics and Evolution, 22, 30–39. doi:10.1016/j.meegid.2014.01.001.

Weaver, S. C., & Vasilakis, N. (2009). Molecular evolution of dengue viruses: contributions of phylogenetics to understanding the history and epidemiology of the preeminent arboviral disease. Infection, Genetics and Evolution : Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases, 9(4), 523–40. doi:10.1016/j.meegid.2009.02.003.

Zanotto, P. M., Gould, E. a, Gao, G. F., Harvey, P. H., & Holmes, E. C. (1996). Population dynamics of flaviviruses revealed by molecular phylogenies. Proceedings of the National Academy of Sciences of the United States of America, 93(2), 548–553.

Zhang, Z., Li, J., & Yu, J. (2006). Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC Evolutionary Biology, 6, 44. doi:10.1186/1471-2148-6-44.