martes, 22 de septiembre de 2015

Host structure of the Phylogeny of West Nile Virus WNV: Does it shape the spatiotemporal structure?

Introduction
The West Nile Virus (WNV) is a mosquito-born flavivirus that causes neurologic diseases such as encephalitis, meningitis, and acute flaccid paralysis (Lim, Koraka, Osterhaus, & Martina, 2011)⁠⁠. Similar to other flaviviruses, WNV is an enveloped virus with a single-stranded, positive sense, ∼11-kb RNA genome whose strains are grouped into at least 7 genetic lineages. WNV was first isolated in Uganda in 1937. Posteriorly, the first large outbreak of West Nile neuroinvasive disease (WNND) was recorded in Romania in 1996, with 393 confirmed cases (Tsai, Popovici, Cernescu, Campbell, & Nedelcu, 1998)⁠⁠. Three years later, it became a global public health concern after its introduction into North America, and subsequently into Central and South America (Lanciotti et al., 1999)⁠⁠. Since then, major outbreaks of WNV fever and encephalitis took place in all continents, apart from Antarctica, causing human and animal deaths. Although its enzootic cycle is mainly maintained between mosquitoes and birds, it can eventually infect horses, humans, and other vertebrates (Hayes et al., 2005)⁠⁠. Despite this variety of hosts, studies on the host structure and its influence on the spatiotemporal structure are still scarce. Since host genetic factors have a significant influence on disease distribution patterns, the overall purpose of this study was to assess the host structure of the phylogenetic relationships of WNV in a phylogeographic context, taking the spatiotemporal structure into account.

Specific Objectives
To identify the lineages of each viral strain.
To infer the main phylogeographic events.
To determine the host shift events within spatiotemporal structure.

Methods
Sequence Data: All the available sequences of complete genome of WNV, with collection times, and geographic locations( 453 sequences, from 25 countries, and 79 hosts species) were retrieved from GenBank. In order to identify and delete recombinants, clones, and duplicates from the data base, I used Uclust v1.2.22q with 99 % of identity (Edgar, 2010)⁠. A sequence of Japanese encephalitis virus (JEV) was used as the outgroup. Subsequently, all the WNV sequences were aligned using the algorithm of multiple sequence alignment, implemented in MUSCLE v3.8.31 (Edgar, 2004). The substitution model for the Envelope gene sequences was selected using Akaike information criterion with PhyML (Guindon, Dufayard, Lefort, & Anisimova, 2010)⁠⁠, called from the function phymltest{ape} (R Core Team, 2014). Phylogenetic signal was calculated for both complete coding sequence and E gene sequence, using TreePuzzle v.5.3(Schmidt, Strimmer, Vingron, & von Haeseler, 2002)⁠.

Lineages identification: A Maximum likelihood (ML) inference with the complete coding sequence, was performed using RaxML, with 20 searches and 100 bootstrap replicates, which are considered as sufficient for large data sets (Kozlov, Aberer, & Stamatakis, 2015)⁠⁠. Every lineage was assumed as a monophyletic group as sugested by (MacKenzie & Williams, 2009)⁠, and all the obtained clades will be revised taking previous studies into account.

Phylodynamics: Topologies, model parameters, evolutionary rates, TMRCA, viral population size variation over time were co-estimated for the E gene sequences dataset, using an uncorrelated log-normal relaxed clock model (rate: 0.053(Subbotina & Loktev, 2014)⁠(rationale given in (May, Davis, Tesh, & Barrett, 2011)⁠, and the MCMC method implemented in the BEAST package v1.8.2 (Drummond, Suchard, Xie, & Rambaut, 2012)⁠. I set up a phylogeographic Bayesian stochastic search variable selection (BSSVS) procedure for location data, and host as discrete traits, for this approach assumes exchange rates in the continuous-time Markov Chain(CTMC) to be zero with the prior probability, in order to find the most parsimonious set of rates explaining the diffusion process along the phylogenies. Bayesian skyline plot was used as a coalescent prior during the estimation over time of the change in effective population size per generation, per year (Ne.g). The MCMC analysis was run twice for 50 million generations, with sampling every 10000. MCMC convergence was measured by estimating the effective sampling size (ESS), using Tracer software version 1.5 (http://tree.bio.ed.ac.uk/software/tracer/). Uncertainties were estimated as 95% high probability densities (95% HPD). 

Host-Shift Events: The results for the two runs were combined for final analysis and BF support for host shift. Transition rates supported by a BF > 3 will be considered as significant support for a host shift between species. The obtained topologies will be summarized in a maximum clade credibility (MCC) tree, and annotated by the use of TreeAnnotator (http://beast.bio.ed.ac.uk/treeannotator).

Phylogeographic analysis: I used the software SPREAD v.1.7(Bielejec, Rambaut, Suchard, & Lemey, 2011)⁠, to visualize the diffusion rates over time. Locations were assigned by two capital letters using the ISO 31-66 alpha code, and coordinates corresponded to the centroids of each country. Bayes Factor (BF) test was run to get the support of diffusion rates among localities (Carlo, Pagel, Meade, Pagel, & Meade, 2013; Lemey, Rambaut, Drummond, & Suchard, 2009).

Results and Discussion

WNV lineages

I obtained a final dataset of 52 sequences, from 19 countries and 24 host species. The ML tree(Figure 1), showed the seven lineages previously proposed (MacKenzie & Williams, 2009; Mann, McMullen, Swetnam, & Barrett, 2013): Ia(20 sequences), II(9 sequences), IV(6 sequences), Ia2(10 sequences), Ib(2 sequences), Ia3(2 sequences), Ia1(3 sequences).⁠


Figure 1. Phylogeny inferred using a Maximum likelihood analysis of 53 sequences. Numbers correspond to bootstrapping values.

Spatio-temporal structure

Lineages 1a, 1a1-3 contain isolates from Africa, Europe, the Middle East, Russia, and the Americas, and includes isolates from all recent outbreaks. Phylogeny in (Figure 2) shows the spatio-temporal history, in which WNV exists in an endemic cycle for certain areas such as Australia, whereas it is epidemic in Europe, being reintroduced regularly from Africa either directly (in western Europe) or via the Middle East (Pesko & Ebel, 2012). Estimations of the TMRCA of lineages 1a,1-3(Table 1), are supported with the records of outbreaks and introductions for the clade (Amore et al., 2010; Gubler, 2007; Jerzak, Bernard, Kramer, & Ebel, 2005). Significantly, introduction into other geographic areas has occurred on one occasion only in each region, leading to subsequent establishment and expansion of the virus in these areas. WNV was successfully introduced in the Americas in 90's and subsequently became endemic across most temperate regions of North America (Amore et al., 2010; May, Davis, Tesh, & Barrett, 2011).

Lineage II, on the other hand, has been associated with outbreaks of West Nile virus in Western and Eastern Europe, and appears to have established endemic cycles in Spain and Greece (Pesko & Ebel, 2012). The estimated TMRCA (112.2 (1900)) shows the origins of this lineage might be older than reported (Botha et al., 2008). Lineage IV groups numerous isolates made in Russia, first detected in 1988 from a Dermacentor tick, and since isolated from mosquitoes and frogs in 2002 and 2005 in Russia (May et al., 2011). Estimation of TMRCA for this lineage (26.4 (1985)) is consistent to the reported dates of introduction in Western Europe.



Figure 2. WNV Spatio-temporal structural. Topology corresponds to Maximum Credibility Tree (MCC) for all lineages in time. Values in nodes are posterior probabilities. In legend, countries are represented by two capital letters using the  ISO 3166-1 alpha code.


Table 1. Estimation of the TMRCA for each lineage.



Stat Ia Ia2 Ia3 II IV Ia1
TMCRA 89.3 (1922) 74.1 (1938) 42.8 (1970) 112.2 (1900) 26.4 (1985) 74.2 (1937)
95% HPD [69.7,114.5] [32.3, 107.8] [13.4,91.6] [66.5,176.7] [14.1,48.8] [46.1, 107.3]


In general, the phylogeographic structure of the virus was recovered for the sampled lineages, and estimations of the TMCRA were consistent to the reported dates of introductions. Distributions and spatial structure show that several lineages(except Ia1), have been reintroduced to locations of the past. 

Phylogeography

The discrete phylogeography analysis shows major centers of spread of WNV in different regions. The interactive kml file can be found in anexes.  BF>3 support major migration rates between Western Europe and the Middle East, Mediterranean region, and the Americas; and Africa with Western Europe. See anexes for BF information.

Figure 3. Representation of the geographic diffusion process of WNV throughout the time.

Host shift events
Host shift between mosquitoes and other metazoans shapes the phylogenetic structure of each clade (Figure 4). On the other hand, BF analysis indicates the probabilities to migrate from one host onto another, with remarkably strong evidence (BF>3). See anexes.

Figure 4. Phylogeny with Host structure for WNV. Topology corresponds to Maximum Credibility Tree (MCC) for all lineages in time. Values on branches represent the posterior probability. Thickness of branches represents the probability of host shift for each node. Legend names correspond to the abbreviations of species' names.

Conclusions

Host-shift events for the WNV can be observed in the phylogeny. Rates support  shifts from Culex mosquitoes to humans, birds, horses, and Urotaenia fish, with BF values over 5.

Merely good estimations of WNV history can be done from a small dataset, as the one used for this study, when a representative sample is obtained in terms of host species information and spatio-temporal structure. 



Anexes
A1. Phylogenetic signal from the downsampled dataset used to run analysis in this study, corresponding to: A. Envelope gene, B. complete coding sequence.

A2. Datasets used in the study.
A.2.1. Full
A.2.2. Downsampled 1
A.2.3. Downsampled 2

A3. kml file with the visualization of the probable migratory paths of WNV over time.

A4. Bayesian Factors for geographical diffusion rates.

A5. Bayesian Factors for host-shift rates.

A1-A5 can be accessed here
https://www.dropbox.com/sh/7djz0api54h7dnx/AAA-9i0OVWd1UR-q4Eiv8ASIa?dl=0

A6. Host species abbreviations


host host abbreviation
Homo sapiens Hs
Culex annulirostris Ca
Sylvia nisoria Sn
Rousettus leschenaultii Rl
Anopheles maculipennis Am
Mus musculus Mm
Equus caballus Ec
Culex pipiens Cp
Culex univittatus Cu
Dermacentor marginatus Dm
Columba livia Cl
Phalacrocorax carbo Pc
Ochlerotatus sticticus Os
Aedes vexans Av
Culex nigripalpus Cn
Corvus corone Cc
Uranotaenia unguiculata Uu
Culex quinquefasciatus Cq
Mimus polyglottos Mp
Corvus brachyrhynchos Cb
Aquila chrysaetos Ac
Phoenicopterus ruber Pr
Culex tarsalis Ct
Accipiter gentilis Ag

References

Bielejec, F., Rambaut, A., Suchard, M. a., & Lemey, P. (2011). SPREAD: Spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics, 27(20), 2910–2912. doi:10.1093/bioinformatics/btr481.

Drummond, A. J., Suchard, M. a, Xie, D., & Rambaut, A. (2012). Bayesian P hylogenetics with BEAUti and the BEAST 1 . 7. Molecular Biology and Evolution, 29(8), 1969–1973. doi:10.1093/molbev/mss075.

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460–2461. doi:10.1093/bioinformatics/btq461.

Guindon, S., Dufayard, J.-F., Lefort, V., & Anisimova, M. (2010). New Alogrithms and Methods to Estimate Maximum- Likelihoods Phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.

Hayes, E. B., Komar, N., Nasci, R. S., Montgomery, S. P., O’Leary, D. R., & Campbell, G. L. (2005). Epidemiology and transmission dynamics of West Nile virus disease. Emerging Infectious Diseases, 11(8), 1167–1173. doi:10.3201/eid1108.050289a.

Kozlov, a. M., Aberer, a. J., & Stamatakis, a. (2015). ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers. Bioinformatics, (March), 1–3. doi:10.1093/bioinformatics/btv184.

Lanciotti, R. S., Roehrig, J. T., Deubel, V., Smith, J., Parker, M., Steele, K., … Gubler, D. J. (1999). Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science (New York, N.Y.), 286(5448), 2333–2337. doi:10.1126/science.286.5448.2333.

Lim, S. M., Koraka, P., Osterhaus, A. D. M. E., & Martina, B. E. E. (2011). West Nile virus: Immunity and pathogenesis. Viruses, 3(6), 811–828. doi:10.3390/v3060811.

MacKenzie, J. S., & Williams, D. T. (2009). The zoonotic flaviviruses of southern, south-eastern and eastern Asia, and australasia: The potential for emergent viruses. Zoonoses and Public Health, 56(6-7), 338–356. doi:10.1111/j.1863-2378.2008.01208.x.

Mann, B. R., McMullen, A. R., Swetnam, D. M., & Barrett, A. D. T. (2013). Molecular epidemiology and evolution of West Nile virus in North America. International Journal of Environmental Research and Public Health, 10(10), 5111–5129. doi:10.3390/ijerph10105111.

May, F. J., Davis, C. T., Tesh, R. B., & Barrett, A. D. T. (2011). Phylogeography of West Nile virus: from the cradle of evolution in Africa to Eurasia, Australia, and the Americas. Journal of Virology, 85(6), 2964–2974. doi:10.1128/JVI.01963-10.

Schmidt, H. a, Strimmer, K., Vingron, M., & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (Oxford, England), 18(3), 502–504. doi:10.1093/bioinformatics/18.3.502.

Subbotina, E. L., & Loktev, V. B. (2014). Molecular evolution of West Nile virus. Molecular Genetics, Microbiology and Virology, 29(1), 34–41. doi:10.3103/S0891416814010054.

Tsai, T. F., Popovici, F., Cernescu, C., Campbell, G. L., & Nedelcu, N. I. (1998). West Nile encephalitis epidemic in southeastern Romania. Lancet, 352(9130), 767–771. doi:10.1016/S0140-6736(98)03538-7.