Introduction
West Nile virus (WNV) is a mosquito-borne flavivirus that causes
neurologic diseases such as encephalitis, meningitis, and acute
flaccid paralysis (Lim, Koraka, Osterhaus, & Martina, 2011).
Similar to other flaviviruses, WNV is an enveloped virus with a
single-stranded, positive-sense, ∼11-kb RNA genome whose strains
are grouped into at least seven genetic lineages. WNV was first isolated
in Uganda in 1937. Later, the first large outbreak of West Nile
neuroinvasive disease (WNND) was recorded in Romania in 1996, with
393 confirmed cases (Tsai, Popovici, Cernescu, Campbell, &
Nedelcu, 1998). Three years later, it became a global public
health concern after its introduction into North America, and
subsequently into Central and South America (Lanciotti et al.,
1999). Since then, major outbreaks of WNV fever and
encephalitis have occurred on every continent except Antarctica,
causing human and animal deaths. Although its enzootic cycle is
maintained mainly between mosquitoes and birds, the virus can
occasionally infect horses, humans, and other vertebrates (Hayes et
al., 2005). Despite this variety of hosts, studies of the host
structure and its influence on the spatiotemporal structure of WNV are
still scarce. Because host genetic factors have a significant
influence on disease distribution patterns, the overall purpose of
this study was to assess the host structure of the phylogenetic
relationships of WNV in a phylogeographic context, taking the
spatiotemporal structure into account.
Specific Objectives
To identify the lineage of each viral strain.
To infer the main phylogeographic events.
To determine the host-shift events within the spatiotemporal structure.
Methods
Sequence data: All available complete-genome sequences of WNV with
collection dates and geographic locations (453 sequences from 25
countries and 79 host species) were retrieved from GenBank. To
identify and remove recombinants, clones, and duplicates from the
dataset, I used UCLUST v1.2.22q with a 99% identity threshold
(Edgar, 2010). A sequence of Japanese encephalitis virus (JEV) was
used as the outgroup. All WNV sequences were then aligned with the
multiple sequence alignment algorithm implemented in MUSCLE v3.8.31
(Edgar, 2004). The substitution model for the envelope (E) gene
sequences was selected using the Akaike information criterion (AIC)
with PhyML (Guindon, Dufayard, Lefort, & Anisimova, 2010), called
from the phymltest function of the R package ape (R Core Team, 2014).
Phylogenetic signal was calculated for both the complete coding
sequence and the E gene sequence using TREE-PUZZLE v5.3 (Schmidt,
Strimmer, Vingron, & von Haeseler, 2002).
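The AIC-based model choice described above can be illustrated with a minimal Python sketch. It is not the phymltest implementation; it only applies the standard formula AIC = 2k - 2 ln L to a set of fitted models and picks the lowest score. The log-likelihoods below are hypothetical values for illustration, and the parameter counts cover only the substitution model (branch lengths are assumed to be shared by all models on the same tree).

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits):
    """Return the model name with the lowest AIC and the full score table.

    `fits` maps a model name to a (log_likelihood, n_free_parameters) pair.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods, for illustration only (not results of this study).
example_fits = {
    "JC69": (-10250.0, 0),
    "HKY85": (-10100.0, 4),
    "GTR": (-10090.0, 8),
    "GTR+G": (-10040.0, 9),
}

best, scores = best_model_by_aic(example_fits)
print(best, scores[best])
```

The extra parameters of a richer model are only worth keeping when the gain in log-likelihood outweighs the penalty of 2 per parameter.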
Lineage identification: A maximum likelihood (ML) inference on the
complete coding sequences was performed using RAxML, with 20 searches
and 100 bootstrap replicates, which are considered sufficient for
large datasets (Kozlov, Aberer, & Stamatakis, 2015). Each lineage was
assumed to be a monophyletic group, as suggested by MacKenzie and
Williams (2009), and all recovered clades were checked against
previous studies.
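The monophyly criterion used to delimit lineages can be sketched as follows. This is an illustrative check, not part of the RAxML workflow: a rooted tree is represented as nested tuples, and a set of tips is monophyletic when it matches exactly the tip set of some subtree. The tip names in the toy tree are hypothetical.

```python
def leaves(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    tips = frozenset()
    for child in node:
        tips |= leaves(child)
    return tips


def clades(node):
    """Yield the tip set of every subtree, including single tips."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, tips):
    """True if `tips` is exactly the tip set of one subtree of the rooted tree."""
    target = frozenset(tips)
    return any(clade == target for clade in clades(tree))


# Toy rooted tree with JEV as outgroup; tip names are hypothetical.
toy_tree = ("JEV", (("II_a", "II_b"), (("Ia_a", "Ia_b"), "Ib_a")))

print(is_monophyletic(toy_tree, {"II_a", "II_b"}))
print(is_monophyletic(toy_tree, {"Ia_a", "Ib_a"}))
```

In the toy tree, the two lineage II tips form a clade, whereas Ia_a and Ib_a do not, because the smallest subtree containing both also contains Ia_b.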
Phylodynamics: Topologies, model parameters, evolutionary rates,
times to the most recent common ancestor (TMRCA), and changes in viral
population size over time were co-estimated for the E gene dataset,
using an uncorrelated log-normal relaxed clock model (rate: 0.053;
Subbotina & Loktev, 2014; rationale given in May, Davis, Tesh, &
Barrett, 2011) and the MCMC method implemented in BEAST v1.8.2
(Drummond, Suchard, Xie, & Rambaut, 2012). I set up a Bayesian
stochastic search variable selection (BSSVS) procedure with location
and host as discrete traits. This approach assumes that each exchange
rate in the continuous-time Markov chain (CTMC) is zero with some
prior probability, in order to find the most parsimonious set of
rates explaining the diffusion process along the phylogenies. A
Bayesian skyline plot was used as the coalescent prior to estimate
the change over time in the product of effective population size and
generation time in years (Ne.g). The MCMC analysis was run twice for
50 million generations, sampling every 10,000 generations. MCMC
convergence was assessed by estimating the effective sample size
(ESS) with Tracer v1.5 (http://tree.bio.ed.ac.uk/software/tracer/).
Uncertainty was expressed as 95% highest posterior density (95% HPD)
intervals.
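With 50 million generations sampled every 10,000, each run stores 5,000 states before any burn-in is removed. The following Python sketch shows one common way to obtain a 95% HPD interval from such a set of posterior samples: the narrowest window that contains 95% of the sorted values. It is an illustration of the concept, not the Tracer code, and the sample values are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must contain; the small offset guards
    # against floating-point noise in mass * n.
    window = max(1, math.ceil(mass * n - 1e-9))
    best_lo, best_hi = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        lo, hi = ordered[start], ordered[start + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Hypothetical posterior sample of a node age with one extreme value.
node_ages = [float(x) for x in range(1, 20)] + [100.0]
print(hpd_interval(node_ages))
```

Unlike an equal-tailed interval, the HPD interval drops the single extreme value here because excluding it gives the shortest interval that still holds 95% of the samples.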
Host-shift events: The results of the two runs were combined for the
final analysis and for the Bayes factor (BF) support of host shifts.
Transition rates with BF > 3 were considered to provide significant
support for a host shift between species. The resulting topologies
were summarized in a maximum clade credibility (MCC) tree and
annotated with TreeAnnotator
(http://beast.bio.ed.ac.uk/treeannotator).
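Combining two independent runs amounts to discarding an initial burn-in from each chain and concatenating what remains. The sketch below illustrates that step on plain Python lists. The 10% burn-in fraction is an assumption made for the example, since the burn-in used in this study is not stated, and the function name is hypothetical.

```python
def combine_runs(runs, burnin_fraction=0.1):
    """Drop the first `burnin_fraction` of each run and concatenate the rest.

    `runs` is a list of sample lists, one per independent MCMC run.
    The default burn-in of 10% is an assumption for illustration.
    """
    if not 0 <= burnin_fraction < 1:
        raise ValueError("burnin_fraction must be in [0, 1)")
    combined = []
    for samples in runs:
        n_burnin = int(len(samples) * burnin_fraction)
        combined.extend(samples[n_burnin:])
    return combined


# Two toy runs of ten sampled states each.
run_a = list(range(10))
run_b = list(range(100, 110))
combined = combine_runs([run_a, run_b])
print(len(combined), combined[0], combined[-1])
```

Pooling is only meaningful when both runs have converged on the same posterior distribution, which is why the ESS check comes first.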
Phylogeographic analysis: I used SPREAD v1.7 (Bielejec, Rambaut,
Suchard, & Lemey, 2011) to visualize the diffusion rates over
time. Locations were coded with the two-letter ISO 3166-1 alpha-2
code, and coordinates corresponded to the centroid of each country.
A Bayes factor (BF) test was run to assess the support for diffusion
rates among localities (Carlo, Pagel, Meade, Pagel, & Meade, 2013;
Lemey, Rambaut, Drummond, & Suchard, 2009).
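In a BSSVS analysis, the Bayes factor for a rate is usually computed as the posterior odds that the rate is included divided by the prior odds of inclusion. The Python sketch below shows that calculation and the BF > 3 cutoff. The indicator means and the prior inclusion probability are hypothetical inputs: in practice the posterior probability is the mean of the rate indicator in the BEAST log, and the prior probability follows from the prior placed on the number of non-zero rates.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds of rate inclusion divided by prior odds."""
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    if not 0 <= posterior_prob <= 1:
        raise ValueError("posterior_prob must be between 0 and 1")
    if posterior_prob == 1:
        return float("inf")
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


def supported_rates(indicator_means, prior_prob, threshold=3.0):
    """Keep the rates whose Bayes factor exceeds the threshold."""
    supported = {}
    for pair, posterior_prob in indicator_means.items():
        bf = bayes_factor(posterior_prob, prior_prob)
        if bf > threshold:
            supported[pair] = bf
    return supported


# Hypothetical indicator means for two host transitions (illustration only).
example_indicators = {("Cp", "Hs"): 0.9, ("Cp", "Ec"): 0.3}
kept = supported_rates(example_indicators, prior_prob=0.5)
print(sorted(kept))
```

The same calculation applies to location diffusion rates and to host-shift rates, since both are modeled as discrete traits.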
Results and Discussion
WNV lineages
I obtained a final dataset of 52 sequences from 19 countries and 24
host species. The ML tree (Figure 1) recovered the seven lineages
previously proposed (MacKenzie & Williams, 2009; Mann, McMullen,
Swetnam, & Barrett, 2013): Ia (20 sequences), II (9 sequences),
IV (6 sequences), Ia2 (10 sequences), Ib (2 sequences), Ia3 (2
sequences), and Ia1 (3 sequences).
Figure 1. Phylogeny inferred by maximum likelihood analysis of
53 sequences. Numbers correspond to bootstrap support values.
Spatio-temporal structure
Lineages Ia and Ia1-3 contain isolates from Africa, Europe, the
Middle East, Russia, and the Americas, and include isolates from all
recent outbreaks. The phylogeny in Figure 2 shows the spatio-temporal
history, in which WNV persists in an endemic cycle in certain areas
such as Australia, whereas it is epidemic in Europe, being
reintroduced regularly from Africa either directly (in western
Europe) or via the Middle East (Pesko & Ebel, 2012). The TMRCA
estimates for lineages Ia and Ia1-3 (Table 1) are consistent with the
records of outbreaks and introductions for the clade (Amore et al.,
2010; Gubler, 2007; Jerzak, Bernard, Kramer, & Ebel, 2005). Notably,
introduction into other geographic areas has occurred only once in
each region, leading to subsequent establishment and expansion of the
virus in these areas. WNV was successfully introduced into the
Americas in the 1990s and subsequently became endemic across most
temperate regions of North America (Amore et al., 2010; May, Davis,
Tesh, & Barrett, 2011).
Lineage II, on the other hand, has been associated with outbreaks of
West Nile virus in Western and Eastern Europe, and appears to have
established endemic cycles in Spain and Greece (Pesko & Ebel,
2012). The estimated TMRCA (112.2 years; 1900) suggests that the
origin of this lineage might be older than previously reported (Botha
et al., 2008). Lineage IV groups numerous isolates from Russia; it
was first detected in 1988 in a Dermacentor tick and has since been
isolated from mosquitoes and frogs in Russia in 2002 and 2005 (May et
al., 2011). The TMRCA estimate for this lineage (26.4 years; 1985) is
consistent with the reported dates of introduction into Western
Europe.

Figure 2. Spatio-temporal structure of WNV. The topology corresponds
to the maximum clade credibility (MCC) tree for all lineages through
time. Values at nodes are posterior probabilities. In the legend,
countries are represented by their two-letter ISO 3166-1 alpha-2
codes.
Table 1. Estimated TMRCA for each lineage, in years, with the corresponding calendar year in parentheses.
Stat | Ia | Ia2 | Ia3 | II | IV | Ia1
TMRCA | 89.3 (1922) | 74.1 (1938) | 42.8 (1970) | 112.2 (1900) | 26.4 (1985) | 74.2 (1937)
95% HPD | [69.7, 114.5] | [32.3, 107.8] | [13.4, 91.6] | [66.5, 176.7] | [14.1, 48.8] | [46.1, 107.3]
In general, the phylogeographic structure of the virus was recovered for the sampled lineages, and the TMRCA estimates were consistent with the reported dates of introduction. Distributions and spatial structure show that several lineages (all except Ia1) have been reintroduced into locations where they had occurred in the past.
Phylogeography
The discrete phylogeographic analysis shows major centers of spread
of WNV in different regions. The interactive KML file can be found in
the annexes. BF > 3 supports major migration rates between Western
Europe and the Middle East, the Mediterranean region, and the
Americas, and between Africa and Western Europe. See the annexes for
BF information.
Figure 3. Representation of the geographic diffusion process of WNV
over time.
Host shift events
Host shifts between mosquitoes and other metazoans shape the phylogenetic structure of each clade (Figure 4). In addition, the BF analysis indicates which transitions from one host to another are strongly supported (BF > 3). See the annexes.
Figure 4. Phylogeny with host structure for WNV. The topology
corresponds to the maximum clade credibility (MCC) tree for all
lineages through time. Values on branches represent posterior
probabilities. Branch thickness represents the probability of a host
shift at each node. Legend names correspond to the abbreviations of
species names.
Conclusions
Host-shift events for WNV can be observed in the phylogeny. Rates
support shifts from Culex mosquitoes to humans, birds, horses, and
Uranotaenia mosquitoes, with BF values over 5.
Reasonably good estimates of WNV history can be obtained from a small
dataset, such as the one used in this study, when the sample is
representative in terms of host species information and
spatio-temporal structure.
Annexes
A1. Phylogenetic signal of the downsampled dataset used to run the analyses in this study, for: A. envelope gene; B. complete coding sequence.
A2. Datasets used in the study.
A.2.1. Full
A.2.2. Downsampled 1
A.2.3. Downsampled 2
A3. KML file with the visualization of the probable migratory paths of WNV over time.
A4. Bayes factors for geographical diffusion rates.
A5. Bayes factors for host-shift rates.
A1-A5 can be accessed here:
https://www.dropbox.com/sh/7djz0api54h7dnx/AAA-9i0OVWd1UR-q4Eiv8ASIa?dl=0
A6. Host species abbreviations
Host | Abbreviation
Homo sapiens | Hs
Culex annulirostris | Ca
Sylvia nisoria | Sn
Rousettus leschenaultii | Rl
Anopheles maculipennis | Am
Mus musculus | Mm
Equus caballus | Ec
Culex pipiens | Cp
Culex univittatus | Cu
Dermacentor marginatus | Dm
Columba livia | Cl
Phalacrocorax carbo | Pc
Ochlerotatus sticticus | Os
Aedes vexans | Av
Culex nigripalpus | Cn
Corvus corone | Cc
Uranotaenia unguiculata | Uu
Culex quinquefasciatus | Cq
Mimus polyglottos | Mp
Corvus brachyrhynchos | Cb
Aquila chrysaetos | Ac
Phoenicopterus ruber | Pr
Culex tarsalis | Ct
Accipiter gentilis | Ag
References
Bielejec, F., Rambaut, A., Suchard, M. A., & Lemey, P. (2011). SPREAD: Spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics, 27(20), 2910–2912. doi:10.1093/bioinformatics/btr481.
Drummond, A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29(8), 1969–1973. doi:10.1093/molbev/mss075.
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460–2461. doi:10.1093/bioinformatics/btq461.
Guindon, S., Dufayard, J.-F., Lefort, V., & Anisimova, M. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.
Hayes, E. B., Komar, N., Nasci, R. S., Montgomery, S. P., O’Leary, D. R., & Campbell, G. L. (2005). Epidemiology and transmission dynamics of West Nile virus disease. Emerging Infectious Diseases, 11(8), 1167–1173. doi:10.3201/eid1108.050289a.
Kozlov, A. M., Aberer, A. J., & Stamatakis, A. (2015). ExaML version 3: A tool for phylogenomic analyses on supercomputers. Bioinformatics, (March), 1–3. doi:10.1093/bioinformatics/btv184.
Lanciotti, R. S., Roehrig, J. T., Deubel, V., Smith, J., Parker, M., Steele, K., … Gubler, D. J. (1999). Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science (New York, N.Y.), 286(5448), 2333–2337. doi:10.1126/science.286.5448.2333.
Lim, S. M., Koraka, P., Osterhaus, A. D. M. E., & Martina, B. E. E. (2011). West Nile virus: Immunity and pathogenesis. Viruses, 3(6), 811–828. doi:10.3390/v3060811.
MacKenzie, J. S., & Williams, D. T. (2009). The zoonotic flaviviruses of southern, south-eastern and eastern Asia, and Australasia: The potential for emergent viruses. Zoonoses and Public Health, 56(6-7), 338–356. doi:10.1111/j.1863-2378.2008.01208.x.
Mann, B. R., McMullen, A. R., Swetnam, D. M., & Barrett, A. D. T. (2013). Molecular epidemiology and evolution of West Nile virus in North America. International Journal of Environmental Research and Public Health, 10(10), 5111–5129. doi:10.3390/ijerph10105111.
May, F. J., Davis, C. T., Tesh, R. B., & Barrett, A. D. T. (2011). Phylogeography of West Nile virus: From the cradle of evolution in Africa to Eurasia, Australia, and the Americas. Journal of Virology, 85(6), 2964–2974. doi:10.1128/JVI.01963-10.
Schmidt, H. A., Strimmer, K., Vingron, M., & von Haeseler, A. (2002). TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (Oxford, England), 18(3), 502–504. doi:10.1093/bioinformatics/18.3.502.
Subbotina, E. L., & Loktev, V. B. (2014). Molecular evolution of West Nile virus. Molecular Genetics, Microbiology and Virology, 29(1), 34–41. doi:10.3103/S0891416814010054.
Tsai, T. F., Popovici, F., Cernescu, C., Campbell, G. L., & Nedelcu, N. I. (1998). West Nile encephalitis epidemic in southeastern Romania. Lancet, 352(9130), 767–771. doi:10.1016/S0140-6736(98)03538-7.