Wednesday, 24 June 2015

Host Phylogeny of West Nile Virus: Does it shape the spatiotemporal structure?

West Nile virus (WNV) is a mosquito-borne flavivirus that causes neurologic diseases such as encephalitis, meningitis, and acute flaccid paralysis (Lim, Koraka, Osterhaus, & Martina, 2011). Like other flaviviruses, WNV is an enveloped virus with a single-stranded, positive-sense, ∼11-kb RNA genome, and its strains are grouped into at least 7 genetic lineages. WNV was first isolated in Uganda in 1937. Later, the first large outbreak of West Nile neuroinvasive disease (WNND) was recorded in Romania in 1996, with 393 confirmed cases (Tsai, Popovici, Cernescu, Campbell, & Nedelcu, 1998). Three years later, it became a global public health concern after its introduction into North America, and subsequently into Central and South America (Lanciotti et al., 1999). Since then, major outbreaks of WNV fever and encephalitis have taken place on every continent except Antarctica, causing human and animal deaths. Although its enzootic cycle is maintained mainly between mosquitoes and birds, it can occasionally infect horses, humans, and other vertebrates (Hayes et al., 2005). Despite this variety of hosts, studies on host structure and its influence on the spatiotemporal structure of the virus are still scarce. Since host genetic factors have a significant influence on disease distribution patterns, the overall purpose of this study is to assess the host structure of the phylogenetic relationships of WNV in a phylogeographic context, taking the spatiotemporal structure into account.

Specific Objectives
To identify the lineages of each viral strain.
To infer the main host-shift events.
To determine the transmission paths within the spatiotemporal structure.

1. Sequence Data: All available complete-genome sequences of WNV with collection dates and geographic locations will be retrieved from GenBank. To identify and remove recombinants, clones, and duplicates from the data set, I will use UCLUST v1.2.22q with a 99% identity threshold. Sequences of Ilheus virus (ILHV), Usutu virus (USUV), and Japanese encephalitis virus (JEV) will be used as the outgroup. Subsequently, all the WNV sequences will be aligned with the multiple sequence alignment algorithm implemented in MUSCLE v3.8.31 (Edgar, 2004).

2. Evolutionary rates: From this alignment, I will obtain 11 partitions, corresponding to the complete genome and the genes that constitute it (C, prM/M, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5). The evolutionary rate of every gene will be explored with the Distance Rates (DistR) method (Bevan, Lang, & Bryant, 2005) to obtain a first approximation of the molecular evolution of the genes, since evolutionary rate is one of the key determinants of cross-species transmission (Longdon, Brockhurst, Russell, Welch, & Jiggins, 2014; Vrancken et al., 2015).

3. Lineage identification: The substitution model will be selected using the Akaike information criterion (AIC) with jModelTest 2 (Darriba, Taboada, Doallo, & Posada, 2012). With this model, a maximum likelihood (ML) inference will be performed using ExaML v3.0.X, with 20 searches and 100 bootstrap replicates, which are considered sufficient for large data sets (Kozlov, Aberer, & Stamatakis, 2015). Every lineage will be assumed to be a monophyletic group, as suggested by MacKenzie & Williams (2009), and all the clades obtained will be checked against previous studies.
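As a reminder of what the AIC ranking does, here is a minimal sketch in Python. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not jModelTest 2 output, and the functions `aic` and `rank_models` are illustrative names. I assume here that the parameters shared by all candidate models (such as branch lengths on a fixed topology) can be left out of the count, since they shift every score by the same amount.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_models(fits):
    """Rank models by AIC.

    fits maps a model name to (log-likelihood, number of free parameters).
    Returns (name, AIC, delta AIC) tuples, best model first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    return sorted(
        ((name, score, score - best) for name, score in scores.items()),
        key=lambda item: item[1],
    )


# Hypothetical fits, for illustration only.
fits = {
    "JC": (-12500.0, 0),
    "HKY+G": (-12100.0, 5),
    "GTR+G": (-12095.0, 9),
}
ranking = rank_models(fits)
for name, score, delta in ranking:
    print(f"{name:6s} AIC = {score:.1f}  delta = {delta:.1f}")
```

The model with delta = 0 is the one that would be carried into the ML inference.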

4. Phylodynamics: In order to evaluate every lineage independently, the data set will be downsampled. The tree topologies, model parameters, evolutionary rates, time to the most recent common ancestor (MRCA), and variation in viral population size over time will then be co-estimated independently for each resulting lineage, using an uncorrelated log-normal relaxed clock model (rationale given in May, Davis, Tesh, & Barrett, 2011) and the MCMC method implemented in the BEAST package v1.8.2 (Drummond, Suchard, Xie, & Rambaut, 2012). The Bayesian skyline will be used as the coalescent prior to estimate the change over time in the product of effective population size and generation time in years (Ne.g). The MCMC analysis will be run twice for 50 million generations, sampling every 10,000 generations. MCMC convergence will be assessed by estimating the effective sample size (ESS) with Tracer v1.5 (http://tree.bio.ed.ac.uk/software/tracer/). Uncertainties will be reported as 95% highest posterior density (95% HPD) intervals. The results of the two runs will be combined for the final analysis and for computing Bayes factor (BF) support for host shifts. Transition rates with BF > 3 will be considered significantly supported host shifts between species. The resulting topologies will be summarized in a maximum clade credibility (MCC) tree, annotated with TreeAnnotator (http://beast.bio.ed.ac.uk/treeannotator).
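To make the ESS criterion concrete, the sketch below estimates it from the autocorrelation of a single parameter trace. It is only an illustration: the truncation rule (stop at the first non-positive autocorrelation) is my assumption and is not necessarily the estimator used by Tracer, and the two traces are simulated numbers, not BEAST output.

```python
import random


def effective_sample_size(samples):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation
    (an assumption of this sketch).
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Two simulated traces of 2000 samples each: independent draws versus a
# strongly autocorrelated chain (first-order autoregressive, coefficient 0.9).
rng = random.Random(1)
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
print(f"independent trace: ESS = {ess_independent:.0f}")
print(f"autocorrelated trace: ESS = {ess_sticky:.0f}")
```

The autocorrelated trace carries far fewer effectively independent samples than its length suggests, which is why the ESS rather than the raw number of sampled states is used to judge convergence.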

5. Host-Shift Events: To determine whether cross-species transmission (CST) has a stronger influence on genetic divergence than within-species transmission, I will compute genetic distances in PAUP* v.4.0b10 (http://paup.csit.fsu.edu/) using lineage-specific models of nucleotide substitution, and compare them with a cutoff value. Subsequently, transmission of WNV will be quantified by Metropolis-coupled Markov chain Monte Carlo (MC3) coalescent simulation of migration rates, implemented in the program Migrate-N v3.6 (Beerli & Palczewski, 2010). The model of transmission (asymmetrical, bi-directional, symmetrical, inter alia) will be assessed, and the transmission web will be visualized using this software. In order to estimate the potential of the strains to jump into new hosts (sensu Frost & Volz, 2010), or to predict viral emergence, I will estimate the per capita cross-species transmission rate Rij and the effective reproductive number of the pathogen Re.

6. Host Phylogeny and the Spatiotemporal Structure: The population genetic predictors Ne.g, Rij, and Re will be plotted as a function of time.


Beerli, P., & Palczewski, M. (2010). Unified framework to evaluate panmixia and migration direction among multiple sampling locations. Genetics, 185(1), 313–326. doi:10.1534/genetics.109.112532
Bevan, R. B., Lang, B. F., & Bryant, D. (2005). Calculating the evolutionary rates of different genes: a fast, accurate estimator with applications to maximum likelihood phylogenetic analysis. Systematic Biology, 54(6), 900–915. doi:10.1080/10635150500354829
Darriba, D., Taboada, G. L., Doallo, R., & Posada, D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nature Methods, 9(8), 772. doi:10.1038/nmeth.2109
Drummond, A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29(8), 1969–1973. doi:10.1093/molbev/mss075
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. doi:10.1093/nar/gkh340
Frost, S. D. W., & Volz, E. M. (2010). Viral phylodynamics and the search for an “effective number of infections”. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 365(1548), 1879–1890. doi:10.1098/rstb.2010.0060
Hayes, E. B., Komar, N., Nasci, R. S., Montgomery, S. P., O’Leary, D. R., & Campbell, G. L. (2005). Epidemiology and transmission dynamics of West Nile virus disease. Emerging Infectious Diseases, 11(8), 1167–1173. doi:10.3201/eid1108.050289a
Kozlov, A. M., Aberer, A. J., & Stamatakis, A. (2015). ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers. Bioinformatics, (March), 1–3. doi:10.1093/bioinformatics/btv184
Lanciotti, R. S., Roehrig, J. T., Deubel, V., Smith, J., Parker, M., Steele, K., … Gubler, D. J. (1999). Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science (New York, N.Y.), 286(5448), 2333–2337. doi:10.1126/science.286.5448.2333
Lim, S. M., Koraka, P., Osterhaus, A. D. M. E., & Martina, B. E. E. (2011). West Nile virus: Immunity and pathogenesis. Viruses, 3(6), 811–828. doi:10.3390/v3060811
Longdon, B., Brockhurst, M. A., Russell, C. A., Welch, J. J., & Jiggins, F. M. (2014). The Evolution and Genetics of Virus Host Shifts. PLoS Pathogens, 10(11). doi:10.1371/journal.ppat.1004395
MacKenzie, J. S., & Williams, D. T. (2009). The zoonotic flaviviruses of southern, south-eastern and eastern Asia, and Australasia: The potential for emergent viruses. Zoonoses and Public Health, 56(6-7), 338–356. doi:10.1111/j.1863-2378.2008.01208.x
May, F. J., Davis, C. T., Tesh, R. B., & Barrett, A. D. T. (2011). Phylogeography of West Nile virus: from the cradle of evolution in Africa to Eurasia, Australia, and the Americas. Journal of Virology, 85(6), 2964–2974. doi:10.1128/JVI.01963-10
Tsai, T. F., Popovici, F., Cernescu, C., Campbell, G. L., & Nedelcu, N. I. (1998). West Nile encephalitis epidemic in southeastern Romania. Lancet, 352(9130), 767–771. doi:10.1016/S0140-6736(98)03538-7
Vrancken, B., Lemey, P., Rambaut, A., Bedford, T., Longdon, B., Günthard, H. F., & Suchard, M. A. (2015). Simultaneously estimating evolutionary history and repeated traits phylogenetic signal: applications to viral and host phenotypic evolution. Methods in Ecology and Evolution, 6(1), 67–82. doi:10.1111/2041-210X.12293

Divergence time estimation under autocorrelated and independent rate models as a function of topology size, using MULTIDIVTIME and BEAST.


Divergence times can be estimated with methods that model variation in substitution rates among lineages. The autocorrelated rate (AR) method assumes that the rates of ancestor and descendant branches are autocorrelated; the descendant rate can follow a log-normal distribution with a mean equal to the parent rate (Thorne, Kishino and Painter, 1998; Kishino, Thorne and Bruno, 2001; Thorne and Kishino, 2002) or a non-central χ² distribution (Lepage et al., 2006). The random rate (RR) method, also used in molecular clock analysis, instead assigns an independent rate to each lineage, drawn from a single underlying parametric distribution such as an exponential or log-normal (Drummond et al., 2006; Rannala and Yang, 2007; Lepage et al., 2007). The RR method is implemented in BEAST and the AR method in MULTIDIVTIME. The programs also differ in their priors on node ages: MULTIDIVTIME uses a Dirichlet distribution without an explicit assumption about the underlying biological process (Kishino, Thorne and Bruno, 2001; Thorne and Kishino, 2002), whereas BEAST uses a Yule or birth-death prior (Yule, 1925; Rannala and Yang, 1996; Yang and Rannala, 1997). Both methods have been widely used to date phylogenies, and using several calibration points is recommended; however, it is rare to have as many fossils as there are nodes in the topology. So far, the minimum number of constraints needed to obtain accurate divergence times has not been established.
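The difference between the two rate models can be illustrated with a short Python sketch that draws branch rates on a small tree. Everything here is illustrative: the 4-tip tree, the function names, and the parameter values are hypothetical, and the choice of a log-scale variance proportional to branch duration (nu times duration) in the autocorrelated model is my assumption, not a setting taken from MULTIDIVTIME.

```python
import math
import random

# Hypothetical rooted tree: node -> (parent, branch duration in Myr).
TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "A": ("n1", 2.0),
    "B": ("n1", 2.0),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
}


def autocorrelated_rates(tree, root_rate, nu, rng):
    """AR model: each rate is log-normal with expected value equal to the parent rate."""
    rates = {"root": root_rate}
    pending = list(tree)
    while pending:
        for node in list(pending):
            parent, duration = tree[node]
            if parent in rates:
                variance = nu * duration  # assumed: variance grows with time
                # Subtracting variance / 2 makes the mean of the rate
                # (not of its logarithm) equal to the parent rate.
                mu = math.log(rates[parent]) - variance / 2.0
                rates[node] = rng.lognormvariate(mu, math.sqrt(variance))
                pending.remove(node)
    return rates


def independent_rates(tree, mean_rate, sigma, rng):
    """RR model: every branch rate is an independent draw from one log-normal."""
    mu = math.log(mean_rate) - sigma ** 2 / 2.0
    return {node: rng.lognormvariate(mu, sigma) for node in tree}


rng = random.Random(42)
ar_rates = autocorrelated_rates(TREE, root_rate=0.01, nu=0.05, rng=rng)
rr_rates = independent_rates(TREE, mean_rate=0.01, sigma=0.5, rng=rng)
for node in TREE:
    print(f"{node:3s} AR = {ar_rates[node]:.5f}  RR = {rr_rates[node]:.5f}")
```

Under the AR draw, sister branches inherit similar rates from their shared parent; under the RR draw, the rate of a branch says nothing about the rate of its neighbours.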


General objective

  • To determine the minimum number of calibration points needed, relative to the number of tips in the topology (its size), under the autocorrelated rate and random rate methods.

Specific objectives
  • To correlate the estimated divergence times with the simulated divergence times.
  • To determine how much the estimates change as the number of calibration points increases.
  • To assess the extent to which each program reconstructs the correct phylogeny.
  • To assess whether posterior probabilities are high at nodes where the estimated divergence time is approximately equal to the simulated divergence time.

1. Simulations

The trees will be simulated with four different numbers of tips (10, 25, 50 and 100) and a root age of 31 Mya, using the package phytools v.0.4-56 (Revell, 2012) in the R language; each tree size will be replicated 10 times. Based on these trees, the sequences will be simulated using the program Seq-Gen v1.3.3 (Rambaut and Grassly, 1997) under the HKY model, with base frequencies 0.30[T], 0.25[C], 0.30[A], 0.15[G], a transition-transversion rate K = 5, gamma parameter alpha = 0.5 (sensu Brown and Yang, 2010) and a length of 1000 bp.
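For reference, the substitution model behind these settings can be written out explicitly. The Python sketch below builds the HKY instantaneous rate matrix from the base frequencies given above. It assumes that K = 5 is the HKY rate ratio kappa (transition rate relative to transversion rate), which may not be the same quantity as the ratio expected by the simulation program, and it leaves out the gamma rate heterogeneity. The function names are illustrative.

```python
FREQS = {"T": 0.30, "C": 0.25, "A": 0.30, "G": 0.15}
KAPPA = 5.0  # assumed to be the HKY transition/transversion rate ratio
PURINES = {"A", "G"}


def is_transition(x, y):
    """Transitions are purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def hky_rate_matrix(freqs, kappa):
    """HKY rate matrix, scaled to one expected substitution per unit time."""
    bases = list(freqs)
    q = {x: {} for x in bases}
    for x in bases:
        for y in bases:
            if x != y:
                q[x][y] = (kappa if is_transition(x, y) else 1.0) * freqs[y]
        # Diagonal entry makes each row sum to zero.
        q[x][x] = -sum(q[x].values())
    scale = -sum(freqs[x] * q[x][x] for x in bases)
    return {x: {y: q[x][y] / scale for y in bases} for x in bases}


Q = hky_rate_matrix(FREQS, KAPPA)
for x in Q:
    print(x, "  ".join(f"{Q[x][y]: .4f}" for y in Q))
```

Because the matrix is scaled to one expected substitution per unit time, branch lengths multiplied by it are in substitutions per site, so tree ages in Myr would first need to be multiplied by a substitution rate.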

2. Molecular clock analysis and calibration

The analyses will be performed using the AR and RR methods in MULTIDIVTIME v9.25.03 (Thorne and Kishino, 2002) (MDT) and BEAST v.1.8.2 (Drummond and Rambaut, 2007; Drummond, Suchard, Xie, and Rambaut, 2012), respectively. In MDT the divergence times will be calculated after 10⁶ generations employing a correlated relaxed lognormal clock. In BEAST the analyses will be run under an uncorrelated relaxed lognormal clock, a Yule speciation process, a normal distribution for the tMRCA, and a number of generations that depends on the number of tips, following the authors' recommendation. For both methods the calibration points will be chosen randomly, leaving the base node of the ingroup fixed at 30 Mya. Each number of calibration points will be replicated for every simulated tree.

3. Correlation and additional comparisons

The simulated times and the outcomes of these analyses will be compared using a Pearson correlation for each topology size and replicate. The values at common nodes will be correlated, and an error index will be calculated for each BEAST reconstruction by comparing its topology with the simulated one, using the Robinson-Foulds distance implemented in the package phangorn (Schliep, 2011) in the R language. Then, I will compare and plot the posterior probabilities of correct nodes against those of incorrect nodes.
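The two comparisons can be sketched in a few lines of Python. This is only an illustration of the quantities involved: the trees and node ages are hypothetical, the trees are written as nested tuples, and the distance is the rooted, clade-based form of the Robinson-Foulds distance, which is an assumption of this sketch and may differ from the phangorn default.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def clades(tree):
    """Set of clades (as frozensets of tip names) of a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in one rooted tree but not in the other."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical simulated and reconstructed topologies.
simulated = ((("A", "B"), "C"), ("D", "E"))
estimated = ((("A", "C"), "B"), ("D", "E"))
shared = clades(simulated) & clades(estimated)
print("RF distance:", rf_distance(simulated, estimated))  # 2
print("shared nodes:", len(shared))                       # 3

# Hypothetical ages (Mya) of the three shared nodes.
simulated_ages = [30.0, 12.0, 5.0]
estimated_ages = [28.5, 13.1, 4.2]
r = pearson(simulated_ages, estimated_ages)
print(f"Pearson r = {r:.3f}")
```

Only the nodes in `shared` have a counterpart in both trees, so they are the ones whose ages can be correlated; the remaining nodes are the "incorrect" ones whose posterior probabilities are compared separately.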


Brown, R. P., & Yang, Z. (2010). Bayesian dating of shallow phylogenies with a relaxed clock. Systematic Biology, 59(2), 119–31. http://doi.org/10.1093/sysbio/syp082

Drummond, A. J., Ho, S. Y. W., Phillips, M. J., & Rambaut, A. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology, 4(5), e88. http://doi.org/10.1371/journal.pbio.0040088

Drummond, A. J., & Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7(1), 214. http://doi.org/10.1186/1471-2148-7-214

Drummond, A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29(8), 1969–73. http://doi.org/10.1093/molbev/mss075

Kishino, H., Thorne, J. L., & Bruno, W. J. (2001). Performance of a Divergence Time Estimation Method under a Probabilistic Model of Rate Evolution. Molecular Biology and Evolution, 18(3), 352–361. http://doi.org/10.1093/oxfordjournals.molbev.a003811

Lepage, T., Bryant, D., Philippe, H., & Lartillot, N. (2007). A general comparison of relaxed molecular clock models. Molecular Biology and Evolution, 24(12), 2669–80. http://doi.org/10.1093/molbev/msm193

Lepage, T., Lawi, S., Tupper, P., & Bryant, D. (2006). Continuous and tractable models for the variation of evolutionary rates. Mathematical Biosciences, 199(2), 216–33. http://doi.org/10.1016/j.mbs.2005.11.002

R Core Team. (2014). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from http://www.r-project.org/

Rambaut, A., & Grassly, N. C. (1997). Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics, 13(3), 235–238. http://doi.org/10.1093/bioinformatics/13.3.235

Rannala, B., & Yang, Z. (1996). Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. Journal of Molecular Evolution, 43(3), 304–311. http://doi.org/10.1007/BF02338839

Rannala, B., & Yang, Z. (2007). Inferring speciation times under an episodic molecular clock. Systematic Biology, 56(3), 453–66. http://doi.org/10.1080/10635150701420643

Revell, L. J. (2012). phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution, 3(2), 217–223. http://doi.org/10.1111/j.2041-210X.2011.00169.x

Schliep, K. P. (2011). phangorn: phylogenetic analysis in R. Bioinformatics (Oxford, England), 27(4), 592–3. http://doi.org/10.1093/bioinformatics/btq706

Thorne, J. L., & Kishino, H. (2002). Divergence time and evolutionary rate estimation with multilocus data. Systematic Biology, 51(5), 689–702. http://doi.org/10.1080/10635150290102456

Thorne, J. L., Kishino, H., & Painter, I. S. (1998). Estimating the rate of evolution of the rate of molecular evolution. Molecular Biology and Evolution, 15(12), 1647–1657. http://doi.org/10.1093/oxfordjournals.molbev.a025892

Yang, Z., & Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Molecular Biology and Evolution, 14(7), 717–724. http://doi.org/10.1093/oxfordjournals.molbev.a025811

Yule, G. U. (1925). A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character, 213, pp. 21–87. Retrieved from http://www.jstor.org/stable/92117

JackknifEvolution for endemism areas

Can a jackknife be used for areas of endemism, and what would the result look like? To answer this question, code for the jackknife calculation will be written in R 3.2.0 (2015-04-16). An analysis of areas of endemism will be carried out for Amazonia using primate distributions (based on Silva & Oren, 1996) in the program NDM/VNDM 3 (Goloboff, 2005). From the data matrix used for NDM/VNDM, two types of jackknife will be implemented: a species jackknife and an occurrences jackknife. For each jackknife type, 25% and 50% resampling will be used, with 50 replicates each. Finally, the support values obtained for each resampling level and jackknife type will be compared.
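The two resampling schemes can be sketched as follows. The sketch is in Python rather than R and is only meant to pin down the logic: the record format (species, latitude, longitude), the function names, and the toy data are hypothetical, and I assume that 25% and 50% refer to the fraction of species or occurrences deleted in each replicate. The endemism analysis itself is not reproduced here; each resampled matrix would still have to be analysed in NDM/VNDM.

```python
import random


def species_jackknife(records, fraction, rng):
    """Delete a random fraction of the species, with all of their records."""
    species = sorted({sp for sp, _, _ in records})
    n_drop = round(len(species) * fraction)
    dropped = set(rng.sample(species, n_drop))
    return [rec for rec in records if rec[0] not in dropped]


def occurrence_jackknife(records, fraction, rng):
    """Delete a random fraction of the occurrence records, whatever the species."""
    n_keep = len(records) - round(len(records) * fraction)
    kept = sorted(rng.sample(range(len(records)), n_keep))
    return [records[i] for i in kept]


def replicates(records, resampler, fraction, n_reps=50, seed=0):
    """Build n_reps resampled data sets with one jackknife scheme."""
    rng = random.Random(seed)
    return [resampler(records, fraction, rng) for _ in range(n_reps)]


# Toy data: 8 hypothetical species with 5 georeferenced records each.
records = [(f"sp{i}", float(i), float(j)) for i in range(8) for j in range(5)]

by_species = replicates(records, species_jackknife, 0.25)
by_occurrence = replicates(records, occurrence_jackknife, 0.50)
print(len(by_species), "species replicates,", len(by_species[0]), "records each")
print(len(by_occurrence), "occurrence replicates,", len(by_occurrence[0]), "records each")
```

The support of an area of endemism would then be the proportion of the 50 replicates in which that area is recovered.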
Silva, J. M. C. D., & Oren, D. C. (1996). Application of parsimony analysis of endemicity in Amazonian biogeography: an example with primates. Biological Journal of the Linnean Society, 59(4), 427-437.

Quinn, G. P., & Keough, M. J. (2002). Experimental design and data analysis for biologists. Cambridge University Press.
Da Silva, J. M. C., Rylands, A. B., & Da Fonseca, G. A. B. (2005). The fate of the Amazonian areas of endemism. Conservation Biology, 19(3), 689-694.
Szumik, C. A., & Goloboff, P. A. (2004). Areas of endemism: an improved optimality criterion. Systematic Biology, 53(6), 968-977.

Diversification of the Amazon biota: Rivers as vicariant barriers.

-Download georeferenced distributions through the Global Biodiversity Information Facility (GBIF: http://www.gbif.org) for species of the Amazon basin (Aves, Magnoliophyta, Insecta, Squamata, Amphibia, Mammalia).

-Gather phylogenetic clades for species distributed in the Amazon basin.

-Clean the data under the protocol proposed by R-Alarcon & Miranda-Esquivel (in prep).

-Delimit areas of endemism under the optimality criterion implemented in the program NDM/VNDM.

-Locate the isolating barriers for different taxa distributed in the Amazon basin using the Vicariance Inference Program (VIP) (Arias et al. 2011).

-Analyze events under the models (Ronquist, MC, Reconciled) implemented in TreeFitter 1.3B1 (Ronquist, 2002).

Goloboff, P. 2005. NDM/VNDM ver. 2.5. Programs for identification of areas of endemism. Programs and documentation available at www.zmuc.dk/public/phylogeny/endemism

Arias JS, Szumik CA, Goloboff PA (2011) Spatial analysis of vicariance: A method for using direct geographical information in historical biogeography. Cladistics 27: 617–628. doi: 10.1111/j.1096-0031.2011.00353.

Ronquist, F. 2002. TREEFITTER, version 1.3b. Computer program and manual available by anonymous FTP from Uppsala University.