Monday, 21 September 2015

Inferring Ancestral Areas

Can they be inferred in an ancestral-states reconstruction?

Even though this question has been addressed in several ways, the answer is yes, they can! However, a few concepts need to be taken into account first.

An ancestral area has been defined as the center of origin of the diversification of a clade (Bremer, 1992). In other words, it constitutes the ancestral range of distribution of a monophyletic group. Establishing this area is one of the central questions in historical biogeography, especially if the objective of a particular study is to assess the contribution of vicariance and dispersal to the speciation and distribution of a group of organisms (Ronquist, 1995).

To that end, a biogeographical analysis has to consider two problems: Earth history, which seeks to establish area relationships based on the phylogenies of at least two taxa inhabiting the areas of interest, thus using areas as taxa and taxa as characters; and taxon history, which aims to infer the biogeographic patterns and events that shaped the history of taxa (Hovenkamp, 1997). The main caveat with the latter approach is that inferences are restricted to general patterns, which cannot be assumed arbitrarily.

Areas of Endemism
Regarding the first problem of biogeographical analysis, the areas-as-taxa analogy presupposes the existence of discrete areas, which I will call areas of endemism (Hovenkamp, 1997). These are the first subject of investigation you should include in your analysis. The term “area of endemism” refers to a particular pattern of distribution delimited by the distributional congruence of at least two taxa (Platnick, 1991).

There are several methodological proposals to identify these areas. I list two of them here, which have different theoretical backgrounds but, like any other quantitative method, use the distributions of species as data: Parsimony Analysis of Endemicity, PAE (Morrone, 1994), and Endemicity Analysis, EA (Szumik, Cuezzo, Goloboff, & Chalup, 2002; Szumik & Goloboff, 2004). PAE groups area units hierarchically based on their shared species, using the maximum-parsimony criterion. EA, on the other hand, identifies areas of endemism by assessing the congruence among species distributions, following an optimality criterion. The congruence between a species distribution and a given area is measured by an Endemicity Index (EI) ranging from 0 to 1. This proposal is implemented in NDM/VNDM (Goloboff, 2005), which can currently also run analyses for higher taxa (Szumik & Goloboff, 2015).
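To make the PAE input more concrete, here is a minimal Python sketch of the presence/absence matrix in which area units play the role of taxa and species play the role of characters. It only prepares the data; the parsimony search itself would be run in a phylogenetics program. The area-unit and species names are hypothetical, and the function names are mine, not part of any PAE software.

```python
def pae_matrix(occurrences):
    """Build a presence/absence matrix: one row per area unit, one column per species.

    occurrences maps each area unit to the set of species recorded in it.
    """
    species = sorted(set().union(*occurrences.values()))
    matrix = {
        unit: [int(sp in present) for sp in species]
        for unit, present in occurrences.items()
    }
    return species, matrix


def shared_species(species, matrix):
    """Species recorded in at least two area units (only these can group units)."""
    totals = [sum(row[i] for row in matrix.values()) for i in range(len(species))]
    return [sp for sp, total in zip(species, totals) if total >= 2]


# Hypothetical occurrence data for three grid cells.
occurrences = {
    "cell_1": {"sp_a", "sp_b"},
    "cell_2": {"sp_a", "sp_b", "sp_c"},
    "cell_3": {"sp_c", "sp_d"},
}
species, matrix = pae_matrix(occurrences)
for unit, row in matrix.items():
    print(unit, row)
print("shared:", shared_species(species, matrix))
```

A species found in a single area unit, such as sp_d here, cannot support any grouping of units, which is why the second function filters it out.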

Ancestral Areas: Concepts and Models
Once you have established your areas of endemism, you can proceed with the reconstruction of ancestral areas, treating the areas you obtained as discrete characters. There has been a long discussion about which definition, approach, and method to use. It formally started with Hennig (1950), who proposed the chorological progression rule, which assumes that the progression of areas parallels the progression of characters in the cladogram: the areas inhabited by primitive species are deemed ancestral, whereas the areas inhabited by apomorphic species lie far from the center of origin. This rule is based on the assumption that the peripatric speciation model is common in nature. Nonetheless, there are many exceptions to it.

Bremer (1992) assumed that areas or regions could be treated as binary irreversible characters that can be analyzed separately (each area as a character) and optimized onto a tree using Camin-Sokal parsimony, to see which areas are most parsimoniously explained as being part of the ancestral area. The selection criterion was that the areas requiring the fewest independent losses relative to gains on the cladogram would be the ones most likely to have been part of the ancestral area of the clade.
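The gain and loss counts behind this criterion can be sketched in a few lines of Python. This is my own illustration, not Bremer's implementation. It assumes the tree is written as nested tuples of tip names, and it relies on the fact that, with irreversible characters, the minimum number of gains equals the number of maximal clades whose tips all occupy the area (and the minimum number of losses equals the number of maximal clades whose tips all lack it). Taxon names and distributions are hypothetical.

```python
def all_tips_have_state(tree, occupied, state):
    """True if every tip under this node has the given presence state."""
    if isinstance(tree, str):
        return (tree in occupied) == state
    return all(all_tips_have_state(child, occupied, state) for child in tree)


def count_maximal_clades(tree, occupied, state):
    """Count the maximal clades whose tips all share the given presence state."""
    if all_tips_have_state(tree, occupied, state):
        return 1
    if isinstance(tree, str):
        return 0
    return sum(count_maximal_clades(child, occupied, state) for child in tree)


def gains_and_losses(tree, occupied):
    """Irreversible-parsimony counts for one area.

    gains: steps needed if the area was NOT ancestral (only 0 -> 1 allowed).
    losses: steps needed if the area WAS ancestral (only 1 -> 0 allowed).
    """
    gains = count_maximal_clades(tree, occupied, True)
    losses = count_maximal_clades(tree, occupied, False)
    return gains, losses


# Hypothetical five-taxon tree and two hypothetical areas.
tree = ((("A", "B"), "C"), ("D", "E"))
area_x = {"A", "B", "C", "D"}   # taxa present in area X
area_y = {"E"}                  # taxa present in area Y

print("area X:", gains_and_losses(tree, area_x))
print("area Y:", gains_and_losses(tree, area_y))
```

In this toy case area X needs two gains but only one loss, while area Y needs one gain but two losses, so X would be the better candidate for the ancestral area under the criterion above.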
Bremer's approach was ratified by Ronquist (1994). Nonetheless, Ronquist (1995) questioned the notion of treating areas as irreversible characters, for it would only be valid if dispersal were irreversible and a region could not be subsequently reinvaded. Hence, he considered that treating dispersal events as unordered, reversible events would be more realistic, and he proposed Fitch parsimony to optimize the characters onto trees. These approaches had many conceptual problems because they search for replicated areas in basal clades, ignoring homology and using paralogy to weight areas and locate centers of origin (Ebach, 1999). They thus tend to favor the areas that underwent less extinction, and may fail to identify an area as ancestral.
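For readers unfamiliar with Fitch parsimony, the following Python sketch shows its bottom-up pass for a single unordered, reversible character. It is a generic textbook version, assuming a strictly bifurcating tree written as nested tuples; the tip names and area codes are hypothetical.

```python
def fitch(tree, tip_states):
    """Bottom-up Fitch pass.

    Returns (candidate state set at this node, number of changes below it).
    The tree must be strictly bifurcating: each internal node is a 2-tuple.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical tree and area codes.
tree = ((("A", "B"), "C"), ("D", "E"))
tip_states = {"A": "north", "B": "north", "C": "south", "D": "south", "E": "south"}

root_states, changes = fitch(tree, tip_states)
print(root_states, changes)
```

Here the root is reconstructed as "south" with a single change, because a shift to "north" on the branch leading to A and B is enough to explain the tips.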

Dispersal Vicariance Analysis DIVA
This method was presented by Ronquist (1997). It uses reversible parsimony to estimate ancestral areas. DIVA searches for ancestral areas using a three-dimensional cost matrix that assigns different costs to events. Unlike previous models, it focuses on mapping area distributions onto the phylogeny: vicariance events have no cost, whereas dispersals and extinctions cost one per area unit added to or removed from the distribution. The optimal reconstructions are those with the lowest total cost, that is, the fewest dispersal and extinction events. Since the method treats vicariance as the default explanation for speciation, it is problematic when the species have not followed this sort of event. Moreover, extinction and range expansion enter only as costs and are not modeled explicitly, which is one reason why it has been criticized (Kodandaramaiah, 2010).

Despite these problems, it has been a powerful approach for inferring reticulate biogeographic scenarios that include different combinations of events over time, such as the diversification in the Holarctic (Sanmartín, 2001). Two statistical extensions of this model have been proposed: S-DIVA (Yu, Harris, & He, 2010), which evaluates the alternative ancestral ranges at each node in a tree, accounting for phylogenetic uncertainty and for uncertainty in the DIVA optimization within a statistical framework; and Bayes-DIVA (Nylander, Olsson, Alström, & Sanmartín, 2008), which uses DIVA to perform reconstructions at all nodes that occur in a summary topology. Bayes-DIVA has been implemented in S-DIVA.
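The cost rules can be illustrated with a small Python sketch. It is a simplification of my own and not the DIVA program: it only scores a single branch and a single speciation event, with ranges written as sets of hypothetical area labels, and it does not perform the search over all possible reconstructions.

```python
def branch_cost(ancestor, descendant):
    """Cost of changing one range into another along a branch.

    Each area added counts as one dispersal, each area removed as one extinction.
    """
    dispersals = len(descendant - ancestor)
    extinctions = len(ancestor - descendant)
    return dispersals + extinctions


def is_free_split(ancestor, left, right):
    """True if a speciation event carries no cost under the rules above.

    Either the ancestor occupied one area and both daughters inherit it,
    or a widespread ancestor is divided into two non-overlapping ranges
    (vicariance).
    """
    if len(ancestor) == 1:
        return left == ancestor and right == ancestor
    return (
        bool(left)
        and bool(right)
        and left.isdisjoint(right)
        and (left | right) == ancestor
    )


print(branch_cost({"A"}, {"A", "B"}))               # one dispersal into B
print(is_free_split({"A", "B"}, {"A"}, {"B"}))      # vicariance
print(is_free_split({"A", "B"}, {"A", "B"}, {"B"})) # not allowed without extra events
```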

Dispersal Extinction Cladogenesis DEC
This is a continuous-time model of geographic range evolution that enables the inference of ancestral ranges in a likelihood framework (Ree, Moore, Webb, & Donoghue, 2005; Ree & Smith, 2008). Range expansions and contractions are caused, respectively, by dispersal to an unoccupied area and by local extinction within an area. The probabilities of range transitions are computed as a function of time, so that, given a phylogeny, the distributions of the taxa involved, and an explicit model of dispersal, extinction, and cladogenesis, the free parameters of the model (the rates of dispersal and local extinction) can be estimated by maximum likelihood. The model can be extended by incorporating fossil and geological information into the rate matrix, which is allowed to vary over time. Also, dating uncertainty can be accommodated by integrating DEC reconstructions over a Bayesian posterior sample of dated trees.
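To show what "probabilities of range transitions as a function of time" means in practice, here is a minimal pure-Python sketch of the along-branch part of such a model. It is not Lagrange code. It assumes a single dispersal rate d and a single extinction rate e shared by all areas, assumes that the rate of colonizing a new area is d times the number of areas already occupied, and uses a plain Taylor-series matrix exponential that is only adequate for small examples. The cladogenesis part of DEC is not covered.

```python
from itertools import combinations


def build_ranges(areas):
    """All possible ranges (subsets of areas), including the empty range."""
    ranges = [frozenset()]
    for size in range(1, len(areas) + 1):
        ranges += [frozenset(combo) for combo in combinations(areas, size)]
    return ranges


def rate_matrix(areas, d, e):
    """Instantaneous rate matrix Q over ranges for dispersal d and extinction e."""
    ranges = build_ranges(areas)
    index = {r: i for i, r in enumerate(ranges)}
    n = len(ranges)
    Q = [[0.0] * n for _ in range(n)]
    for r in ranges:
        i = index[r]
        for area in areas:
            if area in r:
                Q[i][index[r - {area}]] += e            # local extinction
            elif r:
                Q[i][index[r | {area}]] += d * len(r)   # dispersal (assumed additive)
        Q[i][i] = -sum(Q[i])
    return ranges, Q


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(Q, t, squarings=10, terms=12):
    """P(t) = exp(Q t), via a scaled Taylor series followed by repeated squaring."""
    n = len(Q)
    scale = t / 2 ** squarings
    M = [[q * scale for q in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Hypothetical two-area example with made-up rates.
ranges, Q = rate_matrix(["A", "B"], d=0.1, e=0.05)
P = transition_probabilities(Q, t=1.0)
for r, row in zip(ranges, P):
    print(sorted(r), [round(p, 4) for p in row])
```

Note that build_ranges returns 2^n ranges for n areas, so the matrix has 4^n entries; this growth is the reason why likelihood calculations of this kind become slow as areas are added.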

To cite a few examples: Smith (2009) examined divergence-time uncertainty in a parametric biogeographical analysis of the Northern Hemisphere plant clade Caprifolieae; Smedmark, Eriksson, & Bremer (2010) explored how uncertainty in estimated divergence times affects conclusions in biogeographical analysis, using the group Urophylleae, which has a disjunct pantropical distribution. The DEC model has been implemented in Lagrange. Although it is considered a more realistic model, it does not work efficiently with more than 7 areas (Ronquist & Sanmartín, 2011).

BayArea is a Bayesian approach for inferring biogeographic history that extends the application of biogeographic models to the analysis of problems that involve a large number of areas (Landis, Matzke, Moore, & Huelsenbeck, 2013).

S-DIVA, DEC, and BayArea are implemented in the software RASP, which offers a graphical user interface (GUI) to specify a phylogenetic tree or set of trees and geographic distribution constraints, draws pie charts on the nodes of a phylogenetic tree to indicate levels of uncertainty, and generates exportable graphical results (Yu, Harris, Blair, & He, 2015)⁠.
BioGeoBEARS is an R package, authored by Nicholas J. Matzke, designed to perform inference of biogeographic history on phylogenies as well as model testing. It includes models with dispersal, vicariance, and founder-event speciation (free parameter j), covering DEC, DIVA, and BayArea, inter alia (Matzke, 2013). The advantage of using this package is that you can compare how well each model fits your data and measure the effects of the parameters you use in each of them. Matzke (2014) encourages testing the founder-event parameter when studying the speciation of island clades.
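One common way to compare fitted models is the Akaike Information Criterion (AIC), sketched below in Python. This is not BioGeoBEARS code, and the log-likelihood values are invented purely for illustration. The only assumption taken from the models themselves is the number of free parameters: two for DEC (dispersal and extinction rates) and three for DEC+J (the same plus j).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_scores):
    """Relative support for each model, normalized to sum to one."""
    best = min(aic_scores.values())
    relative = {name: math.exp(-0.5 * (score - best))
                for name, score in aic_scores.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Hypothetical log-likelihoods; replace with the values from your own runs.
fits = {
    "DEC": {"lnL": -40.0, "k": 2},
    "DEC+J": {"lnL": -35.0, "k": 3},
}
scores = {name: aic(fit["lnL"], fit["k"]) for name, fit in fits.items()}
weights = akaike_weights(scores)
for name in fits:
    print(name, scores[name], round(weights[name], 3))
```

With these made-up numbers the extra parameter j is well worth its cost, but with real data the outcome depends entirely on how much the likelihood improves.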

Using Altitudinal and Bathymetric Data: Could That Be an Alternative?

Regarding the possibility of using other sorts of data, such as altitudinal and bathymetric data, there are studies that use this type of information. Yesson & Culham (2007) used distribution data and inferred climate preferences to determine the potential distribution of species in the past, present, and future, an approach they called phyloclimatic modelling (Yesson & Culham, 2011). This proposal was applied in the study of the biogeography of the garden plant Cyclamen.

This approach is similar to that of Vasconcelos, Rodríguez, & Hawkins (2011), who used a cluster analysis of richness, topography, and climate to determine the variable that most affects the distribution pattern of amphibians in South America, thus delimiting a new regionalization scheme. Brumfield & Edwards (2007), on the other hand, reconstructed the ancestral area and inferred the shift from lowlands to highlands based on the elevations at which each species of Thamnophilus was most commonly observed in the field. Nonetheless, as the authors point out, the discrete coding scheme they used did not account for variance in habitat distributions, but only recorded the ‘optimal’ elevation of each species.
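A discrete elevational coding of this kind can be sketched as follows in Python. It is not the coding scheme of any of the cited studies: the 1000 m threshold, the use of the median as the "typical" elevation, and the species records are all hypothetical choices made for illustration.

```python
from statistics import median

HIGHLAND_THRESHOLD_M = 1000  # hypothetical cut-off between lowlands and highlands


def code_elevation(elevations_m, threshold=HIGHLAND_THRESHOLD_M):
    """Reduce a set of elevation records to one discrete state.

    The median stands in for the elevation at which the species is
    typically found; all information about the spread is discarded.
    """
    typical = median(elevations_m)
    return "highland" if typical >= threshold else "lowland"


# Hypothetical elevation records (metres) for two species.
records = {
    "species_1": [200, 350, 400, 1200],
    "species_2": [1500, 1800, 2100],
}
states = {name: code_elevation(values) for name, values in records.items()}
print(states)
```

The first species is coded as lowland even though one record lies above the threshold, which is exactly the loss of variance mentioned above.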

In summary, you can infer the ancestral area of your clade of study via an ancestral-states reconstruction method. You can also do the reconstruction with altitudinal or bathymetric data if you discretize the ranges. First, search for dated phylogenies, or date them yourself if you are not confident in the available dates. Then, get the distributional data of the species. Once you have done this, establish the biogeographic areas (preferably areas of endemism) or ecoregions in which you will test your hypotheses. Use these areas to discretize the distribution of your taxa, and make sure the areas do not overlap. Run an evaluation of patterns and events (vicariance, dispersal, extinction), and use the results to infer the ancestral area(s). I personally prefer the DEC model, for it accounts for the probabilities of different events. Yet, the main problem arises if you need to include a large number of areas in your analysis.
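The discretization step can be sketched in Python as follows. This is a toy version under strong assumptions: areas are plain longitude/latitude rectangles rather than real polygons, and the area names, coordinates, and species records are hypothetical. With real data you would use proper area polygons and a GIS library.

```python
def boxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes share interior space."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def check_no_overlap(areas):
    """Raise an error if any two areas overlap."""
    names = sorted(areas)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if boxes_overlap(areas[first], areas[second]):
                raise ValueError(f"areas {first} and {second} overlap")


def area_of(point, areas):
    """Name of the area containing a (lon, lat) point, or None if outside all areas."""
    lon, lat = point
    for name, (min_lon, min_lat, max_lon, max_lat) in areas.items():
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return name
    return None


def code_ranges(records, areas):
    """Turn point records into one set of occupied areas per species."""
    check_no_overlap(areas)
    ranges = {}
    for species, point in records:
        name = area_of(point, areas)
        if name is not None:
            ranges.setdefault(species, set()).add(name)
    return ranges


# Hypothetical areas and occurrence records.
areas = {"A": (-80, -10, -70, 0), "B": (-70, -10, -60, 0)}
records = [("sp_1", (-75, -5)), ("sp_1", (-65, -5)), ("sp_2", (-62, -1))]
print(code_ranges(records, areas))
```

The resulting sets of areas per species are the kind of input that the reconstruction methods discussed above expect.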


Bremer, K. (1992). Ancestral areas: a cladistic reinterpretation of the center of origin concept. Systematic Biology, 41(4), 436–445. doi:10.2307/2992585
Brumfield, R. T., & Edwards, S. V. (2007). Evolution into and out of the Andes: A Bayesian analysis of historical diversification in Thamnophilus antshrikes. Evolution, 61(2), 346–367. doi:10.1111/j.1558-5646.2007.00039.x
Ebach, M. C. (1999). Paralogy and the Centre of Origin Concept. Cladistics, 15(4), 387–391. doi:10.1006/clad.1999.0118
Goloboff, P. (2005). NDM/VNDM v. 2.5. Programs for identification of areas of endemism. Retrieved from
Hovenkamp, P. (1997). Vicariance Events, not Areas, Should be Used in Biogeographical Analysis. Cladistics, 13(1–2), 67–79. doi:10.1006/clad.1997.0032
Kodandaramaiah, U. (2010). Use of dispersal-vicariance analysis in biogeography - A critique. Journal of Biogeography, 37(1), 3–11. doi:10.1111/j.1365-2699.2009.02221.x
Landis, M. J., Matzke, N. J., Moore, B. R., & Huelsenbeck, J. P. (2013). Bayesian analysis of biogeography when the number of areas is large. Systematic Biology, 62(6), 789–804. doi:10.1093/sysbio/syt040
Matzke, N. J. (2013). Probabilistic historical biogeography: new models for founder-event speciation, imperfect detection, and fossils allow improved accuracy and model-testing. Frontiers of Biogeography, 5(4), 242–248.
Matzke, N. J. (2014). Model selection in historical biogeography reveals that founder-event speciation is a crucial process in island clades. Systematic Biology, 63(6), 951–970. doi:10.1093/sysbio/syu056
Morrone, J. J. (1994). On the identification of areas of endemism. Systematic Biology, 43(3), 438–441. doi:10.1093/sysbio/43.3.438
Nylander, J. A. A., Olsson, U., Alström, P., & Sanmartín, I. (2008). Accounting for phylogenetic uncertainty in biogeography: a Bayesian approach to dispersal-vicariance analysis of the thrushes (Aves: Turdus). Systematic Biology, 57(2), 257–268. doi:10.1080/10635150802044003
Ree, R. H., Moore, B. R., Webb, C. O., & Donoghue, M. J. (2005). A likelihood framework for inferring the evolution of geographic range on phylogenetic trees. Evolution, 59(11), 2299–2311. doi:10.1554/05-172.1
Ree, R. H., & Smith, S. A. (2008). Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Systematic Biology, 57(1), 4–14. doi:10.1080/10635150701883881
Ronquist, F. (1997). Dispersal-vicariance analysis: A new approach to the quantification of historical biogeography. Systematic Biology, 46(1), 195–203. doi:10.2307/2413643
Ronquist, F. (1995). Ancestral Areas Revisited. Systematic Biology, 44(4), 572–575.
Ronquist, F., & Sanmartín, I. (2011). Phylogenetic Methods in Biogeography. Annual Review of Ecology, Evolution, and Systematics, 42(1), 441–464. doi:10.1146/annurev-ecolsys-102209-144710
Sanmartín, I. (2001). Patterns of animal dispersal, vicariance and diversification in the Holarctic. Biological Journal of the Linnean Society, 73(4), 345–390. doi:10.1006/bijl.2001.0542
Smedmark, J. E. E., Eriksson, T., & Bremer, B. (2010). Divergence time uncertainty and historical biogeography reconstruction - an example from Urophylleae (Rubiaceae). Journal of Biogeography, 37(12), 2260–2274. doi:10.1111/j.1365-2699.2010.02366.x
Smith, S. A. (2009). Taking into account phylogenetic and divergence-time uncertainty in a parametric biogeographical analysis of the Northern Hemisphere plant clade Caprifolieae. Journal of Biogeography, 36(12), 2324–2337. doi:10.1111/j.1365-2699.2009.02160.x
Szumik, C. A., Cuezzo, F., Goloboff, P. A., & Chalup, A. E. (2002). An optimality criterion to determine areas of endemism. Systematic Biology, 51(5), 806–816. doi:10.1080/10635150290102483
Szumik, C. A., & Goloboff, P. A. (2015). Higher taxa and the identification of areas of endemism. Cladistics, 1–5.
Szumik, C., & Goloboff, P. (2004). Areas of endemism: an improved optimality criterion. Systematic Biology, 53(6), 968–977. doi:10.1080/10635150490888859
Vasconcelos, T. D. S., Rodríguez, M. Á., & Hawkins, B. A. (2011). Biogeographic Distribution Patterns of South American Amphibians: A Regionalization Based on Cluster Analysis. Natureza & Conservação, 9(1), 67–72. doi:10.4322/natcon.2011.008
Yesson, C., & Culham, A. (2011). Biogeography of cyclamen: an application of phyloclimatic modelling. Retrieved from
Yesson, C., & Culham, A. (2007). Phyloclimatic Modelling Can Estimate Ancestral Areas. Nature Precedings. doi:10.1038/npre.2007.478.1
Yu, Y., Harris, A. J., & He, X. (2010). S-DIVA (Statistical Dispersal-Vicariance Analysis): A tool for inferring biogeographic histories. Molecular Phylogenetics and Evolution, 56(2), 848–850. doi:10.1016/j.ympev.2010.04.011
Yu, Y., Harris, A. J., Blair, C., & He, X. (2015). RASP (Reconstruct Ancestral State in Phylogenies): A tool for historical biogeography. Molecular Phylogenetics and Evolution, 87, 46–49. doi:10.1016/j.ympev.2015.03.008