Performance of model-based inference methods for reconstructing the evolution of geographic ranges under varying maximum range size
Description of each model
DIVA: Dispersal-Vicariance Analysis.
Proposed by Ronquist in 1997, this method uses a three-dimensional cost matrix instead of a two-dimensional one, because the cost of an event depends on the particular combination of descendant distributions. The events taken into account are vicariance, dispersal, and extinction. In this model, speciation is explained as vicariance and has a cost of zero, following the idea that “the cost of an event should be inversely related to the probability of that particular event occurring” (Ronquist, 1994); thus, DIVA maximizes vicariance. In DIVA, dispersal is the addition of an area, at a cost of 1 per area added, whereas extinction is the elimination of an area, at the same cost of 1 per area eliminated. The dispersal-extinction cost in the matrix can be described as the sum of the differences between the union and the intersection of the distribution of the ancestral node and those of its two descendants.
Once the matrix is defined, the reconstruction proceeds much like an ordinary step-matrix optimization. First, the observed distributions are assigned to the terminal taxa, and then all possible combinations are considered for the remaining nodes. The optimization is divided into three parts: a downpass (from tips to root), an uppass (from root to tips), and a final pass, in which the array of a node is obtained by combining its uppass array with its downpass array.
DEC: Dispersal, Local Extinction, and Cladogenesis model.
Proposed by Ree et al. (2005) and Ree and Smith (2008), this method takes into account other types of speciation and does not maximize vicariance. In the absence of lineage divergence, ranges evolve by two stochastic processes: dispersal (range expansion) and local extinction (range contraction). The method works with dispersal rates and local extinction rates, which are used to construct the matrix of instantaneous transition rates between geographic ranges. It can be assumed that the rate of expansion or contraction from one area to another is the sum of these rates. Because it is based on maximum likelihood, unlike DIVA, which is based on parsimony, this method accepts prior probabilities for range inheritance scenarios. Once the matrix of instantaneous transition rates is constructed and range evolution is expressed in terms of prior probabilities (e.g., flat priors; Ree et al., 2005), ancestral ranges are inferred exactly as for character data, but integrating over the conditional likelihoods of the range inheritance scenarios.
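The construction of the transition rate matrix can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code of LAGRANGE or BioGeoBEARS: it assumes a single dispersal rate d and a single local extinction rate e shared by all areas, it keeps the null range as a state that cannot be left, and the function names (enumerate_ranges, dec_rate_matrix) and the rate values are hypothetical.

```python
from itertools import combinations


def enumerate_ranges(n_areas, max_range_size):
    """List all ranges (tuples of area indices), null range first."""
    ranges = []
    for size in range(max_range_size + 1):
        ranges.extend(combinations(range(n_areas), size))
    return ranges


def dec_rate_matrix(n_areas, d, e, max_range_size=None):
    """Build the instantaneous rate matrix Q for range change along a branch.

    Assumption of this sketch: one dispersal rate d and one local
    extinction rate e, identical for every area.
    """
    if max_range_size is None:
        max_range_size = n_areas
    ranges = enumerate_ranges(n_areas, max_range_size)
    index = {r: i for i, r in enumerate(ranges)}
    q = [[0.0] * len(ranges) for _ in ranges]

    for r in ranges:
        i = index[r]
        occupied = set(r)
        # Range expansion: gain one unoccupied area. Every occupied area
        # contributes d, so the rate is the sum d * (number of occupied areas).
        for area in range(n_areas):
            if area not in occupied:
                target = tuple(sorted(occupied | {area}))
                if target in index:  # absent when it exceeds max_range_size
                    q[i][index[target]] = d * len(occupied)
        # Range contraction: lose one occupied area at rate e.
        for area in occupied:
            target = tuple(sorted(occupied - {area}))
            q[i][index[target]] = e
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i])
    return ranges, q


ranges, q = dec_rate_matrix(n_areas=3, d=0.1, e=0.05)
```

With three areas the sketch yields eight ranges, and the rate from range (0, 1) to range (0, 1, 2) is 2d because either occupied area can be the source of the dispersal.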
When Ree et al. (2005) proposed a likelihood framework for inferring the evolution of geographic range, they argued that DIVA should be used only if the relationships among the areas are not known, because when characteristics of the areas such as timing, history, and geology are known (Matzke, 2013), they have to be introduced into the analysis as parameters or priors. The difference between DIVA and DEC is that the latter requires explicit dispersal and does not favor vicariance.
The methods described above are implemented in different programs: dispersal-vicariance analysis in DIVA and the DEC model in LAGRANGE. More recently, Nicholas Matzke developed the R package BioGeoBEARS (BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts), which implements the essential features of LAGRANGE and a likelihood interpretation of DIVA (DIVALIKE).
For both methods, the number of areas n plays a key role in the estimation because it defines the size of the cost matrix, given by (2^n - 1)^3; the number of comparisons is therefore proportional to the size of the matrix. With this in mind, Matzke (2012) discusses setting different constraints as a step in the parametrization of the model, such as dispersal limits or a maximum range size. The aim of setting a maximum range size is to limit the number of possible states in the matrix, which grows exponentially with the number of areas.
For example, with 4 areas and a maximum range size of 4, 3, or 2, the number of states is 16, 15, and 11, respectively, when the null range is included (15, 14, and 10 without it). There is not much difference, but with 10 areas the constraint reduces the number of states considerably: maximum range sizes of 10, 4, and 2 give 1024, 386, and 56 states, respectively.
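The state counts quoted above can be reproduced with a short calculation: the number of ranges with at most m of n areas is the sum of the binomial coefficients C(n, k) for k from 0 to m (starting at 1 if the null range is excluded). The following Python sketch is only a check of that arithmetic; the function names n_states and diva_cost_matrix_entries are hypothetical and are not part of BioGeoBEARS.

```python
from math import comb


def n_states(n_areas, max_range_size, include_null=True):
    """Number of geographic ranges with at most max_range_size areas."""
    smallest = 0 if include_null else 1
    return sum(comb(n_areas, k) for k in range(smallest, max_range_size + 1))


def diva_cost_matrix_entries(n_areas):
    """Entries in the three-dimensional cost matrix, (2^n - 1)^3."""
    return (2 ** n_areas - 1) ** 3


for n_areas, sizes in [(4, (4, 3, 2)), (10, (10, 4, 2))]:
    counts = [n_states(n_areas, m) for m in sizes]
    print(n_areas, "areas:", dict(zip(sizes, counts)))
# 4 areas: {4: 16, 3: 15, 2: 11}
# 10 areas: {10: 1024, 4: 386, 2: 56}
```

For comparison, diva_cost_matrix_entries(4) gives 3375 entries and diva_cost_matrix_entries(10) gives 1070599167, which shows how quickly the unconstrained three-dimensional matrix grows.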
To explore the influence of the maximum range size on ancestral area reconstruction when the distribution matrix (presence or absence in each area) differs, I chose three nested topologies with different numbers of terminals and, for each one, four distribution matrices with 5 areas. I then used two likelihood approaches implemented in the R package BioGeoBEARS (Matzke, 2012), the DEC and DIVALIKE models, to assess the differences in LnL (natural logarithm of the likelihood) between maximum range sizes of 3 and 4, and to compare the distributions assigned at each node.
(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, Spp_5:3);
(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2);
((((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2):1, Spp_7:4);
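As a quick check of the design, the three Newick strings above can be inspected to confirm the number of terminals in each topology and that the sets of terminals are nested. The Python sketch below is only an illustration; the helper tip_labels is hypothetical and relies on a simple regular expression that assumes unquoted labels, as in these trees.

```python
import re

newick_trees = [
    "(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, Spp_5:3);",
    "(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2);",
    "((((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2):1, Spp_7:4);",
]


def tip_labels(newick):
    """Return the terminal labels: names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


tips = [set(tip_labels(tree)) for tree in newick_trees]
tip_counts = [len(t) for t in tips]
# Each topology's terminals must be a proper subset of the next one's.
nested = all(small < large for small, large in zip(tips, tips[1:]))
print(tip_counts, nested)
# [5, 6, 7] True
```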
The results show that the estimation depends on the distribution of the taxa and on how well they are sampled. The LnL obtained with a maximum range size of 3 was lower than that obtained with a maximum range size of 4, except in cases with widespread taxa (9% of the cases), in which a maximum range size of 4 gave the lower value. When I compared the distributions assigned at each node under the two sizes, 64.9% of the evaluated nodes in the DEC model and 77.9% in the DIVALIKE model had the same distribution, but the basal nodes showed conflict in the assigned areas. In the comparison between methods of the distributions assigned at each node, 44.1% of the nodes were identical.
Thus, the ancestral area at some nodes can change according to the method or the optimization used for the reconstruction. For widespread taxa, it is better to use a high maximum range size or to treat the widespread ranges as dispersal towards the tips. Finally, to choose an adequate maximum range size, one that not only minimizes the number of states but also does not introduce errors into the estimation, I recommend that you explore your dataset and identify how and when the estimation changes with different maximum range sizes.
Matzke, N. J. (2012). Founder-event speciation in BioGeoBEARS package dramatically improves likelihoods and alters parameter inference in dispersal–extinction–cladogenesis (DEC) analyses. Frontiers of Biogeography, 4, 210.
Matzke, N. J. (2013). Probabilistic historical biogeography: new models for founder-event speciation, imperfect detection, and fossils allow improved accuracy and model-testing. University of California, Berkeley.
Ree, R. H., Moore, B. R., Webb, C. O., & Donoghue, M. J. (2005). A likelihood framework for inferring the evolution of geographic range on phylogenetic trees. Evolution, 59(11), 2299-2311.
Ree, R. H., & Smith, S. A. (2008). Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Systematic Biology, 57(1), 4-14.
Ronquist, F. (1997). Dispersal-vicariance analysis: a new approach to the quantification of historical biogeography. Systematic Biology, 46(1), 195-203.
Ronquist, F. (1994). Ancestral areas and parsimony. Systematic Biology, 43(2), 267-274.