Performance of model-based inference methods for reconstructing the evolution of geographic ranges under varying maximum range size
Description of each model
DIVA:
Dispersal-Vicariance Analysis.
Proposed by Ronquist in 1997, this method uses a three-dimensional cost matrix instead of a two-dimensional one, because the cost of an event depends on the particular combination of descendant distributions. The events taken into account are vicariance, dispersal and extinction. In this model, speciation is explained as vicariance and has zero cost, following the idea that “the cost of an event should be inversely related to the probability of that particular event occurring” (Ronquist, 1994); thus, DIVA maximizes vicariance. In DIVA, dispersal is the addition of an area and costs 1 per area added, whereas extinction is the elimination of an area and likewise costs 1 per area eliminated. The dispersal-extinction cost in the matrix can be described as the sum of the differences between the union and the intersection of the distributions of the ancestral node and its two descendants.
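To make the unit costs concrete, here is a minimal sketch in Python that counts the dispersal and extinction events needed to turn one range into another. It is only an illustration of the "1 per area added, 1 per area eliminated" rule: it does not build the three-dimensional matrix and it ignores zero-cost vicariance. The function name and the example ranges are my own assumptions, not part of DIVA.

```python
def range_change_cost(ancestor, descendant):
    """Count unit-cost events needed to turn one range into another.

    Each area gained counts as one dispersal (cost 1) and each area lost
    counts as one extinction (cost 1). Vicariance is NOT modelled here.
    """
    ancestor, descendant = frozenset(ancestor), frozenset(descendant)
    dispersals = len(descendant - ancestor)   # areas added
    extinctions = len(ancestor - descendant)  # areas eliminated
    return dispersals + extinctions


# Hypothetical ranges over areas A, B and C
print(range_change_cost({"A", "B"}, {"A", "B", "C"}))  # one dispersal
print(range_change_cost({"A", "B"}, {"B", "C"}))       # one extinction plus one dispersal
```

Under this reading, the cost equals the size of the union of the two ranges minus the size of their intersection.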
Once the matrix is defined, the reconstruction proceeds much like an ordinary step-matrix optimization. First, the observed distributions are assigned to the terminal taxa, and then all possible combinations are considered for the remaining nodes. The optimization is divided into three parts: a downpass (from tips to root), an uppass (from root to tips) and a final pass, in which the array of a node is obtained by combining its uppass array with its downpass array.
DEC:
Dispersal, Local Extinction, and Cladogenesis model.
Proposed by Ree et al. (2005) and Ree and Smith (2008), this method takes other types of speciation into account and does not maximize vicariance. In the absence of lineage divergence, ranges can evolve by two stochastic processes: dispersal (range expansion) and local extinction (range contraction). The method works with dispersal rates and local extinction rates, which are used to construct the matrix of instantaneous transition rates between geographic ranges. It can be assumed that the rate of expansion or contraction from one area to another is the sum of these rates. Because the model is based on maximum likelihood, in contrast to DIVA, which is based on parsimony, it accepts prior probabilities for range inheritance scenarios. Once the matrix of instantaneous transition rates is constructed and range evolution is expressed in terms of prior probabilities (e.g. flat priors (Ree et al., 2005)), the inference of ancestral ranges is done exactly as for character data, but integrating over the conditional likelihoods of range inheritance scenarios.
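As an illustration of how such a rate matrix can be assembled, the following Python sketch builds the instantaneous transition rates between all ranges for a small set of areas. It assumes a single dispersal rate d shared by all pairs of areas and a single local extinction rate e shared by all areas, and it sums the dispersal rate over every occupied source area. The function names are hypothetical, and this is not the code used by LAGRANGE or BioGeoBEARS.

```python
from itertools import combinations


def all_ranges(areas):
    """Every subset of the areas, including the empty (null) range."""
    return [frozenset(c) for k in range(len(areas) + 1)
            for c in combinations(areas, k)]


def dec_rate_matrix(areas, d, e):
    """Build Q as a dict of dicts: Q[from_range][to_range] = rate."""
    ranges = all_ranges(areas)
    Q = {r: {s: 0.0 for s in ranges} for r in ranges}
    for r in ranges:
        for area in areas:
            if area in r:
                # Local extinction: the lineage disappears from one area.
                Q[r][r - {area}] += e
            elif r:
                # Dispersal into a new area, summed over each occupied source area.
                Q[r][r | {area}] += d * len(r)
        # The diagonal makes each row sum to zero.
        Q[r][r] = -sum(rate for s, rate in Q[r].items() if s != r)
    return Q


# Hypothetical rates for two areas A and B
Q = dec_rate_matrix(["A", "B"], d=0.1, e=0.05)
print(len(Q))  # 4 ranges: null, A, B, AB
```

Only transitions that add or remove a single area receive a non-zero rate, and the null range has no outgoing transitions in this sketch.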
When Ree et al. (2005) proposed a likelihood framework for inferring the evolution of geographic range, they argued that DIVA should be used only if the relationships among the areas are not known because, if characteristics of the areas such as their timing, history and geology are known (Matzke, 2013), they have to be introduced into the analysis as parameters or priors. The difference between DIVA and DEC is that the latter requires explicit dispersal and does not favor vicariance.
The methods described above are implemented in different programs: dispersal-vicariance analysis in DIVA and the DEC model in LAGRANGE. More recently, Nicholas Matzke developed the R package BioGeoBEARS (BioGeography with Bayesian (and Likelihood) Evolutionary Analysis in R Scripts), which implements the essential features of LAGRANGE and a likelihood interpretation of DIVA.
For both methods, the number of areas n plays a key role in the estimation because it defines the size of the cost matrix, given by $(2^n - 1)^3$; thus, the number of comparisons is proportional to the size of the matrix. With this in mind, Matzke (2012) discusses setting different constraints as a step in the parameterization of the model, such as dispersal limits or a maximum range size. The aim of setting a maximum range size is to limit the number of possible states in the matrix, which grows exponentially with the number of areas.
For example, with 4 areas and a maximum range size of 4, 3 or 2, the number of states, counting the null range, is 16, 15 and 11, respectively (15 in the unconstrained case if the null range is excluded). There is not much difference, but with 10 areas this constraint helps to reduce the number of states: with a maximum range size of 10, 4 or 2, there are 1024, 386 and 56 states, respectively.
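These counts can be checked with a few lines of Python. The sketch below assumes that a state is any subset of the areas with at most the maximum number of areas, plus an optional null range; the function name is my own.

```python
from math import comb


def n_states(n_areas, max_range_size, include_null=True):
    """Number of ranges with between 1 and max_range_size areas,
    plus one more state if the null (empty) range is counted."""
    total = sum(comb(n_areas, k) for k in range(1, max_range_size + 1))
    return total + 1 if include_null else total


for n_areas, sizes in [(4, (4, 3, 2)), (10, (10, 4, 2))]:
    print(n_areas, [n_states(n_areas, m) for m in sizes])
# 4 [16, 15, 11]
# 10 [1024, 386, 56]
```

Running the same function over a range of maximum sizes for your own number of areas gives a quick idea of how much a given limit shrinks the state space.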
Data Exploration
To explore the influence of the maximum range size on ancestral area reconstruction when the distribution matrix (presence or absence in each area) differs, I chose three nested topologies with different numbers of terminals, and four distribution matrices with 5 areas for each one. Then I used two likelihood approaches, the DEC and DIVALIKE models, both implemented in the R package BioGeoBEARS (Matzke, 2012), to show the differences in LnL (the natural logarithm of the likelihood) between a maximum range size of 3 and one of 4, and to compare the distribution assigned to each node.
Trees:
(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, Spp_5:3);
(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2);
((((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2):1, Spp_7:4);
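As a quick sanity check on these topologies, the following Python sketch extracts the terminal labels from each Newick string and confirms that the trees have 5, 6 and 7 terminals and that each smaller set of terminals is contained in the next. The regular expression is a simplification that assumes plain labels followed by a branch length, as in the strings above; it is not a general Newick parser.

```python
import re

TREES = [
    "(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, Spp_5:3);",
    "(((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2);",
    "((((Spp_1:1, Spp_2:1):1,(Spp_3:1, Spp_4:1):1):1, (Spp_5:1, Spp_6:1):2):1, Spp_7:4);",
]


def tip_labels(newick):
    """Return the set of terminal labels in a simple Newick string.

    A terminal label directly follows "(" or "," and precedes ":length".
    """
    return set(re.findall(r"[(,]\s*([A-Za-z_]\w*)\s*:", newick))


tips = [tip_labels(tree) for tree in TREES]
print([len(t) for t in tips])  # [5, 6, 7]
print(tips[0] < tips[1] < tips[2])  # True: each set of terminals is nested in the next
```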
As a result, I found that the estimation depends on the distribution of the taxa and on how well it is sampled. The LnL with a maximum range size of 3 was lower than with a maximum range size of 4, except in cases with widespread taxa, in which a maximum range size of 4 gave the lower value (9% of the cases). When I compared the distribution assigned to each node under the two sizes, 64.9% of the evaluated nodes in the DEC model and 77.9% in DIVALIKE had the same distribution, but the basal nodes showed conflict in the areas assigned. When I compared the distribution assigned to each node between methods, 44.1% of the nodes were equal.
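A node-by-node comparison of this kind can be sketched in Python as follows. The sketch assumes that each reconstruction has already been reduced to one range per node and stored in a dictionary keyed by node label; the node labels and ranges below are hypothetical toy data, not the results reported above.

```python
def percent_equal_nodes(recon_a, recon_b):
    """Percentage of shared nodes that received the same range in both reconstructions."""
    shared = recon_a.keys() & recon_b.keys()
    if not shared:
        raise ValueError("the two reconstructions have no nodes in common")
    equal = sum(1 for node in shared if recon_a[node] == recon_b[node])
    return 100.0 * equal / len(shared)


# Hypothetical toy reconstructions (NOT the data from this post)
max3 = {"n1": "AB", "n2": "A", "n3": "ABC", "n4": "BC"}
max4 = {"n1": "AB", "n2": "A", "n3": "ABCD", "n4": "BC"}
print(percent_equal_nodes(max3, max4))  # 75.0
```

The same function can be used to compare two maximum range sizes within one model or two models under the same maximum range size.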
Thus, the ancestral area can change at some nodes according to the method or the optimization used for the reconstruction. For widespread taxa it is better to use a higher maximum range size, or to treat the wide range as dispersal towards the tips. Finally, to choose an adequate maximum range size, one that not only minimizes the number of states but also avoids errors in the estimation, I recommend that you explore your dataset and identify how and when the estimation changes with different maximum range sizes.
References
Matzke, N. J. (2012). Founder-event speciation in BioGeoBEARS package dramatically improves likelihoods and alters parameter inference in dispersal–extinction–cladogenesis (DEC) analyses. Front. Biogeogr., 4, 210.
Matzke, N. J. (2013). Probabilistic historical biogeography: new models for founder-event speciation, imperfect detection, and fossils allow improved accuracy and model-testing. University of California, Berkeley.
Ree, R. H., Moore, B. R., Webb, C. O., & Donoghue, M. J. (2005). A likelihood framework for inferring the evolution of geographic range on phylogenetic trees. Evolution, 59(11), 2299-2311.
Ree, R. H., & Smith, S. A. (2008). Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Systematic Biology, 57(1), 4-14.
Ronquist, F. (1997). Dispersal-vicariance analysis: a new approach to the quantification of historical biogeography. Systematic Biology, 46(1), 195-203.
Ronquist, F. (1994). Ancestral areas and parsimony. Systematic Biology, 267-274.