Monday, 21 September 2015

A first approach to Jackknife support on Endemism Areas.


Areas of endemism are geographic regions delimited by the distributional congruence of at least two taxa (Szumik et al. 2006). These areas are the basis for inferences about historical distributional patterns in the diversification of taxa (Morrone & Crisci 1995). Despite their importance and broad use, there is no statistical metric that lets us estimate the precision of the calculated areas, comparable to the Jackknife or Bootstrap support used in phylogenetic analysis. The Jackknife is a resampling technique used to estimate variance and bias, and therefore precision (Badii et al. 2007). Basically, the technique creates a new matrix from the original data by randomly removing a percentage of the observations and repeats the corresponding analysis; the result obtained from each resampled matrix is then compared with the result from the original data, and an average is calculated (Efron 1982; Badii et al. 2007). The main goal of this work is to try the Jackknife method on areas of endemism and to lay the basis for a future functional and replicable technique.
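
The resampling step described above can be sketched in a few lines. This is only an illustration in Python, not the R function used in this work: the names `jackknife_resample` and `jackknife_average`, the 25% removal fraction and the seed are all assumptions made for the example, and `statistic` stands in for whatever analysis is repeated on each resampled data set.

```python
import random
from statistics import mean


def jackknife_resample(records, remove_fraction, rng):
    """Return a random subset of `records` with `remove_fraction` of them removed."""
    n_remove = int(round(len(records) * remove_fraction))
    n_keep = len(records) - n_remove
    # sample() draws without replacement, so no record is duplicated
    return rng.sample(records, n_keep)


def jackknife_average(records, statistic, remove_fraction=0.25,
                      n_replicates=15, seed=1):
    """Recompute `statistic` on each resampled data set and average the results.

    remove_fraction, n_replicates and seed are illustrative defaults only.
    """
    rng = random.Random(seed)
    estimates = [
        statistic(jackknife_resample(records, remove_fraction, rng))
        for _ in range(n_replicates)
    ]
    return mean(estimates), estimates


if __name__ == "__main__":
    # Toy data: 100 made-up observations; the statistic is a simple mean.
    data = [float(i) for i in range(100)]
    average, estimates = jackknife_average(data, mean)
    print(average, min(estimates), max(estimates))
```

The spread of the per-replicate estimates around the value obtained from the full data gives a rough idea of how sensitive the analysis is to the removal of observations.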


Using 20,000 occurrences for 426 mammal species distributed in the North Andean Block, areas of endemism were calculated with the endemicity index implemented in NDM/VNDM (Szumik et al. 2002; Szumik & Goloboff 2004). Two types of analysis were run: (1) a single replicate, and (2) 20 replicates. Both used a 0.5° x 0.5° grid, a strict consensus at 50% similarity, and default values for all other settings. A Jackknife function was written in R and run with 15, 30 and 45 resampling replicates. The function resamples the species occurrences from an xyd file; for each resample, the consensus areas were calculated and compared with the original areas, and the percentage of resamples in which each original area was recovered was taken as its support.
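
The support calculation can be illustrated as follows. This is a hedged Python sketch, not the R function used here: it assumes each consensus area is represented as a set of grid cells, and it assumes an area counts as "recovered" when some area in the resample reaches a Jaccard similarity of at least `min_similarity`. That matching rule, the function names and the 0.5 threshold are hypothetical choices for the example; the areas themselves would still come from NDM/VNDM.

```python
def jaccard(cells_a, cells_b):
    """Jaccard similarity between two collections of grid cells."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def area_support(original_areas, replicate_areas, min_similarity=0.5):
    """Fraction of resampling replicates in which each original area is recovered.

    original_areas: dict mapping an area name to its set of grid cells.
    replicate_areas: one list of areas (sets of cells) per resampling replicate.
    min_similarity: hypothetical matching threshold, not an NDM/VNDM setting.
    """
    if not replicate_areas:
        raise ValueError("at least one resampling replicate is required")
    support = {}
    for name, cells in original_areas.items():
        hits = sum(
            any(jaccard(cells, candidate) >= min_similarity for candidate in areas)
            for areas in replicate_areas
        )
        support[name] = hits / len(replicate_areas)
    return support


if __name__ == "__main__":
    # Made-up cells (row, column) purely for illustration.
    original = {"area_1": {(0, 0), (0, 1)}, "area_2": {(5, 5)}}
    replicates = [[{(0, 0), (0, 1)}], [{(0, 0)}], [], [{(9, 9)}]]
    print(area_support(original, replicates))
```

A stricter or looser matching rule would change the support values, so the threshold should be reported together with the results.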


Results and Discussion.
The consensus areas differed between the two analyses: 4 consensus areas in the single-replicate analysis (OCA) and 7 areas in the 20-replicate analysis (OCA20). The OCA areas overlap the OCA20 areas in geographic space (Fig. 1). Only two of the 4 areas in OCA were recovered by the Jackknife resampling, and both had low support (2% - 6%) across the three resampling schemes (Fig. 2A); the highest values (0.06 and 0.04) were obtained with 15 resampling replicates. Of the 7 areas in OCA20, only 3 were recovered, also with low support (2% - 6%), and again the highest values were obtained with 15 replicates. Both analyses have something in common: the areas supported in OCA are the same areas supported in OCA20, and one of them (purple for OCA and red for OCA20) had the highest support with 15 and 45 replicates. The low values for the supported areas may be related to the small size of the consensus areas and the few occurrences used (if all species had the same number of occurrences, each would have approximately 45 occurrences in the North Andean Block). Despite the low support values, the Jackknife recognized similar areas in both analyses, and it is clear that those areas could not have been generated at random. Support values do not improve when the number of resampling replicates increases; on the contrary, they decrease. It appears that small areas are recovered less often as the number of resampling replicates increases, and that those areas are supported only by specific combinations of species occurrences, given the low number of occurrences. The Jackknife is a promising method for areas of endemism. This work needs to be repeated with larger data sets, especially with more occurrences, and a more extreme approach should be tested: resampling species rather than occurrences.



Figure 1. Geographic location of the consensus areas. OCA areas (blue) overlap the OCA20 areas (green). The areas are located in northwestern Ecuador, near Colombia.

Figure 2. Jackknife support values for each number of resampling replicates in both analyses: a single replicate (A) and 20 replicates (B).


Badii, M. H., J. Castillo, A. Wong & J. Landeros. (2007): Precisión en los índices estadísticos: Técnicas de jacknife & Bootstrap. Innovaciones de Negocios 4(1): 63-78.

Efron, B. (1982): The Jackknife, the Bootstrap, and Other Resampling Plans. Society for Industrial and Applied Mathematics. CBMS-NSF Monograph, 38.

Morrone, J. J. & J. V. Crisci. (1995): Historical Biogeography: Introduction to Methods. Annu. Rev. Ecol. Syst. 26: 373-401.

Szumik, C., F. Cuezzo, P. Goloboff & A. Chalup. (2002): An optimality criterion to determine areas of endemism. Syst. Biol. 51: 806-816.

Szumik, C. & P. Goloboff. (2004): Areas of endemism: An improved optimality criterion. Syst. Biol. 53: 968-977.

Szumik, C., D. Casagranda & S. Roig Juñent. (2006): Manual de NDM/VNDM: Programas para la identificación de áreas de endemismo. Instituto Argentino de Estudios Filogenéticos, Año V, Vol (3).



1. R code for resampling and Jackknife calculation available here.