Tuesday, 28 October 2008

BSC, conservation biology and biodiversity


A biologist is always faced with the problem of identifying the species in a particular group of organisms. The first tool is morphology, used as the decisive criterion for species. However, the existence of sibling species, of different morphological types, and of individual genetic variation resulting from mosaic evolution makes this approach arbitrary when delimiting species. The biological species concept (BSC) thus arose from the unsatisfactory inferences of the morphological species concept, since the BSC takes other factors into account, such as genetic identity.

From my point of view, Mayr's BSC describes an evolutionary strategy for maintaining the genetic identity of certain populations or, as he says, to "keep the gene pool well-balanced and well-adapted genotypes or harmonics" (Mayr, 1996). The mechanism that protects against destabilization by recombination is reproductive isolation, which reduces the frequency of gene exchange between groups. In this sense the definition of the BSC and the way it operates are consistent with one another. The main problem is its application: the BSC cannot be extended beyond sexually reproducing organisms, so asexual organisms are excluded. This makes it impractical for conservation and biodiversity studies, which seek a universal and objective definition in order to reach a consensus on species richness and biodiversity hotspots (Agapow et al., 2004).

Geography is another unsolved problem for the BSC as far as allopatric populations are concerned, in addition to the applicability problems already mentioned. According to Mayr, these are populations that have not reached, or may never reach, species status, and they are characterized by some degree of continuous isolation. However, Zink (2004), in a study based on mitochondrial DNA, argues against treating subspecies as proxies for units of conservation: he shows that certain avian subspecies lack a clear population genetic structure, and that such structure is inconsistently related to subspecies boundaries sensu Mayr.

Hence, as far as I am concerned, the BSC is a vicious circle built around sexual organisms (it is only logical that the genetic barrier between sexual organisms is reproductive isolation). More than a practical concept of demarcation, it describes an evolutionary strategy for stabilizing the gene pool of sexual organisms. Its use becomes a game of where it applies and where it does not, which makes it both ambiguous and costly for the two fields in which the species concept is critical and decisive: biodiversity and conservation.


References

Agapow, PM., Bininda-Emonds, ORP., Crandall, KA., Gittleman, JL., Mace, GM., Marshall, JC. & Purvis, A. 2004. The impact of species concept on biodiversity studies. Q Rev Biol. 79:161-179.

Mayr, E. 1996. What is a species, and what is not? Philosophy of Science. 63: 262–277.

Zink, R. 2004. The role of subspecies in obscuring avian biological diversity and misleading conservation policy. Proc. R. Soc. Lond. B 271: 561–564.


Monday, 27 October 2008

Biological Species Concept

Morales-Guerrero, A. M.
Universidad Industrial de Santander
“The BSC can illuminate
only a small fragment
of the Tree of Life” (Agapow et al. 2004).
The species concept involves two different questions:
I. The question of the reality of the phenomena or objects in nature that we want to name with the term “species” (the theoretical problem).
II. The criteria for the identification of species (the practical problem) (Wägele, 2005).

The solution to the practical problem depends on the definition of the term “species” we want to use (Wägele, 2005). Currently, the issue of species “reality” is central to deciding how to approach species delimitation and to understanding the ontology of species (Goldstein & DeSalle, 2000). I share the view that species are spatio-temporally bounded entities (rather than classes defined by some common property) and that species per se are not involved in processes: they are effects, not effectors (Kluge, 1990).

A short definition of the BSC is: “Species are groups of interbreeding natural populations that are reproductively isolated from other such groups” (Mayr, 1996). Under this definition, species are treated as “kinds” (i.e. categories or classes), distinguishable from other species by the criterion of reproductive isolation rather than by overall phenotypic similarity.

The biological species concept is important because it places the taxonomy of natural species within the conceptual scheme of population genetics, but it has been criticized for several reasons. One is the lack of a temporal dimension (Balakrishnan, 2005): under the BSC it is not possible to talk about the age or the origin of a species, so the definition may hold only for populations that coexist in time and space and live in sympatry. For the same reason the BSC cannot accommodate fossil species, whose reproductive potential cannot be measured (Fernández et al., 1995). Other problems are the practical impossibility of ascertaining reproductive isolation between populations in the wild and the inapplicability of the concept to asexual organisms (Balakrishnan, 2005), so the definition leaves a vast number of organisms with a nebulous status (Agapow et al., 2004). Finally, there are problems in the structure of the concept itself, because the BSC confuses pattern with process, or isolation with speciation (Fernández et al., 1995).

Criteria for the identification of species, or operational methods (concerned with how a species may be delimited rather than with what it represents), are a necessity (Sites & Crandall, 1997). Today the empirical issue of species delimitation is receiving increased attention, and several methods have been proposed for delimiting species within a statistically rigorous framework.

According to Sites & Marshall (2004), methods for species delimitation fall into two groups:
“Nontree-based methods delimit on the basis of gene flow assessments, whereas tree-based methods delimit species as historical lineages”

Templeton (2001) advocated applying his nested clade analysis (NCA) method (Templeton et al., 1995) to the problem of species delimitation. This method is based on tree reconstruction and reproductive isolation, and it was designed for the cohesion species concept. NCA takes into account all of the available information on the geographic and phylogenetic position of haplotypes and statistically tests for their association, and it can be applied at many different levels within a clade to determine whether a speciation event can be inferred with significant statistical support given the data available (Templeton, 2001). Although it was developed to test cohesion species, it can also be used to delimit species under the BSC, because it indirectly estimates gene flow within and between hypothesized species. Reproductive isolation may thus be regarded as an informative but not necessary condition for delimiting species boundaries; where it does exist, it is likely to delimit species unambiguously.
References:
-Agapow, PM., Bininda-Emonds, ORP., Crandall, KA., Gittleman, JL., Mace, GM., Marshall, JC. & Purvis, A. 2004. The impact of species concept on biodiversity studies. Q Rev Biol. 79(2):161-179.
-Balakrishnan, R. 2005. Species concepts, species boundaries and species identification: A view from the tropics. Systematic Biology. 54: 689-693.
-Fernández, F., Hoyos, J.M. & D.R. Miranda. 1995. Especie: Es o Son? Número especial Evolución. Innovación y Ciencia. Colombia. 4(1):32-37.
-Goldstein, P. & DeSalle, R. 2000. Phylogenetic species, nested hierarchies, and character fixation. Cladistics. 16: 364-384.
-Kluge, AG. 1990. Species as historical individuals. Biology and Philosophy. 5(4): 417-431.
-Mayr, E. 1996. What is a species, and what is not? Philosophy of Science. 63: 262–277.
-Sites, JW. Jr & Crandall, KA. 1997. Testing species boundaries in biodiversity studies. Cons. Biol. 11: 1289–1297.
-Sites, JW. Jr & Marshall, JC. 2004. Operational criteria for delimiting species. Annu. Rev. Ecol. Evol. Syst. 35: 199–227.
-Templeton, AR. 2001. Using phylogeographic analyses of gene trees to test species status and boundaries. Mol. Ecol. 10: 779–791.
-Templeton, AR., Routman, E. & Phillips, CA. 1995. Separating population structure from population history: a cladistic analysis of the geographical distribution of mitochondrial DNA haplotypes in the tiger salamander, Ambystoma tigrinum. Genetics. 140: 767-782.
-Wägele, J.-W. 2005. Foundations of Phylogenetic Systematics. Verlag Dr. Friedrich Pfeil. München.

Sunday, 24 August 2008

Biogeography and polytomies: a fit based approach

Most, if not all, implementations currently used in historical biogeography cannot deal with polytomous trees (e.g., Page, 1993; Ronquist, 1996; 2001). Ronquist (2001; 2002) suggested a “solution” to this problem: weighting all, or some, of the dichotomous resolutions of a polytomy according to the “confidence” one may have in each resolution. In my opinion, this approach suffers from one major dilemma: the same phylogeny simultaneously accepts and rejects different hypotheses, because the dichotomous resolutions contradict each other. My goal in this study is therefore to suggest a different technique for dealing with polytomies, based on the fit of each resolution to the general pattern.

Methods

Two controlled data sets were used, with six and seven areas respectively, each comprising four dichotomous trees and three trees with a trichotomy. A real data set with seven areas, four dichotomous phylogenies and three phylogenies each with a trichotomy was also used. Heuristic searches were performed in TREEFITTER (Ronquist, 2002) (hold=1000, neighbourhood=20). For each data set, the first search was done weighting the polytomy resolutions. Subsequent searches proceeded as follows: each resolution of a given polytomy was searched separately, without weighting. The resolution with the best fit in the general reconstruction was the only one retained and added to the data set; the resolutions of the next phylogeny were then evaluated with the same procedure. Three orders of entrance were evaluated.
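The sequential procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fit` is a hypothetical function standing in for a full TREEFITTER search (it is not part of TREEFITTER), trees can be any objects, and a lower score is assumed to mean a better fit, which is consistent with the values reported in this post.

```python
def select_resolutions(resolved_trees, polytomies, fit, order):
    """Greedy, fit-based choice of one resolution per polytomous phylogeny.

    resolved_trees: list of fully dichotomous trees (any representation).
    polytomies: dict mapping a phylogeny name to its candidate resolutions.
    fit: hypothetical callable returning the score of a list of trees
         (assumption: lower is better).
    order: sequence of phylogeny names, i.e. the order of entrance.
    """
    dataset = list(resolved_trees)
    chosen = {}
    for name in order:
        # Score each resolution on its own, together with the trees held so far.
        best = min(polytomies[name], key=lambda res: fit(dataset + [res]))
        chosen[name] = best
        dataset.append(best)  # only the best-fitting resolution is retained
    return chosen, fit(dataset)


def try_orders(resolved_trees, polytomies, fit, orders):
    """Repeat the selection under several orders of entrance."""
    return {tuple(order): select_resolutions(resolved_trees, polytomies, fit, order)
            for order in orders}
```

Calling `try_orders` with several permutations of the phylogeny names makes it easy to see whether the order of entrance changes which resolutions are retained.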

Additionally, two polytomous phylogenies from the real data set were analyzed separately. Here, searches in which two of the three resolutions of a trichotomy were weighted were compared with searches using the remaining resolution.
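To make the "three resolutions of a trichotomy" concrete, the following Python sketch enumerates every dichotomous resolution of a polytomy. The representation (nested 2-tuples of string labels) and the function names are my own assumptions for illustration; nothing here comes from TREEFITTER.

```python
def insert_everywhere(tree, leaf):
    """Yield every tree obtained by attaching `leaf` as sister to `tree`
    or to any subtree of `tree`. Internal nodes are 2-tuples, leaves are strings."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def resolutions(children):
    """Return all rooted dichotomous resolutions of a polytomy whose
    branches are the labels in `children`."""
    trees = [children[0]]
    for leaf in children[1:]:
        trees = [t for tree in trees for t in insert_everywhere(tree, leaf)]
    return trees


# A trichotomy has three resolutions; a four-way polytomy already has fifteen.
for tree in resolutions(["a", "b", "c"]):
    print(tree)
```

The number of resolutions grows as (2n-3)!! for a polytomy with n branches, so evaluating each resolution separately is cheap for trichotomies but quickly becomes expensive for larger polytomies.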


Results and discussion

With the first data set, all searches recovered the same topology; this is a by-product of the high congruence among the resolved clades used. With the second data set, the weighted search recovered two topologies, whereas the other method found one of them under one order of entrance, with a fit of 21.41, and the other under a different order of entrance, with a fit of 22.39. This result suggests not only that polytomies in biogeographic analysis are better handled without weighting, but also that the order of entrance in TREEFITTER under this treatment can be critical and has to be randomized several times.

With the real data set, the weighted and the randomized searches obtained the same topology, probably because of the nature of the data. The results of the last two explorations (see Table 1) were not conclusive. With the first phylogeny (hereafter A; the second is B), the best-fitting topology was obtained with resolution 1; however, in the searches with weighted pairs that included resolution 1, only two nodes of that reconstruction were recovered, whereas when the other two resolutions were used the same topology was obtained, although with a worse score. With phylogeny B, both the best fit and the structure of the reconstruction were due to resolution 1, which could indicate that when one resolution is highly congruent with the rest of the data, it is the one that confers the structure.


Table 1. Comparison between searches using weighted pairs of trichotomy resolutions and searches using the remaining single resolution. With phylogeny A, the best fits are obtained where resolution 1 is present, but the topology recovered with resolution 1 is also recovered with the pair 2-3. With phylogeny B, the structure of the reconstruction is the same in every case where resolution 1 is present, which is why only two nodes are shared in all comparisons.

Phylogeny   Weighted pairs   Fit     Single un-weighted   Fit     Shared nodes
A           1-2              23.51   3                    24.20   3
A           2-3              23.83   1                    23.20   6
A           1-3              23.18   2                    24.20   3
B           1-2              23.19   3                    25.19   2
B           2-3              23.82   1                    23.24   2
B           1-3              22.86   2                    25.19   2

Although these explorations do not allow certainty about the causes of the results, one thing must be stressed: since parsimony is about finding the best fit of the available characters, polytomies should be resolved not by assigning false confidence grades to the resolutions but according to their fit with the rest of the data.


Bibliography

  1. Page, R. D. M. 1993. Component 2.0.

  2. Ronquist, F. 1996. DiVa.

  3. Ronquist, F. 2001. TreeFitter 1.3b.

  4. Ronquist, F. 2002. Parsimony analysis of coevolving species associations. In: Cospeciation (R. D. M. Page, Ed.). University of Chicago Press, Chicago.

Saturday, 23 August 2008

Endemism vs Richness: an example

Introduction

Biodiversity hotspots have a prominent role in conservation biology (Myers et al., 2000), but it remains controversial to what extent different types of hotspot are congruent (Bonn et al., 2002). Several authors state that richness (number of species per area) is equivalent to endemism (Thomas & Mallorie, 1985; Soria-Auza & Kessler, 2008). However, Orme et al. (2005) disagree: the richest areas are not congruent with centers of endemism. The simplest way to estimate richness in an area is to count the species it contains. Several approaches and methodologies have been proposed to estimate richness, among them those of Chao (1984; 1987), Burnham & Overton (1978; 1979), Heltshe & Forrester (1983), Smith & van Belle (1984), and Raaijmakers (1987) (see the DIVA-GIS manual). An area of endemism is an area of nonrandom distributional congruence among taxa (Platnick, 1991). Several authors have developed techniques to identify areas of endemism. N.D.M. implements an optimality criterion based on the presence or absence of species in the cells of a grid within an area (the number of species that compose the area, that is, species found nowhere else).
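As a minimal illustration of richness as "number of species per area", the sketch below assigns georeferenced records to square grid cells and counts the distinct species in each cell. It is not how DIVA-GIS is implemented; the record format `(species, longitude, latitude)`, the function names and the example coordinates are assumptions made for the example.

```python
import math
from collections import defaultdict


def cell_of(lon, lat, size):
    """Return the (column, row) index of the square grid cell containing a point."""
    return (math.floor(lon / size), math.floor(lat / size))


def richness_per_cell(records, size):
    """Count distinct species per grid cell.

    records: iterable of (species, longitude, latitude) tuples (hypothetical format).
    size: cell side in degrees, e.g. 1.0 or 0.5.
    """
    species_in_cell = defaultdict(set)
    for species, lon, lat in records:
        species_in_cell[cell_of(lon, lat, size)].add(species)
    return {cell: len(spp) for cell, spp in species_in_cell.items()}


# Made-up records, for illustration only.
records = [("sp1", -73.2, 7.1), ("sp2", -73.4, 7.3),
           ("sp1", -73.3, 7.2), ("sp3", -72.6, 7.1)]
print(richness_per_cell(records, 1.0))
```

Running the same records with a smaller `size` shows directly how the grid resolution changes which cells appear rich.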

Methodology

Georeferenced records were taken from the endemism analyses of Bolívar and Miranda-Esquivel (2009). The data were organized in DIVA-GIS v. 5.4, which was also used for a richness analysis (number of species per area). The Chao 2 richness estimator (Chao, 1987) was used to estimate the number of species per area. Chao 2 is based on the number of samples for an area; to create samples, DIVA-GIS divides each grid cell into 4 or 9 sub-areas. The grid sizes used in the richness analyses were 1º x 1º and 0.5º x 0.5º.
The endemism analyses were performed in N.D.M. v. 2.5 (Goloboff, 2006), using grid sizes of 0.5º x 0.5º (n=2000; postchk; m=10; M=100) and 0.25º x 0.25º (n=2000; postchk; m=10; M=15). Several searches were conducted in N.D.M. using different parameters to identify changes in the resulting endemic areas. Finally, the endemic areas were compared with the richest areas.
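For readers unfamiliar with Chao 2, here is a small Python sketch of the estimator in its classic incidence-based form: observed species plus Q1²/(2·Q2), where Q1 and Q2 are the numbers of species found in exactly one and exactly two samples. I do not know which variant DIVA-GIS uses internally, so treat this as an assumption; the fallback used when Q2 is zero is the usual bias-corrected form, and the sample data are invented.

```python
from collections import Counter


def chao2(samples):
    """Classic Chao 2 richness estimate from incidence data.

    samples: list of sets of species, one set per sample
             (for instance, one per sub-area of a grid cell).
    """
    incidence = Counter(sp for sample in samples for sp in set(sample))
    s_obs = len(incidence)                                  # observed species
    q1 = sum(1 for n in incidence.values() if n == 1)       # uniques
    q2 = sum(1 for n in incidence.values() if n == 2)       # duplicates
    if q2 > 0:
        return s_obs + q1 * q1 / (2 * q2)
    # Bias-corrected form, used here only to avoid dividing by zero.
    return s_obs + q1 * (q1 - 1) / 2


# Four hypothetical sub-areas of one grid cell.
samples = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"a", "e"}]
print(chao2(samples))
```

In this example five species are observed, three of them in a single sample and one in exactly two, so the estimate is well above the observed count.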

Results

The south-western zone is the richest part of my study area (see Fig. 1). Medium richness levels are found in the western and central regions. These zones hold 90% of the species (more than 3400 species). In the two richness analyses (1º x 1º and 0.5º x 0.5º) the species-rich areas are partially different. However, the general pattern is similar, showing significant richness levels in the south-western region.

The analysis using a grid size of 0.5º x 0.5º generated 53 endemic areas (Fig. 2). Five of them show a maximum Endemicity Index (EI ≥ 100). Forty-three endemic areas are spread across the south-western region of the study area, and other endemic areas are found along the western and central regions. The second analysis shows identical endemic areas (Fig. 3). The south-western region is the most endemic one, and the endemic areas with lower EI than those in the south-western region are found in the western and central regions.

The endemism and richness patterns are similar when a resolution of 0.5º x 0.5º is compared with a 1º x 1º grid. The richest areas are congruent with the most endemic areas (EI ≥ 100; Chao 2 ≥ 3400). Furthermore, endemic areas with medium EI (40.0 to 70.0) are located in the same regions as areas with medium richness (Chao 2 from 1700 to 3400).







In the analyses comparing 0.25º x 0.25º with 0.5º x 0.5º, the endemism and richness patterns show some incongruence. Although the general endemic and rich areas are recovered, the most endemic areas are not identified as the richest ones (Fig. 2). Likewise, some endemic areas are not estimated as rich areas (see Appendix 1).








Discussion

In my results, the fit between centers of endemism and richness is conditioned by the resolution of the analysis. The hierarchy resulting from the optimality criterion (Szumik & Goloboff, 2004; 2007) can be considered equivalent to the degree of richness estimated with Chao 2. Changing the N.D.M. parameters does not affect the similarity between the analyses. Richness is therefore equivalent to endemism, but the similarity depends on the resolution of the analysis.

My results, which support a rather weak relationship between richness and endemism indices, agree with the recent observation that patterns of avian species richness are determined by the distribution of widespread species rather than restricted-range species (Lennon et al., 2004). In birds, endemic species richness is thought to be a product either of refugia from past extinctions or of high rates of ecological and allopatric speciation.

This incongruence across resolutions has important implications for understanding the ecological and evolutionary mechanisms that underlie the origin and maintenance of biodiversity (Orme et al., 2005).

Additionally, the lack of congruence among approaches has implications for the use of areas or hotspots in conservation. If congruence among hotspot types is high, then it may not matter which index of diversity is used to guide conservation policy, because any such index could act as an effective surrogate for other aspects of diversity (Orme et al., 2005).

Bibliography

Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B. & Kent, J. Biodiversity hotspots for conservation priorities. Nature 403, 853-858 (2000).
Bonn, A., Rodrigues, A. S. L. & Gaston, K. J. Threatened and endemic species: are they good indicators of patterns of biodiversity on a national scale? Ecol. Lett. 5, 733-741 (2002).
Thomas, C. D. & Mallorie, H. C. Rarity, species richness and conservation: butterflies of the Atlas Mountains in Morocco. Biol. Conserv. 33, 95-117 (1985).
Berg, A. & Tjernberg, M. Common and rare Swedish vertebrates – distribution and habitat preferences. Biodivers. Conserv. 5, 101-128 (1996).
Jetz, W., Rahbek, C. & Colwell, R. K. The coincidence of rarity and richness and the potential signature of history in centres of endemism. Ecol. Lett. 7, 1180-1191 (2004).
Lennon, J. J., Koleff, P., Greenwood, J. J. D. & Gaston, K. J. Contribution of rarity and commonness to patterns of species richness. Ecol. Lett. 7, 81-87 (2004).
Jetz, W. & Rahbek, C. Geographic range size and determinants of avian species richness. Science 297, 1548-1551 (2002).
Orme, C. D. L., Davies, R. G., Burgess, M., Eigenbrod, F., Pickup, N., Olson, V. A., Webster, A. J., Ding, T., Rasmussen, P. C., Ridgely, R. S., Stattersfield, A. J., Bennett, P. M., Blackburn, T. M., Gaston, K. J. & Owens, I. P. F. Global hotspots of species richness are not congruent with endemism or threat. Nature 436, 1016-1019 (2005).
Soria-Auza, R. W. & Kessler, M. The influence of sampling intensity on the perception of the spatial distribution of tropical diversity and endemism: a case study of ferns from Bolivia. Diversity and Distributions 14, 123–130 (2008).
Szumik, C. A., Cuezzo, F., Goloboff, P. A. & Chalup, A. E. An optimality criterion to determine areas of endemism. Systematic Biology 51, 806-816 (2002).
Szumik, C. A. & Goloboff, P. A. Areas of endemism: an improved optimality criterion. Systematic Biology 53, 968-977 (2004).

Thursday, 21 August 2008

Inferring the Geographic Range Evolution



Introduction
Some methods in biogeography are based on the assumption that there is a single branching pattern among areas caused by vicariance, and that this pattern is common to many different groups of organisms (Nelson, 1974; Rosen, 1976; Nelson and Platnick, 1981). Other approaches aim at reconstructing the distribution history of individual groups (taxon biogeography) and at searching for general area relationships (area biogeography); the latter use character optimization methods, which allow ancestral distributions to be reconstructed without constraining area relationships to hierarchical patterns (Bremer, 1992; Ronquist, 1994). Among these methods are dispersal-vicariance analysis (Ronquist, 1997), which uses a Fitch optimization, and the dispersal-extinction-cladogenesis (DEC) model (Ree and Smith, 2008), which implements a maximum likelihood optimization. The objective of the present study was to reconstruct ancestral distributions using both approaches and to contrast the findings.

Methods
Previously published sequence data for the avian genera Pipilo and Toxostoma (Zink et al., 1998; 1999) were used. For Pipilo sp., the mitochondrial control region and the cytochrome b and NADH dehydrogenase subunit 2 genes were considered. For Toxostoma, only the mitochondrial control region and the cytochrome b gene were included. Each gene for each taxon was analyzed separately. The sequences were aligned with MUSCLE (Edgar et al., 2004) using the default parameters. The best-fit model of nucleotide substitution was determined using a hierarchical likelihood ratio test (Posada and Crandall, 2001) as implemented in Modeltest (Posada and Crandall, 1998). Maximum likelihood (ML) analyses were done in PhyML (Guindon and Gascuel, 2003). The distributions of the taxa and their ancestral areas were described in terms of the areas proposed by Zink et al. (2000) with minor modifications (Figure 1): California plus Baja California (area A), Sonoran desert (area B), Chihuahuan desert plus Sinaloan shrubland (area C), and the highlands of southern Mexico (area D). To reconstruct the ancestral distributions for the areagrams, a dispersal-vicariance optimization (Ronquist, 1997) was undertaken in DIVA (Ronquist, 1996) and a dispersal-extinction-cladogenesis (DEC) model was fitted in Lagrange (Ree and Smith, 2008).
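The likelihood ratio test behind the model choice can be illustrated with a short Python sketch. It covers only the simplest case, two nested models that differ by one free parameter (for example a model with and without the Γ shape parameter), and it uses the plain chi-square approximation with one degree of freedom. It is not Modeltest's implementation, and the log-likelihood values are invented.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Likelihood ratio test for two nested models differing by ONE parameter.

    lnl_null, lnl_alt: log-likelihoods of the simpler and the richer model.
    Returns (test statistic, approximate p-value). The p-value is the upper
    tail of a chi-square distribution with 1 degree of freedom, which equals
    erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(-1000.0, -990.0)
print(stat, p)
```

A hierarchical test simply repeats this comparison along a fixed sequence of increasingly complex models, keeping the richer model whenever the p-value falls below the chosen threshold.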

Results and Discussion
The final data sets for each gene included six taxa for Pipilo sp. and seven taxa for Toxostoma sp. (GenBank accession numbers available upon request). The lengths of the alignments obtained with MUSCLE (Edgar et al., 2004) are presented in Figure 2. The Hasegawa-Kishino-Yano model with Γ-distributed rates (HKY + Γ) (Hasegawa et al., 1985) was the best fit for each data set, with a shape parameter α of 0.3 for the mitochondrial control region of Pipilo sp. and 0.02 for the remaining data sets. The ML phylogenetic trees are shown in Figure 3. The same relationships were found with each gene for each genus, although branch lengths differed among genes. Only one areagram resulted for each genus, as shown in Figure 4. Under the DEC model, the most likely ancestral areas for Pipilo sp. and Toxostoma sp. were area B and area D respectively, with other areas having lower likelihoods for each genus (−ln(L) values available upon request) (Figure 5). The dispersal-vicariance optimization gave the union of areas B and D (BD) as the ancestral area for Pipilo sp., and the combinations AD, BD and CD for Toxostoma. Because in the DEC model widespread ranges are the direct outcome of dispersal events, some optimizations (see Figure 5 for Pipilo sp., NADH gene) are the outcome of dispersal and extinction alone. In all scenarios, the number of biogeographical events required to explain the present distribution is lower in the DIVA reconstruction than in any of the DEC reconstructions, because DIVA explains some cladogenesis events as vicariance from a widespread ancestor rather than as dispersals followed by extinctions in the original area. Like Fitch optimization, DIVA minimizes dispersal and extinction, and it is based on allopatric speciation (vicariance) rather than on sympatric speciation.
On the other hand, the DEC model assumes that if an ancestor is widespread, speciation arises either between a single area and the rest of the range (allopatric speciation) or within a single area (sympatric speciation) (Ree et al., 2005). Nonetheless, the results presented here show that the DEC model preferred the former, with one daughter species always inheriting a single-area range and the other inheriting the remainder. To examine how branch lengths affect the dispersal and extinction outcome, all branch lengths in the phylogenetic tree for the NADH gene of Pipilo were set to 1.000 and to 0.001 in two separate analyses. The results showed an inverse relation between branch lengths and the dispersal/extinction rates: a long branch indicated that the taxon had less chance of dispersing and going extinct. Hence, under the DEC model we have to assume that the rate of evolutionary change is equal throughout the tree and, furthermore, that such change can be related to the potential of a taxon to expand or reduce its geographic range. Finally, restricting the root to a single area could be problematic when searching for general area relationships by fitting areagrams to different hypotheses; it may only work for island scenarios, where we can refer to colonization and range expansion to the nearest islands (a purely dispersalist approach). In our analyses, the DEC model suggested different ancestral areas for each genus, whereas DIVA considered both possibilities in each case.
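Since the comparison above leans on the analogy between DIVA and Fitch optimization, a minimal Python sketch of the Fitch down-pass may help. It treats the area as an ordinary unordered character: at each node the state set is the intersection of the daughter sets, or their union (one extra step) when the intersection is empty. This is plain Fitch parsimony, not DIVA's own cost scheme, and the tree and tip areas are hypothetical.

```python
def fitch_downpass(tree, tip_areas):
    """Fitch down-pass on a rooted binary tree.

    tree: nested 2-tuples of tip names, e.g. (("t1", "t2"), "t3").
    tip_areas: dict mapping each tip name to a set of areas.
    Returns (state set at the root of `tree`, number of steps).
    """
    if not isinstance(tree, tuple):
        return set(tip_areas[tree]), 0
    left_set, left_steps = fitch_downpass(tree[0], tip_areas)
    right_set, right_steps = fitch_downpass(tree[1], tip_areas)
    shared = left_set & right_set
    if shared:
        # Daughters agree on at least one area: no extra step.
        return shared, left_steps + right_steps
    # No shared area: keep the union and count one step.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical taxa and areas, for illustration only.
tree = (("t1", "t2"), ("t3", "t4"))
areas = {"t1": {"B"}, "t2": {"D"}, "t3": {"B"}, "t4": {"A"}}
print(fitch_downpass(tree, areas))
```

Note that the union sets kept at internal nodes here only express ambiguity among single areas, whereas DIVA interprets a multi-area optimization as a genuinely widespread ancestor; that difference is exactly where the two readings of the same tree diverge.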





Figure 1. General distribution of the areas: California plus Baja California (area A), Sonoran desert (area B), Chihuahuan desert plus Sinaloan shrubland (area C), and the highlands of southern Mexico (area D).



Figure 2. Length of the alignments obtained for the genera Pipilo and Toxostoma.



Figure 3. Maximum likelihood trees for the genera Pipilo and Toxostoma using different genes.



Figure 4. Areagrams for the genera Pipilo and Toxostoma.



Figure 5. Optimal ancestral distribution for the genera Pipilo and Toxostoma using dispersal-vicariance optimization and the Dispersal-extinction-cladogenesis (DEC) model approach with different genes.

Thursday, 31 July 2008

"Pattern-based" methods

Ronquist and Nylin (1990) introduced the term “pattern-based” methods for the plethora of techniques in cladistic biogeography that aim to find general patterns of area relationships without taking into account the evolutionary processes that shaped those patterns. With the exception of Brooks and his collaborators, “pattern advocates” regard the main point of biogeographic analysis as what causes congruence among taxon-area cladograms, not what causes ambiguity (Wiley, 1987; Nelson and Platnick, 1981; Ebach, 2001). Only when a pattern is found, then, can the investigator discuss the vicariant events that may have taken place between the areas. In a certain way this is preferable to assuming that incongruence is due to some biogeographic event other than vicariance. This cloudy approach stems from a misunderstanding of the relationship between phylogenetic analysis and biogeography: anyone who wants to introduce transversal transmission needs a method able to test it as a hypothesis (see Sober, 1998). Other methods, such as paralogy-free subtree analysis (Nelson & Ladiges, 1996), dismiss evidence when they remove paralogy, in order to fit the data to a preconceived idea of how biogeographic history behaves. This is a problem because it removes other possible vicariant scenarios. Although none of the so-called “pattern-based” methods allows a clear evaluation of biogeographical history, it is preferable to assume vicariance as the only plausible explanation rather than to create sentimental narratives based on the incongruence.

Monday, 28 July 2008

Historical Biogeography and pattern

The study of historical biogeography is divided into two approaches: pattern and process. The objective of pattern approaches is to elucidate the hierarchy (relationships) among areas (biotas, areas of endemism).

Several methodologies have been proposed to identify these relationships: Brooks Parsimony Analysis (BPA; Brooks, 1981, 1988), Component Analysis (Nelson & Platnick, 1978, 1981), Component Compatibility Analysis (CCA; Zandee & Roos, 1987), and Paralogy-Free Subtrees (TASS; Nelson & Ladiges, 1996), among others. However, the efficiency of these methods is considered ambiguous in some respects.

The ideal pattern approach in historical biogeography must include all available phylogenetic and distributional data. Methods such as Component Analysis (Nelson & Platnick, 1978, 1981) and TASS (Nelson & Ladiges, 1996) therefore fall short of this ideal: eliminating incongruent data is not desirable, because the loss reduces the reliability of the results.

The analysis of nodes in area cladograms is also a point of debate in historical biogeography. Comparing internal nodes within an area cladogram is not necessary; area cladograms must be compared with each other as wholes (area cladogram vs. area cladogram). Comparing internal nodes does not produce relevant information about area relationships.

Likewise, an optimization criterion is necessary to estimate the best area topology (hierarchy). Another point of debate is the inclusion of event-based methods in the search for area relationships. Area topologies are graphic representations of area relationships; they are not inferences about the evolutionary processes in the areas.

Saturday, 14 June 2008

Historical Biogeography

Historical biogeography is the study of the historical relationships among biotas, using their geological history and the distributional and phylogenetic (historical) characteristics of the organisms that compose them. This conceptual approach is multidisciplinary, because biotas are subject to several factors that affect their structure and characteristics. Several authors have defined historical biogeography (e.g. de Candolle, 1820; Croizat, 1964; Nelson & Platnick, 1981). However, these definitions were constructed from different and limited views of the factors that affect the evolution of biotas. An approach based only on the phylogenetic relationships of taxa within the study areas (e.g. Morrone & Crisci, 1995) is unreliable because geological processes, an inherent factor in the evolution of life, are not considered. On the other hand, definitions based on tectonic and geological processes ignore the historical relationships among taxa.
My approach to historical biogeography is based on the multidisciplinary study of historical relationships among areas using all the available evidence (geological, distributional, and phylogenetic). Thus, I agree with Andersson (1996) that "the task of Historical biogeography is to reveal and explain the history of biotas and their historical connections. These historical relationships among biotas are defined as the sharing of descendants of the same ancestor", together with other factors.

Monday, 26 May 2008

On Historical Biogeography

Historical biogeography is the study of the distribution of biodiversity over space and long periods of time. Its task is to reveal and explain the history of biotas defined as ancestor-descendant relationships. The biological importance of finding area relationships is to answer evolutionary questions, for instance, to distinguish modes of speciation (Wiley & Mayden, 1985) and reconstruct patterns of dispersal (Fritsch et al., 2001). In addition, phylogenetic studies can address the biogeographical implications of their results by relating their findings to hypotheses of geological connections among areas (Wiens & Donoghue, 2004).

In the study of the distribution of biodiversity, areas of endemism are the entities compared (Linder, 2001); they are defined by the congruent distribution of at least two species (Platnick, 1991). This congruence does not demand complete agreement of those limits at all possible scales of mapping, but does require relatively extensive sympatry (Morrone & Crisci, 1995). In the same way, methods for the identification of areas of endemism have been developed that implement an optimality criterion based directly on the aspects of species distribution that are relevant to endemism (Szumik et al., 2002; Szumik & Goloboff, 2004).

Among the multiple approaches, Parsimony Analysis of Endemicity [PAE] (Rosen, 1988) is a tool of historical biogeography that allows the discovery of patterns of organism distribution using biota similarity; its main concern, however, is to classify areas according to the occurrence of taxa. Furthermore, it makes no assumptions about processes, in a subject where processes such as vicariance and extinction need to be taken into account. Vicariance refers to the geographical separation and isolation of a subpopulation, resulting in the differentiation of the original population into a new variety or species, while extinction describes the permanent disappearance of a species from a local population. Dispersal is not rare either: here the range of the ancestral population was limited by a pre-existing barrier, which was crossed by some of its members (Crisci, 2001). On the other hand, Panbiogeography (Croizat, 1958), although it uses distributional data like the previous approach, follows a methodology whose inferences allow us to identify ancestral biotas and explain distributions by tectonic and/or climate change. This method does assume the possibility of dispersal, vicariance and extinction, and its main concern is the history of biotas.

With the above in mind, one can easily imagine the factors that influence the evolution of species. While researchers study the distributions of taxa in relation to their physical environment, historical biogeography attempts to reconstruct the origin, dispersal and extinction of taxa and biotas.
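The classification of areas by the occurrence of taxa that PAE relies on starts from a presence/absence matrix. The following is only a minimal sketch of that first step, not Rosen's procedure in full: the area and species names are hypothetical, and the all-absent "ROOT" row is a common rooting convention assumed here rather than something stated in the text. The parsimony search itself would be run in a dedicated program.

```python
def build_pae_matrix(occurrences):
    """Return (taxa, rows), where rows maps each area to a 0/1 list.

    occurrences: dict mapping area name -> set of taxa recorded there.
    A hypothetical all-absent area called "ROOT" is added so that the
    resulting area cladogram can be rooted (assumption of this sketch).
    """
    taxa = sorted(set().union(*occurrences.values()))
    rows = {"ROOT": [0] * len(taxa)}
    for area in sorted(occurrences):
        present = occurrences[area]
        rows[area] = [1 if taxon in present else 0 for taxon in taxa]
    return taxa, rows


def informative_taxa(taxa, rows):
    """Taxa present in at least two real areas but not in all of them."""
    real_areas = [area for area in rows if area != "ROOT"]
    keep = []
    for i, taxon in enumerate(taxa):
        count = sum(rows[area][i] for area in real_areas)
        if 2 <= count < len(real_areas):
            keep.append(taxon)
    return keep


# Hypothetical occurrence data for three areas.
occurrences = {
    "A": {"sp1", "sp2"},
    "B": {"sp1", "sp2", "sp3"},
    "C": {"sp3", "sp4"},
}
taxa, rows = build_pae_matrix(occurrences)
shared = informative_taxa(taxa, rows)
```

In this toy matrix sp4 occurs in a single area, so it cannot group areas and is left out of `shared`.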

Thursday, 13 March 2008

Physically linked genes and “dubium - signatum clade” (Enallagma – Zygoptera) relationships

Enallagma is a genus of damselflies (Zygoptera: Coenagrionidae) with a worldwide distribution. It has been recognized by shared plesiomorphies and by the absence of the characters that distinguish related genera (May, 1997), so species relationships have been difficult to discern. Even so, the “dubium - signatum clade” is a stable and well-supported group across different analyses using different kinds of evidence (Brown et al., 2000; May, 2000). Previous analyses give the relationships within the group as: ((pollutum,(dubium,signatum)),(sulcatum,vesperum)).

The objective of this work is to evaluate how analysing physically linked genes separately or in combination affects the phylogenetic reconstruction of the “dubium - signatum clade” (Enallagma – Zygoptera) under parsimony and maximum likelihood inference.

For this purpose, sequences of a mitochondrial region containing the COI, tRNA and COII genes were taken from GenBank (AF064995, AF064992, AF065038, AF065034, AF065033, AF065028, AF65013), and two morphological data matrices were checked (Brown et al., 2000; May, 2002). Shared characters were included only once. Some characters were split, either because they referred to two characters (i.e. the presence of a structure and its form; see May, 2002, character 19) or because their states combined different independent characters recognisable by topological correspondence (Rieppel & Kearney, 2002) (see Brown et al., 2000, characters 1, 20 and 22). Recoding the characters did not affect the relationships among the species of the group suggested in previous analyses. The data were analysed by partition and in different combinations, in order to evaluate the influence on the groups and on the evolutionary model under the parsimony and maximum likelihood (ML) criteria. The parsimony analysis was done with NONA (Goloboff, 1999), and the characters were mapped with WinClada (Nixon, 2002). The evolutionary models were selected using the hLRT as implemented in Modeltest (Posada & Crandall, 2001), and the ML searches were done with PAUP* (Swofford, 2002).

To check the performance of different models of evolution on the same sequence with a known true topology as reference, data matrices of different lengths were generated under different models using the software Seq-Gen version 1.3.2 (Rambaut, 2007). These matrices were analysed with parsimony and ML as in the previous part.
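The hierarchical likelihood ratio test used above for model selection compares pairs of nested models through their log-likelihoods. The sketch below shows a single such comparison for the case where the more complex model adds one free parameter (for example JC versus K80). It is not Modeltest's code: the function names, the log-likelihood values and the significance level are assumptions made for illustration.

```python
import math


def lrt_statistic(lnl_simple, lnl_complex):
    """Likelihood ratio statistic: 2 * (lnL_complex - lnL_simple)."""
    return 2.0 * (lnl_complex - lnl_simple)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def prefer_complex(lnl_simple, lnl_complex, alpha=0.01):
    """True when the extra parameter improves the fit significantly.

    alpha is an assumed significance level, not a value from the text.
    """
    return chi2_sf_1df(lrt_statistic(lnl_simple, lnl_complex)) < alpha


# Hypothetical log-likelihoods for a simple and a one-parameter-richer model.
clear_gain = prefer_complex(-1250.0, -1240.0)
small_gain = prefer_complex(-1250.0, -1249.5)
```

A hierarchical test repeats this comparison along a fixed sequence of nested models, keeping the simpler model whenever the improvement is not significant. Comparisons that add more than one parameter need a chi-square distribution with the corresponding degrees of freedom.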
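Several of the models used in these simulations (JC, K2P/K80, F81, HKY) are special cases of one rate matrix, which may help when reading the tables. The sketch below builds a normalised HKY rate matrix in plain Python; it is an illustration only and not the code of Seq-Gen or Modeltest. The base order A, C, G, T, the fourth frequency obtained by subtraction, and the value of kappa are assumptions.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(freqs, kappa):
    """Return a normalised 4x4 HKY rate matrix as nested lists.

    freqs: equilibrium base frequencies in the order A, C, G, T (sum to 1).
    kappa: transition/transversion rate ratio.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                rate = kappa if (x, y) in TRANSITIONS else 1.0
                q[i][j] = rate * freqs[j]
        # The diagonal makes every row sum to zero.
        q[i][i] = -sum(q[i])
    # Scale so that the expected substitution rate at equilibrium is 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[value / mean_rate for value in row] for row in q]


# Assumed reading of three reported frequencies as A, C, G; T by subtraction.
freqs = [0.4322, 0.2264, 0.0721, 0.2693]
q_hky = hky_rate_matrix(freqs, kappa=2.0)   # kappa is hypothetical
q_jc = hky_rate_matrix([0.25] * 4, kappa=1.0)  # reduces to Jukes-Cantor
```

Setting kappa to 1 with unequal frequencies gives F81, and equal frequencies with a free kappa give K80, which is why the hierarchical test can move between these models one parameter set at a time.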



Results

Real data

| Gene | Model | Base rate | Alpha (gamma distribution) |
|---|---|---|---|
| COI | HKY + G | 0.4322 0.2264 0.0721 | 0.1220 |
| COII | TrN + G | 0.3729 0.1737 0.1521 | 0.1582 |
| tRNA | K80 | equal | |
| COI-tRNA | HKY + G | 0.3856 0.2315 0.1138 | 0.0930 |
| tRNA-COII | TrN + G | 0.3652 0.1783 0.1584 | 0.1553 |
| COI-tRNA-COII | TrN + G | 0.3762 0.1846 0.1467 | 0.1630 |

Simulations

| Matrix | Length | Simulated model | Calculated model | Nodes recovered by likelihood | Nodes recovered by parsimony |
|---|---|---|---|---|---|
| 1 | 200 | HKY | K2P | all | all |
| 2 | 600 | HKY | HKY | all | all |
| 3 | 600 | F81 | F81 | all | all |
| 1 + 2 | 200 + 600 | | HKY | all | all |
| 1 + 3 | 200 + 600 | | HKY | all | all |
| 4 | 120 | F81 + G (0.0075) | F81 + G + I | none | none |
| 5 | 680 | HKY + G (0.0591) | HKY + G + I | all | all |
| 6 | 60 | JC | F81 | all | all |
| 7 | 60 | JC + I | JC | two | all |
| 4 + 5 | 120 + 680 | | HKY + G + I | all | all |
| 4 + 5 + 6 | 120 + 680 + 60 | | TrN + G + I | all | all |
| 4 + 5 + 7 | 120 + 680 + 60 | | HKY + G + I | all | all |


As other authors have shown (Brown et al., 2002; May, 2002), the group proved to be monophyletic under both phylogenetic inferences (parsimony and ML), with high jackknife support values, which shows that the amount of favorable evidence is greater than that of contradictory evidence (Goloboff et al., 2003). The internal relationships of the group were also the same under both inference methods, but they disagree with the previous hypothesis at two nodes. In the present analysis the clade (pollutum,(dubium,signatum)) was recovered only in the morphological analysis (not shown); instead, dubium appears as the sister group of the rest of the “dubium – signatum clade”, and pollutum and signatum form a group. This result could be due to the complex morphology within the genus Enallagma, and especially to the lack of clear character definitions (i.e. states that could be anything, such as "other colour").


Although the different partitions gave different results, these were not contradictory (except for the morphology, which was congruent with the previous hypothesis), and resolution and support increased with the number of characters (the length of the sequences). This behaviour is not a rule for real data, because the clades recovered by two genes can differ; in this case, where the linkage is physical, it was expected. The support values increased because, in this particular case, adding new information increased the number of synapomorphies while the number of (self-congruent) contradictory characters remained low.


In both real and simulated data the general behaviour of the model selection was the same: the combined data tended to take the most parameter-rich model among their partitions. In the simulated data, when sequences were simulated with among-site variation in substitution rates, the hierarchical test did not recover the simulated model, which is expected because inference can become inconsistent when rates of evolution differ (Yang, 2006). In most cases ML and parsimony recovered the true topology; ML had problems when the simulated model was very slow and other parameters such as invariant sites were involved.


Even though the different portions of the region evolve under different models and at different rates, the groups are not sensitive to this freedom. The “dubium – signatum clade” is a very stable group: partitioning does not compromise it as a unit, and its support and resolution improve consistently as the number of characters increases.




Estimating Branch lengths: A Bayesian approach


Villabona-Arenas, C. J.
Laboratorio de Sistemática y Biogeografía, Escuela de Biología
Universidad Industrial de Santander


Introduction

Bayes' theorem is used in Bayesian inference (BI) to calculate the posterior distribution of a parameter, that is, the conditional distribution of the parameter given the data (Holder and Lewis, 2003). Bayesian Evolutionary Analysis Sampling Trees (BEAST) is a software package for Bayesian MCMC analysis of molecular sequences which, in contrast to other programs, is oriented towards rooted, time-measured phylogenies. RNA viruses have broadly similar substitution rates despite their different genome organizations and biological properties, which implies that both the error rate associated with RNA polymerase and the rate of viral replication are roughly constant (Holmes, 2003); RNA viruses are therefore suitable for the BEAST framework. In this work I explored the change in the branch length estimates obtained with Maximum Likelihood and Bayesian methods using simulated and real data sets.


Methods

Seq-Gen version 1.3.2 (Rambaut, 2007; http://tree.bio.ed.ac.uk/software/seqgen/) was used to simulate aligned sequences (sequence length 1000 nucleotides; Hasegawa-Kishino-Yano 1985 model, HKY85), producing three replicate alignments for each of the three sets of simulation parameters (Figure 1). Table 1 lists the taxa used for the analyses with real data. The data set included 11 published partial nucleoprotein gene sequences of rabies viruses (RABV) isolated in Colombia during 1994-2005 and deposited in GenBank, plus the CTN-181 reference RABV strain as outgroup. The RABV nucleotide sequences were aligned with the MUSCLE 3.6 software package (Edgar, 2005) using default parameters. The alignments were used to reconstruct a Maximum Likelihood (ML) tree with PhyML 2.4.4 (Guindon and Gascuel, 2003) and a Bayesian Inference (BI) tree with BEAST (Drummond and Rambaut, 2007). Bootstrapping with 1000 replicates was used to place confidence values on groupings within the ML tree. For BI two approaches were used: one specifying the sampling dates of the sequences and another without them. The MCMC search was run for 1,000,000 generations, sampling the Markov chain every 1000 generations and using a coalescent tree prior that assumes a constant population size back through time. The first 25% of the trees were discarded as “burn-in” before summarizing the posterior distribution of tree topologies and branch lengths by finding the maximum credibility tree and the mean node height for each clade. Each BI analysis was performed three times.
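The sampling and burn-in settings described above translate into simple arithmetic on the chain. The following is a minimal sketch of that bookkeeping, not BEAST code: the function name is hypothetical, and it assumes that the initial state (generation 0) is also logged.

```python
def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of the MCMC samples and return the rest."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_burn = int(len(samples) * fraction)
    return samples[n_burn:]


chain_length = 1_000_000  # generations, as in the text
sample_every = 1_000      # sampling interval, as in the text

# Generation numbers of the logged states (assumes generation 0 is logged).
sampled_states = list(range(0, chain_length + 1, sample_every))
kept = discard_burn_in(sampled_states, fraction=0.25)
```

Under these assumptions 1001 states are logged and 751 remain after burn-in; summaries such as the mean node height of a clade would be computed over the retained samples only.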

Results

Figure 2 and Table 2 present, respectively, the topologies and the branch lengths obtained with ML and BI for each set of simulations. Maximum likelihood recovered the true topology in all nine simulations, while Bayesian Inference did so in all replicates of cases A and B; in case C, BI recovered the true topology in one of three replicates. Both methods obtained similar branch length values, close to the initial ones, for simulations A and B; ML also recovered the branch lengths for simulation C, but BI did not. The ML tree for RABV is presented in Figure 3; the BI trees are presented in Figures 4 and 5. The three trees show the same groupings. BI recovers the same branch lengths as ML when years are not specified; when points in time are specified, the branch lengths change accordingly.

Discussion


BI presented the rapidly evolving sequences in simulation C as closely related regardless of their true relationships, which suggests that the method can suffer from long-branch attraction. Because MCMC is a stochastic algorithm that produces sample-based estimates of a target distribution, and the BEAST implementation assumes calibrated trees, the method interprets this similarity as a relationship of descent, increasing the probability that both taxa are sampled as sisters.
As for the branch lengths in that simulation, BEAST uses a strict or relaxed molecular clock as its basic model for rates among branches. Because of the strong assumption that the rate of evolutionary change of the sequences is approximately constant over time, the method does not recover the branch lengths well, mainly because it tries to fit the observed differences to a model in which strong rate variation among the branches of the tree is uncommon. On the other hand, BI works well when there is no such variation in rates, as seen in closely related species or within populations.
As Figures 3 and 4 show, when the assumptions are met, BI behaves like ML. When dates are incorporated into the model, they provide a source of information about the overall rate of evolutionary change, which is seen in this case as a change in the estimated branch lengths relative to the analysis where no years were specified. As shown here, there are scenarios where BI does not work well; in general, when working within the coherent framework of the method, BI can be used for evolutionary parameter estimation. Even though ML methods do not allow temporal data to be incorporated, ML recovered the branch lengths and the correct topologies in all the evaluated scenarios, showing that it can accurately describe molecular sequence variation.


Wednesday, 12 March 2008

The phylogeny of Falconidae: morphological or molecular? A view from PBS

Introduction

Phylogenetic analyses are subject to inherent factors related to the nature of the data. Among them is the incongruence found among the resulting topologies when different kinds of data are used in the analysis. Regarding clade quality, two aspects are frequently mentioned: support and stability (Brower, 2006).

A frequently used measure of node support is Bremer support or BS (Bremer, 1994). BS is a statistical parameter of a particular data set, quantified as the extra length needed to lose a branch in the consensus of near-most-parsimonious trees. This approach is based solely on the original data, as opposed to the data permutation used in bootstrap procedures (Bremer, 1994).

There are two ways to calculate BS. The first is to find the most parsimonious tree(s) for a given data set and then examine sets of trees of increasing length (referred to as the ‘‘tree decay’’ method). The second employs anticonstraint trees (Bremer, 1994).

An extension of Bremer's method is Partitioned Branch Support or PBS (Baker & DeSalle, 1997; Baker et al., 1998), which is used when a data set is divided into partitions (morphological-molecular, gene-gene). PBS first estimates the support for each partition and for the combined data, after which it is possible to estimate incongruence between partitions. The overall BS for a given branch is thus the sum of the BS values derived from each of the data partitions for the most parsimonious tree(s).

Falconidae (diurnal raptors) is a family within Falconiformes. The phylogenetic relationships of Falconidae have long been debated because morphological and molecular characters generate different results (Griffiths, 1994, 1997, 1999; Griffiths et al., 2004). The objective of this study is to elucidate the phylogenetic relationships within Falconidae using PBS to estimate the support provided by different data (morphological and molecular).
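The "tree decay" idea above can be illustrated with a small calculation: for each clade of the most parsimonious tree, the support is the length of the shortest examined tree that lacks the clade minus the length of the most parsimonious tree. This is a sketch under simplifying assumptions, not the T.N.T. procedure. Trees are reduced to a length and a set of clades, and the taxa and tree lengths are hypothetical.

```python
def bremer_support(trees):
    """Return {clade: support} for every clade of the shortest tree.

    trees: list of (length, clades) pairs, where clades is a set of
    frozensets of taxon names. The list is assumed to hold the most
    parsimonious tree plus the suboptimal trees that were examined.
    """
    best_length, best_clades = min(trees, key=lambda tree: tree[0])
    support = {}
    for clade in best_clades:
        rivals = [length for length, clades in trees if clade not in clades]
        # None means no examined tree lacks the clade, so support is unknown.
        support[clade] = min(rivals) - best_length if rivals else None
    return support


ab = frozenset("AB")
cd = frozenset("CD")
examined = [
    (100, {ab, cd}),               # most parsimonious tree
    (101, {ab, frozenset("CE")}),  # shortest examined tree lacking C+D
    (103, {frozenset("AE"), cd}),  # shortest examined tree lacking A+B
]
support = bremer_support(examined)
```

Here clade A+B needs three extra steps to be lost and clade C+D only one. The anticonstraint method reaches the same numbers by searching directly for the shortest tree that lacks each clade.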
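The additive relation stated above can be written out for a single node. In this minimal sketch the PBS of a partition is the number of steps it requires on the shortest tree lacking the node minus the number it requires on the most parsimonious tree, following Baker & DeSalle (1997). The partition names and step counts are hypothetical and are not results from this study.

```python
def partitioned_bremer_support(mp_lengths, constrained_lengths):
    """Return ({partition: PBS}, total Bremer support) for one node.

    mp_lengths: steps each partition requires on the most parsimonious tree.
    constrained_lengths: steps each partition requires on the shortest tree
    that lacks the node (found with an anticonstraint search).
    """
    pbs = {
        name: constrained_lengths[name] - mp_lengths[name]
        for name in mp_lengths
    }
    return pbs, sum(pbs.values())


# Hypothetical step counts for one node and two partitions.
mp_lengths = {"morphology": 50, "RAG-1": 900}
constrained_lengths = {"morphology": 48, "RAG-1": 912}
pbs, total_bs = partitioned_bremer_support(mp_lengths, constrained_lengths)
```

A negative value, as for the morphology partition in this toy case, means that the partition is shorter on the tree lacking the node, that is, it conflicts with the node even though the combined data support it.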


Methods

Fifteen species representing Falconidae and two outgroups (Pelecanus onocrotalus and Gampsonyx swainsonii) were chosen. The morphological data were taken from Griffiths (1994, 1999) and the molecular sequences from Griffiths (1999) and Griffiths et al. (2004). The RAG-1 sequences were downloaded from GenBank (AY461396 – AY461410, DQ881819 and EF078725). The sequences were aligned using the software MUSCLE 3.6 (Edgar, 2004). The cost matrices used in the alignments were generated using TTG version 1.0 (Villabona-Arenas, 2008), available from the author.

The phylogenetic analyses were carried out using the software T.N.T. version 1.1 (Goloboff, Farris & Nixon, 2001). Heuristic searches, Bremer Support and Partitioned Bremer Support were calculated following the methodologies of Hovenkamp (2005) and Arias et al. (2007). The Partitioned Bremer Support was calculated using a T.N.T. macro created by Pablo Goloboff (available at http://tux.uis.edu.co/labsist/intro.html). TreeView version 1.6.6 (Page, 1996) was used to view the resulting trees.


Results

The phylogenetic analysis of the morphological data generated 60 most parsimonious trees of length 45 (Fig. 1); the strict consensus tree is shown in Fig. 2. The nodes that define the various morphological species groups are generally supported by low Bremer support values.



For the molecular data (RAG-1) a single tree was found (Fig. 3). The groups in this topology were monophyletic and were supported by BS. The nodes in the molecular topology differed from the results of Griffiths et al. (2004) because different species (outgroups) were used in this analysis. However, the Falco group is recovered in the topology (although F. sparverius falls outside it).


The combined analysis generated one most parsimonious tree (Fig. 4). Here the groups are supported by high Bremer Support values. The same nodes were recovered in the molecular and combined analyses.

There is high incongruence between the two data partitions (Table 1). Using PBS to estimate the support of the partitions indicates that the nodes generated by the morphological data were not supported (the PBS values of the morphological partition were negative at all nodes), so the falconid group collapses completely. On the other hand, the nodes in the molecular topology are highly supported.


Discussion

Measures of support in Phylogenetic Systematics are appropriate for estimating the fit of different kinds of data in a phylogenetic analysis. There are several methods to estimate support, Bremer Support among them. An advantage of BS is that it is a statistical parameter of a particular data set, rather than an estimate based on pseudoreplicated subsamples of the data (like bootstrapping and jackknifing), and thus it does not depend on the data matching a particular assumed distribution (Brower, 2006).

The poor support for the nodes in the morphological analysis shows that the syringeal data do not carry sufficient phylogenetic signal: the number of characters supporting a node is not high relative to the number of characters that contradict it, so each node is supported by only a few characters in the matrix. I disagree with Griffiths (1994), who stated that the syringeal characters “can be used to resolve phylogenetic questions at the generic and family levels of the Falconidae”. Moreover, syringeal morphology is relatively conservative within genera, and there may not be enough variation within speciose genera to resolve relationships (Griffiths, 1994). These characters were therefore not highly informative in this study.

In the molecular and combined analyses the recovered nodes show high BS. The PBS for the molecular partition is also high (1-812). The very high PBS values at the nodes (Milvago chimachima, Polihierax semitorquatus) and (Daptrius americanus, Falco sparverius) are due to the strong phylogenetic signal of the molecular data relative to the morphological data.

PBS has been used extensively in the literature (Baker & DeSalle, 1997; DeSalle & Brower, 1997; Baker et al., 1998; Gatesy et al., 1999). Brower (2006) reviewed the advantages and disadvantages of Bremer Support (BS) and Partitioned Bremer Support (PBS). PBS has some disadvantages, for example when the partitions differ in size. In this study the morphological matrix contains 23 characters and the molecular matrix 2936 characters, so the difference in size between the data sets could influence the results of the analysis. PBS also appears to be sensitive to missing data, and can shift dramatically among partitions as missing data are filled into the matrix. In the morphological analysis Falco vespertinus has no syringeal characters because this taxon was not sampled. However, preliminary runs without F. vespertinus did not change the results.

I agree with Brower (2006) that PBS is an efficient tool for estimating the degree of support in data sets, because it is a more direct and less sophisticated way to document the accumulation of character support for a particular branch in a particular phylogenetic hypothesis. Likewise, the phylogenetic relationships within Falconidae are better supported by molecular data than by morphological data. An interesting next step would be to study more morphological characters (e.g. osteological ones).


Bibliography

Arias, J. S., Garzón, I. J., & Miranda, R. D. (2007) Sistemática Filogenética: Introducción a la práctica. División Editorial y de Publicaciones UIS. Colombia.

Baker, R. H., & DeSalle, R. (1997) Multiple sources of character information and the phylogeny of Hawaiian Drosophila. Systematic Biology, 46, 654–673.

Baker, R. H., Yu, X., & DeSalle, R. (1998) Assessing the relative contribution of molecular and morphological characters in simultaneous analysis trees. Mol. Phyl. Evol., 9, 427-436.

Bremer, K. (1994) Branch support and tree stability. Cladistics, 10, 295-304.

Brower, A. V. Z. (2006) The how and why of branch support and partitioned branch support, with a new index to assess partition incongruence. Cladistics, 22, 378-386.

Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792-1797.

Gatesy, J., O’Grady, P., & Baker, R. H. (1999) Corroboration among data sets in simultaneous analysis: hidden support for phylogenetic relationships among higher level artiodactyl taxa. Cladistics, 15, 271-313.

Griffiths, C. S. (1994) Monophyly of the Falconiformes based on syringeal morphology. Auk, 111, 787-805.

Griffiths, C. S. (1997) Correlation of functional domains and rates of nucleotide substitution in cytochrome b. Mol. Phyl. Evol., 7, 353-365.

Griffiths, C. S. (1999) Phylogeny of the Falconidae inferred from molecular and morphological data. Auk, 116, 116-130.

Griffiths, C. S., Barrowclough, G. F., Groth, J. G. & Mertz, L. (2004) Phylogeny of the Falconidae (Aves): a comparison of the efficacy of morphological, mitochondrial, and nuclear data. Mol. Phyl. Evol., 32, 101-109.

Hovenkamp, P. (2005) Branch Support. (Available at http://www.nationaalherbarium.nl/taskforcemolecular/PDF/branch%20supports.pdf).

Lambkin, C. L., Lee, M. S. Y., Winterton, S. L., & Yeates, D. K. (2002) Partitioned Bremer support and multiple trees. Cladistics, 18, 436-444.

Lee, M. S. Y., & Hugall, A. F. (2003) Partitioned likelihood support and the evaluation of data set conflict. Systematic Biology, 52, 15-22.

Page, R. D. M. (1996) TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences, 12, 357-358.