Monday, 28 March 2016

Optimization of continuous characters in parsimony




Reconstructing node states on a phylogenetic tree is straightforward for discrete characters, because each state is one option among a finite number of possibilities. Reconstructing node states for continuous characters is much harder, because the number of possible states is infinite (Goloboff et al., 2006). For this reason, continuous characters are usually discretized. Farris (1970) proposed treating continuous characters as additive characters, and several algorithms have been implemented to discretize them (Goloboff, 1993; Thiele, 1993). Here I evaluated Goloboff's algorithm and Thiele's algorithm using the Robinson-Foulds metric and the Consistency Index (CI), expecting the level of homoplasy to decrease when continuous characters are added to discrete characters.

To do so, I simulated a tree with 25 tips (Figure 1) using the geiger R library (Harmon et al., 2008). On this tree I simulated three different types of continuous characters, as well as a 500 bp DNA sequence alignment under the JC model using Seq-Gen (Rambaut and Grassly, 1997). I then recovered the tree in TNT (Goloboff et al., 2008) using only continuous characters, only discrete characters, and continuous plus discrete characters, with Goloboff's algorithm and with Thiele's algorithm using different numbers of states (3, 10 and 20). Finally, I estimated the CI (Figure 2, Figure 3, Figure 4) and compared the recovered trees with the original tree using the Robinson-Foulds metric (Steel and Penny, 1993) (Table 1).
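To make the role of the number of states concrete, here is a minimal Python sketch of range coding in the spirit of Thiele's (1993) gap weighting: each value is rescaled between the observed minimum and maximum and rounded to one of n ordered (additive) states. The function name `range_code` and the example values are hypothetical, and this is a simplified illustration, not the code from the scripts linked in this post.

```python
import math


def range_code(values, n_states):
    """Rescale continuous values to ordered integer states 0 .. n_states - 1.

    Simplified range coding: (x - min) / (max - min) * (n_states - 1),
    rounded half up to the nearest integer state.
    """
    if n_states < 2:
        raise ValueError("n_states must be at least 2")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant character carries no information: everything is state 0.
        return [0 for _ in values]
    scale = (n_states - 1) / (hi - lo)
    # floor(x + 0.5) rounds half up, avoiding Python's round-half-to-even.
    return [int(math.floor((v - lo) * scale + 0.5)) for v in values]


# Hypothetical measurements for four taxa, coded with 3, 10 and 20 states.
measurements = [1.0, 1.2, 3.0, 5.0]
for n in (3, 10, 20):
    print(n, range_code(measurements, n))
```

With three states the values 1.0 and 1.2 fall into the same state, while with 20 states they are separated, so the signal that reaches the parsimony analysis depends on the number of states chosen.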


Figure 1. Simulated tree used to generate the continuous and discrete data.


Figure 2. Consistency Index for continuous (continuos) and discrete (discretos) characters.

This could be because only three continuous characters were used. However, when the continuous characters were added to the discrete characters, the degree of homoplasy decreased (Figure 3), which is consistent with the results obtained by Goloboff et al. (2006).
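As a reminder of how homoplasy is measured here, the CI is the minimum number of steps the characters could require divided by the number of steps they actually require on the tree, so a value of 1 means no homoplasy. The Python sketch below counts steps for unordered characters with Fitch's algorithm and computes an ensemble CI; the tree encoding (nested tuples), the function names and the toy data are assumptions made for illustration and are not how TNT computes the index internally.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one unordered character.

    `tree` is a rooted binary tree written as nested tuples of tip names;
    `states` maps each tip name to its observed state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step needed.
        return common, left_steps + right_steps
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def ensemble_ci(tree, characters):
    """Ensemble CI = sum of minimum steps / sum of observed steps."""
    total_min = 0
    total_obs = 0
    for states in characters:
        # An unordered character with k states needs at least k - 1 steps.
        total_min += len(set(states.values())) - 1
        total_obs += fitch_steps(tree, states)[1]
    if total_obs == 0:
        raise ValueError("all characters are constant; CI is undefined")
    return total_min / total_obs


# Toy example: one perfectly congruent and one homoplastic character.
toy_tree = (("A", "B"), ("C", "D"))
toy_characters = [
    {"A": 0, "B": 0, "C": 1, "D": 1},  # 1 step on the tree, minimum 1
    {"A": 0, "B": 1, "C": 0, "D": 1},  # 2 steps on the tree, minimum 1
]
print(ensemble_ci(toy_tree, toy_characters))
```

In this toy case the two characters need three steps in total where two would suffice, giving a CI of 2/3.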


 
Figure 3. Consistency Index for discrete characters (discretos) and continuous plus discrete characters (mix goloboff).


Figure 4. Consistency Index for total evidence using Goloboff's algorithm and Thiele's algorithm with 3, 10 and 20 states.


Table 1. Robinson-Foulds Metric.





Goloboff's algorithm yielded the best results in terms of character optimization. The problem with Thiele's algorithm is that it always depends on a subjective choice of the number of possible states. However, Goloboff's algorithm is not necessarily the best option for reconstructing phylogenies from total evidence: the Robinson-Foulds metric (Table 1) showed that the smallest difference between the recovered tree and the original tree was obtained with Thiele's algorithm.
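For readers unfamiliar with the Robinson-Foulds metric, it counts the groupings (splits) that appear in one tree but not in the other, so 0 means identical topologies. The Python sketch below computes it for two trees written as nested tuples and treated as unrooted; the encoding, the function names and the example trees are assumptions for illustration only, and the values in Table 1 were not produced with this code.

```python
def collect_clades(tree, found):
    """Return the tip set below `tree`, adding every internal clade to `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(tips)
    return tips


def nontrivial_splits(tree):
    """Return (splits, tips): the non-trivial bipartitions and the tip set."""
    found = set()
    all_tips = collect_clades(tree, found)
    reference = min(all_tips)  # arbitrary tip used to pick one side of a split
    splits = set()
    for clade in found:
        # Represent each split by the side that excludes the reference tip.
        side = all_tips - clade if reference in clade else clade
        # Keep only splits with at least two tips on both sides.
        if 1 < len(side) < len(all_tips) - 1:
            splits.add(side)
    return splits, all_tips


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    splits_a, tips_a = nontrivial_splits(tree_a)
    splits_b, tips_b = nontrivial_splits(tree_b)
    if tips_a != tips_b:
        raise ValueError("both trees must have the same tips")
    return len(splits_a ^ splits_b)


# Hypothetical five-tip trees that differ in the placement of B and C.
original = ((("A", "B"), "C"), ("D", "E"))
recovered = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(original, recovered))
```

The two example trees share the split separating D and E from the rest but disagree on the other one, so their distance is 2.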

A copy of the scripts and data used can be found at https://github.com/dpabon/bio_comparada/tree/master/continuous_parsimony

References

Farris, J.S., 1970. Methods for computing Wagner trees. Syst. Zool. 19, 83–92.
Goloboff, P.A., 1993. Character optimization and calculation of tree lengths. Cladistics 9, 433–436.
Goloboff, P.A., Mattoni, C.I., Quinteros, A.S., 2006. Continuous characters analyzed as such. Cladistics 22, 589–601. doi:10.1111/j.1096-0031.2006.00122.x.
Goloboff, P.A., Farris, J.S., Nixon, K.C., 2008. TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786. doi:10.1111/j.1096-0031.2008.00217.x.
Harmon, L.J., Weir, J.T., Brock, C.D., Glor, R.E., Challenger, W., 2008. GEIGER: investigating evolutionary radiations. Bioinformatics 24, 129–131.
Rambaut, A., Grassly, N.C., 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13, 235–238.
Steel, M.A., Penny, D., 1993. Distributions of tree comparison metrics: some new results. Syst. Biol. 42, 126–141.
Thiele, K., 1993. The Holy Grail of the perfect character: the cladistic treatment of morphometric data. Cladistics 9, 275–304.