Saturday, February 16, 2019

Philosophical positions in systematics: Which is the best approach to doing science?


The diversity of viewpoints and positions on any topic of study is very important in science. This variety of thought allows the same problem to be evaluated from different perspectives, using different methodologies, and generates a richer framework (Pavese and De Bièvre, 2015). Nevertheless, science must go hand in hand with objectivity (Hanna, 2004), so we must ask ourselves: which is the best approach to carrying out scientific research?

First, we must consider a radical idea: falsification (Popper, 2002a). In the Popperian view we only know what we don’t know; taken literally, it is impossible to discover absolute truths, and we can only corroborate or falsify hypotheses (Popper, 2002b). My view of Popper’s falsification is that we should use it only as a reminder of why we shouldn’t take current theories for granted, so that the search for knowledge continues. Beyond that, however, I don’t think Popper’s thought has direct applicability nowadays.

Science is driven by the urge to understand more and more, but how do we trust what we know? The search for knowledge through science needs statistics, and frequentists, with their classical interpretation of probability based on finite observations of experimental events, have contributed greatly (Bayarri and Berger, 2004). Nevertheless, this view can lead to problems such as misinterpretation or the impossibility of assigning probabilities to unrepeatable events (Bayarri and Berger, 2004; Box and Tiao, 1992). The Bayesian view does not have these problems, which makes it a better option: probabilities are based on prior knowledge, that is, there is always uncertainty because we never know all the facts, but we can assign a value to how much we know about the results (Briggs, 1999; Schoot et al., 2014).
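As a minimal sketch of the contrast described above, the following Python example estimates a proportion from the same data in two ways: the frequentist relative frequency and a Bayesian posterior under a Beta prior. The data (7 successes in 10 trials), the Beta(2, 2) prior and the function names are illustrative assumptions, not taken from the cited works.

```python
def frequentist_estimate(successes, trials):
    """Relative frequency of successes (the maximum-likelihood estimate)."""
    return successes / trials


def beta_posterior(successes, trials, prior_a=2.0, prior_b=2.0):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes in 10 trials.
post_a, post_b = beta_posterior(7, 10)
print("frequentist estimate:", frequentist_estimate(7, 10))
print("posterior mean:", beta_mean(post_a, post_b))
```

The posterior mean sits between the prior mean (0.5) and the observed frequency, which is one way of seeing how prior knowledge and data are balanced.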

For this reason, I must emphasize that we can’t know everything, but we can know a lot. Methods based on Bayesian philosophy therefore give us, in my opinion, the best approximation to scientific truth, especially in systematics, where evolutionary history is a large set of inferences.


References

- Bayarri, M. J. and Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19(1), 58-80.
- Briggs, A.H. (1999). A Bayesian approach to stochastic cost‐effectiveness analysis. Health Econ.18, 257-261. doi:10.1002/(SICI)1099-1050(199905)8:3<257::AID-HEC427>3.0.CO;2-E
- Box, G. E. and Tiao, G. C. (1992). Bayesian inference in statistical analysis. New York: John Wiley & Sons.
- Hanna, J. (2004). The Scope and Limits of Scientific Objectivity. Philosophy of Science, 71(3), 339-361. doi:10.1086/421537
- Pavese, F. and De Bièvre, P. (2015). Fostering diversity of thought in measurement science. In F. Pavese, W. Bremser, A. Chunovkina, N. Fischer and A. Forbes (Eds.), Advanced Mathematical and Computational Tools in Metrology and Testing X (pp. 1–8). Singapore: World Scientific.
- Popper, K. (2002a). The logic of scientific discovery. London, England: Routledge. doi:10.4324/9780203994627
- Popper, K. (2002b). Conjectures and refutations: The growth of scientific knowledge. London, England: Routledge. doi:10.4324/9780203538074
- Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J. and Aken, M. A. (2014). A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research. Child Dev, 85, 842-860. doi:10.1111/cdev.12169







Philosophy and Systematics


When we think about the role of philosophy in systematics, we can talk about Popperian falsification and corroboration, frequentism and Bayesian philosophy: different philosophical currents that underlie phylogenetic methods. However, if we want to know the relationships between groups of organisms, which current is the best for doing phylogenetics?


Popperian falsification and corroboration never search for the best classification hypothesis; they look for the hypothesis with the highest degree of corroboration (Bock, 1973). Frequentism, on the other hand, takes a statistical approach, assessing the expected frequencies of good and bad results over a repeated number of measurements (Sober, 2008; VanderPlas, 2014). Like frequentism, Bayesian philosophy is based on statistics, but it uses prior and posterior probabilities (degrees of knowledge) that let us attach degrees of certainty to statements (Stevens, 2006).


Some authors highlight the influence of Popperian falsification in systematics, and others defend its use as the appropriate approach to the search for phylogeny (Farris, 2012); the same goes for frequentism (Sober, 2008). However, in my view the Bayesian approach is the most reliable way to do phylogenetics, because it estimates the probability of each classification hypothesis (the posterior) while evaluating prior probabilities, which gives background knowledge to the topology (Ronquist et al., 2004), and it allows a broader approach.

 
References


Bock, W. J. (1973). Philosophical foundations of classical evolutionary classification. Systematic Zoology, 22(4), 375–392.

Farris, J. (2012). Popper: not Bayes or Rieppel. Cladistics

Ronquist, F., Huelsenbeck, J. P. and Britton, T. (2004). Bayesian supertrees. In Bininda-Emonds, O. R. P. (Ed.), Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Amsterdam, Netherlands. Kluwer.

Sober, E. (2008). Evidence and Evolution: The logic behind the science. New York, United States of America. Cambridge University Press.

Stevens, M. (2006). The Bayesian Approach to the Philosophy of Science. In Borchert, D. (Ed.), The Encyclopedia of Philosophy. Michigan, United States of America. Thomson Gale.

VanderPlas, J. (2014). Frequentism and Bayesianism: A Practical Introduction, Part III. Pythonic Perambulations. Available at: http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/


 

Me and comparative biology


Systematics, as a discipline, plays an important role in comparative biology: it allows us to understand the phylogenetic history of species, taking into account their evolutionary processes and patterns [1]. There is a universe of philosophical ideas discussed in the systematics literature, where even the reasoning process from the data to the final tree hypothesis can have a direct impact on our phylogenetic analyses [1-2]. Hence, taking a philosophical position should be the starting point of our study [2]; how we know what we know, and how we decide to address that knowledge, has been debated throughout history [3].

Knowledge, understood as the data obtained from evidence, made me choose an inductive approach, in which multiple observed instances of a phenomenon lead us to a more general view, in contrast to deductivism, which reduces to a syllogism [2]. Contemporary systematics combines different forms of logic [2,4]. The method that best fits my scientific inference is therefore the Bayesian approach: its analyses can rarely be certain of a result, but they can be very confident. Bayesian inference differs from traditional statistical inference in preserving uncertainty [5-7].

First, in order to carry out the phylogenetic analyses, it is necessary to define the research question of interest. The conceptual framework needs to be clarified and, subsequently, a quantifiable hypothesis generated [6-7]; how probable that hypothesis is depends on all the previous information available. Observation alone cannot give a posterior probability; we need a prior probability as well [3,6-7]. It is also necessary to establish the parameters of interest given our current data and models. The choice of a prior is based on how much information we believe we have prior to the data, and how accurate we believe that information to be [6-7]. The posterior distribution therefore reflects our updated knowledge, balancing prior knowledge with observed data [7].
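The relationship between prior, data and posterior described above is Bayes' theorem. Written out for a hypothesis or parameter θ and data D (the symbols are my own notation, not taken from the cited sources):

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)},
\qquad
P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta
```

Here P(θ) is the prior, P(D | θ) is the likelihood of the observed data, and P(θ | D) is the posterior. The denominator P(D) only rescales the result so that the posterior integrates to one, which is why observation alone, without a prior, cannot yield a posterior probability.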

Bayesian statistics allows us to update knowledge instead of testing a null hypothesis over and over again. If the evidence changes, our knowledge will also change or become more reliable [3,6,8-9], and that is why I prefer it.


References

1. Funk, V. & Brooks, D. Phylogenetic systematics as the basis of comparative biology. (Smithsonian Institution Press, 1993). 
2. Wiley, E. & Lieberman, B. Phylogenetics: Theory and Practice of Phylogenetic Systematics. (John Wiley & Sons, 2011).
3. Sober, E. Evidence and Evolution: The Logic Behind the Science. 3-107 (Cambridge University Press, 2008).
4. Hume, D. 1748. An Enquiry concerning Human Understanding, T. Cadell, London.
5. Glickman, M. & Dyk, D. Basic Bayesian Methods. Topics in Biostatistics 319-338 (2007). doi:10.1007/978-1-59745-530-5_16
6. Van de Schoot, R. et al. A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research. Child Development 85, 842-860 (2014).
7. Smith, A., Skene, A., Shaw, J., Naylor, J. & Dransfield, M. The implementation of the Bayesian paradigm. Communications in Statistics - Theory and Methods 14, 1079-1102 (1985).
8. Gelman, A. Simulation of a Statistics Blogosphere - Statistical Modeling, Causal Inference, and Social Science. (2010). At https://andrewgelman.com/2010/03/22/simulation_of_a/ 
9. Davidson-Pilon, C. Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference (Addison-Wesley Data & Analytics). (Addison-Wesley Professional, 2015).


Friday, February 15, 2019

Bayesian philosophy in systematics

First, I’ll talk about ideas and how we obtain them. The Cambridge Dictionary defines an idea as "an understanding, thought, or picture in your mind" (1), so I consider an idea to be something we obtain from previous knowledge. But how do we get this knowledge? Here let's turn to Hetherington (2), who takes two different views and confronts them: the first treats knowledge as the things you can do, which suggests that theories are not knowledge; the second defines knowledge as a cycle in which you know that you really know. The author proposes knowledge as something that helps us move from "beliefs" to "performance". So, in my opinion, knowledge is based on experience, and theories are real knowledge when they are based on real data. This resembles the view of Plato (3), who uses epistemic analysis to distinguish between differing opinions and decide which one is true; it also resembles a kind of inductivism, as explained by Hurley (4), in which theories resulting from induction have a probability of being true or not.
Knowledge is taken as evidence for theories, but how do we test those theories or hypotheses? Testing a hypothesis has to rest on a personal position, and here I will talk about Bayesianism because it is my way of thinking. It may seem contradictory to hold inductivism and Bayesianism at the same time (5), but I refer to inductivism as the source of knowledge and to Bayesianism as the methodology for testing a hypothesis based on prior knowledge (6).
So, how does the Bayesian approach work? It is a "personal choice" statistic (7): first we have the empirical data and choose a prior; then we evaluate how well the model fits the data and calculate the posterior probability, which tells us how "good" or "bad" the hypothesis is given the data. A prior is a distribution that expresses our expectations before seeing the data, and this is where the "personal choice" comes in. Many articles deal with the choice of prior (8), most of them based on prior information and assumptions about parameters, but it is not my aim to discuss that topic here.
Given the above definitions of knowledge and the Bayesian approach, it’s time to discuss systematics and Bayesian phylogenetics. Systematics is the discipline of biology that compares biological progress among animals, plants and all forms of life (9). Phylogenetic analysis studies the relationships between organisms; its principal aim is to find their common history, which can lead to a classification of species or clades, that is, a taxonomy (10). Bayesian phylogenetics combines a prior on trees with the likelihood of the data, the evidence of common history, to estimate the posterior probability of each tree (11); it seems to me a good way to estimate the true relationships between taxa. There is another difficulty in phylogenetics: how do we find the true phylogeny? That is impossible, because the inferred relationships can change when we add or change data or evaluate different species (12). In the end, we only have probabilities, because we can't model all the variables of evolution, and we will never know whether a result is the real truth; this is why I use the Bayesian approach to evaluate phylogenetic hypotheses.
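To make the idea of "only having probabilities" concrete, here is a toy sketch in Python that turns log-likelihoods and priors for a finite set of candidate topologies into posterior probabilities. The three topologies, their log-likelihood values and the uniform prior are hypothetical; real Bayesian phylogenetic software also integrates over branch lengths and model parameters, usually by MCMC, which this sketch does not attempt.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Normalise prior * likelihood over a finite set of hypotheses.

    Works in log space and subtracts the maximum before exponentiating
    to avoid numerical underflow with very negative log-likelihoods.
    """
    log_joint = {h: log_likelihoods[h] + math.log(priors[h]) for h in priors}
    top = max(log_joint.values())
    weights = {h: math.exp(v - top) for h, v in log_joint.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical values for the three rooted topologies of taxa A, B and C.
log_lik = {"((A,B),C)": -1000.0, "((A,C),B)": -1002.0, "((B,C),A)": -1005.0}
prior = {topology: 1 / 3 for topology in log_lik}  # uniform prior (assumption)

posterior = posterior_probabilities(log_lik, prior)
for topology, prob in posterior.items():
    print(f"{topology}: {prob:.3f}")
```

Adding data or taxa changes the likelihoods, and with them the posterior, which matches the point that the inferred relationships are never final.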


References
(1) Idea Meaning in the Cambridge English Dictionary. (n.d.). Retrieved September, 2018, from https://dictionary.cambridge.org/dictionary/english/idea
(2) Hetherington, S. (2011). Chapter One Knowledge, Ability, and Manifestation. In Conceptions of Knowledge (Vol. 4). Berlin, Germany: Technische Universität Berlin Institut für Philosophie.
(3) Plato. Book I, 344c. Plato Republic. Indianapolis: Hackett.
(4) Hurley, P. (2014). Chapter One: Basic Concepts. In A concise introduction to logic (Vol. 7). Nelson Education. pp 33-39
(5) Dorling, J., & Miller, D. (1981). Bayesian Personalism, Falsificationism, and the Problem of Induction. Proceedings of the Aristotelian Society, Supplementary Volumes, 55, 109-141.
(6) Hawthorne, J. (1993). Bayesian induction is eliminative induction. Philosophical Topics, 21(1), 99-138.
(7) Fienberg, S. E. (2006). When did Bayesian inference become "Bayesian"? Bayesian analysis, 6-20.
(8) Yang, Z. (2008). Empirical evaluation of a prior for Bayesian phylogenetic inference. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 363(1512), 4031-4039.
(9) Kitching, I. J., Forey, P. L., Williams, D., & Humphries, C. (1998). Chapter One: Introduction. In Cladistics: the theory and practice of parsimony analysis (No. 11). Oxford University Press, USA.
(10) Futuyma, D. (2005). Chapter Two: Phylogeny. In Evolutionary Biology (3 ed.). New York: W.H. Freeman.
(11) Huelsenbeck, J. P., Rannala, B. and Masly, P. () An Introduction to Bayesian Inference of Phylogeny. Rochester: Department of Biology, University of Rochester.
(12) Nascimento, F. F., dos Reis, M., & Yang, Z. (2017). A biologist’s guide to Bayesian phylogenetic analysis. Nature ecology & evolution, 1(10), 1446.

Tuesday, January 30, 2018

Selecting evolutionary models in phylogeny

Introduction

Both Maximum Likelihood (ML) and Bayesian Inference (BI) use evolutionary models, which describe, through probabilities, the substitution rates in molecular sequences along the branches of a phylogenetic tree. That is, a substitution model describes the process by which a sequence of characters is transformed into another set of homologous states over time (Lio and Goldman, 1998). ML and BI are based on the likelihood function, which needs explicit models of evolution to capture the underlying evolutionary processes of the sequence data (Luo et al., 2010). Most of the models considered in these two approaches are special cases of the GTR model, in which both the substitution rates and the nucleotide frequencies can take different values (Huelsenbeck et al., 2004). Given that the evolutionary model chosen for ML and BI analyses can exert a significant influence on the resulting phylogenetic tree (Luo et al., 2010), I propose to evaluate the sensitivity of phylogenetic reconstruction to the choice of evolutionary model under the ML method, as well as the effectiveness of the information criteria and the hLRT.
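As an illustration of what a substitution model provides, the sketch below computes transition probabilities under JC, the simplest of the models used in this post (equal base frequencies and a single substitution rate). The function name and the example distance are my own choices; the formula is the standard JC69 result with branch length measured in expected substitutions per site.

```python
import math


def jc69_probability(distance, same):
    """JC69 probability of a site's state at the end of a branch.

    `distance` is the branch length in expected substitutions per site.
    If `same` is True, return the probability that the nucleotide is
    unchanged; otherwise return the probability of one specific
    different nucleotide (there are three such alternatives).
    """
    decay = math.exp(-4.0 * distance / 3.0)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# Hypothetical branch length of 0.1 substitutions per site.
p_same = jc69_probability(0.1, same=True)
p_diff = jc69_probability(0.1, same=False)
print(p_same, p_diff, p_same + 3 * p_diff)
```

Richer models such as K2P, F81, HKY and GTR relax the equal-rate or equal-frequency assumptions, at the cost of extra parameters.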

Methodology

Initially, a topology for a ten-terminal ultrametric tree was generated randomly, from which ten nucleotide sequences of 1000 bp were simulated under each parameter combination, producing four groups: "Models", "Models + G", "Models + I" and "Models + G + I". All simulations were run in Seq-Gen v1.3.4.

For each group, the models considered were JC, K2P, F81, HKY and GTR. Once the nucleotide sequences had been simulated, trees were reconstructed under the ML method, with ten replicates, in PhyML v3.1. The "Models" group was compared with the other groups using the Robinson-Foulds metric, or symmetric difference.
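The symmetric difference mentioned above can be sketched in a few lines of Python. This is a simplified, rooted variant that compares the sets of clades of two trees written as nested tuples; it is not the code used for the analysis, and the standard Robinson-Foulds distance on unrooted trees (such as those produced by ML software) compares bipartitions instead. The tree encoding and function names are assumptions made for the example.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a nested tuple whose leaves are strings,
    e.g. ((("A", "B"), "C"), "D").
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the clade containing every leaf is trivial
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical reference and reconstructed trees on four taxa.
reference = ((("A", "B"), "C"), "D")
estimate = ((("A", "C"), "B"), "D")
print(rf_distance(reference, estimate))
```

A distance of zero means the two trees contain exactly the same clades; larger values mean the reconstructed topology departs further from the reference.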

To evaluate the information criteria (BIC and AIC) and the hLRT, the evolutionary model was selected in jModelTest v2.1.10. I report the number of selected models that matched the models under which the sequences were simulated, as well as the frequency with which each model and group was recovered under each criterion.
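For readers unfamiliar with these criteria, the following sketch shows how AIC, BIC and a likelihood-ratio statistic are computed from log-likelihoods. The log-likelihood values are invented for illustration, and the parameter counts include only the free parameters of each substitution model (branch lengths are ignored, which is a simplifying assumption); this is not a reproduction of what jModelTest does internally.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def lrt_statistic(log_lik_simple, log_lik_complex):
    """Likelihood-ratio statistic for two nested models."""
    return 2 * (log_lik_complex - log_lik_simple)


# Hypothetical fits: model name -> (log-likelihood, free substitution parameters).
fits = {"JC": (-5230.0, 0), "K2P": (-5228.5, 1), "HKY": (-5227.9, 4), "GTR": (-5226.8, 8)}
n_sites = 1000  # alignment length used in this post

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))
print("AIC chooses", best_aic, "and BIC chooses", best_bic)

# Nested comparison JC vs K2P: one extra parameter, so the statistic is
# compared with the chi-square critical value for 1 degree of freedom (3.841 at 0.05).
stat = lrt_statistic(fits["JC"][0], fits["K2P"][0])
print("LRT statistic:", stat, "reject JC:", stat > 3.841)
```

Because BIC multiplies the parameter count by ln(n) rather than 2, it penalises extra parameters more heavily than AIC whenever the alignment has more than about seven sites.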


Results and discussion

The evaluation of the models by group shows that the groups "Models" and "Models_G" have the lowest RF values across all the evolutionary models evaluated and are therefore the most similar to the reference topology. This suggests that including parameters such as I or I + G when choosing the evolutionary model for a phylogenetic reconstruction could add noise and produce less accurate topologies (Fig. 1). The models in general behave similarly with respect to RF values (Fig. 1); however, it is worth highlighting the JC model, which under the groups "Models" and "Models_G" is the most similar to the reference topology, perhaps because of its simplicity. It is nonetheless important to keep in mind that the performance of these models depends heavily on the heterogeneity of the nucleotide sequences.






Fig 1. RF values for each model, according to the group given the reference topology.

The F81 model was selected most frequently under the BIC criterion and, together with GTR, was the most frequent under the AIC and hLRT criteria. In general, the frequencies for the different criteria were relatively similar, with some differences such as BIC with F81 and AIC with HKY (Fig. 2). Likewise, F81 is the selected model that most often coincides with the model under which the sequences were simulated (Fig. 3). With the exception of some particular cases, the three criteria evaluated, or at least two of them, tend to choose the same model, and it is rare for the three criteria to choose completely different models, which agrees with the findings of Luo et al. (2010) and Ripplinger and Sullivan (2008).


Fig 2. Frequency of the models found by each criterion.

Fig 3. Frequency of the models by criteria given the models initially proposed by the nucleotide sequences.

Criteria such as AIC and BIC tend to choose models with fewer parameters, whereas hLRT tends to choose models with more parameters such as G, I and I + G (Fig. 4). Among the models with more parameters, AIC and BIC tend to choose those with the fewest additional parameters, such as Models + G or Models + I, with Models + G + I being the least frequent. All three criteria recover the group "Models" as the one that most often matches the models under which the sequences were simulated (Fig. 5).

Fig 4. Frequency of the groups given all the criteria used.

Fig 5. Groups with higher frequency by criteria given the groups initially proposed by the nucleotide sequences.

References

- Luo et al. (2010). Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets. BMC Evolutionary Biology. 10: 242.
- Ripplinger J, Sullivan J (2008). Does choice in model selection affect maximum likelihood analysis? Syst Biol, 57 :76-85.
- Posada D, Crandall KA (2001) Selecting the best-fit model of nucleotide substitution. Syst Biol, 50 :580-601.
- Posada D (2008). jModelTest phylogenetic model averaging. Mol Biol Evol. 25 :1253-1256.
- Rambaut A, Grassly NC (1997). Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci, 13 :235-238.
- Guindon S. et al. (2010). New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 59(3):307-21.