Teaching Evolution: A.W.F. Edwards: The coral of life

April 3, 2018 • 1:45 pm

by Greg Mayer

Our second installment of Teaching Evolution is a paper by A.W.F. Edwards on the history and logical justification of methods of phylogenetic inference. In teaching evolution, the idea of the history of life is very important. Most students intuitively see the closer genealogical relationship between, say, a man and an ape than between either of them and a dog, or among any of those three as compared to a salmon. But the precise logic of doing so, especially when the degree of genealogical propinquity is less evident, is not easy to convey. I now teach this subject using a likelihood-based logic of justification, and Edwards was a pioneer in this area. Although we are accustomed now to think and speak of the phylogenetic tree as a “tree of life”, Darwin at first referred to it in his notebook as “the coral of life”, which is a more apt analogy, in that only the tips are alive, while the bases of the branches are dead.

For the first installment of what Jerry has called our “mini-MOOC” on evolution– an extract from the Origin by Darwin– I left out the title I gave to that week’s topic in my course: “Unity of type and adaptation”. I’ve now revised that installment’s title to include it. Unity of type and adaptation were the two great classes of organic phenomena that Darwin sought to explain with his theory of descent with modification, with the chief means of modification– natural selection– accounting for the fit of organic beings to their conditions of existence, i.e. their adaptations. Thus Darwin proposed to solve these two great unsolved problems of biology in the first half of the 19th century with a single, unified explanatory theory.

A.W.F. Edwards in Cambridge, by Joe Felsenstein, used with permission.

Anthony William Fairbank Edwards (b. 1935) is a British statistician, geneticist, and evolutionary biologist. He is a Life Fellow of Gonville and Caius College and Emeritus Professor of Biometry at the University of Cambridge. An undergraduate student of R. A. Fisher, he has written several books and numerous scientific papers. He is best known for his pioneering work, with L. L. Cavalli-Sforza, on quantitative methods of phylogenetic analysis, and for strongly advocating Fisher’s concept of likelihood as the proper basis for statistical and scientific inference. He has also written extensively on the history of genetics and statistics, including an analysis of whether Mendel’s results were “too good” (they were). His most influential book is Likelihood (expanded edition, 1992), in which he argues for the centrality and sufficiency of likelihood as an inferential principle, often using genetic examples to illustrate his argument.

Edwards, A. W. F. 1996. The origin and early development of the method of minimum evolution for the reconstruction of phylogenetic trees. Systematic Biology 45:79-91.

Study Questions:
1. What was Edwards’ purpose in writing this paper?

2. What is Ockham’s razor? What is the “Darwin principle”? What is the relationship between them?

3. What are some of the various ways in which a method of minimum evolution may be used to estimate phylogeny? What, according to Edwards, is the justification for any of these methods?

[For further discussion of the history of phylogenetic methods, see chapter 10, “A digression on history and philosophy”, in Joe Felsenstein’s Inferring Phylogenies (Sinauer, 2004).]

More on biology and race

August 29, 2017 • 9:15 am

by Greg Mayer

Jerry posted yesterday on an article at Quillette by Bo Winegard, Ben Winegard and Brian Boutwell on biology and race, commending it for its sensibleness. I thought I’d chime in with my own thoughts. Jerry’s a population geneticist and I’m a herpetologist, but our views turn out to be quite similar.

So, here, in a nutshell, is what biology has to say about race. To begin with, race is not a technical term in biology—it is used loosely for any differentiated subdivision of a species. For example, there is a fruit fly in Wisconsin that feeds on hawthorn and apple, and the flies that feed on the different trees are somewhat different, and so people refer to the “hawthorn race” and the “apple race”. Often, as in fact is true in this case, the term “race” is used because people aren’t quite sure exactly how different the forms are from one another.

In zoology, the term “geographic race” does have a well-defined meaning. It means that if you look at an individual of a species, you can tell where it is from, or conversely, that if you tell me where the individual is from, I can tell you what it looks like. For example, there’s a species of lizard in Jamaica that if you brought one back and showed it to me, I could tell you whether it’s from the vicinity of Kingston, or Montego Bay, or Negril, etc. Lizards from these various places are members of the same species because they interbreed with one another where they are in geographic proximity; they are geographic races because I can tell where they are from by looking at them. Geographic races, if they are given taxonomic names, are called subspecies.

With regard to humans, most of the genetic variability is within populations, not between local populations or races. This was pointed out by Dick Lewontin in 1972 (Dick, of course, was Jerry’s dissertation adviser, and my de jure adviser). However, just because most of the variation is within populations doesn’t mean you can’t tell where someone is from by looking at him. The geneticist A.W.F. Tony Edwards later called the mistaken notion that a majority of variation being within populations precludes identification of population membership “Lewontin’s Fallacy”. [I’ve no idea where I got the idea he was called “Tony”. I’ve never met him, and people who do know him have assured me he’s called “Anthony”.]

As a former student of Lewontin’s, I’m not especially fond of Edwards’ choice of term, but nonetheless Edwards is entirely correct. It is of crucial importance to note that the scientific questions asked by Lewontin and Edwards were different. Lewontin asked “What proportion of genetic variation (in the analysis of variance sense) in humans is within and among populations?” The answer is that roughly 85% is within populations, the rest among local populations and races. That is the answer Lewontin gave in 1972, and it is entirely correct, confirmed by much more molecular data since that time. Edwards asked “Can individual humans be assigned to races from genetic data?”, or, alternatively, “Can human races be diagnosed (in the taxonomic sense of subspecies)?” The answer is yes, they can. Edwards shows that his answer to his question is entirely compatible with Lewontin’s answer to Lewontin’s question. A paper by Rosenberg et al. (2002) clearly illustrates for a large data set the truth of both Lewontin’s and Edwards’ answers to their respective questions. Lewontin goes on from his finding (with which Edwards entirely agrees) to argue further that this level of difference between races is not worthy of taxonomic recognition. Edwards doesn’t actually express an opinion about whether human races should be recognized taxonomically, but does show that the 85/15 division of within/among population variation is no bar to doing so.

One thing a bit off in the Quillette piece is their claim that Lewontin’s conclusion “was based on a peculiar way of measuring genetic variation.” It was not; it was based on a perfectly natural and obvious way of measuring genetic variation, and, indeed, Dick was right, as Edwards acknowledged. The distinction between single and multi-locus genotypes mentioned by Winegard et al. does not at all nullify Lewontin’s conclusion as to the apportionment of variation. What Edwards showed very clearly is that multi-locus genotypes allow individuals to be reliably assigned to populations, even when most of the variation is within populations. In understanding patterns of genetic variation in humans, it is very important to see that Lewontin and Edwards asked different questions, and that they are both right in their answers to their respective questions.
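The compatibility of the two answers is easy to demonstrate with a small simulation. The sketch below uses made-up allele frequencies (0.6 vs. 0.4 at every locus in two hypothetical populations), not the actual human data of Lewontin or Rosenberg et al.: even when the within-population share of variation is large, likelihoods computed over many loci assign individuals to the correct population almost without error.

```python
import math
import random

random.seed(1)

N_LOCI = 200
# Hypothetical allele frequencies: the focal allele is at frequency 0.6 in
# population A and 0.4 in population B at every locus.
FREQ_A, FREQ_B = 0.6, 0.4

# Apportionment of variation at one locus (analysis-of-variance sense):
# average within-population variance of an allele draw vs. total variance.
p_bar = (FREQ_A + FREQ_B) / 2
within = (FREQ_A * (1 - FREQ_A) + FREQ_B * (1 - FREQ_B)) / 2
within_share = within / (p_bar * (1 - p_bar))  # 0.96: most variation is within populations

def genotype(freq):
    """Simulate a diploid genotype: count of the focal allele (0, 1, or 2) at each locus."""
    return [sum(random.random() < freq for _ in range(2)) for _ in range(N_LOCI)]

def log_likelihood(geno, freq):
    """Log-likelihood of a multi-locus genotype under Hardy-Weinberg proportions."""
    probs = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    return sum(math.log(probs[g]) for g in geno)

# Assign each simulated individual to the population with the higher likelihood.
n, correct = 200, 0
for i in range(n):
    source = FREQ_A if i % 2 == 0 else FREQ_B
    g = genotype(source)
    guess = FREQ_A if log_likelihood(g, FREQ_A) > log_likelihood(g, FREQ_B) else FREQ_B
    correct += guess == source

accuracy = correct / n
print(f"within-population share of variation: {within_share:.2f}")
print(f"assignment accuracy over {N_LOCI} loci: {accuracy:.2f}")
```

With these toy frequencies, 96% of the variation is within populations, yet assignment over 200 loci is nearly always correct– Edwards’ point in a nutshell.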

Lewontin and Edwards agree on the moral equality of human beings; Edwards just doesn’t want that moral equality to depend on any contingent facts of genetic similarity. Lewontin wouldn’t want it to, either, but sees the high genetic similarity among human races (genetic similarity is much lower among races in some other species) as empirical reinforcement for his moral conclusion. The problem with basing human moral and civil equality on empirical claims about human biological similarity is that such claims may prove to be mistaken. Because it does not depend on some empirical finding which new data may put into question, I think Edwards has the more robust basis for his moral conclusion.

As Edwards sums up:

“But it is a dangerous mistake to premise the moral equality of human beings on biological similarity because dissimilarity, once revealed, then becomes an argument for moral inequality.”

Edwards, A.W.F. 2003. Human genetic diversity: Lewontin’s fallacy. BioEssays 25:798–801.

Lewontin, R.C. 1972. The apportionment of human diversity. Evolutionary Biology 6:381-398.

Rosenberg, N.A., J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A. Zhivotovsky, and M.W. Feldman. 2002. Genetic structure of human populations. Science 298:2381-2385.

Why I am not a Bayesian*

April 16, 2015 • 8:45 am

JAC: Today Greg contributes his opinion on the use of Bayesian inference in statistics. I know that many—perhaps most—readers aren’t familiar with this, but it’s of interest to those who are. Further, lots of secular bloggers either write about or use Bayesian inference, as when inferring the probability that Jesus existed given the scanty data. (Theists use it too, sometimes to calculate the probability that God exists given some observations, like the supposed fine-tuning of the Universe’s physical constants.)

When I warned Greg about the difficulty some readers might have, he replied that, “I tried to keep it simple, but it is, as Paul Krugman says about some of his posts, ‘wonkish’.” So wonkish we shall have!


by Greg Mayer

Last month, in a post by Jerry about Tanya Luhrmann’s alleged supernatural experiences, I used a Bayesian argument to critique her claims, remarking parenthetically that I am not a Bayesian. A couple of readers asked me why I wasn’t a Bayesian, and I promised to reply more fully later. So, here goes; it is, as Paul Krugman says, “wonkish”.

Approaches to inference

I studied statistics as an undergraduate and graduate student with some of the luminaries in the field, used statistics, and helped people with statistics; but it wasn’t until I began teaching the subject that I really thought about the logical basis of the subject. Trying to explain to students why we were doing what we were doing forced me to explain it to myself. And, I wasn’t happy with some of those explanations. So, I began looking more deeply into the logic of statistical inference. Influenced strongly by the writings of Ian Hacking, Richard Royall, and especially the geneticist A.W.F. Edwards, I’ve come to adopt a version of the likelihood approach. The likelihood approach takes it that the goal of statistical inference is the same as that of scientific inference, and that the operationalization of this goal is to treat our observations as data bearing upon the adequacy of our theories. Not all approaches to statistical inference share this goal. Some are more modest, and some are more ambitious.

The more modest approach to statistical inference is that of Jerzy Neyman and Egon Pearson. In the Neyman-Pearson approach, one is concerned to adopt rules of behavior that minimize one’s mistakes. For example, buying a mega-pack of paper towels at Sam’s Club, and then finding that they are of unacceptably low quality, would be a mistake. They define two sorts of errors that might occur in making decisions, and see statistics as a way of reducing one’s decision making error rates. Although they, and especially Neyman, made some quite grandiose claims for their views, the whole approach seems rather despairing to me: having given up on any attempt to obtain knowledge about the world, they settle for a clean, well-lighted place, or at least one in which the light bulbs usually work. While their approach makes perfect sense in the context of industrial quality control, it is not a suitable basis for scientific inference (which, indeed, Neyman thought was not possible).

The approach of R.A. Fisher, founder of modern statistics and evolutionary theory, shares with the likelihood approach the goal of treating our observations as data bearing upon the adequacy of our theories, and the two approaches also share many statistical procedures, but differ most notably on the issue of significance testing (i.e., those “p” values you often see in scientific papers, or commentaries upon them). What is actually taught and practiced by most scientists today is a hodge-podge of the Neyman-Pearson and Fisherian approaches. Much of the language and theory of Neyman-Pearson is used (e.g., types of errors), but, since few or no scientists actually want to do what Neyman and Pearson wanted to do, current statistical practice is suffused with an evidential interpretation quite congenial to Fisher, but foreign to the Neyman-Pearson approach.

Bayesianism, like the Fisherian and likelihood approaches, also sees our observations as data bearing upon the adequacy of our theories, but is more ambitious in wanting to have a formal, quantitative method for integrating what we learn from observation with everything else we know or believe, in order to come up with a single numerical measure of rational belief in propositions.

So, what is Bayesianism?

The Rev. Thomas Bayes was an 18th century English Nonconformist minister. His “An Essay Towards Solving a Problem in the Doctrine of Chances” was published in 1763, two years after his death. In the Essay, Bayes proved the famous theorem that now bears his name. The theorem is a useful, important, and nonproblematic result in probability theory. In modern notation, it states

P(H∣D) = [P(D∣H)⋅P(H)]/P(D).

In words, the probability P of an hypothesis H in the light of data D is equal to the probability of the data if the hypothesis were true (called the hypothesis’s likelihood) times the probability of the hypothesis prior to obtaining data D, with the product divided by the unconditional probability of the data (for any given problem, this would be a constant). Ignoring the constant in the denominator, P(D), we can say that the posterior probability, P(H∣D), (the probability of the hypothesis after we see the data), is proportional to the likelihood of the hypothesis in light of the data, P(D∣H), (the probability of the data if the hypothesis were true), times the prior probability, P(H), (the probability we gave to the hypothesis before we saw the data).

The theorem has many uncontroversial applications in fields such as genetics and medical diagnosis. These applications may be thought of as two-stage experiments, in which an initial experiment (or background set of observations) establishes probabilities for each of a set of exhaustive and mutually exclusive hypotheses, while the results of a second experiment (or set of observations), providing data D, are used to reevaluate the probabilities of the hypotheses. Thus, knowing something about the grandparents of a set of offspring may influence my evaluation of genetic hypotheses concerning the offspring. Or, in making a diagnosis, I may include in my calculations the known prevalence of a disease in the population, as well as the test results on a particular patient. For example, suppose a 95% accurate test for disease X is positive (+) for a patient, and the disease X is known to occur in 1% of the population. Then, by Bayes’ Theorem

P(X∣+) = P(+∣X)⋅P(X)/P(+)

= (.95)(.01)/[(.95)(.01)+(.05)(.99)]

= .16.

The probability that the patient has the disease is thus 16%. Note that despite the positive result on a pretty accurate test, the odds are more than four to one against the patient actually having condition X. This is because, since the disease is quite rare, most of the positive tests are false positives. [JAC: This is a common and counterintuitive result that could be of practical use to those of you who get a positive test. Such tests almost always mandate re-testing!]
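The arithmetic above is easy to reproduce. A minimal sketch, using the same illustrative figures (a test with 95% sensitivity and specificity, 1% prevalence):

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem.

    P(+) in the denominator is the total probability of a positive result:
    true positives from the diseased plus false positives from the healthy.
    """
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

# 95%-accurate test (sensitivity = specificity = 0.95), 1% disease prevalence:
p = posterior_given_positive(prior=0.01, sensitivity=0.95, specificity=0.95)
print(round(p, 2))  # 0.16
```

Raising the prevalence to 20% pushes the posterior above 80%– the prior does most of the work when the disease is rare.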

So what could be controversial? Well, what if there is no first stage experiment or background knowledge which gives a probability distribution to the hypotheses? Bayes proposed what is known as Bayes’ Postulate: in the absence of prior information, each of the specifiable hypotheses should be accorded equal probability, or, for a continuum of hypotheses, a uniform distribution of probabilities. Bayes’ Postulate is an attempt to specify a probability distribution for ignorance. Thus, if I am studying the relative frequency of some event (which must range from 0 to 1), Bayes’ Postulate says I should assign a probability of .5 to the hypothesis that the event has a frequency greater than .5, and that the hypothesis that the frequency of the event falls between .25 and .40 should be given a probability of .15, and so on. But is Bayes’ Postulate a good idea?

Problems with Bayes’ Postulate

Let’s look at a simple genetic example: a gene with two alleles (forms) at a locus (say alleles A and a). The two alleles have frequencies p + q = 1, and, if there are no evolutionary forces acting on the population and mating is at random, then the three genotypes (AA, Aa, and aa) will have the frequencies p², 2pq and q², respectively. If I am addressing the frequency of allele a, and I am a Bayesian, then I assign equal prior probability to all possible values of q, so

P(q>.5) = .5

But this implies that the frequency of the aa genotype has a non-uniform prior probability distribution

P(q²>.25) = .5.

My ignorance concerning q has become rather definite knowledge concerning q² (which, if there is genetic dominance at the locus, would be the frequency of recessive homozygotes; as in Mendel’s short pea plants, this is a very common way in which we observe the data). This apparent conversion of ‘ignorance’ to ‘knowledge’ will be generally so: prior probabilities are not invariant to parameter transformation (in this case, the transformation is the squaring of q). And even more generally, there will be no unique, objective distribution for ignorance. Lacking a genuine prior distribution (which we do have in the diagnosis example above), reasonable men may disagree on how to represent their ignorance. As Royall (1997) put it, “pure ignorance cannot be represented by a probability distribution”.
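The non-invariance is easy to see by simulation: draw q from a uniform prior and look at the distribution this implies for q². A quick Monte Carlo sketch:

```python
import random

random.seed(0)
N = 100_000

q_samples = [random.random() for _ in range(N)]  # uniform 'ignorance' prior on q

# The events q > .5 and q^2 > .25 are identical, so both get probability .5 ...
frac_q = sum(q > 0.5 for q in q_samples) / N
frac_q2 = sum(q * q > 0.25 for q in q_samples) / N
# ... but a uniform prior on q^2 itself would put probability .75, not .5,
# above .25. The implied prior on q^2 is skewed toward small values:
mean_q2 = sum(q * q for q in q_samples) / N  # about 1/3, not 1/2

print(frac_q, frac_q2, mean_q2)
```

Uniform ignorance about q is thus quite definite opinion about q²: one cannot be ‘ignorant’ on both scales at once.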

Bayesian inference

Bayesians proceed by using Bayes’ Postulate as a starting point, and then update their beliefs by using Bayes’ Theorem:

Posterior probability ∝ Likelihood × Prior probability

which can also be given as

Posterior opinion ∝ Likelihood × Prior opinion.

The appeal of Bayesianism is that it provides an all-encompassing, quantitative method for assessing the rational degree of belief in hypotheses. But there is still the problem of prior probabilities: what should we pick as our prior probabilities if there is no first-stage set of data to give us such a probability? Bayes’ Postulate doesn’t solve the problem, because there is no unique measure of ignorance. We must choose some prior probability distribution in order to carry out the Bayesian calculation, but you may choose a different distribution from the one I do, and neither is ‘correct’: the choice is subjective.

There are three ways round the problem of prior distributions. First, try really hard to find an objective way of portraying ignorance. This hasn’t worked yet, but some people are still trying. Second, note that the prior probabilities make little difference to the posterior probability as more and more data accumulate (i.e. as more experiments/observations provide more likelihoods), viz.

P(posterior) ∝ P(prior) × Likelihood × Likelihood × Likelihood × . . .

In the end, only the likelihoods make a difference; but this is less a defense of Bayesianism than a surrender to likelihood. Third, boldly embrace subjectivity. But then, since everyone has their own prior, the only things we can agree upon are the likelihoods. So, why not just use the likelihoods?
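The swamping of the prior by accumulating likelihoods can be seen in a small grid computation. This is a toy coin-bias example of my own devising (working in log space to avoid numerical underflow): two sharply different priors yield nearly identical posteriors after 1,000 observations.

```python
import math

GRID = [i / 100 for i in range(1, 100)]  # candidate values of theta, avoiding 0 and 1

def posterior_mean(log_prior, heads, tails):
    """Grid-based posterior mean of theta under a binomial likelihood."""
    log_post = [lp + heads * math.log(th) + tails * math.log(1 - th)
                for lp, th in zip(log_prior, GRID)]
    m = max(log_post)                       # subtract the max before exponentiating
    w = [math.exp(v - m) for v in log_post]
    z = sum(w)
    return sum(wi * th for wi, th in zip(w, GRID)) / z

uniform_prior = [0.0 for _ in GRID]               # flat prior over theta
skewed_prior = [4 * math.log(th) for th in GRID]  # prior proportional to theta^4, piled up near 1

heads, tails = 600, 400
mean_u = posterior_mean(uniform_prior, heads, tails)
mean_s = posterior_mean(skewed_prior, heads, tails)
# Both posterior means come out at about 0.60: with this much data,
# the likelihood dominates whichever prior we started from.
print(round(mean_u, 2), round(mean_s, 2))
```

Run the same computation with 6 heads and 4 tails instead of 600 and 400, and the two priors give visibly different answers– the disagreement only disappears as the likelihoods pile up.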

The problem with Bayesianism is that it asks the wrong question. It asks, ‘How should I modify my current beliefs in the light of the data?’, rather than ‘Which hypotheses are best supported by the data?’. Bayesianism tells me (and me alone) what to believe, while likelihood tells us (all of us) what the data say.

*Apologies to Clark Glymour and Bertrand Russell.

Further Reading

The best and easiest place to start is with Sober and Royall.

Edwards, A.W.F. 1992. Likelihood. Expanded edition. Johns Hopkins University Press, Baltimore. An at times terse, but frequently witty, book that rewards careful study. In many ways, the founding document of likelihood inference; to paraphrase Darwin, it is ‘origin all my views’.

Gigerenzer, G., et al. 1989. The Empire of Chance. Cambridge University Press, Cambridge. A history of probability and statistics, including how the incompatible approaches of Fisher and Neyman-Pearson became hybridized into textbook orthodoxy.

Hacking, I. 1965. The Logic of Statistical Inference. Cambridge University Press, Cambridge. Hacking’s argument for likelihood as the fundamental concept for inference; he later changed his mind.

Hacking, I. 2001. An Introduction to Probability and Inductive Logic. Cambridge University Press, Cambridge. A well-written introductory textbook reflecting Hacking’s now more eclectic, and specifically Bayesian, views.

Royall, R. 1997. Statistical Evidence: a Likelihood Paradigm. Chapman & Hall, London. A very clear exposition of the likelihood approach, requiring little mathematical expertise. Along with Edwards, the key work in likelihood inference.

Sober, E. 2002. Bayesianism– Its Scope and Limits. Pp. 21-38 in R. Swinburne, ed., Bayes’ Theorem. Proceedings of the British Academy, vol. 113. An examination of the limits of both Bayesian and likelihood approaches. (Read this first!)