JAC: Today Greg contributes his opinion on the use of Bayesian inference in statistics. I know that many—perhaps most—readers aren’t familiar with this, but it’s of interest to those who are. Further, lots of secular bloggers either write about or use Bayesian inference, as when inferring the probability that Jesus existed given the scanty data. (Theists use it too, sometimes to calculate the probability that God exists given some observations, like the supposed fine-tuning of the Universe’s physical constants.)
When I warned Greg about the difficulty some readers might have, he replied that, “I tried to keep it simple, but it is, as Paul Krugman says about some of his posts, ‘wonkish’.” So wonkish we shall have!
by Greg Mayer
Last month, in a post by Jerry about Tanya Luhrmann’s alleged supernatural experiences, I used a Bayesian argument to critique her claims, remarking parenthetically that I am not a Bayesian. A couple of readers asked me why I wasn’t a Bayesian, and I promised to reply more fully later. So, here goes; it is, as Paul Krugman says, “wonkish“.
Approaches to inference
I studied statistics as an undergraduate and graduate student with some of the luminaries in the field, used statistics, and helped people with statistics; but it wasn’t until I began teaching the subject that I really thought about the logical basis of the subject. Trying to explain to students why we were doing what we were doing forced me to explain it to myself. And, I wasn’t happy with some of those explanations. So, I began looking more deeply into the logic of statistical inference. Influenced strongly by the writings of Ian Hacking, Richard Royall, and especially the geneticist A.W.F. Edwards, I’ve come to adopt a version of the likelihood approach. The likelihood approach takes it that the goal of statistical inference is the same as that of scientific inference, and that the operationalization of this goal is to treat our observations as data bearing upon the adequacy of our theories. Not all approaches to statistical inference share this goal. Some are more modest, and some are more ambitious.
The more modest approach to statistical inference is that of Jerzy Neyman and Egon Pearson. In the Neyman-Pearson approach, one is concerned to adopt rules of behavior that minimize one’s mistakes. For example, buying a mega-pack of paper towels at Sam’s Club, and then finding that they are of unacceptably low quality, would be a mistake. They define two sorts of errors that might occur in making decisions, and see statistics as a way of reducing one’s decision making error rates. Although they, and especially, Neyman, made some quite grandiose claims for their views, the whole approach seems rather despairing to me: having given up on any attempt to obtain knowledge about the world, they settle for a clean, well-lighted place, or at least one in which the light bulbs usually work. While their approach makes perfect sense in the context of industrial quality control, it is not a suitable basis for scientific inference (which, indeed, Neyman thought was not possible).
The approach of R.A. Fisher, founder of modern statistics and evolutionary theory, shares with the likelihood approach the goal of treating our observations as data bearing upon the adequacy of our theories, and the two approaches also share many statistical procedures, but differ most notably on the issue of significance testing (i.e., those “p” values you often see in scientific papers, or commentaries upon them). What is actually taught and practiced by most scientists today is a hodge-podge of the Neyman-Pearson and Fisherian approaches. Much of the language and theory of Neyman-Pearson is used (e.g., types of errors), but, since few or no scientists actually want to do what Neyman and Pearson wanted to do, current statistical practice is suffused with an evidential interpretation quite congenial to Fisher, but foreign to the Neyman-Pearson approach.
Bayesianism, like the Fisherian and likelihood approaches, also sees our observations as data bearing upon the adequacy of our theories, but is more ambitious in wanting to have a formal, quantitative method for integrating what we learn from observation with everything else we know or believe, in order to come up with a single numerical measure of rational belief in propositions.
So, what is Bayesianism?
The Rev. Thomas Bayes was an 18th century English Nonconformist minister. His “An Essay Towards Solving a Problem in the Doctrine of Chances” was published in 1763, two years after his death. In the Essay, Bayes proved the famous theorem that now bears his name. The theorem is a useful, important, and nonproblematic result in probability theory. In modern notation, it states
P(H∣D) = [P(D∣H)⋅P(H)]/P(D).
In words, the probability P of an hypothesis H in the light of data D is equal to the probability of the data if the hypothesis were true (called the hypothesis’s likelihood) times the probability of the hypothesis prior to obtaining data D, with the product divided by the unconditional probability of the data (for any given problem, this would be a constant). Ignoring the constant in the denominator, P(D), we can say that the posterior probability, P(H∣D), (the probability of the hypothesis after we see the data), is proportional to the likelihood of the hypothesis in light of the data, P(D∣H), (the probability of the data if the hypothesis were true), times the prior probability, P(H), (the probability we gave to the hypothesis before we saw the data).
The theorem has many uncontroversial applications in fields such as genetics and medical diagnosis. These applications may be thought of as two-stage experiments, in which an initial experiment (or background set of observations) establishes probabilities for each of a set of exhaustive and mutually exclusive hypotheses, while the results of a second experiment (or set of observations), providing data D, are used to reevaluate the probabilities of the hypotheses. Thus, knowing something about the grandparents of a set of offspring may influence my evaluation of genetic hypotheses concerning the offspring. Or, in making a diagnosis, I may include in my calculations the known prevalence of a disease in the population, as well as the test results on a particular patient. For example, suppose a 95% accurate test for disease X is positive (+) for a patient, and the disease X is known to occur in 1% of the population. Then, by Bayes’ Theorem
P(X∣+) = P(+∣X)⋅P(X)/P(+)
The probability that the patient has the disease is thus 16%. Note that despite the positive result on a pretty accurate test, the odds are more than four to one against the patient actually having condition X. This is because, since the disease is quite rare, most of the positive tests are false positives. [JAC: This is a common and counterintuitive result that could be of practical use to those of you who get a positive test. Such tests almost always mandate re-testing!]
So what could be controversial? Well, what if there is no first stage experiment or background knowledge which gives a probability distribution to the hypotheses? Bayes proposed what is known as Bayes’ Postulate: in the absence of prior information, each of the specifiable hypotheses should be accorded equal probability, or, for a continuum of hypotheses, a uniform distribution of probabilities. Bayes’ Postulate is an attempt to specify a probability distribution for ignorance. Thus, if I am studying the relative frequency of some event (which must range from 0 to 1), Bayes’ Postulate says I should assign a probability of .5 to the hypothesis that the event has a frequency greater than .5, and that the hypothesis that the frequency of the event falls between .25 and .40 should be given a probability of .15, and so on. But is Bayes’ Postulate a good idea?
Problems with Bayes’ Postulate
Let’s look at simple genetic example: a gene with two alleles (forms) at the locus (say alleles A and a). The two alleles have frequencies p + q = 1, and, if there are no evolutionary forces acting on the population and mating is at random, then the three genotypes (AA, Aa, and aa) will have the frequencies p², 2pq and q², respectively. If I am addressing the frequency of allele a, and I am a Bayesian, then I assign equal prior probability to all possible values of q, so
P(q>.5) = .5
But this implies that the frequency of the aa genotype has a non-uniform prior probability distribution
P(q²>.25) = .5.
My ignorance concerning q has become rather definite knowledge concerning q² (which, if there is genetic dominance at the locus, would be the frequency of recessive homozygotes; as in Mendel’s short pea plants, this is a very common way in which we observe the data). This apparent conversion of ‘ignorance’ to ‘knowledge’ will be generally so: prior probabilities are not invariant to parameter transformation (in this case, the transformation is the squaring of q). And even more generally, there will be no unique, objective distribution for ignorance. Lacking a genuine prior distribution (which we do have in the diagnosis example above), reasonable men may disagree on how to represent their ignorance. As Royall (1997) put it, “pure ignorance cannot be represented by a probability distribution”.
Bayesians proceed by using Bayes’ Postulate as a starting point, and then update their beliefs by using Bayes’ Theorem:
Posterior probability ∝ Likelihood × Prior probability
which can also be given as
Posterior opinion ∝ Likelihood × Prior opinion.
The appeal of Bayesianism is that it provides an all-encompassing, quantitative method for assessing the rational degree of belief in hypotheses. But there is still the problem of prior probabilities: what should we pick as our prior probabilities if there is no first-stage set of data to give us such a probability? Bayes’ Postulate doesn’t solve the problem, because there is no unique measure of ignorance. We must choose some prior probability distribution in order to carry out the Bayesian calculation, but you may choose a different distribution from the one I do, and neither is ‘correct’: the choice is subjective.
There are three ways round the problem of prior distributions. First, try really hard to find an objective way of portraying ignorance. This hasn’t worked yet, but some people are still trying. Second, note that the prior probabilities make little difference to the posterior probabilty as more and more data accumulate (i.e. as more experiments/observations provide more likelihoods), viz.
P(posterior) ∝ P(prior) × Likelihood × Likelihood × Likelihood × . . .
In the end, only the likelihoods make a difference; but this is less a defense of Bayesianism than a surrender to likelihood. Third, boldly embrace subjectivity. But then, since everyone has their own prior, the only thing we can agree upon are the likelihoods. So, why not just use the likelihoods?
The problem with Bayesianism is that it asks the wrong question. It asks, ‘How should I modify my current beliefs in the light of the data?’, rather than ‘Which hypotheses are best supported by the data?’. Bayesianism tells me (and me alone) what to believe, while likelihood tells us (all of us) what the data say.
*Apologies to Clark Glymour and Bertrand Russell.
The best and easiest place to start is with Sober and Royall.
Edwards, A.W.F. 1992. Likelihood. Expanded edition. Johns Hopkins University Press, Baltimore. An at times terse, but frequently witty, book that rewards careful study. In many ways, the founding document of likelihood inference; to paraphrase Darwin, it is ‘origin all my views’.
Gigerenzer, G., et al. 1989. The Empire of Chance. Cambridge University Press, Cambridge. A history of probability and statistics, including how the incompatible approaches of Fisher and Neyman-Pearson became hybridized into textbook orthodoxy.
Hacking, I. 1965. The Logic of Statistical Inference. Cambridge University Press, Cambridge. Hacking’s argument for likelihood as the fundamental concept for inference; he later changed his mind.
Hacking, I. 2001. An Introduction to Probability and Inductive Logic. Cambridge University Press, Cambridge. A well-written introductory textbook reflecting Hacking’s now more eclectic, and specifically Bayesian, views.
Royall, R. 1997. Statistical Evidence: a Likelihood Paradigm. Chapman & Hall, London. A very clear exposition of the likelihood approach, requiring little mathematical expertise. Along with Edwards, the key work in likelihood inference.
Sober, E. 2002. Bayesianism– Its Scope and Limits. Pp. 21-38 in R. Swinburne, ed., Bayes’ Theorem. Proceedings of the British Academy Press, vol. 113. An examination of the limits of both Bayesian and likelihood approaches. pdf (read this first!)