by Greg Mayer
I (and Jerry) have been quite pleased by the reaction to my post on “Why I am not a Bayesian”. Despite being “wonkish”, it has generated quite a bit of interesting, high-level, and, I hope, productive discussion. It’s been, as Diane G. put it, “like a graduate seminar.” I’ve made a few forays into the comments myself, but have not responded to all, or even the most interesting, comments: I had a student’s doctoral dissertation defense to attend to the day the post went up, plus I’m not sure that having the writer weigh in on every point is the best way to advance the discussion. But I do have a few general observations to make, and do so here.
First, I did not lay out in my post what the likelihood approach was, only giving references to key literature. No approach is without difficulties and conundrums, and I’m looking forward to finding the reader-recommended paper “Why I am not a likelihoodist.” Among the most significant problems facing a likelihood approach are those of ‘nuisance’ parameters (probability models often include quantities that must be estimated in order to use the model, but in which you’re not really interested; there are Bayesian ways of dealing with these that are quite attractive), and of how to incorporate model simplicity into inference. My own view of statistical inference is that we are torn between two desiderata: to find a model that fits the data, yet retains sufficient generality to be applicable to a wider range of phenomena than just the data observed. It is always possible to have a model of perfect fit by simply having the model restate the data. In the limit, you could have the hypothesis that an omnipotent power has arranged all phenomena always and everywhere to be exactly as it wants, which hypothesis would have a likelihood of one (the highest it can be). But such a hypothesis contains within it an exact description of all phenomena always and everywhere, and thus has minimal generality or simplicity. There are various suggestions on how to make the tradeoff between fit (maximizing the likelihood of the model) and simplicity (minimizing the number of parameters in the model); the Akaike Information Criterion is an increasingly popular one, but I don’t have the solution as to how best to do it.
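To make the fit-versus-simplicity tradeoff concrete, here is a minimal sketch of the Akaike Information Criterion, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The coin-flip data and both models are hypothetical, chosen only for illustration; lower AIC is preferred.

```python
import math

def binomial_log_likelihood(heads, flips, p):
    """Log-likelihood of observing `heads` in `flips` tosses when P(heads) = p."""
    log_coeff = (math.lgamma(flips + 1)
                 - math.lgamma(heads + 1)
                 - math.lgamma(flips - heads + 1))
    return log_coeff + heads * math.log(p) + (flips - heads) * math.log(1 - p)

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical data: 56 heads in 100 tosses.
heads, flips = 56, 100

# Model A: a fair coin, p fixed at 0.5, so no parameters are estimated.
loglik_fixed = binomial_log_likelihood(heads, flips, 0.5)
# Model B: p estimated from the data (maximum likelihood), one parameter.
loglik_fitted = binomial_log_likelihood(heads, flips, heads / flips)

aic_fixed = aic(loglik_fixed, 0)
aic_fitted = aic(loglik_fitted, 1)

print(f"log-likelihood: fixed={loglik_fixed:.3f}, fitted={loglik_fitted:.3f}")
print(f"AIC:            fixed={aic_fixed:.3f}, fitted={aic_fitted:.3f}")
```

With these particular numbers the fitted model has the higher likelihood, as it always must, but the fair-coin model has the lower AIC: the gain in fit is too small to pay for the extra parameter.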
Second, there are several approaches to statistical inference (not just two, or even just one, as some have said), and they differ in their logical basis and what inferences they think possible or desirable. (I mentioned likelihood, Fisherian, Neyman-Pearson, Bayesian, and textbook hodge-podge approaches in my post, and that’s not exhaustive.) But it is nonetheless the case that the various approaches often arrive at the same general (and sometimes specific) conclusion in any particular inferential analysis. Discussion often centers on cases where they differ, but this shouldn’t obscure the at times broad agreement among them. As Tony Edwards, one of the chief promoters of likelihood, has noted, the usual procedures usually lead to reasonable results, otherwise we would have been forced to give up on them and reform statistical inference long ago. One of the remarks I did make in the comments is that most scientists are pragmatists, and they use the inferential methods that are available to them, that address the questions they are interested in, and that give reasonable results, without too much concern for what’s going on “under the hood” of the method. So, few scientists are strict Bayesians, Fisherians, or whatever; they are opportunistic Bayesians, Fisherians, or whatever.
Third, one of the differences between Bayesian and likelihood approaches that I would reiterate is that Bayesianism is more ambitious: it wants to supply a quantitative answer (a probability) to the question “What should I believe?” (or accept). Likelihoodism is concerned with “What do the data say?”, which is a less complete question, which leads to less complete answers. It’s not that likelihoodists (or Fisherians) don’t think the further questions are interesting, but just that they don’t think they can be answered in an algorithmic fashion leading to a numerical result (unless, of course, there is a valid objective prior). Once you have a likelihood result, further considerations enter into our inferential reasoning, such as
There is good reason to doubt a proposition if it conflicts with other propositions we have good reason to believe; and
The more background information a proposition conflicts with, the more reason there is to doubt it.
(from a list I posted of principles of scientific reasoning taken from How to Think about Weird Things). Bayesians turn these considerations into a prior probability; non-Bayesians don’t.
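The difference between the two questions can be sketched numerically. In the example below, the data, the two hypotheses, and the prior odds are all hypothetical. The likelihood ratio summarizes what the data say about one hypothesis relative to another; the Bayesian step multiplies it by prior odds to get posterior odds.

```python
def binomial_kernel(heads, flips, p):
    """Likelihood of the data up to a constant (the binomial coefficient cancels in ratios)."""
    return p ** heads * (1 - p) ** (flips - heads)

def likelihood_ratio(heads, flips, p1, p2):
    """How strongly the data support P(heads) = p1 over P(heads) = p2."""
    return binomial_kernel(heads, flips, p1) / binomial_kernel(heads, flips, p2)

def posterior_odds(prior_odds, ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * ratio

# Hypothetical data: 7 heads in 10 tosses; H1 says p = 0.7, H2 says p = 0.5.
ratio = likelihood_ratio(7, 10, 0.7, 0.5)
print(f"likelihood ratio (H1 vs H2): {ratio:.2f}")

# The same data combined with two different (assumed) prior odds for H1 over H2.
for prior in (1.0, 0.1):
    print(f"prior odds {prior}: posterior odds {posterior_odds(prior, ratio):.2f}")
```

The likelihood ratio is the same whoever computes it; the posterior odds favor H1 or H2 depending on the prior brought to the data, which is where background considerations of the kind listed above enter for a Bayesian.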
Fourth, a number of Bayesian readers have brought attention to the development of prior probability distributions that do properly represent ignorance: uninformative priors. This is the first of the ways forward for Bayesianism that I mentioned in my original post (“First, try really hard to find an objective way of portraying ignorance.”). I should mention in this regard that someone who did a lot of good work in this area was Sir Harold Jeffreys, whose Theory of Probability is essential, and which I probably should have included in my “Further Reading” list (I was trying not to make the list too long). His book is not, as the title would suggest, an exposition of the mathematical theory of probability, but an attempt to build a complete account of scientific inference from philosophical and statistical fundamentals. Jeffreys (a Bayesian) was well-regarded by all, including Fisher (a Fisherian, who despite, or perhaps because of, his brilliance got along with scarcely anyone). These priors have left some unconvinced, but it’s certainly a worthy avenue of pursuit.
Finally, a number of readers have raised a more philosophical objection to Bayesianism, one which I had included a brief mention of in a draft of my OP, but deleted in the interest of brevity and simplicity. The objection is that scientific hypotheses are not, in general, the sorts of things that have probabilities attached to them. Along with the above-mentioned readers, we may question whether scientific hypotheses may usefully be regarded as drawn from an urn full of hypotheses, some proportion of which are true. As Edwards (1992) put it, “I believe that the axioms of probability are not relevant to the measurement of the truth of propositions unless the propositions may be regarded as having been generated by a chance set-up.” Or, as reader Keith Douglas put it, “no randomness, no probability”. Even in the cases where we do have a valid objective prior probability, as in the medical diagnosis case, it’s not so much that I’m saying the patient has a 16% chance of having the disease (he either does or doesn’t have it), but rather that individuals drawn at random from the same statistical population in which the patient is situated (i.e. from the same general population and showing positive on this test) would have the disease 16% of the time.
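The frequency reading of that 16% can be sketched by simple counting. The prevalence, sensitivity, and false-positive rate below are assumptions chosen only because they give a figure near 16%; they are not necessarily the numbers used in the original post.

```python
# Hypothetical population and test characteristics (assumed for illustration).
population = 100_000
prevalence = 0.01           # fraction of the population with the disease
sensitivity = 0.95          # P(test positive | diseased)
false_positive_rate = 0.05  # P(test positive | healthy)

# Count how many people fall into each group.
diseased = round(population * prevalence)
healthy = population - diseased
true_positives = round(diseased * sensitivity)
false_positives = round(healthy * false_positive_rate)

# Among everyone who tests positive, what fraction actually has the disease?
all_positives = true_positives + false_positives
share_diseased = true_positives / all_positives

print(f"{true_positives} of {all_positives} positive testers are diseased "
      f"({share_diseased:.1%})")
```

The figure is a property of the group of positive testers, not of any one patient: each individual in that group either has the disease or does not.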
If we can array our commitments to schools of inference along an axis from strict to opportunistic, I am nearer the opportunistic pole, but do find the likelihood approach the most promising, and most worth developing further towards resolving its anomalies and problems (which all approaches, to greater or lesser degrees, suffer from).
Edwards, A.W.F. 1992. Likelihood. Expanded edition. Johns Hopkins University Press, Baltimore.
Jeffreys, H. 1961. Theory of Probability. 3rd edition. Oxford University Press, Oxford.
Schick, T. and L. Vaughn. 2014. How to Think About Weird Things: Critical Thinking for a New Age. 7th ed. McGraw-Hill, New York.