Further thoughts on the Rev. Bayes

April 19, 2015 • 11:37 am

by Greg Mayer

I (and Jerry) have been quite pleased by the reaction to my post on “Why I am not a Bayesian”. Despite being “wonkish”, it has generated quite a bit of interesting, high-level, and, I hope, productive discussion. It’s been, as Diane G. put it, “like a graduate seminar.” I’ve made a few forays into the comments myself, but have not responded to all, or even the most interesting, comments: I had a student’s doctoral dissertation defense to attend to the day the post went up, plus I’m not sure that having the writer weigh in on every point is the best way to advance the discussion. But I do have a few general observations to make, and do so here.

Apparently not the Rev. Bayes.

First, I did not lay out in my post what the likelihood approach was, only giving references to key literature. No approach is without difficulties and conundrums, and I’m looking forward to finding the reader-recommended paper “Why I am not a likelihoodist.”  Among the most significant problems facing a likelihood approach are those of ‘nuisance’ parameters (probability models often include quantities that must be estimated in order to use the model, but in which you’re not really interested; there are Bayesian ways of dealing with these that are quite attractive), and of how to incorporate model simplicity into inference. My own view of statistical inference is that we are torn between two desiderata: to find a model that fits the data, yet retains sufficient generality to be applicable to a wider range of phenomena than just the data observed. It is always possible to have a model of perfect fit by simply having the model restate the data. In the limit, you could have the hypothesis that an omnipotent power has arranged all phenomena always and everywhere to be exactly as it wants, which hypothesis would have a likelihood of one (the highest it can be). But such an hypothesis contains within it an exact description of all phenomena always and everywhere, and thus has minimal generality or simplicity. There are various suggestions on how to make the tradeoff between fit (maximizing the likelihood of the model) and simplicity (minimizing the number of parameters in the model),  and I don’t have the solution as to how to do it (the Akaike Information Criterion is an increasingly popular approach to doing so).
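To make the fit-versus-simplicity tradeoff concrete, here is a minimal Python sketch of the Akaike Information Criterion, AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The coin-toss data (12 heads in 20 tosses) and the three candidate models are hypothetical, chosen only for illustration; the third model simply restates the data, so it has a likelihood of one but as many parameters as observations.

```python
import math

def aic(num_params, log_likelihood):
    """Akaike Information Criterion: lower values are preferred."""
    return 2 * num_params - 2 * log_likelihood

def bernoulli_log_likelihood(p, heads, tails):
    """Log-likelihood of one specific sequence of tosses with P(heads) = p."""
    return heads * math.log(p) + tails * math.log(1 - p)

# Hypothetical data: 12 heads and 8 tails in 20 tosses.
heads, tails = 12, 8
n = heads + tails

models = {
    # Fair coin: p is fixed in advance, so nothing is estimated.
    "fair coin (k=0)": aic(0, bernoulli_log_likelihood(0.5, heads, tails)),
    # One free parameter, set to its maximum-likelihood value heads/n.
    "estimated p (k=1)": aic(1, bernoulli_log_likelihood(heads / n, heads, tails)),
    # "Restate the data": one parameter per toss, likelihood 1, log-likelihood 0.
    "restates data (k=n)": aic(n, 0.0),
}

for name, score in sorted(models.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.2f}")
```

With these made-up numbers the perfectly fitting model scores worst, because the penalty for its 20 parameters outweighs its likelihood of one.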

Second, there are several approaches to statistical inference (not just two, or even just one, as some have said), and they differ in their logical basis and what inferences they think possible or desirable. (I mentioned likelihood, Fisherian, Neyman-Pearson, Bayesian, and textbook hodge-podge approaches in my post, and that’s not exhaustive.) But it is nonetheless the case that the various approaches often arrive at the same general (and sometimes specific) conclusion in any particular inferential analysis. Discussion often centers on cases where they differ, but this shouldn’t obscure the at times broad agreement among them. As Tony Edwards, one of the chief promoters of likelihood, has noted, the usual procedures usually lead to reasonable results; otherwise we would have been forced to give up on them and reform statistical inference long ago. One of the remarks I did make in the comments is that most scientists are pragmatists, and they use the inferential methods that are available to them, address the questions they are interested in, and give reasonable results, without too much concern for what’s going on “under the hood” of the method. So, few scientists are strict Bayesians, Fisherians, or whatever; they are opportunistic Bayesians, Fisherians, or whatever.

Third, one of the differences between Bayesian and likelihood approaches that I would reiterate is that Bayesianism is more ambitious: it wants to supply a quantitative answer (a probability) to the question “What should I believe?” (or accept). Likelihoodism is concerned with “What do the data say?”, which is a less complete question, which leads to less complete answers. It’s not that likelihoodists (or Fisherians) don’t think the further questions are interesting, but just that they don’t think they can be answered in an algorithmic fashion leading to a numerical result (unless, of course, there is a valid objective prior). Once you have a likelihood result, further considerations enter into our inferential reasoning, such as

There is good reason to doubt a proposition if it conflicts with other propositions we have good reason to believe; and

The more background information a proposition conflicts with, the more reason there is to doubt it.

(from a list I posted of principles of scientific reasoning taken from How to Think about Weird Things). Bayesians turn these considerations into a prior probability; non-Bayesians don’t.

Fourth, a number of Bayesian readers have brought attention to the development of prior probability distributions that do properly represent ignorance: uninformative priors. This is the first of the ways forward for Bayesianism that I mentioned in my original post (“First, try really hard to find an objective way of portraying ignorance.”). I should mention in this regard that someone who did a lot of good work in this area was Sir Harold Jeffreys, whose Theory of Probability is essential, and which I probably should have included in my “Further Reading” list (I was trying not to make the list too long). His book is not, as the title would suggest, an exposition of the mathematical theory of probability, but an attempt to build a complete account of scientific inference from philosophical and statistical fundamentals. Jeffreys (a Bayesian) was well-regarded by all, including Fisher (a Fisherian, who despite, or perhaps because of, his brilliance got along with scarcely anyone). These priors have left some unconvinced, but it’s certainly a worthy avenue of pursuit.

Finally, a number of readers have raised a more philosophical objection to Bayesianism, one which I had included a brief mention of in a draft of my OP, but deleted in the interest of brevity and simplicity. The objection is that scientific hypotheses are not, in general, the sorts of things that have probabilities attached to them. Along with the above-mentioned readers, we may question whether scientific hypotheses may usefully be regarded as drawn from an urn full of hypotheses, some proportion of which are true. As Edwards (1992) put it, “I believe that the axioms of probability are not relevant to the measurement of the truth of propositions unless the propositions may be regarded as having been generated by a chance set-up.” As reader Keith Douglas put it, “no randomness, no probability”. Even in the cases where we do have a valid objective prior probability, as in the medical diagnosis case, it’s not so much that I’m saying the patient has a 16% chance of having the disease (he either does or doesn’t have it), but rather that individuals drawn at random from the same statistical population in which the patient is situated (i.e. from the same general population and showing positive on this test) would have the disease 16% of the time.

If we can array our commitments to schools of inference along an axis from strict to opportunistic, I am nearer the opportunistic pole, but do find the likelihood approach the most promising, and most worth developing further towards resolving its anomalies and problems (which all approaches, to greater or lesser degrees, suffer from).

Edwards, A.W.F. 1992. Likelihood. Expanded edition. Johns Hopkins University Press, Baltimore.

Jeffreys, H. 1961. Theory of Probability. 3rd edition. Oxford University Press, Oxford.

Schick, T. and L. Vaughn. 2014. How to Think About Weird Things: Critical Thinking for a New Age. 7th edition. McGraw-Hill, New York.

54 thoughts on “Further thoughts on the Rev. Bayes”

    1. I imagine he has. So have I.

      I have an awful lot of respect for Richard’s academic work…but he lays the snark and spite on so thick in his non-academic writing that it’s hard to get past it.

      I think what Greg wrote here is probably as much of a response to Richard’s rant as it deserves. Greg at least seems to have addressed any significant points Richard made that were worth addressing.


      1. I’ve read a fair bit of Carrier on Christianity and found it informative. But is he capable of respectful disagreement? Does he suffer from Asperger’s?

      2. I agree Carrier can be very arrogant, but in my estimation he’s got the upper hand in this argument. I don’t think any of Greg’s responses (are they responses to Carrier?) are effective rebuttals. Particularly I think Carrier is right regarding what Greg says here: ““What should I believe?” (or accept). Likelihoodism is concerned with “What do the data say?”, which is a less complete question, which leads to less complete answers.”

        These are the same question. What does it mean to ask “what does the data say?” if not “what hypothesis does the data support? (and therefore this is the hypothesis the data tells you that you should believe)”. When we ask what the data says we are effectively asking what explanation (as in hypothesis) the data best supports. And since we base our beliefs on evidence (data that supports a hypothesis over and above competing hypotheses), we are asking what we should believe in light of the data.

        Forget Carrier’s condescending blather and focus on the argument. In my (admittedly limited) understanding Carrier is right.

        1. The two questions are not the same. “What does the data say?” does indeed mean “What hypothesis does the data support?”. (In fact ‘support’ is used as a technical term, meaning the natural logarithm of the likelihood.) But this is quite different from “What should I believe?”. The medical diagnosis example is a good case in point. The data quite strongly support the hypothesis that the patient has the disease. That’s what the data say. But in this case, with a valid objective prior, we see that the odds are more than 4 to one against him having the disease. So our rational belief would be contrary to what the data say. In cases where we don’t have a prior probability distribution on which we all agree, we can agree about what the data say based on the likelihoods, and then come to whatever belief our subjective priors lead us to.
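A minimal Python sketch of the distinction drawn in the reply above: the likelihood ratio expresses what the data support, while the posterior combines it with a prior. The prevalence, sensitivity, and false-positive rate below are assumptions, picked only because they give a posterior near the 16% mentioned in the post; they are not necessarily the figures from the original example.

```python
# Hypothetical inputs (assumptions, not taken from the original post).
prevalence = 0.01           # prior P(disease) in the population
sensitivity = 0.95          # P(positive test | disease)
false_positive_rate = 0.05  # P(positive test | no disease)

# What the data say: likelihood ratio for "disease" versus "no disease".
likelihood_ratio = sensitivity / false_positive_rate

# What to believe: Bayes' theorem combines the likelihoods with the prior.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive
odds_against = (1 - posterior) / posterior

print(f"likelihood ratio in favor of disease: {likelihood_ratio:.0f} to 1")
print(f"posterior probability of disease:     {posterior:.3f}")
print(f"posterior odds against disease:       {odds_against:.1f} to 1")
```

Under these assumed inputs the data favor the disease hypothesis, yet the posterior odds run against it, which is the contrast the reply describes.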


          1. > “come to whatever belief our subjective priors lead us to.”

            This is a nice-sounding dismissal, but it doesn’t actually address the crux of Carrier’s point, which is that “subjective” is not the same thing as “arbitrary.” As I’ve suggested in other comments, it seems that the solution is to see (and test) the effects of various **reasonable** priors, not to ignore them altogether. Your medical case is a good example (again): if you and I disagree about the background/base rate for the prior (say I take family history into account and you don’t), then our answers will differ. However, if the difference in our priors is minimal (particularly in comparison to the weight of the evidence), we are likely to come to similar conclusions, both of which will be more formal than a likelihood analysis of “further considerations” and more accurate than a null hypothesis test.

      3. Your comment made me curious enough to have a look at his post. Jeez, why be an asshole over statistics?

      4. Carrier may be a bit snarky, but I was able to follow his argument better. A direct response to his arguments would be useful to better understand Bayes’ theorem and its limits.


      5. Hi Ben,

        I dislike Carrier’s style and think that it is counter-productive (and that he reduces the impact of his writings because of it).

        I’m also pretty close to knee-jerk opposition to anything posted on FtB these days!

        But, I have to say that on this issue I think he’s largely in the right.

        At least, there is nothing wrong with Bayesian analysis. Whether Bayesian analysis or some other method of analysis is the more useful in a particular situation depends on the information one has. Thus one may often use other methods, but Bayesian analysis is indeed sound.

  1. I agree with many of the commenters in the previous thread who seemed to argue for a pragmatic approach using whatever tools were appropriate. I have been gradually turning to Bayesian methods over the last couple of years, especially after I realised that I had been unknowingly using inverse probability for many years in failure rate estimation. My eyes were really opened, however, quite recently when I worked through Sean Eddy’s example: http://www.nature.com/nbt/journal/v22/n9/full/nbt0904-1177.html
    This exemplifies the errors caused by using a point estimate of probability with small samples, for which such an estimate can be badly wrong. I found a frequentist solution to this, by considering actual possible probabilities of 0.01, 0.02…0.99, normalising to 1, then weighting the odds by these probabilities and summing them. Barring tiny rounding errors, I got the same answer as Eddy’s Bayesian calculation. Then I realised that my calculation was effectively the same as his, except that I used summation, whereas Eddy used integration.

    I think that a complete statistician needs to appreciate all approaches, in order to avoid the undeniable pitfalls in each.
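The grid summation described in the comment above can be sketched in a few lines of Python. The game setup is an assumption based on Eddy's table game rather than anything spelled out in this thread: Alice wins each point with an unknown probability p, she leads 5-3, and the first player to six points wins. The variable names are illustrative.

```python
from fractions import Fraction

# Assumed setup (not spelled out in this thread): Alice wins each point with
# unknown probability p, leads 5-3, and the first player to 6 points wins,
# so Bob needs 3 points in a row.
grid = [i / 100 for i in range(1, 100)]  # candidate values 0.01, 0.02, ..., 0.99

# Weight each candidate p by how well it explains a 5-3 score, then normalise to 1.
raw_weights = [p**5 * (1 - p)**3 for p in grid]
total = sum(raw_weights)
weights = [w / total for w in raw_weights]

# Probability that Bob wins the next three points, averaged over the grid.
bob_wins_grid = sum(w * (1 - p)**3 for w, p in zip(weights, grid))

# Exact value from integration: B(6, 7) / B(6, 4) = 1/11.
bob_wins_exact = Fraction(1, 11)

print(f"grid summation: {bob_wins_grid:.5f}")
print(f"integration:    {float(bob_wins_exact):.5f}")
print(f"odds for Alice: {(1 - bob_wins_grid) / bob_wins_grid:.2f} to 1")
```

The sum over the grid is a discrete approximation of the integral over p, which is why the two routes agree apart from small numerical differences.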

    1. There’s a line at the end of that (excellent!) article that really caught my eye:

      Indeed, it is easy to verify that the correct answer to the table game problem is 10:1—write a computer program to simulate the table game many times, and count the frequency with which Alice versus Bob ends up winning after a match reaches a 5-3 score in Alice’s favor.

      It seems to me that that represents a superlative opportunity for statisticians, especially novice ones, to check their own work: build a computer model, let it run overnight, and make sure that the actual results it comes up with match what you think you’re predicting.
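A rough sketch of the kind of simulation the article suggests, in Python. The rules are assumptions taken from Eddy's table game and are not stated in this thread: Alice's chance of winning a point is drawn uniformly at random and then fixed for the match, and the first player to six points wins. The function name and trial count are illustrative.

```python
import random

def simulate_table_game(num_games, seed=0):
    """Count Bob's wins among simulated games in which Alice leads 5-3."""
    rng = random.Random(seed)
    matches_at_5_3 = 0
    bob_wins = 0
    for _ in range(num_games):
        p = rng.random()  # Alice's chance of winning a point, unknown to the players
        alice = sum(rng.random() < p for _ in range(8))  # first 8 points
        if alice != 5:
            continue  # keep only games that reach 5-3 in Alice's favor
        matches_at_5_3 += 1
        # Bob must win three straight points; any point for Alice ends the game.
        if all(rng.random() >= p for _ in range(3)):
            bob_wins += 1
    return bob_wins, matches_at_5_3

bob_wins, matches = simulate_table_game(200_000)
alice_wins = matches - bob_wins
print(f"{matches} games reached 5-3; Alice:Bob = {alice_wins}:{bob_wins}")
print(f"estimated odds for Alice: {alice_wins / bob_wins:.1f} to 1")
```

The estimate fluctuates from run to run, so it should land near the article's 10:1 rather than exactly on it.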

      For obvious reasons, brute force approaches have long been disdained in mathematics. But we’re now at a point in time where such approaches are relatively inexpensive and fast. The infamous Monty Hall problem, for example, is something that a freshman in a non-major introduction-to-programming class could easily code. It might not be an elegant solution…but it’s still the right solution. And, as a bonus, you’ve now got an exhaustive set of data you can dig through to try to get a handle on why that really is the answer.

      …come to think of it, there’re already more characters in this post than you’d need for code to simulate the Monty Hall problem….
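For what it's worth, a short Python sketch of such a Monty Hall simulation. It assumes the standard rules: the host knows where the car is, always opens a goat door other than the contestant's pick, and chooses at random when two such doors are available. The function name and trial count are illustrative.

```python
import random

def play_monty_hall(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
trials = 100_000
stay = sum(play_monty_hall(False, rng) for _ in range(trials)) / trials
swap = sum(play_monty_hall(True, rng) for _ in range(trials)) / trials
print(f"win rate when staying:   {stay:.3f}")
print(f"win rate when switching: {swap:.3f}")
```

Staying should win about one time in three and switching about two times in three, up to simulation noise.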


      1. You might be interested in another fine paper, by Edwin Jaynes, proposing a solution to Bertrand’s paradox:

        The punch line is the confirmation of Jaynes’s solution by computer simulation.

  2. Given that both Sigmund Freud and Karl Marx had at different times declared themselves (respectively) “not a Freudian” and “not a Marxist”, I am left wondering if Bayes would regard himself as a Bayesian. 🙂

    1. You’d have to teach him a lot of stuff to get him caught up.

      I should (in light of my earlier remarks) see what *he* said the domain of the probability function is. Historically, most people did in fact think of it as events until fairly recently, as far as I can tell.

      The late Horacio Arlo-Costa at CMU suggested to me that there are in fact two traditions which are conflated – it was, supposedly, the measure theorists (Kolmogorov, Ville?) who pushed the frequency and propensity end (I think), for example.

  3. If we can array our commitments to schools of inference along an axis from strict to opportunistic, I am nearer the opportunistic pole

    I spent 1/4 of a year of my life studying “sadistics” to come to the same conclusion.
    As a working scientist, you have the data available to you, and you are trying to ADD new data. That is the nature of the game.
    If you are in an extremely information-poor area, then the use of Bayesian techniques can be useful to direct your experimental design to get maximal reduction of uncertainty. But by the time you’ve acquired additional data, you’re probably into the realm where frequentist approaches are better.
    Different courses, different horses.
    I’ll re-iterate the importance of experimental design. It is startling, depressing even, how many people “propose” experiments that would not adequately or efficiently address the question in hand. Very early in my “internet career” I recall someone asking me in all seriousness why it wasn’t obligatory for all oil wells to conduct continuous coring across the Cretaceous-Palaeocene boundary. Meanwhile Gerta Keller was doing field work examining appropriate outcrops in the Caribbean (nice work if you can get it!) which was vastly more informative.
    Statistics, and statistical worldviews, are really important to the task of determining what “is”, but by the time you get to actually pushing rock fragments (fly fragments?) around under the microscope, they appear very distant.

  4. Bayesian inference is very attractive: it answers the very question you would like to answer, “What is the probability that this hypothesis is true?” And no one doubts that Bayes’ Theorem is valid. It is provable in a few lines of algebra.

    The real problem is, where did you get the prior probabilities? There are two issues:

    1. Is the event “this hypothesis is true” one which can be thought of as having a probability? Is it really sensible to talk of the probability that the sun will come up tomorrow morning? Or the probability that string theory is correct?

    2. If you have a prior probability for hypothesis H, and you report to your readers the resulting posterior probability, are the prior probabilities that they would have going to be close enough to yours to make this calculation worth reporting?

    As simple and powerful as Bayesian methods are, these are reasons to pause and reflect.

    1. I think this issue is what causes Bayesian statistics to be misused the most. If you pick your priors properly and obfuscate how you get there, you can make even unlikely scenarios look almost certain or very probable. You may not even realise you have done it. I am not an expert and have only a nodding acquaintance with this, but I would like to know how priors can be picked and made to work where they are possibly unknowable. I am probably exposing extreme ignorance here, but I do have trouble understanding this.

    2. Joe, I find your questions sort of odd because it is not clear to me that they highlight any particular issue with Bayesian reasoning.

      1. Is the event “this hypothesis is true” one which can be thought of as having a probability? Is it really sensible to talk of the probability that the sun will come up tomorrow morning? Or the probability that string theory is correct?

      But surely that question is misplaced if we can agree that Bayes’ theorem doesn’t tell you whether the thing is really true, it tells you what you should believe given your evidence?

      2. If you have a prior probability for hypothesis H, and you report to your readers the resulting posterior probability, are the prior probabilities that they would have going to be close enough to yours to make this calculation worth reporting?

      If they aren’t, you can discuss how you arrived at your priors and both parties can modify their priors accordingly if they receive new information. Of course this presumes people are rational and prepared to adjust their priors in light of new information and that they can reach some sort of agreement on what the priors should be. But there’s no method that is immune to deliberate misinformation, human irrationality and so on.

      I don’t really see the problem here.

    3. In my view this is still missing the logically prior (:)) question …

      *Can hypotheses be probable*, in the relevant sense, or is it an equivocation?

      As further example, John Locke (!!!) writes that *arguments* are probable (or not). This view as far as I know has no modern defenders per se.

      1. If hypotheses cannot be probable, then what is the foundation for any inferential statistics? Null hypothesis testing rejects or fails to reject based on (misplaced) probability. Confidence intervals are explicitly about probability distributions. Likelihoods give a ratio used to assess the probability of two hypotheses (though not directly, or apparently formally, given this discussion).

        The critique is not unique to Bayesian inference, but is rather a larger epistemological question.

      2. In Locke’s day, ‘probable’ could mean something like ‘able to withstand close analysis; logically valid’.

  5. Another nice post. Thanks, Greg.

    I don’t have much to add, except I do want to say a bit about this piece:

    “Once you have a likelihood result, further considerations enter into our inferential reasoning…. Bayesians turn these considerations into a prior probability; non-Bayesians don’t.”

    Ultimately, unless you’re a particular type of statistician or mathematician, you work on a problem to render a *decision*. Yes, prescribe this drug. No, reject this association. The “further considerations” you mention are indispensable to a sound decision making process. Broadly speaking then, Bayesian methods formalize (i.e. mathematicize) some of these further considerations more than non-Bayesian methods.

    But every school of inferential thought is ultimately still concerned with sound decision making, and I remain unconvinced that one school of thought is superior to another toward this end. I can laugh with everyone else at the idea that applied frequentists are too often p-value reading machines, immortalized here xkcd.com/1132/, but the laughter is too often aimed at a straw man. Statistical decision making cannot be reduced to a purely mathematical (or, more generally, an algorithmic) process. Maybe in time it will be (I doubt it), but we are nowhere near that point presently.

    I appreciate that Bayesian methods often formalize a bit more of the decision-making process than do non-Bayesian ones. But Bayesian methods are not necessary to make a sound decision. And they are certainly not sufficient (something some analysts tend to forget).

  6. > The more background information a proposition conflicts with, the more reason there is to doubt it.

    How is this anything other than a prior? The difference seems to simply be whether or not you put a number to that thinking.

    We can all laugh and agree that the xkcd comic ( https://xkcd.com/1132/ ) is greatly over-simplified, but I still think it is a valid concern. Under a likelihood approach, you get a ratio of 35:1 in favor of the sun having gone nova. It is only the “Further Considerations” that you mention that make you doubt that the sun has gone nova; the data themselves favor that hypothesis.

    I agree that Bayesian papers that simply state a prior by fiat, with no explanation or consideration, are suspect. The same thinking process of your further considerations should go into forming, and defending, a prior. In cases where there is legitimate disagreement over the prior, it is reasonable to present the results under two priors, e.g.: “We believe A with probability X% under this favorable prior and Y% under this more skeptical prior; further discussion, debate, and data may help to illuminate this issue.” (The same applies to two models of an uninformative prior.) In the likelihood framework, debate over further consideration is just as likely, but explicitly using two priors allows seeing how different X and Y actually are (and allows testing to see how well they model the data).
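A small Python sketch of reporting a result under two priors, using the xkcd detector as the example. The likelihood ratio follows from the comic's setup (the detector lies only when two dice both come up six); the two prior probabilities are hypothetical and serve only to show the mechanics.

```python
def posterior_probability(prior, likelihood_ratio):
    """Posterior P(H) from a prior P(H) and the likelihood ratio for H versus not-H."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# The detector lies only when two dice both come up six (probability 1/36),
# so a "yes" is 35 times more probable if the sun has exploded than if it has not.
likelihood_ratio = (35 / 36) / (1 / 36)

# Two hypothetical priors for "the sun has just gone nova" (illustrative only).
priors = {"favorable": 1e-6, "skeptical": 1e-12}

posteriors = {name: posterior_probability(p, likelihood_ratio) for name, p in priors.items()}
for name, value in posteriors.items():
    print(f"{name} prior {priors[name]:.0e} -> posterior {value:.2e}")
```

With these assumed priors the posterior stays tiny either way, so the disagreement over the prior would not change the conclusion in this case.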

    1. I wonder if there might be a clean way of separating the two? That is, start with a straight-up likelihood analysis and, when that’s done, hand the results off to whomever (possibly yourself) is interested in incorporating the effects of the prior required of a Bayesian analysis?


      1. That is pretty much how it is done (or should be) in Forensics. The scientist deals strictly with the likelihood ratio of the two hypotheses (e.g. guilty, innocent).

        It is often natural to use odds rather than probability, so you can then multiply your prior odds (e.g. of guilt) by the likelihood ratio to get your posterior odds, and each juror can do this individually.
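In symbols, the division of labor described above is the odds form of Bayes' theorem. The notation is an assumption of this sketch rather than the commenter's: $H_1$ and $H_2$ are the two hypotheses (e.g. guilty and innocent) and $E$ is the evidence.

```latex
\frac{P(H_1 \mid E)}{P(H_2 \mid E)}
  = \frac{P(E \mid H_1)}{P(E \mid H_2)} \times \frac{P(H_1)}{P(H_2)}
```

The first factor on the right is the likelihood ratio supplied by the scientist, and the second is each juror's own prior odds. As a purely illustrative case, prior odds of 1 to 100 combined with a likelihood ratio of 1000 give posterior odds of 10 to 1.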

      2. Teddy Seidenfeld (at CMU) has done some work comparing the results of various statistical techniques, now that one has the CPU time to do them. He can tell you (for example) how long it might take to (in his view) “do it right” vs. not. (He’s a Bayesian, but takes seriously the idea that it is computationally expensive, for example.)

    2. I thought something similar, and in a way we are all using intuitive Bayesian reasoning, and we would certainly do so in the case of the xkcd comic. But there is still an important difference between somebody saying, “stop, actually we know enough about the sun to be certain that it cannot spontaneously explode, so this is not a case for statistics in the first place” and a Bayesian taking the question as seriously as the frequentist in the comic, sitting down and entering a prior probability of 0% into Bayes’ equation to see what the answer is.

      “I am a Bayesian” needs to have a slightly stronger meaning than “everybody uses non-quantitative Bayesian logic as part of their intuitive reasoning”, not least because everybody’s intuitive reasoning also uses a myriad of other approaches such as model fitting and parsimony analysis.

    3. Some of the points you bring up here overlap with my comment above, so I feel a bit inclined to add my two cents.

      It is a sad state of affairs that some practitioners actually do make decisions like the buffoon in the xkcd cartoon, but that’s a problem with poor practice, not with the method or mode of thought. The fact is that we don’t (or rather *shouldn’t*) make decisions in a vacuum. Why should we treat data like they live in some sort of vacuum?

      Running with the xkcd example, there is nothing wrong with the literal “analysis” being done, in so far as the mathematics is correct. The problem comes when a decision is made based only on that particular calculation. This is not good practice, and also not the intent of frequentist inference.

      What you seem to be proposing in your last paragraph is a kind of sensitivity analysis to test the robustness of your statistical decisions to your choice of prior. That’s a wise move, in my opinion. But, speaking from my own experience, I am generally a lot less concerned about the robustness of a prior than I am about the validity of the underlying model, particularly its structural assumptions. In practice, I would like to see a lot more sensitivity analysis applied to these types of considerations (e.g. are effects really additive? Errors uncorrelated? Unmeasured factors not an issue?), rather than to whether or not a particular prior is optimal. In many models, there are a bunch of priors floating around, not just one. But there is usually only one (or at least a relatively small number of) model(s).

      1. I certainly agree that a test of the robustness of the prior is rarely as critical as tests of other aspects of the modelling. However, in the case where there is legitimate disagreement over the prior, particularly in the case when the goal is an uninformative prior, the appropriate solution seems to be explicitly defending a set of reasonable priors and then testing their effect and ability to describe the data. It does not seem reasonable to say: “I don’t know which of these N things is best, so I will ignore all of them” (Where N is the number of reasonable priors).

  7. If you acknowledge that a priori considerations must be made (such as model simplicity), then why not just place it in the prior, which we know is an entirely valid approach? Do you have an alternative approach that has as strong a mathematical motivation? Your concern with priors seems to be that we don’t always know how to choose the “right” prior (maybe can’t), but then if your integration of other considerations is even less based in mathematics, that hardly seems like a solution to the problem. And at least with Bayes, the integration of other considerations and any assumptions is transparent and open to challenge. Perhaps you can elucidate.

    1. One has to *first* answer the objection that non-events are in the domain of the probability function, and how they connect to events.

      Glymour, for example, doesn’t as far as I know do the first at all. Seidenfeld (see above) argues through the Dutch Book argument, but some of us don’t find that persuasive.

      A review (~14 years ago) suggested that the second question above has not been settled even by the people who think it is possible.

  8. On the idea that levels of uncertainty cannot always be represented by a number, don’t infinitesimals also support this?

    The probability that a bell which strikes at a random instant between ten and twelve will strike at exactly eleven o’clock is, if we’re forced to assign a number to it, zero. But this runs against what looks like a plausible constraint: zero should be reserved for things that aren’t possible, or things we know beyond all doubt to be false. The bell striking at exactly eleven isn’t impossible; for some sense of “might”, it might happen. But we don’t have the maths required to represent this “might” with any number other than zero.

    1. There’s some work using nonstandard analysis in probability theory partially for that reason, actually. There’s also some (stemming from Popper, IIRC) which allows P(A|B) to be defined even when P(B)=0 as well.

      1. Hah! Glad to hear. I wish them well, and I like the sound of that stuff, although my knowledge of maths is minuscule.

  9. My own view of statistical inference is that we are torn between two desiderata: to find a model that fits the data, yet retains sufficient generality to be applicable to a wider range of phenomena than just the data observed. It is always possible to have a model of perfect fit by simply having the model restate the data.

    The fumbling analogy I make is to image processing.

    One can display a bit map (restate the data). But it is also always possible to do image compression (make a model) with a fair image after decompression (prediction). However, since nature isn’t chaotic, a further option is to use vector graphics to reimage in naivist fashion. This is abstracting laws (recurring, useful, image elements) in an explicit fashion.

    My impression is that if Bayesianism was as powerful as claimed, it could do ‘image compression’ in ‘vector graphics’ fashion. And in so doing become an omnipotent description, which simplifies but predicts nothing. (Because all phenomena would simply be updated posteriors of uninformative priors, despite resulting in various ‘vector elements’ ad hoc laws.)

    But we don’t see that.

    1. I’m not sure that I follow your analogy. For one, an updated posterior *is* predictive. It takes all of the knowledge gleaned to that point and says: “This is what I believe about this phenomenon.” You can then use that belief to predict what is likely to come next (and update your beliefs based on that next trial as well).

      Also, all inference starts from uninformative priors. A null hypothesis test starts with the assumption of the null, which is usually some boringly generic value (0, 0.5, no difference, etc.). When comparing new data to some old standard, the null can be set to whatever (reasonable, not arbitrary) value we want.

      The likelihood approach is no different — it just refuses to formalize the prior to a number. However, as Carrier points out in his piece (referenced in another comment), the likelihood method implicitly uses priors when it limits which hypotheses to even calculate a likelihood for. Bayesian approaches just make all of these approaches explicit — and testable.

      1. “an updated posterior *is* predictive”.

        Agreed. I think my analogy says so.

        Is the problem that I don’t think a jumbled set of ad hoc hypotheses can constitute a theory except, by chance, in rare cases?

        I think we can all agree that Bayesian methods have not, and likely cannot, replace all other statistics by a “Statistic Of Everything” (a SOE, as it were).

        “The likelihood approach is no different — it just refuses to formalize the prior to a number.”

        I don’t understand. In Greg’s earlier post he described his likelihood method as comparing two hypotheses against each other.

        And the rest of that part seems to be a comment on something I wasn’t discussing. (The likelihood method vs bayesian methods.) On that I would refer to my comment in the previous thread, how I would use the different methods.

        1. Sorry about that, I must have misinterpreted what you were implying when you said “simplifies but predicts nothing.” My response was that Bayesian updating explicitly allows prediction, but perhaps we are not in disagreement.

          I am not sure what you mean by a “Statistic of Everything.” Of course there are good tools suited to every task. However, I think the discussion has largely been along the lines of whether or not Bayesian approaches are philosophically defensible without an informative prior, and if not then what should be used in their place. (I think they are good methods in those cases.)

          Likelihood approaches, including as Greg describes them, do compare two hypotheses. However, they only ask about the P(data|Hypothesis) for each of the hypotheses they are interested in (i.e. have not implicitly assigned vanishingly low priors to). The ratio of the likelihoods then tells you how relatively likely the data were given each hypothesis. In cases with clear, informative priors, Greg agrees with (and to my understanding applies) Bayesian updating methods. His contention seems to be that whenever there is any debate about the prior (particularly when there is minimal information), we should ignore it completely (perhaps I am over-simplifying) or at least not assign a value to it, and instead use “further consideration” without assigning formal values to our priors.

          I disagree and think that uninformative priors are still useful. And, as I have said elsewhere in this thread, if there is disagreement, that is a reason to be explicit and test alternatives, not to dismiss all possibilities.

        2. Sorry, I just realized that I missed this part, which is likely more relevant than the part I replied to:

          > “Is the problem that I don’t think a jumbled set of ad hoc hypotheses can constitute a theory except, by chance, in rare cases?”

          I don’t think it is the goal of any statistical tool to jumble together hypotheses into a theory. A theory is an overarching explanation for a large number of observed phenomena. Statistics can test theories when they make hypotheses about the world, but they cannot create those hypotheses (much less theories) themselves. If we are in disagreement here, it is only over the expectations placed on a statistical toolkit. (Perhaps we are in agreement, and your statement is a larger point about the difference between Bayesians and others that I am missing.)
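
For readers following the exchange above, here is a minimal sketch of the difference between a likelihood ratio and a Bayesian posterior. It is not taken from the post or from any commenter: the coin-flip data (7 heads in 10 flips), the two hypotheses (heads probability 0.5 versus 0.7), and the function names are all assumptions chosen for illustration.

```python
from math import comb


def binomial_likelihood(p, heads, flips):
    """P(data | hypothesis): chance of `heads` heads in `flips` flips if P(heads) = p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of hypotheses: prior * likelihood, normalized."""
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical data and hypotheses (assumptions for illustration only).
heads, flips = 7, 10
lik_fair = binomial_likelihood(0.5, heads, flips)
lik_biased = binomial_likelihood(0.7, heads, flips)

# Likelihood approach: report how much better one hypothesis predicts the data.
ratio = lik_biased / lik_fair

# Bayesian approach: also commit to explicit priors over the two hypotheses.
post_equal = posterior([0.5, 0.5], [lik_fair, lik_biased])        # "uninformative" prior
post_skeptical = posterior([0.95, 0.05], [lik_fair, lik_biased])  # strong prior for fairness

print(f"likelihood ratio (biased : fair) = {ratio:.3f}")
print(f"posterior P(biased), equal priors     = {post_equal[1]:.3f}")
print(f"posterior P(biased), skeptical priors = {post_skeptical[1]:.3f}")
```

With equal priors the posterior odds equal the likelihood ratio, so in this toy case the two approaches carry the same information. They come apart only once the prior is unequal, which is the situation the disagreement in this thread is about.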

  10. Moving on, here the second point would go toward what I was fumbling at in the last thread, the quality of data and hypotheses:

    Once you have a likelihood result, further considerations enter into our inferential reasoning, such as

    There is good reason to doubt a proposition if it conflicts with other propositions we have good reason to believe; and

    The more background information a proposition conflicts with, the more reason there is to doubt it.

    The first point is more about the tension and competition between data and theories.

    we may question whether scientific hypotheses may usefully be regarded as drawn from an urn full of hypotheses, some proportion of which are true.

    I don’t think we can question that it is useful. Dawkins makes an analogy between what would be Bayesian learning and evolution precisely so. E.g. alleles (after variation has acted) are the genome’s prior hypotheses, which are updated (as is the environment) in the next generation.

    [ http://www.skeptics.com.au/articles/dawkins.htm ]

    One may question if evolution makes ‘scientific hypotheses’, but I think Jerry would say that empirical and updated hypotheses are such in his broad definition of “science”.

    One may also question if it works as well as advertised in other situations. (As I did in my previous comment.)

  11. “Inside every Non-Bayesian, there is a Bayesian struggling to get out.” — Dennis Lindley (British statistician)

  12. The defining characteristic of Bayesian inference is the routine use of probability distributions to describe our uncertainty about the state of the world.

    We all use probability distributions to describe what might happen in the world, e.g. when we throw a fair six-sided die. Bayesian inference additionally uses them to model our uncertainty about the world, e.g. of whether the die really is fair in the first place.

    In the Monty Hall problem (mentioned in the previous thread), or similar problems, the true state of the world is that the prize is behind either door A, B or C, so the “true” probabilities are 1 for the door it is behind and 0 for the other two. However, even an anti-Bayesian would consider it natural to say that, from our point of view, the probabilities are all 1/3. Hence in this case using probabilities that describe our knowledge of the world, rather than the true state of the world, is the natural way to approach the problem.

    The key feature of Bayesian inference is to use this probabilistic approach *all the time*, even in cases where we have no meaningful prior knowledge with which to construct the prior distribution.

    1. I agree with you, though I think your point should be elaborated on further, because I think the rebuttal would be that things like the Monty Hall problem are drawn from some “urn” of games (so to speak), so “randomness” still applies even though in any single game there is ground truth. I think the counter to that, though, is that unless we’re regressing to quantum phenomena, there is no such thing as “random drawings.” There is just *apparent* randomness induced by independently evolving unobservables that causally influence the observations. Ergo, each game is *not* drawn *randomly* from some kind of urn, even though it might be useful to think of it that way. So to concede that probability applies in any such case is a tacit endorsement of probability applying to uncertainty in general, which, as you point out, Bayesians consistently use.

  13. Yes, the “subjective” probabilities in the Monty Hall game can of course be justified in an entirely frequentist sense either by repeatedly playing the game or by writing a few lines of code to simulate it.

    They are relative frequency probabilities at the point before the prize is placed and the experiment starts, in much the same way as the 16% example in the penultimate paragraph of the post.

    Bayesians (or at least some) have worked on making sure that Bayesian subjective probabilities do indeed agree with relative frequencies whenever possible for more complicated cases than this.
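
The "few lines of code" mentioned above might look something like the following sketch. It is an illustration, not any commenter's code: it assumes the standard rules (the host always opens a door that is neither the contestant's pick nor the prize), and the function names, trial count, and seed are arbitrary choices.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant ends up with the prize."""
    doors = ["A", "B", "C"]
    prize = rng.choice(doors)  # the "true state of the world" for this game
    pick = rng.choice(doors)   # the contestant's initial choice
    # Assumed rule: the host opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Relative frequency of wins over many simulated games."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stay = win_rate(switch=False)
swap = win_rate(switch=True)
print(f"win rate when staying:   {stay:.3f}")
print(f"win rate when switching: {swap:.3f}")
```

Over many games the staying strategy should win close to 1/3 of the time, matching the 1/3 assigned to each door before anything is revealed, while switching should win close to 2/3. That is the frequentist justification described in the comment: the "subjective" probabilities are recovered as long-run relative frequencies.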
