More debunked or questioned psychological studies

December 31, 2022 • 9:45 am

From the site argmin gravitas, characterized as “a simulacrum standing in for Gavin Leech”, a consultant, we have a three-year-old piece that gives many examples of once widely accepted psychological claims that didn’t stand up to (or were severely weakened by) attempts at replication. There are many more than the few I give below, but I’ve chosen a couple that I’ve written about or that readers may be familiar with. On the webpage below (click to access), each weakened or refuted claim comes with a link to the original paper or book making the claim, and then a list of the studies that failed to replicate it.

I would avoid citing any of the research listed below, including that on the Dunning-Kruger effect, a staple of internet discourse characterized on Wikipedia as

. . . a cognitive bias whereby people with low ability, expertise, or experience regarding a certain type of task or area of knowledge tend to overestimate their ability or knowledge. Some researchers also include in their definition the opposite effect for high performers: their tendency to underestimate their skills.


The fields and claims:


  • No good evidence for many forms of priming, automatic behaviour change from ‘related’ (often only metaphorically related) stimuli.

  • No good evidence of anything from the Stanford prison ‘experiment’. It was not an experiment: there were ‘demand characteristics’ and scripting of the abuse, constant experimenter intervention, and faked reactions from participants; as Zimbardo concedes, they began with a complete “absence of specific hypotheses”.


  • No good evidence from the famous Milgram experiments that 65% of people will inflict pain if ordered to. The experiment was riddled with researcher degrees of freedom, going off-script, and implausible agreement between very different treatments; “only half of the people who undertook the experiment fully believed it was real and of those, 66% disobeyed the experimenter.”


  • At most weak evidence for the use of implicit bias testing for racism. Implicit bias scores poorly predict actual bias (r = 0.15). The operationalisations used to measure that predictive power are often unrelated to actual discrimination (e.g. ambiguous brain activations). Test-retest reliability is 0.44 for race, which is usually classed as “unacceptable”. This isn’t news; the original study also found very low test-criterion correlations.
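To see how little a correlation of r = 0.15 buys in practice, here is a toy simulation (all data invented; only the correlation value comes from the text above). Squaring the correlation shows that implicit-bias scores at this level would account for roughly 2% of the variance in the behaviour they are supposed to predict:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
r_true = 0.15  # predictive validity figure cited for implicit bias scores

# Hypothetical standardized test scores and a behavioral criterion,
# constructed so their population correlation is exactly r_true.
iat = rng.standard_normal(n)
behavior = r_true * iat + np.sqrt(1 - r_true**2) * rng.standard_normal(n)

r_obs = np.corrcoef(iat, behavior)[0, 1]
print(f"observed r: {r_obs:.3f}")
print(f"variance in behavior explained: {r_obs**2:.1%}")
```

The variance-explained figure (r², around 2%) is why a commenter below calls validities "in the .1 range" unusable for individual prediction.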


  • No good evidence that taking a “power pose” lowers cortisol, raises testosterone, or increases risk tolerance.

    That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

After the initial backlash, the claim was narrowed to a subjective effect: “increased feelings of power”. Even then, there is only weak evidence, and only for decreased “feelings of power” from a contractive posture. My reanalysis is here.


  • Mixed evidence for the Dunning-Kruger effect. No evidence for the “Mount Stupid” misinterpretation.
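One common statistical critique of the Dunning-Kruger effect is that the classic quartile plot can be generated by measurement noise plus regression to the mean alone, with no metacognitive deficit at all. A toy simulation (all numbers invented) shows the pattern emerging from pure noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
ability = rng.standard_normal(n)
# Test score and self-estimate are independent, equally noisy readings
# of the same underlying ability -- nobody here "overestimates" anything.
score = ability + rng.standard_normal(n)
self_est = ability + rng.standard_normal(n)

# Convert both to percentile ranks, as in the original 1999 figures.
def pct(x):
    return x.argsort().argsort() / n * 100

score_pct, self_pct = pct(score), pct(self_est)

quartiles = []
for q in range(4):
    mask = (score_pct >= 25 * q) & (score_pct < 25 * (q + 1))
    quartiles.append((score_pct[mask].mean(), self_pct[mask].mean()))
    print(f"quartile {q + 1}: actual {quartiles[-1][0]:5.1f}, "
          f"self-estimate {quartiles[-1][1]:5.1f}")
```

The bottom quartile's mean self-estimate lands well above its mean score, and the top quartile's well below, purely because noisy extremes regress toward the mean when re-measured.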


  • In general, be highly suspicious of anything that claims a permanent positive effect on adult IQ. Even in children the absolute maximum is 4–15 points for a powerful single intervention (iodine supplementation during pregnancy in deficient populations).


  • No good evidence that tailoring teaching to students’ preferred learning styles has any effect on objective measures of attainment. There are dozens of these inventories, and really you’d have to look at each. (I won’t.)


  • The effect of “nudges” (clever design of defaults) may be exaggerated in general. One big review found average effects were six times smaller than billed. (Not saying there are no big effects.)


  • No good evidence that brains contain one mind per hemisphere. The corpus callosotomy studies which purported to show “two consciousnesses” inhabiting the same brain were badly overinterpreted.


  • At most extremely weak evidence that psychiatric hospitals (of the 1970s) could not detect sane patients in the absence of deception.


  • No good evidence for precognition, undergraduates improving memory test performance by studying after the test. This one is fun because Bem’s statistical methods were “impeccable” in the sense that they were what everyone else was using. He is Patient Zero in the replication crisis, and has done us all a great service. (Heavily reliant on a flat / frequentist prior; evidence of optional stopping; forking paths analysis.)
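The "optional stopping" problem mentioned in the Bem item is easy to demonstrate: if you compute a p-value repeatedly as data accumulate and stop as soon as it dips below 0.05, the false-positive rate climbs far above the nominal 5%. A toy simulation (batch sizes and sample caps invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def peeking_trial(n_start=10, n_max=100, step=5, alpha=0.05):
    """Run a t-test after every `step` new subjects; stop at the first
    p < alpha. The null hypothesis is true: the data have mean 0."""
    data = list(rng.standard_normal(n_start))
    while len(data) <= n_max:
        _, p = stats.ttest_1samp(data, 0.0)
        if p < alpha:
            return True  # a "significant" result despite no real effect
        data.extend(rng.standard_normal(step))
    return False

n_trials = 2_000
fp_rate = sum(peeking_trial() for _ in range(n_trials)) / n_trials
print(f"false-positive rate with optional stopping: {fp_rate:.1%}")
```

Each individual test is "impeccable" in exactly Bem's sense; it is the stopping rule, invisible in the final write-up, that corrupts the error rate.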

h/t: Luana

22 thoughts on “More debunked or questioned psychological studies”

  1. The one that hurts the most to lose is Dunning-Kruger. I see anecdotal signs of it everywhere, but what to do about that?

    1. You can keep your Dunning-Kruger; we see evidence of it every day (at least I do, regularly). They didn’t actually torpedo it; it was called ‘mixed evidence’. I think they could, and should, have left that one out.
      Note that the original Dunning-Kruger study was submitted to the Ig Nobel competition (and won an Ig Nobel prize).

  2. Interesting – but it’s depressing that so many of these are widely accepted as fact and cited in public discourse.

  3. Speaking as a cognitive psychologist, this list is nonsense if it is meant to reflect on the science of psychology. Scientific psychology is what debunked many of these myths years ago (e.g., learning styles, the brain, …). They have been taught as such for decades in introductory psychology. Several others demonstrate a remarkably naive view about psychology and about science. Yes, there were many flaws in the prison experiment, but participants did comply, and who is to say many of the so-called flaws wouldn’t operate in a real-life prison environment? Obedience studies have not discredited Milgram’s research, and critics seem unaware of, or ignore, the many variations that he tried, including several designed to reduce levels of obedience. For example, proximity matters, and having a co-experimenter (a confederate of the researcher) stop leads almost everyone to stop. Does anyone really doubt the power of authority?

    As for the so-called replication crisis, it is not limited to psychology, nor do I find it particularly convincing when thinking about the millions of studies in psychology. As for non-psychology disciplines, drug companies often have difficulty replicating basic science findings on which they plan to develop products. Nor, in my view, are problems of replication surprising, perhaps especially in some disciplines. Since Jerry is an evolutionary biologist, how are things going with replicating findings that purport to relate genes to behaviour? But is this really surprising given the vast number of genes likely to be involved, each with a small (minuscule?) effect? Similar issues arise with human behaviour and causes even at a molar level when you think about the myriad influences on our development and current behaviour, including genes as just one category of influence.

    1. I’m not sure what you’re trying to say. I put up this list merely to show that the replication crisis applies to many pieces of “conventional wisdom” that people quote: stuff that isn’t as solid as people think.

      So you’re wrong about my intent, and you’re also rude in implying that I think my field (or any field) is immune to this. In fact, I’ve said many times that many ecology or evolution experiments, particularly field work, may be one-offs that aren’t repeatable.

      You clearly haven’t read the posting rules here, and so go back and read them. And lay off the host.

  4. I am a retired psychologist who had a second career for 12 years teaching psychology at a college after my career in industry. Many of these on the list have been debunked for many years or even decades. Some were never accepted by psychology but were given some credence by the general public. I often had to work hard to debunk many student beliefs. Two students even left my class because we debunked parapsychology, which has nothing at all to do with psychology. We debunked most of the others on this list too.

    One of the real problems in our society is that we constantly debunk the so-called implicit bias test, but it is so well entrenched within the woke crybully community, including industry and government, that it becomes difficult and tiresome to constantly refute it on scientific grounds. We only accept test reliabilities of .8 or above for ability tests (we prefer .9) and .75 or so for personality tests. The ones reported for the implicit bias test are laughable. Also, the validity of the instrument to predict bias is totally unacceptable, with average validities in the .1 range. We only accept the utility of a test if it has a validity coefficient of at least .3 or better, preferably with a large N.

    I could go on about all of the others on the list, but it would be a very long post. Suffice to say the author of these replications is often beating a dead horse. The real problem is within the general public, or segments of the public, who tend to believe nonsense.

    1. Thanks, that’s helpful context. Part of the problem, though, is indeed with personality and social psychology researchers. My introduction to the replication crisis came from Andrew Gelman’s blog (sorry) and his dissection of the statistical problems with precognition, himmicanes, bottomless soup bowls, and other nonsense generated by psychology researchers and published by psychologists who edited top journals. One of Gelman’s frequent subjects was Susan Fiske, who edited PNAS for many years and published (e.g., air rage) or supervised (e.g., power pose) some of the most egregious examples of nonsense that wouldn’t replicate. Her contributions to that mess can’t be attributed to problems within the general public.

    2. As I said above, I put these up because lots of people don’t KNOW they’ve been debunked. It looks as if the psychologists appearing here are pretty defensive, and it wasn’t my intent, as you can see, to impugn the field. In fact, if the implicit bias test is debunked, you guys better shout louder because it’s still touted in many colleges and universities, including mine.

      1. Nothing to do with being defensive, unless you think that rejecting the equivalence of science and Maori ways of knowing is being defensive. It has to do with rejecting false claims, in this case that these examples reflect the discipline of psychology. As for implicit associations, it is not being touted by psychologists or even by its original proponents for the way it is being used in the wider society. And we all know how difficult it can be to debunk misguided ideas outside academia, such as the nonsense about the nonbinary nature of sex, no matter how loudly we shout!

        1. It certainly looks defensive, and, as I said, you were rude in your first post. You are imputing to me things I didn’t say, like psychologists are touting these tests. And of course you had to get in your little zinger about evolutionary genetics, as if I were somehow implying it’s free from error.

          Were I you, I’d take a break from this thread and avoid trying to dominate it.

      2. If studies in psychology, or any field for that matter, are debunked it should reinforce the field, I’d say.
        And yes, those debunked studies are to be widely exposed, if the debunking has not been debunked in turn, that is. Which is not unimaginable.

  5. In response to Jim and Brian: perhaps a good reason for WEIT to report on the Replicability-Index findings is that the discrediting of some of these “maxims” hasn’t travelled very well outside of psychology. I work in linguistics, where the notion of priming is treated by some linguists as fact (“psychology shows/proves…” etc.). So solid meta-analyses and links to studies that show major constraints on priming are really useful.
    And as far as “implicit bias” goes: in higher education institutions and some public sector offices, implicit bias training is still seen, and offered, as a viable measure.
    In short, there’s still a need to give air to this kind of work.

  6. I give psychology some slack … I consider it, like economics, a dismal science. Not that the practitioners are inherently bad at it, but the topics are difficult and complicated and can themselves be affected by previous outcomes.

    1) Psychology is still a young discipline and may remain so for a couple more centuries.
    2) Perhaps like other academic disciplines there is a lot of pressure to remain relevant, publish or perish so to speak.

  7. On split brain patients, I don’t think the Pinto et al paper’s interpretation is substantially better than Roger Sperry’s earlier one. Sperry said there were two conscious subjects, which I agree is somewhat misleading. But I don’t think “One Conscious Agent with Unintegrated Visual Perception” really captures the data well, either. Better to say, at least under experimental conditions, the question of a specific number of consciousnesses is ill-posed. Their perceptions and actions are integrated in specific ways, to varying degrees, that need to be spelled out to some extent, if you don’t want to mislead lay readers.

    Now, when you allow split-brain patients to move their heads around, use both hands together, and other everyday life tactics that allow them to integrate information across their hemispheres, I don’t have any problem with saying they are one person with one consciousness. As long as nobody reads too much into that (e.g., dualism).

  8. I read about “nudge theory” on Wiki. There are many industries that have “nudge units” including the World Bank. It reminds me of DEI- a lot of wasted time and money. William Gibson’s novel Pattern Recognition uses nudge-based ideas and themes.

  9. I am a physicist and an occasional reader of your blog. While psychology is obviously not my field, let me make one comment about Milgram – you might be interested in the Netflix documentary “Don’t Pick Up the Phone” which is in many ways a real life example of Milgram’s experiment. It is a bizarre story – and while quantifying the fraction of human beings who can be manipulated in this way is arguably difficult, the real life example suggests that it is not that hard to find people who can be manipulated.

  10. There have been many independent and successful replications of the Milgram experiment, for instance:

    One nice variation of the Milgram experiment was conducted by Sheridan and King (1972) with 26 participants (13 women and 13 men): they used a puppy as the victim. Even though the cute puppy was visible to the subjects and enough actual shock was delivered to cause the puppy to yelp and jump in pain, 100% of the female subjects were fully obedient (pushed the shock button right up to the maximum intensity), while only 54% of the males were obedient.

    I wonder if the participants thought the puppy was acting.

  11. About the Libet experiment: “We still don’t have free will (since random circuit noise can tip us when the evidence is weak), but in a different way.” Could someone explain what this means?
