Scientific evidence for psychic powers?

October 31, 2010 • 7:07 am

A respected peer-reviewed journal in psychology, The Journal of Personality and Social Psychology, is about to publish a paper that presents scientific evidence for precognition.  The paper, by Daryl Bem of Cornell University, is called “Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect,” and you can download a preprint on his webpage.  I’ve scanned the paper only briefly, and am posting about it in hopes that some of you will read it carefully and provide analyses, either here or elsewhere.

The paper purports to show that a choice that you make in a computer test can be influenced by stimuli you receive after you’ve already made the choice.  This implies you have some way, consciously or unconsciously, of detecting things that haven’t yet happened.  In an article in Psychology Today, “Have scientists finally discovered evidence for psychic phenomena?”, psychologist Melissa Burkley at Oklahoma State University summarizes two of Bem’s studies:

However, Bem’s studies are unique in that they represent standard scientific methods and rely on well-established principles in psychology. Essentially, he took effects that are considered valid and reliable in psychology – studying improves memory, priming facilitates response times – and simply reversed their chronological order.

For example, we all know that rehearsing a set of words makes them easier to recall in the future, but what if the rehearsal occurs after the recall? In one of the studies, college students were given a list of words and after reading the list, were given a surprise recall test to see how many words they remembered. Next, a computer randomly selected some of the words on the list as practice words and the participants were asked to retype them several times. The results of the study showed that the students were better at recalling the words on the surprise recall test that they were later given, at random, to practice. According to Bem, practicing the words after the test somehow allowed the participants to “reach back in time to facilitate recall.”

In another study, Bem examined whether the well-known priming effect could also be reversed. In a typical priming study, people are shown a photo and they have to quickly indicate if the photo represents a negative or positive image. If the photo is of a cuddly kitten, you press the “positive” button and if the photo is of maggots on rotting meat, you press the “negative” button. A wealth of research has examined how subliminal priming can speed up your ability to categorize these photos. Subliminal priming occurs when a word is flashed on the computer screen so quickly that your conscious brain doesn’t recognize what you saw, but your nonconscious brain does. So you just see a flash, and if I asked you to tell me what you saw, you wouldn’t be able to. But deep down, your nonconscious brain saw the word and processed it. In priming studies, we consistently find that people who are primed with a word consistent with the valence of the photo will categorize it quicker. So if I quickly flash the word “happy” before the kitten picture, you will click the “positive” button even quicker, but if I instead flash the word “ugly” before it, you will take longer to respond. This is because priming you with the word “happy” gets your mind ready to see happy things.

In Bem’s retroactive priming study, he simply reversed the time sequence on this effect by flashing the primed word after the person categorized the photo. So I show you the kitten picture, you pick whether it is positive or negative, and then I randomly choose to prime you with a good or bad word. The results showed that people were quicker at categorizing photos when it was followed by a consistent prime. So not only will you categorize the kitten quicker when it is preceded by a good word, you will also categorize it quicker when it is followed by a good word. It was as if, while participants were categorizing the photo, their brain knew what word was coming next and this facilitated their decision.

There are at least four explanations for these results:

1.  They’re real: we have previously unsuspected abilities to detect the future.

2.  They’re fraudulent: Bem rigged the experiment or made up the data.  I’m assuming this isn’t the case.

3.  They’re wrong because of some flaw in the experiment (or in the computer programs) that made these results artifactual.

4.  The results are statistical outliers that got published simply because they represent one of those cases in which we reject the null hypothesis (i.e., the hypothesis that we have no ability to predict the future) even though it’s true. This is called a “type I error” in statistics.  When the probability of getting results this extreme by chance alone is 5% or less (i.e., the results exceed the “significance threshold”), scientists do reject the null hypothesis and claim that something else is going on (in this case, that there’s precognition).  But with a threshold of 5%, you’ll make a mistake one time in twenty.  (That’s the basis of the old science joke, “95% of your experiments fail; the other 5% you publish in Nature.”)
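To see how routinely a 5% threshold produces false alarms, here’s a quick simulation (mine, not Bem’s): thousands of “ESP experiments” in which every subject is just guessing, each scored with an ordinary two-sided significance test. Roughly one in twenty of these pure-chance experiments crosses the threshold anyway.

```python
import random

def false_positive_rate(n_experiments=2000, n_trials=400, seed=1):
    """Simulate null experiments (pure 50/50 guessing) and count how often
    a two-sided normal-approximation test crosses the 5% threshold."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        hits = sum(rng.random() < 0.5 for _ in range(n_trials))
        # z-score of the hit count under the null hypothesis p = 0.5
        z = (hits - n_trials * 0.5) / (0.25 * n_trials) ** 0.5
        if abs(z) > 1.96:  # two-sided 5% significance threshold
            rejections += 1
    return rejections / n_experiments
```

With these settings the rejection rate hovers around 0.05, exactly the one-in-twenty mistake rate described above.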

So maybe Bem’s results represent type I errors.  This is the conclusion of Psychology Today blogger Daniel R. Hawes. And indeed, the probability values in Bem’s experiments aren’t all that tiny (see his Table 7): several of them fall between 1% and the critical 5% threshold.  But—assuming Bem published all of his studies, and didn’t leave out the ones that didn’t show precognition—they’re consistent: the effects, though very small (about a 3% increase in “hits” over what’s expected by chance), are always in the same direction. Even though the “precognition effects” aren’t large, this consistency demands explanation.
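The consistency argument can itself be put in numbers with a simple sign test (a back-of-the-envelope check of my own, assuming the experiments are independent): if there were no effect, each experiment would be a coin flip for direction, and the two-sided chance that all nine of Bem’s experiments would point the same way is 2/2⁹, about 0.4%.

```python
from math import comb

def sign_test_p(n_same_direction, n_experiments):
    """Two-sided sign test: probability, under a directionless null,
    of at least this many effects pointing the same way."""
    tail = sum(comb(n_experiments, k)
               for k in range(n_same_direction, n_experiments + 1))
    return min(1.0, 2 * tail / 2 ** n_experiments)
```

Of course this ignores the file-drawer caveat above: the test is only meaningful if the nine published experiments are all the experiments that were run.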

But before we have explanation, we must have replication.  Now that this result is in the open, it’s up to other scientists to see if similar studies give similar results.  Only then should we start worrying about the possibility of unknown “powers.”

In a comment on the Psychology Today article, hokum-debunker James Randi has challenged Bem to meet his conditions for demonstrating paranormal phenomena, a demonstration that comes with a million-dollar prize. Randi:

I find this to be a very interesting reader response. After the usual magnificently uninformed comments: “Time is strange, gravity doesn’t make sense, and matter is mostly empty space. There is no such thing as time. Everything, what we call past, present and future, is happening in the Now” we find far more cogent remarks, along with the suggestion that author Bem should go for my Foundation’s million-dollar prize. Of course he should, but he won’t. We’ve made him the offer, many others have, as well, but he chooses to ignore it. It’s there, it’s real, the grubbies constantly claim it doesn’t exist, but it persists. Dr. Bem, give us a call. Accept the challenge, under your conditions, thoroughly fair, proper, definitive, observed and controlled, and you don’t have to invest a penny. Isn’t that attractive to you? And just think of the book sales along with the currency. Yes, a million US dollars still buys a lot… Make us all happy, won’t you?

Hello…? Dr. Bem? You there…?

Do read the paper first if you want to take apart the study.

119 thoughts on “Scientific evidence for psychic powers?”

  1. Worry not; some colleagues and I at the University of Edinburgh are arranging a replication attempt right now! Will keep you posted with any developments…

    1. I’ve little doubt that your group will fail. What is a little interesting is why these people did not. Of course, when we think back to the PEAR studies and how much work it took to discover some of their potential sources of errors, it shouldn’t be a surprise to learn that reading the paper won’t tell us much.

      But thanks for the double-check anyway, definitely in the best tradition of science.

      1. Eh, I think I have a bit more faith in the Scots than to think that researchers at their premier university are prone to failure.

        My bet is that their research will both be successful and show results contrary to those reported by Bem.

        Looking for something and not finding it is most emphatically a success, so long as the search was properly executed.



          1. Even that’s not how I’d phrase it.

            Getting a negative result is not failure. Indeed, it’s every bit as much a success as getting a positive result.



            1. No disagreement from me on that. I thought you thought they were being negative or disparaging. Language is funny and the internet makes it worse. 🙂

    2. Okay, would this be a valid test according to the article? Give a person a choice of four colors that they can pick from. They choose the one that they think you are going to display. You immediately display your choice. If they consistently match, they are a winner! Sounds like it would work great at the Roulette wheel. lol

      1. I think any test in which the participants make a choice (presumably out of a relatively wide range of options), a random choice is then made (by computer; making the choice yourself would confound things with purported telepathy and potentially introduce common biases), and the result is shown to them afterward would be just as valid.

  2. “Essentially, he took effects that are considered valid and reliable in psychology.”

    That’s part of the problem. Empiricism is of a tangentially unhinged nature in psychology.

    1. 🙂 Yes.

      If, on the off chance, these results DO prove repeatable, I’d first want to go back and re-examine what’s “considered valid and reliable in psychology” before accepting them.

    2. Seriously? In my experience experimental psychologists understand far more about the experimental method than, say, most medical researchers.

      1. Agreed. The subfield of psychophysics, for instance, is as empirically driven as any branch of the biological sciences.

    3. There’s a great deal of solid empirical work being done in psychology. I haven’t read the paper yet or done any background research, but I know that in particular priming is a very well-attested and empirically confirmed phenomenon.

      But based on the little bit about the experiment that’s been excerpted, my current thoughts are that these results are pretty much exactly what you’d expect assuming priming is symmetric in time — and that’s exactly what I’d assume.

      It seems, just from the excerpt, that he might just be doing a priming experiment where he calls the prime a “recall test.” I don’t see anything surprising or counterintuitive in the notion that when one sees a list of words and then a subset of that list (whether you call the smaller list a “test” or a “prime”), one has better recall of the words appearing on both lists. I certainly don’t see why this would constitute precognition.

      I want to read the study in detail before concluding this is the case, though.

  3. I downloaded the paper, used the “Find” function, and typed “blinding”, “blinded” and even “blind”. Nothing. Tells me everything I need to know.

    1. I’m scanning the paper right now and since the input and tests were administered by and selected by a computer with a true hardware-based random number generator, blinding does not seem to be a big issue.

      That simplistic search should not tell others all they need to know.

    2. Given that the word ‘blind’ does in fact occur in the paper, I wonder how carefully you searched.

      More importantly, blinding is described without using the word (e.g. “the experimenter was uninformed as to condition”).

      Sorry, your approach is inadequate.

      1. Given that the word ‘blind’ does in fact occur in the paper, I wonder how carefully you searched.

        I know, in the context of a “blind alley”, but not in a context that had anything to do with blinding.

        And even if blinding isn’t necessary, wouldn’t you at least expect a paragraph explaining what they did to make it unnecessary? Or that it is implicitly taken into account?

        But yeah, my initial comment was not supposed to be entirely serious. Guess I should have indicated that a little better.

    3. I didn’t read the paper (and probably won’t), but I can tell you that sensory (psychophysical) and cognitive experiments (such as the one reported) are generally “blind” in the sense that the computer controls the random or pseudorandom nature of the trials. The experimenter is not supposed to have knowledge of the trial ordering until he/she is analyzing the data. After that, it comes down to adhering to a strict and objective analysis routine… and ethics, as science always does.

  4. As a psychic, I can assure everyone that all the necessary confirming experiments were done in 2015 and they came up positive. Where’s my million bucks, Jim?


  5. God said it, I believe it, that settles it:

    Now it happened, as we went to prayer, that a certain slave girl possessed with a spirit of divination met us, who brought her masters much profit by fortune-telling. This girl followed Paul and us, and cried out, saying, ‘These men are the servants of the Most High God, who proclaim to us the way of salvation.’
    —Acts 16:16-17

  6. I’m not that skilled in designing or evaluating tests so I just took a look over the first study he reported. It sounds like he’s checked for the obvious points of failure: used a range of random number generators (RNG) including a hardware-based true RNG (TRNG); ran the tests with computer-input to test for persistent biases; and gave explicit goals (guess the future location) rather than mine the results for any outliers.

    Some things make me wonder what the data really looks like. For instance, he writes: “on all the experiments using highly arousing erotic or negative stimuli a relatively large number of nonarousing trials must be included to permit the participant’s arousal level to “settle down” between critical trials.” How does he know this? Could it be that he ran the tests several times and then discarded a block of tests which had less time to “settle down” in a post-hoc way of excluding negative results?

    I’m also interested to see that women showed the greatest success rate, and he mentions that he tried to re-run the tests with more explicit porn but doesn’t discuss the results. This doesn’t really fit my naive expectation, and it sounds like it surprised him as well, yet he makes little comment on it. Another sign of fishing for hits?

    On the “plus” side, when he looks at male/female and introvert/extrovert, the numbers either go to 50% or skew in his favour; there’s no hint that any group skewed negatively.

    1. It is woo, quantum woo to be more precise.

      First, the idea that there is a quantum theory where all probability is conserved and nothing material is lost – in quantum physics speak, “unitarity is conserved” – not only over time but over the whole universe, isn’t anything new. It is called the “wavefunction of the universe” and is a somewhat controversial (I think) idea of Hawking and Hartle (and others).

      [Note, I’m simplifying a helluva lot here. As they are stated, you can criticize my sweeping claims in painstaking detail.]

      Second, the tests of that idea as reviewed in the article have, IMHO, nothing to do with the wavefunction of the universe _or_ the even more speculative idea of reverse causality itself. They are simply based on the often-used equivocation between post-selection in “quantum entanglement experiments” and breaking what physics speak calls “the explicit Lorentz invariance” (special relativity) inherent in the theory itself.

      Simply speaking, a quantum system can, if it is treated gently, preserve information about how it was created even when its parts separate over long distances. When measured, this information is read off. It is a popular pastime to pretend that this information is “communicated” instantly, even though there are simpler interpretations.

      Look at it this way: you can select random pairs of cards and let a friend stuff them in pairwise-numbered envelopes. Mail them to another friend. Then throw a die to select a number, open the appropriate envelope, and presto! You know what card your friend has in the same-numbered envelope instantly, without having to wait for him to phone you with the information.

      The idea that this information, which is merely correlations between properties (here: pairs and colors), is “causally transmitted instantly” is what quantum woo-ers rely on. By extending that trick they can pretend that the information is sent “reverse causally”, even though there is still no causal signaling going on.
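      A classical sketch of that envelope analogy, to make the point concrete: every “measurement” agrees perfectly, yet nothing is transmitted when an envelope is opened, because the correlation was fixed when the pairs were prepared. (It is only an analogy, of course; Bell’s theorem rules out this classical picture as a full account of entanglement.)

```python
import random

def envelope_demo(n_pairs=1000, seed=7):
    """Classical analogue of the envelope story: each pair of matching
    cards is fixed at preparation time, so opening my envelope predicts
    my friend's card perfectly, yet nothing travels between us when an
    envelope is opened."""
    rng = random.Random(seed)
    mine, friends = [], []
    for _ in range(n_pairs):
        card = rng.choice(["red", "black"])  # pick a random card pair
        mine.append(card)                    # one copy stays with me
        friends.append(card)                 # the matching copy is mailed off
    # every "measurement" agrees, purely because of shared preparation
    return all(m == f for m, f in zip(mine, friends))
```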

      The reason you see this mentioned in papers is that physicists can trick themselves, as we all can, and/or that they like to play with possibilities as long as they can’t be explicitly rejected.

      [Again, simplifying the hell out of the area.]

      Another correlation that I recently found out is that the mentioned physicist Paul Davies, and quantum luminaries like Anton Zeilinger, who support such unwarranted causality speculation, are Templetonians!

      You can see deist Davies pulling in every thread that can give him teleology, such as cosmological reverse causality here, or the “shadow biosphere” that (seemingly at odds with the concept, but read their paper) would give him and theist Charles Lineweaver reason to doubt an easy Earth abiogenesis. Woo begets woo, so to speak.

      1. [not a physicist]

        Doesn’t Bell’s theorem give the above interpretation a problem?

        Because, the above sounds like a local hidden variable interpretation.

        Yup, way off topic 8( just curious.

        [/not a physicist]

        1. Not hidden variable at all, the entangled properties are observed.

          For example, two entangled photons as proposed (but not actually used) in Bell test experiments will have their spins correlated. The wavefunction carries the entangled spins as a superposition, until observation decoheres them (in some modern theories) into orthonormal states (if we are looking for the spins).

          Actually, Wikipedia is a good cheat sheet, when you don’t have your old textbooks handy (^_^’):
          quantum entanglement of states; measurement of observable, non-hidden, properties as superposition of experimentally orthonormal states; decoherence as replacing old collapse theories; quantum superposition of states.

          If not a physicist, those pages may be scary though! The idea is to look for the sum (superposition of states) in the pages after it comes up in the entanglement and measurement description, and avoid reading the other then inconsequential stuff. Sorry I can’t be more specific, but without numbers and/or links to the relevant equations those pages are a mess to navigate!

          Hidden variables would be extra non-observed (really, non-observable) parameters, say a “woo-spin”, that would have to be included in the wavefunction state description to explain the behavior of observed property “spin”.

        2. I agree that Torbjorn’s analogy of the envelopes sounds a lot like Hidden Variables, which are ruled out by Bell’s Theorem…. but it was just an analogy, and apt enough for the present topic.

          Bell’s Theorem shows that it couldn’t have been some hidden trait that determines a priori what measurement you will make, but it doesn’t prove that the measurement of one of the pair “causes” the state of the other (and there are good philosophical, epistemological, and empirical reasons to believe that is not the case).

  7. Of course, we’ve all had the ‘déjà vu’ and ‘jamais vu’ experiences.

    And I don’t find the explanation for those experiences particularly compelling. More like ‘just so’ explanations, really.

    However, I wouldn’t propose an ability to mentally time travel without some pretty darned compelling evidence.

    Correct lottery numbers several weeks in a row ought to suffice. Heck, it would pay for the research.

        1. I’ll accept it when I see the headline: “Lottery market crash dives, psychics market soars”. (They are all frauds, you know. (O.o))

    1. Yeah, these guys should win every time, if only to demonstrate the validity of their powers.

      Some specific heads-ups about disasters would help, too — many thousands of lives could be saved with advance warnings.

      I guess they all had a day off on September 10, 2001 (and 9, and 8, etc.)

  8. I am no expert, but I think I see a problem with the word-precognition experiment already. The nouns are grouped into four categories (foods, animals, occupations and clothes) of equal numbers. The subsequent selection of nouns by the computer is not truly random – it draws equal numbers of nouns from each category. If human memories tend to do the same (recall approximately equal numbers of nouns from each category), then this will bias the result, I think.
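    A quick simulation can check this worry (numbers hypothetical: 48 nouns in four categories of 12, with stratified draws of 6 per category for practice and 3 per category for recall). Interestingly, for the raw overlap count at least, stratified selection leaves the expected number of practiced-and-recalled words unchanged; it only shrinks the variance:

```python
import random

def mean_overlap(stratified, n_words=48, n_cats=4, per_cat_practice=6,
                 per_cat_recall=3, trials=20000, seed=3):
    """Hypothetical word lists: 4 categories x 12 words. 'Recall' always
    draws equally from each category; 'practice' is either stratified
    (equal numbers per category, as the comment worries) or fully uniform."""
    rng = random.Random(seed)
    cat_size = n_words // n_cats
    cats = [list(range(i * cat_size, (i + 1) * cat_size))
            for i in range(n_cats)]
    total = 0
    for _ in range(trials):
        recall = set()
        for cat in cats:                       # category-balanced recall
            recall.update(rng.sample(cat, per_cat_recall))
        if stratified:
            practice = set()
            for cat in cats:                   # category-balanced practice
                practice.update(rng.sample(cat, per_cat_practice))
        else:                                  # fully uniform practice
            practice = set(rng.sample(range(n_words),
                                      per_cat_practice * n_cats))
        total += len(recall & practice)
    return total / trials
```

Both versions come out with the same mean overlap (here 6 words), so the bias, if any, would have to enter through how the hit scores are weighted rather than through the raw counts.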

  9. I mean to say, there’s no way this is true precognition. Like ghosts, psychics, and Scooby Doo, there’s always a rational explanation.

    My money’s on the janitor.

    1. …and he’d have gotten away with it if it hadn’t been for those darned kids…

      Let’s have a Scooby Snack.

  10. I lack the statistical background to check what results he gives (I hope he allows access to his data, as he offers the computer program as well – kudos to him on that). I did see his use of “psi researchers” (an oxymoron at best), as well as using Dean Radin in a positive manner, both negatives but expected.

    He also looked at a lot of different things with all his experiments, and I’m a bit suspicious about that. It would have been better in my mind to focus on one experiment rather than try to check for all the things he did (erotic/non-erotic, etc – closed the paper and I forgot already). Not sure how well his “random” generators work – I know computer RNGs really aren’t, but not sure how the current generators still fit with what I know, so I could be wrong there.

    Given the dubious history of psychic research, and the misuse of quantum mechanics (I already saw, on pg 13, “quantum physical processes embedded in the RNG”; there is more) – which makes me believe he has little clue about what he’s talking about (I seriously doubt his RNG uses something at the quantum scale to calculate the numbers, I think he’s just reaching, but again, I could be wrong) – let’s just say I am highly skeptical, but am willing to see what others can come up with.

    1. He does discuss this and according to the paper he used a hardware-based true RNG, the Araneus Alea I. These aren’t woo and by all accounts do work well.

      As ever the devil is in the details so all I can say is that at a high level, based on what little I know, it seems like he’s trying the right things.

      1. And that TRNG uses semiconductor technology to generate the Gaussian white noise they sample, and semiconductors rely on quantum effects… so yes, it is fair to say there are quantum processes involved in his RNG. (Though one could also say this is a quantum blog comment, since it also relies on semiconductor technology…)

    2. Would it be a good test to take a known set of ‘random numbers’ from nature or human knowledge? Would it not also be a better test to offer three possibilities?

      I have another test – one person chooses, a second person afterwards decides what the first person has predicted. Would that work?

    3. Robert Park’s book Voodoo Science lists arguments from minor statistical aberrations — instead of unambiguous evidence — as a characteristic of pseudoscience.

      Of course, if the results are replicated in double-blind tests, then we have to look for what could be the mechanism for such an effect.

  11. I thought these attempts to prove non-causality died long ago. That said, I welcome the test. And I’m quite happy that Stuart Ritchie et al will attempt a repeat.

    [I’m not so happy with the hypothesis of the paper, which is both unnecessary to test for non-causal effects and completely woo-based.]

    My quick take was to pick a test “at random” and look at the data. Table 2 gives data that can be used in a binomial test. Unless I’m mistaken, my numbers come out so that the observation of 2790 hits puts the true hit rate r in the interval 0.500 < r < 0.534 at the given z=2.44 (or roughly 99% confidence), which I picked for mere convenience and comparison.

    Thus there is no difference between the outcome and a true random outcome to high confidence, or in other words causality is not broken in that test.

    Maybe I'm mistaken in my "research by Wikipedia", maybe Bem is. It seems to easy a mistake to do, not check against the actual confidence interval of the null hypothesis. But then again now someone needs to find my mistake.
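    The interval arithmetic above can be redone with a small Python helper (a normal-approximation sketch; the demo numbers below are hypothetical, not from Bem’s Table 2):

```python
from math import sqrt

def normal_ci(hits, n, z=1.96):
    """Wald (normal-approximation) confidence interval for a binomial
    proportion: the observed hit rate plus/minus z standard errors."""
    phat = hits / n
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

# Hypothetical demo: 530 hits in 1000 trials is still compatible with
# chance at the 5% level (the interval contains 0.5); 560 in 1000 is not.
```

The key check is whether the interval contains r = 0.5: if it does, the data are compatible with pure guessing at the chosen confidence level.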

    1. “to easy a mistake to do” – too easy a mistake to do.

      Maybe causality can’t be rejected, but self-referential mistakes should be thrown out by sheer force of irony!

      [Hmm. If there is no “psi”, is there at least an “irony will” character of nature?]

      1. Don’t beat yourself up!

        I am intrigued by their use of erotic images from the internet – so that’s the excuse psychologists come up with – “I was doing some research”!

        1. and exactly that explanation was given by a radio personality in the SF Bay area a couple of years back, when child porn was found on his computer.

        2. But I *like* beats, I love dancing!

          Maybe you have found the real reason behind the investigation? “I know what I want to do. Now how can I get this to be a study and get paid for looking at porn? … I know, how about a study that looks at teleological effects!?”

    2. Speaking of easy mistakes, my “a true random outcome” should be “a true random outcome _process_”, since that is the hypothesis in question.

    3. Yeah, in the paper he states that

      Across all 100 sessions, participants correctly identified the future position of the erotic pictures significantly more frequently than the 50% hit rate expected by chance: 53.1%, t(99) = 2.51, p = .01, d = 0.25.

      Sorry man, but that’s totally doable by pure chance. I don’t feel like working it out because I just had a midterm on this topic and it might make me barf, but a 3% deviation from the expected value is not that awesome unless it’s repeatable.

      1. Bootstrapping it:

        count 962) count <- count + 1 }

        When running this I get counts of about 1500 out of 1000000, which is pretty small if you ask me. Perhaps it might pay to study a little harder 🙂

        1. Oops, my R code got mangled. Regardless, if you simulate their experiment, only ~1500 times out of a million would you see that result or larger given their sample size.
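          Since the R snippet above got mangled beyond recovery, here is a sketch of the same Monte Carlo idea in Python (generic and hedged: the original trial counts can’t be reconstructed from the thread, so the demo below uses small made-up numbers rather than the study’s):

```python
import random

def mc_tail_probability(n_trials, hit_threshold, reps=20000, seed=11):
    """Monte Carlo estimate of P(hits >= hit_threshold) when each of
    n_trials guesses succeeds with probability 1/2 (the null hypothesis)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        hits = sum(rng.random() < 0.5 for _ in range(n_trials))
        if hits >= hit_threshold:
            extreme += 1
    return extreme / reps

# Hypothetical demo: 60+ hits in 100 fair-coin trials happens
# only a few percent of the time under the null.
```

Plugging in the experiment’s actual trial counts and hit threshold is what would reproduce the ~1500-in-a-million figure quoted above.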

  12. There is a fifth possibility:- The subject’s response to a stimulus sends information to the future that affects the (T)RNG so that the correct primer is shown.

    1. Ah! Martin posted this while I was reading it/writing my bit – he ‘predicted’ my view or perhaps ‘directed’ my choice!

    2. Slightly embarrassed to say that as an undergrad at Princeton I took a class where I worked with the PEAR lab people a bit – this was their thesis at the time, I recall. It was kind of interesting, even if wooish as all get-out.

  13. I hear shades of Rupert Sheldrake in this – morphic resonance. No doubt people will be talking of quantum effects as well. My initial reaction is “Heck – the article is 61 pages long!”

    How do we discount the possibility that the mind somehow ‘directs’ or ‘influences’ the choice that the machine makes via an as-yet-unknown mechanism, rather than predicting it?

    Were there some individuals who were much better at the tests than others, & some who were worse?

    Potential test of the experiment: Set up 100 computers to see if they can predict what the experimental computer will choose.

    Only got to page 11 so far – the rest will have to wait until Monday.

    1. It took me about 2 hours to go through this & try to digest it (not a fast reader on a screen).
      So – he covers my point on ‘psychokinesis’ on p.13 – “the use of a true RNG opens the door to the psychokinesis interpretation: The participant might be influencing the placement of the upcoming target rather than perceiving it, a possibility supported by a body of empirical evidence testing psychokinesis with true RNGs (Radin, 2006, pp.154–160)” – and dismisses it – “we have obtained positive results using both PRNGs and a true RNG, arguably leaving precognition/reversed causality the only nonartifactual interpretation that can account for all the positive results.”
      He also addressed the point I made up the page & got a machine to do the same test – surprise, they were about 50-50 in their choices (p.15).
      The inverse U is interesting – in experiment 7, the habituation/boredom factor made tests that were too long impossible (p.37). By the time we get to experiment 9 the number of participants has dropped: 34 females and 16 males.
      Interestingly he says “all but the retroactive induction of boredom experiment independently yielded statistically significant results.” (p.44)
      What he has to say about experimenters & their understanding of statistics is also interesting, & the following might be taken for an exchange between a religious & an atheist person – “In three psi experiments specifically designed to investigate the experimenter effect, a proponent and a skeptic of psi jointly ran a psi experiment, using the same procedures and drawing participants from the same pool (Schlitz et al., 2005; Wiseman & Schlitz, 1997, 1999). In two of the three experiments, the proponent obtained a significant result, but the skeptic did not.” (p.49)
      Blow me down – as I envisaged, he brings up quantum entanglement.

      He seems reasonable in his discussion & I think he is measuring something (what?), but I leave it to the statisticians among you to decide if his numbers really are significant – me, I would be doubtful.
      An interesting article & I am sure we will hear more on this. I have to say I would like to know if it was approximately the same bunch of Cornell students eager to get an exam credit each time. I would also be interested to see them take the students who were best in their results & follow them up with further experiments. If there really IS some small effect of evolutionary advantage, then it does not have to be too great, I suppose, to survive. Could any of this be extended to other species, or would this ‘ability’ be unique to humans?

      This all reminds me of Woody Allen – “When I was in school, I cheated on my metaphysics exam: I looked into the soul of the boy sitting next to me.”

      1. He also addressed the point I made up the page & got a machine to do the same test – surprise, they were about 50-50 in their choices (p.15).

        Which makes me wonder what distribution was assumed in those computer experiments. A simple uniform distribution? He says he corrects for user bias (it is quite well-known that people don’t actually choose randomly), but did he use computer simulations with built-in bias as well? How else could he test if his bias-correcting measures worked, or whether they have certain artifacts of their own?
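        One relevant point a simulation makes quickly: if the target RNG really is uniform and independent of the subject’s choice, chooser bias alone cannot move the hit rate off 50%; bias only matters if the target distribution is itself skewed. A sketch (my own toy model, not Bem’s code):

```python
import random

def hit_rate(choice_bias, target_bias=0.5, n=200000, seed=5):
    """Chooser picks option 0 with probability choice_bias; the 'target'
    RNG independently picks 0 with probability target_bias. With a
    uniform target, chooser bias cannot move the hit rate off 50%."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        choice = 0 if rng.random() < choice_bias else 1
        target = 0 if rng.random() < target_bias else 1
        hits += choice == target
    return hits / n
```

With a uniform target the rate stays at 50% even for a heavily biased chooser, but skew the target (say 60/40) and the same biased chooser scores about 56% – which is exactly why the uniformity of the RNG matters so much here.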

  14. Recently read some William James. At the turn of the last century he and many psychologists were interested in psychic phenomena. 100+ years with little or no supportive evidence leads to a GREAT DEAL of SKEPTICISM. If these phenomena are true, why so hard to demonstrate? If there are 1000+ studies and one shows an effect, what does that suggest?

    1. Right. A p value of 0.05 means that, when there is no real effect, there is a 1 in 20 chance of getting a result at least this extreme. Even at p<0.01, an expected 1 of every 100 null studies will produce a false positive.

      Repeating this study by an independent third party (or several) will start to tease out the validity of the findings.

      My money's on the fact that sometimes, you get results that are contrary to what you expect. Especially when elements of random chance are inherent in the test.

      In Vegas, they call that "respecting the streak". Sometimes a player gets on a roll (in either direction, of course).

      I, too, am not all that impressed with a 3% deviation from the expected.
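      For a sense of scale – a back-of-the-envelope normal approximation, not anything from the paper – here is roughly how many binomial trials a 3% deviation from chance needs before it clears a given significance bar:

```python
def n_required(delta, z):
    """Trials needed (normal approximation) for a binomial hit rate of
    0.5 + delta to sit z standard errors above chance, using the
    null standard error sqrt(0.25 / n)."""
    return (z * 0.5 / delta) ** 2

print(round(n_required(0.03, 1.96)))  # ≈ 1067 trials for p < .05 (two-sided)
print(round(n_required(0.03, 6.0)))   # ≈ 10000 trials for a six-sigma claim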

      1. Right. Extraordinary claims require extraordinary proof–I’d say six or more sigma rather than two or three. Also, I think there is a statistical bias in the procedure to begin with, along with goodness knows what else. Why were the subjects told it was an ESP experiment at the outset? There was no need for them to know.

        1. I agree, but he says “My approach to the problem of experimenter effects has been to minimize the experimenter’s role as much as possible, reducing it to that of greeter and debriefer, and leaving the experimental instructions and other interactions with the participant to the computer program. Moreover, I used several undergraduate experimenters in each experiment and deliberately gave them only informal training. This was to ensure that the experimental protocols are robust enough to overcome differences among experimenters so that the protocols have a better chance of surviving replications in other laboratories. Whether or not this strategy will be successful remains to be seen.”

          1. But we don’t know if his statistical tests were conjured up before, or after the results were in. Remarks like the following don’t give much confidence:

            The results from the last 50 sessions did not differ significantly from those obtained on the first 100 sessions, so all 150 sessions have been combined for analysis.

            This just sounds fishy to me – you can’t decide how to do your analyses after your results are in.

            1. Yes – I am not defending him; I am a bit dubious, as he pleads special consideration: if you are sceptical you are unlikely to get a positive ‘psi’ result! Does this make it science or mumbo-jumbo? I am not sure what he measured, but I am prepared to believe he measured something rather than nothing – the question is whether this was some ‘psi’ effect (he bandies the word around too much) or not.

  15. I have only scanned the procedure sections of the experiments, but it seems to me that the logic of the experiments would be unchanged if the computer randomly selected the “targets” *before* the actual trials, but the interpretation of the results (assuming similar results were obtained) would certainly change! Indeed, we would now suspect some artefact of the procedure. Furthermore, would not a better control condition be matched groups of participants whose performance was *not* followed by the random selection of targets (i.e., each is randomly matched to one of those who were followed by random selection)? Indeed, Bem could do that now: randomly pair participants and score them according to each other’s post hoc random selections. By the logic of the experiments, shouldn’t these pairings also show “anomalous retroactive influences”? And, if not, why not?

  16. Interesting. There are similarly woo-ish studies about near-death experiences hawked on the website Skeptiko (not to be confused with Skeptico, which is a terrific skeptical blog). I’m not an expert in statistical methodology or anything like that, so my reaction is just one of general skepticism. I find it hard to believe that after, like John K says above, a century of failed research, a team of keen researchers finally pulled off the impossible. Let’s see it analyzed and, most of all, replicated.

    I’m a little disappointed this got published, but publication doesn’t seem to mean a whole lot these days. I mean, if the research is sound, yeah, publish it. But personally, I would think that any claims of verifying supernatural or psychic phenomena would warrant an extra measure of skepticism.

  17. I consider my previous comment as pursuing a sanity test, not attempting to take on the study’s hypothesis. With Coyne’s well-considered admonition not to take apart the study before reading it, I will add this as a physicist:

    1) Causality is well established. I expect extraordinary evidence for an extraordinary effect. This isn’t it.

    2) It is curious that the experiments in the series, each of which by its own results has ~ 99 % confidence, according to Bem reject causality in all cases. The chance that a series of 9 (or more) tests, as stated in the abstract, shows not even a single failure by pure chance should be (1 − (1 − 0.99))^9 = 0.99^9 ≈ 1 − 9 × 0.01 ≈ 0.91, I believe. As the outcome is stated, this series is a victim of its own success.

    (This may be related to Deen’s comment #5 about not finding any double blinding, a necessary ingredient in trustworthy “epidemiological” studies. But this speculation hinges on reading the study.)
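    To put numbers on the point in 2) – a sketch assuming the nine studies are independent and each succeeds with some fixed probability (both assumptions are generous):

```python
from math import comb

def prob_k_of_n(n, k, p):
    """Binomial probability of exactly k significant results in n
    independent studies, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(0.99**9, 3))                 # ≈ 0.914: a clean sweep even at 99% per-study confidence
print(round(prob_k_of_n(9, 9, 0.8), 3))  # ≈ 0.134: a clean sweep at a more typical 80% power
```

    Even granting each study 99% confidence, nine-for-nine is only ~91% likely; at a more ordinary 80% power a clean sweep is the exception, not the rule.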

    And this I have to add as a question to those who read the study and consider its innards:

    3) The hypothesis in the abstract seems to contradict the outcome as I saw it in my sanity test of Table 2 in the paper, or as reviewed in the article. The effect was small, on the order of a few percent difference from the random expectation value of 50 %. But a non-causal “awareness” or “apprehension” of the situation should tend toward a 100 % effect, i.e. non-randomness.

    Why would a “psi” hypothesis be such that a small and necessarily difficult to test effect is the expected outcome? “I don’t get it.” Or maybe I do. (O.o)

    1. OK, I see that this research tires me. 😀

      To avoid confusion I should have said in 3) that “The claimed effect was small”, and that “the situation should tend toward an expectation value of 100 %”.

  18. I’m sorry, Dr. Coyne. I know this is the wrong thread in which to post this. I can’t stop thinking about those boots with the peacock feathers sewn on. Just so amazing. I’m really glad I found your blog.

  19. “The results showed that people were quicker at categorizing photos when it was followed by a consistent prime.”

    Picture this:
    Kitten1 => Happy1 => Kitten2 => Happy2 => Kitten3 => Happy3

    All that happened is that Happy1 primed for Kitten2, and Happy2 primed for Kitten3.

    I didn’t read the paper, but this is really the only explanation for how the experiment must have been set up.

  20. There’s another possibility:

    The results are real and they’re not statistical outliers, but they’re not caused by humans having precognition.

  21. Does anybody else find it suspect that only correctly remembered words were used in the measure?

    Unlike in a traditional experiment in which all participants contribute the same fixed number of trials, in the recall test each word the participant recalls constitutes a trial and is scored as either a practice word or a control word.

    Should we not also look at the effect on words that were not recalled? I.e., are there fewer forgotten words among the practice words? Why did he leave those out? And did he decide to leave them out before, or after the results were in? Why is it not in the “methods” section how the test is going to be scored?
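    One thing a null simulation can settle – this is a toy version with invented numbers (48 words, half practiced, 40% recall), not Bem’s materials: if the practice set really is chosen at random after the test, restricting the score to recalled words is not by itself biased, so the worry is more about discarded information and post hoc scoring choices than about a built-in tilt.

```python
import random

def null_session(rng, n_words=48, n_practice=24, p_recall=0.4):
    """One simulated session under the null: recall is random, and the
    practice set is drawn at random afterwards, independent of recall.
    Returns the practiced-minus-control count among recalled words."""
    words = list(range(n_words))
    recalled = {w for w in words if rng.random() < p_recall}
    practice = set(rng.sample(words, n_practice))
    return len(recalled & practice) - len(recalled - practice)

rng = random.Random(1)
scores = [null_session(rng) for _ in range(20_000)]
print(round(sum(scores) / len(scores), 2))  # close to 0: no bias under the null
```

    Of course, this only checks the scoring rule itself; it says nothing about whether the rule was fixed before or after the data came in.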

    1. “And did he decide to leave them out before, or after the results were in?”

      The desirable results retroactively caused him to do whatever necessary to achieve them. Writing a conclusion summing up positive results reached back in time to facilitate the generation of such results. Psience, baby!

  22. This just in…

    A Replication of the Procedures from Bem (2010, Study 8) and a Failure to Replicate the Same Results

    Jeff Galak
    Carnegie Mellon University

    Leif D. Nelson
    University of California, Berkeley – Haas School of Business

    October 29, 2010

    We replicated the procedure of Experiment 8 from Bem (2010), which had originally demonstrated retroactive facilitation of recall. We failed to replicate the result. The paper includes a description of our procedure and analysis as well as a brief discussion for some reasons why we obtained a different result than in the original paper.

    1. That should be, in the title, Study, the number eight, and close parenthesis.

      Apparently that combination produces an emoticon. Could that have been deliberate on the part of the authors? Cute.

    2. Just looking quickly at their results, I’m not even convinced that it’s a failure to replicate. It’s not enough just to show that you got no effect (or even, as these authors found, an effect in the opposite direction). They need to show that what they found is significantly different from the relatively small positive effect that Bem got.

      With a small enough sample size you can always fail to find an effect if the putative effect is very small.

      1. You can also find a very strong effect when in fact there is no effect at all. That’s just the way things are with too small a sample size.
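        That cuts both ways, and it is easy to quantify – a sketch assuming a true hit rate of 53%, a made-up figure in the ballpark of the reported effects:

```python
import random

def replications(n_trials, p_true, n_reps, seed=2):
    """Fraction of simulated replications whose observed hit rate falls
    below chance, even though the true rate p_true exceeds 0.5."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n_reps):
        hits = sum(rng.random() < p_true for _ in range(n_trials))
        below += (hits / n_trials < 0.5)
    return below / n_reps

print(replications(100, 0.53, 2000))   # a sizeable minority flip sign at n = 100
print(replications(5000, 0.53, 2000))  # the sign flips all but vanish at n = 5000
```

        So a single small failed (or even reversed) replication says little either way; what matters is whether the pooled estimate differs from Bem’s.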

  23. The whole idea of finding a small effect – even a consistent small effect – when examining precognition in a large group of volunteers brings up a related point.
    With something like precognition/psi etc., what would such a result be suggesting? That there is a small amount of precognition within the general population (in other words, you or I could have a very slight ability in this area, such that we manage to guess 1 in 20 tests more than would be expected)?
    Alternatively, shouldn’t something like precognition be an ability that is unevenly distributed in the population – such as ability at maths, languages, music, games, writing etc.?
    In that case a 3% variance from background in a large population sample might be picking up a small number of psi-able individuals amongst a background of psi-unable – after all, the whole field is based on the idea that some individuals have abilities that should be rather more obvious than a tiny effect only detectable through a mass screening procedure.
    Wouldn’t it be much more convincing to have a small number of these psi-able cases showing very striking differences from the background rate?
    I guess what I’m thinking of is more along the lines of the famous Project Alpha experiment that got hoaxed by Randi and Banachek.
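    The arithmetic behind that worry is simple – a toy mixture model with invented fractions, just to show that one aggregate hit rate is compatible with wildly different populations:

```python
def population_hit_rate(f_gifted, rate_gifted, rate_rest=0.5):
    """Mean hit rate if a fraction f_gifted of the population performs
    at rate_gifted and everyone else performs at chance."""
    return f_gifted * rate_gifted + (1 - f_gifted) * rate_rest

# Three very different populations, one identical 53% aggregate:
print(round(population_hit_rate(0.06, 1.00), 4))  # 6% flawless psychics
print(round(population_hit_rate(0.20, 0.65), 4))  # 20% modestly gifted
print(round(population_hit_rate(1.00, 0.53), 4))  # everyone slightly gifted
```

    A mass screening alone cannot distinguish these cases; you would need to follow up the high scorers individually, as suggested earlier in the thread.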

  24. Hello,

    We just submitted a reply to the Bem paper, outlining the problems with this study (see for the complete paper). Here is the abstract:

    Does psi exist? In a recent article, Dr. Bem conducted nine studies with over a thousand participants in an attempt to demonstrate that future events retroactively affect people’s responses. Here we discuss several limitations of Bem’s experiments on psi; in particular, we show that the data analysis was partly exploratory, and that one-sided p-values may overstate the statistical evidence against the null hypothesis. We reanalyze Bem’s data using a default Bayesian t-test and show that the evidence for psi is weak to nonexistent. We argue that in order to convince a skeptical audience of a controversial claim, one needs to conduct strictly confirmatory studies and analyze the results with statistical tests that are conservative rather than liberal. We conclude that Bem’s p-values do not indicate evidence in favor of precognition; instead, they indicate that experimental psychologists need to change the way they conduct their experiments and analyze their data.
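    The “partly exploratory” point is easy to illustrate with a toy simulation (ours, not from the reply): when an analyst is free to try several analyses and report the best one, the nominal 5% false-positive rate no longer holds, because under the null each test’s p-value is uniform on (0, 1).

```python
import random

def false_positive_rate(n_tests, alpha=0.05, n_sims=20_000, seed=3):
    """Chance that at least one of n_tests independent null tests comes
    out 'significant' when the analyst reports the best of them.
    Under the null, each p-value is uniform on (0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(n_sims)
    )
    return hits / n_sims

print(round(false_positive_rate(1), 2))  # ≈ 0.05: one pre-specified test
print(round(false_positive_rate(5), 2))  # ≈ 0.23: pick the best of five analyses
```

    This is exactly why the reply’s call for strictly confirmatory, pre-specified analyses matters.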

  25. I found the call to submit to the JREF unnecessary. Why should Randi be the higher authority?

    Surely a better way is the traditional ‘peer review’ route. Put it out there for others to test. In the (in my view) unlikely event that this is true, and it is confirmed as repeatable by third parties adhering to a scientifically sound methodology, I believe the JREF should pay the prize.

    1. They didn’t call for anyone to submit, they just stated there is no evidence of such ability in humans, and that if you could get significant precognition in a sample of hundreds, someone should have claimed the JREF prize already, and thousands (if not millions) should be cashing out at casinos around the world.

      1. That last paragraph certainly looks like someone thinks they should submit. I never said who called.

        “…we find far more cogent remarks, along with the suggestion that author Bem should go for my Foundation’s million-dollar prize. Of course he should, but he won’t.”

    2. Randi is not the higher authority – he’s just the one whose foundation is offering the big bucks. In many cases claims are initially tested by scientists such as Richard Wiseman. Randi’s expertise is in looking at a situation and thinking “how can I reproduce this trick and then see if the subject is doing the trick using any of the schemes I can think of”. Randi can also design experiments to prevent the subject from cheating and he takes the advice of others with regards to things such as chances of success given random chance. In a situation like this one I’d imagine Randi’s greatest criticism would be that the experiment was not properly designed because the people making the observations are not independent of the people designing the experiment and evaluating the results – in short, the experiment is way open to wishful thinking. So if Randi were to bother to write a response to the paper, I would bet on suggestions for experiments with better controls; however I wouldn’t expect that Randi would be asked to be a reviewer.

  26. And as a quirky afterthought – what about where someone is affected the OPPOSITE way? The article mentions 3% as possibly significant. What if someone under test conditions regularly scored MINUS 3% or more below the expected rate?

    1. Well that’s obvious – you ignore the data! That’s the way it’s always been in PSI research. As James Randi has pointed out numerous times, you can’t rely on just what’s claimed in the papers – you need to ask questions about the experimental conditions (and don’t be surprised if the authors withhold information from you, whether deliberately or not).

  27. Decades ago, for my sins, I evaluated a paper in the Journal of the American Psychical Society by a couple of guys who then worked for Bell Labs. The paper purported to show that humans could sense the state of a remote computer register (1 or 0). The evaluation showed multiple problems characteristic of ‘research’ in parapsychology. Among the flaws was a failure to consider the (statistical) power of the experiments to detect the differences purportedly obtained, the use of an optional stopping rule, and after-the-fact data mining, hunting around for significant comparisons.

    I haven’t read this paper yet and know nothing of the experimenter, but I am dubious. Results like those described are almost always evanescent, fading in the bright light of replication attempts.

    1. Gack. The first footnote did me in. The “Dean Radin” who is thanked for his help was one of the co-authors of the paper I mentioned above. I may have to have a cookie or something to get through this.

      1. Statistical power is pretty high, meaning that any systematic biasing variable could be picked up. IOW, if there is an artifactual bias somewhere in the procedure, it’s more likely to produce statistically significant results with a powerful experimental design.

  28. I find it interesting that you effectively state as fact that the real James Randi posted a public challenge to Dr. Bem in the comments string of the PT website.

    I would imagine that if the real Randi were serious about this, he’d at least mention SOMETHING about the publication of Dr. Bem’s paper on his website. Strangely, he hasn’t even mentioned it.

    Or is it possible that someone posted under the guise of the name James Randi, since it takes no effort to provide a name and website of whomever one wants? We (public outside of PT) can’t see the email, so we can’t double check.

    So much for skepticism, if you do not apply it to your own observations. It’s just more biased belief.

    Trust me, I’m Abraham Lincoln.
