“The first principle is that you must not fool yourself — and you are the easiest person to fool.”
–Richard Feynman, “Cargo Cult Science”
The quote above is, to me, the best pithy characterization I know of the principles of scientific inquiry. Because we humans are ridden with confirmation bias—the tendency to affirm as true what we want to be true—science is structured to prevent such self-deception. Our culture of replication, of peer review, and of pervasive doubt comprises devices that have evolved over time to prevent scientists from fooling themselves. A famous example was the use of two completely independent teams of researchers to look for the Higgs boson, researchers who didn’t know what each other had found until the very end of the experiments.
One of the best tools for scientific research is “blinding”: making sure that researchers, when making observations, are as far as possible kept in the dark about any information that could bias their results.
Here’s one example. When I was studying whether cuticular hydrocarbons in Drosophila (waxy substances that coat the fly’s body and prevent desiccation) can act as isolating barriers (different species have different hydrocarbons, and could potentially recognize each other as either same- or different-species mates, since males “taste” females before mating), I transferred hydrocarbons among individuals by crowding them together in vials. By putting five females of one species into a vial with 50 females of a different but closely related species having different hydrocarbons, you can profoundly alter a female’s hydrocarbon profile, putting on her about half the “foreign” hydrocarbons from the other species. In other words, you can perfume females of a given species with hydrocarbons from females of another species that males don’t like to court or mate with.
After doing this perfuming, I then asked undergraduates to watch males courting both the perfumed females as well as control females belonging to the male’s own species (also crowded, but with members of their own species), and to record various aspects of male “courtship interest”, including circling, licking, wing extension, and attempted copulation (male curls abdomen under and jumps the female from behind). After each half-hour observation period, we took the female and measured her hydrocarbon profile using a gas chromatograph.
To ensure that any differences were due to hydrocarbons and not some behavioral change effected by crowding, we did the same thing with dead females that had been flash-frozen in liquid nitrogen. That doesn’t change their hydrocarbons, and males (who aren’t particular about whom they court) readily court dead females, and even try to copulate with them.
It was crucial in these studies that the courtship observers didn’t know the identity of the target females, for that could have conditioned how they recorded or identified various behaviors. In other words, the observers were “blinded.” And the results we got were clear: the hydrocarbons that we predicted would turn off males—or turn them on when their own females’ hydrocarbons were put on foreign species— had a huge effect on male courtship in the predicted direction. The references are below, which include a nice paper in Science.
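The mechanics of this kind of observer blinding are simple to implement. Here is a minimal sketch (function and label names are my own, not the actual protocol from these studies): each vial gets an opaque random code, and the key linking codes to treatments stays with someone else until all the behavioral scoring is finished.

```python
import random

def blind_labels(vials, seed=None):
    """Assign opaque coded labels to vials so observers can't infer treatment.
    Returns (labels, key); the key is held by a third party until scoring ends."""
    rng = random.Random(seed)
    codes = [f"V{i:03d}" for i in range(1, len(vials) + 1)]
    rng.shuffle(codes)
    key = dict(zip(codes, vials))   # code -> true treatment, kept separate
    return list(key.keys()), key

# Ten "perfumed" and ten control vials, relabeled so the observer sees only codes:
vials = ["perfumed"] * 10 + ["control"] * 10
labeled, key = blind_labels(vials, seed=42)
# Observers score courtship for, say, "V007" without knowing its treatment;
# the key is consulted only after all observations are recorded.
```

The essential design point is that the mapping from code to treatment never passes through the observer's hands until the data are in.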
This kind of blinding is of course an important feature of medical studies. The gold standard for testing new drugs and therapies is the “double blind” study, in which neither doctor nor patient knows which treatment is being given. It’s common sense, really.
Sadly, many studies in ecology and evolutionary biology that could involve blinding protocols don’t. That, at least, is the conclusion of a paper in Frontiers in Ecology and Evolution published last May by Melissa Kardish and her colleagues (reference below; free download). They begin this very short (2.5-page) paper by noting the abysmal failure of many researchers to use blinding when appropriate and possible:
For example, a survey of kin-recognition studies—a cornerstone of animal behavior—found that 71% of studies testing for kin recognition in ants did not report the use of blind observation, and, more disconcerting, studies that did not report blind observation reported significantly greater effect sizes than studies that reported blinding (van Wilgenburg and Elgar, 2013). Likewise, herbivory of woody plants was rated nearly 500% higher with unblinded methods compared to blinded methods (Kozlov et al., 2014).
That’s disturbing. And it apparently disturbed Kardish and her colleagues, so they did a survey of nearly 500 papers in journals publishing work on ecology and evolutionary biology (EEB), to see whether blinding was used when it was possible to do so. The sad results (my emphasis):
- The authors surveyed 492 recent articles in 13 journals publishing EEB papers, including nine specialty journals and four general ones; among them were Science, Nature, Proc. Nat. Acad. Sci. USA, Evolution, The American Naturalist, Animal Behavior, and Proceedings of the Royal Society. Articles were selected before they were examined.
- For every selected study, the authors judged whether or not the results could in principle have been affected by observer bias, and then whether its authors reported blinding methods in the “materials and methods” section.
- Of the 492 articles selected, 248 had results that could have been affected by observer bias.
- Of those 248, a pathetic 13.3% (33 articles) actually stated that they used blind observations. (Of course it’s possible that some studies used blind observations without reporting them, but most authors would mention that, as it’s a plus.)
- Finally, the impact factor of the journal had no effect on whether or not blinding was used.
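The headline percentage is just a proportion over the at-risk subset, which is worth spelling out, since the denominator is the 248 bias-susceptible articles, not all 492 surveyed:

```python
# Figures as reported in the survey above:
total_surveyed = 492    # articles examined
could_be_biased = 248   # articles whose results could suffer observer bias
reported_blind = 33     # of those, articles stating observations were blinded

pct_blind = 100 * reported_blind / could_be_biased
print(f"{pct_blind:.1f}% of at-risk articles reported blinding")  # 13.3%
```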
For the word-averse, the authors also provide this superfluous figure (I’m not sure why, but I put it in for grins):
The upshot? Researchers and journals have a ways to go. I agree with Kardish et al. that researchers must report whether or not observations were blinded when possible, and that reviewers and editors should demand that information. If studies could have been blinded but weren’t, they should be taken with a grain of salt. Remember, observer bias and confirmation bias are temptations for all of us, and for the good of science we need to institute procedures to prevent them.
I need hardly add that confirmation bias in religion works in exactly the opposite way from science: it’s actually encouraged. Miracles are not to be doubted or investigated carefully; the truth claims of religions other than yours aren’t investigated too carefully, but are dismissed outright; and believers are always looking hard for evidence to support the tenets of their faith.
_______
Kardish, M. R., et al. (2015). Blind trust in unblinded observation in Ecology, Evolution and Behavior. Frontiers in Ecology and Evolution 3, http://dx.doi.org/10.3389/fevo.2015.00051
Coyne, J. and B. Charlesworth (1997). Genetics of a pheromonal difference affecting sexual isolation between Drosophila mauritiana and D. sechellia. Genetics 145: 1015-1030.
Coyne, J., et al. (1994). Genetics of a pheromonal difference contributing to reproductive isolation in Drosophila. Science 265: 1461-1464.

Perhaps wide dissemination of this article will substantially increase the frequency of blinding (especially since it is usually easy to implement), in the same way that Hurlbert’s article back in 1984 appeared to greatly decrease the frequency of pseudoreplication in ecological studies.
I work in clinical trials where “blinding” of patients and research staff is common. However we tend to use the term “masking” because we deal with eye patients!
Hilarious and sweet at the same time!
In the newer volume of his autobiography, Richard Dawkins mentions that as a young man he went to an alternative-medicine practitioner who was obviously a quack, and who denigrated double-blind testing. Dawkins says, “I don’t know who invented the double-blind experiment, but he was a genius.”
During the Q&A session after his reading from his book at Kepler’s books in Menlo Park (Sunday morning, October 4th, 2015), I misspoke and misinformed Dawkins that the first double-blind experiment was the tests done by Benjamin Franklin and Antoine Lavoisier on animal magnetism. RD thanked me profusely for pointing this out.
This turns out to be the first single-blind experiment, with double-blind experiments coming to the fore in the 20th century, in 1907, thanks to W. H. R. Rivers and H. N. Webber in their investigation of the effects of caffeine. (Not as broadly renowned as Ben Franklin and Lavoisier, I’m afraid.) RD gets so much e-mail that I haven’t really gone to the trouble of contacting him to report my error, but I hope he looks at this column now and then (or has discovered the truth on his own).
Interesting, I always mention blinding when lecturing on experimental design, but have never bothered to check the history of the idea. I’ll be sure to mention Franklin next time there are any USers in the class.
Thanks for the (hi)story!
Re: I need hardly add that confirmation bias in religion works exactly the opposite way as in science: it’s actually encouraged. Miracles are not to be doubted or investigated carefully…
This is not quite true of Roman Catholicism, which attempts, after a fashion, to investigate carefully those miracles that would entail either the canonization of a saint or a message from a heavenly being, usually the Virgin Mary. Live Science reports on the basic process here
(http://www.livescience.com/38033-how-vatican-identifies-miracles.html)
while ScienceNordic notes criticism of the methodology used here (http://sciencenordic.com/pope%E2%80%99s-scientists-study-miracles)
The process of investigating miracles attributed to a potential saint was parodied nicely by Saturday Night Live’s Don Novello, aka Father Guido Sarducci, in a skit in which he complained that to get their first American saint, the Vatican lowered the number of required verifiable miracles from four to three, and Sarducci (allegedly the “gossip columnist” for the Vatican newspaper) said to the audience, “Confidentially, I heard that two of them were card tricks.”
Well, those attempts to “investigate” miracles are pretty lame: read Christopher Hitchens’s account of how he played the OFFICIAL Devil’s Advocate when the Vatican was beatifying Mother Teresa.
The biological miracles are pretty lame, but physical miracles have a much stricter significance standard, 5 stigmata.
One would have thought blinding would be instinctively used in all appropriate cases. I’d suggest that publishers and reviewers establish a simple checklist for submissions that indicates what kinds of observational techniques were used. This might just encourage researchers to do the right thing.
In the long term, does it matter? Careers are rarely defined by one paper, and if they are, that paper typically is held to higher scrutiny. For one, the result is most often reproduced by other researchers without too much effort, and, even more important, others build on the result, not just reproduce it.
If a paper smashes any boundaries with majestic claims, I can assure you it will be tethered to an anchor ready to be dropped to the bottom of the sea if not all of the i’s and t’s are dotted and crossed. If a paper makes a subtle claim about a rather esoteric detail, few if any will make an effort to find the errors.
Full Cargo Cult essay can be found here:
http://barefootbum.blogspot.com/2010/03/cargo-cult-science.html
In physics, chemistry or molecular biology, most papers include descriptions of universal or widely applicable mechanisms or processes. As subsequent researchers seek to build upon this knowledge, they will naturally replicate prior experimental conditions precisely and verify (or challenge) the prior findings.
In ecology and psychology, this is often not true. In ecology, findings about a particular population may be treated as a data point in building a picture of universal principles – but that particular population may never be studied again. In psychology, it’s rarely true that research about one aspect of human behavior automatically entails verification of prior knowledge about another aspect of human behavior. So reproduction of results in these fields does not usually occur unless somebody sets out to do it explicitly (as they did recently in psychology, with disturbing results).
It is true in clinical trials that double-blind, placebo-controlled studies are the most robust (at least in principle), but I just want to point out that sometimes it is neither ethical nor even possible to do them.
For example, some trials I have been involved in (as a CRM) tested chemotherapeutics to treat multiple myeloma and other neoplasms. There was no way to blind this – the side effects of the chemo are obvious so one can’t do a placebo that has no effects; this would “unblind” the study. In addition, the intense side effects of the chemo (nausea, fatigue, neuralgia, hair loss, etc) are difficult to manage for the patients and it would be unethical to give a placebo that generates them, especially in trials with people who are already very ill.
But blinding (and placebo control) are the “gold standard” in clinical trials; it’s just sometimes not possible.
I sometimes try to make a feeble joke along these lines, e.g., it’s tricky to blind the study when the treatment is cutting someone’s leg off, or it’s hard to double-blind surgery so that the surgeon doesn’t know whether he’s cutting anything out or not.
I agree that it’s not always practical, but I also think this should be made more explicit in the papers. What worries me is not that a lot of papers didn’t blind – there were probably obstacles they encountered trying to set up the experiment – but that so few of them felt the need to even mention this handicap. It reveals an uncomfortable lack of self-awareness in the field.
I just happened to be reading about a similar standard for defending religion. John Loftus explains the Outsider Test for Faith (OTF): the only way to rationally test one’s culturally adopted religious faith is from the perspective of an outsider, a nonbeliever, with the same level of reasonable skepticism believers already use when examining the other religious faiths they reject.
This means a person like myself would be the appropriate one to defend any particular faith fairly. This must be true, since I am an atheist and have never been part of any religious organization.
Oy. That is the kind of result that makes you go blind.
I recall an experiment a few years ago involving sniffer dogs. The handlers were somewhat upset that they were not told in advance which boxes contained the target material. Thankfully this is no longer the case, and blind trials are simply blind trials. Dogs can cover a wide area very quickly and produce excellent results. Trouble is, they get bored within about 20 minutes. Just like their masters. T.
I find it strange that the handlers wouldn’t have confidence in their dog’s abilities. If the handlers knew which boxes had the target material, would they somehow signal the dog? Either way, this is another good anecdote on the importance of blind testing.
The best way to blind these trials is to replace the vowels with asterisks, so that nobody knows what species is being tested.
“Facilitated Communication” is a classic example of the power of double blind testing, and the power of self deception. It’s a process by which “facilitators” are purported to assist people with severe communication disabilities by touching them or holding their hand and then typing messages on their behalf. It is trivially easy to test in a blinded study (reveal information to the disabled subject, but not the facilitator), and it has been thoroughly debunked.
https://en.wikipedia.org/wiki/Facilitated_communication
Nevertheless, proponents STILL insist on its validity — “Proponents claim that testing is demeaning to the disabled person, that the testing environment creates performance anxiety, or that those being facilitated may purposely produce nonsense, refuse to respond or give wrong answers to counteract the negative attitudes of those who are skeptical of the technique.”
I think it’s notable that in most cases these false effects are not con tricks. The facilitators are sincere, and do not realize what they are doing (like the “Clever Hans Effect”).
James Randi has tested, for example, water dowsers in his “million dollar challenge” and other variants, and he reports that the majority appear to believe sincerely in what is happening. They are not (all) fraudsters – most are not consciously aware that their own muscles are responsible for moving the stick. Many seem to welcome the scientific test, and are genuinely surprised that dowsing does not work under blinded and controlled conditions.
I’ve seen a sincere relative do water dowsing. I hypothesize that using the apparently independent dowsing rod may allow the person to use faint memories or subtle topographic clues to find the target.
(Plus, of course, if there’s reasonably accessible water under 60% of an area, random efforts will be successful 60% of the time.)
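That base-rate point is easy to check with a toy simulation (purely illustrative, not a model of any real dowsing test): if water lies under 60% of the area, a "dowser" choosing spots at random succeeds about 60% of the time.

```python
import random

rng = random.Random(0)
trials = 100_000
water_coverage = 0.6          # fraction of the area with accessible water

# A "dowser" who picks spots at random succeeds whenever the spot has water:
hits = sum(rng.random() < water_coverage for _ in range(trials))
print(f"random 'dowsing' success rate: {hits / trials:.2%}")  # ~60%
```

Which is exactly why a blinded test has to compare the dowser's hit rate against this chance baseline, not against zero.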
I’ve run into this problem. If I’m taking measurements to compare two putative species, am I really measuring representative fruits, for example, or taking only the biggest ones because I know this plant should have bigger ones? Just knowing about the problem and trying to compensate for it doesn’t solve it.
At times it’s been possible to choose, say, fruits by rolling dice or by using a random number generator to choose from a list, but mostly not. Using student workers can get rid of the problem of knowing what the answer should be, but (1) my access to student workers is limited and (2) that introduces new problems like measuring aborted fruits because of not realizing they’re aborted.
The good news is, if the differences are large they’ll show up despite the bias. The bad news is, if the differences were that clear, I probably wouldn’t be doing the study.
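The dice-or-RNG approach the commenter describes can be as simple as drawing a random subsample from a numbered list of specimens before handling any of them. A minimal sketch (the function name and numbers are hypothetical, just to illustrate the idea):

```python
import random

def random_sample_ids(n_specimens, sample_size, seed=None):
    """Decide which numbered fruits to measure before touching any of them,
    so the measurer can't gravitate toward the biggest ones."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_specimens + 1), sample_size))

# Number the fruits on the plant 1..40, then measure only these ten:
to_measure = random_sample_ids(n_specimens=40, sample_size=10, seed=1)
print(to_measure)
```

The point is that the selection is committed to before any fruit is examined, which removes the measurer's discretion from the sampling step.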
Sorry. Reading over my two recent comments reveals that I can’t type accurately today.
Yes, it could be said that religion has been a perpetual self-“blinding” with acute bias, having nothing to do with fact and more to do with delusional self-preservation.
As for the post, the last thing we need is for science to drop the ball; self-preservation is not exclusive to religion, for sure, and it’s just as well there are checks, as stated above. Reassuring for non-science bods like me.
Richard Feynman also referred to his familiar “you must not fool yourself” principle in his Appendix F – Personal observations on the reliability of the Shuttle, which was part of his eye-opening report on the Space Shuttle Challenger disaster. The word “fool” appears four times in that document.
Thanks for the link. Interesting read.