The “decline effect”: can we demonstrate anything in science?

December 10, 2010 • 7:58 am

The Dec. 13 issue of The New Yorker has a provocative article about scientific “truth” by Jonah Lehrer, “The truth wears off: is there something wrong with the scientific method?” (You’ll need a subscription to read the whole thing.)  It’s about what Lehrer calls the “decline effect”: the tendency of an initial demonstration of something in science to weaken or even disappear when later workers try to replicate it.

Lehrer gives lots of examples—including the bogus demonstrations of ESP by J.B. Rhine at Duke—but concentrates on more recent studies.  One is from evolutionary biology: the work of Anders Møller on fluctuating asymmetry (FA) in barn swallows.  FA is the phenomenon in which a trait of an individual is asymmetrical in random ways (e.g. the right side may be larger or smaller than the left).  In the case of barn swallows, FA estimates the difference in length of the long feathers in their forked tails.  In 1991, Møller showed that females prefer to mate with males having more symmetrical tails (less FA), presumably because FA was an index of the genetic quality of a male (more FA, worse genes).

Initial studies of other species showed a similar negative relationship between FA and fitness, but then the effect began to decline: by 1998, fewer studies of FA showed positive effects, and the effects that were demonstrated became smaller.

Lehrer cites another study of many results in ecology and evolution:

In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology [see reference below]. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical synthesis of related studies), and discovered a consistent decline effect over time as many of the theories seemed to fade into irrelevance. . . . Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”

What causes these “declines”?  Lehrer suggests that it’s a combination of two things.  The first is publication bias: an initial enthusiasm for publishing only positive results.  (I’d add that this bias could swing toward publishing negative results after an initial discovery becomes something of a dogma: researchers are then motivated to disprove it.)  The second is “selective reporting”: not overt fraud, but what Lehrer characterizes as “one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results.  Stephen Jay Gould referred to this as the ‘shoehorning’ process.”

This, then, is not (as Lehrer’s title implies) an indictment of the scientific method of testing and replication per se, but of scientists and the culture of science.  Lehrer winds up criticizing the “slipperiness of empiricism,”  suggesting that much of what we “know”—even about things like the strength of gravity and the weak coupling ratio of neutrons—may simply be wrong.  His ending is provocative:

The decline effect is troubling because it reminds us how difficult it is to prove anything.  We like to pretend that our experiments define the truth for us.  But that’s often not the case.  Just because an idea is true doesn’t mean it can be proved.  And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.

I tend to agree with Lehrer about studies in my own field of evolutionary biology.  Almost no findings are replicated, there’s a premium on publishing positive results, and, unlike some other areas, findings in evolutionary biology don’t necessarily build on each other: workers usually don’t have to repeat other people’s work as a basis for their own.  (I’m speaking here mostly of experimental work, not things like studies of transitional fossils.)  Ditto for ecology. Yet that doesn’t mean that everything is arbitrary.  I’m pretty sure, for instance, that the reason why male interspecific hybrids in Drosophila are sterile while females aren’t (“Haldane’s rule”) reflects genes whose effects on hybrid sterility are recessive.  That’s been demonstrated by several workers.  And I’m even more sure that humans are more closely related to chimps than to orangutans.  Nevertheless, when a single new finding appears, I often find myself wondering if it would stand up if somebody repeated the study, or did it in another species.

But let’s not throw out the baby with the bathwater.  In many fields, especially physics, chemistry, and molecular biology, workers regularly repeat the results of others, since progress in their own work demands it.  The material basis of heredity, for example, is DNA, a double helix whose sequence of nucleotide bases codes (in a triplet code) for proteins.  We’re beginning to learn the intricate ways that genes are regulated in organisms.   The material basis of heredity and development is not something we “choose” to believe: it’s something that’s been forced on us by repeated findings of many scientists.  This is true for physics and chemistry as well, despite Lehrer’s suggestion that “the law of gravity hasn’t always been perfect at predicting real-world phenomena.”

Lehrer, like Gould in his book The Mismeasure of Man, has done a service by pointing out that scientists are humans after all, and that their drive for reputation—and other nonscientific issues—can affect what they produce or perceive as “truth.” But it’s a mistake to imply that all scientific truth is simply a choice among explanations that aren’t very well supported.  We must remember that scientific “truth” means “the best provisional explanation, but one so compelling that you’d have to be a fool not to accept it.”  Truth, then, while always provisional, is not necessarily evanescent.  To the degree that Lehrer implies otherwise, his article is deeply damaging to science.
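The publication-bias mechanism described above can be made concrete with a small simulation. What follows is only a minimal sketch in Python, and every number in it is an illustrative assumption of mine rather than anything taken from Lehrer's article or the studies he cites: a small true effect of 0.1 standard deviations, 30 subjects per study, 2,000 studies per phase, and a rule that the "early" literature prints only significantly positive results while the "late" literature prints everything.

```python
import random
import statistics

def simulate_study(true_effect, n, rng):
    """Run one hypothetical study: return the estimated effect and its standard error."""
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean, se

def published_effects(true_effect, n, n_studies, only_significant, rng):
    """Collect the effect sizes that make it into print.

    If only_significant is True, a study is published only when its
    estimate is positive and exceeds 1.96 standard errors (a crude
    stand-in for "p < 0.05, in the expected direction").
    """
    effects = []
    for _ in range(n_studies):
        mean, se = simulate_study(true_effect, n, rng)
        if only_significant and mean / se < 1.96:
            continue  # the study goes in the file drawer
        effects.append(mean)
    return effects

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_EFFECT = 0.1        # assumed small real effect (in SD units)

# Early phase: journals accept only striking positive results.
early = published_effects(TRUE_EFFECT, 30, 2000, True, rng)
# Late phase: replications are published whatever they find.
late = published_effects(TRUE_EFFECT, 30, 2000, False, rng)

early_mean = statistics.fmean(early)
late_mean = statistics.fmean(late)

print(f"true effect:                {TRUE_EFFECT:.2f}")
print(f"early published mean (n={len(early)}): {early_mean:.2f}")
print(f"late published mean  (n={len(late)}): {late_mean:.2f}")
```

Under these assumptions the early published mean should come out well above the true effect, while the later, unfiltered mean should sit close to it. Nothing about the underlying effect changes between the two phases; only the filter on what gets printed does, which is the sense in which a "decline" can appear even when every individual researcher is honest.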

_______

UPDATES: Over at Wired, Lehrer explains his thesis a bit more and responds to readers’ questions.

I believe the reference to the “2001” work by Jennions is actually this two-authored paper published in 2002:  Jennions, M. D. and A. P. Møller. 2002.  Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Proc. Roy. Soc. B. 269:43-48.

74 thoughts on “The “decline effect”: can we demonstrate anything in science?”

  1. It would seem to me that there’s a definite role for students to play, here.

    Dissertations, of course, need to be new and original research. But attempting to replicate recently-published findings would seem to me to be an excellent way of preparing for a dissertation.

    Cheers,

    b&

    1. not uncommonly, this is indeed the basis for many Masters theses, and has been for at least as long as I’ve been involved in academia (about 25 years).

      PhD theses, OTOH, are typically supposed to somehow generate new contributions to our understanding.

      one is not more valuable than the other IMO, and I’ve seen many Masters theses that started out as “replication projects” that ended up contributing far more to our understanding than the average PhD project.

      I would also like to agree with Jerry that the take-home lesson from Lehrer should NOT be that “science is untenable and fickle, so believe whatever you want”, but that we, as scientists, must continually strive towards tearing down cultural constructs (like only publishing positive results) that actually interfere with the process of the scientific method itself.

  2. I don’t think “the truth” is a scientific concept; it is in fact a religious one. All scientific theories are amenable to changes. Neutrinos had no mass according to the standard model of particles and fields. Now experiments indicate that neutrinos have mass. This is science, if we did not have to continuously refine and reformulate our theories, science would stop. The truth is a concept indicating something absolute, definitive. In science nothing is definitive. Even Newton’s inverse-square law of gravitation, long viewed as immutable, might be wrong for long or very short distances.

    1. I don’t think “the truth” is a scientific concept; it is in fact a religious one.

      Is that the truth?

      The truth is a concept indicating something absolute, definitive. In science nothing is definitive.

      Are you stating that definitively?

        1. Boy, you guys are really pointing out the problem with science popularization. If I were to bring this discussion to a group of carpenters on lunch break, I would get laughed off the job site. And you wonder why 40% of Americans don’t accept evolution. Keep it up, geniuses.

          1. Well, yes. The population gets bombarded with propaganda from the fundamentalists: science is a kind of religion, science is based on “belief,” science is a “social construct,”–this even went on in academic circles until the physicist Alan Sokal exposed this nonsense. The newest fad comes from the Magis Center for Faith and Reason, with their claim that there is a scientific proof of the existence of a god.

  3. And just because an idea can be proved doesn’t mean it’s true

    Wait wait wait wait wait.

    Wait.

    …Then what the hell are you doing when you prove something, if not stating that the conclusion is true given the premises??

  4. I predict this will be showing up in creationist literature within a year:

    I tend to agree with Lehrer about studies in my own field of evolutionary biology. Almost no findings are replicated, there’s a premium on publishing positive results, and, unlike some other areas, findings in evolutionary biology don’t necessarily build on each other: workers don’t necessarily have to repeat other people’s work as a basis of their own.

    1. Thought the same upon reading it. Jerry, do you mean that studies are not typically replicated or cannot be replicated? I think you mean the former.
      Also do you mean that studies don’t need to build on each other?

      Just asking.

      1. Not typically replicated. And there’s no payoff for replicating them, either, since (answering your second question), further work doesn’t first demand that earlier work be replicated. Why would anybody repeat a tedious and expensive study done, say, in one species of bird or lizard?

        1. So, this is a major problem. When I’m reviewing papers and reading the guidelines for how reviewers make recommendations to editors, there is always something about “novelty” (except for PLoS One). I think this is bad. But it is the way the system is set up. I’m not sure how to really change it.

          Moreover, studies that are replications of other studies (as occasionally happens even in evolution, when a replication study gets submitted before the first study is published) are often viewed as inferior from the perspective of incentives. The incentives are tilted such that it is better to be first, even if the quality suffers, than to be second with a more rigorous treatment. To say nothing of an intentional replication. That is frequently viewed by granting agencies, editors, reviewers, peers, conference organizers, graduate students, etc. as a waste of time. If one must replicate, it typically must be in the course of doing additional work or challenging a very very provocative finding.

          I wish the incentives encouraged at least a little replication. Not too much mind you, as we don’t need people making careers off of only replicating studies. But some premeditated cross-validation is always good for science.

    2. No, this will:

      “Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.”

    3. We can’t ignore the fact that the scientific literature is littered with false results just because creationists might use it against us. The high-order results are not in question—as Jerry says, we will never discover that we are more closely related to orangutans than chimps. But many of the low-order details in the published literature really are false. The Ioannidis paper cited below gives a simple model of the publication process to demonstrate how that can happen even if everyone involved is being completely honest. It’s just statistically unavoidable.

      1. We can’t ignore the fact that the scientific literature is littered with false results just because creationists might use it against us.

        Oh gosh, no: scientists and science writers should carry on doing their jobs. I’m just exercising my cynicism about creationists.

        1. I went around with him and Larry Farfarman for nearly a year at Panda’s Thumb… What a couple of douche nozzles…

  5. It is important to distinguish “the scientific method” – an idealized system that we aspire to use as scientists – from the “scientific environment,” which is the situation working scientists must confront if they want to actually get money to pay the rent and buy food. There is a big difference between them. The scientific method results in an increase in knowledge mainly through figuring out which ideas are incorrect. In contrast, figuring out which ideas are (provisionally) correct is how scientific careers are made. In the end, despite this apparent conflict of interests, science does lead to an increase in knowledge (although that doesn’t mean there isn’t still a lot of room for improvement – for instance, reforming the peer review process).

  6. There’s a big grain of truth here. Publication bias (and other biases) are serious and need to be taken into account when assessing the evidence for claims. As Jerry says, there’s a difference between the scientific method and the culture of the scientists who work at it. (The incentives of the workers may not be the same as those of the readers/users of science.)

    But mostly I think people (public, press, even many scientists) need to realise there’s a huge difference between papers published at the ‘cutting edge’ of science (I prefer to call it the rough edge) and solidly established results that have been found in repeated experiments, re-analysed by many teams, etc…

    There’s a lovely paper by John Ioannidis, “Why most published research findings are false,” which I think more people should read. The conclusion (in the title) is virtually inevitable given noisy data and the number of hypotheses and tests being examined around the globe. But it doesn’t mean the method is no good. It just means new research papers should be only the start of the process – to be followed by critical analysis and replication by independent (or differently biased!) peers.

    The news media should therefore hold back before publishing stories based on a single new result in a single paper, or risk publicising wrong results almost all the time. But the incentives within the news media world are different again.

      1. Ioannidis’ conclusions bolster my personal distaste for the declarative titles that became hip in ecology/evolution/behavior 20 years ago, e.g., “The red-tailed salamander recognizes second cousins.”

  7. That this occurs isn’t an indictment of the scientific method. Quite the reverse in fact.

    This is *why* we have science.

  8. I’m confused a bit. He is using the scientific literature to show that many of the original reported effects were not as strong or even disappear in later studies. How is this possible if the later studies are not meaningful in some way? Science is self-correcting and many if not most ideas eventually are dropped. It’s good to be reminded of this, but I just don’t see the big newsflash here.

    1. umkomasia,

      You aren’t confused, but spot on. It is only because of the utility of the scientific method for getting at the truth of the world that we see the effect Lehrer is talking about. Lehrer whiffs on this, just as creationists who bring up Piltdown Man whiff on the fact that it wasn’t the application of creationism (or the sort of blind skepticism Lehrer seems to embrace) that demonstrated Piltdown was a forgery but it was the application of the methods of science that revealed it.

      To paraphrase Churchill, it has been said the scientific method is the worst way of figuring out how our world works except all those other ways that have been tried from time to time

    2. Does anyone else suspect that this putative “decline effect” will not be able to be observed by other researchers and will itself be affected by the “decline effect”?

  9. I’ve heard several times that the original mission of the Nobel Institute was to replicate experiments of potential Nobel Prizewinners. That mission didn’t last long, but there, besides the impracticality, they probably rapidly concluded that by the time a given work was under consideration it had already been well established by other work that built on it.

  10. Although I agree that the discussion in professional circles is healthy, here’s where it does damage:

    In this morning’s local newspaper, there is a letter to the editor declaring that both religion and science require a “leap of faith”. Implying, of course, that religion’s “different way of knowing” is just as valid as well-established scientific principles.

    This article is helping to enable the proudly ignorant to stay that way.

    Does your religion declare the sun revolves around the Earth? That the Earth is flat and immovable? That each and every species on the planet was poofed into existence whole by a magic incantation? That’s OK — it’s just a different way of knowing. Science is wrong a lot.

    Damage done. Genie out of the bottle. No way to stuff it back in.

  11. I’m sure since Gravity and Evolution were demonstrated the effects have waned and now we don’t even recognize them as being true anymore. However, god is still out there – he has never been demonstrated and therefore he is as potent now as he ever was.

    The effect would be more notable in some cases than others and it’s not unreasonable to presume that data are selected to give the clearest impression. In most cases where experiments were properly crafted, the results are reproducible and over time the improvement in techniques and instrumentation may necessitate adjustments to some parameters. In half-assed experiments or frauds, competent experimentation will show that the effect is non-existent or not anywhere near as strong as claimed. In the physical sciences experiments are reliably repeated (even the notorious sub-particle work) – for example we have gravity, the Zeeman Effect, the Photoelectric Effect, Hertz effect, X-ray secondary electron emission, speed of light – these are only a very small number of phenomena which have been described and repeatedly observed, with only gradual improvements to the estimation of parameters as technology advances. Things become a bit trickier with evolution because, despite Ken Miller’s claims – evolutionary outcomes are not preordained by CeilingCat. So you may repeat an experiment and find that although you observed evolution yet again, your original stock did not evolve the same traits as in the previous experiment. Or you can run concurrent experiments and see that even though evolution is observed in all cases, each set of beasties evolved differently. In other fields things can get extremely tricky to demonstrate, and quite often people who claim to know things really don’t know Jack.

    1. However, god is still out there – he has never been demonstrated and therefore he is as potent now as he ever was.

      So what you’re saying is that theology is the homeopathic version of science — the less proof (in either the epistemologist’s or distiller’s senses), the more potent!

      😉

  12. Today’s big problems in science are both publication bias and submission bias. As stated, “Almost no findings are replicated, there’s a premium on publishing positive results,” but this trend is fortunately changing. I’m collaborating with The All Results Journals, a set of journals that focus on publishing scientific negative results. Check our latest issues at:

    http://www.arjournals.com/ojs/index.php?journal=Chem&page=issue&op=current

    http://www.arjournals.com/ojs/index.php?journal=Biol&page=issue&op=current

  13. I was glad to see the issue of replication getting more press, but I found Lehrer’s article unsatisfying, since it didn’t really address *why* there are so many false positives beyond some hand-waving. And he seemed to conclude that there is nothing to be done about it, which is both defeatist and not true.

    I tried to sketch out in more detail the factors leading to the high false positive rate and what is required to do something about it over at my blog.

  14. Ugh. Lousy reporting. The article, on the surface, implies that natural laws have a limited shelf-life once discovered, and that this is somehow an inherent property of the universe. Although anyone with some scientific acumen can realize that he’s talking about a perceived effect based on data reporting, rather than an intrinsic property of nature, most casual readers would read “Scientists just proved that gravity don’t work as good anymore, an’ purty soon it’ll probably just quit workin’. Just like I said, everything was better back in my day.”

  15. Yes, the creationists will try to make use of this. We should then point out that the preferred religious method of handling this problem is to declare the initial demonstration to be dogma and to burn at the stake anyone who even suggests that replication is necessary. Let them explain how that produces better results.

  16. Interesting — although one has to wonder if the “decline effect” will itself show a decline effect. And: if it does, does this support or undermine the hypothesis?

  17. In Lehrer’s Wired post he responds to a question about whether global warming denialism is justified by the reasoning behind his NYer article, arguing that there is basically a glut of papers from many fields supporting AGW, and therefore denialism is unjustified.

    However, in his NYer article he says, as Jerry pointed out

    the law of gravity hasn’t always been perfect at predicting real-world phenomena.

    So what holds true for global climate change doesn’t for gravity? How can he be so sure, based on his own reasoning, that the models on which predictions about climate change are based are accurate?

    I can’t figure out how he reasoned his way around this.

    1. “the law of gravity hasn’t always been perfect at predicting real-world phenomena.”

      This quote shows clearly that Lehrer’s grasp of modern science is severely flawed: The fact that Newton’s law of gravity apparently cannot describe the rotation of certain galaxies is one of the most important arguments for the existence of dark matter. How can you discuss the scientific method if you don’t understand how science works?

      1. It’s all rather silly. Lehrer is saying lots of science is flawed. And then he stops and submits his piece for publication.

        Well, how do we know lots of science is flawed?

        Because of more and better science!

        I read stuff like this in the New Yorker and think, “Wow, that’s one half-baked idea!”

        Then I read other stuff in the New Yorker about which I know very little and think, “Wow, cool insight!”

    2. Newton’s predictions from his theory of gravitation were incredibly good. In most cases it was trivial to show how something conformed to Newton’s laws. When you get to global warming things are not so simple. We know there must be more energy being stored in the earth system due to CO2 alone, but we currently cannot predict how that energy will be distributed and what its effects will be (other than ultimately warming the surface); no one even has a good handle on the extent of the water vapor positive feedback. So basically the global warming due to increased CO2 is real, but if you meet a climate modeler just nod and say “that’s nice, but tell me more when you can make substantial verifiable predictions”.

  18. “Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.”

    We should substitute “is well evidenced” for “can be proved” in the above statements. “Prove” is often utilized as a weasel word, which is the case here. We rarely prove an idea, instead we evidence an idea.

    Choosing not to follow the overall weight and direction of the evidence, because doing so can’t absolutely guarantee the correct conclusion every time by some impossible-to-achieve perfect measure of what qualifies as a correct conclusion, is a fool’s choice. That there is a positive correlation between conclusions being strongly supported by the overall available evidence and conclusions that it is useful to treat as factually true is clear. That there is a negative correlation between conclusions being strongly opposed by the overall available evidence and conclusions that it is useful to treat as factually true is also clear. Nothing more is needed to justify empiricism.

  19. Somebody email this guy the Sagan quote about how some people believe the earth is flat, and some that it is a perfect sphere, but the person who believes that both groups are equally mistaken is more wrong than both.

    Yes, science is self-correcting. That some results are modified or discarded due to subsequent work is a feature, not a bug. It’s only when people turn certainty into a fetish that correcting mistakes can be seen as a bad thing.

  20. This shows the difference between Science and Scientism. We are all subject to prejudices and beliefs. Scientism is the belief system, not the search for truth. This author calls 60 years of ESP research by Rhine bogus. This is nonsense. Rhine has proved ESP exists. This is heretical to the true believers of Scientism. But now Lehrer has indicated science is built on less solid ground than scientists would like to believe.

  21. This reminds me of my mother-in-law and the butter margarine debate. She says first they told you that butter was bad for you and you should use marge, now they tell you marge is bad for you and you should use butter. She concludes that the scientists don’t know. Of course, if you look at their reasons, the scientists are not contradicting themselves. It only looks like a contradiction if you confuse the general with the specific. Butter was first discouraged and marge encouraged because of concerns about cholesterol, saturated fats, etc but later it was the reverse because marge came to be seen as a toxic laboratory concoction with other harmful effects. But one has to ask what the rejection is about. Is it about butter or marge (the general), or is it about cholesterol or trans fats (the specifics)?

    So my question is how much of the decline effect can be written off to the confusion of the general and the specific? I’m not a scientist but I imagine that even a small change in the specifics may lead to an *apparently* different generalisation. For example, doing the same swallow test at a different time and in a different place may produce different results regarding swallows’ choices in tail symmetry but that need not contradict the generalisation that swallows choose mates with better genes.

  22. I recommend Pure science Wiki. It is a free Internet platform for following the scientific method without being disrupted by the antiscientific obsession with status and prestige of academia.
