Will Provine died

September 3, 2015 • 12:00 pm

I’m saddened to report that historian of science and population geneticist Will Provine, a professor at Cornell, died on September 1 at 73.  His wife has posted an unbearably sad memoriam on her Facebook page, and Casey Bergman, one of our Chicago Ph.D. students and now a professor at Manchester, reported the news on his website An Assembly of Fragments.

Will was a student of my own Ph.D. advisor, Dick Lewontin—Dick’s first student who was a historian rather than a working scientist. (Dick went on to work with and mentor many other students of the history and philosophy of science.) Will’s Ph.D. thesis became a short book, The Origins of Theoretical Population Genetics (1971), that was (and remains) essential reading for students in population genetics.

Will was a delightful guy, but those who knew him quickly learned that he pulled no punches. He was, as they say, “strident”: strident about creationism and intelligent design, which he detested, strident about religion (he was a diehard atheist, although, as I recall, his father was a preacher), and strident in his later-life opposition to genetic drift, which he viewed—erroneously, I think—as a misguided concept. Religionists often quoted with disdain his remark about the incompatibility of science and religion, “You have to check your brains at the church-house door if you take modern evolutionary biology seriously.”

But opinionated as he was, he was a pleasure to talk to, ever friendly and helpful. As Casey wrote on his site:

I’m moved by his death to recall my experience of having Provine as a lecturer during my undergrad days at Cornell 20 years ago, where his dramatic and entertaining style drew me fully into evolutionary biology, both as a philosophy and as a profession. I can’t say I knew Provine well, but I can say our interactions left a deep impression on me. He was incredibly kind and engaging, pulling you onto what he called the “slippery slope” where religious belief must yield to rationalism.

And that is my impression, too.

A long time ago Will developed brain cancer—a glioma, as I recall, which is a deadly form of the disease. He spoke openly about it and gave the impression that he didn’t have long to live. But he beat the odds, and must have survived for at least 15 or 20 years after diagnosis.

I remember that when we held a retirement symposium for Dick Lewontin at Harvard, Will gave the opening talk, and was wearing on each side of his head a metal disk with a target on it—a target for the radiation therapy aimed at his tumor. At that time we thought he would die soon, and that, combined with his deeply moving tribute to Lewontin, brought many of us to tears. It was the only time in my life that I saw Dick in tears as well: he had to put his head in his hands.

But it is a great mercy that Will lived so long after that talk—which was years ago—and remained active to the last. I, and many others, will miss him.

Will Provine (1943–2015)

On the poor reproducibility of psychology studies

September 3, 2015 • 10:45 am

I wrote a short post yesterday about a huge attempt to answer the question, “What proportion of results reported in psychology journals can be repeated?” This was a massive study in which dozens of psychology researchers simply went and repeated 100 studies published in three respectable experimental psychology journals: Psychological Science, Journal of Personality and Social Psychology, and Journal of Experimental Psychology: Learning, Memory, and Cognition. The full paper, along with a one-page summary, is published in Science (see reference and free download below); the authors call themselves the “Open Science Collaboration” (OSC). There’s also a summary piece in the New York Times, a sub-article highlighting three famous studies that couldn’t be repeated by the OSC (including one on free will, which I wrote about yesterday), and a newer op-ed in the Times arguing that this failure to replicate doesn’t constitute a scientific crisis, but simply shows science behaving as it should: always scrutinizing whether published results are reliable.

Even before this paper was published, I argued that people should do in biology what these folks did in psychology: test experimental results that are impressive but rarely repeated. In psychology, as in evolutionary biology and ecology, significant findings aren’t often repeated, for doing so takes hard-to-come-by money and a concerted effort—an effort that isn’t rewarded. (You don’t get much naches or professional advancement by simply repeating someone else’s work.) In some areas of biology (and presumably in experimental psychology), however, work is repeated as a normal by-product of building on previous results. For example, if you want to use new gene-replacement methods, you are obliged to indirectly replicate other people’s protocols before you can begin to insert your own favorite gene.

It’s thus been my contention that about half of published studies in my own field (I include ecology along with evolution) would probably not yield the same results if they were replicated. I’m excluding those studies that use genetics, as genetic work is easily repeated, particularly if it involves sequencing DNA.

Failures to repeat a published result don’t mean that the experimenters cheated, or even that the work was faulty. They could mean, for instance, that the results are peculiar to a particular location, time, or experimental setup, or that there’s a publication bias towards impressive results, so that only studies whose results are highly statistically significant get published. Finally, given the conventional significance ceiling of 0.05, 5% of all experiments in which the null hypothesis is actually true will nonetheless yield a statistically significant deviation from chance (and thus a spurious rejection of that null hypothesis).
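To make that last point concrete, here’s a minimal simulation—my own illustration, not anything from the OSC paper—in which the null hypothesis is true by construction (both samples come from the same distribution), yet a t-test still crosses the 0.05 bar about 5% of the time:

```python
# Minimal sketch: false positives under a true null at alpha = 0.05.
# Illustrative only; the parameters (group size, number of experiments) are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups are drawn from the SAME distribution, so the null is true.
    a = rng.normal(0.0, 1.0, size=30)
    b = rng.normal(0.0, 1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"False-positive rate: {false_positives / n_experiments:.3f}")  # ~0.05
```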

On to the experiment. The OSC decided to test reproducibility in a rigorous way. A large group of researchers agreed to test a passel of papers taken from the three journals above, winding up with exactly 100 replicated experiments. To enforce rigor, they replicated only the final study in each paper (so that they weren’t just replicating preliminary results, which are often reported first), and did each replication, as far as they could, in a way identical to the initial study—with the exception that they sometimes had larger sample sizes, giving them even greater power to detect effects.
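On the question of power, here’s a hedged sketch (again my own, not the OSC’s code) of what larger samples buy you: with a fixed, modest true effect, the fraction of simulated experiments reaching p < 0.05 climbs steadily with sample size.

```python
# Illustrative power simulation; the true effect size (0.4 SD) is an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.4   # difference in means, in units of the standard deviation
n_trials = 2_000

for n in (20, 40, 80, 160):
    hits = 0
    for _ in range(n_trials):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(true_effect, 1.0, size=n)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            hits += 1
    # Power = fraction of experiments that detect the (real) effect.
    print(f"n = {n:3d} per group -> power ~ {hits / n_trials:.2f}")
```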

To the credit of the original authors, they provided the OSC team with complete data and details of their experiments, ensuring that the replications were as close as possible in design to the original experiments. There were many other controls as well, including having statisticians independently verify the probability values calculated for the replication experiments.

All the original studies had results that were statistically significant, with p values (i.e., the chance of getting an effect at least as large as the one observed when there was no real effect) below 5% (a few were just a tad higher). When the chance of getting such a false positive is 0.05 or less, researchers generally consider the result “statistically significant,” which is a key to getting your paper published. That cutoff, of course, is arbitrary, and is far lower in areas like physics: for experiments like detecting the Higgs boson, the threshold drops to about 3 × 10⁻⁷ (the “five sigma” standard).
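For the curious, here’s a quick check—my own aside, not from the paper—of how those cutoffs map onto standard deviations (“sigmas”) of a normal distribution:

```python
# Tail probabilities of the standard normal at common significance cutoffs.
from scipy.stats import norm

for sigmas in (1.96, 3.0, 5.0):
    # Two-sided p: chance of a deviation at least this many sigmas
    # in either direction when there is no real effect.
    p = 2 * norm.sf(sigmas)
    print(f"{sigmas} sigma -> two-sided p ~ {p:.2g}")
# 1.96 sigma gives the familiar p ~ 0.05; 5 sigma (the particle-physics
# standard) gives p ~ 6e-7 two-sided, or ~ 3e-7 one-sided.
```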

So what happened when those 100 psychology studies were replicated? The upshot was that most of the significant results became nonsignificant, and the effects that were found, even if nonsignificant, dropped to about half the size of effects reported in the original papers. Here are the salient results:

  • Only 35 of the original 100 experiments produced statistically significant results upon replication (62 did not, and three were excluded). In other words, under replication with near-identical conditions and often larger samples, only 35% of the original findings were judged significant.
  • That said, many (though by no means all) of the replication results went in the same direction as those seen in the original studies, but the effects weren’t large enough to achieve statistical significance. If the replications had been the original papers, most of them probably wouldn’t have been published.

Here’s a chart comparing effect sizes in the original papers with those in the replicates. Each dot plots the size of the effect seen in the replicate (Y axis) against the effect size for the same study in the original paper (X axis). If a dot is green, the replicate was also statistically significant (as were virtually all effects in the original studies). Pink dots mean that the replicate study did not yield statistically significant results. The chart shows that effect sizes were generally lower than those of the original studies (most points fall below the diagonal line), and that most of the replicates (62%, to be precise) did not show significant effects.

(From the paper): Original study effect size versus replication effect size (correlation coefficients). Diagonal line represents replication effect size equal to original effect size. Dotted line represents replication effect size of 0. Points below the dotted line were effects in the opposite direction of the original. Density plots are separated by significant (blue) and nonsignificant (red) effects.

The chart also shows that the larger the effects observed in the original study, the more likely they were to replicate, for the pink dots are clustered on the left side of the graph, where the original effect sizes (normalized) are small. This goes along with the investigators’ findings that the lower the p value seen in the original experiment, and thus the more significant the result, the more likely it was to also be significant in the replicate.
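It’s worth noting that publication bias alone can generate exactly this pattern—shrunken replication effects, with the weakest original effects faring worst—without any misconduct. Here’s an illustrative toy model (my own, with made-up parameters, not a reanalysis of the OSC data): noisy estimates that happen to cross the significance bar get “published,” and unbiased replications of those same studies regress back toward the true effect.

```python
# Toy model of the "winner's curse": selecting on significance inflates
# published effects, so faithful replications look smaller on average.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_studies = 5_000
true_effects = rng.normal(0.2, 0.2, size=n_studies)  # assumed distribution
se = 0.15                                            # assumed sampling error

original = true_effects + rng.normal(0.0, se, size=n_studies)
z = original / se
published = z > norm.ppf(0.975)   # only "significant" originals get published

# Unbiased replications of the published studies, with the same error.
replication = true_effects[published] + rng.normal(0.0, se, size=published.sum())

print(f"Mean published original effect: {original[published].mean():.2f}")
print(f"Mean replication effect:        {replication.mean():.2f}")
```

In this toy model the published originals overestimate their own true effects simply because they were selected partly for lucky noise; no cheating is required to produce the shrinkage.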

  • While most of the results of replications were in the same direction as the original study, an appreciable number (I count about 20%) showed either the opposite direction or essentially no effect at all. And remember, even if there were no real effect in the original study, half of the replications would, by chance alone, go in the same direction as the original study.
  • The OSC team also asked each team doing a replication whether they considered that their results actually replicated those of the original paper. This assessment was subjective, but mirrored the results based on p-value significance: only 39% of investigators concluded that their results replicated those of the original study.
  • Finally, it’s possible that many of the p values in replications came close to the magic p = 0.05 cutoff point, which of course is a more or less arbitrary threshold for significance. To see if that was the case, the authors did a density plot of p values in the original papers versus those found in the replicates. Here are the results, with p values from original studies on the left and from the replicates on the right.
(From the paper): Density plots of original and replication p values.

As you can see, the p values for replications were distributed widely, and so were not hovering somewhere near the magic cutoff value for significance (0.05). Of course, all the p values in the original studies (left) were at or below that level of significance, or they wouldn’t have been published.
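That wide spread is itself informative. Under a true null, p values are uniformly distributed between 0 and 1, whereas under a genuine effect they pile up near zero; so a flat scatter of replication p values is just what a large crop of null effects would produce. A small sketch of the contrast (my own, not the paper’s analysis):

```python
# p values are uniform when the null is true and pile up near zero
# when an effect is real. Sample sizes and effect size are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def p_values(effect, n=30, trials=5_000):
    ps = []
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(effect, 1.0, size=n)
        ps.append(stats.ttest_ind(a, b).pvalue)
    return np.array(ps)

for effect in (0.0, 0.8):
    ps = p_values(effect)
    # Fraction of p values landing in each fifth of [0, 1].
    hist, _ = np.histogram(ps, bins=5, range=(0.0, 1.0))
    print(f"effect = {effect}: {np.round(hist / len(ps), 2)}")
```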

What does it all mean?

There are two diametrically opposed views about how to take this general failure to replicate. The first is to celebrate it as a victory for science. After all, science is about continually testing its own conclusions, and you can do that only by checking whether what other people found is really right. This, in fact, is the conclusion the authors come to. I quote from their paper:

Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims.

The present results suggest that there is room to improve reproducibility in psychology. Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should. Hypotheses abound that the present culture in science may be negatively affecting the reproducibility of findings. An ideological response would discount the arguments, discredit the sources, and proceed merrily along. The scientific process is not ideological. Science does not always provide comfort for what we wish to be; it confronts us with what is. Moreover, as illustrated by the Transparency and Openness Promotion (TOP) Guidelines, the research community is taking action already to improve the quality and credibility of the scientific literature.

We conducted this project because we care deeply about the health of our discipline and believe in its promise for accumulating knowledge about human behavior that can advance the quality of the human condition. Reproducibility is central to that aim. Accumulating evidence is the scientific community’s method of self-correction and is the best available option for achieving that ultimate goal: truth.

There’s a lot of sense in this, of course. A result isn’t widely accepted (in most fields) unless it’s repeated or makes firm predictions that can be tested. Self-correction is a powerful tool—one of the most important characteristics of science, and one that makes it different from, say, theology.

The “all is well in science” interpretation is also the one pushed by Lisa Feldman Barrett in her new NYT op-ed about the study, “Psychology is not in crisis.” (Barrett is a professor of psychology at Northeastern University.) But her piece is a mess, comparing the failure of psychology studies to replicate with changing the environment in which a gene is expressed. In some environments, she says, a gene producing curly wings makes the wings less curly, a common phenomenon that we geneticists call “variable expressivity”. And that’s indeed the case, but it doesn’t mean that the “Curly” mutation fails to cause the wings to become curled—something she implies. Variable expressivity is not a failure to replicate the finding that a particular genic lesion is responsible for curly wings.

Barrett also compares the OSC study’s failures to replicate with other studies in which failure to replicate depends on “context” (e.g., mice given shocks when they hear a sound develop a Pavlovian response), so that one doesn’t see the same results under different conditions (mice won’t develop the Pavlovian response if they’re strapped down when shocked). But that, like the curly-wing result, is irrelevant to the OSC’s efforts, which tried to ensure that the context and experimental conditions were as close as possible to those of the original studies. In other words, the OSC tried to eliminate context-specific effects. In Barrett’s eagerness to defend and exculpate her field, and affirm the strength of science, she makes arguments based on false analogies.

One thing that we can all agree on—the middle ground, so to speak—is that there’s a problem with the culture of science, which always favors big and impressive positive results over negative ones, and favors publication of novel results while largely ignoring attempts to replicate. (Sometimes a failure to replicate isn’t even accepted by scientific journals!) That’s even more true of the popular press, which is quick to tout findings of stuff like a “gay gene,” but can’t be bothered to publish a caveat when such a study—as that one did—fails to replicate. This problem, at least in the scientific culture, can be somewhat repaired. Most important, we need more studies like that of the OSC, but with the replications applied to other fields, especially biology.

And that brings me to my final point, which gives a less positive view of the results. As I said above, I think many studies in biology—particularly organismal biology—aren’t often replicated, especially if they involve field work. So such studies remain in the literature without ever having been checked, and often become iconic work that finds its way into textbooks.

In this way biology resembles psychology, although molecular and cell biology studies are often replicated as part of the continuing progress of the field.  I think, then, that it’s not as kosher to claim that ecology and evolution experience the same degree of self-checking as, say, physics and chemistry. Yes, all work should in principle be checked, but you find precious few dollars handed out by the National Institutes of Health or the National Science Foundation to replicate work in biology. (That’s because there isn’t that much money to hand out at all!) In my field of organismal biology, then, the self-correcting mechanism of science, while operative at some level, isn’t nearly as strong as it is in other fields like molecular and cell biology.

My main conclusion, then, is that we need an OSC for ecology and evolutionary biology. But it will be a cold day in July (in Arizona) when that happens!
_______

Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349:aac4716. DOI: 10.1126/science.aac4716

The marine toad

September 3, 2015 • 9:00 am

by Greg Mayer

Jerry had us spot the toad a few posts ago (I earlier posted an easier ‘spot the frog’), and in the comments some readers mentioned the marine toad, Bufo (Rhinella) marinus, also known as the cane toad (especially in Australia) or the giant toad. This species, native from south Texas to central Brazil, has been widely introduced in the West Indies (including Bermuda), Florida, Australia, and the Pacific islands. They were introduced primarily as a way to control a beetle that attacked sugar cane; the toads were not very good at this, and have had negative effects on more desirable faunal elements in some places.

Adult female Bufo (Rhinella) marinus, in 2012 in my back yard (Racine, Wisconsin); originally collected on Bermuda, 1999.

The above is my pet female, collected for me during a visit to Bermuda in 1999 by Bermuda’s foremost naturalist and conservationist, David Wingate. He has succeeded in eliminating the toads from Nonsuch Island, a preserve where the restoration of Bermuda’s indigenous fauna and flora is being promoted, with considerable success. She is fairly large, being 165 mm snout-vent length; unfortunately, I did not measure her when I first got her, but she was adult-sized at the time. The largest one I have ever found myself was a 178 mm one in Nicaragua. They get up to around 250 mm; the largest ones are said to be from the Guianas. A rather large preserved individual at the Museum of Comparative Zoology is about 230 mm long, and has long resided in a large Agassiz jar on the coffee table in the herpetology department.

In addition to being large, she’s getting old. I had thought she must be a record, but found that ages up to 25 years have been reported. “Toady” must be at least 17, perhaps a bit more, so she’s got a few years to go. Her only sign of aging is a cataract-like opacity in her right eye, which does not seem to have interfered with her ability to spot prey.

Notice the very large parotoid glands behind her ears; these secrete a milky poison when the toad is stressed, and I have been told that d*gs, not being terribly bright, have been sickened and even killed by attempting to ingest the toads. In South America, carnivorous mammals are said to flip the toads over and eat them from the belly side, where the skin does not contain toxins (or at least not as much). When being defensive, Toady angles her back toward the unwanted stimulus. The best overall guide to the biology of these toads is still “The Marine Toad, Bufo marinus: a natural history resume of native populations” by my friend and mentor, George Zug, and his wife Pat.


Easteal, S. 1981. The history of introductions of Bufo marinus (Amphibia: Anura); a natural experiment in evolution. Biological Journal of the Linnean Society 16:93-113.

Slade, R.W. and C. Moritz. 1998. Phylogeography of Bufo marinus from its natural and introduced ranges. Proceedings of the Royal Society of London B 265:769-777.

Wingate, D.B. 2011. The successful elimination of Cane toads, Bufo marinus, from an island with breeding habitat off Bermuda. Biological Invasions 13:1487-1492.

Zug, G.R. and P.B. Zug. 1979. The marine toad, Bufo marinus: a natural history resume of native populations. Smithsonian Contributions to Zoology 284, 58 pp.

Readers’ wildlife photographs

September 3, 2015 • 7:45 am

This is the second dollop of reader/photographer Colin Franks’s delivery of a batch of lovely bird photos (Colin’s Facebook page is here). I’ll put up the third and final installment in two days or so.

Cinnamon Teal (male), Anas cyanoptera:


White-crowned Sparrow, Zonotrichia leucophrys:


Savannah Sparrow, Passerculus sandwichensis: 


Great Blue Heron, Ardea herodias:


Canada Goose, Branta canadensis:


Pileated Woodpeckers (babies), Hylatomus pileatus:



Thursday: Hili dialogue (with Cyrus and Cleo lagniappe)

September 3, 2015 • 6:30 am

It’s Thursday, and predicted to be another high of around 90 degrees F in Chicago, with perhaps some thunderstorms late in the day, which might cool things off. Summer is hanging on tenaciously. My hunt for crayfish unwisely leaving their pond has come up empty, so I haven’t had to perform emergency pond re-insertion. Meanwhile in Dobrzyn, Hili is having problems dealing with the world.

Hili: Don’t you think that all this is more complicated?
A: No, I don’t, but it’s not simple.
In Polish:
Hili: Czy nie myślisz, że to wszystko jest bardziej skomplikowane?
Ja: Nie sądzę, ale proste nie jest.
And lagniappe #1: Cyrus and Andrzej sharing a quiet moment in the evening:
And, as an extra treat, Joyce Carol Oates sends an update on Cleopatra in her new Forever Home:
Cleopatra has commandeered the most comfortable chair in the study.  nap time now.
Look at that spotted belly! Don’t you just want to rub it?


Spot the frog!

September 2, 2015 • 1:00 pm

Well, nobody’s interested in science today, I see. I could post on internet drama, but I’m revolted at such a tactic. Instead we’ll have a “spot the beast” contest.

Reader Amy contributed a “spot the frog” photo. I’ll put the answer up in a few hours. Her note:

My d*g was barking at something and it took me a moment to find that it was a frog. (And yes I realize it might be a toad but that’s not alliterative!)
So can you spot the frog/toad?