Note and correction on the “Lewontin fallacy”

April 23, 2020 • 1:00 pm

Yesterday I wrote about Angela Saini’s misguided claim that human populations and races (I prefer “ethnic groups” to “races”) are basically genetically identical. So identical, in fact, that, as Saini argued, it’s entirely possible (or even likely) that the genomes of a South Asian and a white Canadian could be more similar than the genomes of two South Asians. That is wrong, but it plays into Saini’s ideological bias that there are no appreciable or meaningful differences between biological races.

As I indicated, we now have sufficient data to show that the chance that her assertion is true is close to zero. Looking at the whole genome, you’re not going to find many South Asians whose DNA is more similar to that of a white Canadian than to that of another South Asian.

In trying to understand why Saini would make such a statement, I speculated that she had bought into the “Lewontin fallacy”: the claim by my former Ph.D. advisor that the vast bulk of genetic variation segregating in our species occurs among individuals within populations, rather than among populations within a classically defined “race” or among races.

From his mathematical analysis, Lewontin concluded that the term “race” has no biological reality. The error of Lewontin’s claim was pointed out by geneticist A.W.F. Edwards, who noted that Lewontin was treating each gene as independent. But genes are not independent, because the constraints of history, geographic separation, and evolution ensure that differences among populations and races at different genes are correlated. Taking these correlations into account, Edwards concluded this (characterized in Wikipedia):

In Edwards’ words, “most of the information that distinguishes populations is hidden in the correlation structure of the data.” These relationships can be extracted using commonly used ordination and cluster analysis techniques. Edwards argued that, even if the probability of misclassifying an individual based on the frequency of alleles at a single locus is as high as 30 percent (as Lewontin reported in 1972), the misclassification probability becomes close to zero if enough loci are studied.
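A toy calculation makes Edwards’s point concrete. The sketch below is an illustration under stated assumptions, not Edwards’s own analysis or real data: every locus has two alleles, allele “A” has frequency 0.7 in one population and 0.3 in the other, loci are independent within a population, and an individual is assigned by a simple majority vote over loci. The function name and parameters are invented for the example. With one locus the error rate is the 30 percent figure mentioned above; with more loci it shrinks toward zero.

```python
from math import comb


def misclassification_probability(num_loci: int, p: float = 0.7) -> float:
    """Chance of assigning a haploid individual to the wrong population.

    Toy model (an assumption, not real data): at every locus allele "A" has
    frequency p in population 1 and 1 - p in population 2, and loci are
    independent within a population. An individual is assigned to
    population 1 when more than half of its loci carry "A"; ties are
    split evenly. By symmetry the error is the same for both populations.
    """
    error = 0.0
    for count_a in range(num_loci + 1):
        # Probability that a member of population 1 carries "A" at
        # exactly count_a of its loci (binomial distribution).
        prob = comb(num_loci, count_a) * p**count_a * (1 - p) ** (num_loci - count_a)
        if 2 * count_a < num_loci:
            error += prob        # majority vote picks the wrong population
        elif 2 * count_a == num_loci:
            error += 0.5 * prob  # tie: decided by a coin flip
    return error


if __name__ == "__main__":
    for k in (1, 11, 101):
        print(k, misclassification_probability(k))
```

Each locus on its own is a poor classifier here, yet the modest per-locus differences all point the same way, so they accumulate. That shared direction is the correlation among loci across the pooled sample that Edwards referred to.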

And the use of cluster analysis is in fact the way that population-genetic studies are able to describe evolutionary history and ancestry from DNA data. I cited cluster analysis of the genetic structure of the British Isles as an example of how well one can deduce someone’s ancestry and geographic origin from looking at half a million base pairs—a small fraction of the total DNA in the human genome (about 0.02%).

I stand by my claim that Saini was wrong, but I did err on one count, one that doesn’t affect my conclusions but that I wanted to point out to be scientifically accurate.

And that is this: Lewontin’s original claim about the apportionment of genetic variation among individuals, populations, and races was incorrect. This was pointed out to me by reader, biologist, and polymath Lou Jost, who works at a field station in Ecuador. I vaguely remembered that Lou had done some work on this, but had forgotten, as he just reminded me, that his work completely invalidates Lewontin’s method.

Lou has written several papers on this error, one of which you can access for free. (I have the other papers if you want the pdfs.)

The math behind Lou’s arguments is above the pay grade of many of us (including me), but I, at least, am convinced that Lewontin was wrong for reasons beyond Edwards’s claim: he was wrong because, as Lou showed, he “used a measure of ‘differentiation’ that doesn’t really measure differentiation.” Lou presented an alternative diversity-based model that will allow you to compare differentiation within and among groups (this holds for species diversity in ecology as well as genetic diversity in and among populations), but he didn’t apply it to Lewontin’s data, because those data are outmoded now (they were based on electrophoretically derived allele frequencies).
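A small worked example may help show what “doesn’t really measure differentiation” means. The sketch below is an illustration in the spirit of a diversity-based partition, not a reproduction of Lou’s papers or of Lewontin’s data. It assumes two equally weighted populations, each with ten equally common alleles and none shared between them, and it uses Shannon entropy as the diversity measure. The function names, dictionary keys, and allele labels are all made up for the example.

```python
from math import exp, log


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a collection of allele frequencies."""
    return -sum(f * log(f) for f in freqs if f > 0)


def apportion(populations):
    """Compare an entropy ratio with an effective-number partition.

    `populations` is a list of dicts mapping allele name -> frequency,
    one dict per population. Populations are weighted equally
    (a simplifying assumption of this sketch).
    """
    n = len(populations)
    alleles = set().union(*populations)
    # Allele frequencies in the pooled sample.
    pooled = [sum(pop.get(a, 0.0) for pop in populations) / n for a in alleles]
    h_within = sum(shannon_entropy(pop.values()) for pop in populations) / n
    h_total = shannon_entropy(pooled)
    return {
        # Fraction of total entropy not found within populations.
        "between_fraction": (h_total - h_within) / h_total,
        # Entropies converted to effective numbers of alleles.
        "effective_alleles_within": exp(h_within),
        "effective_alleles_total": exp(h_total),
        # Ranges from 1 (identical populations) to n (no shared alleles).
        "effective_populations": exp(h_total - h_within),
    }


if __name__ == "__main__":
    pop_a = {f"a{i}": 0.1 for i in range(10)}
    pop_b = {f"b{i}": 0.1 for i in range(10)}  # shares no alleles with pop_a
    for key, value in apportion([pop_a, pop_b]).items():
        print(key, round(value, 3))
```

In this toy case the entropy ratio puts only about 23 percent of the diversity between populations, even though the two populations have no alleles in common. Converting the entropies to effective numbers of alleles gives ten within each population and twenty overall, which corresponds to two completely distinct populations.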

The take-home lesson is that Lewontin’s conclusion is wrong not only because it applies to single loci assumed to be uncorrelated, but also because he used the wrong metric to compare within- versus between-group diversity. As Lou noted, the take-home lesson of Lewontin’s paper—that most of the genetic diversity in our species is present in any population of our species, with only smaller amounts added by looking at different populations or races—may still be right. But until Lou’s metrics are applied to the new and better data we have, we just won’t know.

Again, this correction affects an idea that I thought Saini might have been erroneously pondering when she made her misleading statement. It does not affect the fact that her statement is misleading, and that we really can distinguish populations and ethnic groups very well using genetic data—the more data the better. And it doesn’t affect my claim that Saini is either deliberately misleading people or is ignorant about the data on population differentiation in our species, and that her ignorance, willful or not, plays into her ideological narrative about “races.”

I stand corrected on the Lewontin issue, and thanks to Lou Jost for setting me straight about the “Lewontin fallacy.”

31 thoughts on “Note and correction on the ‘Lewontin fallacy’”

  1. Ethnic groups are cultural groups. Racial groups are biological groups. You can’t use the former in place of the latter without making a category error.

  2. Thanks for this further explanation. I recall, in little detail, Lou describing this same issue in a comment quite some time ago.

  3. I love that science corrects science. Can you imagine a similar “setting the record straight” discussion in a journal of theology?

  4. All due respect for acknowledging the slight error in the original post. A demonstration of why the fact-based scientific method and its practitioners can be trusted.

    1. Mr Jost, perhaps it’s not the best format for this question, but were not Lewontin’s conclusions verified in a number of later papers such as Edge and Rosenberg (2014), titled “Implications of the apportionment of human genetic diversity for the apportionment of human phenotypic diversity”, based on Edwards’s model of correlational structure?

      They also used Fst, of course. Do you think they had similar issues to the Lewontin paper?

  5. Nice to be accurate.
    When I did some work in CS years ago I worked with a statistician to make sure the numbers were crunched correctly. Naturally, he is named as an author.

  6. There are two separable issues here, that keep getting mixed up. (1) How different are two populations at a typical locus? and (2) Can we use multiple loci to figure out which population you come from? Lewontin’s figure is roughly correct for the first question, and yes, we can use a large number of loci, each of which shows modest amounts of differentiation, to figure out which population you come from (and genome testing companies now do this routinely). The individual loci show modest differences in frequency, but that information can be accumulated over many loci. I also think that saying, mysteriously, that the information is “hidden in the correlation structure”, does not help clarify the matter — it is just that there are modest differences at a great many loci.

    1. “Lewontin’s figure is roughly correct for the first question”

      But you see the problem with his method, right?

      And I think at many loci in the major histocompatibility complex, the real differentiation between subpopulations would be much higher than Lewontin implied.

      1. Lewontin’s method when he gives his 15% figure, that 15% of the total variation is between populations, could be wrong if he used a heterozygosity measure and if the populations actually shared no alleles in common. Was that the case? No. (The histocompatibility locus is not a typical example of anything — it is under great selection to be very atypical). Beyond that, it is like asking of two statistically very significantly different populations whether that means that their differences are “large”. Not necessarily, if the sample size is large.

        1. “Lewontin’s method when he gives his 15% figure, that 15% of the total variation is between populations, could be wrong if he used a heterozygosity measure and if the populations actually shared no alleles in common. Was that the case? No.”

          I am not sure which part you think is not the case. The measure he used, entropy, does have the problem I described. And the problem arises even when the subpopulations are not completely differentiated.

          I am not criticizing his conclusions, just the reasoning.

        2. I fully agree with Lou Jost’s point of view on that.

          I came to the same conclusion about the inadequacy of Fst/Gst type metrics while testing for divergence between two African cichlid species at the major histocompatibility complex (Mhc) back in 2007 (Blais et al. PLoS ONE 2007 2(8): e734).

          Fst/Gst are measures of relative diversity rather than differentiation per se. This is clearly expressed in this quote from the wikipedia entry stating that Fst can be conceived as: “…the fraction of total diversity that is not a consequence of the average diversity within subpopulations.”

          By the way, the fact that the Mhc may be atypical does not mean that it cannot serve as an example to illustrate this point! Suppose a metric A measures x but you’re interested in y. Now if under certain conditions, metric A gives results that are similar to y but under other conditions gives completely nonsensical results with respect to y, the fact that these other conditions are deemed “atypical” should not prevent one from concluding that metric A is inappropriate to measure y! Especially if it turns out that in fact A doesn’t really measure y in the first place!

          Indeed, the Mhc is atypical in its extreme diversity, which is precisely what you need if you look for a concrete example of the effect of allelic diversity as theoretically described in Lou’s paper.

          Fst/Gst simply don’t measure how genetically different populations/races/species are from each other. If one is interested in estimating how different, divergent and genetically distinct some groups are from each other, one needs a measure that reflects genetic identity, and that means a measure incorporating the proportion of shared and private alleles among the groups considered.

          1. The FST-like measures may or may not be good at assessing “how different” (whatever that means in the particular case) but they do infer how much genetic drift would have been needed to get the observed pattern of differentiation.

  7. Thank you to you and to Lou for the clarification. In teaching about human genetic diversity, I use phylograms. Is there a visual way to show the genetic relationships under discussion here?

    1. Mark, I’m not really making any phylogenetic claims, and I don’t think there is a mathematical problem with the way phylogenies are created or displayed. The problems arise when we ask HOW different are the groups, and how diverse is each group. For those kinds of questions my math is important.

      1. Oh I know that! I wasn’t asking about a phylogenetic tree. I was wondering if there exists somewhere something like a pie chart or some other kind of graphical representation.

  8. Funnily enough I was thinking about this sort of statistics for entirely different reasons.
    So it’s interesting.
    If Lou is reading this, I want to say that I’m enjoying the paper! It’s fun to see how different fields handle similar sorts of problems, and the paper is very nicely written.

    1. Thanks Andy. I knew the paper would be controversial and likely misunderstood, so I spent a lot of time thinking about how to write it.

      It still gets misunderstood, but hopefully less often than if I had written it quickly.

  9. You might want to consider using the term “population” rather than ethnic group. While even as an informed social scientist I can tell that this person’s assertions are absurd, part of the confusion centers on the use of the word “race”. Race carries essentialistic and discontinuous connotations and people associate it exclusively with single traits like skin color. The other issue is whether it makes sense to split the human species into subgroups, and if so, how? As I recall, Cavalli-Sforza divided humans into at least 50 language groups that had similar genetic markers.

    1. I was once arguing on a blog with someone who argued that since we could detect differences among populations using enough markers, that this validated the concept of “race”. So I asked him, OK, so is it possible that there are then 1,000 races? He said, yes, maybe. It seems to me that that would not be what others had in mind.

    2. If you do not mind, I will attempt to answer your question as to why subgroups should be considered with a question of my own.
      As a pediatric doctor I saw a 7-year-old who had a painful erection of 2 hours’ duration, which is consistent with priapism. For various social reasons no history was available. In patients with sickle cell disease, priapism is not uncommon, and sickle cell disease is in fact the most common cause of priapism in preadolescent males. However, no case of priapism due to sickle cell disease has been documented in a child not of African descent that I have found (so at a minimum exceedingly rare).
      At the time, it was 4:45pm. The treatment if due to sickle cell disease is a blood transfusion. Otherwise a urologist needs to perform a bypass operation. Time is of the essence because the longer the priapism duration, the higher the chance permanent injury occurs and he is rendered permanently impotent. After 5pm I will be dealing with the on call urologist adding 2-3 extra hours of time before surgery. Ditto for the transfusion. Both require a phone call by me and 15 minutes is not enough time for both. He was a white male by intake questionnaire and by exam.

      How could I make that decision without considering race?

    3. There is an important distinction between “race” in the sense of a biologically distinct subpopulation of humans and “ethnicity”.

      Ethnic identity can be defined as being composed of:

      1.) a belief in common ancestry;
      2.) shared language, rites, myths and customs;
      3.) taboos against exogamy.

      Obviously, common ancestry plus taboos against exogamy create genetic isolation, so there is a genetic component of ethnicity, but more importantly language and religious identity are critical, and have nothing to do with biology. Further, adoption into groups is pretty common in humans, which means ethnic groups are more porous than a strict biological perspective would allow.

      I think ethnicity is important, but I don’t think ethnicity can substitute for “race” or “population” because it is so heavily culturally determined. Two populations can be almost identical genetically, but have distinct and rival ethnic identities.

      “Race” is problematic, not for the reasons usually cited, but because it is too vague. If a race is a relatively isolated population of persons with inbreeding relative to other populations, then it covers the gamut from the Amish to the Caucasians. It has no clear scale, and lacks precision. Further, given that most political conflict occurs between ethnic groups, there is a danger it can be used to biologically “essentialize” what amounts to cultural and/or political conflicts.

      On the other hand, “race” as might be understood in 1950’s physical anthropology, that is, historically isolated genetically distinct continental populations separated by geographical barriers, is a biological concept. There really is a genetic discontinuity between populations on either side of the Altai Mountain range. There really is a genetic discontinuity between Melanesians and Southeast Asian populations separated by an ocean. That would be biologically defined as it transcends ethnicity. Further, it is mostly removed from politics except in places like Australia where you have colonization by one population and conflict with the indigenous population.

      The concept of race is allegedly the cause of genocide, but if you examine most, if not all, modern genocides, you see genocides carried out by rival ethnic groups who are often highly similar genetically to their rivals. Not sure whether the Hutus believed that their differences with the Tutsis were “real” or “socially constructed.” It’s not clear that it mattered in the Rwandan genocide. The recurring pattern in genocide is usually that a privileged minority gets blamed by a resentful majority for “stealing” what is owed to them, and then the genocide results in free stuff to loot and political opportunities for the elite manipulating the majority. Greed, envy and enmity are probably a bigger factor than anthropological or scientific theories of genetics. Socially constructed theories of race can serve just as well as biologically essentialist theories to justify stealing stuff, killing off rivals, and taking their place in government and industry.

  10. The races the racists use to feel superior simply divide humans into groups based entirely on skin color, so there is a lot of genetic variation within each race. For example, genetic testing of these “blacks”, a dark-skinned Indian, a darker-skinned Nigerian, and a dark-skinned American, would probably show that the American had more genetic information in common with a White European than either of the other two individuals, and that the other two were about as distinct from each other as either of them is from a white European. That’s one reason why “ethnicity” is a more valid way of categorizing humans than “race” and why it actually is true that “race” is a social construct. You don’t need to accept the lie that there are “essentially” no genetic differences between different human populations to combat racism. You just have to insist that those differences do not justify discriminating against, nor treating as sub-human, any human.

    1. I don’t know who you mean by “racist”, but if we take Immanuel Kant as our paradigm racist, he made a distinction between “whites”, “Negros”, “Hunnish” and “Hinduish” (into which he lumps Native Americans) races.

      These distinctions are geographical and cultural, based on the anthropological understanding extant in his time. Basically, geographically from Northwest, Southwest, Northeast, and Southeast. He would distinguish between a Sub-Saharan African and an Asian Subcontinental, even if they had the same skin tone. [He goes on to build a hierarchical scheme and exhibits classic white supremacy based on stereotypical characterizations of the other.]

      You have different populations in geographic isolation and Europeans noticed (among others) that these peoples not only had differences in customs and laws and dress, but also physical differences.

      The big change in European thought came in the shift from the Classical Greek notion that group differences were a result of differences in climate (which you can find even in Montesquieu’s writings in the 18th century) to an understanding that these differences were caused by heredity (to which Darwin and his “evil” half-cousin Galton contributed). [But note that whether it’s based on environment or heredity, everyone has been constructing hierarchies of peoples since Antiquity. Your group can be better because God chose you, or based on your superior genes, or based on your collective unjust humble suffering for centuries under your oppressors, or all three.]

      The general trend from environmentalism to hereditarianism happened gradually over the 19th and early 20th centuries (but the Blank Slate didn’t disappear, see Communism), but the atrocities of the Second World War created a major backlash against any form of hereditarianism, no matter how mild. The problem is that developments in genetics may ultimately force a need for a revision of the postwar consensus, and given the economic and medical value of genetic discoveries, it’s hard to imagine the genie getting back in the bottle. [Even if SJW fundamentalism shut down genetic research on humans in the West, there is no reason to think that CCP would hesitate to take advantage of a scientific lead.]

      David Reich has a good discussion of race in his book on population genetics. It’s mysterious that Saini knows who he is, but seems to ignore his research.

      It’s not clear that the problem stems from the existence of group differences, so much as the need to create hierarchies and decree that one group is “metaphysically” better than the other. On the other hand, this kind of tendency toward ethnocentrism may be one of the universal traits that all humans share in common.

  11. As a kind of corollary to this discussion, I have a question for Dr. Coyne:

    I believe you previously intimated that group selection, to the extent that it exists, is reducible to kinship altruism.

    Isn’t that claim belied by the phenomenon of Argentine Ant super-colonies, in which the ants can have large levels of genetic diversity within a colony but successfully cooperate with other same-colony ants?

    [The point being that non-kinship based altruism would make it possible to have selection on a cooperative colony of genetically diverse organisms, e.g. wiped out by a rival colony.]

    As an aside, it is interesting that the European ants have done a better job of integration than the European humans, isn’t it?
