Yesterday I wrote about Angela Saini’s misguided claim that human populations and races (I prefer “ethnic groups” rather than “races”) are basically genetically identical. So identical, in fact, that, as Saini argued, it’s entirely possible (or even likely) that the genomes of a South Asian and a white Canadian could be more similar than the genomes of two South Asians. That is wrong, but it plays into Saini’s ideological bias that there are no appreciable or meaningful differences between biological races.
As I indicated, we now have sufficient data to show that the chance that her assertion is true is close to zero. Looking at the whole genome, you’re not going to find many South Asians whose DNA is more similar to that of a white Canadian than to that of another South Asian.
In trying to understand why Saini would make such a statement, I speculated that she had bought into the “Lewontin fallacy”: the claim by my former Ph.D. advisor that the vast bulk of genetic variation segregating in our species occurs among individuals within populations, rather than among populations within a classically-defined “race” or among races.
From his mathematical analysis, Lewontin concluded that the term “race” has no biological reality. The error of Lewontin’s claim was pointed out by geneticist A.W.F. Edwards, who noted that Lewontin was treating each gene as independent. But they’re not, because the constraints of history, geographic separation, and evolution ensure that differences among populations and races at different genes are correlated. Taking these correlations into account, Edwards concluded this (characterized in Wikipedia):
In Edwards’ words, “most of the information that distinguishes populations is hidden in the correlation structure of the data.” These relationships can be extracted using commonly used ordination and cluster analysis techniques. Edwards argued that, even if the probability of misclassifying an individual based on the frequency of alleles at a single locus is as high as 30 percent (as Lewontin reported in 1972), the misclassification probability becomes close to zero if enough loci are studied.
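To make that last point concrete, here is a minimal sketch of a toy model in the spirit of Edwards’ argument. The setup is my own simplification, not his analysis: two populations, haploid individuals, and at every locus the “typical” allele of a population occurs at frequency 0.7 in that population and 0.3 in the other (so a single locus misclassifies 30 percent of the time). Loci are assumed independent within each population, and an individual is assigned by simple majority vote across loci. The function name and parameters are hypothetical.

```python
from math import comb


def misclassification_probability(n_loci: int, p: float = 0.7) -> float:
    """Chance that a majority vote over n_loci assigns an individual
    to the wrong one of two populations.

    Toy assumptions (not from Edwards' or Lewontin's data):
    - at each locus the individual carries its own population's
      typical allele with probability p, independently across loci;
    - the same p applies at every locus;
    - n_loci is odd, so the vote cannot tie.
    """
    if n_loci % 2 == 0:
        raise ValueError("use an odd number of loci so the vote cannot tie")
    # The vote is lost when at most half of the loci (rounded down)
    # carry the individual's own population-typical allele.
    max_losing_count = n_loci // 2
    return sum(
        comb(n_loci, k) * p**k * (1 - p) ** (n_loci - k)
        for k in range(max_losing_count + 1)
    )


if __name__ == "__main__":
    for n in (1, 11, 51, 201):
        print(n, misclassification_probability(n))
```

With one locus the error rate is the 30 percent quoted above, and it shrinks steadily as loci are added. Real loci differ in how informative they are, so the actual rate of decline will differ from this idealized case.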
And the use of cluster analysis is in fact the way that population-genetic studies are able to describe evolutionary history and ancestry from DNA data. I cited cluster analysis of the genetic structure of the British Isles as an example of how well one can deduce someone’s ancestry and geographic origin from looking at half a million base pairs—a small fraction of the total DNA in the human genome (about 0.02%).
I stand by my claim that Saini was wrong, but I did err on one count, one that doesn’t affect my conclusions but that I wanted to point out to be scientifically accurate.
And that is this: Lewontin’s original claim about the apportionment of genetic variation among individuals, populations, and races was incorrect. This was pointed out to me by reader, biologist, and polymath Lou Jost, who works at a field station in Ecuador. I vaguely remembered that Lou had done some work on this, but had forgotten, as he just reminded me, that his work completely invalidates Lewontin’s method.
Lou has written several papers on this error, one of which you can access for free. (I have the other papers if you want the PDFs.)
The math behind Lou’s arguments is above the pay grade of many of us (including me), but I, at least, am convinced that Lewontin was wrong for reasons beyond Edwards’s claim: he was wrong because, as Lou showed, he “used a measure of ‘differentiation’ that doesn’t really measure differentiation.” Lou presented an alternative diversity-based model that will allow you to compare differentiation within and among groups (this holds for species diversity in ecology as well as genetic diversity in and among populations), but he didn’t apply it to Lewontin’s data, because those data are outmoded now (they were based on electrophoretically derived allele frequencies).
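For readers who want a feel for the problem without the full math, here is a small sketch of the kind of issue Lou describes. It is an analogy under stated assumptions, not a re-analysis of Lewontin’s data or of his exact index: it uses the familiar heterozygosity-based ratio G_ST as the “apportionment” measure and compares it with the heterozygosity-based measure commonly called Jost’s D. The two populations are invented, equally weighted, and share no alleles at all; the function names are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def gst_and_jost_d(pops):
    """Return (G_ST, Jost's D) for equally weighted populations.

    pops is a list of allele-frequency lists that use the same allele
    order, with each list summing to 1. Equal weighting is an
    assumption made for simplicity.
    """
    n = len(pops)
    # Mean within-population heterozygosity.
    h_s = sum(heterozygosity(p) for p in pops) / n
    # Heterozygosity of the pooled (averaged) allele frequencies.
    pooled = [sum(column) / n for column in zip(*pops)]
    h_t = heterozygosity(pooled)
    gst = (h_t - h_s) / h_t
    jost_d = (h_t - h_s) / (1.0 - h_s) * n / (n - 1)
    return gst, jost_d


# Invented example: each population has 10 equally common alleles,
# and the two populations have no alleles in common.
k = 10
pop_a = [1 / k] * k + [0.0] * k
pop_b = [0.0] * k + [1 / k] * k

if __name__ == "__main__":
    gst, d = gst_and_jost_d([pop_a, pop_b])
    print("G_ST:", gst)
    print("Jost's D:", d)
```

Here G_ST comes out at roughly 0.05, which read naively says that about 95 percent of the variation is “within populations,” even though the two populations do not share a single allele. D, by contrast, is 1 (up to rounding), signaling complete differentiation. When within-population diversity is high, a ratio like G_ST is forced toward zero no matter how different the populations are.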
The take-home lesson is that Lewontin’s analysis is flawed not only because it treats loci one at a time, as if they were uncorrelated, but also because he used the wrong metric to compare within- versus between-group diversity. As Lou noted, the central message of Lewontin’s paper (that most of the genetic diversity in our species is present in any single population, with only smaller amounts added by looking at different populations or races) may still be right. But until Lou’s metrics are applied to the new and better data we have, we just won’t know.
Again, this correction affects only an idea that I thought Saini might have had in mind when she made her misleading statement. It does not affect the fact that her statement is misleading, and that we really can distinguish populations and ethnic groups very well using genetic data; the more data, the better. And it doesn’t affect my claim that Saini is either deliberately misleading people or is ignorant about the data on population differentiation in our species, and that her ignorance, willful or not, plays into her ideological narrative about “races.”
I stand corrected on the Lewontin issue, and thanks to Lou Jost for setting me straight about the “Lewontin fallacy.”