A misguided critique of genetic ancestry testing

May 23, 2023 • 11:00 am

Unfortunately, NPR has gotten hold of Agustín Fuentes, who seems to have a strong ideological slant on biology, to explain to its listeners the “problems” with using DNA tests for ascertaining your ancestry—as many of us have done with companies like 23andMe™.  Sadly, Fuentes’s “criticisms” of the method and results are misguided, bespeaking either an ignorance of biology or an ideological drive to convince people that humans around the world are so similar that it’s next to useless to use DNA to find out your ancestry. (This is, of course, part of the view that “race is a social construct”, which apparently now means “ethnic groups can’t readily be identified by their DNA.”)

To cast doubt on such tests, Fuentes makes a number of claims: that races (or ethnic groups) are social constructs; that we don’t have enough data to reliably identify groups from their DNA (ergo we don’t have enough data to reliably determine your genetic origins); that one doesn’t expect to find genetic differences between geographically separated populations because geographic divisions are purely subjective and arbitrary; that people move around too much to reliably determine the location where one’s ancestors lived; that your genealogical history may diverge from your genetic history; and that the best that ancestry tests can do is tell you what genetic diseases you may be prone to.  To sum up all the misguided information that Fuentes gave in his 14-minute interview with Regina Barber, I’ll first give one paragraph from Fuentes’s interview:

So I will tell you right now, my 23andMe tests miss a bunch of my actual kin – right? – because, like, most of your ancestors contributed no genetics to you – right? – because of the way genetics mixes down and across. And here’s the punchline for ancestry testing. It actually can tell you some information. When it comes to certain diseases, it’s actually really important to know, but it does not tell you who you are, and it actually doesn’t tell you who your ancestors are. It tells you which peoples from different places contributed to your genetics. But that is not your family, right? Your genealogy is more than just the biology.

Now we’ve met Fuentes before and I’ve taken issue with his distortions of biology (see here for some posts), especially those insisting that Darwin was a racist and that there is no such thing as a sex binary.  What worries me, especially in this NPR interview, is how Fuentes, perhaps in the interest of ideology, has repeatedly misled the public. In my view, the NPR interview damages the public understanding of an important area of modern genetics.

But hear (or read) for yourself. The short NPR show (14 minutes) can be found either by clicking on the screenshot below and listening to the show, or by reading the transcript here.

I had an email discussion with my colleague Luana Maroja at Williams College about this, for the two of us have co-written a paper on this and similar topics that will be out in a month. She gave me permission to use her name and her words, and so I’ll put her words in indented italics and mine flush left in roman type. Fuentes’s statements from the interview are indented in roman type.

First, a few words about the supposed inability of using DNA to determine one’s ancestors. Although it’s true that most genetic variation occurs within rather than between populations (this was first popularized by my advisor Dick Lewontin), and that 99.9% of the DNA between any pair of humans is identical, people don’t realize that that still leaves a substantial amount of genetic difference between people, and especially between populations, that can be used to diagnose ancestry. We know this because the human genome has 3.3 billion base pairs, and even 99.9% identity leaves 3.3 million differences among individuals.
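The arithmetic is easy to check for yourself. Here is a minimal sketch in Python that uses only the two figures quoted above; the function and variable names are mine, chosen for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the text.
GENOME_BASE_PAIRS = 3_300_000_000   # genome size as given above
IDENTICAL_PER_THOUSAND = 999        # 99.9% identity, kept as an integer ratio


def expected_differences(base_pairs: int, identical_per_thousand: int) -> int:
    """Number of sites expected to differ between two people."""
    differing_per_thousand = 1000 - identical_per_thousand
    return base_pairs * differing_per_thousand // 1000


differences = expected_differences(GENOME_BASE_PAIRS, IDENTICAL_PER_THOUSAND)
print(f"{differences:,} differing sites")
```

A tenth of a percent sounds tiny, but a tenth of a percent of a very large number is still millions of sites.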

And research has shown that a lot of those differences occur between geographic populations. (I use either that phrase or “ethnic group” instead of “races” because we know that the classical idea of races, as absolutely geographically demarcated groups that are profoundly genetically differentiated and diagnosable using few genes, is wrong.) But differences between populations become clear when you use a large group of those 3.3 million segregating base pairs (SNPs, or “single-nucleotide polymorphisms”), and these can be used to tell you where your genes come from. If the results were way off the mark, companies like 23andMe would be out of business.
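To see why using many SNPs matters, here is a rough simulation sketch. Everything in it is an assumption made for illustration: two hypothetical populations whose allele frequencies differ by only a small amount at every SNP, reference frequencies that are known exactly, and a simple likelihood comparison as the classifier. It is not the method used by any testing company or by the papers linked here.

```python
import math
import random


def simulate_accuracy(n_snps, n_per_pop=100, freq_gap=0.05, seed=1):
    """Fraction of simulated people assigned to the right population."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies: at every SNP the two populations
    # differ only slightly (by 2 * freq_gap).
    base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    freq_a = [p + freq_gap for p in base]
    freq_b = [p - freq_gap for p in base]

    def genotype(freqs):
        # 0, 1 or 2 copies of the allele at each SNP.
        return [(rng.random() < p) + (rng.random() < p) for p in freqs]

    def log_likelihood_ratio(genotypes):
        # Positive values favour population A, negative values favour B.
        total = 0.0
        for copies, pa, pb in zip(genotypes, freq_a, freq_b):
            total += copies * math.log(pa / pb)
            total += (2 - copies) * math.log((1 - pa) / (1 - pb))
        return total

    correct = 0
    for _ in range(n_per_pop):
        if log_likelihood_ratio(genotype(freq_a)) > 0:
            correct += 1
        if log_likelihood_ratio(genotype(freq_b)) < 0:
            correct += 1
    return correct / (2 * n_per_pop)


accuracy_few = simulate_accuracy(10)
accuracy_many = simulate_accuracy(1000)
print(f"10 SNPs: {accuracy_few:.2f}   1000 SNPs: {accuracy_many:.2f}")
```

No single SNP in this toy model is diagnostic, yet the small differences add up: with a handful of sites the assignment is only modestly better than a coin flip, while with a thousand it is nearly always right.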

For example (do check out the links for yourself):

a.)  Even the old and outmoded view of race is not devoid of biological meaning. A group of researchers compared a broad sample of genes in over 3,600 individuals who self-identified as either African-American, white, East Asian, or Hispanic. DNA analysis showed that these groups fell into genetic clusters, and there was a 99.84% match between which cluster someone fell into and their self-designated racial classification. This surely shows that even the old concept of race is not “without biological meaning”. But that’s not surprising because, given restricted movement in the past, human populations evolved largely in geographic isolation from one another—apart from “Hispanic”, a recently admixed population never considered a race. As any evolutionary biologist knows, geographically isolated populations become genetically differentiated over time, and this is why we can use genes to make good guesses about where populations come from.

More recent work, taking advantage of our ability to easily sequence whole genomes, confirms a high concordance between self-identified race and genetic groupings. One study of 23 ethnic groups found that they fell into seven broad “race/ethnicity” clusters, each associated with a different area of the world. On a finer scale, genetic analysis of Europeans shows that, remarkably, a map of their genetic constitutions coincides almost perfectly with the map of Europe itself. In fact the DNA of most Europeans can narrow down their birthplace to within roughly 500 miles.

b.)  Here’s a genetic cluster analysis (using principal-components analysis) of many genes from many Italian populations, nicely separated by geography (the paper is here). This is based on only about 270 variable SNPs in 210 genes studied in 1736 individuals. Although there’s been some mixing (overlap between clusters), in general you would be able to localize where in Italy a person was from by looking at even a relatively small sample of their DNA variants. Why the different groups? They reflect the history of colonization and settlement in different parts of Italy as well as local population structure due to mating with those born close to you. Clearly, migration has not been sufficient to efface these historical differences. You get similar maps if you look at the three links above, which cover both Europe and the whole world.
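For readers who wonder what a principal-components analysis of genotypes actually does, here is a small sketch in plain Python. It runs on simulated data for two hypothetical populations, not on the Italian data, and it finds only the first component by power iteration; the population sizes, SNP counts and frequency gap are assumptions chosen so the example runs quickly. Real analyses use dedicated software and far more samples.

```python
import random


def simulate_genotypes(n_per_pop=20, n_snps=500, freq_gap=0.1, seed=2):
    """Rows are individuals (population A first, then B); columns are SNPs."""
    rng = random.Random(seed)
    base = [rng.uniform(0.3, 0.7) for _ in range(n_snps)]
    rows = []
    for sign in (+1, -1):
        # Hypothetical frequencies: A is slightly above base, B slightly below.
        freqs = [p + sign * freq_gap for p in base]
        for _ in range(n_per_pop):
            rows.append([(rng.random() < p) + (rng.random() < p) for p in freqs])
    return rows


def first_principal_component(rows, iterations=200, seed=3):
    """Score of each individual on PC1 (up to scale and sign)."""
    n, m = len(rows), len(rows[0])
    # Centre every SNP column so PCA sees differences, not overall frequency.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Individual-by-individual similarity (Gram) matrix.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centred]
            for ri in centred]
    # Power iteration converges to the leading eigenvector of the Gram matrix.
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


pc1 = first_principal_component(simulate_genotypes())
pop_a, pop_b = pc1[:20], pc1[20:]
print("population A scores:", [round(x, 2) for x in pop_a[:5]], "...")
print("population B scores:", [round(x, 2) for x in pop_b[:5]], "...")
```

The analysis is never told who belongs to which population. The clusters emerge from the genotypes alone, which is why plots like the one for Italy are informative.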

c.) You can also place people pretty accurately using variation within transposable (“mobile”) genetic elements, as you can see in this figure using a cluster (principal components) analysis of MEVs, or mobile element variation.  Populations fall out genetically very well according to the continent from where the individuals were sampled (the Nature paper from just 12 days ago is here).  Continental areas are coded this way: AFR, African; AMR, American; EAS, East Asian; EUR, European; SAS, South Asian. And remember, this is only DNA sequences in moving elements. If you use every bit of DNA in whole genomes, you get much cleaner results.

(If you added positions of these elements, you’d get even more information, but the analysis above seems to depend on DNA sequences alone, which aren’t ideal for MEVs because there are so many repeats.) Still, look at how just a small sample of the genome can give you pretty good diagnostic ability.

How many SNPs do companies like 23andMe use? Over a million variable sites (see here). That gives substantial diagnostic ability to determine where one’s ancestral genes came from. Not only that, but since we know the gene order, you can use that to find your relatives, for relatives not only have similar variants, but also have the same sets of variants grouped together on their chromosomes, as “linked” gene variants aren’t broken up by recombination within a generation or two. My own 23andMe analysis found several distant cousins, and when I checked with my sister, sure enough, they were indeed my cousins. This would not be possible unless the variation had some biological significance. You can diagnose ancestry with good accuracy, but you can also find your relatives! (Because of “linkage disequilibrium” between sites, you can even “paint” the chromosomes based on geographic ancestry, showing recombination that happened in your ancestral lineage).
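The idea of finding relatives through shared blocks of linked variants can be illustrated with a toy example. The sketch below looks for long runs of identical alleles between two haplotypes written as strings of 0s and 1s. The three sequences are invented, the function name is hypothetical, and this is not 23andMe’s algorithm: real methods must cope with unphased genotypes, genotyping errors, and segment lengths measured in genetic distance rather than marker counts.

```python
def shared_segments(hap_a, hap_b, min_length):
    """Return (start, end) index pairs of runs where the two haplotypes
    carry identical alleles for at least min_length consecutive markers.
    The end index is exclusive."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    segments = []
    run_start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if run_start is None:
                run_start = i          # a new matching run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the last marker.
    if run_start is not None and len(hap_a) - run_start >= min_length:
        segments.append((run_start, len(hap_a)))
    return segments


# Invented haplotypes over 16 markers.
cousin_1 = "0110100111010011"
cousin_2 = "1010100111010110"
stranger = "0011110001011001"

print(shared_segments(cousin_1, cousin_2, min_length=8))  # [(2, 13)]
print(shared_segments(cousin_1, stranger, min_length=8))  # []
```

Unrelated people match at scattered sites by chance, but only relatives share long unbroken stretches, because recombination has not yet had enough generations to chop them up.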

Now that I’ve told you the fallacy of Fuentes’s insistence that DNA testing is severely compromised because most humans are genetically identical, I’ll turn you over to Luana, who knows a lot more about this stuff than I do, as she not only does it herself, but teaches it to her undergraduates.  She analyzes (her words in italics, again) a number of Fuentes’s claims, and, actually, finds the whole interview deeply misleading about DNA testing. Note that her words are reactions to what Fuentes said in the interview.

FUENTES: So here’s the deal. When you spit in a tube and send them – let’s take 23andMe – your DNA, they analyze your DNA – this little, teeny piece of it – right? – they don’t analyze all of it – and they file that in storage. It’s like, you know, a compartmentalized cluster of information. These are reference populations. These reference populations – the data they have are how they place your DNA and tell you something about it.

. . . This ability to take your spit and put it in a tube, pay someone 150 bucks and have them send you something back about your DNA – that is amazing. But what it tells you – when they send you back your results, that splash page is never accurate because the thing it should say on that splash page is, congratulations, you are 99.9% identical to every other living human. That’s not what it tells you.

LUANA: He seems to ignore that they use SNPs (single nucleotide polymorphisms) rather than whole genome sequencing.  Because they use only informative sites (SNPs), meaning the sites that vary among individuals and populations, and not the 99.9% of sites that are identical among people, they cannot actually come back with a result saying “you are 99.9% identical to all humans”.  The SNPs they actually use in determining ancestry are the variable sites alone, the 0.1% of the human genome.  And because they categorize people NOT by race, but by geographic location, Fuentes’s criticism of race as a social construct also falls apart.

FUENTES: Yes. There are tens of thousands, if not hundreds of thousands of idealized reference populations in humans. So it sure as heck doesn’t tell you where you are in the human panoply of genetics.

LUANA: Then he goes on to say there would be more populations if they sequenced more people – but this is not the point.  The populations nearby would still be the most genetically similar because of strong isolation by distance – so you could subdivide more (for instance, now Italians can be further subdivided between south, middle and north), but that would not change the result: if your DNA says you are most closely related to people descending from the Italian peninsula, you are not more closely related to North Europeans, because Italians are more closely related to each other than to North Europeans.

JAC: One of the biggest flubs in Fuentes’s argument is his claim that continental areas, because (he says) they are demarcated subjectively, aren’t really expected to have much correlation with genetic differentiation. But in fact that’s how genetic differentiation occurs: by lack of gene flow between geographically isolated populations, which causes them to evolve in different directions. He picks out the only “arbitrary” geographic division I know of between continents to make his point. But even that divide, between Europe and Asia, is not purely subjective: it’s usually at the Ural Mountains, which are a geographic barrier.

FUENTES: A reference population is a cluster of individuals who have their DNA sequenced from some geographic place – continents, big geographic space. So Africa, Asia and Europe are not biological units, right? They’re not even single geobiological patterns or areas or habitats or ecologies, right? They are geopolitical. We named them. We created these landmasses and divided them in certain ways. So for example, what is the difference between Asia and Europe?

BARBER: Other than geographic location?

FUENTES: No, when does Asia become Europe?

BARBER: Oh, I don’t know.

This is cherry-picking nonsense. Of course the geographical demarcation between Europe and Asia is somewhat arbitrary (though it does involve a mountain barrier), but this does not mean that you can’t tell a European from people in various parts of Asia. And of course the other regions (the Americas, Polynesia, Australia, Africa, and so on) are geographically isolated. The difference between Europe, Asia, and Africa, or between Australia and the Americas, is not arbitrary. Further, the presence of genetic continuity is clear in DNA information, with more significant geographic barriers usually leading to greater population structure.

Luana chimes in:

LUANA: Then one more bit of nonsense – because we named continental regions – it does not mean they were not “regions”.  In fact, our geopolitical nomenclature usually follows geographic lines pretty closely – rivers, mountains etc.  And the categories of 23andMe are not sociopolitical locations – they are geographic locations – not countries.  These include the Iberian peninsula, Great Britain, east Asia etc.  Not to mention that political and linguistic boundaries also have a huge effect on gene flow. I am baffled about why Fuentes is even talking about subjective “geopolitical boundaries.”

FUENTES: The problem is that they don’t actually tell you from the get-go how human you are – right? – 99.9% identical to everyone else. It’s 0.1% that varies across humans – 0.1% of our DNA. They don’t tell you sort of how that actually varies. They tell you you are X percentage African, Asian or European because we think of continents – we think of Africa, Europe and Asia as places that reflect biologies, that reflect deep lineages in humanity. And that’s not true. So the danger in these tests is reifying that. You say, like, oh, I’m 17% African. Wow, I’m 17% Black. Those two things are not the same, right? If you have 17% ancestry, let’s say, from Africa on a test from 23andMe, most – and you’re here in North America, most likely, you have some genetic ancestry in populations from West Africa, right? That’s interesting. That’s fascinating. That’s important. But that doesn’t mean you have any relation to anyone in South Africa or East Africa or Central Africa or North Africa. Africa is not a biological unit. There is no gene for race because race doesn’t come from biology. It comes from racism.

LUANA: More nonsense. He says, “But that doesn’t mean you have any relation to anyone in South Africa or East Africa or Central Africa or North Africa. Africa is not a biological unit. There is no gene for race because race doesn’t come from biology. It comes from racism.”

This is ridiculous – a sub-Saharan African population is indeed more closely related to other populations from that area than to populations from other areas, for genetic mixture between sub-Saharan African and other groups was impeded by the Sahara. In all principal components analyses, sub-Saharan African populations appear as tight clusters, differing even from other African populations, with additional diagnostic differences seen within locations in the sub-Saharan cluster.  So, I think what he means is that you won’t have close family members in Africa, for we’re talking about the kind of ancestry that dates back thousands of years, not a couple of generations.

Luana found this 2011 paper from the European Journal of Human Genetics that shows the genetic structure of African and non-African populations. Notice that all sub-Saharan African populations in this principal-components analysis group together at the right (dark green), and are separate from northern African populations (orange), while European populations (blue), South Asians (pink), east Asians (light green), Pacific Islanders (yellow) and the Americas (tan) form clusters of their own. While there is some mixing, you can see that in general, the genetic clusters correspond to geographic localities, and sub-Saharan African populations are one of the most isolated of them all.  (Also notice how similar this SNP map is to the map of movable genetic elements shown above:  genetic information from different sources converges to a similar structure set by past population history).

(from the paper, subfigure a): Figure 1 PCA of merged HGDP and Hap Map 3 samples. Panels show the results of the PCA for the full merged set of SNPs (460 147 SNPs) (a), for random subsets of 100 000


JAC: One of Fuentes’s misleading beefs is that human migration largely nullifies any value in DNA testing:

FUENTES: But what it can tell us is where do you map related to these reference populations? What does the movement of humans look like? And the best thing they’re doing now is you can ask, sort of, well, where was I – where do my ancestors – genetic ancestors – where were they 200 years ago? Where were they 2,000 years ago? Where were they 10,000 years ago? And guess what? They’re different places. Now, humans throughout history – right? – for at least the last 3- to 500,000 years, humans and our most recent ancestors have been moving around and having sex with each other regularly. Humans do that. And that’s what we’re from.

LUANA: And then this empty statement: “Now, humans throughout history – right? – for at least the last 3- to 500,000 years, humans and our most recent ancestors have been moving around and having sex with each other regularly. Humans do that. And that’s what we’re from.”  Sure, who said otherwise??  This is exactly what 23andMe gives you – the mixing, for it assumes mixed ancestry.  What Fuentes is leaving out is that human populations are also quite quick to regain genetic structure after replacement events (due to the very low ancestral migration distances in our species) and after settlement, humans tended to disperse very little until the invention of rapid transportation starting with horses and now with airplanes.

JAC:  One more argument Fuentes makes against assessing your ancestry via DNA testing is that his own personal ancestry changed over time as he took repeated tests. This argument implies that, say, a test you take now may be completely off the mark:

FUENTES: The cool thing about these tests is that they’re constantly updating their reference populations. So really cool part of this is that once you’ve done it, Ancestry.com, 23andMe or any of the other companies keep going back because as they expand their reference populations, lo and behold, your genes change. Everything changes about you. I – it’s basically – they just get more information, so they know better about you. So, for example, I’ve been watching myself slide around, like, the Iberian Peninsula, North Africa, way over into Arabia, down into Sudan, back up, back over. And then lately I’ve been shoved, like, way up into Russia. But what’s interesting is that you learn more and more about all of the movement of those peoples that contributed to you and how we are all mutts and how we’re all this blend of amazingness

LUANA: Finally the very thing he says: “I’ve been watching myself slide around, like, the Iberian Peninsula, North Africa, way over into Arabia, down into Sudan, back up, back over”  simply shows the huge progress the sites are making in identification.  When I first sequenced my DNA, I came out as partially east Asian.  Nowadays I have no East Asian, it is all Native American – in the past they did not have enough information to finely separate these two related groups, now they do.  This is progress.  Unlike Fuentes’s insinuation, this means the dataset is getting more robust and that it’s easier to finely locate people to smaller regions.

(Luana is from Brazil but has mixed ancestry from within the Americas.)
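Luana’s experience, an assignment that changes once a better reference population is added, is what you would expect from any method that picks the closest match in a reference panel. Here is a toy sketch of that effect. The allele frequencies are invented: I simply assume a hypothetical “Native American” reference that sits closer to the “East Asian” one than to the “European” one, and I assign a simulated customer to whichever panel entry makes the genotypes most likely. The names and numbers are illustrative, not real data, and this is not any company’s actual pipeline.

```python
import math
import random

rng = random.Random(4)
N_SNPS = 2000

# Invented allele frequencies for three hypothetical reference populations.
east_asian = [rng.uniform(0.3, 0.7) for _ in range(N_SNPS)]
native_american = [p + rng.choice((-0.1, 0.1)) for p in east_asian]
european = [p + rng.choice((-0.25, 0.25)) for p in east_asian]


def log_likelihood(genotypes, freqs):
    """Log-probability of the genotypes given a population's frequencies
    (terms that are the same for every population are dropped)."""
    return sum(g * math.log(p) + (2 - g) * math.log(1 - p)
               for g, p in zip(genotypes, freqs))


def best_match(genotypes, panel):
    """Name of the reference population that fits the genotypes best."""
    return max(panel, key=lambda name: log_likelihood(genotypes, panel[name]))


# A simulated customer whose ancestors came from the third population.
customer = [(rng.random() < p) + (rng.random() < p) for p in native_american]

old_panel = {"European": european, "East Asian": east_asian}
new_panel = {**old_panel, "Native American": native_american}

old_result = best_match(customer, old_panel)
new_result = best_match(customer, new_panel)
print("older panel:", old_result)
print("newer panel:", new_result)
```

The customer’s DNA never changes; only the set of available answers does. With the older panel the best available label is the nearest relative of the true source, and once the right reference exists the assignment moves to it.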

Jerry here again:  Fuentes’s presentation on this NPR show makes the listener think that the real value in DNA testing is not the “slippery” business of finding out where your ancestors come from, but what genetic diseases you have. He raises a number of “problems” with tests like those used by 23andMe, but these are not serious problems. And by concentrating on the similarity between humans, without emphasizing that there are several million sites in the DNA that can be used to diagnose ancestry as well as to find your relatives, he’s neglecting the fact that it is those millions of variable sites that are the ones that CAN BE AND ARE used to detect your ancestry—and we know now that they do so with substantial accuracy, as the data above show.

Fuentes’s deliberate neglect of genetic differentiation between populations that are geographically isolated or isolated by distance and by cultural “inbreeding”—the way we diagnose ancestry—can only be understood as an obfuscation due to either ignorance or ideology.  If you adhere to a certain ideology, populations cannot be allowed to show diagnostic genetic differences because that means that populations are different, and thus that populations could be unequal. And thus they could be superior or inferior.  This sliding from “difference”, which is indisputable, to “ranking”, which need not happen at all if you’re rational, is why “progressive” ideologues oppose the emphasis on diagnostic genetic differences between human populations. It is another case of reading into nature what you would like to see in nature.

And that is why Barber starts her interview with Fuentes this way.

BARBER: And aside from leaving out our similarities, most of these tests spit out results based on large, geographic locations – so continental ancestry. The problem is that these kinds of results – think African, European, South Asian – are then linked to race, a social construct.

No, we’re not talking about race or social constructs here: we’re talking about geographic populations, and which ones contributed genes to your own DNA.

Finally, because it’s so cool, here’s the genetic map of Europe compared to the geographic map, taken from the 2022 PNAS paper cited above. The genetic data, presented again as a principal components analysis on the right, are based on 5,500 individuals and 204,652 SNPs (single-nucleotide polymorphisms). Isn’t the coincidence between the genetic and geographic maps remarkable? This shows that migration has not effaced historical data, and that you don’t need obvious geographic barriers to get distinguishable clusters.

(From the paper): A sample of European structure in the UKBB. (A) The number of individuals included from each European country analyzed. Countries are grouped by geographic region; these regions are chosen as a means of group representation and do not necessarily imply historical links. Sample sizes from each region are also shown. Abbreviations are as follows: SE Europe (southeastern Europe), S Europe (southern Europe), E Europe (eastern Europe), C Europe (central Europe), N Europe (northern Europe), W Europe (western Europe), Brit. & Ire. (Britain and Ireland). (B) The sample counts for each European region. (C) The first two PCs calculated by PLINK of 5,500 European individuals. Individual genotypes are shown by letters that encode the alpha-2 ISO 3166 international standard codes and are color coded according to geographic region. The median PC for each country/region of birth is shown as a label. Plots were generated using the ggplot2 package (65) in the R statistical computing language (59).


And that, ladies and gentlemen, brothers and sisters, and comrades, is how we can make fairly accurate guesses about where your genes (and distant ancestors) come from.


UPDATE: Within a minute of pressing “post,” I got this notice from 23andMe, saying that they’d located putative relatives of mine, including one second cousin and three third cousins. I’ll check with my relatives!

