Human Phylogeography: The lessons learned, 1

June 4, 2019 • 9:10 am

by Greg Mayer

UPDATE. A couple of readers have drawn attention to gcbias, the blog of Graham Coop, a population geneticist at UC Davis. He has excellent discussions, with nice graphics, of issues in genetic genealogy, including calculation of the number of “genetic units” in particular generations. As an example, 7 generations back you have 256 ancestors, but only 286 genetic units produced by recombination, so although, on average, you will have a chunk from each of those 256, it is entirely plausible to have zero from any particular one of them (since inheritance is stochastic). It’s well worth browsing, and this and this are good places to start. (Thanks to rich lawler and S. Joshua Swamidass for the pointers.)


In February, I posted the syllabus for a seminar class entitled “Human Phylogeography” that I was teaching with my colleague Dave Rogers. The seminar was based primarily on a close reading of David Reich’s (2018) Who We Are and How We Got Here (published by OUP in the UK). Well, the class has concluded now, and so I thought I’d report back on what happened.

First, I’d like to say that the class was a success. We had 16 students, double the most I’ve ever had in a number of similar seminar courses over the years, and the students were very successful in engaging with the subject in both written and oral contributions to the class. One of the students was a history major, and towards the end of the semester a colleague in computer science mentioned that, quite coincidentally, he was reading the book, so he joined the class for the last few meetings. In many ways, it was what college is supposed to be like (though too often isn’t). I hope the students learned a lot. I did, and here is the first of the three most striking things I learned.

1. Recombination is a lot rarer than you think.

If you think back to the last time you studied genetics, you’ll recall the phenomenon of recombination, one aspect of which is crossing over. Crossing over occurs during meiosis. Chromosomes come in homologous pairs (23 pairs in humans, for 46 total), and in meiosis the homologues can exchange pieces with one another. The chromosomes physically touch and cross one another, and the points where they cross, which are visible under the microscope, are called, appropriately enough, chiasmata (singular, chiasma).

From BioNinja, https://ib.bioninja.com.au/standard-level/topic-3-genetics/33-meiosis/crossing-over.html

Recombination is important for a variety of reasons (for one, it increases genetic variability), but for our current purposes its importance is that it breaks up the nuclear genome from 23 genetic units into more, and smaller, units (as opposed to the mitochondrial genome, which has a number of genes, but all are inherited as a single genetic unit, since there is no recombination in mitochondria). In humans, it turns out, there are only 1-2 crossovers per chromosome per generation (1.2 per chromosome in fathers, 1.8 in mothers).

Now, I’d always thought that crossing over occurred frequently enough that we could think of the genome as essentially infinitely divisible. (There are 3 billion base pairs in the human genome, so, in the limit, there would be 3 billion genetic units, so not quite infinite!) But, it turns out that crossovers occur sufficiently infrequently that there is an appreciable chance that, if you go enough generations back, you share NO genes with your ancestor. This is because the number of ancestors goes up fast (2, 4, 8, 16, 32, 64, 128, 256, etc.), but the breaking up of the genome into smaller units by crossing over isn’t fast enough to ensure that the probability of sharing nothing is near zero.
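A rough back-of-the-envelope calculation shows why. The Python sketch below is my own illustration, not Reich’s, and it rests on simplifying assumptions: 22 autosomes per haploid set (ignoring the X and Y), about 33 crossovers per meiosis (the 1.2 and 1.8 per chromosome quoted above average to 1.5, times 22 autosomes), crossovers falling independently, and the number of blocks you get from any one ancestor being roughly Poisson-distributed. Under those assumptions, the set of chromosomes you got from one parent, traced back k generations (k = 1 is that parent), has been cut into about 22 + 33(k − 1) blocks, shared among 2^(k − 1) ancestors on that side of the family. Once there are fewer blocks than ancestors, some ancestors must have contributed nothing. The function names are made up for this example.

```python
import math

# Simplifying assumptions for this sketch (not taken from Reich's book):
AUTOSOMES = 22               # chromosomes in one haploid set, ignoring X and Y
CROSSOVERS_PER_MEIOSIS = 33  # ~1.5 per autosome: average of fathers (1.2) and mothers (1.8)


def expected_blocks(k: int) -> float:
    """Expected number of blocks in the haploid genome received from one
    parent, traced back k generations (k = 1 is that parent)."""
    return AUTOSOMES + CROSSOVERS_PER_MEIOSIS * (k - 1)


def ancestors_on_one_side(k: int) -> int:
    """Number of ancestors k generations back on one parent's side."""
    return 2 ** (k - 1)


def prob_no_dna(k: int) -> float:
    """Approximate probability of inheriting no DNA from one particular
    ancestor k generations back, treating its block count as Poisson."""
    mean_blocks = expected_blocks(k) / ancestors_on_one_side(k)
    return math.exp(-mean_blocks)


if __name__ == "__main__":
    print(" k  ancestors(one side)  blocks  P(no DNA from a given ancestor)")
    for k in range(2, 13):
        print(f"{k:2d}  {ancestors_on_one_side(k):19d}  {expected_blocks(k):6.0f}"
              f"  {prob_no_dna(k):.3f}")
```

In this approximation the chance of inheriting nothing from a given ancestor is negligible for the first handful of generations, but it climbs steeply once the number of ancestors on a side overtakes the number of blocks, passing one in two at around k = 10.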

Here’s a figure from Reich’s book showing how blocks of genes are broken up by recombination.

From Reich, 2018.

You start with an entirely Neanderthal chromosome (dark), which enters the anatomically modern human population by hybridization. A few generations later, the Neanderthal chromosome has been broken up, but it still occurs as largish blocks amongst the anatomically modern sections (gray). Still later, the blocks are smaller and fewer. (We’re assuming continued backcrossing into the anatomically modern population, so the % Neanderthal decreases; there could also be selection causing changes in the frequency of Neanderthal alleles.) Finally, a present-day individual has his Neanderthal DNA broken up into even smaller bits.
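The process in Reich’s figure can be mimicked with a toy simulation. The sketch below is my own, and its assumptions are worth stating loudly: a hypothetical chromosome 1.5 Morgans long, crossovers scattered at random at an average rate of one per Morgan per meiosis (ignoring crossover interference and the difference between the sexes), continued backcrossing into the modern population, no selection, and a single line of descent followed from the first hybrid onward. Because each meiosis can only cut up or discard Neanderthal material, the blocks get smaller and, eventually, fewer, and by chance all of the Neanderthal material on the chromosome can be lost.

```python
import random

CHROMOSOME_LENGTH_M = 1.5  # hypothetical genetic length, in Morgans


def crossover_positions(length, rng):
    """Crossover positions as a Poisson process at 1 per Morgan
    (by definition of the Morgan), with no crossover interference."""
    positions = []
    x = rng.expovariate(1.0)
    while x < length:
        positions.append(x)
        x += rng.expovariate(1.0)
    return positions


def transmit(neanderthal_blocks, length, rng):
    """One meiosis in a carrier whose other homolog is fully modern human.

    Returns the Neanderthal blocks (start, end) on the chromosome passed on.
    """
    cuts = [0.0] + crossover_positions(length, rng) + [length]
    on_tracked = rng.random() < 0.5  # which homolog the gamete starts from
    passed_on = []
    for seg_start, seg_end in zip(cuts, cuts[1:]):
        if on_tracked:
            # keep the parts of Neanderthal blocks that fall in this segment
            for b_start, b_end in neanderthal_blocks:
                lo, hi = max(b_start, seg_start), min(b_end, seg_end)
                if lo < hi:
                    passed_on.append((lo, hi))
        on_tracked = not on_tracked  # each crossover switches homologs
    return passed_on


def simulate(generations, length=CHROMOSOME_LENGTH_M, seed=1):
    """Follow one line of descent, backcrossing every generation."""
    rng = random.Random(seed)
    blocks = [(0.0, length)]  # generation 0: the hybrid's Neanderthal homolog
    history = [blocks]
    for _ in range(generations):
        blocks = transmit(blocks, length, rng)
        history.append(blocks)
    return history


if __name__ == "__main__":
    for gen, blocks in enumerate(simulate(10)):
        total = sum(end - start for start, end in blocks)
        print(f"generation {gen:2d}: {len(blocks):2d} blocks, "
              f"{100 * total / CHROMOSOME_LENGTH_M:5.1f}% Neanderthal")
```

Running it with different seeds shows how variable the outcome is from one line of descent to another, which is why the real problem has to be handled statistically.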

Here’s a figure from a talk by Svante Pääbo. For each of the 22 chromosomes listed (numbered 1-22), the top row shows the chromosome of “Oase Boy”, who lived about 40,000 years ago in what is now Romania; taken together, these rows show his entire genome. The green lines are Neanderthal sites in his genome. The five rows below Oase Boy are five modern human individuals, and the colored lines are their “Neanderthal bits”. Note that for each chromosome, Oase Boy has the biggest block of Neanderthal genes (green fluorescence):

From Pääbo, 2018.

Because the Oase sample is so old, some of the black lines represent missing data rather than evidence of non-Neanderthal ancestry, and so Pääbo infers that there are seven large, continuous blocks of Neanderthal genes (yellow bars above the Oase Boy line). Note that the modern individuals have less Neanderthal DNA, and it is not in large blocks.

Because the blocks break up in a statistically predictable fashion, you get a “recombination clock”: from the size of the blocks you can estimate how many generations ago the hybridization occurred. For Oase Boy, Pääbo estimated that his Neanderthal ancestor lived 4-6 generations back (his great-great, great-great-great, or great-great-great-great grandparent).

From Pääbo, 2018, showing Oase’s Neanderthal ancestor (red) in the 5th generation (it could also be in the 4th or 6th).
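A minimal version of the recombination clock can be sketched in a few lines. This is a deliberately simplified model and not Pääbo’s actual procedure: it assumes that after t generations of recombination, ancestry blocks have lengths that are roughly exponentially distributed with a mean of 100/t centimorgans (cM), ignoring chromosome ends, the admixture proportion, and the other complications that careful models (such as those in Gravel 2012) deal with. Under that assumption the maximum-likelihood estimate of t is simply 100 divided by the mean block length in cM. The tract lengths used to exercise it are synthetic, drawn from the same model, not real data, and the function names are hypothetical.

```python
import random


def estimate_generations(tract_lengths_cM):
    """Estimate generations since admixture from ancestry-tract lengths.

    Assumes tracts are exponentially distributed with mean 100/t cM, so the
    maximum-likelihood estimate of t is 100 / (mean tract length in cM).
    """
    if not tract_lengths_cM:
        raise ValueError("need at least one tract")
    mean_cM = sum(tract_lengths_cM) / len(tract_lengths_cM)
    return 100.0 / mean_cM


def synthetic_tracts(t, n, seed=0):
    """Hypothetical tract lengths (cM) drawn under the same exponential model:
    rate t/100 per cM, i.e. mean 100/t cM."""
    rng = random.Random(seed)
    return [rng.expovariate(t / 100.0) for _ in range(n)]


if __name__ == "__main__":
    # Synthetic check: recover the number of generations used to simulate.
    for true_t in (5, 50, 2000):
        estimate = estimate_generations(synthetic_tracts(true_t, n=500))
        print(f"true t = {true_t:4d}, estimated t = {estimate:7.1f}")
```

Long blocks mean recent admixture: under this model, blocks averaging 20 cM point to about 5 generations, while blocks averaging a small fraction of a cM point to thousands of generations.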

Because the placement and frequency of crossovers are stochastic (random), the process must be modeled statistically to derive sound estimates, and there will be a range of plausible estimates. And, since some of the fossils are well dated by other means, we can also estimate the long-term human generation time, as was done by Priya Moorjani and her colleagues: it’s 26-30 years.
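The logic behind the generation-time estimate can also be sketched, in a simplified form that is my own reading and not Moorjani and colleagues’ actual procedure. Suppose the Neanderthal hybridization happened A years ago and the generation interval is G years. An individual who lived `age` years ago should then show an admixture date of (A − age)/G generations, so plotting admixture dates (in generations, from the genetics) against sample ages (in years, from other dating methods) should give a straight line with slope −1/G. The sample ages, A = 50,000 years, and G = 28 years below are hypothetical values chosen only to generate synthetic data for the fit.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def generation_interval(sample_ages_years, admixture_dates_generations):
    """Generation interval G implied by dated samples.

    Simplified model: a sample that lived `age` years ago carries an
    admixture date of (A - age) / G generations, so the slope is -1/G.
    """
    _, slope = fit_line(sample_ages_years, admixture_dates_generations)
    return -1.0 / slope


def synthetic_samples(admixture_age_years, gen_interval, ages, noise_sd, seed=0):
    """Hypothetical admixture dates (generations) generated under the model,
    with Gaussian noise of standard deviation noise_sd generations."""
    rng = random.Random(seed)
    return [(admixture_age_years - age) / gen_interval + rng.gauss(0, noise_sd)
            for age in ages]


if __name__ == "__main__":
    ages = [0, 5_000, 10_000, 20_000, 30_000, 40_000, 45_000]  # hypothetical
    dates = synthetic_samples(50_000, 28.0, ages, noise_sd=20.0)
    print(f"estimated generation interval: {generation_interval(ages, dates):.1f} years")
```

A plain least-squares fit is the simplest choice here; it treats every sample as equally reliable, which a real analysis would not.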

So, the low rate of recombination allows us to construct a “recombination clock”, and to estimate generation times. This is great stuff!

It also solved for me what was a puzzle. You may recall that last year Elizabeth Warren released the results of DNA tests showing that she had American Indian ancestry several generations back. This essentially confirmed what her family’s oral history said. The amount of her Indian ancestry was small (less than 1%), and a range of generations (6-10) was provided by the analysis (as was done by Pääbo for Oase Boy).

Now, there are a number of ways in which these ancestry tests can be criticized. One of the most serious is that there are very few North American Indian genotypes in the database used, so “American Indian” ancestry is inferred from relationship to Central and South American Indians. Some critics of Warren, however, made erroneous criticisms. She did not claim, as some accused her of doing, that the results showed she was Cherokee; with few if any Cherokee in the database, the ancestry tests could not determine this. (And tribal membership is a legal matter, anyway, not directly dependent on genetic similarity.)

But some critics said that the data were consistent with her having no Indian ancestry at all. I wondered how they could say that: there are 3 billion bp, and 1% of that (30 million bp) is still a very large number. But now I realize my error. There are far fewer genetic units (more than 23, but a lot less than 3 billion!) because of the low rate of recombination. And, because of this, if you go back several generations, there is an appreciable probability of sharing no DNA with an indubitable ancestor. I now believe the critics must have seized on this fact, realized that Warren may not have DNA from all of her ancestors, and thus suggested she may have no Indian ancestry. But their error is that in saying she may lack DNA from an ancestor, say, 8 generations back, they are invoking an a priori probability, the chance before anyone has looked. In Warren’s case, her DNA was examined, and showed that she did have Indian ancestry.


Gravel, S. 2012. Population genetic models of local ancestry. Genetics 191: 607-619.

Ho, S. Y., Chen, A. X., Lins, L. S., Duchêne, D. A., and Lo, N. 2016. The genome as an evolutionary timepiece. Genome Biology and Evolution 8: 3006-3010.

Huff, C. D., et al. 2011. Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome Research 21: 768-774.

Moorjani, P., Sankararaman, S., Fu, Q., Przeworski, M., Patterson, N., and Reich, D. 2016. A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years. Proceedings of the National Academy of Sciences USA 113: 5652-5657.

Pääbo, S. 2018 (3 October). A Neanderthal Perspective on Human Origins. Lecture (video).

Reich, D. 2018. Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. Pantheon, New York.