by Greg Mayer
Jerry and I were both working independently on posts about the coronavirus. When we realized this, we conferred and decided to continue our efforts, but with some coordination and cross-fertilization. Jerry’s piece was posted on Friday.
[JAC: Greg has a “technical notes” section at the end which clarifies terms in the text that might confuse nonbiologists.]
1). Getting people vaccinated will impede the origin of new variants, because adaptive evolution is faster in larger populations. Widespread vaccination, by reducing the number of cases, will reduce the population size of the virus. Adaptive evolution is faster in large populations, first, because selection is more effective in large populations; this is a well-known population-genetic result. It is also faster because large populations, by producing a greater total number of mutations, explore more of the total mutational space, including favorable double (or multiple) mutations in which the component single mutations are not favored but the combinations are. This is, in part, the principle behind the AIDS “cocktail” treatments: because the drugs attack HIV in multiple ways at once, no single resistance-conferring mutation will allow the virus to escape; if one drug doesn’t get it, another one will. Only a combination of several mutations will confer resistance to the whole “cocktail”, and such a combination is very improbable because the individual mutations, not being favored, will not accumulate. But in a very large sample (i.e., a large population), improbable things can happen.
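The two population-size effects can be put into rough numbers with a small Python sketch. It assumes a haploid Wright-Fisher population of N individuals and uses Kimura's diffusion approximation for the chance that a single new mutant with selective advantage s eventually takes over; the mutation rate and s in the usage lines are hypothetical values chosen for illustration, not estimates for the coronavirus. The rate of adaptive substitution (new beneficial mutants per generation times the chance each one fixes) scales with N, and so does the expected number of individuals that pick up a specific double mutation at once.

```python
import math


def fixation_probability(s, N):
    """Kimura's approximation for a new mutant (starting frequency 1/N) with
    advantage s in a haploid population of effective size N."""
    if s == 0:
        return 1 / N  # neutral case: fixation purely by drift
    # (1 - e^(-2s)) / (1 - e^(-2Ns)), written with expm1 for accuracy at small s
    return math.expm1(-2 * s) / math.expm1(-2 * N * s)


def adaptive_substitution_rate(N, mu, s):
    """Beneficial mutants arising per generation (N * mu) times the chance each fixes."""
    return N * mu * fixation_probability(s, N)


def expected_double_mutants(N, mu1, mu2):
    """Expected number of individuals per generation acquiring two specific
    mutations at once, with per-individual mutation rates mu1 and mu2."""
    return N * mu1 * mu2


mu, s = 1e-6, 0.01  # hypothetical mutation rate and selective advantage
for N in (10, 10_000, 1_000_000):
    u = fixation_probability(s, N)
    print(f"N={N:>9}: fixation prob / neutral = {u * N:8.1f}, "
          f"adaptive substitutions per generation = {adaptive_substitution_rate(N, mu, s):.2e}, "
          f"double mutants per generation = {expected_double_mutants(N, mu, mu):.1e}")
```

With these assumed values, a favored mutant in a population of 10 fixes only about 1.1 times as often as a neutral one, whereas in a population of a million it fixes roughly 20,000 times as often; both the adaptive substitution rate and the supply of double mutants grow in proportion to N.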
There are also interesting issues of components of fitness or levels of selection in the evolution of viruses (or any disease-causing micro-organism, for that matter). Jerry discussed this in his piece, contrasting the evolution of virulence within an infected host versus transmissibility between hosts. These can be viewed as two components of reproductive fitness: competition to reproduce within the host, and competition to move to new hosts. Or they can be viewed as different levels of selection: individual selection among virus particles within hosts, and group selection among the virus populations of different hosts (the particles in a host all get sneezed out to the next host as a group). The evolution of myxoma virus in rabbits in Australia, which Jerry discusses, has been interpreted from both points of view. The interest comes from the potential conflict between what’s “good” within the host (reproducing very rapidly), and getting to the next host. If you are too good at “taking over” the host, you might kill off the host before you can spread to the next host. And if you don’t spread, you go extinct. So, what’s good in the host may not be good for getting to the next host.
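One common way to make this trade-off concrete is a toy between-host model in which R0, the number of new infections caused by one infected host, equals the transmission rate divided by the rate at which an infection ends. The sketch below assumes, hypothetically, that transmission rises with virulence alpha but with diminishing returns (beta = c * sqrt(alpha)), and that an infection ends by recovery (gamma), by host death from other causes (mu), or by host death caused by the virus (alpha). The functional form and parameter values are illustrative assumptions, and the model captures only the between-host side of the conflict; within-host competition is not modeled.

```python
def r0(alpha, c=1.0, mu=0.1, gamma=0.2):
    """Toy R0: transmission rate divided by the rate at which an infection ends.

    Hypothetical trade-off: transmission beta = c * sqrt(alpha) rises with
    virulence alpha, but a more virulent virus also kills its host sooner.
    """
    beta = c * alpha ** 0.5
    return beta / (mu + gamma + alpha)


# Grid search over candidate virulence values 0.001 .. 5.0
alphas = [i / 1000 for i in range(1, 5001)]
best = max(alphas, key=r0)
print(f"R0-maximizing virulence: {best:.3f}")
```

Setting the derivative of sqrt(alpha)/(mu + gamma + alpha) to zero gives alpha = mu + gamma, so with the assumed mu = 0.1 and gamma = 0.2 the search lands on an intermediate virulence of 0.3: a virus that is either much milder or much more virulent than this spreads less well between hosts under these assumptions.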
There’s also an interesting issue of what is the proper estimate of population size for the virus. Is it the number of viral particles? The number of hosts? For within-host selection, it would be the number of viral particles in that host. For selection between host populations, it might be nearer to the number of hosts. (I would guess that the theory for this has already been developed in the context of group selection theory.) Either way, having fewer hosts, with lower viral loads within them, lowers the rate of adaptive evolution of the virus.
2). By a *very* crude analysis, the UK variant does not show evidence of selection on its protein sequences. The ratio of nonsynonymous (N) to synonymous (S) mutations is 13/6 = 2.17, which is very close to the expected ratio of 2.66 for neutral (i.e., unselected) mutation in a completely *random* genome. The defect of this analysis is that the virus’s genome is of course not random. I would expect that someone with the genomic sequence and the right software is already carrying out a proper analysis using the actual nucleotide and codon distribution of the virus. (In fact, I wouldn’t be surprised if it’s already been done; not being a virologist, I don’t follow that literature.) A second, and perhaps more important, defect, which would apply even to a proper analysis, is that nonsynonymous/synonymous ratios average over sites for a whole protein or genomic sequence, so even strong selection at one or a few sites in a protein can be lost in a sea of neutral change in the rest of the protein. (See Technical note below for more details.)
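The crude comparison can be given a rough significance level. If each of the 19 observed mutations were independently nonsynonymous with probability 399/549 (the neutral expectation for a random genome used above), the number of nonsynonymous mutations would follow a binomial distribution, and an exact two-sided binomial test asks how surprising 13 out of 19 would be. This is only a sketch: it inherits every defect of the crude analysis, and treating the mutations as independent draws is itself a simplifying assumption.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    total = sum(binom_pmf(i, n, p) for i in range(n + 1)
                if binom_pmf(i, n, p) <= observed * (1 + 1e-7))  # small tolerance for float ties
    return min(1.0, total)


n_obs, s_obs = 13, 6           # nonsynonymous and synonymous mutations (from the text)
p_neutral = 399 / (399 + 150)  # neutral chance that a random mutation is nonsynonymous
p_value = binom_test_two_sided(n_obs, n_obs + s_obs, p_neutral)
print(f"observed N/S = {n_obs / s_obs:.2f}, expected = {399 / 150:.2f}, p = {p_value:.2f}")
```

With these numbers the p-value comes out well above 0.05, so 13 of 19 is unremarkable under the neutral expectation; with so few mutations, though, a test like this also has little power to detect selection.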
There are other ways of inferring selection, and Jerry stressed one of those: if the virus evolves in parallel in multiple locations, that suggests the action of selection. We seem to be seeing that, independently, in several different locations, the same variant is spreading widely and increasing in frequency. If the variants were neutral, their frequencies would change only due to chances of sampling and which variant happened to get somewhere first, so we wouldn’t expect the same variant to “get lucky” and take over all the time.
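The logic of parallel spread can be illustrated with a simple simulation. The sketch below runs a haploid Wright-Fisher model (selection, then random sampling of N offspring) independently in many hypothetical locations, each starting with the variant at 5%, and counts how often the variant ends up in the majority. The population size, starting frequency, number of generations, and selective advantage s are arbitrary illustrative choices, not estimates for any real variant.

```python
import random


def wright_fisher_final_freq(N, p0, s, generations, rng):
    """Haploid Wright-Fisher model: selection, then binomial sampling of N offspring."""
    count = round(N * p0)
    for _ in range(generations):
        p = count / N
        if p in (0.0, 1.0):
            break  # variant lost or fixed
        p_sel = p * (1 + s) / (1 + p * s)  # expected frequency after selection
        count = sum(rng.random() < p_sel for _ in range(N))  # binomial draw of offspring
    return count / N


def fraction_taking_over(s, locations=50, N=500, p0=0.05, generations=60, seed=1):
    """Share of independent locations in which the variant ends above 50%."""
    rng = random.Random(seed)
    wins = sum(wright_fisher_final_freq(N, p0, s, generations, rng) > 0.5
               for _ in range(locations))
    return wins / locations


neutral = fraction_taking_over(s=0.0)
favored = fraction_taking_over(s=0.1)
print(f"neutral: {neutral:.2f}, favored (s = 0.1): {favored:.2f}")
```

With these settings, drift alone essentially never carries a rare variant to a majority within 60 generations, let alone in most locations at once, while a variant with a modest advantage does so in nearly every location.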
Another hint of selection would be if substitutions affecting function (such as nonsynonymous mutations and deletions) are concentrated in a part of the genome known to be of adaptive significance, such as the spike protein. That protein is a highly functional part of the virus, since it is the part the virus uses to attach to host cells. The UK variant shows at least two nonsynonymous mutations and one deletion in the spike protein, but without full data, I can’t say whether this is more than expected for the spike protein (which makes up ca. 10% of the genome).
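Once the full list of changes is available, one simple way to check for such concentration is a one-sided binomial test: if m function-affecting changes fell uniformly at random across the genome, how likely is it that k or more would land in a region making up a fraction f of it? The counts in the usage line are placeholders, not the UK variant's actual tally; f = 0.10 follows the rough figure given above. Uniform placement is a simplifying assumption, since the real expectation would depend on how many sites in each region could produce a function-affecting change.

```python
from math import comb


def region_enrichment_p(k, m, f):
    """One-sided binomial test: probability that k or more of m changes land
    in a region covering fraction f of the genome, if each change fell
    uniformly at random."""
    return sum(comb(m, i) * f**i * (1 - f)**(m - i) for i in range(k, m + 1))


# Placeholder counts for illustration only, NOT the UK variant's real numbers.
print(f"P(at least 4 of 10 in a 10% region) = {region_enrichment_p(k=4, m=10, f=0.10):.4f}")
```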
3). The variants are differentiated strains, not “mutations”. The identified variants differ by multiple substitutions, and thus are not single mutations but accumulations of multiple mutations. Some substitutions in a strain may be subject to selection, but others will not be. If we think of the virus as a “species” (which, being a collection of asexual lineages, is not quite what the virus is), then the variants or strains are like “subspecies”: differentiated descendants of a common ancestor, differing in a number of ways, some of which may be adaptive, while others may not be. (In biological species, subspecies interbreed, and thus are a form of geographical variation; in viruses, however, the variants can exist without interbreeding in the same geographic area, including inside the same host, so the analogy to subspecies is inexact.)
4). Some of the media, or at least reporter Apoorva Mandavilli of the NY Times, are grasping that virus evolution is key to the course of the pandemic. Words and phrases in her article include: “selection pressure”, “evolve” (4 times!), “evolving”, “evolutionary biologist”, “adaptation”, and “coronavirus can evolve to avoid recognition”. And here’s a statement in the article of the distinction between genetic drift and selection:
Some variants become more common in a population simply by luck, not because the changes somehow supercharge the virus. But as it becomes more difficult for the pathogen to survive — because of vaccinations and growing immunity in human populations — researchers also expect the virus to gain useful mutations enabling it to spread more easily or to escape detection by the immune system.
This article is a pretty direct affirmation of the importance of understanding how evolution works when dealing with viral diseases.
5). After the AIDS epidemic, we all should have learned the importance of evolutionary biology for transmissible diseases. The lessons learned during the spread, evolution, and control of HIV and other viruses are so clear that they have become textbook examples of evolutionary principles, from elementary grades to college texts. Epidemics are all about evolution.
6). You should call it the “UK variant”. The Ars Technica article from which I got the (limited) genomic data used above falls over itself trying not to use geographic terms because they cause “stigma”. This is stupid. One of the oldest practices in taxonomy is to name species after the place they are found. The native anole of the southern United States is named Anolis carolinensis, because the description was based on lizards supposed to be from Carolina. It was later found to occur all over the southeastern United States, with closely related forms (sometimes considered conspecific) on a number of West Indian islands. It has also been introduced all over the world, from California to Hawaii to Japan. It is still Anolis carolinensis. Stability of names is important, and names related to place are a useful mnemonic, since they require no knowledge of Latin or an arcane numbering system. (The article refers to the UK variant as “B.1.1.7”. If there’s only one variant this might do, but with multiple ones it becomes an exercise in memorization.)
Technical note. “Nonsynonymous” mutations are mutations of the DNA sequence which change the amino acid sequence of the resulting protein. Because the genetic code is redundant (more than one codon can code for the same amino acid), some mutations are “synonymous”, leaving the protein unchanged. There are 549 possible single-nucleotide mutations of the 61 amino-acid-coding codons (61 codons × 3 nucleotides per codon × 3 possible nucleotides to change into). Of these possible mutations, 399 are nonsynonymous and 150 are synonymous. (I couldn’t find these numbers anywhere, so I counted them up myself from the table in Muse and Gaut (1994); my count could be off, but, I hope, not by much.) If a protein-coding DNA sequence is completely random (i.e. all 61 protein-coding codons are equally represented), then mutations occurring at random will occur with a nonsynonymous to synonymous ratio of
N/S = 399/150 = 2.66
and, if the mutations are neutral, will be fixed (i.e. will reach a frequency of 100%) in the same ratio, which is where I got the expected N/S ratio of 2.66 for evolution by neutral mutation.
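The hand count of the 549 possible mutations can be checked by machine. The sketch below builds the standard genetic code (NCBI translation table 1) and classifies every single-nucleotide change from each of the 61 sense codons as synonymous, missense (a different amino acid), or nonsense (a stop codon). It is a bookkeeping aid, not a reconstruction of how the table in Muse and Gaut (1994) was used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutations(code=CODE):
    """Tally all single-nucleotide changes starting from the 61 sense codons."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in code.items():
        if aa == "*":
            continue  # stop codons are not starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = code[mutant]
                if new_aa == "*":
                    counts["nonsense"] += 1
                elif new_aa == aa:
                    counts["synonymous"] += 1
                else:
                    counts["missense"] += 1
    return counts


counts = classify_point_mutations()
print(counts, "total:", sum(counts.values()))
```

Run as written, it reports 134 synonymous, 392 missense and 23 nonsense changes, 549 in all. These tallies differ from the hand count above: with them, the expected ratio for a random sequence would be about 2.9 if changes to stop codons are left out, or about 3.1 if they are counted as nonsynonymous.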
However, the DNA sequence is not random, so we usually express the nonsynonymous/synonymous ratio in terms of the rate of substitution per site. Thus, we divide the number of nonsynonymous substitutions by the number of nonsynonymous sites (i.e. the number of nucleotide positions at which a mutation would change the amino acid), and similarly for synonymous substitutions. This gives us the dN/dS ratio, which is expected to be 1 under neutrality, because we have normalized by the expected rates of each type of mutation. It is greater than 1 when there is positive selection in favor of new mutations, and less than 1 when purifying selection removes amino acid changes. In calculating dN/dS, adjustments can be made for known biases in the process of mutation (e.g. transversions, which change the ring structure of the nucleotide by swapping a purine for a pyrimidine or vice versa, occur at a different rate than transitions).
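The per-site normalization can be sketched in a few dozen lines of Python. This is a bare-bones site-counting version: each codon position is split into synonymous and nonsynonymous "sites" according to the fraction of its possible changes that are synonymous, and the observed differences are divided by those site counts. It returns the uncorrected proportions pN/pS rather than a true dN/dS, since it applies no correction for multiple substitutions at the same site, skips codons that differ at more than one position, and counts changes to stop codons as nonsynonymous; real analyses use more careful methods, such as the likelihood approach of Muse and Gaut (1994). The two short sequences in the usage line are made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_sites(codon):
    """Split a codon's 3 positions into (synonymous, nonsynonymous) sites.

    Each position contributes (number of synonymous alternatives) / 3;
    changes to stop codons count as nonsynonymous in this sketch.
    """
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def pn_ps(seq1, seq2):
    """Uncorrected pN/pS for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = non_sites = 0.0
    syn_diffs = non_diffs = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        syn_sites += (s1 + s2) / 2  # average the site counts of the two codons
        non_sites += (n1 + n2) / 2
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                non_diffs += 1
        # Codons differing at 2 or 3 positions need pathway averaging; skipped here.
    if syn_diffs == 0:
        raise ValueError("no synonymous differences: ratio undefined")
    return (non_diffs / non_sites) / (syn_diffs / syn_sites)


# Leu-Ala-Lys vs Leu-Ala-Arg: one synonymous (CTG->CTA), one nonsynonymous (AAA->AGA)
print(f"pN/pS = {pn_ps('CTGGCTAAA', 'CTAGCTAGA'):.3f}")
```

In the example there is one difference of each kind, but more than twice as many nonsynonymous as synonymous sites (37/6 versus 17/6), so the ratio is 17/37, about 0.46: a raw count of 1 to 1 becomes a per-site ratio well below 1.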
dN/dS ratios are subject to some of the same limitations as raw N/S ratios, including the averaging effect noted above. Yang and Bielawski (2000) is a modestly readable introduction to using rates of nonsynonymous versus synonymous substitution to detect selection.
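The averaging effect can be quantified with simple arithmetic. Assume, hypothetically, that every codon offers the same opportunities for synonymous and nonsynonymous change, so that the gene-wide ratio is just the average of per-site ratios (omega). Then a small fraction of sites under strong positive selection is swamped by a background of sites under purifying selection. The omega values below are illustrative assumptions.

```python
def gene_wide_omega(frac_selected, omega_selected, omega_background):
    """Gene-wide dN/dS as a site-weighted average of per-site omegas
    (assumes equal substitution opportunities at every codon)."""
    return frac_selected * omega_selected + (1 - frac_selected) * omega_background


def min_fraction_for_signal(omega_selected, omega_background):
    """Smallest fraction of positively selected sites that pushes the
    gene-wide ratio to 1."""
    return (1 - omega_background) / (omega_selected - omega_background)


print(f"2% of sites at omega = 5: gene-wide omega = {gene_wide_omega(0.02, 5, 0.1):.3f}")
print(f"fraction needed to reach 1: {min_fraction_for_signal(5, 0.1):.3f}")
```

With a background omega of 0.1 and omega = 5 at the selected sites, 2% of positively selected sites yield a gene-wide ratio of only about 0.2, and under these assumptions roughly 18% of sites would need to be positively selected before the gene-wide ratio exceeded 1.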
Charlesworth, B. and D. Charlesworth. 2010. Elements of Evolutionary Genetics. Roberts, Greenwood Village, Colorado. An upper-level text, but not as daunting as some.
Diamond, J., ed. Virus and the Whale: Exploring Evolution in Creatures Small and Large. NSTA Press, Arlington, Va. Uses HIV as an example of viral evolution.
Emlen, D.J. and C. Zimmer. 2020. Evolution: Making Sense of Life. 3rd ed. Macmillan, New York. Uses influenza as an example of viral evolution.
Herron, J.C. and S. Freeman. 2014. Evolutionary Analysis. 5th ed. Pearson. Uses HIV as an example of viral evolution.
Muse, S.V. and B.S. Gaut. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Molecular Biology and Evolution 11:715-724.
Yang, Z. and J.P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution 15:496-503.
h/t Brian Leiter for the Ars Technica piece.