JAC:
In this post, Matthew, who has considerable expertise in this area, answers a student’s question about the genetic code that was sent to me yesterday. I immediately handed it off to Matthew, who was nice enough to turn the answer into a post. He is, of course, writing a popular “trade” book about the genetic code.
In case you’re not sure what is meant by the “genetic code,” it refers to how the sequence of bases in DNA (there are four such bases) is translated into amino acids, the constituents of proteins and the products of most genes. As Matthew describes below, it’s a “triplet” code: each adjacent group of three DNA bases codes for a single amino acid. Since there are four bases, there are 64 possible triplets (“codons”) that, in total, code for 20 amino acids. That means that some amino acids are coded for by more than one triplet sequence.
Here is the code based on the RNA translation of the DNA (DNA is transcribed into RNA before it is translated into proteins). For any sequence of three bases, you read the first one down the column to the left, the second across the top, and the third on the column on the right. So, for example, CAU would be “His,” or the amino acid histidine. “Stop” refers to stop codons: when the process of protein-making in the ribosomes encounters this codon, the translation is stopped and the string of amino acids ends.
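For readers who think in code, the table lookup just described can be sketched in a few lines of Python. This is only an illustrative sketch: the names BASES, AMINO_ACIDS, CODON_TABLE and translate are made up for this example, amino acids are written with their standard one-letter abbreviations (H for histidine), and `*` is assumed here as the marker for a stop codon.

```python
from itertools import product

# Bases in the order the table is conventionally laid out.
BASES = "UCAG"

# One letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
# '*' marks a stop codon (a convention assumed for this sketch).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every three-base codon to its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Read an RNA string three bases at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(CODON_TABLE["CAU"])         # H (histidine)
print(translate("AUGCAUUGGUAA"))  # MHW, then the UAA stop codon ends the chain
```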
It is the near-universality of this code (Matthew’s post is about the rare exceptions) that gives us confidence that modern life traces back to a single ancestor. If there had been more than one origin of life, and the descendants of each had independently developed the DNA-to-protein system, it would be very unlikely that all modern species would have the same code.
by Matthew Cobb
Glendon Wu, an immunology PhD student at the University of Pennsylvania, wrote in with a question. He was in a lecture the other day and learned that mitochondria – small energy-producing structures found in the cells of all multicellular organisms and also some single-celled organisms like yeast (this group is called the eukaryotes) – contain a different genetic code to the rest of us. In other words, your cells contain two different versions of the genetic code – one for your human DNA, the other for the DNA in your mitochondria. Glendon was understandably intrigued about this and wanted to know more.
As it happens, I’m just putting the finishing touches to a popular science book about the race to crack the genetic code (Life’s Greatest Secret). Although the historical part finishes in 1967, the final three chapters bring the story up to date, which includes the existence of alternative genetic codes. What follows is an adapted version of part of one of those chapters.
The genetic code is contained in your DNA, and consists of 64 different three-letter ‘words’ (known as triplets or codons), 61 of which code for the 20 amino acids your body uses to make proteins, and three of which say ‘stop’. One codon codes both for an amino acid and for ‘start’.
We have four different kinds of letter (A, C, G and T in our DNA; as the genetic message is expressed, it passes into RNA, where T is replaced by U), so with four possible letters in each of three positions in a codon, we have 4 x 4 x 4 = 64 different codons.
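The arithmetic in the last two paragraphs can be checked with a short Python sketch. Nothing here comes from the book: the variable names are invented for illustration, the amino acids are written as standard one-letter abbreviations, and `*` is assumed as the marker for ‘stop’.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # RNA letters; in DNA the U would be a T

# Standard code, one letter per codon in the order UUU, UUC, UUA, ... GGG.
# '*' stands for 'stop' (a convention assumed for this sketch).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Four letters in each of three positions: 4 x 4 x 4 codons.
codons = ["".join(c) for c in product(BASES, repeat=3)]
code = dict(zip(codons, AMINO_ACIDS))

stop_codons = sorted(c for c, aa in code.items() if aa == "*")
sense_codons = [c for c, aa in code.items() if aa != "*"]
codons_per_amino_acid = Counter(aa for aa in code.values() if aa != "*")

print(len(codons))                 # 64
print(len(sense_codons))           # 61
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
print(len(codons_per_amino_acid))  # 20
print(code["AUG"])                 # M: methionine, which also serves as 'start'
```

Because 61 codons are shared among 20 amino acids, most amino acids have several codons: in this table leucine (L) has six, while tryptophan (W) has only one.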
In 1967, the last word of the genetic code was deciphered. Appropriately enough, it was the third stop codon, which reads UGA (for complicated reasons it was nicknamed opal). Everyone working on the genetic code assumed that the code would be universal, that is, all life on Earth would use the same way of representing amino acids in DNA and RNA. As Jacques Monod put it in 1961, ‘what is true for E. coli is true for an elephant’.
In November 1979, a group at Cambridge discovered that in human mitochondria, UGA does not encode stop but instead produces an amino acid, tryptophan. Not only is the genetic code not universal, the same organism can contain two different genetic codes, one in its genomic DNA, the other in its mitochondria.
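To see what this means in practice, here is a small Python sketch that reads the same made-up RNA message with two tables. It is a deliberate simplification: the "mitochondrial" table below differs from the standard one only in the UGA reassignment described above, and ignores any other differences. The names STANDARD, MITOCHONDRIAL and translate, and the example message, are hypothetical; `*` is assumed as the marker for stop and W is the one-letter abbreviation for tryptophan.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in the order UUU, UUC, UUA, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Simplified mitochondrial table: only the UGA change is modelled here.
MITOCHONDRIAL = dict(STANDARD)
MITOCHONDRIAL["UGA"] = "W"  # tryptophan instead of stop


def translate(rna, table):
    """Translate codon by codon until a stop codon ('*') is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


message = "AUGUGACAUUAA"  # invented example: AUG UGA CAU UAA
print(translate(message, STANDARD))       # M   (UGA is read as stop)
print(translate(message, MITOCHONDRIAL))  # MWH (UGA is read as tryptophan)
```

The same string of letters yields a truncated product under one code and a longer one under the other, which is why a gene cannot simply be moved between the two systems and read correctly.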
This fact tells us something fundamental about the history of life on our planet. In 1967, the US biologist Lynn Margulis began arguing that mitochondria were not merely micro-structures within our cells, but were remnants of an independent single-celled organism that had fused with the ancestor of all eukaryotic organisms, billions of years ago, probably as part of a symbiotic relationship. She was not the first to come up with this idea – in the early years of the 20th century both Paul Portier and Ivan Wallin suggested that mitochondria might be symbionts.
Margulis argued that these symbiotic bacteria subsequently found themselves trapped in every one of our cells and lost all their independence, but not their own, separate genome – a tiny ring of DNA about 16,500 bases long (in comparison, the human nuclear genome contains about 3 billion bases). It appears that all mitochondria, in all the eukaryotes on the planet, have a common ancestor that was alive over 1.5 billion years ago.
Similar things happened in plants, which gained their power-generating chloroplast organelles in a similar way. In both cases there are arguments over exactly what kind of microbe fused with what, and above all the speed with which it took place, but most scientists now think that there was a single event, which enabled what was effectively a hybrid organism to grow larger and to acquire the energy required by more complex organisms.
The extremely small nature of the mitochondrial genome, and its peculiar use of codons, can be explained in terms of the history of this symbiotic relationship. The mitochondrial genome codes for very few proteins – most of the other genes were lost before or shortly after fusion with our ancestors or were incorporated into the genomic DNA of the host – so the appearance of new codons in mitochondrial DNA through mutation would not have had an important effect on the symbiont, most of whose needs were provided by the host cell.
Mitochondria are not alone in having an unusual genetic code. In a series of discoveries beginning in 1985 it was found that single-cell ciliates – tiny organisms like Paramecium – show variants of the nuclear genetic code that have appeared several times during their evolution. In some species of ciliate, UAA and UAG code for glutamine rather than stop, with only UGA encoding stop, while in others UGA codes for tryptophan.
In a few rare instances in single-celled organisms without a nucleus, UGA and UAG have even been recoded by natural selection to code for extra amino acids, not normally found in life – selenocysteine and pyrrolysine, respectively. A recent study of 5.6 trillion base pairs of DNA from over 1700 samples of bacteria and bacteriophages isolated from natural environments, including on the human body, revealed that in an important proportion of the sequences, stop codons had been reassigned to code for amino acids, while an investigation of hitherto unstudied microbes revealed that in one group UGA had been reassigned from stop to code for glycine.
More than 15 alternative or non-canonical genetic codes are known to exist, and it can be assumed that more remain to be discovered. The non-canonical codes almost always involve the reassignment of stop codons; this may indicate that there is something about the machinery involved in stop codons that makes them particularly susceptible to change, or it may simply be that as long as the organism can still code stop using another codon, reassigning one stop codon to an amino acid does not cause any important problems.
The exact process by which codon change takes place has been the focus of a great deal of theoretical and experimental research, with a number of hypotheses put forward to explain how variant codes might arise.
The current front-runner is called the codon capture model, and was first put forward in 1987 by Jukes and Osawa. According to this model, random effects such as genetic drift can lead to the disappearance of a particular codon in a given genome; similar effects then lead to the codon being captured by a tRNA that codes for another amino acid.
A recent experimental study of genetically engineered bacteria in which some codons had been artificially replaced supported this model, and even suggested that reassignment of codons could be advantageous in some circumstances, providing the organism with expanded functions.
The responses of scientists to the non-universality of the genetic code reveal something important about the nature of biology. It was completely unexpected, and went against all the assumptions of all the researchers who had been studying the genetic code, showing that Monod was wrong – what is true for E. coli is not necessarily true for an elephant. But despite this revolution, the basic positions established during the cracking of the genetic code remained intact.
The strict universality of the code was not a law, nor even a requirement. The only requirement is that any divergence from this assumption can be explained within the framework of evolution, and through testable hypotheses about the history of organisms. This has been amply met.
Although the genetic code is not strictly universal, there is no dispute that life as we know it evolved only once, and that we all descend from a population of cells that lived over 3.5 billion years ago, known as the Last Universal Common Ancestor, or LUCA. The alternative codes are what are called derived features – they have appeared after all present life evolved.
The fact that all organisms use amino acids with a left-handed orientation, and the universality of RNA as a way of stringing amino acids together to make a protein, are both very strong arguments that support this hypothesis. In 2010 Douglas Theobald calculated that the hypothesis that all life is related ‘is 10^2,860 times more probable than the closest competing hypothesis.’
The variations in the code that have been discovered can be explained either in terms of the deep evolutionary history of eukaryotes – thereby revealing the thrilling fact that our evolution has hinged on the chance fusion of two cells – or in something recent and local in the life-history of a particular group of organisms, which is what seems to have happened in the case of ciliates.
Hope that answers the question, Glendon!

















