Guest post: Why the genetic code is not universal

September 28, 2014 • 6:16 am


In this post, Matthew, who has considerable expertise in this area, answers a student’s question about the genetic code that was sent to me yesterday. I immediately handed it off to Matthew, who was nice enough to turn the answer into a post. He is, of course, writing a popular “trade” book about the genetic code.

In case you’re not sure what is meant by the “genetic code,” it refers to how the sequence of bases in DNA (there are four such bases) is translated into amino acids, the constituents of proteins and the products of most genes. As Matthew describes below, it’s a “triplet” code: each adjacent group of three DNA bases codes for a single amino acid. Since there are four bases, there are 64 possible triplets (“codons”) that, in total, code for 20 amino acids. That means that some amino acids are coded for by more than one triplet sequence.

Here is the code based on the RNA translation of the DNA (DNA is transcribed into RNA before it is translated into proteins). For any sequence of three bases, you read the first one down the column to the left, the second across the top, and the third on the column on the right.  So, for example, CAU would be “His,” or the amino acid histidine. “Stop” refers to stop codons: when the process of protein-making in the ribosomes encounters this codon, the translation is stopped and the string of amino acids ends.

It is the near-universality of this code (Matthew’s post is about the rare exceptions) that gives us confidence that modern life traces back to a single ancestor. If there were more than one origin of life, and its descendants independently developed the DNA-to-protein system, it would be very unlikely that all modern species would have the same code.



by Matthew Cobb

Glendon Wu, an immunology PhD student at the University of Pennsylvania, wrote in with a question. He was in a lecture the other day and learned that mitochondria – small energy-producing structures found in the cells of all multicellular organisms and also some single celled organisms like yeast (this group is called the eukaryotes) – contain a different genetic code to the rest of us. In other words, your cells contain two different versions of the genetic code – one for your human DNA, the other for the DNA in your mitochondria. Glendon was understandably intrigued about this and wanted to know more.

As it happens, I’m just putting the finishing touches to a popular science book about the race to crack the genetic code (Life’s Greatest Secret). Although the historical part finishes in 1967, the final three chapters bring the story up to date, which includes the existence of alternative genetic codes. What follows is an adapted version of part of one of those chapters.

The genetic code is contained in your DNA, and consists of 64 different three-letter ‘words’ (known as triplets or codons), 61 of which code for the 20 amino acids your body uses to make proteins, and three of which say ‘stop’. One codon codes both for an amino acid and for ‘start’.

We have four different kinds of letter (A, C, G and T in our DNA; as the genetic message is expressed, it passes into RNA, where T is replaced by U), so with four possible letters in each of three positions in a codon, we have 4 x 4 x 4 = 64 different codons.
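The arithmetic above can be checked by simply listing the codons. This is a minimal sketch in Python; the names `BASES` and `codons` are made up for the illustration.

```python
from itertools import product

BASES = "UCAG"  # RNA alphabet; in DNA the U would be a T

# Every ordered choice of three bases, one per codon position.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 4 x 4 x 4
print(codons[:4])    # the first few codons in the list
```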

In 1967, the last word of the genetic code was deciphered. Appropriately enough, it was the third stop codon, which reads UGA (for complicated reasons it was nicknamed opal). Everyone working on the genetic code assumed that the code would be universal, that is, all life on Earth would use the same way of representing amino acids in DNA and RNA. As Jacques Monod put it in 1961, ‘what is true for E. coli is true for an elephant’.

In November 1979, a group at Cambridge discovered that in human mitochondria, UGA does not encode stop but instead produces an amino acid, tryptophan. Not only is the genetic code not universal, the same organism can contain two different genetic codes, one in its genomic DNA, the other in its mitochondria.
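To see what a single reassignment does to a message, here is a rough sketch in Python that reads the same RNA sequence under two tables. The standard table is written as one letter per codon (with `*` for stop). The names `STANDARD`, `MITO_SKETCH` and `translate`, and the example sequence, are invented for the illustration. `MITO_SKETCH` applies only the one change described above; real mitochondrial codes differ from the standard code in other codons too.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: only the single change named in the text (UGA -> tryptophan, "W").
MITO_SKETCH = {**STANDARD, "UGA": "W"}

def translate(rna, table):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

message = "AUGUUUUGAGGCUAA"  # made-up sequence: AUG UUU UGA GGC UAA
print(translate(message, STANDARD))     # MF   (UGA read as stop)
print(translate(message, MITO_SKETCH))  # MFWG (UGA read as tryptophan)
```

The same five codons give a two-residue product under one table and a four-residue product under the other, which is why a gene moved between the two compartments would generally not be read correctly.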

This fact tells us something fundamental about the history of life on our planet. In 1967, the US biologist Lynn Margulis began arguing that mitochondria were not merely micro-structures within our cells, but were remnants of an independent single-celled organism that had fused with the ancestor of all eukaryotic organisms, billions of years ago, probably as part of a symbiotic relationship. She was not the first to come up with this idea – in the early years of the 20th century both Paul Portier and Ivan Wallin suggested that mitochondria might be symbionts.

Margulis argued that these symbiotic bacteria subsequently found themselves trapped in every one of our cells and lost all their independence, but not their own, separate genome – a tiny ring of DNA about 16,500 bases long (in comparison, the human nuclear genome contains about 3 billion bases). It appears that all mitochondria, in all the eukaryotes on the planet, have a common ancestor that was alive over 1.5 billion years ago.

Similar things happened in plants, which gained their power-generating chloroplast organelles in a similar way. In both cases there are arguments over exactly what kind of microbe fused with what, and above all the speed with which it took place, but most scientists now think that there was a single event, which enabled what was effectively a hybrid organism to grow larger and to acquire the energy required by more complex organisms.

The extremely small nature of the mitochondrial genome, and its peculiar use of codons, can be explained in terms of the history of this symbiotic relationship. The mitochondrial genome codes for very few proteins – most of the other genes were lost before or shortly after fusion with our ancestors or were incorporated into the genomic DNA of the host – so the appearance of new codons in mitochondrial DNA through mutation would not have had an important effect on the symbiont, most of whose needs were provided by the host cell.

Mitochondria are not alone in having an unusual genetic code. In a series of discoveries beginning in 1985 it was found that single-cell ciliates – tiny organisms like Paramecium – show variants of the nuclear genetic code that have appeared several times during their evolution. In some species of ciliate, UAA and UAG code for glutamine rather than stop, with only UGA encoding stop, while in others UGA codes for tryptophan.

In a few rare instances in single-celled organisms without a nucleus, UGA and UAG have even been recoded by natural selection to code for extra amino acids, not normally found in life – selenocysteine and pyrrolysine, respectively. A recent study of 5.6 trillion base pairs of DNA from over 1700 samples of bacteria and bacteriophages isolated from natural environments, including on the human body, revealed that in an important proportion of the sequences, stop codons had been reassigned to code for amino acids, while an investigation of hitherto unstudied microbes revealed that in one group UAG had been reassigned from stop to code for glycine.

More than 15 alternative or non-canonical genetic codes are known to exist, and it can be assumed that more remain to be discovered. The non-canonical codes almost always involve the reassignment of stop codons; this may indicate that there is something about the machinery involved in stop codons that makes them particularly susceptible to change, or it may simply be that as long as the organism can still code stop using another codon, reassigning one stop codon to an amino acid does not cause any important problems.
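The point about always keeping at least one stop codon can be made concrete with a small sketch. It records only the stop-codon reassignments mentioned in this post, under informal labels chosen for the illustration, and lists which standard stop codons are left in each case. It is not a catalogue of the known variant codes, and it ignores any other changes those codes contain.

```python
STANDARD_STOPS = {"UAA", "UAG", "UGA"}

# Stop codons reassigned to an amino acid, as described in the text.
# Labels are informal; each entry lists only the change named in the post.
REASSIGNED = {
    "human mitochondria": {"UGA"},
    "some ciliates": {"UAA", "UAG"},
    "other ciliates": {"UGA"},
    "one group of microbes": {"UAG"},
}

for name, lost in REASSIGNED.items():
    remaining = sorted(STANDARD_STOPS - lost)
    print(f"{name}: still usable as stop = {remaining}")
```

In every case listed, at least one of the three standard stop codons remains available.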

The exact process by which codon change takes place has been the focus of a great deal of theoretical and experimental research, with a number of hypotheses put forward to explain how variant codes might arise.

The current front-runner is called the codon capture model, and was first put forward in 1987 by Jukes and Osawa. According to this model, random effects such as genetic drift can lead to the disappearance of a particular codon in a given genome; similar effects can then lead to the codon being captured by a tRNA that codes for another amino acid.

A recent experimental study of genetically engineered bacteria in which some codons had been artificially replaced supported this model, and even suggested that reassignment of codons could be advantageous in some circumstances, providing the organism with expanded functions.

The responses of scientists to the non-universality of the genetic code reveal something important about the nature of biology. It was completely unexpected, and went against all the assumptions of all the researchers who had been studying the genetic code, showing that Monod was wrong – what is true for E. coli is not necessarily true for an elephant. But despite this revolution, the basic positions established during the cracking of the genetic code remained intact.

The strict universality of the code was not a law, nor even a requirement. The only requirement is that any divergence from this assumption can be explained within the framework of evolution, and through testable hypotheses about the history of organisms. This has been amply met.

Although the genetic code is not strictly universal, there is no dispute that life as we know it evolved only once, and that we all descend from a population of cells that lived over 3.5 billion years ago, known as the Last Universal Common Ancestor, or LUCA. The alternative codes are what are called derived features – they have appeared after all present life evolved.

The fact that all organisms use amino acids with a left-handed orientation, and the universality of RNA as a way of stringing amino acids together to make a protein are both very strong arguments that support this hypothesis. In 2010 Douglas Theobald calculated that the hypothesis that all life is related ‘is 10^2,860 times more probable than the closest competing hypothesis.’

The variations in the code that have been discovered can be explained either in terms of the deep evolutionary history of eukaryotes – thereby revealing the thrilling fact that our evolution has hinged on the chance fusion of two cells – or in something recent and local in the life-history of a particular group of organisms, which is what seems to have happened in the case of ciliates.

Hope that answers the question, Glendon!

91 thoughts on “Guest post: Why the genetic code is not universal”

  1. Fantastic stuff! It made perfect sense to me (though it’s tough holding it in my head at once). I did graduate 1986 Biochem BA, so I already had most of the molecular genetics under the belt.

    What gets me is how much of this I did not know at the time (the 1985 paramecium result, e.g. – or anything later, as I have not kept up in the field in this respect). Really amazing to see the leaps forward, since I was mucking around doing recombinant DNA experiments.

    1. Know whatcha mean! I was last in academia in 1979…We knew about the (well-accepted, IIRC) exogenous origin of chloroplasts & mitochondria (though I guess not that their genetic code was different), but all the stuff about Monera and bacteria was new to me. I thought the part about the anucleate organisms that code for amino acids otherwise not observed in life was fascinating!

  2. Very interesting. I have been reading about the evidence for evolution in the genetic code, and these details add important parts of the picture for me.
    One of the interesting features of the code is that it is not entirely haphazard. It is, rather, arranged to be fairly ‘robust’ so that mutations in DNA or errors in protein synthesis have a reduced chance of doing fatal harm to a protein. For example, amino acids that are used frequently in proteins are specified by several different codons so that mistakes involving them are more likely to still specify the correct amino acid during translation! I think that is very cool.
    One can see in the code above that the stop codons are amongst the codons for the amino acids tryptophan, tyrosine, and cysteine. So the stop codons ‘almost’ specify one of these amino acids. These amino acids are therefore specified by a small # of codons, and as expected they are not used very often in proteins.
    It is possible that the non-canonical codes where a stop codon was reassigned are a little ‘frozen accident’ and not by selection. It is also possible that these cells experienced selective pressure to use more tryptophan or another of the less popular amino acids. In this scenario a stop codon was selectively reassigned to help out.

  3. I doubt that I would understand the explanation, but I’d love to learn why/how a codon in one case means STOP and not in another. I suspect that it might be “context sensitive”.

    Also, it seems to me that I recently read about right handed amino acids being also used. Apparently they are not coded for, but are produced by later reactions.

    1. The RNA bases A U C and G pair specifically with each other by hydrogen bonds. A pairs with U, and C pairs with G.
      The codons that specify an amino acid in the table above are those that can base pair with a transfer RNA (tRNA) molecule that carries that amino acid. So the codon AUG, for example, pairs with a tRNA that has the bases UAC. That tRNA carries the amino acid called methionine, and that is why the codon AUG is said to specify methionine during protein synthesis.
      What all this means is that the genetic code is pretty contrived. There is no real ‘deep’ meaning for why a codon specifies a given amino acid. It only specifies a given amino acid because the tRNA molecules broker particular deals for some reason.
      A stop codon does not fully match any tRNA, so it cannot specify an amino acid. When a stop codon is encountered during protein synthesis a bunch of proteins muscle into the synthesis machinery and put a stop to protein synthesis.
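The pairing rule in the comment above can be sketched in a few lines of Python. The function names are made up for the illustration, and the sketch assumes strict A-U and C-G pairing only, which is a simplification of how real tRNAs read codons.

```python
PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}

def pairing_bases(codon):
    """Bases that pair with the codon, position by position."""
    return "".join(PAIRS[base] for base in codon)

def anticodon_5_to_3(codon):
    """The same bases written in the conventional 5'->3' order
    (the two strands pair in opposite directions)."""
    return pairing_bases(codon)[::-1]

print(pairing_bases("AUG"))     # UAC, as in the methionine example
print(anticodon_5_to_3("AUG"))  # CAU
```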

      1. This is presumably not the whole story. The length of the mRNA molecule must also play some role; it must be long enough, for instance, to contain the entire sequence of the protein being synthesized.

        So something must tell the mRNA polymerase where to start and stop transcribing the DNA sequence, and one imagines start and stop codons are involved somehow.

        1. Yes. There are sequences in DNA near the beginning of a gene that tell RNA polymerase to bind here and to use this strand of DNA to start making an RNA molecule. There are later sequences at the other end of the gene that tell RNA polymerase to stop making RNA, lest it get too long.
          The mRNA (RNA used as a template to make a protein) contains a sequence that tells where protein synthesis should start. That is a start codon. And a stop codon tells protein synthesis to stop.

        2. There are bits of sequence that signal the start and the stop of transcription; but they have nothing to do with start or stop codons and often lie far outside the gene.

      2. I really was not trying to look for a “deep meaning”, just wondering why in one case a specific codon is a stop and in another it codes for tryptophan. I suspect I’d need to know more about how tRNA was produced.

        I’m just a computer scientist with a late developing interest in biology 🙂

    2. Right handed amino acids (AAs) can be biologically useful, since they often give another chemical response in the cell. And I assume that the tendency for AAs to go racemic over time means cells early evolved means to recognize and handle them, often with enzymes it seems (see below).

      As I understand it, new analytical methods have meant that the research on them has exploded in medicine the last decade or so:

      “D-Amino Acids: A New Frontier in Amino Acid and Protein Research – Practical Methods and Protocols

      Editors: Ryuichi Konno (Dokkyo Univ. School of Medicine), Hans Brückner (Justus-Liebig-Universität Giessen), Antimo D’Aniello (Stazione Zoologica “A. Dohrn”), George H. Fisher (Barry Univ.), Noriko Fujii (Kyoto Univ.) and Hiroshi Homma (Kitasato Univ.)

      Book Description:
      D-amino acids have been considered as unnatural amino acids and it has been the common belief that D-amino acids are not present in eukaryotes. However, improvements and developments of analytic methods have shown the D-amino acids are present in a considerable amount of eukaryotes and even in humans. Some of them have been shown to have physiological functions. In this book, all aspects of D-amino acid research are described: analytic methods for D-amino acids,the presence of various D-amino acids in a wide variety of organisms, nutritional aspects of D-amino acids, anabolic and catabolic enzymes for D-amino acids, physiological significance of D-amino acids, pathology of D-amino acids, industrial aspect of D-amino acids etc.”

      [ ; 2007]

  4. In a way I would have preferred if this post, and book for that matter had not been written. The reason is that for years Jonathan Wells ( and I think Paul Nelson?) have been using non-universality as an argument against evolution. The argument goes that originally the universality of the code was used as evidence for evolution, so now that we know it isn’t universal, shouldn’t this be evidence against evolution? In this case of course the exception proves the rule and the details are far more compelling evidence for evolution. But I would have preferred if this was brought up to them as an unpleasant surprise in a public debate. Now they’ll just move on to other arguments.
    A topic I’ve been interested in but only found bits and pieces of info is the optimization of the code. ( Perhaps MC can address it?) I vaguely recall hearing decades ago that there are an astronomically large number of possible codes but imagine most these are highly implausible. A small number of the millions of plausible ones are optimized to minimize the effects of mutation- about 2000. And among these there are about 25 that are optimized regarding compressibility of regulatory sequences. The optimization gives only a slight advantage and, it seems to me, all of this only makes sense in the context of evolution and competing codes during a prebiotic quasi-life period. Are my recollections correct?

    1. Seriously, you want us to withhold interesting facts from the public because those facts will become known to creationists and help them debate us? Sorry, but I (and I’m sure I speak for Matthew here) prefer that everyone know the facts. I can’t believe you want us to keep this to ourselves so we can surprise creationists at a debate. First of all, they probably know it anyway, and second, debating creationists is a useless exercise, and rarely changes anyone’s mind. It’s an exercise in rhetoric, and “debates” are best done without debating: by addressing creationist foolishness in solo talks and articles.

      1. Also, lantog, you’re misusing the phrase “exception that proves the rule.” Sorry to be a pedant, but that one bothers me, because when used incorrectly it is intellectually worthless and the opposite of what it really means. So I guess I’m not really sorry.

        1. (My other reply was for JAC, this one is for pacopico. I can’t seem to click the correct ‘reply’ )

          Perhaps there was a better phrase to summarize what I was trying to get at than ‘exception…etc’ What I meant was that the exceptions to the code were clearly derived from the universal code. On the other hand, an IDer who invoked some motivation for why the designer would create a universal code would be hard put to explain the exceptions.

          1. Perhaps the exception does here prove the rule; in a correct use of the term, “No parking on Thursdays” proves that there is parking on other days, and advantageous deviations from the code in special circumstances support the view that the code itself is the product of evolution. This view undermines one of the more plausible creationist arguments, based on the exceptional robustness (referred to earlier) of the code. Not only do the most common amino acids have several codons, but many non-degenerate errors lead to the replacement of the correct amino acid by one with similar properties.

              1. Wikipedia following Fowler: BEGINQUOTE “The exception [that] proves the rule” means that the presence of an exception applying to a specific case establishes (“proves”) that a general rule exists. For example, a sign that says “parking prohibited on Sundays” (the exception) “proves” that parking is allowed on the other six days of the week (the rule). A more explicit phrasing might be “The exception that proves the existence of the rule.” …

                The phrase is derived from a legal principle of republican Rome: exceptio probat regulam in casibus non exceptis (“the exception confirms the rule in cases not excepted”), a concept first proposed by Cicero in his defence of Lucius Cornelius Balbus. This means a stated exception implies the existence of a rule to which it is the exception. The second part of Cicero’s phrase, “in casibus non exceptis” or “in cases not excepted,” is almost always missing from modern uses of the statement that “the exception proves the rule,” which may contribute to frequent confusion and misuse of the phrase.ENDQUOTE

                And who am I to quarrel with Cicero?

      2. Well, not exactly seriously. ( I said “in a way”) Some people like crossword puzzles, I like the give and take of the ongoing debate with ID but yes, I know it’s pointless. I don’t think people who are genuinely interested in science should be deprived of info because of tactics in the debate with creationists. I still think it would have been nice to have been brought out as a slam dunk in a debate…or the next Dover case.

    2. I can not recall the specific numbers, but your recollections are otherwise correct. The genetic code seems optimized to reduce harm caused by errors. One feature is that if an error occurs translation still has an elevated chance to still specify the correct amino acid. The other is that even if an error causes a different amino acid to be specified, there is still a chance it will specify a chemically similar amino acid so the protein might still work.
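The first kind of robustness mentioned above can be counted directly from the standard table. This is a minimal sketch: it changes one base at a time in each amino-acid codon and records whether the result codes for the same amino acid, a different one, or stop. The names are made up for the illustration, all changes are weighted equally (real mutation and misreading rates are not equal), and the sketch does not address chemical similarity between amino acids.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_point_changes(table):
    """Count single-base changes to amino-acid codons, by position and outcome."""
    counts = {pos: {"same": 0, "different": 0, "stop": 0} for pos in range(3)}
    for codon, aa in table.items():
        if aa == "*":
            continue  # only consider changes that start from an amino-acid codon
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = table[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == aa:
                    outcome = "same"
                elif new_aa == "*":
                    outcome = "stop"
                else:
                    outcome = "different"
                counts[pos][outcome] += 1
    return counts

counts = classify_point_changes(STANDARD)
for pos in range(3):
    print(f"codon position {pos + 1}: {counts[pos]}")
```

Most of the changes that leave the amino acid untouched fall at the third codon position, which is where the duplicate entries in the table are concentrated.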

    3. All life that we know about shares a near-universal genetic code. A few small variations do not undermine this as strong evidence that all life that we know about evolved from a Universal Common Ancestor. We expect variation; it is the degree of conservation over billions of years that is remarkable.

      If a new form of life were discovered in (say) an ocean trench, with a COMPLETELY different genetic code, how would that constitute evidence for creation? Such a new discovery would not challenge the notion that humans, chimps and E.coli have a common ancestor, because we already know that they DO share the same genetic code.

      A new organism with a different genetic code would be fascinating, perhaps showing that life arose twice, independently. Perhaps even that the entire process of abiogenesis occurred twice independently.

      But I’m not really seeing any “gotcha” for the creationists.

      1. … and it would show that life is not miraculous, but something which occurs here and there. I must confess that I would be disappointed if extraplanetary life found on Mars, Titan or elsewhere would have more or less the same code as here.

        1. That would be very weird. But I do expect that alien life would be organic, and would have organic versions of information storage like DNA and RNA, and organic molecules for catalytic reactions.

          1. My father (an organic chemist) gets mildly annoyed at all the “silicon based” SF one finds. Instead he proposes very slow metabolism ammonia-based life (instead of us, which are water based).

    5. The genetic code is so nearly universal that it proves a single origin of life beyond reasonable doubt, and at the same time exists in a relatively small number (25) of alternative states that can be ordered into a nested set of phylogenetic characters and used, like other rare genomic changes, to help resolve the true details of evolutionary history. That’s marvellous.

  5. What I would really love to know is if our code, and more or less the selection of amino acids life on this planet uses, are inevitable or the result of random chance. That is, if life independently arisen on another planet would have approximately the same code and biochemistry because that is just what selection and biochemical constraints produce when life arises, or whether myriads of other codes are just as likely to evolve.

    1. RNA seems fairly unique from a thermodynamic and enzymatic viewpoint. It is stable enough, in a Goldilocks sense, while still accessible to early redox chemistries, and it is enzymatic in an anoxic FeII-rich Hadean/Archean type terrestrial ocean.

      The first 10 or so amino acids evolved, as seen by protein usage, are the ones that are produced abiotically by various sources. [ ]

      Our genetic code is a frozen accident, a local stereochemistry optimum (linking code to amino acid by steric interactions) but not a global one according to Koonin et al. [ ]

      My bet would be that alien cellular life forms will look superficially like ours, with a genetic code that is analogous but quite different due to the last point.

      1. Thanks – so I understand what you are saying is that it seems most likely aliens would use more or less the same amino acids, but may have a very different code.

        Will have a look at those references later.

        1. As I understand it, amino acids would basically be expected. DNA and / or RNA would not be unexpected; there’s some reason to suspect that basic chemistry strongly favors them. What’s guaranteed to be different (given a separate abiogenesis event) is the mapping of DNA (or RNA) codons to specific amino acids. And, if we do find a substantially similar mapping, that’s powerful evidence for some form of panspermia.

          That’s actually not as wild an hypothesis as one might think at first blush. There’ve been plenty of meteor impacts that’ve blasted terrestrial matter not just into orbit but to escape velocities, and there’re extremophiles (especially certain bacteria) that could reasonably survive those conditions in some sort of spore stage. And regular orbital perturbations could reasonably carry such matter to other planets. There are, for example, Mars rocks that have been discovered on Earth.

          However…any environment friendly to a drifting bacterium would also, of necessity, be friendly to its own abiogenesis event. As such, timing is critical; the invader would have to be sufficiently more sophisticated than the native life to out-compete it. But the window in which that could happen is quite narrow…still, if one planet had an head start (because of, say, differential cooling rates) of a few hundred million years over another, that could possibly do the trick.

          Over interstellar distances? Not a chance.



    2. Bad C&P during final edit:

      “The first 10 or so amino acids evolved” should be “The first 10 amino acids coded for”.

  6. In digital communications, messages are “error correction coded” in a way that makes it possible to correct errors in the message. Also, an “error detection code” is usually appended to the encoded message to make it possible to detect any residual errors after the message is decoded. (This technology lies at the heart of modern cell phone and satellite communications). I wonder if there are any analogues to these error correction and error detection codes in the world of genetic codes?

    1. There’s redundancy in the code, as evidenced by duplicate entries in the amino acid table Jerry posted.

      But there are no checksums or CRCs or anything like that, because there’s no molecular computer capable of calculating and checking them.

  7. “there is no dispute that life as we know it evolved only once, and that we all descend from a population of cells that lived over 3.5 billion years ago, known as the Last Universal Common Ancestor, or LUCA.”

    Is it really correct to say this? Could not life have arisen more than once, and only one lineage survived? If ongoing attempts to find a “shadow biosphere” are successful, wouldn’t that falsify that statement?

    1. That’s what the ‘life as we know it’ getout clause means. There may have been lots of kinds of life, all we can see is our own, and we can’t directly see back to how that almost certainly arose – using RNA, not DNA – because there are no physical or chemical traces. The ‘shadow life’ idea is pretty silly, in my opinion. If there is non-DNA life out there, it is more likely to a) be deep in the ocean, where DNA life would not eat it and b) it is most likely going to be a remnant of the RNA world. Finally, as Darwin said, life may be continually evolving from inanimate matter, but it would suffer a very rapid fate at the hands (or suckers) of the life-forms that dominate the planet now: we would eat it.

      1. Re the existence of physical or chemical traces, Lane and Martin got the claim that “[observations suggest that] the processes of biochemical energy conservation and geological energy dissipation at [Hadean/Archean] alkaline hydrothermal vents are homologous” through peer review. [“The Origin of Membrane Bioenergetics”, Cell, 2012]

        I have not seen criticism of that. (But then again I’m not familiar with the consensus here.)

    2. Yes, of course it could have arisen more than once. That’s what I meant by saying in my post that MODERN life traces back to one common ancestor. Other lineages with independent origins could have existed way back when, and then gone extinct. What is true is that if they existed, none are left.

  8. Out of curiosity, is there any correlation between having a non-canonical genetic code and being an organism that lives inside other organisms (e.g., obligate intracellular parasites or endosymbionts)? If so, could part of the explanation be that these organisms have smaller effective population sizes and therefore selection is not efficient enough to remove mildly deleterious mutations that might alter the genetic code?

  9. Ah, very nice explanation. I’ve seen all of this before, but it is nice to keep it fresh.

    Mark above gave further clarification that I think is also important. Though his explanation was good, I am still a little confused as to why any tRNA carries a specific amino acid. Are there underlying chemical bonds? If there are necessitated chemical bonds there, because of the previous ensuing processes, does that not take out the arbitrariness of codon to amino acid connection? I hope those are adequate questions.

    One last thing, and perhaps picky, but in the beginning of Matt’s answer he separates out “human DNA” from “DNA in the mitochondria.” This seems like it would create an unnecessary bifurcation of human parts from nonhuman parts, from that which is necessary for humans and that which is only accidental, or something of the sorts. Mitochondrial DNA may not be specifically made for humans but that is true of much of our DNA code. And it is vitally important for our humanness, I assume.

    1. There is no chemical correspondence between codons and amino acids – the code is arbitrary.

      The genome codes for a distinct tRNA to recognize each possible codon. Each tRNA is covalently joined to a specific amino acid by an enzyme, an aminoacyl tRNA synthetase (again, there are many of these, with specificity for different amino acids/tRNAs). However, the many tRNAs do not just differ at the 3 bases that recognize the codon, they have other sequence differences that yield different shapes that are recognized by the specific aminoacyl tRNA synthetase. There are numerous elements in this machinery that could in principle be altered by mutation. I’m not sure exactly where the differences occur in the divergent genetic codes, perhaps Matthew does?

        1. It may help to think of it in mechanical rather than chemical terms. Each tRNA is a kind of widget with a barcode on one end that matches a particular codon, and a sort of gripper on the other end that’s just the right shape to hold a particular amino acid. The correspondence between barcodes and grippers is arbitrary but fixed by the shape of the tRNA molecules, which in turn is coded in the DNA. Basically your DNA contains, in addition to protein-coding genes, a dictionary of specific tRNA sequences (barcode-to-gripper mappings) that implement a specific genetic code. And this dictionary is what is more or less universal (with some exceptions) to all organisms.
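          The "dictionary" picture can be sketched in a few lines of Python. This is only an illustration, not a model of the real machinery: `STANDARD_CODE`, `VARIANT_CODE` and `translate` are names made up here, the table is the standard RNA codon table written with one-letter amino acid codes ("*" for stop), and the variant changes a single entry (UGA read as tryptophan, as in vertebrate mitochondria).

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes ("*" = stop) in the conventional UCAG table order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The "dictionary": every codon (barcode) mapped to an amino acid (gripper).
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Illustrative variant: the same dictionary with one entry changed (UGA -> Trp).
VARIANT_CODE = dict(STANDARD_CODE, UGA="W")


def translate(mrna, code=STANDARD_CODE):
    """Read an mRNA three bases at a time and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGACAUUAA"
print(translate(message))                # standard code stops at UGA
print(translate(message, VARIANT_CODE))  # variant code reads through UGA
```

          The same message gives a truncated product under one dictionary and a longer one under the other, which is why a gene moved between systems with different codes need not yield the same protein.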

      1. I’m confused. The field seems to have held to a chemical (well, steric) correspondence as a viable, even explanatory, hypothesis for 20 years now. (See my comment below for details.)

        What evidence suggests that the hypothesis can be rejected for good?

    2. On the mitochondria – the importance of the divergence between mitochondrial DNA and nuclear DNA is that it provides strong support for the Endosymbiotic Theory.

      The point is that once life has “settled on” a particular genetic code, even though that code may initially be more-or-less arbitrary, it is difficult for it to change. A change that simply results in swapping one amino acid for another in every protein will almost certainly be a very bad thing. So the genetic code is highly conserved in evolution – hence Monod’s point that E. coli and elephants have virtually the same (nuclear) genetic code, even though their Most Recent Common Ancestor was a long way back.

      In the case of mitochondria, the fact that the genetic code differs from the nuclear genetic code suggests that the nuclear genome and the mitochondrial genome have different evolutionary histories. In other words, the divergence occurred in mitochondria when they were separate organisms, before endosymbiosis occurred.

    3. There is the old “stereochemical hypothesis”, linking code to amino acid by steric interactions. Yariv, 1997; see the first reference here for details: .

      The linked article shows how it seems pretty well tested by looking at the frozen accident that is the resulting code. TL;DR: “The origin of the genetic code was constrained by pre-biotic chemistry (stereochemistry hypothesis) followed by a period of selection”.

  10. Matthew, I think that some of what you wrote here could be misunderstood.

    You wrote:
    “Not only is the genetic code is not universal, the same organism can contain two different genetic codes, one in its genomic DNA, the other in its mitochondria.”

    I think it’s important to be clear that the codes are nearly identical for all life, including mitochondria, and differ in only a few codons at most. The phrase “two different genetic codes” is misleading.

    You wrote:
    “The responses of scientists to the non-universality of the genetic code reveal something important about the nature of biology. It was completely unexpected, and went against all the assumptions of all the researchers who had been studying the genetic code, showing that Monod was wrong…”

    The particular case of mitochondria was unexpected, sure, because it showed divergence from the “host” (nuclear) genome. But in general, variation is a part of evolution. Why would any scientist have been surprised to see variation? The fact that variation exists at only a few codons, and that the core code IS universal for all life, strongly supports a Universal Common Ancestor. It seems disingenuous to say that Monod was wrong. E. coli and the elephant DO have virtually identical genetic codes. I think there’s a risk here of sounding like the kind of journalistic over-egging that has Darwin proven wrong twice a month.

    1. I don’t think we disagree. I’m not wanting to over-egg, after all this is hardly news! This is partly a grammatical thing – something is either universal or it isn’t! I think I made it clear that these are derived versions of the universal code, not something separate, and the really interesting thing is that it has not in any way altered I remember the excitement in 1985 when the first non-canonical Paramecium code was discovered. It *is* surprising! In the 50s-70s, everyone assumed that the genetic code would be universal, and many people still think it is, truly, absolutely universal. It isn’t and I was trying to explain to a reader a) what exactly was going on and b) how it happened.

      1. Yes, and to a careful reader with some expertise, your points are clear and well made.

        My bone of contention is with this overly-dramatic sentence:

        “It was completely unexpected, and went against all the assumptions of all the researchers who had been studying the genetic code, showing that Monod was wrong – what is true for E. coli is not necessarily true for an elephant.”

        I think to a lay reader this is quite misleading. Monod was not wrong, and nothing that was discovered contradicted the notion that the genetic code is essentially universal. EVERY rule in biology has minor exceptions (including this one)! The discoveries of minor variations to the code were “unexpected” only in the sense that they are a fascinating adjunct to a hypothesis that has been shown to be essentially correct.

          1. I’m very much looking forward to your book.

            I think the genius of Crick, Brenner, Gamow et al. in anticipating the universal nature of the code is remarkable. And it’s clear that they had the evolutionary picture completely correct – including anticipating just the kind of variations that were later discovered. For people who may want a taste of the story in anticipation of Matthew’s book, this little article talks about it and cites some of the key papers


            From 1963, before the code was solved:
            “… if different codes do exist they should be associated with major taxonomic groups such as phyla or kingdoms that have their roots far in the past.”

            In fact, after the code was solved, and found indeed to be universal, but before ANY variation was discovered, Crick noted that this unexpected complete lack of variation was a surprise. He then went a little off the deep end in hypothesizing Directed Panspermia to explain it! (Crick & Orgel, 1973)

  11. Fine post. As an engineer, this is far away from my area of expertise, but I still remember being completely transfixed in the mandatory biology class they made everybody take. I am sure Jerry will let you plug your book here when it comes out.

  12. I enjoyed reading this post and I will now look forward to reading Matthew’s book.
    The evolution of the eukaryotes is something I find quite fascinating, and I find the existence of mitochondria, plastids and other endosymbionts to be one of the best illustrations of evolution. In addition to the commonly known mitochondria and chloroplasts, a diversity of endosymbiont relationships abound. For example, the alga Bigelowiella natans acquired its plastid organelle through an endosymbiotic event with another eukaryote. B. natans therefore possesses two separate nuclear genomes, its own and the vestigial genome of the endosymbiont eukaryote, which is known as a nucleomorph. It also has a separate genome within its mitochondria and another genome within the plastid.
    It should also be mentioned that the role of mitochondria in ATP production, while an important advantage, may not be the primary selective advantage offered by this endosymbiont. There are examples of single celled organisms that have streamlined mitochondria-derived organelles that lack a genome and don’t produce ATP. The retention of these reduced mitochondria is proposed to be for the role they play in the biogenesis of iron-sulphur clusters that are important cofactors for many enzymatic processes in cells.

  13. Theobald used the commonality of the whole transcription/translation complex to get his result, the most well tested fact in all of science in terms of likelihood ratio.

    most scientists now think that there was a single event, which enabled what was effectively a hybrid organism to grow larger and to acquire the energy required by more complex organisms.

    It must have been hard, seeing how successful the descendants of the shared ancestor are: “Mitochondria share an ancestor with SAR11, a globally significant marine microbe”. And likely that shared ancestor was common too, underscoring how difficult the successful endosymbiosis may have been in this particular case.

    Endosymbiosis can lead to gene transfer/extinction like in mitochondria. In “nested mealybug symbiosis” it happens over every nesting: “The research team also examined a strain of Tremblaya that doesn’t have Moranella living inside it. This variety employs about 50 more genes than the one containing Moranella, which strongly suggests Moranella plays a key role in allowing the insect-dwelling Tremblaya to operate with such a tiny genome.”

  14. Does this mean that the common ancestral eukaryote was more closely related to archaea in its DNA and bacteria in its mitochondrial DNA?

    I have long been curious about which domain our mitochondrial DNA was more closely related to… I think there is a growing consensus that our nuclear DNA is more similar to archaea.

    1. I haven’t checked recently, but that sounds right… TOLweb doesn’t commit, but that page dates from 1997 (approximately the dawn of time, or the last time I stood on a stage for a handshake in a funny hat). A tree showing archaea and eukaryotes as sister groups is showing up on a lot of mugs and t-shirts, but things might not be that simple. This recent open-access phylogenomic analysis finds a mixture of archaeal and bacterial genes in ancestral eukaryotes, and (from the abstract):
      “…We also provide evidence that eukaryotes branch close to the last archaeal common ancestor. Our results demonstrate that there is no phylogenetic support for hypotheses involving a fusion with a bacterium other than the ancestor of mitochondria. Overall, they leave only two possible interpretations, respectively, based on the early-mitochondria hypotheses, which suppose an early endosymbiosis of an alphaproteobacterium in an archaeal host and on the slow-drip autogenous hypothesis, in which early eukaryotic ancestors were particularly prone to horizontal gene transfers.”

  15. I think the near universality of the genetic code is the strongest evidence for a single common ancestor of all present day living things.

    The fact that there are minor variations (I understand yeast has the most, either 8 or 11 codons, I forget which), I take as evidence that other codes are possible. Thus the near universality of the code is most likely the result of common ancestry rather than chemical/physical constraints.

    A change in the code could happen with a single mutation in the DNA gene for a particular tRNA. The mutation changes the three-base anticodon sequence of that particular tRNA, but would not change the amino acid it carries. I suspect that variation in the code is largely the result of such mutations changing tRNA anticodons.
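    To make the anticodon idea concrete, here is a minimal sketch under loud assumptions: it uses strict Watson-Crick pairing only (no wobble), and it assumes the enzyme that loads the amino acid still recognizes the mutated tRNA, which is not guaranteed in real cells. The function name `codon_read_by` and the dictionary layout are invented for illustration.

```python
# Strict base pairing between RNA bases (wobble pairing is deliberately ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon):
    """Return the codon (5'->3') that pairs with an anticodon written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


# A tryptophan tRNA with anticodon CCA reads the codon UGG.
trna = {"anticodon": "CCA", "amino_acid": "Trp"}
print(codon_read_by(trna["anticodon"]), "->", trna["amino_acid"])

# One base change in the anticodon (C -> U at the first position):
# the amino acid carried is unchanged, but the codon recognized is now UGA.
mutant = dict(trna, anticodon="UCA")
print(codon_read_by(mutant["anticodon"]), "->", mutant["amino_acid"])
```

    In this toy picture a single base change moves tryptophan onto UGA, which is a stop codon in the standard code.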

  16. I’ve often wondered if much work had been done on the chirality of the molecules. In the laboratory there are reactions which produce a very high fraction of one form because geometric considerations make that form much more favorable (a much higher rate of reaction) and there is no doubt that many biological molecules are produced exclusively in one form due to the proteins involved in catalysis. So – is it really possible at all to have a D-amino acid world? In a similar vein, a very long time ago some people hypothesized that we could have silicon-based lifeforms, but in that case it didn’t take long at all to show that silicon chemistry simply wasn’t up to the task.

    1. My understanding is that the chirality we find is also a frozen accident. Though, I wonder what asymmetry it *was* originally, since there seems to be a loose (i.e., with exceptions) rule that you need chiral species to make others.

        1. Jerry Siegel and Albert Eschenmoser have independently answered this in the same way. I’ll follow Eschenmoser’s treatment.

          Imagine (as a thought experiment) a large pot of nucleotides in a racemic mixture. Allow them to condense to make n-mers, with the rule that each individual n-mer contains only one chirality. There are 4^n possibilities, and by the time n = around 40, there is just not enough material for each possibility to be realised. So some would be realised in one chirality only, some in the other, a few (a few that shrinks rapidly as n increases) in both, and many in neither.

          Now imagine one of the n-mers to have a competitive advantage. It will probably be present in only one chirality; and there, unavoidably, is your frozen accident.

            1. By the time you get to 50-mers, you have 4^50 = 2^100 ≈ 10^30 possibilities, or over 1.5 x 10^6 moles of 50-mers, or over 7.5 x 10^7 moles of nucleotides. That figure is doubled for the two chiralities, and it assumes that the mix is as diverse as possible.

              The statistics will get you long before this for any credible quantities.
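              For anyone who wants to check the arithmetic, a quick sketch follows. It assumes one copy of every possible sequence in a single chirality; the function name `moles_needed` is made up here, and the only physical constant used is Avogadro's number.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def moles_needed(n):
    """Material needed to realise every n-mer exactly once, in one chirality."""
    sequences = 4 ** n                       # four nucleotide choices per position
    moles_of_nmers = sequences / AVOGADRO    # one molecule of each sequence
    moles_of_nucleotides = moles_of_nmers * n
    return sequences, moles_of_nmers, moles_of_nucleotides


for n in (40, 50):
    sequences, nmers, nucleotides = moles_needed(n)
    print(f"n={n}: {sequences:.2e} sequences, "
          f"{nmers:.2e} mol of n-mers, {nucleotides:.2e} mol of nucleotides")
```

              Using the exact value of 2^100 (about 1.27 x 10^30) rather than the rounded 10^30 gives roughly 2.1 x 10^6 moles of 50-mers and about 1.05 x 10^8 moles of nucleotides, so the "over" figures quoted above hold.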

  17. Yeah…but those other sites don’t have Hili! Nor are they authored by the guy who, literally, wrote the book on why Evolution is true….

    (In all seriousness, you strike a great balance here. Just letting you know that the biology bits really are really appreciated, even if they don’t get the multiple hundreds of comments some other topics do.)


  18. About 6 months or so ago, I was taking part in a “god belief” thread on a forum that I’m part of and a person offered the universality of the genetic code as one of the reasons they believed that life had been created by god.

    I pointed out this very information to him, and said that the genetic code wasn’t actually universal.

    After much hemming and hawing he acknowledged that he didn’t actually know anything about genetics but the fact that all life utilized DNA and RNA and the same Nitrogen bases, that meant god must have done it.

    Further, after me providing extensive material to the contrary, he could not be swayed from his contention that you could take any gene from any organism and put it into any other organism and get the correct protein.

    1. a person offered the universality of the genetic code as one of the reasons they believed that life had been created by god

      This argument seems completely backwards to me — if an omnipotent being created everything, why would it use the same genetic code over and over? Surely there would be circumstances where using different codes would be more optimal. Heck, surely there would be situations where non-carbon-based life might have an advantage, or where animate pudding or super-intelligent shades of blue might flourish.

      A single genetic code requires a small set of random events, whereas a huge number of incompatible biological systems would be very unlikely without some sort of intelligent intervention.

      1. The argument usually comes from people who know little about biology.

        It seems to be that the same intelligence would design the organisms the same way, so that the similarities at the most basic level act as a signature of a designer.

        I can’t understand why anyone would accept this as a strong argument, but there are those that do.

        1. Sure, I’ll agree that genetic similarity can (weakly) support a desire to reuse good designs on the part of an intelligent designer. But then they have to explain why this designer would want to insert the same strings of junk DNA, remnants of ancient viral infections, etc. with changes between similar organisms that just happen to match the tree of life we’d expect if those organisms evolved from a common ancestor…

  19. A somewhat unrelated question:
    The eukaryotes as I understand it are actually two distinct groups: one having mitochondria while the other chromoplasts (a.k.a animals and plants).
    This kind of symbiosis is assumed to be very very rare – however it seems to have happened twice, and not too far apart (give or take several 100M years)

    Does this mean some conditions were “right” at the time for bacteria to join forces, or that the eukaryote ancestor was promiscuous?

    1. Plants have mitochondria. I think all eukaryotes have mitochondria, but am not sure. It is interesting that both chloroplasts and mitochondria are involved in energy metabolism.

    2. Plants have chloroplasts and mitochondria.

      This suggests that there was a split in the eukaryote lineage after mitochondria were taken up, and in one of the lineages chloroplast like bacteria were also taken up.

      This could indicate some sort of promiscuousness on the part of the eukaryotic ancestor, or it could indicate that two species of bacteria (or archaea) developed some very good method of surviving intracellularly and escaping host destructive processes. Or it could indicate neither.

      I don’t follow plant evolutionary biology very closely but my guess would be that plant cells originated from some sort of lichen. I would expect that the eukaryotic species in a lichen would by necessity have (at least semi-) disabled defense to bacterial products. I would expect that this would be an ideal situation for a symbiotic relationship to develop in which the bacterial species exists intracellularly. And since the eukaryotic species in a lichen is a fungus you already have a cell wall present (though fungal and plant cell walls are different now, they could easily have the same origin, both are after all primarily composed of B-1,4 linked sugars).

      1. Fungi are very closely related to animals, not to plants; and lichens are all terrestrial.

        Almost all eukaryotes have mitochondria, and the remainder have modified mitochondria (mitosomes, hydrogenosomes…) or at least mitochondrion-derived genes in their nuclei.

        Look up “primary endosymbiosis” and “secondary endosymbiosis”.

  20. I greatly enjoyed this well-written article. However, there is one inaccuracy apropos selenocysteine (Sec) utilization. Eukaryotes (including of the multicellular variety!) use Sec as well. In fact, Sec utilization is widespread across all three domains of life, with a conserved Sec insertion architecture.

    Here are two relevant citations:
    Kryukov, Gregory V. et al
    2003 Characterization of mammalian selenoproteomes. In Science. 300: 1439-1443.

    Castellano, Sergi
    2009 On the unique function of selenocysteine- insights from the evolution of selenoproteins. In Biochimica et Biophysica Acta. 1790: 1463-1470.

    1. Yes! This is an important point! Whenever certain regulatory elements are present, one of the stop codons does not attract a terminator but instead a special modified tRNA with selenocysteine on it.

      Pyrrolysine, on the other hand, really is limited to certain bacteria.
