The mRNA coronavirus vaccine: a testament to human ingenuity and the power of science

December 27, 2020 • 9:45 am

The Pfizer and Moderna vaccines are a triumph of both technology and of drug testing and distribution. But to me, the most amazing thing about them is how they were designed. Unlike most vaccines, which are based on either weakened or killed viruses or bacteria, these use the naked genetic material itself—specifically, messenger RNA (mRNA). Viral mRNA serves normally to make more viruses using the host’s own protein-making machinery, and the virus’s genome codes for the most dangerous (and vulnerable) part of the virus: its spike protein. This is the protein that, sticking out all over the virus, recognizes and binds to the host cell—our cells. That allows the virus to inject its entire genome into our cells, commandeering our metabolic processes to make more viruses, which then burst out of the cell and start the cycle all over again.

The spike protein is the dangerous bit of the virus; without it, the virus is harmless. If we could somehow get our immune system to recognize the spike protein, it could then glom onto and destroy the viruses before they start reproducing in our cells. And that’s what the Pfizer and Moderna vaccines do.

The vaccine is in fact composed not of spike protein itself, but of artificially synthesized instructions for making the spike protein. Those instructions, coded in mRNA, are packed in lipid nanoparticles and injected into our arms.  The mRNA, engineered to evade our body’s many defenses against foreign genetic material, goes into our cells and instructs our own protein-synthesizing material to make many copies of the spike protein itself.  Since these copies aren’t attached to a virus, they aren’t dangerous, but they prime the immune system to destroy any later-attacking viruses by zeroing in on the spike proteins on the viral surface.

Thus the vaccine uses our own bodies in several ways: to make copies of just the spike protein, and then to provoke our immune system to recognize them, which the body “remembers” by storing the instructions to fabricate antibodies against real viral spike proteins.  The part of this story that amazes me is the years of molecular-genetic studies that went into our ability to design an injectable mRNA, studies that weren’t done to help make vaccines, but simply to understand how the genetic material makes proteins. In other words, pure research undergirded this whole enterprise.

You can read a longish but fascinating account of how the mRNA vaccine was made at the link below at science maven and engineer Bert Hubert’s website (click on the screenshot). Hubert doesn’t go into the details about packaging the engineered mRNA into lipid nanoparticles, which is a tale in itself, so there’s a lot more to learn. At the end, I’ll link to a story about how quickly this vaccine was made—less than a week to both sequence the virus’s RNA, including the spike protein, and then use that sequence to design a vaccine based on the spike protein.  What I’ll do here is try to condense Hubert’s narrative even more. 

Before China even admitted that the viral infection was dangerous and spreading, Yong-Zhen Zhang, a professor in Shanghai, had already sequenced its RNA (the genetic material of this virus is RNA, not DNA), and then deposited the sequence on a public website (a dangerous thing to do in China). The entire viral genome is about 29,000 bases long (four “bases”, G, A, C, and U, are the components of RNA), and makes 6-10 proteins, including the spike protein.

Within only two days after that sequence was published, researchers already knew which bit coded for the spike protein (this was known from previous work on coronaviruses) and then, tweaking that sequence, designed mRNA that could serve as the basis of a vaccine. Once you’ve designed a sequence, it’s child’s play these days to turn it into actual RNA.

The final mRNA used in the Pfizer vaccine is 4282 bases long (if you remember your biology, each three bases code for a single amino acid, and a string of amino acids is known as a protein). But the vaccine mRNA does a lot more than just code for a protein. Here are the first 500 bases of the Pfizer mRNA as given by Bert Hubert, and below you’ll see a diagram of the whole mRNA used in the vaccine:

If you remember your genetics, this sequence looks odd, for mRNA sequences usually contain the bases A, G, C, and U (uracil). Where are the Us? In this vaccine, the Us have been changed into a slightly different base denoted by Ψ (psi), which stands for 1-methyl-3′-pseudouridylyl. I’ll give the reason they did this in a second.

But what you see above is less than one-eighth of the whole mRNA used in the vaccine. I won’t give the whole sequence, as it’s not important here, but the structure of the mRNA is. Remember, this was engineered by people using previous knowledge and their brains, and then entering the sequence into a “DNA printer” that can fabricate DNA that itself can be turned into virus-like RNA. Isn’t that cool? Here’s a picture of the Codex DNA BioXp3200 DNA printer used to make the DNA corresponding to the vaccine’s RNA (photo from Hubert’s site):

And here’s the heart of this post: the structure of the 4282-nucleotide string of RNA that is the nuts and bolts of the vaccine (also from Hubert):

You can see that it’s complicated. The heart of this is the “S protein__mut”, which is the engineered code for the spike protein. But all that other stuff is needed to get that bit into the cell without it being destroyed by the body, get it to start making lots of spike protein to act as a stimulus (antigen) to our immune system, and to get the spike protein made quickly and copiously. The more innocuous spike protein we can get into our body, the greater the subsequent immune response when the virus attacks. Each bit of the mRNA shown in the diagram above has been engineered to optimize the vaccine. I’ll take it bit by bit:

Cap: Underlined in the diagram above, this is a two nucleotide sequence (GA) that tells the cell that the mRNA comes from the nucleus, where it’s normally made as a transcript from our DNA. These bases protect the engineered RNA from being attacked and destroyed by our body, as it makes it look like “normal” RNA.

Five prime (5′) untranslated region (“5′-UTR”) in the diagram.  This 51-base bit isn’t made into spike protein, but is essential in helping the mRNA attach to the small bodies called ribosomes where it is turned into proteins—three-base “codon” by three-base “codon”—with the help of smaller RNA molecules called “transfer RNAs” (tRNAs). Without the 5′-UTR, the protein won’t get made. Besides helping get the engineered mRNA to the ribosomes, this region has been further engineered. First, the Us have been engineered into Ψs, which keeps the immune system from attacking the mRNA without impairing its ability to attach to the ribosomes and make protein. And the sequence has been further tweaked to give it information for making a LOT of protein. To do this, the designers used sequence from our alpha-globin gene’s UTR, for that region makes a lot of protein. (Alpha globin is one half of our hemoglobin molecules, one of the most copious and quickly made proteins in the body.)
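For readers who think in code, the U-to-Ψ swap can be pictured as a plain text substitution. The sketch below is only an illustration: the input sequence is a made-up toy, not the vaccine's, and the function name `substitute_psi` is hypothetical.

```python
# Toy illustration only: the sequence here is invented, not taken from the vaccine.
PSI = "\u03a8"  # Greek capital psi, the symbol used for the modified base


def substitute_psi(mrna: str) -> str:
    """Return the sequence with every U written as psi."""
    return mrna.upper().replace("U", PSI)


toy = "AUGGCUUACGUU"
modified = substitute_psi(toy)
print(modified)  # AΨGGCΨΨACGΨΨ
```

The length of the message is unchanged; only the letter standing for uracil differs, which mirrors the idea that the ribosome still reads the modified base as if it were a U.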

S glycoprotein signal peptide (“sig”) in the diagram. This 48-base bit, which does become part of the protein, is crucial in telling the cell where to send the protein after it’s made. In this case, it tells it to leave the cell via the “endoplasmic reticulum”, a network of small tubules that pervades the cell. Even this short bit was engineered by the vaccine designers, who changed 13 of the 48 bases. Why did they do this? Well, they changed the bases that don’t make a difference in the sequence of the protein (these are usually bases in the third position of a codon, whose identity often doesn’t change the amino acid). But these bases do affect the speed at which a protein is made. Hubert doesn’t explain why this happens, but I suspect that the engineered changes were designed to fit with more common transfer-RNA molecules (tRNAs), which are the small bits of RNA that attach to amino acids in the cytoplasm and then carry them to the mRNA to be assembled into proteins. While there are 64 three-base sequences (4³), there are only 20 amino acids that normally go into proteins. That means that several different codons, each read by a matching tRNA, specify the same amino acid. Since these “redundant” tRNAs are not present in equal quantities in the cell, you can make proteins faster if you design an mRNA sequence that matches with the most common tRNAs. I’m guessing that this is what these 13 changes were about.
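The redundancy described above is easy to check in a few lines of Python. What follows is a minimal sketch, not the designers' actual method: the codon table is the standard genetic code, but the input sequence is a toy and the `PREFERRED` codon choices are hypothetical, picked only to show that a message can be rewritten base by base without changing the protein it encodes.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons enumerated in UCAG order at each position.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical "preferred" codons, chosen purely for illustration.
PREFERRED = {"F": "UUC", "V": "GUG", "L": "CUG"}


def codons(mrna: str) -> list[str]:
    """Split a sequence (length assumed to be a multiple of 3) into codons."""
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


def translate(mrna: str) -> str:
    """Translate an mRNA string into one-letter amino-acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(mrna))


def recode(mrna: str) -> str:
    """Swap each codon for the preferred synonym, if one is listed."""
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons(mrna))


original = "AUGUUUGUUCUUUUA"  # toy sequence, not from the vaccine
optimized = recode(original)
changed = sum(a != b for a, b in zip(original, optimized))

print(len(CODON_TABLE), len(set(AMINO_ACIDS) - {"*"}))  # 64 20
print(translate(original), translate(optimized))        # MFVLL MFVLL
print(optimized, changed)                               # AUGUUCGUGCUGCUG 5
```

Five of the fifteen bases change, yet both versions spell the same five amino acids. Which synonyms are actually fastest in human cells is an empirical question that this toy does not address.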

Spike protein (“S protein__mut”) in the diagram. This is the heart of the mRNA, containing 3777 bases that code for the spike protein. In this code, too, they’ve “optimized” it by changing the “redundant” bases to allow protein to be made faster. The Ψs are still here, too: every U in the vaccine mRNA, coding region included, is replaced by Ψ.  But there’s one bit that puzzled me until I read Hubert’s explanation. The spike protein made by the body after vaccination differs from the viral spike protein in just two of the 1259 amino acids. The engineered sequence substitutes two amino acids, both prolines, for amino acids in the viral spikes. Why? Because it was known from previous work that these prolines stabilize the spike protein, keeping it from collapsing into a different shape. It thus retains the same shape it has in the native virus. A collapsed spike protein may induce antibodies, but they won’t readily go after the virus’s own spike proteins because their shape is different.  This is just one of the many bits of prior knowledge that came to bear on the vaccine’s design.
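The arithmetic here can be checked directly: three bases per codon turns 3777 bases into 1259 codons. The sketch below also shows what "differs in just two amino acids" looks like when two protein strings are compared position by position. The two short sequences are invented stand-ins, not the real spike protein, and both function names are hypothetical.

```python
def codon_count(n_bases: int) -> int:
    """Number of three-base codons in a coding region of n_bases."""
    if n_bases % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    return n_bases // 3


def differing_positions(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length proteins differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(codon_count(3777))  # 1259

# Toy stand-ins written with one-letter amino-acid codes; P is proline.
viral_toy = "MFVKVLLA"
vaccine_toy = "MFVPPLLA"
print(differing_positions(viral_toy, vaccine_toy))  # [4, 5]
```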

The 3′ untranslated region (“3′-UTR”) in the diagram: mRNAs have these, but we’re not quite sure what they do, except, as Hubert says, the region is “very successful at promoting protein expression.” How this happens is as yet unclear. This bit, too, was engineered by the vaccine designers to make the mRNA more stable and boost protein expression.

The poly-A tail (“poly[A]” in the diagram). This is the 140-base end of the message. All mRNAs made into proteins contain a repeat of the adenine base at the butt (3′) end, so we get an AAAAAAAAAAAAA. . . sequence. It turns out that these As are used up when an mRNA molecule makes protein over and over again (they’re like telomeres that get shorter as we age!). When all the As are gone, the mRNA is useless and falls off the ribosomes. Again, previous knowledge told the designers how many As to put at the end of the sequence.  It was known that around 120 As gave the best result in terms of protein production; the designers used 100 As split up with a 10-base “linker” sequence. Hubert doesn’t explain the linker, and I don’t know why it’s there.

Nevertheless, you can see the complexity of this vaccine, whose design rests on an exact knowledge of the spike protein’s sequence (recent mutations in the sequence don’t seem to affect the efficacy of the vaccine, as they probably don’t affect the spike’s shape), as well as on previous research about stuff like the Ψ bases helping evade mRNA destruction, the optimum sequences for high production of protein, the number of As at the end that are most efficacious, and then those two proline substitutions in the vaccine’s spike protein. It’s all marvelous, a combination of new and old, and a testament to the value of pure research, which sometimes comes in mighty handy.

This prior knowledge, combined with fast sequencing of RNA and the development of machines to turn code into RNA, helps explain why the vaccine was designed so quickly. Of course it had to be tested and distributed as well, and this Guardian article tells you ten additional reasons why it took only ten months to go from the onset of the pandemic to a usable vaccine.

Finally, a bit of history of science is recounted by “zeynep” at Substack, showing additional reasons why the vaccine came out so quickly (click on screenshot). It’s largely about Yong-Zhen Zhang, the Chinese scientist who published the genetic code of the Covid-19 virus. Zeynep sees him as a hero who took risks with that publication. What’s clear is that without that code (and of course sequencing of DNA and RNA has been done for a long time—another benefit of pure research), we wouldn’t be near as far along as we are in battling the pandemic.

When you think about all this, and realize that only one species has both the brains and the means to make a designer vaccine to battle a devastating virus, and then think about the many scientists whose work contributed over many years to the knowledge involved in designing these vaccines, it should make you proud of humanity—and of the human enterprise of science. Yeah, we screw up all the time, and are xenophobic and selfish, but this time we overcame all that and used the best in us to help all of us.

Thanks to Bert Hubert for helping me understand the complexity of these vaccines.

49 thoughts on “The mRNA coronavirus vaccine: a testament to human ingenuity and the power of science”

  1. I’m curious about how the mRNA in the COVID-19 virus can switch some bases and get transferred from humans and into other animals such as cats and minks.

    1. I am just following along because the biology here is over my head. The knowledge that the scientists in the laboratory have or already had makes a big difference in this fast development. I understand that a lot of money was also thrown in to assist with the speed and success. Having also read about the failures at CDC concerning the test capability here in the U.S. has also caused big problems along the way. The real test now seems to be in getting this great discovery into enough arms to put an end to this pandemic.

    2. Like Randall I’m just following along – I don’t know anything about these viruses. But one amazing thing I learned from a colleague last winter (shortly after the new virus was reported) is that humans have lots of coronavirus “species” or strains that are shared with other mammals, but the coronaviruses sampled from humans can all be traced back to a very recent common ancestor <<10,000 years ago. I had just assumed that the viruses that have been causing colds in humans for hundreds of thousands of years are "human" viruses, but that's probably not the case: humans are constantly acquiring new viruses from other species (and passing viruses to other species), and every few thousand years all the other strains are replaced by a new virus acquired from some other host species (a sort of mass extinction event for the viruses). There are lots of small differences among those coronavirus genomes but they seem to have little effect on transmissibility between host species because that transmissibility is already pretty high. That seems like another reason to think that the new human strain in the UK is probably not more transmissible from human to human than other strains. But I could be wrong about that.

  2. Fascinating stuff. Someday perhaps we’ll own devices that can receive instructions over the internet, manufacture the vaccine, and then inject it into us. It better be password-protected though.

    1. The cheaper ones won’t be secured, will have a relatively high error rate, won’t get security upgrades (just be “bricked” so you need to buy a new, and more expensive, one), and will insert an engineered dependency on “McDNuts” (a copyright-protected vitamin) into every vaccine produced.
      Cynical? Moi?
      Also (having just had to do some OCR for Dad which the canned software for his scanner can’t handle), you can bet that there will be better software for running the machine, without the McDNuts dependency, available on GITHUB, and “hacking” your property to use it will be a criminal offence.

  3. Have read Hubert’s piece and learned more about this matter than I ever thought I would. My programmed biology education ended long ago. I was somewhat uncomfortable with the speed of the vaccine until I read this. I also learned more about computers.

    1. The short answer is yes. From wikipedia and google, I get two quotes:

      “ACE2 acts as the receptor for the SARS-CoV-2 virus and allows it to infect the cell.”

      “Angiotensin-converting enzyme 2 (ACE2) is an enzyme attached to the cell membranes of cells located in the lungs, arteries, heart, kidney, and intestines. ACE2 lowers blood pressure by catalyzing the hydrolysis of angiotensin II into angiotensin.”

    2. Steve Gerrard’s response is also a large part of the answer to Alex K. at #1 above. The ACE enzyme and its receptors are common across at least placental mammals, if not further out into the fish and other vertebrates. They’re not identical from one taxon to another, but they are sufficiently similar to allow, for example, infection of bat cells, corresponding human cells, and corresponding mink cells.
      That’s quite a broad “phylogenetic bracket”, so I, for one, would not fall off my sofa in astonishment to hear that SARS-COV2 has leapt the species barrier into pigs or cows (sheep having relatively little contact with humans).

  4. Marvelous! I did not know most of that stuff. Basic research! It gets the job done!

    As I understand it, the 5-prime cap on eukaryote mRNA has different functions. These include transport out of the nucleus, protection from RNA-eating enzymes, and then it’s used in the initial stages of binding of ribosomes to that end of the RNA.

    I wonder if one could engineer transgenic bacteria, or whatever, to make RNA with the specialized base and other special features.

  5. I wonder how many philosophers helped in the making of this vaccine. I mean, they would want it justified, right? We can’t just say science works, can we? Or maybe Hawking was right: philosophy is dead.

  6. Thanks for the fabulous description Jerry. It seems that the use of specific redundant third position base pairs to improve speed may be related to what leads to codon bias (some third position base pairs are more common in sequence data than expected by chance)? Hiroshi Akashi, who worked as an undergrad in the Lewontin lab, studied codon bias in Drosophila for his PhD at Chicago (so you may know him). If I recall his work correctly, one interesting aspect of codon bias is that the selection pressures for codon bias are tiny (and would never be detectable in any field study) but the genetic signature it leaves is clear.

    1. Codon bias in prokaryotes and eukaryotes seems to be an open area. I’m pulling this from fallible memory and personal bias since it is late, but FWIW my impression has been that prokaryotes may have some environmental selection going on (but different clades tend to have different bias).

      While eukaryotes may have selective advantages in protein production rate regulation, cancer suppression and what not. Specifically here, from Hubert’s own reference:

      Mammalian genes are highly heterogeneous with respect to their nucleotide composition, but the functional consequences of this heterogeneity are not clear. In the previous studies, weak positive or negative correlations have been found between the silent-site guanine and cytosine (GC) content and expression of mammalian genes. However, previous studies disregarded differences in the genomic context of genes, which could potentially obscure any correlation between GC content and expression. In the present work, we directly compared the expression of GC-rich and GC-poor genes placed in the context of identical promoters and UTR sequences. We performed transient and stable transfections of mammalian cells with GC-rich and GC-poor versions of Hsp70, green fluorescent protein, and IL2 genes. The GC-rich genes were expressed several-fold to over a 100-fold more efficiently than their GC-poor counterparts. This effect was not due to different translation rates of GC-rich and GC-poor mRNA. On the contrary, the efficient expression of GC-rich genes resulted from their increased steady-state mRNA levels. mRNA degradation rates were not correlated with GC content, suggesting that efficient transcription or mRNA processing is responsible for the high expression of GC-rich genes. We conclude that silent-site GC content correlates with gene expression efficiency in mammalian cells.

      So there remain questions (doesn’t it always?).

      One of the substitutions was outside that optimization for unclear reasons, by the way.

  7. The scientists behind these vaccines deserve enormous respect and kudos. And special mention goes to Katalin Karikó, whose work with mRNA was derided, her grant applications rejected, and her promotion at Penn (the University of Pennsylvania) denied. She later joined BioNTech, which along with Pfizer, has developed this vaccine.

  8. Question:
    If we can trick a cell to produce the spike protein, wouldn’t it be easier to produce the spike protein in a lab using a mammalian cell culture (or other suitable technique) and then inject the spike protein as the vaccine itself?

    1. This is one thing I wondered about, and I don’t have the answer, but there has to be a good reason. Perhaps a reader who knows the answer can weigh in below. It may be hard to make the spike protein, but I suspect that it has something to do with having the cell make the protein, which may in some ways be less harmful. But, short answer: I don’t know.

    2. Hypothesis : free “spike” protein in the bloodstream could lead to antibodies which also react against angiotensin (which is received by the “spike’s” target enzyme). If that does bad things to the blood pressure of, say, 10% of recipients, then it could be considerably worse than the original virus.

    3. My understanding is that the problems are with the protein synthesis. Making synthetic protein in a machine is limited to <200 amino acids, much shorter than the length of the spike protein (~1180 amino acids). IDK why longer proteins can't be synthesized, maybe because longer proteins have higher error rates.

      Making synthetic protein in a transgenic cell culture system works great for making proteins like insulin where minimal purification is needed for the protein to function (I guess because some contaminants are acceptable). IDK why this isn't an approach to making antigens, but maybe it requires high-stringency protein purification that is too slow or too expensive to be scaled up. But again not my area of expertise.

      1. Interesting point!

        I found this, which looks reasonable:

        Making synthetic proteins can be a slog. Generally, if scientists want to make a protein over 50 amino acids long, they either get an engineered microbe to grow it, or they stitch together shorter synthetic peptides into longer ones. Both options can take weeks to months and involve multiple steps. Bradley L. Pentelute and coworkers at the Massachusetts Institute of Technology have a better plan. They’ve created an automated chemical protocol to synthesize peptides up to 164 amino acids long in hours (Science 2020, DOI: 10.1126/science.abb2491).

        Chemists have previously made synthetic peptides using a process called solid-phase peptide synthesis, in which they build a peptide chain one amino acid at a time on a polymer resin. That method can’t make chains greater than 50 amino acids because the longer the process goes, the more likely it is that side reactions will produce aggregated peptides or ones missing amino acids.

    4. “wouldn’t it be easier to produce the spike protein in a lab using a mammalian cell culture?”

      No. It is much easier to key the mRNA sequence into that DNA printer, and have the vaccine ready for testing in two weeks. As PCC put it: “less than a week to both sequence the virus’s RNA, including the spike protein, and then use that sequence to design a vaccine based on the spike protein.” You are effectively producing the spike protein using the human body instead of a cell culture, and the human body is easier to manage and better at it.

      For the next vaccine, the only thing in the whole process that will change is the mRNA sequence they key into the machine. If you were working with proteins or disabled viruses, you would have to develop a new process each time for isolating the new target protein or a disabled virus that triggers effective antibodies.

  9. Fascinating post!
    I think the fastest vaccine developed before was the second mumps vaccine (IIRC), four years!
    If we can summarize the main reasons why these new vaccines could be developed so fast:
    – more than a decade of research into SARS and other coronaviruses by many scientists laid the basis.
    – the astonishing speed of decoding the virus (two days? seriously!?) and nearly immediately giving it out internationally, kudos, nay superkudos, there to Yong-Zhen Zhang and Li Wenliang, allowing work on a vaccine to start even before the outbreak was officially recognised.
    – the great amounts of money (not only ‘warp speed’, but also EU and UK initiatives) poured in, which made it possible to do phase I and II trials simultaneously, and phase II and III trials. (companies don’t do that because if phase I fails you do not invest in II, and same for III).

    I must say that I’m extremely impressed, not to say stunned, how cleverly designed these mRNA vaccines are.
    Yes, it makes us proud of humanity, and its capability to do science!

    1. As far as I know the mRNA vaccines – which have advantages such as having less allergenic potential (cf. influenza vaccines, which can’t be given to all) – have been on the backburner for years. What blocked tests have been ethical concerns if my memory serves – but you may want to check that – and the pandemic gave an excuse to circumvent the usual procedures and open up the market.

      If that is correct we may see some later analyses criticizing the ethics involved – the standard vaccines are not much later and now seem as effective after tweaks – but it was likely a risk well taken.

  10. Thanks, it is indeed mind-blowing.

    But I have to say when combined with the proposition that at some point it is “child’s play” to come up with various genetic agents like this, and then just inputting them in to a “DNA printer”….it did raise the specter in my layman’s ignorance of nefarious actors easily creating truly nightmare-scenario viruses to unleash. Am I wrong?

  11. Shades of Walt Whitman.

    “For every atom belonging to me as good belongs to you.”

    I wonder about the ethics of getting this vaccine which probably contains atoms that were once inside a human embryo.

    1. It would also contain some atoms that were inside a distant star which blew up and then went into our solar system as it was forming. Why would we want 500 million year old alien atoms in our bodies???

  12. This is awesome. I’ve already got my predictions for the next Nobel Prize in Medicine. If only the distribution of the vaccines could be equally ingenious. As Leonard Cohen sang, “Hey, why not ask for more?”

  13. It crossed my mind that if this technique turns out to be safer, and less likely to produce side-effects, that the safety testing for vaccines could be significantly reduced. That could mean producing a new vaccine in days or weeks, rather than months or years.

  14. I wonder if Time Magazine is having second thoughts about designating Biden and Harris, whatever their considerable merits, as “Person of the Year.” I genuflect in the direction of Time.

  15. Please continue with these science posts. Your explication was very valuable. What extraordinary science is being done!

  16. Thank you Jerry for the great and informative post! I hadn’t yet taken the time to dig into the specifics of the new vaccines, so this was a great opportunity.

    My immediate reaction was, and remains, that since mRNA vaccines get around a lot of allergies – cf. how influenza vaccines can’t be given to all – they were potential game changers. A few anaphylactic shocks have turned up as expected, but nothing systematic I think (and seeing how you can get anaphylaxis from most anything, such as “exposure to cold air after getting out of a shower”, that doesn’t seem too alarming).

  17. It was known that around 120 As gave the best result in terms of protein production; the designers used 100 As split up with a 10-base “linker” sequence. Hubert doesn’t explain the linker, and I don’t know why it’s there.

    I did some spelunking and a tentative hypothesis is that it could be a hairpin-forming BoxB analog “which encodes an RNA that would form a short stem structure with an A-rich loop”. BoxB elements take part in activating antitermination processes that ensure no early stops or terminations during transcription.

    But it’s a wild guess, and I’m not sure why it would help during translation. Maybe it doesn’t form a hairpin as much as provide flexibility or rigidity for the tail. [It could also be a convenient site for an enzyme cut if they were testing poly-A length and/or production controlling early on, but I’m just throwing it onto the heap.]

  18. I understood the Codex BioXp3200 DNA printer was used to build the cDNA to be used as the key / base for the designed mRNA. How has the mRNA so far been synthesized and cloned to the volume needed for zillions of doses? What are the equipment, method and time needed to produce a batch of ready-to-inject mRNA? (2 hours for 100 micrograms?) Many questions are still pending in this whole story.
