The Pfizer and Moderna vaccines are a triumph both of technology and of drug testing and distribution. But to me, the most amazing thing about them is how they were designed. Unlike most vaccines, which are based on weakened or killed viruses or bacteria, these use the naked genetic material itself: specifically, messenger RNA (mRNA). Viral mRNA normally serves to make more viruses using the host’s own protein-making machinery, and part of the virus’s genome codes for the most dangerous (and vulnerable) part of the virus: its spike protein. This is the protein that, sticking out all over the virus, recognizes and binds to the host cell (that is, our cells). That allows the virus to get its entire genome into our cells, commandeering our metabolic processes to make more viruses, which then burst out of the cell and start the cycle all over again.
The spike protein is the dangerous bit of the virus; without it, the virus is harmless. If we could somehow get our immune system to recognize the spike protein, it could then glom onto and destroy the viruses before they start reproducing in our cells. And that’s what the Pfizer and Moderna vaccines do.
The vaccine is in fact composed not of the spike protein itself, but of artificially synthesized instructions for making it. Those instructions, coded in mRNA, are packed in lipid nanoparticles and injected into our arms. The mRNA, engineered to evade our body’s many defenses against foreign genetic material, goes into our cells and instructs our own protein-synthesizing machinery to make many copies of the spike protein. Since these copies aren’t attached to a virus, they aren’t dangerous, but they prime the immune system to destroy any later-attacking viruses by zeroing in on the spike proteins on the viral surface.
Thus the vaccine uses our own bodies in two ways: first to make copies of just the spike protein, and then to provoke our immune system to recognize them, which the body “remembers” by storing the instructions to fabricate antibodies against real viral spike proteins. The part of this story that amazes me is the years of molecular-genetic studies that went into our ability to design an injectable mRNA, studies that weren’t done to help make vaccines, but simply to understand how the genetic material makes proteins. In other words, pure research undergirded this whole enterprise.
You can read a longish but fascinating account of how the mRNA vaccine was made at the link below, on the website of science maven and engineer Bert Hubert (click on the screenshot). Hubert doesn’t go into the details of packaging the engineered mRNA into lipid nanoparticles, which is a tale in itself, so there’s a lot more to learn. At the end, I’ll link to a story about how quickly this vaccine was made: less than a week both to sequence the virus’s RNA, including the gene for the spike protein, and then to use that sequence to design a vaccine based on the spike protein. What I’ll do here is try to condense Hubert’s narrative even more.
Before China even admitted that the viral infection was dangerous and spreading, Yong-Zhen Zhang, a professor in Shanghai, had already sequenced its RNA (the genetic material of this virus is RNA, not DNA), and then deposited the sequence on a public website (a dangerous thing to do in China). The entire viral genome is about 29,000 bases long (four “bases”, G, A, C, and U, are the components of RNA), and makes 6-10 proteins, including the spike protein.
Within only two days after that sequence was published, researchers already knew which bit coded for the spike protein (this was known from previous work on coronaviruses) and then, tweaking that sequence, designed mRNA that could serve as the basis of a vaccine. Once you’ve designed a sequence, it’s child’s play these days to turn it into actual RNA.
The final mRNA used in the Pfizer vaccine is 4282 bases long (if you remember your biology, each three bases code for a single amino acid, and a string of amino acids is known as a protein). But the vaccine mRNA does a lot more than just code for a protein. Here are the first 500 bases of the Pfizer mRNA as given by Bert Hubert, and below you’ll see a diagram of the whole mRNA used in the vaccine:
If you remember your genetics, this sequence looks odd, for mRNA sequences usually contain the bases A, G, C, and U (uracil). Where are the Us? In this vaccine, the Us have been changed into a slightly different base denoted by Ψ (psi), which stands for 1-methyl-3′-pseudouridylyl. I’ll give the reason they did this in a second.
But what you see above is less than one-eighth of the whole mRNA used in the vaccine. I won’t give the whole sequence, as it’s not important here, but the structure of the mRNA is. Remember, this was engineered by people using previous knowledge and their brains, and then entering the sequence into a “DNA printer” that can fabricate DNA that itself can be turned into virus-like RNA. Isn’t that cool? Here’s a picture of the Codex DNA BioXp3200 DNA printer used to make the DNA corresponding to the vaccine’s RNA (photo from Hubert’s site):
And here’s the heart of this post: the structure of the 4282-nucleotide string of RNA that is the nuts and bolts of the vaccine (also from Hubert):
You can see that it’s complicated. The heart of this is the “S protein__mut”, which is the engineered code for the spike protein. But all that other stuff is needed to get that bit into the cell without it being destroyed by the body, and to get the spike protein made quickly and copiously so that it acts as a stimulus (antigen) to our immune system. The more innocuous spike protein we can get into our body, the greater the subsequent immune response when the virus attacks. Each bit of the mRNA shown in the diagram above has been engineered to optimize the vaccine. I’ll take it bit by bit:
Cap: Underlined in the diagram above, this is a two-nucleotide sequence (GA) that tells the cell that the mRNA comes from the nucleus, where it’s normally made as a transcript from our DNA. These bases protect the engineered RNA from being attacked and destroyed by our body, because they make it look like “normal” RNA.
Five prime (5′) untranslated region (“5′-UTR”) in the diagram. This 51-base bit isn’t made into spike protein, but is essential in helping the mRNA attach to the small bodies called ribosomes, where it is translated into protein, three-base “codon” by three-base “codon”, with the help of smaller RNA molecules called “transfer RNAs” (tRNAs). Without the 5′-UTR, the protein won’t get made. Besides helping get the engineered mRNA to the ribosomes, this region has been further engineered. First, the Us have been changed into Ψs, which keeps the immune system from attacking the mRNA without impairing its ability to attach to the ribosomes and make protein. Second, the sequence has been tweaked so that a LOT of protein gets made from it. To do this, the designers used a sequence from the UTR of our alpha-globin gene, whose mRNA yields a lot of protein. (Alpha globin makes up half of each hemoglobin molecule, one of the most copious and quickly made proteins in the body.)
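For readers who think in code, the U-to-Ψ swap is just a character substitution. Here is a minimal Python sketch, assuming a made-up RNA fragment (it is not the vaccine's real 5′-UTR) and a hypothetical helper named `substitute_psi`:

```python
# Illustrative only: the fragment below is invented, not the real 5'-UTR.
PSI = "Ψ"  # stands in for the modified base described in the text


def substitute_psi(rna: str) -> str:
    """Return the RNA with every U written as Ψ; other bases are unchanged."""
    return rna.upper().replace("U", PSI)


fragment = "GAAUAAACUAGUAUUC"
modified = substitute_psi(fragment)
print(modified)
```

The length of the sequence and the order of the bases stay the same; only the chemistry of the U positions differs.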
S glycoprotein signal peptide (“sig”) in the diagram. This 48-base bit, which does become part of the protein, is crucial in telling the cell where to send the protein after it’s made. In this case, it tells it to leave the cell via the “endoplasmic reticulum”, a network of small tubules that pervades the cell. Even this short bit was engineered by the vaccine designers, who changed 13 of the 48 bases. Why did they do this? Well, they changed bases that don’t make a difference in the sequence of the protein (these are usually bases in the third position of a codon, which can often vary without changing the amino acid). But these bases do affect the speed at which a protein is made. Hubert doesn’t explain why this happens, but I suspect that the engineered changes were designed to fit with more common transfer-RNA molecules (tRNAs), which are the small bits of RNA that attach to amino acids in the cytoplasm and then carry them to the mRNA to be assembled into proteins. While there are 64 three-base sequences (4³), there are only 20 amino acids that normally go into proteins. That means that several different codons, often read by different tRNAs, can specify the same amino acid. Since the tRNAs for these “redundant” codons are not present in equal quantities in the cell, you can make proteins faster if you design an mRNA sequence that matches the most common tRNAs. I’m guessing that this is what these 13 changes were about.
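The redundancy argument can be sketched in a few lines of Python. The two short sequences below are invented for illustration (they are not taken from the virus or the vaccine), the codon table is only the slice of the standard genetic code needed here, and the function names are my own. The sketch shows that changing third-position bases can leave the protein untouched; it says nothing about tRNA abundance or speed.

```python
from itertools import product

# Excerpt of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "AUG": "M",
    "UUU": "F", "UUC": "F",
    "GUU": "V", "GUG": "V",
    "CUU": "L", "CUG": "L",
}


def codons(rna):
    """Split an RNA string into consecutive three-base codons."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def translate(rna):
    """Translate codon by codon into one-letter amino-acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(rna))


def changed_positions(a, b):
    """For each differing base, report its position within its codon (1-3)."""
    return [i % 3 + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


original = "AUGUUUGUUUUUCUU"  # hypothetical "before" sequence
recoded = "AUGUUCGUGUUCCUG"   # hypothetical "after" sequence

all_codons = ["".join(p) for p in product("ACGU", repeat=3)]
print(len(all_codons))                         # 64 possible codons
print(translate(original), translate(recoded))  # MFVFL MFVFL
print(changed_positions(original, recoded))     # [3, 3, 3, 3]
```

Four bases differ between the two strings, all in the third position of a codon, and both strings translate to the same five amino acids.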
Spike protein (“S protein__mut”) in the diagram. This is the heart of the mRNA, containing 3777 bases that code for the spike protein. Here, too, the designers “optimized” the code by changing the “redundant” bases to allow protein to be made faster. The Ψs are still used here, for every U in the vaccine mRNA is replaced by Ψ, even when the sequence is written out with plain Us. But there’s one bit that puzzled me until I read Hubert’s explanation. The spike protein made by the body after vaccination differs from the viral spike protein in just two of the 1259 amino acids. The engineered sequence substitutes two amino acids (both prolines) for amino acids in the viral spikes. Why? Because it was known from previous work that these prolines stabilize the spike protein, keeping it from collapsing into a different shape when it isn’t mounted on a virus. It thus retains the same shape it has in the native virus. A collapsed spike protein may induce antibodies, but they won’t readily go after the virus’s own spike proteins because their shape is different. This is just one of the many bits of prior knowledge that came to bear on the vaccine’s design.
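Two small pieces of arithmetic sit behind that paragraph: 3777 bases divide evenly into three-base codons, and swapping two residues leaves every other position alone. A rough Python sketch follows, using a made-up ten-residue peptide and made-up positions (6 and 7), since the text does not say where in the spike the prolines go; `substitute` and `differences` are hypothetical helper names.

```python
def substitute(protein, changes):
    """Return a copy of protein with the given 1-based positions replaced."""
    residues = list(protein)
    for position, amino_acid in changes.items():
        residues[position - 1] = amino_acid
    return "".join(residues)


def differences(a, b):
    """List (position, old, new) for each residue that differs."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


toy_protein = "MFVFLKVEAQ"  # invented peptide, not the real spike
stabilized = substitute(toy_protein, {6: "P", 7: "P"})  # positions are assumptions

print(stabilized)                            # MFVFLPPEAQ
print(differences(toy_protein, stabilized))  # [(6, 'K', 'P'), (7, 'V', 'P')]
print(3777 // 3, 3777 % 3)                   # 1259 codons, no bases left over
```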
The 3′ untranslated region (“3′-UTR”) in the diagram: mRNAs have these, but we’re not quite sure what they do, except that, as Hubert says, the region is “very successful at promoting protein expression.” How this happens is as yet unclear. This bit, too, was engineered by the vaccine designers to make the mRNA more stable and boost protein expression.
The poly-A tail (“poly[A]” in the diagram). This is the 110-base end of the message. Nearly all mRNAs made into proteins carry a run of the adenine base at the butt (3′) end, so we get an AAAAAAAAAAAAA. . . sequence. It turns out that these As are used up when an mRNA molecule makes protein over and over again (they’re like telomeres that get shorter as we age!). When all the As are gone, the mRNA is useless and falls off the ribosomes. Again, previous knowledge told the designers how many As to put at the end of the sequence. It was known that around 120 As gave the best result in terms of protein production; the designers used 100 As split up with a 10-base “linker” sequence, for 110 bases in all. Hubert doesn’t explain the linker, and I don’t know why it’s there.
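The tail's bookkeeping (100 As plus a 10-base linker) can be checked with a tiny Python sketch. Two things here are assumptions of mine: the linker is written as ten placeholder `N`s because its actual bases aren't given in this post, and the 30/70 split of the As on either side of the linker is chosen for illustration only.

```python
def build_tail(first_run, linker, second_run):
    """Join two runs of A around a linker sequence."""
    return "A" * first_run + linker + "A" * second_run


LINKER = "N" * 10                   # placeholder bases; real linker not given here
tail = build_tail(30, LINKER, 70)   # 30/70 split is an assumption for illustration

print(len(tail))        # 110 bases in total
print(tail.count("A"))  # 100 of them are A
```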
You can see, then, the complexity of this vaccine, whose design rests on an exact knowledge of the spike protein’s sequence (recent mutations in the sequence don’t seem to affect the efficacy of the vaccine, as they probably don’t affect the spike’s shape), as well as on previous research about stuff like the Ψ bases helping evade mRNA destruction, the optimum sequences for high production of protein, the number of As at the end that are most efficacious, and then those two proline substitutions in the vaccine’s spike protein. It’s all marvelous, a combination of new and old, and a testament to the value of pure research, which sometimes comes in mighty handy.
This prior knowledge, combined with fast sequencing of RNA and the development of machines to turn code into RNA, helps explain why the vaccine was designed so quickly. Of course it had to be tested and distributed as well, and this Guardian article tells you ten additional reasons why it took only ten months to go from the onset of the pandemic to a usable vaccine.
Finally, a bit of history of science is recounted by “zeynep” at Substack, showing additional reasons why the vaccine came out so quickly (click on screenshot). It’s largely about Yong-Zhen Zhang, the Chinese scientist who published the genetic code of the Covid-19 virus. Zeynep sees him as a hero who took risks with that publication. What’s clear is that without that code (and of course sequencing of DNA and RNA has been done for a long time, another benefit of pure research), we wouldn’t be nearly as far along as we are in battling the pandemic.
When you think about all this, and realize that only one species has both the brains and the means to make a designer vaccine to battle a devastating virus, and then think about the many scientists whose work contributed over many years to the knowledge involved in designing these vaccines, it should make you proud of humanity—and of the human enterprise of science. Yeah, we screw up all the time, and are xenophobic and selfish, but this time we overcame all that and used the best in us to help all of us.
Thanks to Bert Hubert for helping me understand the complexity of these vaccines.