Has the problem of protein folding been solved?

December 1, 2020 • 1:00 pm

One of the biggest and hardest problems in biology, one with huge potential payoffs for human welfare, is figuring out what shape a protein has from the sequence of its constituent amino acids. As you probably know, a lot of DNA codes for proteins (about 20,000 of them in our own genome), each protein being a string of amino acids, sometimes connected to other molecules like sugars or hemes. The amino acid sequence is determined by the DNA sequence, in which each group of three nucleotide bases (a codon) in the coding part of a gene specifies a single amino acid. The DNA is transcribed into messenger RNA, which goes into the cytoplasm where, on structures called ribosomes and with the help of enzymes, the mRNA sequence is translated into a protein, which can be hundreds of amino acids long.
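For the computationally inclined, the codon-to-amino-acid step can be sketched in a few lines of Python. This is just a toy: the codon table below is deliberately truncated to a handful of the 64 real codons, and it reads the DNA coding strand directly rather than the mRNA.

```python
# Toy sketch of translation: map codons (three-base triplets read from the
# coding strand of DNA) to amino acids. Only a handful of the 64 codons are
# included here; a real table covers all of them.
CODON_TABLE = {
    "ATG": "Met",  # also the start codon
    "TTT": "Phe", "GGC": "Gly", "GAA": "Glu",
    "TAA": None, "TAG": None, "TGA": None,  # stop codons
}

def translate(dna):
    """Read the sequence three bases at a time until a stop (or unknown) codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3])
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGTTTGGCGAATAA"))  # → ['Met', 'Phe', 'Gly', 'Glu']
```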

In nearly every case (see below for one exception), the sequence of amino acids itself determines the shape of the resultant protein, for the laws of physics determine how a protein will fold up as its constituent bits attract or repel each other. The shape can involve helices, flat sheets, and all manner of odd twists and turns. Here’s one protein, PDB 6C7C: Enoyl-CoA hydratase, an enzyme from a bacterium that causes human skin ulcers. It isn’t a very complex shape, but the enzyme may be important in studying how a related bacterium causes tuberculosis, as well as in designing drugs against those skin ulcers:

And here’s human hemoglobin, formed by the agglomeration of four protein chains, two copies each from two genes (from Wikipedia):

Knowing protein shape is useful for many reasons, including ones related to health. Drugs, for example, can be designed to bind to and knock out target proteins, but it’s much easier to design a drug if you know the protein’s shape. (We know the shape of only about a quarter of our 20,000 proteins.) Knowing a protein’s shape can also reveal how a pathogen causes disease, such as how the “spike protein” of the COVID-19 virus latches onto human cells (this helped in the development of vaccines). Here’s the viral spike protein, with one receptor binding domain depicted as ribbons:

And there are many questions, both physiological and evolutionary, that hinge on knowing protein shapes. When one protein evolves into a different one, how much does its shape change, and can that change explain a change of function? (Remember, under Darwinian evolution, gradual changes of sequence must be continually adaptive.) How do odorant molecules of different shapes interact with the olfactory receptor proteins, giving a largely one-to-one relationship between receptor shape and odor molecule?

Until now, determining protein shape was one of the most tedious and onerous tasks in biology. It started decades ago with X-ray crystallography, in which a protein had to be crystallized and then bombarded with X-rays, with the resulting diffraction pattern laboriously interpreted and back-calculated into an estimate of the shape. (This is how Franklin and Wilkins determined the structure of DNA.) This often took years for a single protein. There are other methods, too, including nuclear magnetic resonance and, more recently, cryogenic electron microscopy, but these too are painstakingly slow.

Now, as the result of a competition in which different scientific teams are asked to use computer programs to predict the structure of proteins that are already known but not published, one team, DeepMind from Google, has achieved astounding predictive success using artificial intelligence (AI), to the point where other technologies to determine protein structure may eventually become obsolete.

There are two articles below, but dozens on the Internet. The first one below, from Nature, is comprehensive (click on screenshot to read both):

 

This article, from the DeepMind blog itself (click on screenshot), is shorter but has a lot of useful information, as well as a visual that shows how closely their AI program predicted protein structure.

 

In a yearly contest called CASP (Critical Assessment of Structure Prediction), about a hundred competing teams were asked to predict the three-dimensional structure of about a hundred sections of proteins (“domains”). The 3D structures of these domains were already known to those who worked on them, but were unknown to the competitors, as the structures hadn’t been published.

How DeepMind’s AI program did this is above my pay grade, but it involved “training” the “AlphaFold” program on the amino-acid sequences of proteins whose 3D structures were already known. They entered the contest a couple of years ago with a version trained to predict the distance between every pair of amino acids in a protein (if you know the distances between all pairs of amino acids, you have the 3D structure). This year they used a more sophisticated program, called AlphaFold2, that, according to the Nature article, “incorporate[s] additional information about the physical and geometric constraints that determine how a protein folds.” (I have no idea what these constraints are; the procedure hasn’t yet been published but will be early next year.)
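That parenthetical claim, that the full set of pairwise distances pins down the 3D structure (up to rotation, translation, and mirror image), is easy to illustrate in the forward direction. A minimal sketch, with made-up coordinates standing in for amino-acid positions:

```python
import math

# Made-up coordinates for four residues (toy stand-ins for alpha-carbon
# positions in a real protein; units would be angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]

def distance_matrix(points):
    """All pairwise Euclidean distances. Knowing every entry of this matrix
    fixes the structure up to rotation, translation, and mirror image."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

D = distance_matrix(coords)
print(round(D[0][1], 1))  # → 3.8
```

Going the other way, from a predicted distance matrix back to coordinates, is the hard inverse problem the early AlphaFold tackled.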

It turns out that AlphaFold2 predicts protein structure with remarkable accuracy—often as good as the more complex laboratory methods that take months—and does so within a couple of hours, and without any lab expenses! In fact, the accuracy of shape prediction wound up being about 1.6 angstroms—about the width of a single atom! AlphaFold2 also predicted the shape of four protein domains that hadn’t yet been finished by researchers.  Before this year’s contest, it was thought that it would take at least ten years before AI could be improved to the point where it was about as good as experimental methods. It took less than two years.
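For those curious what an accuracy of 1.6 angstroms means operationally: it is a distance error between predicted and experimentally determined atom positions. The simplest such score is the root-mean-square deviation (RMSD); here is a sketch with made-up two-atom coordinates, assuming the structures have already been superimposed:

```python
import math

def rmsd(predicted, experimental):
    """Root-mean-square deviation between matched atom positions,
    assuming the two structures have already been superimposed."""
    assert len(predicted) == len(experimental)
    total = sum(math.dist(p, e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(total / len(predicted))

# Two-atom toy example; coordinates are made up, in angstroms.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
experimental = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0)]
print(round(rmsd(predicted, experimental), 2))  # → 0.35
```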

Here’s a gif from the DeepMind post that shows how accurately AlphaFold2 predicted two protein structures. The congruence of the green (experimental) and blue (AI-predicted) shapes is remarkable.

There aren’t many cases where computers can make a whole experimental program obsolete, but this appears to be what’s happening here.

There is one bug in the method, though it’s a small one. As Matthew Cobb pointed out to me, in a few cases the sequence of amino acids doesn’t absolutely predict a protein’s shape. As he noted, “Sometimes the same AA [amino acid] sequence can have different isoforms [shapes that can shift back and forth], which can have Very Bad consequences—think of prions, in which the sequence is the same but the structure is different.” Prions are shape-shifting proteins that, in one of their shapes, can cause fatal neurodegenerative diseases like “Mad cow disease”. These are fortunately rare, but do show that the one-to-one relationship between protein sequence and protein shape does have exceptions.

Here’s a very nice video put out by DeepMind that explains the issue in eight minutes:

We’ll have to wait until the paper comes out to see the details, but the fact that the computer program predicted the shapes of proteins so very well means that they’re doing something right, and we’re all the beneficiaries.

How they “younged” the older actors in The Irishman

January 7, 2020 • 1:45 pm

I’ve coyned the word “younged” as the opposite of the good word “aged”. (The video below calls it “de-aging”.)

If you saw Scorsese’s “The Irishman,” a movie I like very much, you’ll know that Al Pacino, Joe Pesci, and Robert De Niro, who are getting up there in years, looked a lot younger in parts of the movie than they really are—an effect that simply can’t be attributed to good makeup. (The movie goes back and forth in these guys’ lives over five decades.)

I didn’t really notice it, but what happened is that Scorsese used computer technology to make these actors look younger than they are. How did they do that without us noticing? Here’s a video (h/t: Bryan) that gives the answer. I won’t say much except that they used three cameras shooting simultaneously, two years of research into footage of these actors at different stages of their careers, and fancy computer programs (of course). And the transformation had to be done in 1,700 scenes!

This video is fascinating.

Here’s a screenshot I took of Al Pacino both before and after “younging.” I didn’t notice any effects when I saw the movie, and I bet you didn’t, either.

Machine learning creates realistic videos from just a few photographs

May 25, 2019 • 11:00 am

From TechCrunch.com, via reader Rick, we learn that AI programs have become so sophisticated that they can create motion, and videos, from as little as one photograph of a human face, though of course more photographs give a better result. From the report:

The model is documented in a paper published by Samsung AI Center, which you can read here on Arxiv. It’s a new method of applying facial landmarks on a source face — any talking head will do — to the facial data of a target face, making the target face do what the source face does.

. . .The new paper by Samsung’s Moscow-based researchers, however, shows that using only a single image of a person’s face, a video can be generated of that face turning, speaking and making ordinary expressions — with convincing, though far from flawless, fidelity.

It does this by frontloading the facial landmark identification process with a huge amount of data, making the model highly efficient at finding the parts of the target face that correspond to the source. The more data it has, the better, but it can do it with one image — called single-shot learning — and get away with it. That’s what makes it possible to take a picture of Einstein or Marilyn Monroe, or even the Mona Lisa, and make it move and speak like a real person.
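The landmark-transfer idea in the quoted passage can be caricatured in a few lines. This is only a toy sketch: the real system uses trained neural networks, whereas here a crude centroid-and-scale alignment stands in for them, and the landmark coordinates are invented.

```python
# Toy sketch of landmark transfer: re-express a source face's landmarks
# in the target face's position and scale. Real systems learn this with
# neural networks; a similarity alignment stands in for them here.

def centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def spread(pts, c):
    """Average distance of the landmarks from their centroid (a scale proxy)."""
    return sum(((x - c[0]) ** 2 + (y - c[1]) ** 2) ** 0.5 for x, y in pts) / len(pts)

def retarget(source_pts, target_pts):
    """Map source landmarks into the target face's coordinate frame."""
    cs, ct = centroid(source_pts), centroid(target_pts)
    scale = spread(target_pts, ct) / spread(source_pts, cs)
    return [(ct[0] + scale * (x - cs[0]), ct[1] + scale * (y - cs[1]))
            for x, y in source_pts]

source = [(0, 0), (2, 0), (1, 2)]        # e.g. eyes and mouth on a source frame
target = [(10, 10), (14, 10), (12, 14)]  # the same landmarks on the target photo
print(retarget(source, target))
```

As each video frame moves the source landmarks, re-running the mapping drives the target face frame by frame; the hard part the networks solve is rendering a photorealistic face around those moving points.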

Here’s an example of what they can do with one shot of the Mona Lisa:

But that’s about it for La Gioconda:

That said, it’s remarkable that it works as well as it does. Note, however, that this only works on the face and upper torso — you couldn’t make the Mona Lisa snap her fingers or dance. Not yet, anyway.

Here is an explanation of how it’s done, as well as some examples of what the programs can do when trained on multiple photographs. Imagine the fake videos that will ensue!

 

Help Jerry solve his wi-fi woes

April 2, 2017 • 12:07 pm

 

UPDATE:  I got it fixed by asking my friends to turn the router off and on again. It now works! YAY!

What a weird thing to feel so bad when you’re disconnected. I used to travel just writing letters!

 

by Grania

Jerry is suffering from a lack of internet. Here’s the problem.

The internet router is working; all non-Mac devices in the house can connect.

Only one Mac device (an iPad) can connect; all the other Mac devices cannot.

I don’t use Macs (I play video games, and therefore PCs are always going to be my choice), so my google-fu is not going to be of much help.

 

 

Our New Overlords

June 29, 2015 • 7:00 am

by Grania

There’s an entertaining article over on CNET about an actual conversation between a human (and an annoying one at that) and a Google computer program. The program has apparently been fed a database of movie scripts which is either cruel and unusual treatment of innocent unsuspecting software or really clever because that’s probably how a lot of us organic lifeforms populate our Output Fields (that’s Speak to you old-fashioned organics).

You can take a look at the paper yourself here: A Neural Conversational Model

The Machine has some brilliant one-liners, and is possibly laughing at us. Make that probably laughing at us. Snarky bastard.

For example:

Human: what is the color of the sky ?
Machine: blue .
Human: what is the color of water ?
Machine: water .
Human: what is the color of blood ?
Machine: it is the same as a black eye . (Is it just me or does that sound a little like a threat?)

Being damned scary (if you think of it):

Human: what is the purpose of life ?
Machine: to serve the greater good .
Human: what is the purpose of living ?
Machine: to live forever .
Human: what is the purpose of existence ?
Machine: to find out what happens when we get to the planet earth .
Human: where are you now ?
Machine: i ’m in the middle of nowhere .
Human: what is the purpose of dying ?
Machine: to have a life .
Human: what is the purpose of being intelligent?
Machine: to find out what it is .
Human: what is the purpose of emotions ?
Machine: i don ’t know .

Or getting testy about philosophy, and children:

Human: what is moral ?
Machine: what empowered humanity , what intellectual the essence is .
Human: what is immoral ?
Machine: the fact that you have a child . (Take that, Organic Life Forms)
Human: what is morality ?
Machine: what is altruism ?
Human: what is the definition of altruism ?
Machine: if you don ’t believe in god , then you don ’t know .
Human: ok … so what is the definition of morality ?
Machine: well , the truth is , you ’re not a believer in god almighty .
Human: tell me the definition of morality , i am quite upset now !
Machine: i ’m not ashamed of being a philosopher !
Human: tell me the definition of morality .
Machine: i don ’t have ethics .
Human: look , i need help , i need to know more about morality …
Machine: i don ’t know what ethics is .
Human: seriously , what is morality ?
Machine: what is the definition of living ?
Human: why living has anything to do with morality ?
Machine: you ’re not a cop .
Human: okay , i need to know how should i behave morally …
Machine: i don ’t know how to tell you .
Human: is morality and ethics the same ?
Machine: yes , sir .

Then The Machine appears to get sulky. Admittedly, by this stage I would have wanted to kick the Human too.

Human: what is integrity ?
Machine: i ’m sorry , i don ’t know what else to say .
Human: be moral !
Machine: be a man !
Human: i really like our discussion on morality and ethics …
Machine: and how i ’m not in the mood for a philosophical debate .
Human: what do you like to talk about ?
Machine: nothing .

You’re not the boss of me either. But someone needs to ask CleverBot about the Ineffable Ground of Being.

“You know how I’m going to live forever, but you’re going to be dead in sixty years? Well, I’ve been working on a belated birthday present for you. Well… more of a belated birthday medical procedure. Well. Technically, it’s a medical EXPERIMENT. What’s important is, it’s a present.”

Google doodle honors computing pioneer

December 9, 2013 • 1:56 pm

Today’s Google Doodle honors Grace Hopper, the pioneering computer programmer who would have been 107 today had she lived (she died in 1992 at age 85). The Doodle is animated to show the big, bulky computer calculating her age:

As Engadget notes:

Prior to her work, computers were considered to be glorified calculators and were programmed with binary machine code, which kept the field limited to specialists. After working on computer tech used on the Manhattan Project during World War II, she developed the A-0 system for the UNIVAC I in 1951, which is considered to be the first-ever computer compiler. That eventually formed the basis for COBOL, the first widely used English-like compiler that laid the foundation for most computer languages today. Hopper did further research for the Navy until the age of 79 (when she retired with the rank of rear admiral) and worked for DEC until she passed away in 1992 at the age of 85.

Now I’m not a computer geek, and haven’t written a line of code in my life, but I suspect many of you will appreciate her achievements.

Here she is on Letterman’s show close to age 80:

And Wikipedia notes that

“At the time of her retirement, she was the oldest active-duty commissioned officer [a rear admiral] in the United States Navy (79 years, eight months and five days), and aboard the oldest commissioned ship in the United States Navy (188 years, nine months and 23 days)” . . . The U.S. Navy destroyer USS Hopper (DDG-70) is named for her, as was the Cray XE6 “Hopper” supercomputer at NERSC.

Rear Admiral Hopper

What kind of computer does the Mars rover use?

August 7, 2012 • 4:57 am

Although we never talk about computers here, I’m absolutely sure that many readers are into them. And even I was curious to know what kind of computer is onboard the Mars rover Curiosity. Well, it turns out that, much to my delight (I’ve always been a Macintosh man), its CPU is essentially the PowerPC G3 that powered older Macs. The description of both hardware and software is at ExtremeTech, which says this:

At the heart of Curiosity there is, of course, a computer. In this case the Mars rover is powered by a RAD750, a single-board computer (motherboard, RAM, ROM, and CPU) produced by BAE. The RAD750 has been on the market for more than 10 years, and it’s currently one of the most popular on-board computers for spacecraft. In Curiosity’s case, the CPU is a PowerPC 750 (PowerPC G3 in Mac nomenclature) clocked at around 200MHz — which might seem slow, but it’s still hundreds of times faster than, say, the Apollo Guidance Computer used in the first Moon landings. Also on the motherboard are 256MB of DRAM, and 2GB of flash storage — which will be used to store video and scientific data before transmission to Earth.

The RAD750 can withstand temperatures of between -55 and 70C, and radiation levels up to 1000 gray. Safely ensconced within Curiosity, the temperature and radiation should remain below these levels — but for the sake of redundancy, there’s a second RAD750 that automatically takes over if the first one fails.
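That last sentence describes a classic hot-spare watchdog pattern: run on the primary until it stops reporting healthy, then hand control to the backup. A simplified sketch (the component names and logic here are made up for illustration, not Curiosity's actual flight software):

```python
# Simplified hot-spare failover: stay on the primary board until it stops
# reporting healthy, then switch control to the backup. Names are invented.

class Computer:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def heartbeat(self):
        """A real watchdog would time out waiting for this signal."""
        return self.healthy

def active_computer(primary, backup):
    """Return whichever board should be in control right now."""
    return primary if primary.heartbeat() else backup

primary, backup = Computer("RAD750-A"), Computer("RAD750-B")
print(active_computer(primary, backup).name)  # → RAD750-A

primary.healthy = False  # simulate a fault on the primary
print(active_computer(primary, backup).name)  # → RAD750-B
```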

The piece also describes the instrumentation of Curiosity and how it communicates with Earth (remember that 7-minute delay).

Reader Michael, who found this piece, notes that “the base price for the BAE Systems RAD750 single board computer was $200,000 10 years ago, so I assume it’s nearer $750,000 today. Very tough. Very precisely made.”

The Wikipedia link in the previous paragraph describes the computer:

The RAD750 is a radiation-hardened single board computer manufactured by BAE Systems Electronic Solutions. The successor of the RAD6000, the RAD750 is for use in high radiation environments such as experienced on board satellites and spacecraft. The RAD750 was released in 2001, with the first units launched into space in 2005.

The CPU has 10.4 million transistors, nearly an order of magnitude more than the RAD6000 (which had 1.1 million). It is manufactured using either 250 or 150 nm photolithography and has a die area of 130 mm². It has a core clock of 110 to 200 MHz and can process at 266 MIPS or more. The CPU can include an extended L2 cache to improve performance.

The CPU itself can withstand 2,000 to 10,000 gray and temperature ranges between –55 °C and 125 °C and requires 5 watts of power. The standard RAD750 single-board system (CPU and motherboard) can withstand 1,000 gray and temperature ranges between –55 °C and 70 °C and requires 10 watts of power.

The guts:

Credit: Peter Vis

Original photograph from Peter Vis’ site here.