From TechCrunch.com, via reader Rick, we learn that AI programs have become so sophisticated that they can create motion, and videos, from as little as one photograph of a human face, though of course more photographs give a better result. From the report:
The model is documented in a paper published by Samsung AI Center, which you can read here on Arxiv. It’s a new method of applying facial landmarks on a source face — any talking head will do — to the facial data of a target face, making the target face do what the source face does.
. . .The new paper by Samsung’s Moscow-based researchers, however, shows that using only a single image of a person’s face, a video can be generated of that face turning, speaking and making ordinary expressions — with convincing, though far from flawless, fidelity.
It does this by frontloading the facial landmark identification process with a huge amount of data, making the model highly efficient at finding the parts of the target face that correspond to the source. The more data it has, the better, but it can do it with one image — called single-shot learning — and get away with it. That’s what makes it possible to take a picture of Einstein or Marilyn Monroe, or even the Mona Lisa, and make it move and speak like a real person.
Here’s an example of what they can do with one shot of the Mona Lisa:
But that’s about it for La Gioconda:
That said, it’s remarkable that it works as well as it does. Note, however, that this only works on the face and upper torso — you couldn’t make the Mona Lisa snap her fingers or dance. Not yet, anyway.
Here is an explanation of how it’s done, as well as some examples of what the programs can do when trained on multiple photographs. Imagine the fake videos that will ensue!