Machine learning creates realistic videos from just a few photographs

May 25, 2019 • 11:00 am

From, via reader Rick, we learn that AI programs have become so sophisticated that they can create motion, and videos, from as little as one photograph of a human face, though of course more photographs give a better result. From the report:

The model is documented in a paper published by Samsung AI Center, which you can read here on Arxiv. It’s a new method of applying facial landmarks on a source face — any talking head will do — to the facial data of a target face, making the target face do what the source face does.

. . .The new paper by Samsung’s Moscow-based researchers, however, shows that using only a single image of a person’s face, a video can be generated of that face turning, speaking and making ordinary expressions — with convincing, though far from flawless, fidelity.

It does this by frontloading the facial landmark identification process with a huge amount of data, making the model highly efficient at finding the parts of the target face that correspond to the source. The more data it has, the better, but it can do it with one image — called single-shot learning — and get away with it. That’s what makes it possible to take a picture of Einstein or Marilyn Monroe, or even the Mona Lisa, and make it move and speak like a real person.

Here’s an example of what they can do with one shot of the Mona Lisa:

But that’s about it for La Gioconda:

That said, it’s remarkable that it works as well as it does. Note, however, that this only works on the face and upper torso — you couldn’t make the Mona Lisa snap her fingers or dance. Not yet, anyway.

Here is an explanation of how it’s done, as well as some examples of what the programs can do when trained on multiple photographs. Imagine the fake videos that will ensue!


57 thoughts on “Machine learning creates realistic videos from just a few photographs”

  1. We’re really on the cusp of deepfakes for both video and audio. Alert thinkers will now have to question the authenticity of every video/audio clip we see. We’re used to this kind of skepticism when reading a news article, say, but we now need to condition ourselves to doubt not only what we read, but what we see and hear.

  2. Using AI to deceive people is the most immediate threat posed by the technology, IMO.

    I was concerned about this before the 2016 U.S. elections, then after the election I realized you don’t need “A” or “I” to dupe a massive number of people, you just need the bandwidth that the internet already provides. This fake video tech will crank the deception game up a couple notches.

    The Faux News Channel is already using low-tech deceptively edited videos on its gullible audience:

    1. The only politicians who will remain immune to character assassination via doctored and deepfake videos are those who regularly spout nonsense and idiocy in real life. You all know who I mean.

      1. Politicians (or anyone promoting anything) that use deceptive tactics will tend to attract the naive and poorly educated.

        Similar to the way things are now but amplified.

  3. The roaming beard segment is amusing – the system is seeing the beard as an object detached from the face behind it. The eye movements & tracking are almost perfect although one subject goes crosseyed briefly when the head turn & eye tracking lose touch with each other.

    It will not be long before we’ll have hi res 3D concert footage [& HiFi audio] of all the music artistes we love. The devil will have kept his end of the deal with Robert Johnson. [at the crossroads, Rosedale, Mississippi according to some]

    From the video description I copy/paste the following “statement regarding the purpose and effect of the technology (NB: this statement reflects personal opinions of the authors and not of their organizations)”:

    We believe that telepresence technologies in AR, VR and other media are to transform the world in the not-so-distant future. Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the World. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by the realistic semblances of themselves, and creating such avatars should be easy for the users. This application and scientific curiosity is what drives the research in our group, including the project presented in this video.

    We realize that our technology can have a negative use for the so-called “deepfake” videos. However, it is important to realize, that Hollywood has been making fake videos (aka “special effects”) for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of the certain special effects technologies. And the democratization of the technologies has always had negative effects. Democratizing sound editing tools lead to the rise of pranksters and fake audios, democratizing video recording lead to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection alongside with the ongoing shift for privacy and data security in major IT companies.

    I highly doubt the bolded bit!

      1. Technerd weak excuses & typical of their general attitude denying responsibility – rather like the popular misquote used to justify hacking & music piracy: “information wants to be free.” I do wonder if the narrowness of focus in tech education leaves these technerd people blind to what history has taught us.

        1. They are driven to solve a problem. This narrows their focus. As they get more and more into solving the problem and the little problems that arise as they go forward, they get deeper and deeper in their focus. What makes them able to imagine a better world paradoxically gets in their way of imagining an evil one. It’s not so much that their focus is narrow in general, but that it narrows to solving one problem. It’s the curse of an analytical mind.

  4. “Imagine the fake videos that will ensue”. What, like the one being peddled by tRump supporters claiming to show a drunk Nancy Pelosi that Assbook* admits is fake but refuses to take down? Something like that, maybe?

    *not the real name of the social media company, but it might as well be.

  5. I can’t think of any positive uses for this tech that come within a million miles of balancing out its negative uses.

    It’s very impressive from a technological standpoint, but what they’ve done is invent a weapon. The fact that you can make it do Cool Things, and make the Mona Lisa move her face, camouflages that fact from people but the truth is that this is one of the most dangerous things ever invented, possibly in the history of the human race.

    People will argue that there are benign uses for it, sure, in the same way that they argued there were benign uses for nuclear weapons.
    And people will argue that we’ll always be able to invent counteractive tech that can distinguish fakes from the real thing, as though the arms race between truth and falsehood will always throw up a solution that keeps us ahead. But that’s too close to the argument that we shouldn’t worry so much about climate change because something’ll come up eventually, and that we’re ‘an inventive species’ or ‘we’re problem-solvers’.

    This is civilisationally dangerous tech here, the kind that can break things in a very big way.

    1. Positive use: Orson Welles and Humphrey Bogart as Nero Wolfe and Archie Goodwin, just for starters.

      And we’ve already invented the tech needed to reliably distinguish authentic videos from fakes. It’s the same tech you use to log into WEIT securely via HTTPS. It doesn’t matter how convincing the fake is; if it isn’t digitally signed by a trusted authority, assume it’s fake.

      If we get to a point where digital signatures can be easily faked, then nothing on the internet is secure, and any kind of commerce beyond face-to-face barter comes to a halt. Fake video is small potatoes by comparison.

      1. D’you think if Trump’s supporters saw a deep-fake of, say, Obama and Clapper discussing how they were going to ‘bring down’ Trump that some wonk in the MSM telling them it’s not real would convince them?

        We already have hundreds of objectively irrefutable ways of countering normal disinformation, but they run up against the fact that people aren’t objective – and the people this kind of deep fake tech will be aimed at are the least objective of all.

        They get pumped up enough to go and shoot up a pizza parlour by the existence of a few conspiratorial tweets and rambling YouTubers – how much more pumped up will they be by actual video evidence of their worst fears?

        And how much effect do you think a programmer deconstructing and debunking said video evidence will have on them? Especially since deconstructing and debunking that video evidence will presumably involve pointing out tiny, subtle flaws that are only reliably noticed by _other software programs?_

        Re. the use of deep-fakes to stitch in different actors for different roles, personally I don’t see that being particularly popular. People are attracted by the work of a certain actor; the knowledge that it’s not actually them takes away that appeal. It’s the same as those AI paintings, where some computer program creates an entirely new ‘Rembrandt’. As soon as people know there was no human being behind it it becomes vacuous. Same with AI-made artworks in general. The connection to the artist, the knowledge that there were intentions and desires and artistic visions and influences behind the artwork is a huge part of what makes it interesting. Without those elements it’s just a kind of tech demo, a curiosity.

        1. What I expect is that you won’t need an IT guy or forensic video analyst to tell you whether a video is untrustworthy; your browser, video player, or email client will do it for you automatically and throw up a security warning or certificate error, just as they do now with untrustworthy websites and other suspicious content.

          Anyone clicking past such a warning and uncritically accepting whatever they see has already committed to believing the worst about their political enemies, and facts be damned. I can’t see fake video making much of a difference to their worldview.

          I would also expect that platforms such as Facebook and Twitter will put measures in place to prevent easy sharing and retweeting of untrustworthy video. Conspiracists will still be able to repost them if they really want to, but it will take some tech savvy to do so, so the danger of such videos going viral through impulsive sharing should be much reduced.

          Again, this isn’t rocket science; we already know how to implement these sorts of controls. If we fail to do so, it will be our fault, and not due to an inherent danger in the technology.

          1. “What I expect is that you won’t need an IT guy or forensic video analyst to tell you whether a video is untrustworthy; your browser, video player, or email client will do it for you automatically and throw up a security warning or certificate error, just as they do now with untrustworthy websites and other suspicious content.”

            Fake videos exist now, easily accessible fake detection (e.g. embedded in browser or video client) doesn’t.

            It’s an arms race. Technology enables fake videos, then technology enables detection of fake videos, then new technology makes videos not detectable with existing technology, and so on.

            Facebook’s current policy is to label a fake video as fake, but not to take it down. There will be a lag time between posting fake video and labelling fake video as fake.

            1. Exactly. I’d also point out that unless we come up with 100% accurate detection software, that is universally available, and that is immune to being overtaken by future deepfake tech, then the reliability of video footage will be damaged permanently.

              Deepfake tech doesn’t need to be close to common for it to have an effect on public trust, and it doesn’t even need to be consistently convincing for people to start questioning _real_ footage and dismissing it as fake – which is the other, just as pernicious side of the coin from accepting fake footage as real.

              1. I think the only “solution” (if you want to call it a solution) is trusted sources, not trusted software.

                Technology can’t stop deception – it enables it more than it thwarts it.

                And then there’s the issue that so many people seem to thrive on being deceived (the Alex Jones and QAnon crowd, and to a lesser extent the Fox News audience).

              2. Trusted sources are exactly what I’m talking about. And the best way to know if a video (or any other digital content) is from a trusted source is to have the source sign it with a digital certificate that can be checked by browsers and other apps.

                I’m specifically not talking about an arms race between fake video creators and forensic video analysis. Sorry for not being sufficiently clear about that.

              3. Agree. I think digital signatures and such will play an increasingly important role. It will be up to Google (YouTube) and Facebook to push that protocol into users’ faces – signed videos would get a badge like “Authenticated by NBC” or something like that.

                But how does someone like Joe Biden make a video and sign it? I guess in the short term the signer would have to have a domain name, then YouTube or a video client would indicate “Authenticated by”.

                I don’t think this infrastructure is going to be in place for the election.

                It’s going to be an interesting year.

          2. Facebook is the news aggregator site on the net followed closely by Twitter – 43% of Facebook users report Facebook as where they primarily ‘consume’ news. Facebook takes little responsibility for what is fed to users via their medium & Zuckerberg has no interest in truth checking on his ads-driven platform. The biggest brake on his technerd boy world outlook is the shareholders, some of whom out-technerd Zuckerberg. These people are about profits in the main & I’m not sure we can trust the platform to police itself – their record is appalling & I doubt they’ll invest in the hordes of more humans required to police the platform. In fact they shouldn’t police it themselves, but we don’t have effective independent institutions in place to achieve progress. Also people are divided on how [or whether] to clean up this mess.

            I am confused by your bit about SSL Certificates – these certificates have nothing to do with the news propaganda wars as played by the likes of the State-owned & other ‘news’ sites. If RT wants to use a deepfake video to push a government line or counter the opposition, I’m sure they will do so [probably already have, but I haven’t checked]

            Who guards the guards?

  6. Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky:

    In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection

    The fatal flaw in their statement is the assumption that people will want to avoid being deceived and use spoof detection. At this point it’s clear there’s a high percentage of people that don’t want to leave their echo chambers. This technology will enable echo chambers turned up to eleven.

      1. I believe you. 🙂

        There’s a corollary to Rule 34 that says, ‘If an actress is famous, there will be porn fakes of her’

        Or so I’ve heard…


        1. Sure, you heard it from a friend, while dispassionately discussing technology over cheese and biscuits…

          “Terrible. Just terrible. The violations of these women’s privacy by sickening perverted geeks.”

          “Yes, awful.”

          “It is. It really is. (infinite improbability hums to himself.). By the way, how do you spell Zooey Deschanel?”


          “No reason. Forget it. (yawns dramatically). Anyway, I’m off to bed.”

          “But it’s five in the afternoon.”

          “Early rise tomorrow.”

  7. Can’t wait for the Russians to get their hands on this. This tech doesn’t bode well for the future. One of many things I’ve learned since the Pornstar President came into office is how ignorant millions of Americans are, and how fragile our democracy is; little did I know that so much of our democracy rests on truth, good will and decorum. This tech is a danger to democracy.

    1. “Can’t wait for the Russians to get their hands on this.”

      I have bad news for you. The authors of the paper are Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, and Victor Lempitsky.


  8. Despite all the reservations about the possible abuse of this technology, I really liked the animated versions of La Gioconda, aka Mona Lisa, or since in the Louvre: la Joconde.
    I think it is a great painting, but slightly overrated. The mystery about what she’s smiling about almost certainly accounts for that.

        1. Not that I know of. But there could be. What will be interesting is when they start animating De Kooning’s Women – might need a special AI for those.

        2. There are animations of van Gogh’s art and Van Gogh himself, but they were done by paint artists for the film, Loving Vincent.

    1. “The mystery about what she’s smiling about almost certainly accounts for that.”

      There is of course one well-known theory that the smug smile is because she’s just found out she’s pregnant.

      There’s an equally credible theory that she’s just found out she isn’t. 🙂

      I liked the animation, by the way.


    2. The animations of her are amazing. It’s close to black magic, reanimating the dead. No wonder it freaks people out.

  9. There’s something about that middle animated Mona Lisa that reminds me of Mary Elizabeth Mastrantonio when she was in The Abyss. I’m expecting to see the water creature slide in from out of frame….

  10. With this primitively doctored video of Ms Pelosi, that scores of people (deplorables) take for authentic, what are not the possibilities with this technology?

  11. Scary. This will make it so that the only speech you can trust is that which you observe in person. Until AI gurus manage perfect android avatars, that is.
