Researchers train neural network to generate realistic videos lip-synced to audio clips


Researchers from the University of Washington have developed an algorithm that generates realistic-looking video based on audio input. An artificial neural network was first trained on hours of video to generate realistic-looking mouth shapes synced to the audio. The shapes are then superimposed seamlessly on an existing video of the person involved. The system was tested using audio and video files of Barack Obama available in the public domain.
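The researchers' actual system uses a recurrent network trained on many hours of footage, followed by texture synthesis and compositing; as a loose, hypothetical illustration of just the audio-to-mouth-shape mapping, a single feed-forward layer in NumPy might look like the sketch below. The dimensions (28 audio features per frame, 18 mouth landmarks) and the random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 28 audio features (e.g. MFCC-like) per frame,
# 18 mouth landmarks as (x, y) pairs = 36 output values.
N_AUDIO, N_HIDDEN, N_SHAPE = 28, 64, 36

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(0.0, 0.1, (N_AUDIO, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_SHAPE))
b2 = np.zeros(N_SHAPE)

def audio_to_mouth_shape(audio_features):
    """Map one frame of audio features to mouth-landmark coordinates."""
    h = np.tanh(audio_features @ W1 + b1)  # hidden layer activation
    return h @ W2 + b2                     # predicted landmark coordinates

# One synthetic frame of audio features.
frame = rng.normal(size=N_AUDIO)
shape = audio_to_mouth_shape(frame)
print(shape.shape)  # one (x, y) coordinate vector for the mouth region
```

In the real system, per-frame predictions like these would be smoothed over time and rendered into mouth textures before being composited onto the target video.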

Steve Seitz, a co-author of the paper, says: "When you watch Skype or Google Hangouts, often the connection is stuttery and low-resolution and really unpleasant, but often the audio is pretty good. So if you could use the audio to produce much higher-quality video, that would be terrific." The researchers took particular care to get the mouth area exactly right, since any error there would make the resulting video look unnatural.

Image: University of Washington


The algorithm has a number of potential applications. Users could hold realistic conversations with historical figures in a virtual reality environment. In a video conference, users could avoid interruptions from video feeds timing out because of connectivity problems. The system could also generate high-quality video feeds even when the base video is low-resolution.
