Wav2Lip: Accurately Lip-sync Videos to Any Speech


In our paper, A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild (ACM Multimedia 2020), we aim to lip-sync unconstrained videos in the wild to any desired target speech. Current works excel at producing accurate lip movements on a static image or on videos of specific people seen during training. However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking-face videos, leaving significant parts of the video out of sync with the new audio. We identify the key reasons behind this and resolve them by learning from a powerful lip-sync discriminator. Extensive quantitative evaluations on our challenging benchmarks show that the lip-sync accuracy of videos generated by our Wav2Lip model is almost as good as that of real synced videos. Please check out our paper for more details about the model and our novel evaluation framework.


Interactive Demo

Upload your own inputs, or choose from the example pairs below!

Using our open-source code, you can lip-sync longer and higher-resolution videos, and you can tune the inference parameters to obtain much better results for the same inputs, as sketched below.
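For example, once the repository and a pretrained checkpoint are set up, inference can be run roughly as follows. This is a minimal sketch: the script and flag names follow the repository's README at the time of writing, while the checkpoint and file names are placeholders.

    python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth \
                        --face input_video.mp4 --audio target_speech.wav

Parameters worth tuning include --pads (padding around the detected face box, e.g. increasing the bottom padding to include the chin region), --resize_factor (downscaling the input before processing, which often helps since the model was trained on lower-resolution faces), and --nosmooth (disabling the temporal smoothing of face detections, which can help when the face moves quickly).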

Note: If you do not get back a video result, it most likely means that the face detector could not detect a face in every input video frame. This can sometimes happen with animated movie clips. Generating the results may take some time (usually not more than a minute). All results are currently limited to at most 480p resolution and are trimmed to a maximum of 20 seconds to minimize compute latency. This interactive site is only a user-friendly demonstration of the bare-minimum capabilities of the Wav2Lip model.

Examples you can instantly try:

[Interactive table of example pairs: unsynced video input, target audio, submit]

Disclaimer

All results from this demo website or the open-source code should be used for research, academic, or personal purposes only. As the models are trained on the LRS2 dataset, any form of commercial use is strictly prohibited. Please contact us for further queries.

Ethical use

To ensure fair use, we require that any result created using this site or our code unambiguously present itself as synthetic and as generated using the Wav2Lip model. In addition to the strong positive applications of this work, our intention in completely open-sourcing it is that it can simultaneously encourage efforts in detecting manipulated video content and curbing its misuse. We believe that Wav2Lip can enable several positive applications and also encourage productive discussions and research efforts regarding the fair use of synthetic content.