How much time do you need to lip sync a 10 sec or 1 minute video? #584
Comments
Well, I have only tried this repository on Colab, and it seems fine if you are merging a video of a certain length, basically up to 25–30 seconds; anything longer is going to take a lot of time. If you want to improve the model's performance, there are a few things you can try.
I have been running this model on 1080p input videos between 10 and 30 seconds long on my machine (RTX 3060, 12 GB VRAM), and I had to set the --rescale argument of inference.py to 3 to avoid running out of memory. Generating a lip-synced clip takes a little over a minute. I also had to modify the code to run the preprocessing and discriminator-training scripts locally. If you want to get this working on your machine, I would suggest the environment setup described here: https://github.com/natlamir/Wav2Lip-WebUI
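For context, downscaling the input is a standard way to cut the face detector's memory use. A minimal illustrative sketch (the exact flag name and behavior depend on the Wav2Lip fork you are running; `rescaled_resolution` is a hypothetical helper, not part of the repo) of why a rescale factor of 3 makes a 1080p video fit in 12 GB:

```python
def rescaled_resolution(width: int, height: int, rescale: int) -> tuple:
    # Dividing each dimension by `rescale` cuts the pixel count
    # (and, roughly, the face detector's VRAM footprint) by rescale ** 2.
    return width // rescale, height // rescale

# A 1920x1080 frame processed with a rescale factor of 3 becomes 640x360,
# i.e. about 1/9 of the original pixels.
print(rescaled_resolution(1920, 1080, 3))  # (640, 360)
```

So the memory saving is quadratic in the rescale factor, which is why bumping it from 1 to 3 is often enough to stop OOM errors on mid-range cards.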
Thank you, will check it out.
Ok, thanks, will check it; might contact you again if need be.
Sure
Hello again @sahreen-haider |
Hello. The model can be swapped for a pretrained face-recognition model from another library.
Is it possible to get help with that? (Maybe send me the modified version by PM if you'd rather it not spread too widely; I will only use it myself.)
Hey @AIhasArrived, this grunt work would require significant time, and unfortunately I might not be able to do it right now. But since you asked about alternatives, you might want to check those out.
@AIhasArrived, Connect with me over this email: sahreenhaider@gmail.com |
Already did: I sent you an email a few days ago titled "Contact from github :)"
I can sync an 8-second video in about 15 seconds, and the time could improve with better parameters. But when I started, it was roughly 4x slower, and I realized something was wrong: the first chunks were loading really slowly. After some research I found the problem was the new Torch build not working properly with my GPU. Following other threads, I tried older versions, e.g. torch==2.0.1+cu118, and my chunk-loading speed increased drastically. Hope it helps, and I hope they fix this in a new version.
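If you suspect a Torch/CUDA build mismatch like the one described above, a quick way to see which build you actually have installed is the sketch below (`torch_build_info` is a hypothetical helper written for this thread; it just reads attributes that the real torch package exposes, and degrades gracefully if torch is absent):

```python
def torch_build_info():
    """Return (torch version, CUDA build string) or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only wheels.
    return torch.__version__, torch.version.cuda

info = torch_build_info()
if info is None:
    print("torch is not installed")
else:
    print(f"torch {info[0]} built against CUDA {info[1]}")
```

If the reported CUDA build does not match your driver (or is None on a GPU machine), pinning an older wheel such as torch==2.0.1+cu118, as the comment above suggests, is a reasonable first thing to try.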
Hello @davidkundrats, I just tried this repo. It looks nice, but when I run it I hit a problem (nothing happens while the GPU is being used). Did you run into that yourself? If so, how did you solve it? Thanks.
I have been trying for the last few days with both Wav2Lip HD (not in auto) and video-retalking, and found that both are slow and very GPU-hungry.
I would like to know from every one of you: HOW MUCH GPU do you use (which card), and HOW MUCH time does it take? What kind of videos/animations are you lip-syncing, and how long are they? (How much time for X seconds/minutes?)
Please contribute. I am about to drop this technology and give up on it; maybe other people's experiences will give me hope. Maybe this repo is faster? (I could not try it yet.)