
Use Chinese dataset to train expert lip-sync discriminator #81

Closed
Amber-Believe opened this issue Nov 20, 2023 · 18 comments

@Amber-Believe

I am using a Chinese dataset to train the expert lip-sync discriminator, and the training loss stays at 0.69. Have you run into this situation? How should it be resolved?
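For context, a loss stuck around 0.69 is just ln 2: the binary cross-entropy of a classifier that outputs 0.5 for every in-sync / off-sync pair, i.e. chance level, so the discriminator has not started separating the two classes. A quick check:

```python
# Chance-level binary cross-entropy: predicting 0.5 for every sample.
import math

print(-math.log(0.5))  # 0.6931... == ln 2
```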

@ghost

ghost commented Nov 27, 2023

did you filter your data?

@Amber-Believe
Author

Did you filter your data?

What do you mean by filtering? So far I have processed the data to 25 fps video and audio with a 16 kHz sampling rate.
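As a minimal sketch of that step, assuming ffmpeg is on PATH and that forcing 25 fps video plus 16 kHz mono audio is all the re-encoding needed (directory layout and the helper name are illustrative only):

```python
# Sketch: re-encode every clip to 25 fps video with 16 kHz mono audio.
# Assumes ffmpeg is installed and on PATH; paths and the helper name are
# illustrative, not part of this repo.
import subprocess
from pathlib import Path

def reencode_clip(src: Path, dst: Path) -> None:
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-r", "25",       # force 25 fps video
         "-ar", "16000",   # resample audio to 16 kHz
         "-ac", "1",       # mono audio
         str(dst)],
        check=True,
    )

for src in Path("raw_videos").rglob("*.mp4"):
    reencode_clip(src, Path("processed_videos") / src.name)
```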

@Amber-Believe
Author

did you filter your data?

The current data is in Chinese, covering 50 different people and 7,700 videos; the English data contains 40 people and 5,500 videos. Is the problem due to the small amount of data, or something else? Our data has not been processed by syncnet_python, and we found that the syncnet_v2.model checkpoint is not available. Can you provide it? That is the problem right now.
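For reference, "filtering" here usually means scoring every clip with a pretrained SyncNet (e.g. via syncnet_python) and discarding clips with a large audio-video offset or low sync confidence. A rough sketch, assuming the scores have already been exported to a scores.csv with path, offset and confidence columns (the thresholds are illustrative, not values from this repo):

```python
# Sketch: keep only clips whose SyncNet offset/confidence pass a threshold.
# Assumes each clip was already scored (e.g. with syncnet_python) and the
# results written to scores.csv with columns: path, offset, confidence.
import csv

MAX_ABS_OFFSET = 3      # allowed audio-video shift, in frames (illustrative)
MIN_CONFIDENCE = 3.0    # minimum sync confidence to keep a clip (illustrative)

def filter_clips(score_csv: str) -> list[str]:
    kept = []
    with open(score_csv, newline="") as f:
        for row in csv.DictReader(f):
            if (abs(int(row["offset"])) <= MAX_ABS_OFFSET
                    and float(row["confidence"]) >= MIN_CONFIDENCE):
                kept.append(row["path"])
    return kept

print(len(filter_clips("scores.csv")), "clips kept")
```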

@einsqing

did you filter your data?

The current data is in Chinese, covering 50 different people and 7,700 videos; the English data contains 40 people and 5,500 videos. Is the problem due to the small amount of data, or something else? Our data has not been processed by syncnet_python, and we found that the syncnet_v2.model checkpoint is not available. Can you provide it? That is the problem right now.

The syncnet_v2 model is an English model; it is not suitable for Chinese. You need to train a Chinese one.

@ghost

ghost commented Nov 29, 2023

How about your config? lr, bs?

@Amber-Believe
Author

How about your config? lr, bs?

All settings are left at their defaults, e.g. lr=1e-3 when training the expert lip-sync discriminator.
hparams.txt

@ghost

ghost commented Nov 29, 2023

1e-3 is too large, you can choose 1e-4 or 1e-5
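Illustrative only: how that lowered rate reaches the optimizer. In the original Wav2Lip repo the discriminator learning rate is read from hparams (syncnet_lr in hparams.py) and passed to Adam; if this fork follows the same layout, it is a one-value change there.

```python
# Illustrative stand-in, not the real training script: shows the lowered
# learning rate being handed to Adam. In the original Wav2Lip this value
# comes from hparams.syncnet_lr; the field name may differ in this fork.
import torch

model = torch.nn.Linear(512, 1)  # stand-in for the SyncNet discriminator
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-5,  # was 1e-3; 1e-4 or 1e-5 as suggested above
)
```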

@Amber-Believe
Author

1e-3 is too large, you can choose 1e-4 or 1e-5
Okay, I'll try

@Amber-Believe
Author

1e-3 is too large, you can choose 1e-4 or 1e-5

Thank you very much for your advice. After adjusting the lr to 1e-5, the loss begins to decrease. What is an appropriate learning rate for wav2lip training? 1e-4?

@ghost

ghost commented Dec 1, 2023

1e-4 is good

@Amber-Believe
Author

Thank you!

ghost closed this as completed Dec 1, 2023
@Nyquist0

Hi @Amber-Believe.
May I ask which dataset you are using? Is it CMLR, LRS-1000, or a private one?

@ChengsongLu

ChengsongLu commented Dec 20, 2023

Hi @Amber-Believe. May I ask which dataset you are using? Is it CMLR, LRS-1000, or a private one?

@Amber-Believe
BTW, is the eval loss below 0.3 after changing lr from 1e-3 to 1e-5? And is there anything else you have done to achieve that?

@Nyquist0

Hi @primepake,
I was redirected to this page from #97.

But my question is still not resolved. The dataset I am using is LRS2, since the official wav2lip algorithm uses it for training, so I am assuming it should already be filtered. I also randomly checked some audio and video files in the dataset: the wav files have a 16 kHz sample rate and the video files are at 25 fps.

I would also like to ask when you think the syncnet training should converge... Will it stay stuck at 0.69 for a long time? (110k steps for me currently.)

Looking forward to your reply. Thanks.
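A quick way to double-check those two properties across the whole dataset, as a sketch (OpenCV for the frame rate, the standard-library wave module for the sample rate; the directory name is illustrative):

```python
# Sketch: confirm every clip is 25 fps and every wav is 16 kHz.
import wave
from pathlib import Path

import cv2

root = Path("lrs2_dataset")  # illustrative path

for video in root.rglob("*.mp4"):
    fps = cv2.VideoCapture(str(video)).get(cv2.CAP_PROP_FPS)
    if abs(fps - 25.0) > 0.01:
        print(f"unexpected fps {fps:.2f}: {video}")

for audio in root.rglob("*.wav"):
    with wave.open(str(audio), "rb") as w:
        if w.getframerate() != 16000:
            print(f"unexpected sample rate {w.getframerate()}: {audio}")
```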

@ghost

ghost commented Dec 25, 2023

Again, how about your lr, bs, and number of GPUs?

@Nyquist0

LR is 1e-4, BS is 64, on 1 RTX A6000 GPU.

@Nyquist0

@primepake Greetings!
I am trying to follow the pipeline you proposed here.

May I ask how you pre-processed the video data? Are you using the preprocessing code from the official wav2lip repo?

@MarwanAj

MarwanAj commented Jan 9, 2024

did you filter your data?

The current data is in Chinese, covering 50 different people and 7,700 videos; the English data contains 40 people and 5,500 videos. Is the problem due to the small amount of data, or something else? Our data has not been processed by syncnet_python, and we found that the syncnet_v2.model checkpoint is not available. Can you provide it? That is the problem right now.

What is the average length of the videos in these datasets?
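In case it helps to measure, a small sketch for computing the average clip length over a directory of videos (OpenCV-based; the directory name is illustrative):

```python
# Sketch: average duration in seconds of every .mp4 under a directory.
from pathlib import Path

import cv2

durations = []
for video in Path("dataset_videos").rglob("*.mp4"):  # illustrative path
    cap = cv2.VideoCapture(str(video))
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()
    if fps > 0:
        durations.append(frames / fps)

if durations:
    print(f"{len(durations)} videos, average length {sum(durations) / len(durations):.1f} s")
```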

This issue was closed.