
Use Chinese dataset to train expert lip-sync discriminator #81

Closed
Amber-Believe opened this issue Nov 20, 2023 · 18 comments

@Amber-Believe

I am using a Chinese dataset to train the expert lip-sync discriminator, and the training loss stays at 0.69. Have you run into this situation? How should it be resolved?
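For context, a loss stuck around 0.69 is just ln 2: the binary cross-entropy of a classifier that outputs 0.5 for every in-sync / off-sync pair, i.e. chance level, so the discriminator has not started separating the two classes. A quick check:

```python
# Chance-level binary cross-entropy: predicting 0.5 for every sample.
import math

print(-math.log(0.5))  # 0.6931... == ln 2
```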

@ghost

ghost commented Nov 27, 2023

did you filter your data?

@Amber-Believe
Author

Did you filter your data?

What do you mean by filtering? So far I have processed the data to 25 fps video and audio with a 16 kHz sampling rate.
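As a minimal sketch of that step, assuming ffmpeg is on PATH and that forcing 25 fps video plus 16 kHz mono audio is all the re-encoding needed (directory layout and the helper name are illustrative only):

```python
# Sketch: re-encode every clip to 25 fps video with 16 kHz mono audio.
# Assumes ffmpeg is installed and on PATH; paths and the helper name are
# illustrative, not part of this repo.
import subprocess
from pathlib import Path

def reencode_clip(src: Path, dst: Path) -> None:
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-r", "25",       # force 25 fps video
         "-ar", "16000",   # resample audio to 16 kHz
         "-ac", "1",       # mono audio
         str(dst)],
        check=True,
    )

for src in Path("raw_videos").rglob("*.mp4"):
    reencode_clip(src, Path("processed_videos") / src.name)
```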

@Amber-Believe
Author

did you filter your data?

The current data is in Chinese, covering 50 different people and 7,700 videos; the English data contains 40 people and 5,500 videos. Is the problem due to the small amount of data, or something else? Our data has not been processed by syncnet_python, and we found that the syncnet_v2.model checkpoint is not available. Can you provide it? That is the problem right now.
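For reference, "filtering" here usually means scoring every clip with a pretrained SyncNet (e.g. via syncnet_python) and discarding clips with a large audio-video offset or low sync confidence. A rough sketch, assuming the scores have already been exported to a scores.csv with path, offset and confidence columns (the thresholds are illustrative, not values from this repo):

```python
# Sketch: keep only clips whose SyncNet offset/confidence pass a threshold.
# Assumes each clip was already scored (e.g. with syncnet_python) and the
# results written to scores.csv with columns: path, offset, confidence.
import csv

MAX_ABS_OFFSET = 3      # allowed audio-video shift, in frames (illustrative)
MIN_CONFIDENCE = 3.0    # minimum sync confidence to keep a clip (illustrative)

def filter_clips(score_csv: str) -> list[str]:
    kept = []
    with open(score_csv, newline="") as f:
        for row in csv.DictReader(f):
            if (abs(int(row["offset"])) <= MAX_ABS_OFFSET
                    and float(row["confidence"]) >= MIN_CONFIDENCE):
                kept.append(row["path"])
    return kept

print(len(filter_clips("scores.csv")), "clips kept")
```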

@einsqing

did you filter your data?

The current data is in Chinese, covering 50 different people and 7,700 videos; the English data contains 40 people and 5,500 videos. Is the problem due to the small amount of data, or something else? Our data has not been processed by syncnet_python, and we found that the syncnet_v2.model checkpoint is not available. Can you provide it? That is the problem right now.

The syncnet_v2 model is an English model; it is not suitable for Chinese. You need to train a Chinese one.

@ghost

ghost commented Nov 29, 2023

How about your config? lr, bs?

@Amber-Believe
Author

How about your config? lr, bs?

All settings are left at their defaults, e.g. lr=1e-3 when training the expert lip-sync discriminator.
hparams.txt

@ghost

ghost commented Nov 29, 2023

1e-3 is too large, you can choose 1e-4 or 1e-5
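Illustrative only: how that lowered rate reaches the optimizer. In the original Wav2Lip repo the discriminator learning rate is read from hparams (syncnet_lr in hparams.py) and passed to Adam; if this fork follows the same layout, it is a one-value change there.

```python
# Illustrative stand-in, not the real training script: shows the lowered
# learning rate being handed to Adam. In the original Wav2Lip this value
# comes from hparams.syncnet_lr; the field name may differ in this fork.
import torch

model = torch.nn.Linear(512, 1)  # stand-in for the SyncNet discriminator
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-5,  # was 1e-3; 1e-4 or 1e-5 as suggested above
)
```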

@Amber-Believe
Author

1e-3 is too large, you can choose 1e-4 or 1e-5
Okay, I'll try

@Amber-Believe
Author

1e-3 is too large, you can choose 1e-4 or 1e-5

Thank you very much for your advice. After adjusting the lr to 1e-5, the loss begins to decrease. What is an appropriate learning rate for wav2lip training? 1e-4?

@ghost

ghost commented Dec 1, 2023

1e-4 is good

@Amber-Believe
Author

Thank you!

ghost closed this as completed Dec 1, 2023
@Nyquist0

Hi @Amber-Believe.
May I ask which dataset you are using? Is it CMLR, LRS-1000, or a private one?

@ChengsongLu

ChengsongLu commented Dec 20, 2023

Hi @Amber-Believe. May I ask which dataset you are using? Is it CMLR, LRS-1000, or a private one?

@Amber-Believe
BTW, is the eval loss below 0.3 after changing lr from 1e-3 to 1e-5? And is there anything else you have done to achieve that?

@Nyquist0

Hi @primepake,
I was redirected to this page from #97.

But my question is still not resolved. The dataset I am using is LRS2, since the official wav2lip algorithm uses it for training, so I am assuming it should already be filtered. I also randomly checked some audio and video files in the dataset: the wav files have a 16 kHz sample rate and the video files are at 25 fps.

I would also like to ask when you think the syncnet training should converge... Will it stay stuck at 0.69 for a long time? (110k steps for me currently.)

Looking forward to your reply. Thanks.
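A quick way to double-check those two properties across the whole dataset, as a sketch (OpenCV for the frame rate, the standard-library wave module for the sample rate; the directory name is illustrative):

```python
# Sketch: confirm every clip is 25 fps and every wav is 16 kHz.
import wave
from pathlib import Path

import cv2

root = Path("lrs2_dataset")  # illustrative path

for video in root.rglob("*.mp4"):
    fps = cv2.VideoCapture(str(video)).get(cv2.CAP_PROP_FPS)
    if abs(fps - 25.0) > 0.01:
        print(f"unexpected fps {fps:.2f}: {video}")

for audio in root.rglob("*.wav"):
    with wave.open(str(audio), "rb") as w:
        if w.getframerate() != 16000:
            print(f"unexpected sample rate {w.getframerate()}: {audio}")
```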

@ghost

ghost commented Dec 25, 2023

Again, how about your lr, bs, and number of GPUs?

@Nyquist0

LR is 1e-4, BS is 64, on 1 RTX A6000 GPU.

@Nyquist0

@primepake Greetings!
I am trying to follow the pipeline you proposed here.

May I ask how you pre-processed the video data? Are you using the preprocessing code from the official wav2lip repo?

@MarwanAj

MarwanAj commented Jan 9, 2024

did you filter your data?

The current data is in Chinese, covering 50 different people and 7,700 videos; the English data contains 40 people and 5,500 videos. Is the problem due to the small amount of data, or something else? Our data has not been processed by syncnet_python, and we found that the syncnet_v2.model checkpoint is not available. Can you provide it? That is the problem right now.

What is the average length of the videos in these datasets?
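In case it helps to measure, a small sketch for computing the average clip length over a directory of videos (OpenCV-based; the directory name is illustrative):

```python
# Sketch: average duration in seconds of every .mp4 under a directory.
from pathlib import Path

import cv2

durations = []
for video in Path("dataset_videos").rglob("*.mp4"):  # illustrative path
    cap = cv2.VideoCapture(str(video))
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()
    if fps > 0:
        durations.append(frames / fps)

if durations:
    print(f"{len(durations)} videos, average length {sum(durations) / len(durations):.1f} s")
```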

This issue was closed.