How to train with custom dataset? #21

Closed
Unmesh28 opened this issue Jul 2, 2022 · 46 comments

Comments

@Unmesh28

Unmesh28 commented Jul 2, 2022

Hi, I am quite new to this. I am looking for a step-by-step guide to train on a custom dataset, OR to train on the AVSpeech dataset and fine-tune for other videos. The steps could be:

  • Download the dataset
  • Clean and convert to 25 fps [if the source is 30 fps, what should be done?]
  • Train
  • Fine-tune on custom videos
  • Test

I think such a guide will help a lot of people avoid confusion.

Thank You.

@ghost

ghost commented Jul 2, 2022

  • download the dataset
  • convert to 25 fps
  • change the audio sample rate to 16000 Hz
  • split the videos into clips shorter than 5 s
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • detect faces
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model
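
For concreteness, here is a minimal sketch of the first few steps (25 fps conversion, 16 kHz mono audio, splitting into clips of at most 5 s) using ffmpeg from Python. The folder names are placeholders and the re-encode settings are assumptions, not something prescribed by this repo:

```python
# Sketch only: normalise every video to 25 fps / 16 kHz mono audio, then cut
# it into <=5 s clips. Requires ffmpeg on PATH; paths are placeholders.
import subprocess
from pathlib import Path

RAW_DIR = Path("raw_videos")         # assumption: your downloaded dataset
OUT_DIR = Path("clips_25fps_16khz")  # assumption: output folder
OUT_DIR.mkdir(parents=True, exist_ok=True)

for video in sorted(RAW_DIR.glob("*.mp4")):
    fixed = OUT_DIR / f"{video.stem}_25fps.mp4"
    # Re-encode to 25 fps video and 16000 Hz mono audio.
    subprocess.run([
        "ffmpeg", "-y", "-i", str(video),
        "-r", "25", "-ar", "16000", "-ac", "1",
        str(fixed),
    ], check=True)

    # Cut into segments of at most 5 seconds (re-encoded, so cut points are exact).
    subprocess.run([
        "ffmpeg", "-y", "-i", str(fixed),
        "-f", "segment", "-segment_time", "5", "-reset_timestamps", "1",
        "-r", "25", "-ar", "16000", "-ac", "1",
        str(OUT_DIR / f"{video.stem}_%04d.mp4"),
    ], check=True)
```

Each resulting clip then goes through the sync filtering and face detection steps.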

@Unmesh28
Author

Unmesh28 commented Jul 2, 2022

@primepake Thanks.

But I was looking for the commands to run for each step, along with the directory structure and other related info required.

@ghost

ghost commented Jul 2, 2022

You should process your dataset carefully; it will affect your training.

@Unmesh28
Author

Unmesh28 commented Jul 2, 2022

Yeah, that's why I'm looking for a step-by-step guide from you. It would really help me. Can you provide a doc or README or something, so that anybody can just follow the steps and start training? I have gone through and done training using the 96x96 wav2lip repo, but I'm looking for higher-resolution results.

@lsw5835

lsw5835 commented Jul 7, 2022

Hi @primepake, thanks for your comments.
Could you let me know the location of the 'syncnet_python' file used to filter the dataset?

@Unmesh28
Author

Unmesh28 commented Jul 7, 2022

@donggeon I think he meant the "color_syncnet_train.py" file.

If you have figured out the previous steps, can you tell me what you did exactly?

@ghost

ghost commented Jul 7, 2022

You can use this repo: https://github.com/joonson/syncnet_python
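
As a rough illustration of how that repo is typically driven for this filtering step: run its pipeline on each short clip to get the AV offset, then keep only clips with an offset in [-3, 3] (ideally [-1, 1]). The `run_pipeline.py` / `run_syncnet.py` script names and the `--videofile/--reference/--data_dir` flags come from joonson/syncnet_python's README; the "AV offset" output parsing and the thresholds below are assumptions for this sketch, so verify them against your checkout:

```python
# Sketch only: score each clip with joonson/syncnet_python and keep clips
# whose audio-video offset is small. Assumes you run this from inside the
# syncnet_python checkout and that run_syncnet.py prints an "AV offset:" line
# (as the repo's demo does) -- verify against your copy.
import re
import subprocess
from pathlib import Path

CLIPS_DIR = Path("clips_25fps_16khz")  # assumption: the <=5 s clips from preprocessing
WORK_DIR = Path("syncnet_work")        # assumption: scratch dir for the pipeline
MAX_ABS_OFFSET = 3                     # keep [-3, 3]; tighten to 1 for best results

kept = []
clips = sorted(CLIPS_DIR.glob("*.mp4"))
for clip in clips:
    ref = clip.stem
    subprocess.run(
        ["python", "run_pipeline.py", "--videofile", str(clip),
         "--reference", ref, "--data_dir", str(WORK_DIR)],
        check=True,
    )
    result = subprocess.run(
        ["python", "run_syncnet.py", "--videofile", str(clip),
         "--reference", ref, "--data_dir", str(WORK_DIR)],
        check=True, capture_output=True, text=True,
    )
    match = re.search(r"AV offset:\s*(-?\d+)", result.stdout)
    if match and abs(int(match.group(1))) <= MAX_ABS_OFFSET:
        kept.append(clip)

print(f"kept {len(kept)} of {len(clips)} clips")
```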

@lsw5835

lsw5835 commented Jul 8, 2022

You can use this repo: https://github.com/joonson/syncnet_python

Thanks for your answer. I was wondering if you could give me some more detailed instructions.
Because we detected faces using the preprocessing code in wav2lip, it seems that we only need to check sync using some functions of the syncnet_python code.

@Unmesh28
Author

@primepake Can you please share detailed instructions for training?

@ghost

ghost commented Jul 12, 2022

I will release the code, it's a ton of code

@lsw5835

lsw5835 commented Jul 12, 2022

Thanks @primepake, it'd be a great help.

@Unmesh28
Author

I will release the code, it's a ton of code

Thanks 👍🏻

@skyler14

skyler14 commented Jul 14, 2022

Do you also have your checkpoint from the AVSpeech runs, prior to running on your private dataset? I'm interested in comparing how it turned out on your end vs. training via the instructions you provide.

@ghost

ghost commented Jul 14, 2022

I will publish the model pretrained on AVSpeech.

@skyler14

Great, thank you. Can you also leave an estimate of the GPU hardware and compute time it took for you to do the checkpoint training and fine-tuning?

@ghost

ghost commented Jul 19, 2022

I used 10 A6000 GPUs with nearly 200 GB of GPU memory.

@ghost

ghost commented Jul 19, 2022

The trial and error took just a day.

@skyler14

OK, great. For the public AVSpeech pretrained checkpoint: is it being put in the repo, as a link in the README, or just here in the issues?

@ghost

ghost commented Jul 19, 2022

For certain reasons, I will publish it another day.

@Unmesh28
Author

Unmesh28 commented Jul 19, 2022

@primepake When can you upload detailed training instructions for preprocessing, with the code for each step?

@crazyxprogrammer

Can you please provide detailed instructions for https://github.com/joonson/syncnet_python? How do I use this repo?

@sylvie-lauf

  • download the dataset
  • detect faces
  • convert to 25 fps
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model

Is this the correct order?

@ghost

ghost commented Jul 26, 2022

yes
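
For the remaining steps (face detection, the expert syncnet, then the generator), here is a sketch under the assumption that this fork keeps the original Wav2Lip entry points and flags (`preprocess.py`, `color_syncnet_train.py`, `hq_wav2lip_train.py`); confirm the script names, flags, and checkpoint filenames against this repo's README before running:

```python
# Sketch only: the downstream steps, assuming the original Wav2Lip entry
# points are kept in this fork. Paths and the checkpoint filename are
# placeholders.
import subprocess

steps = [
    # Face detection / cropping into the preprocessed folder layout.
    ["python", "preprocess.py",
     "--data_root", "filtered_clips/", "--preprocessed_root", "preprocessed/"],
    # Expert syncnet: train until the evaluation loss drops below ~0.25.
    ["python", "color_syncnet_train.py",
     "--data_root", "preprocessed/", "--checkpoint_dir", "checkpoints_syncnet/"],
    # Wav2Lip generator, supervised by the frozen expert syncnet.
    ["python", "hq_wav2lip_train.py",
     "--data_root", "preprocessed/", "--checkpoint_dir", "checkpoints_wav2lip/",
     "--syncnet_checkpoint_path", "checkpoints_syncnet/checkpoint.pth"],
]

for cmd in steps:
    subprocess.run(cmd, check=True)
```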

@skyler14

Any update on the AVSpeech-only checkpoint?

@ghost

ghost commented Jul 27, 2022

Hi, I updated my preprocessing steps; sorry about the wrong ordering.

ghost mentioned this issue Jul 27, 2022
@skyler14

How many videos did you need per the method you wrote for the fine-tuning step?

@crazyxprogrammer

crazyxprogrammer commented Jul 29, 2022

Can you give more details about the 4th step (split video less than 5 s)? Is this step included in clean_data.py? And for the fifth step (using syncnet_python to filter the dataset to the range [-3, 3]), should I only filter the dataset based on the offset given by syncnet_python, or should I also correct the synchronization?

@Unmesh28
Author

Unmesh28 commented Jul 29, 2022

  • download the dataset
  • convert to 25 fps
  • change the audio sample rate to 16000 Hz
  • split the videos into clips shorter than 5 s
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • detect faces
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model

@primepake What do you mean by "split the videos into clips shorter than 5 s"? Does it mean splitting longer videos into smaller clips with a duration of less than 5 seconds?

@ghost

ghost commented Jul 29, 2022

The lip-sync expert has many problems; you need to find them. As the author mentioned, it doesn't care about similarity between frames. You need to read the paper to understand more.

Does not reflect the real-world usage. As discussed before, during generation at test time, the model must not change the pose, as the generated face needs to be seamlessly pasted into the frame. However, the current evaluation framework feeds random reference frames in the input, thus demanding the network to change the pose. Thus, the above system does not evaluate how the model would be used in the real world.

@sylvie-lauf

@primepake I want to buy your model. Can you please share details at sylvie.nexus11@gmail.com?

@skyler14

skyler14 commented Aug 1, 2022

How many videos did you need per the method you wrote for the fine-tuning step?

I was just wondering if I could get this estimate for the fine-tuning after AVSpeech (number of videos and/or minutes of footage).

Also, any updates on the AVSpeech checkpoint?

@Unmesh28
Author

Unmesh28 commented Aug 4, 2022

When I am running syncnet_python, I am getting the error below:

WARNING: Audio (3.6720s) and video (3.7200s) lengths are different.
Traceback (most recent call last):
  File "run_syncnet.py", line 40, in <module>
    offset, conf, dist = s.evaluate(opt,videofile=fname)
  File "/home/ubuntu/wav2lip_288x288/syncnet_python/SyncNetInstance.py", line 112, in evaluate
    im_out = self.__S__.forward_lip(im_in.cuda());
  File "/home/ubuntu/wav2lip_288x288/syncnet_python/SyncNetModel.py", line 108, in forward_lip
    out = self.netfclip(mid);
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x278528 and 512x512)

Does anybody know how to resolve this?

@skyler14

skyler14 commented Aug 9, 2022

@primepake Thanks for the note about fine-tuning. Do you have any updates on:
  • roughly how many videos, what GPU hardware, and how much compute time the AVSpeech-only training took
  • the status of adding the AVSpeech checkpoint

Also, I was wondering whether fine-tuning generally needs to be done on a per-person basis, or whether your proprietary data was just a lot of people combined into one fine-tuned model?

@Unmesh28
Author

Unmesh28 commented Aug 22, 2022

@primepake I guess the issues are not resolved yet; why did you close them?

@ghost

ghost commented Aug 22, 2022

This is a problem in your code; you have to figure it out yourself. Just take a screenshot and leave it here so we can solve it. Thank you.

@Unmesh28
Author

I am using your exact code; I haven't changed it.

@ghost

ghost commented Aug 22, 2022

Did you change the input size in the inference file?
[screenshot]

@Unmesh28
Author

No, I did not change it; I have kept it as it is.

The only thing I tried changing is img_size = 288 in hparams.py.
I tried changing it to 192 when you suggested it, but I'm getting the same error for both 288 and 192.

@ghost

ghost commented Aug 22, 2022

You need to change args.img_size = 288 in inference.py.
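
In the original Wav2Lip, img_size is hard-coded in inference.py after the arguments are parsed rather than exposed as a flag; assuming this fork follows the same pattern, the change looks roughly like:

```python
# inference.py (sketch; the exact line location may differ in this fork)
args = parser.parse_args()
args.img_size = 288  # was 96 in the original Wav2Lip; must match the trained model
```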

@aishoot

aishoot commented Aug 23, 2022

  • download the dataset
  • convert to 25 fps
  • change the audio sample rate to 16000 Hz
  • split the videos into clips shorter than 5 s
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • detect faces
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model

Thanks for your nice work. I want to ask: why "split the videos into clips shorter than 5 s"? What effect does it have on the results? I split videos to a maximum of 20 s; is that OK?

@ghost

ghost commented Aug 23, 2022

  • download the dataset
  • convert to 25 fps
  • change the audio sample rate to 16000 Hz
  • split the videos into clips shorter than 5 s
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • detect faces
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model

Thanks for your nice work. I want to ask: why "split the videos into clips shorter than 5 s"? What effect does it have on the results? I split videos to a maximum of 20 s; is that OK?

To understand more, you should read the paper, but if a video is too long it can contain duplicated sounds, so a positive pair and a negative pair can end up being identical with high probability.
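
To make that concrete: the expert syncnet trains on (video window, audio window) pairs, where a "negative" pairs a 5-frame video window with an audio window taken from a different position in the same clip (this mirrors how Wav2Lip's color_syncnet_train.py samples pairs; the sketch below is simplified and its constants are assumptions). In a long clip that repeats the same phrase, the "wrong" audio window can still match the lips, so the negative label becomes noisy.

```python
# Simplified sketch of positive/negative pair sampling for the expert syncnet
# (modelled loosely on Wav2Lip's color_syncnet_train.py; constants assumed).
import random

SYNCNET_T = 5       # consecutive video frames per sample (0.2 s at 25 fps)
MEL_STEP_SIZE = 16  # mel-spectrogram frames covering those 5 video frames

def sample_pair(num_frames: int, mel_len: int):
    """Return (video_start, mel_start, label) for one training sample."""
    v_start = random.randint(0, num_frames - SYNCNET_T)
    if random.random() < 0.5:
        # Positive: the audio window aligned with the chosen video window
        # (80 mel frames per second with a 200-sample hop on 16 kHz audio).
        m_start = int(80.0 * v_start / 25.0)
        label = 1
    else:
        # "Negative": audio from some other position in the SAME clip. In a
        # long clip with repeated speech this window may still match the
        # lips, which is why short (<5 s) clips give cleaner negatives.
        m_start = random.randint(0, mel_len - MEL_STEP_SIZE)
        label = 0
    return v_start, m_start, label
```

The shorter the clip, the smaller the chance that a randomly chosen "wrong" window happens to contain the same mouth movements and audio as the aligned one.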

@aishoot

aishoot commented Aug 23, 2022

Thanks

@wllps1988315

I used 10 A6000 GPUs with nearly 200 GB of GPU memory.

How long did you train expert_syncnet and wav2lip using AVSpeech?

@ldz666666

ldz666666 commented Sep 28, 2023

  • download the dataset
  • convert to 25 fps
  • change the audio sample rate to 16000 Hz
  • split the videos into clips shorter than 5 s
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • detect faces
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model

Hi, why do we need to split the videos into clips shorter than 5 s to train the syncnet? What if I train with longer video clips of about 1 min?

@easonhyx

I will publish the model pretrained on AVSpeech.

Dear author, may I ask if the model pre-trained on the AVSpeech dataset can be made public? If there is a plan to make it public, may I ask when it will be available?

@1129571

1129571 commented Mar 4, 2024

  • download the dataset
  • convert to 25 fps
  • change the audio sample rate to 16000 Hz
  • split the videos into clips shorter than 5 s
  • use syncnet_python to filter the dataset to offsets in [-3, 3]; the model works best with [-1, 1]
  • detect faces
  • train expert_syncnet until the evaluation loss is < 0.25, then you can stop training
  • train the wav2lip model

Hello, I would like to know whether the "filter the dataset to offsets in [-3, 3]" you mentioned here refers to the offset, conf, or dist in the syncnet_python project.
My current understanding is:

  1. Offset in [-3, 3]?
  2. Confidence in [6, 9]?
  3. Can I refer to this issue in the original wav2lip: Advice on sync correcting videos Rudrabha/Wav2Lip#91?

Is my understanding correct?

This issue was closed.