
Need Help #3

Closed
kavita-gsphk opened this issue May 19, 2024 · 5 comments

Comments

@kavita-gsphk

It seems like you have tried to train a modified version of wav2lip_288*288. It would be a great help if you could help me with the problem below.

I am training syncnet on the avspeech dataset with train_syncnet_sam.py from the above-mentioned repo. My training loss is stuck at 0.69 even after 500k steps. The learning rate and batch size are 5e-5 and 64, respectively.

(attached screenshot: Screenshot 2024-05-13 at 14 39 28)

I have tried different learning rates, but it didn't help. Based on your experience, how can I solve this problem?

For preprocessing, I followed all the steps suggested here except the video-splitting part. My videos' average length is 7.1 s (videos range from 0 to 15 s), and the total length of the training dataset is roughly 30.5 hours.

Thank you so much!

@lililuya
Owner

lililuya commented May 20, 2024

I've encountered the same problem on our own dataset. In my view, the loss sticking at 0.69 is strongly correlated with the dataset; the loss when training wav2lip 288 or 384 can converge on the LRS2 dataset. I got a final loss of about 0.32 at step 130856, but as you know, LRS2 is a low-resolution dataset, so the result is not good.
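As a side note on why the number is 0.69 specifically: that is ln 2, the binary cross-entropy of a model that outputs 0.5 for every sync/unsync pair, i.e. the network is still at chance level. A quick sanity check (my sketch, not code from the repo):

```python
import math

# SyncNet-style training uses binary cross-entropy on sync/unsync pairs.
# A model that has learned nothing outputs p = 0.5 for every pair, and on
# a balanced set of positives and negatives the expected loss is ln(2).
def bce(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

chance_loss = 0.5 * bce(0.5, 1) + 0.5 * bce(0.5, 0)  # balanced pos/neg
print(round(chance_loss, 4))  # 0.6931
```

So a loss pinned at 0.69 means the model has not yet learned anything that separates synced from unsynced pairs, which is why dataset quality matters so much here.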
As the author mentioned, the dataset needs to be carefully processed with the following steps:

  1. Download the dataset.
  2. Convert to 25 fps.
  3. Change the sample rate to 16000 Hz.
  4. Split videos to less than 5 s.
  5. Use syncnet_python to filter the dataset to offsets in the range [-3, 3]; the model works best with [-1, 1].
  6. Detect faces.
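Steps 2–4 above can be sketched as ffmpeg invocations. This is only a sketch with placeholder paths; the flags are standard ffmpeg options, not commands taken from the repo:

```python
# Preprocessing steps 2-4 expressed as ffmpeg command lines (paths are
# placeholders; run them with subprocess.run or in a shell).
def to_25fps_16khz(src, dst):
    # -r 25 resamples video to 25 fps; -ar 16000 resamples audio to 16 kHz.
    return ["ffmpeg", "-y", "-i", src, "-r", "25", "-ar", "16000", dst]

def split_max_5s(src, out_pattern):
    # The segment muxer cuts the input into chunks of at most 5 seconds.
    return ["ffmpeg", "-y", "-i", src, "-f", "segment",
            "-segment_time", "5", "-reset_timestamps", "1", out_pattern]

cmd = to_25fps_16khz("raw/clip.mp4", "proc/clip.mp4")
seg = split_max_5s("proc/clip.mp4", "proc/clip_%03d.mp4")
print(" ".join(cmd))
```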

You may need to process your dataset carefully. But the syncnet loss is really hard to converge; the paper "SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning" gives an explanation of why syncnet is unstable at high resolution. Hope this can help you.

@kavita-gsphk
Author

Thank you so much for the response. I will look into it.

@lililuya
Owner

lililuya commented May 21, 2024

> Thank you so much for the response. I will look into it.

By the way, avspeech may be quite dirty. When you process the data, you need to carefully consider scenarios where a clip contains more than one person, or where the audio does not belong to the visible speaker (you need the ground truth to be true, or it's nonsense).

  1. Because of the fully convolutional structure and the GAN-based method, such cases have a huge negative impact on training the syncnet. Maybe you can first train on a clean dataset like LRS2 or VoxCeleb2.
  2. You can also check for the cases mentioned above: just randomly sample some video clips and feed them into syncnet_python to get the confidence and AV offset.
  3. I've searched for some methods for correcting the offset; I lost the link, so I paste this picture here:
    (image)
  4. I used avspeech to train the syncnet before, so I have some records; please don't mind if they are useless:
    (image)
  5. Finally, you can also refer to the issues in the Wav2Lip 288 or 384 repos to get answers.
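The filtering idea in point 2 can be sketched as follows, assuming you have already run syncnet_python over your clips and collected (clip, AV offset, confidence) tuples. The function name, threshold values, and example tuples here are all hypothetical, not from syncnet_python itself:

```python
# Hypothetical post-filter: keep clips whose audio-video offset (in
# frames, as reported by syncnet_python) lies within [-1, 1] and whose
# sync confidence clears a threshold. The example tuples are made up.
def filter_clips(results, max_offset=1, min_conf=3.0):
    return [clip for clip, offset, conf in results
            if abs(offset) <= max_offset and conf >= min_conf]

results = [("a.mp4", 0, 7.2),   # well synced, confident -> keep
           ("b.mp4", 4, 6.1),   # offset too large       -> drop
           ("c.mp4", -1, 2.0)]  # confidence too low     -> drop
print(filter_clips(results))  # ['a.mp4']
```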

@kavita-gsphk
Author

My loss has finally started decreasing from 0.69, although it's progressing very slowly. Thank you for all your help.

I noticed you attempted to implement the shift-invariant learning from the SIDGAN paper in your repository. Were you able to achieve any results with that?

@lililuya
Owner

Hi, happy to hear the loss is decreasing. As for SIDGAN, the implementation of the core part, APS, is in the repo, but the orientation of the filter is limited to the vertical direction only, just as the paper mentions:
(image)
So you need to implement a horizontal version of the APS filter, or you can get in touch with the author for the details. I ultimately failed to add this part to wav2lip. Sorry I am unable to provide you with more effective help.
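For reference, here is a minimal numpy sketch of the APS idea (adaptive polyphase sampling, from "Truly Shift-Invariant Convolutional Neural Networks", which this line of work builds on) applied along the horizontal axis only. This is my illustration of the concept, not the repo's implementation:

```python
import numpy as np

# Sketch of adaptive polyphase sampling (APS) along the horizontal axis.
# For stride-2 downsampling, APS keeps the column phase (even or odd
# columns) with the larger l2 norm, so a 1-pixel horizontal shift of the
# input only shifts the output instead of changing its content.
def aps_downsample_horizontal(x):
    phases = [x[:, 0::2], x[:, 1::2]]          # the two column phases
    norms = [np.linalg.norm(p) for p in phases]
    return phases[int(np.argmax(norms))]       # keep the dominant phase

x = np.arange(16, dtype=float).reshape(4, 4)
shifted = np.roll(x, 1, axis=1)
out, out_shifted = aps_downsample_horizontal(x), aps_downsample_horizontal(shifted)
# out_shifted is just a circularly shifted copy of out: shift-equivariant.
```

A full horizontal APS layer would apply the same phase selection inside the network's strided convolutions; this sketch only shows the phase-selection rule itself.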
