
Need Help #3

Closed
kavita-gsphk opened this issue May 19, 2024 · 5 comments

Comments

@kavita-gsphk

It seems like you have tried to train a modified version of wav2lip_288*288. It would be a great help if you could help me with the problem below.

I am training syncnet on the avspeech dataset with train_syncnet_sam.py from the above-mentioned repo. My training loss is stuck at 0.69 even after 500k steps. The learning rate and batch size are 5e-5 and 64, respectively.

(attached screenshot: Screenshot 2024-05-13 at 14 39 28)

I have tried different learning rates, but it didn't help. Based on your experience, how can I solve this problem?

For preprocessing, I followed all the steps suggested here except the video-splitting part. My videos' average length is 7.1 s (videos range from 0 to 15 s), and the total length of the training dataset is roughly 30.5 hours.

Thank you so much!

@lililuya
Owner

lililuya commented May 20, 2024

I've encountered the same problem on our own dataset. In my view, the loss sticking at 0.69 is strongly correlated with the dataset; the loss when training wav2lip 288 or 384 can converge on the LRS2 dataset. I got a final loss of about 0.32 at step 130856, but as you know, LRS2 is a low-resolution dataset, so the result is not good.
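As a side note on why the number is 0.69 specifically: that is ln 2, the binary cross-entropy of a model that outputs 0.5 for every sync/unsync pair, i.e. the network is still at chance level. A quick sanity check (my sketch, not code from the repo):

```python
import math

# SyncNet-style training uses binary cross-entropy on sync/unsync pairs.
# A model that has learned nothing outputs p = 0.5 for every pair, and on
# a balanced set of positives and negatives the expected loss is ln(2).
def bce(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

chance_loss = 0.5 * bce(0.5, 1) + 0.5 * bce(0.5, 0)  # balanced pos/neg
print(round(chance_loss, 4))  # 0.6931
```

So a loss pinned at 0.69 means the model has not yet learned anything that separates synced from unsynced pairs, which is why dataset quality matters so much here.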
As the author mentioned, the dataset needs to be carefully processed with the following steps:

  1. Download the dataset.
  2. Convert to 25 fps.
  3. Change the sample rate to 16000 Hz.
  4. Split videos to less than 5 s.
  5. Use syncnet_python to filter the dataset to offsets in the range [-3, 3]; the model works best with [-1, 1].
  6. Detect faces.
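Steps 2–4 above can be sketched as ffmpeg invocations. This is only a sketch with placeholder paths; the flags are standard ffmpeg options, not commands taken from the repo:

```python
# Preprocessing steps 2-4 expressed as ffmpeg command lines (paths are
# placeholders; run them with subprocess.run or in a shell).
def to_25fps_16khz(src, dst):
    # -r 25 resamples video to 25 fps; -ar 16000 resamples audio to 16 kHz.
    return ["ffmpeg", "-y", "-i", src, "-r", "25", "-ar", "16000", dst]

def split_max_5s(src, out_pattern):
    # The segment muxer cuts the input into chunks of at most 5 seconds.
    return ["ffmpeg", "-y", "-i", src, "-f", "segment",
            "-segment_time", "5", "-reset_timestamps", "1", out_pattern]

cmd = to_25fps_16khz("raw/clip.mp4", "proc/clip.mp4")
seg = split_max_5s("proc/clip.mp4", "proc/clip_%03d.mp4")
print(" ".join(cmd))
```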

You may need to process your dataset carefully. But the syncnet loss is really hard to converge; the paper "SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning" gives an explanation of why syncnet is unstable at high resolution. Hope this can help you.

@kavita-gsphk
Author

Thank you so much for the response. I will look into it.

@lililuya
Owner

lililuya commented May 21, 2024

> Thank you so much for the response. I will look into it.

By the way, avspeech may be quite dirty. When you process the data, you need to carefully consider scenarios where a clip contains more than one person, or where the audio does not belong to the visible speaker (you need the ground truth to be true, or it's nonsense).

  1. Because of the fully convolutional structure and the GAN-based method, such cases have a huge negative impact on training the syncnet. Maybe you can first train on a clean dataset like LRS2 or VoxCeleb2.
  2. You can also check for the cases mentioned above: just randomly sample some video clips and feed them into syncnet_python to get the confidence and AV offset.
  3. I've searched for some methods for correcting the offset; I lost the link, so I paste this picture here:
    (image)
  4. I used avspeech to train the syncnet before, so I have some records; please don't mind if they are useless:
    (image)
  5. Finally, you can also refer to the issues in the Wav2Lip 288 or 384 repos to get answers.
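The filtering idea in point 2 can be sketched as follows, assuming you have already run syncnet_python over your clips and collected (clip, AV offset, confidence) tuples. The function name, threshold values, and example tuples here are all hypothetical, not from syncnet_python itself:

```python
# Hypothetical post-filter: keep clips whose audio-video offset (in
# frames, as reported by syncnet_python) lies within [-1, 1] and whose
# sync confidence clears a threshold. The example tuples are made up.
def filter_clips(results, max_offset=1, min_conf=3.0):
    return [clip for clip, offset, conf in results
            if abs(offset) <= max_offset and conf >= min_conf]

results = [("a.mp4", 0, 7.2),   # well synced, confident -> keep
           ("b.mp4", 4, 6.1),   # offset too large       -> drop
           ("c.mp4", -1, 2.0)]  # confidence too low     -> drop
print(filter_clips(results))  # ['a.mp4']
```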

@kavita-gsphk
Author

My loss has finally started decreasing from 0.69, although it's progressing very slowly. Thank you for all your help.

I noticed you attempted to implement the shift-invariant learning from the SIDGAN paper in your repository. Were you able to achieve any results with that?

@lililuya
Owner

Hi, happy to hear the loss is decreasing. As for SIDGAN, the implementation of the core part, APS, is in the repo, but the orientation of the filter is limited to the vertical direction only, just as the paper mentions:
(image)
So you need to implement a horizontal version of the APS filter, or you can get in touch with the author for the details. I ultimately failed to add this part to wav2lip. Sorry I am unable to provide you with more effective help.
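For reference, here is a minimal numpy sketch of the APS idea (adaptive polyphase sampling, from "Truly Shift-Invariant Convolutional Neural Networks", which this line of work builds on) applied along the horizontal axis only. This is my illustration of the concept, not the repo's implementation:

```python
import numpy as np

# Sketch of adaptive polyphase sampling (APS) along the horizontal axis.
# For stride-2 downsampling, APS keeps the column phase (even or odd
# columns) with the larger l2 norm, so a 1-pixel horizontal shift of the
# input only shifts the output instead of changing its content.
def aps_downsample_horizontal(x):
    phases = [x[:, 0::2], x[:, 1::2]]          # the two column phases
    norms = [np.linalg.norm(p) for p in phases]
    return phases[int(np.argmax(norms))]       # keep the dominant phase

x = np.arange(16, dtype=float).reshape(4, 4)
shifted = np.roll(x, 1, axis=1)
out, out_shifted = aps_downsample_horizontal(x), aps_downsample_horizontal(shifted)
# out_shifted is just a circularly shifted copy of out: shift-equivariant.
```

A full horizontal APS layer would apply the same phase selection inside the network's strided convolutions; this sketch only shows the phase-selection rule itself.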
