Need Help #3
Comments
I've encountered the same problem on our own dataset. In my view, the loss sticks in
You may need to process your dataset carefully. That said, the syncnet loss is really hard to get to converge; the paper "SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning" gives an explanation of why syncnet is not stable at high resolution. Hope this helps.
Thank you so much for the response. I will look into it.
By the way, AVSpeech can be quite dirty. When you process the data, you need to carefully handle clips that contain more than one person and clips where the audio does not belong to the speaker on screen. (The ground-truth labels need to actually be true, or the training is meaningless.)
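One way to act on the multi-person part of this advice is to drop clips where a face detector does not see exactly one face in most frames. The sketch below is only an illustration: `is_clean_clip`, the `face_counts` input (one detected-face count per sampled frame, produced by whatever detector you already use), and the 0.95 threshold are all hypothetical and not part of the repo. It also does not catch the case where the audio belongs to someone off screen; that needs a separate check.

```python
def is_clean_clip(face_counts, min_single_face_ratio=0.95):
    """Return True if the clip looks like a single on-screen speaker.

    face_counts: list of ints, number of faces detected in each sampled
    frame (hypothetical input, computed elsewhere by a face detector).
    min_single_face_ratio: assumed threshold, tune it for your data.
    """
    if not face_counts:
        # No frames were analysed, so the clip cannot be trusted.
        return False
    single_face_frames = sum(1 for n in face_counts if n == 1)
    return single_face_frames / len(face_counts) >= min_single_face_ratio


# Example: keep only clips that pass the check.
clips = {
    "clip_a": [1] * 99 + [2],   # almost always one face
    "clip_b": [1, 2, 2, 1],     # two people for half of the frames
}
kept = [name for name, counts in clips.items() if is_clean_clip(counts)]
```

With the example data above, only `clip_a` is kept.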
My loss has finally started decreasing from 0.69, although it's progressing very slowly. Thank you for all your help. I noticed you attempted to implement the learning approach from the SIDGAN paper in your repository. Were you able to achieve any results with that?
Hi, happy to hear the loss is decreasing. As for SIDGAN, the implementation of the core part, APS, is in the repo, but the orientation of the filter is limited to the vertical direction, as the paper mentions.
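For readers unfamiliar with APS (adaptive polyphase sampling), here is a rough sketch of the general idea, restricted to the vertical axis. It is not the repo's implementation, and reading "vertical direction" as "downsample along rows only" is an assumption. Instead of always keeping the even rows when downsampling by 2, APS keeps whichever of the two row phases (even or odd) has the larger energy, so a one-row shift of the input selects the same rows again. The function name and the plain-list feature map are made up for illustration.

```python
def vertical_aps_downsample(rows):
    """Stride-2 downsampling along the vertical axis with adaptive phase.

    rows: a 2D feature map as a list of rows (list of lists of floats),
    with an even number of rows. Returns the even-row or odd-row phase,
    whichever has the larger sum of squares (ties go to the even phase).
    """
    even_phase = rows[0::2]
    odd_phase = rows[1::2]

    def energy(phase):
        return sum(v * v for row in phase for v in row)

    return even_phase if energy(even_phase) >= energy(odd_phase) else odd_phase


# A 6x4 toy feature map and the same map circularly shifted down by one row.
x = [[float(4 * r + c) for c in range(4)] for r in range(6)]
x_shifted = x[-1:] + x[:-1]

out = vertical_aps_downsample(x)
out_shifted = vertical_aps_downsample(x_shifted)

# Naive stride-2 sampling (always the even rows) picks different rows
# after the shift, while the APS output is just a shifted copy of `out`.
naive_changes = x_shifted[0::2] != x[0::2]
aps_consistent = out_shifted == out[-1:] + out[:-1]
```

In this toy case both `naive_changes` and `aps_consistent` are true: the naive sampler changes its content under a one-row shift, while the adaptive one returns the same rows, only shifted.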
It seems like you have tried to train a modified version of wav2lip_288*288. It would be a great help if you could help me with the problem below.
I am training syncnet on the AVSpeech dataset with train_syncnet_sam.py from the above-mentioned repo. My training loss is stuck at 0.69 even after 500k steps. The learning rate and batch size are 5e-5 and 64, respectively. I have tried different values of lr, but it didn't work. How can I solve this problem, based on your experience? For preprocessing, I followed all the steps suggested here except the video split part. My videos have an average length of 7.1 s (they range from 0 to 15 s), and the total length of the training dataset is roughly 30.5 hr.
Thank you so much!
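A note on the 0.69 value: assuming the syncnet training script uses a binary cross-entropy loss over in-sync and out-of-sync pairs (an assumption, since the thread does not show the loss code), 0.69 is approximately ln 2, which is exactly what BCE gives when the model outputs a probability of 0.5 for every pair. A plateau at 0.69 therefore suggests the model is still at chance level rather than slowly improving. A minimal sketch of that arithmetic:

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# A model that always predicts 0.5, evaluated on a balanced batch
# (one in-sync pair with label 1, one out-of-sync pair with label 0).
chance_loss = 0.5 * (bce(0.5, 1) + bce(0.5, 0))

print(round(chance_loss, 4))  # 0.6931, i.e. ln(2)
```

So a useful sanity check is whether the loss ever drops clearly below ln 2 on a small, hand-verified subset of the data.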