Training stuck #71
Hi, temporarily turn off the duration discriminator and tell me if it works.
Yes, it seems to be working for now. Any reason why the duration discriminator is causing the issue?
I feel my implementation was too naive. It might need to be corrected with some testing. I am busy with other models now and will do it when I have some time. Let me know about the audio quality after you train. Thanks.
Hello, thank you for your great effort!
I have moved to improving
Hi, p0p4k,
I think it is better than vits/vits2. The only downside is that it is not e2e.
OK, thank you.
@p0p4k do you know the bug here?
@codeghees which part? The training stuck part?
Yep.
In the same boat.
@codeghees I did not look into this personally because I have no GPU yet. Maybe you can try to debug it and send a PR. I can assist you. Thanks a lot!
Yep, will do! Trying to debug this.
@p0p4k bug is on line Seems like GradScaler has issues with multi-GPU. I removed it and replaced it with standard backprop. The issue persists. Looks like a multi-GPU issue.
Does it work on a single GPU?
Haven't tested yet. Trying a run with fp16 enabled.
@p0p4k I have no issues with single-GPU training, but it gets stuck when I do multi-GPU training. Any success in resolving the issue?
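Since the hang only shows up with multiple GPUs, more verbose distributed logging may help narrow it down. A minimal sketch, assuming the training script uses PyTorch distributed with the NCCL backend (the thread does not confirm the backend): `NCCL_DEBUG` and `TORCH_DISTRIBUTED_DEBUG` are existing environment variables, while the helper name `enable_distributed_debug_logging` is made up for illustration. It has to run before the process group is initialized.

```python
import os


def enable_distributed_debug_logging() -> None:
    """Turn on verbose NCCL / torch.distributed logging (hypothetical helper).

    Uses setdefault so values already exported in the shell are kept.
    Call this before the process group is created, otherwise the
    settings are not picked up.
    """
    # NCCL prints its initialization and communication info at INFO level.
    os.environ.setdefault("NCCL_DEBUG", "INFO")
    # DETAIL adds extra consistency checks and logging for collectives,
    # which can point at ranks that disagree on what to synchronize.
    os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")


enable_distributed_debug_logging()
```

The same variables can also be exported in the shell before launching training; the extra logging slows things down, so it is only meant for debugging runs.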
Hello,
Thanks for all the effort to create this repo. When I launch training, it runs for a few steps and then I see no progress at all. It's just stuck without any progress for a long time. It still hasn't progressed.
INFO:ljs_base:Saving model and optimizer state at iteration 1 to ./logs/ljs_base/G_0.pth
INFO:ljs_base:Saving model and optimizer state at iteration 1 to ./logs/ljs_base/D_0.pth
INFO:ljs_base:Saving model and optimizer state at iteration 1 to ./logs/ljs_base/DUR_0.pth
Loading train data: 4%|████████████▍
Have you encountered this before? Any help would be greatly appreciated.
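For anyone trying to find where a stalled run like this is blocked, one generic option is Python's standard `faulthandler` module, which can periodically dump the traceback of every thread. This is only a sketch and not part of this repo: the helper names `enable_hang_dumps` and `disable_hang_dumps` and the 600 second default are assumptions.

```python
import faulthandler
import sys


def enable_hang_dumps(timeout_s: float = 600.0, file=None) -> None:
    """Dump all thread tracebacks every `timeout_s` seconds (hypothetical helper).

    `file` must be a real file object with a file descriptor;
    it defaults to sys.stderr.
    """
    target = file if file is not None else sys.stderr
    # repeat=True keeps dumping at each interval until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def disable_hang_dumps() -> None:
    """Stop the periodic dumps started by enable_hang_dumps."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `enable_hang_dumps()` at the top of each training process would print, at every interval, the line each thread is currently executing, so a run that is stuck keeps printing the same location.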