
Training stuck #71

Open
Madhavan0123 opened this issue Nov 14, 2023 · 18 comments

Comments

@Madhavan0123

Hello,

Thanks for all the effort you put into this repo. When I launch training, it runs for a few steps and then I see no progress at all. It has been stuck for a long time with no sign of movement.
INFO:ljs_base:Saving model and optimizer state at iteration 1 to ./logs/ljs_base/G_0.pth
INFO:ljs_base:Saving model and optimizer state at iteration 1 to ./logs/ljs_base/D_0.pth
INFO:ljs_base:Saving model and optimizer state at iteration 1 to ./logs/ljs_base/DUR_0.pth
Loading train data: 4%|████████████▍

Have you encountered this before? Any help would be greatly appreciated.

@p0p4k
Owner

p0p4k commented Nov 15, 2023

Hi, temporarily turn off duration discriminator and tell me if it works.
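A minimal sketch of what "turning off the duration discriminator" could look like at the config level. Note that the key name `use_duration_discriminator` is an assumption for illustration, not necessarily this repo's actual field; check your config JSON for the real flag.

```python
# Hypothetical sketch: patch the training config to disable the duration
# discriminator before the models are built. The flag name below is an
# assumption, not confirmed against the repo's config schema.
import json

def disable_duration_discriminator(config: dict) -> dict:
    """Return a copy of the config with the duration discriminator turned off."""
    patched = dict(config)
    model_cfg = dict(patched.get("model", {}))
    model_cfg["use_duration_discriminator"] = False  # assumed flag name
    patched["model"] = model_cfg
    return patched

cfg = {"model": {"use_duration_discriminator": True}}
print(json.dumps(disable_duration_discriminator(cfg)))
```

The copy-before-patch keeps the original config dict untouched, so you can flip the flag back for a later run.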

@Madhavan0123
Author

Yes, it seems to be working for now. Any idea why the duration discriminator is causing the issue?

@p0p4k
Owner

p0p4k commented Nov 15, 2023

I feel my implementation was too naive; it might need correcting with some testing. I'm busy with other models now and will do it when I have some time. Let me know about the audio quality after you train. Thanks.

@CreepJoye

Hello, thank you for your great effort!
I'm running into the same problem. Do you plan to fix it soon, or are you still busy with other models?

@p0p4k
Owner

p0p4k commented Dec 12, 2023

I have moved to improving pflowtts.

@JohnHerry

> I have moved to improving pflowtts.

Hi, p0p4k,
How is pflowtts coming along now? Is it a better choice than vits2? Can it support both normal TTS and zero-shot TTS?

@p0p4k
Owner

p0p4k commented Jan 19, 2024

I think it's better than vits/vits2. The only downside is that it's not end-to-end (e2e).

@JohnHerry

> I think it's better than vits/vits2. The only downside is that it's not end-to-end (e2e).

OK, thank you.

@codeghees

@p0p4k do you know the bug here?

@p0p4k
Owner

p0p4k commented Mar 14, 2024

@codeghees Which part? The training-stuck part?

@codeghees

yep

@codeghees

In the same boat.

@p0p4k
Owner

p0p4k commented Mar 14, 2024

@codeghees I haven't looked into this personally because I don't have a GPU yet. Maybe you can try to debug it and send a PR; I can assist you. Thanks a lot!

@codeghees

Yep, will do! Trying to debug this.

@codeghees

@p0p4k the bug is on the line
scaler.scale(loss_gen_all).backward()

Seems like GradScaler has issues with multi-GPU. I removed it and replaced it with a standard backward pass, but the issue persists. Looks like a multi-GPU issue.
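For reference, a minimal sketch of the two backward variants being compared here: the AMP path through `torch.cuda.amp.GradScaler` (as in the training loop's `scaler.scale(loss_gen_all).backward()`), and the plain backward pass swapped in above. This is a toy CPU example with `enabled=False`, not the repo's actual training step; with scaling disabled, `scale()` and `update()` are no-ops and `step()` just calls the optimizer.

```python
# Sketch (assumes PyTorch): AMP-style step with GradScaler vs. a plain
# backward pass. Toy model on CPU with scaling disabled, for illustration.
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=False)  # no-op scaling on CPU

x, y = torch.randn(8, 4), torch.randn(8, 1)

# AMP-style step, like the train loop's scaler.scale(loss_gen_all).backward():
opt.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(opt)      # unscales grads (when enabled) and calls opt.step()
scaler.update()

# Plain fp32 fallback, the replacement tried above:
opt.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```

Since the hang persists with plain backprop, the scaler itself is likely not the culprit; a mismatch in collective calls across ranks is a more common cause of DDP hangs.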

@p0p4k
Owner

p0p4k commented Mar 21, 2024

Does it work on a single GPU?

@codeghees

Haven't tested yet. Trying a run with fp16 enabled.

@farzanehnakhaee70

@p0p4k I have no issues with single-GPU training, but it gets stuck when I train on multiple GPUs. Any success in resolving the issue?
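For anyone debugging this: a generic sketch (not from this repo) for getting more signal on where a multi-GPU run hangs. The `NCCL_DEBUG` and `TORCH_DISTRIBUTED_DEBUG` environment variables are real NCCL/PyTorch knobs; the launch command is whatever you normally use.

```shell
# Hedged debugging sketch for a multi-GPU hang: turn on NCCL and
# torch.distributed diagnostics before relaunching training.
export NCCL_DEBUG=INFO                 # NCCL prints collective setup and errors
export TORCH_DISTRIBUTED_DEBUG=DETAIL  # PyTorch (>= 1.10) logs mismatched collectives
# Relaunch your usual training command. If it hangs again, dump the Python
# stacks of the live trainer process to see which rank is blocked where
# (requires py-spy):
#   py-spy dump --pid <trainer pid>
```

If `TORCH_DISTRIBUTED_DEBUG=DETAIL` reports ranks disagreeing on a collective, that points at ranks taking different code paths (e.g. one rank skipping a step the others run), which matches the symptoms described in this thread.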
