Training with lower precision crashes with runtime error #96

Closed
Laope94 opened this issue Jun 6, 2023 · 8 comments

Laope94 commented Jun 6, 2023

One more issue.
I can't use any precision other than 32. BF16 doesn't seem to be possible on the card I'm training on (TITAN RTX), but 16 should be. I'm getting this error:
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'.

To explain how I'm training: I built the Docker image as described in the readme (based on nvcr.io/nvidia/pytorch:22.03-py3).
I mounted the dataset, installed git inside the running container, and cloned this GitHub repo. Then I ran build_monotonic_align.sh and attempted to start training, this time on a single GPU.

First, this issue https://stackoverflow.com/questions/75834134/attributeerror-trainer-object-has-no-attribute-lr-find stopped me, so I downgraded lightning to 1.9. There is no version pinned in the Dockerfile, so the newest gets installed.

But then I got the error above. I've also tried modifying piper_train/vits/config.py and setting fp16_run: bool = False to True, but with no success. The only thing I could find for VITS is this issue jaywalnut310/vits#15, which suggests several possible code modifications. I'd appreciate it if someone could take a look and point me to what I should modify. CUDA is 12.1.
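For reference, this is roughly how I'm requesting 16-bit precision (a simplified sketch, not the actual piper_train entry point; everything besides the precision flag is a placeholder):

```python
# Simplified sketch, not the actual piper_train entry point.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision=16,      # 32 works; 16 is what crashes with the ComplexHalf error
    max_epochs=10000,  # placeholder value
)
# trainer.fit(model, datamodule)  # model/datamodule come from piper_train

# And the field I flipped in piper_train/vits/config.py:
#     fp16_run: bool = False   # -> True (did not help)
```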


Laope94 commented Jun 6, 2023

Okay, I'll answer this one myself. It seems that downgrading a few of the packages bundled with nvcr.io/nvidia/pytorch:22.03-py3 helps.

This works for me:
torch~=1.6
pytorch-lightning~=1.7
torchtext~=0.6 (not sure about this one, but pip was complaining).
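As a quick sanity check (my own snippet, nothing from the repo), something like this confirms which versions actually end up being imported inside the container:

```python
# Quick check that the downgraded packages are the ones being imported.
import torch
import pytorch_lightning as pl
import torchtext

print("torch:", torch.__version__)            # should match the torch~=1.6 pin above
print("pytorch-lightning:", pl.__version__)   # should match the ~=1.7 pin
print("torchtext:", torchtext.__version__)    # should match the ~=0.6 pin
print("CUDA available:", torch.cuda.is_available())
```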

synesthesiam (Member) commented

I would really like to understand how this can be fixed. I've tried the code modifications you mentioned before and none of them worked. I'm worried that the "fix" may require a complete rewrite of the VITS code :(

beqabeqa473 commented

@Laope94 were you able to make any progress with this?


Laope94 commented Jun 12, 2023

@Laope94 were you able to make any progress with this?

Yes and no. It seems to be a torch issue, though. As I stated above, using torch 1.6 works, but I haven't achieved the result I wanted. I'm not able to fit a batch bigger than 16 into memory, and I hoped that lowering the precision could help a bit, but it isn't really effective, so I've continued with fp32 and a batch size of 16.


beqabeqa473 commented Jun 12, 2023 via email


Laope94 commented Jun 14, 2023

I am giving a batch size of 24 to my 12 GB GPU

I have a GPU with 24 GB available, but I'm not able to fit any batch size higher than 16 (medium model, 22 kHz audio); it crashes with a CUDA OOM every time. Even after numerous attempts with fp16, mixed precision, different library versions, max_split_size_mb, the cudaMallocAsync backend... just no, so I'm working with the smaller batch size.
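For reference, this is roughly how I set the allocator options I mentioned (the max_split_size_mb value is just one of several I experimented with; the variable has to be set before CUDA is initialised):

```python
import os

# Must be set before torch initialises CUDA (e.g. at the very top of the script).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
# or the async allocator backend instead:
# os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

import torch

props = torch.cuda.get_device_properties(0)
print(props.name, props.total_memory // 2**20, "MiB")
```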

synesthesiam (Member) commented

@Laope94 This is likely because you have a few very long sentences in your training data. Because batches have to be padded out to the longest sentence length, these will cause OOM crashes.

There is a --max-phoneme-ids <N> option you can pass to the training script, which will drop sentences longer than N phoneme ids (and print how many were dropped). I usually set this to 400 so I can ensure a batch size of 32 on my RTX 3090s (24 GB).
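Roughly, the effect is this (a simplified sketch, not the actual piper_train collate code):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def filter_and_collate(phoneme_id_seqs, max_phoneme_ids=400):
    """Drop utterances longer than max_phoneme_ids, then pad the rest."""
    kept = [seq for seq in phoneme_id_seqs if len(seq) <= max_phoneme_ids]
    dropped = len(phoneme_id_seqs) - len(kept)
    if dropped:
        print(f"Dropped {dropped} utterance(s) over {max_phoneme_ids} phoneme ids")
    # Every sequence is padded up to the longest one left in the batch, so a
    # single very long outlier inflates the memory used by the whole batch.
    return pad_sequence([torch.as_tensor(seq) for seq in kept], batch_first=True)

# One 1500-id outlier would force a (batch, 1500) tensor instead of (batch, ~200).
```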


Laope94 commented Jun 16, 2023

@synesthesiam thanks, it seems that this really did the trick. I've tried a few different values and I can go up to 700 comfortably. I could maybe use an even bigger batch size, but 32 is sufficient for me.
