Can't train with fp16 on Nvidia P100 #15
Comments
The problem does not occur on torch version 1.6.0, as pinned in requirements.txt.
I have the same issue. Any idea where the complex number is generated?
I got the same issue. It's due to a bug in the PyTorch STFT function for half tensors. The workaround is to move the calculation of y_hat_mel in train.py outside autocast, and to cast y_hat to float on the line above the y_hat_mel calculation.
@boltzmann-Li Can you create a PR so we can see that fix? I haven't managed to get it working following your instructions. FYI, the problem hasn't been fixed in torch 1.10.0. Is there an issue for the ComplexHalf problem?
I created a pull request. It has been working for me with 3090 GPUs and torch 1.9.
Very helpful, @boltzmann-Li. Here are the relevant lines: https://github.com/boltzmann-Li/vits/blob/5a1f4b7afb8a822f66c0ddc75bc959a44a57d035/train_ms.py#L156-L166
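For anyone hitting this, a minimal, self-contained sketch of the workaround described above. The spectrogram function here is a plain magnitude STFT standing in for the repo's `mel_spectrogram_torch`, and the shapes/hyperparameters are placeholders, not the actual vits code; the point is only the pattern: cast `y_hat` to float and compute `y_hat_mel` outside (or with disabled) autocast.

```python
import torch
from torch.cuda.amp import autocast


def magnitude_spec(y, n_fft=1024, hop=256, win=1024):
    """STFT magnitude spectrogram. torch.stft breaks on half tensors,
    so the input is cast to float32 first."""
    y = y.float()
    window = torch.hann_window(win, device=y.device)
    spec = torch.stft(y, n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs()


# Stand-in for the generator output y_hat, shape (batch, 1, samples),
# in half precision as it would be under autocast during fp16 training.
y_hat = torch.randn(2, 1, 16384).half()

# The workaround: compute y_hat_mel with autocast disabled, casting
# y_hat to float inside the spectrogram computation.
with autocast(enabled=False):
    y_hat_mel = magnitude_spec(y_hat.squeeze(1))

print(y_hat_mel.dtype, y_hat_mel.shape)  # torch.float32, (2, 513, 65)
```

With `center=True` (the `torch.stft` default) the 16384-sample input is padded by `n_fft // 2` on each side, giving `16384 / 256 + 1 = 65` frames and `n_fft // 2 + 1 = 513` frequency bins.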
I think a better way to solve this problem is to wrap torch.stft so that it always runs with autocast disabled and float32 inputs.
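A sketch of that wrapper idea (the name `stft_fp32` and its signature are hypothetical, not from the repo): every call site goes through one function that upcasts to float32 and disables autocast, so no caller can accidentally feed a half tensor to `torch.stft`.

```python
import torch
from torch.cuda.amp import autocast


def stft_fp32(x, n_fft, hop_length, win_length, window):
    # Hypothetical wrapper: half-precision STFT is broken, so always
    # disable autocast and upcast the input and window to float32.
    with autocast(enabled=False):
        return torch.stft(x.float(), n_fft, hop_length=hop_length,
                          win_length=win_length, window=window.float(),
                          return_complex=True)


x = torch.randn(1, 4096).half()
spec = stft_fp32(x, 512, 128, 512, torch.hann_window(512))
print(spec.dtype, spec.shape)  # torch.complex64, (1, 257, 33)
```

Because the float32 cast lives inside the wrapper, the rest of the training step can stay under autocast unchanged.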
Training with fp16 doesn't work for me on a P100. I'll look into fixing it, but for future reference, here is the full stack trace.
torch version: 1.9.0