
NaN values in gradients #29

Open
liuem607 opened this issue Apr 10, 2021 · 1 comment

@liuem607

Hi, in my experiment I used the Moving MNIST dataset, but I ran into two problems during training that I couldn't find an answer to:

I tried to train a small network with only num_latent_scale=1 and num_groups_per_scale=1. I then noticed that no gradients were generated for some parameters, including prior.ftr0, and the training stopped with an error.

If I increase num_groups_per_scale from 1 to 2 or more, I still get NaN in some of the gradients in the first iteration; they then go away and training continues without errors.

I'm wondering if you could provide some hint or clue as to why this behavior happens? Thank you in advance!

arash-vahdat (Contributor) commented Apr 11, 2021

Hi, getting no gradient for num_latent_scale=1 and num_groups_per_scale=1 is strange. By no gradients, do you mean that the gradients were zero or None? If they were zero, do you see any change after training for a while?
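For reference, here is a quick way to tell these cases apart after `loss.backward()`. This is just an illustrative helper, not code from this repo:

```python
import torch

def inspect_grads(model):
    """Report, per parameter, whether the grad is None, all-zero, or contains NaN."""
    for name, p in model.named_parameters():
        if p.grad is None:
            print(f"{name}: grad is None (no gradient flowed to this parameter)")
        elif torch.isnan(p.grad).any():
            print(f"{name}: grad contains NaN")
        elif (p.grad == 0).all():
            print(f"{name}: grad is all zeros")
```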

Getting NaN in the gradients is natural, especially at the beginning of training. We use mixed precision, which means that most operations are cast to FP16. Because of the lower precision, we may get NaN easily, and it's the job of autocast and grad_scalar to drop these gradients and scale the loss so that we don't get NaN.
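To make the mechanism concrete, here is a minimal sketch of the generic torch.cuda.amp training step (the model, optimizer, loss, and data below are placeholders, not NVAE's actual training loop):

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/data just to make the sketch self-contained.
model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()
loader = [(torch.randn(4, 10).cuda(), torch.randn(4, 1).cuda())]

scaler = GradScaler()
for x, y in loader:
    optimizer.zero_grad()
    with autocast():                  # ops inside run in FP16 where safe
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()     # backward pass on the scaled loss
    scaler.step(optimizer)            # unscales grads; skips the update if they contain inf/NaN
    scaler.update()                   # lowers the loss scale after a skipped step
```

This is why a few NaN gradients in the first iterations are harmless: the scaler simply skips those optimizer steps and reduces the loss scale until the gradients become finite.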

You can disable mixed precision by supplying enabled=False to autocast() at this line:

NVAE/train.py, line 163 (commit 38eb997):

```python
with autocast():
```
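That is, the line becomes:

```python
with autocast(enabled=False):
```

With mixed precision disabled, the forward pass runs in full FP32, which usually avoids the early NaNs at the cost of more memory and slower training.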
