training loss nan #9
Comments
Are you using the ShapeNet dataset as well? Can you share the training log here?
Yes! I use ShapeNet v2 core 15k (downloaded from PVD).
2023-01-25 00:17:56.473 | INFO | main:get_args:205 - EXP_ROOT: ./exp + exp name: 0125/car/3dbf3ah_hvae_lion_B40, save dir: ./exp/0125/car/3dbf3ah_hvae_lion_B40
And below is my config: bash_name: ''
Hi, I tried VAE training with batch size 40 on 4 GPUs, and I also get a similar NaN issue. However, the same training code works with batch size 32. The cause is not clear to me; training somehow seems to break once the batch size reaches 40 or more.
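One way to narrow down a problem like this is to find the first iteration where the loss stops being finite, instead of letting NaN propagate into the weights. The sketch below is not taken from this repository's training code: `first_nonfinite_step` and the example loss values are hypothetical, chosen only to illustrate the check.

```python
import math


def first_nonfinite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss values for illustration, not from the actual training log.
example_losses = [2.31, 1.87, float("nan"), 1.52]
bad_step = first_nonfinite_step(example_losses)
if bad_step is not None:
    print(f"loss became non-finite at step {bad_step}")
```

Inside a PyTorch training loop, a similar guard could be written with `torch.isfinite(loss).all()` before calling `loss.backward()`, and `torch.autograd.set_detect_anomaly(True)` may help locate the operation that first produces NaN, at the cost of slower training.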
Thanks for your hard work! I can't believe you ran it yourself! That is so nice of you! Have a good night!
Hi, I trained the VAE model as described in the README, but the training loss becomes NaN. I use 4 GPUs and a batch size of 40, and I kept everything else the same as in the repo.