CUDA out of memory #6
Hi @anhnb206110, the increase in memory consumption happens when the physical properties kick in. You can find its setting in the …
Hi, I tried a clean installation and tested on … Could you specify during which epochs you observe the memory growth? Does the OOM happen during training or during validation? (Validation runs every 2000 steps and is interleaved with training.) For reference, my test environment is Ubuntu 20.04 / CentOS 7.9.2009 with Python 3.10, PyTorch 1.13 and CUDA 11.6. It may help to align your software versions with these, especially Python and PyTorch. My pytorch-lightning version for this project is 1.9.5, so it would also be useful to know which pytorch-lightning version you are using. Lastly, I have observed GPU memory leaks with pytorch-lightning in other projects; they mainly happen during inference (…
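To answer the question about which epochs show growth, it can help to record one peak GPU memory reading per epoch and check whether memory jumps once (consistent with a feature being switched on partway through training) or keeps climbing (more suggestive of a leak). The sketch below assumes you have already collected those readings, for example by calling `torch.cuda.reset_peak_memory_stats()` at the start of each epoch and `torch.cuda.max_memory_allocated()` at the end. The helper name `classify_memory_trend` and the 5% threshold are hypothetical choices for illustration, not part of this project.

```python
from typing import Sequence


def classify_memory_trend(peaks_mib: Sequence[float], rel_tol: float = 0.05) -> str:
    """Classify per-epoch peak GPU memory readings (in MiB).

    Hypothetical helper returning one of:
    - "flat":    no epoch-to-epoch increase larger than rel_tol
    - "step":    exactly one significant increase
    - "growing": two or more significant increases (possible leak)
    """
    if len(peaks_mib) < 2:
        return "flat"
    jumps = 0
    for prev, cur in zip(peaks_mib, peaks_mib[1:]):
        # Count only increases larger than rel_tol relative to the previous epoch.
        if cur > prev * (1.0 + rel_tol):
            jumps += 1
    if jumps == 0:
        return "flat"
    if jumps == 1:
        return "step"
    return "growing"


if __name__ == "__main__":
    # Example readings in MiB, e.g. torch.cuda.max_memory_allocated() / 2**20 per epoch.
    print(classify_memory_trend([9000, 9010, 14500, 14520, 14510]))  # step
    print(classify_memory_trend([9000, 9600, 10300, 11000, 11800]))  # growing
```

A slow leak that grows by less than `rel_tol` per epoch would be reported as "flat", so lowering the threshold (or comparing the last reading with the first) may be needed for long runs.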
I am not sure whether you are encountering the same issue as @anhnb206110; it seems your GPU does not have enough memory to run at the default SPP (samples per pixel). A TITAN RTX (24 GB) or a GPU with more memory is needed to run the default config. If you want to keep full SPP while reducing VRAM usage, you can also try reducing …
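The SPP/VRAM trade-off can be illustrated with a generic Monte Carlo pattern: a pixel's value is the mean over its SPP samples, and that mean can be accumulated chunk by chunk so that only `chunk_size` samples are held at once while the estimate stays the same. This is a plain-Python sketch of that general idea, not this project's renderer and not necessarily what the truncated suggestion above refers to; `mean_over_samples_chunked`, `render_sample` and `chunk_size` are hypothetical names.

```python
import random
from typing import Callable


def mean_over_samples_chunked(
    render_sample: Callable[[int], float], spp: int, chunk_size: int
) -> float:
    """Average `spp` samples, evaluating at most `chunk_size` of them at a time.

    Only one chunk of sample values is materialized at any point, so peak
    storage scales with chunk_size rather than with spp.
    """
    if spp <= 0 or chunk_size <= 0:
        raise ValueError("spp and chunk_size must be positive")
    total = 0.0
    for start in range(0, spp, chunk_size):
        stop = min(start + chunk_size, spp)
        # Materialize just this chunk of samples (stand-in for a batch of rays).
        chunk = [render_sample(i) for i in range(start, stop)]
        total += sum(chunk)
    # Divide by the full sample count so the result matches
    # evaluating all spp samples at once.
    return total / spp


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.random() for _ in range(64)]  # fixed per-sample values
    full = sum(samples) / len(samples)
    chunked = mean_over_samples_chunked(lambda i: samples[i], spp=64, chunk_size=8)
    print(abs(full - chunked) < 1e-12)  # True
```

At inference time the saving is direct. During training, autograd keeps each chunk's computation graph alive until `backward()` runs, so chunking alone may not lower peak memory unless the training step is restructured accordingly.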
Thank you for your work on this project!
I followed your instructions to train the model using the 'male-3-casual' dataset from the PeopleSnapshot dataset, without modifying any configurations in the config file. However, I encountered a CUDA out of memory error during training.
Here’s the error message I received:
It appears that the memory usage increases with each epoch until it runs out of memory. Could you please help me understand why this is happening and suggest any possible solutions to resolve the issue?