CUDA out of memory #4
Hello, the problem is that your computer does not have enough GPU memory to run the program. The easiest solution is to create a GitHub Codespace for this repository and then run the program as you would with Visual Studio Code. To create the GitHub Codespace you have to:
And that's all.
Possible, but the message says that the code was trying to allocate 180 GB, which is a bit insane. So it looks like some sort of bug. @lambdald If you want us to take a look, please provide the full Dockerfile you are using. If you are not using Docker, it's going to be hard to recreate your issue since it's likely setup-specific.
True, I didn't see the end of the error. I get the same error when I run small_city, so it could be a bug in the program. Searching for information, I found this: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch. I'm unable to make it work.
@Snosixtyboo Sorry, I use conda to manage my environment instead of Docker, and I set up the Python environment following the README.
Try this:
Reference: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
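The exact snippet suggested above is not visible in the thread; as a rough sketch, the usual mitigations from that Stack Overflow discussion are to tune the caching allocator via `PYTORCH_CUDA_ALLOC_CONF` and to release cached blocks. The `max_split_size_mb:128` value here is an illustrative choice, not a recommendation from the thread:

```python
import os

# Must be set before PyTorch initializes CUDA. max_split_size_mb caps how
# large a cached block the allocator will split, which reduces fragmentation;
# 128 is an illustrative value — tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

try:
    import torch
    if torch.cuda.is_available():
        # Return cached-but-unused blocks to the driver, which helps when
        # fragmentation (not true usage) is what triggers the OOM.
        torch.cuda.empty_cache()
except ImportError:
    pass  # torch not installed; setting the env var alone is harmless
```

Reducing the batch size or image resolution is usually still the first thing to try; the allocator setting only helps when the OOM is caused by fragmentation rather than genuine memory pressure.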
I tried, and I still hit the bug.
Run this code and tell us what it says:

```python
import torch
print(torch.cuda.memory_summary(device=None, abbreviated=False))
```
Same problem. OOM was encountered on a GPU with 80 GB of memory but not on a GPU with 8 GB of memory, using the same dataset.
@kevintsq Interesting, tell us more about the two computers.
The former is an HPC using Slurm on Linux, and the latter is a Windows laptop. I should have compiled the submodules for the correct CUDA compute capabilities. I've tried CUDA 12.1, 12.3, 12.4, 12.5 + PyTorch 2.3, 2.4 on the HPC, but the problem persists (12.1 gives an illegal memory access instead). The laptop runs well on CUDA 12.4 + PyTorch 2.4.
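Rebuilding the submodules for the right compute capability might look like the sketch below. The architecture value and submodule path are assumptions for illustration, not taken from this repository; check your GPU's capability first:

```shell
# Query the GPU's compute capability (e.g. "8.0" for an A100)
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Restrict the build to that architecture, then reinstall the CUDA
# submodule. The path is an example — adjust to this repo's layout.
export TORCH_CUDA_ARCH_LIST="8.0"
pip install ./submodules/diff-gaussian-rasterization
```

If `TORCH_CUDA_ARCH_LIST` is unset, PyTorch extensions are typically built for the architectures of the GPUs visible at build time, which can mismatch the node a Slurm job later lands on.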
I met with the same issue on Ubuntu 22.04, and fixed it by switching from CUDA 12.5 to 12.2.
Hello, when I was running the small_city data, I encountered the following error. How can I solve it?