CUDA out of memory #4

Open
lambdald opened this issue Jul 24, 2024 · 12 comments
@lambdald

lambdald commented Jul 24, 2024

Hello, when I was running the small_city data, I encountered the following error. How can I solve it?

File "/data/workspace/hierarchical-3d-gaussians/train_coarse.py", line 94, in training
    render_pkg = render_coarse(viewpoint_cam, gaussians, pipe, background, indices = indices)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/hierarchical-3d-gaussians/gaussian_renderer/__init__.py", line 381, in render_coarse
    rendered_image, radii, _ = rasterizer(
                               ^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/diff_gaussian_rasterization/__init__.py", line 205, in forward
    return rasterize_gaussians(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/diff_gaussian_rasterization/__init__.py", line 28, in rasterize_gaussians
    return _RasterizeGaussians.apply(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/diff_gaussian_rasterization/__init__.py", line 84, in forward
    num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer, invdepths = _C.rasterize_gaussians(*args)
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 184.32 GiB. GPU
@White-Mask-230

Hello, the problem is that your computer's GPU does not have enough memory to run the program. The easiest solution is to create a GitHub Codespace for this repository and then run the program as you would in Visual Studio Code.

To create the GitHub Codespace:

  1. Open this repository
  2. Press the "." key
  3. Open the terminal as you would in Visual Studio Code (in my case "Ctrl + ñ")
  4. The terminal will give you the option to create a local clone or a GitHub Codespace; choose the GitHub Codespace option

And that's all.

@Snosixtyboo
Collaborator

> (quoting @White-Mask-230's Codespaces suggestion above)

Possible, but the message says that the code was trying to allocate 180 GB, which is a bit insane. So it looks like some sort of bug.

@lambdald If you want us to take a look please provide the full Dockerfile you are using. If you are not using Docker, it's gonna be hard to recreate your issue since it's likely setup-specific.
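The implausibility of that allocation can be sanity-checked with quick arithmetic. A hedged sketch (only the 184.32 GiB figure comes from the traceback; the interpretation as float32 elements is an assumption for illustration):

```python
# Back-of-envelope check on the reported allocation. A request this large
# usually means a buffer size was computed from a corrupted or overflowed
# count, not from real memory demand.
GIB = 1024 ** 3
requested_bytes = int(184.32 * GIB)   # what the allocator was asked for
floats = requested_bytes // 4         # interpreted as float32 elements
print(f"{requested_bytes:,} bytes ~ {floats:,} float32 values")
# Tens of billions of elements is far beyond any plausible per-frame render
# buffer, which points at a bad size computation rather than a small GPU.
```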

@White-Mask-230

White-Mask-230 commented Jul 24, 2024

True, I didn't see the end of the error.

I get the same error when I run small_city, so it may be a bug in the program. Searching for information, I found this: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch, but I'm unable to make it work.

@lambdald
Author

> Possible, but the message says that the code was trying to allocate 180 GB, which is a bit insane. So it looks like some sort of bug.
>
> @lambdald If you want us to take a look please provide the full Dockerfile you are using. If you are not using Docker, it's gonna be hard to recreate your issue since it's likely setup-specific.

@Snosixtyboo Sorry, I use conda to manage my environment instead of Docker, and I set up the Python environment following the README.

@SunHongyang10

same bug

@White-Mask-230

Try this:

  1. Open the Python console by running python3
  2. Import torch: import torch
  3. Clear the cache: torch.cuda.empty_cache()

Reference: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
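For reference, a minimal, guarded version of those steps (a sketch assuming only that PyTorch may be installed). Note that empty_cache() only returns PyTorch's cached, currently unused blocks to the driver; it does not free memory held by live tensors, so it rarely fixes an OOM inside a single training process:

```python
import importlib.util

# Guarded sketch of the steps above: clear PyTorch's CUDA caching allocator.
status = "torch not installed"
if importlib.util.find_spec("torch") is not None:
    import torch
    if torch.cuda.is_available():
        # Releases cached (unused) blocks back to the driver; live tensors
        # keep their memory, so this does not help with a single huge request.
        torch.cuda.empty_cache()
        status = "cache cleared"
    else:
        status = "no CUDA device"
print(status)
```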

@SunHongyang10

> Try this: open the Python console (python3), import torch, and clear the cache with torch.cuda.empty_cache().

I tried, and I still hit the bug.

@White-Mask-230

White-Mask-230 commented Jul 27, 2024

Run this code and tell us what it prints:

import torch

print(torch.cuda.memory_summary(device=None, abbreviated=False))

@kevintsq

Same problem. OOM occurs on a GPU with 80 GB of memory but not on a GPU with 8 GB, using the same dataset.

@White-Mask-230

@kevintsq Interesting, tell us more about the two machines.

@kevintsq

The former is an HPC using Slurm on Linux, and the latter is a Windows laptop. I believe I compiled the submodules for the correct CUDA compute capabilities. I've tried CUDA 12.1, 12.3, 12.4, and 12.5 with PyTorch 2.3 and 2.4 on the HPC, but the problem persists (with 12.1 it is an illegal memory access instead). The laptop runs fine on CUDA 12.4 + PyTorch 2.4.
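Since the failures here seem to track the CUDA toolkit + PyTorch combination, it may help to report the exact versions in play on each machine. A hedged sketch (assumes only that PyTorch may be installed; nothing here is specific to this repository):

```python
import importlib.util
import platform

# Collect the version info that matters when rebuilding the rasterizer
# submodule against a particular CUDA toolkit.
info = {"python": platform.python_version()}
if importlib.util.find_spec("torch") is not None:
    import torch
    info["torch"] = torch.__version__
    # The CUDA version PyTorch itself was compiled against (None on CPU-only builds):
    info["torch_built_with_cuda"] = torch.version.cuda
    info["cuda_device_available"] = torch.cuda.is_available()
print(info)
```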

@LRLVEC

LRLVEC commented Aug 28, 2024

I ran into the same issue on Ubuntu 22.04 and fixed it by switching from CUDA 12.5 to 12.2.
