
RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch) #1021

Closed
chiba1sonny opened this issue Nov 8, 2021 · 6 comments

@chiba1sonny

Batch size = 1, and I used FP16. Still got this error: CUDA out of memory.

RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1595629403081/work/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f09dcb9c77d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x20626 (0x7f09dcdf4626 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x214f4 (0x7f09dcdf54f4 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDACachingAllocator::raw_alloc(unsigned long) + 0x5e (0x7f09dcdee12e in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: + 0xcb2a06 (0x7f09ddcb4a06 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xcb74ec (0x7f09ddcb94ec in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xcafeba (0x7f09ddcb1eba in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xcb06ce (0x7f09ddcb26ce in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xcb0d90 (0x7f09ddcb2d90 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::native::cudnn_convolution_backward_weight(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x49 (0x7f09ddcb2fe9 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0xd119bb (0x7f09ddd139bb in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xd415f8 (0x7f09ddd435f8 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: at::cudnn_convolution_backward_weight(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x1ad (0x7f0a0ff6870d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x18a (0x7f09ddcacc0a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #14: + 0xd118c5 (0x7f09ddd138c5 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #15: + 0xd41654 (0x7f09ddd43654 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #16: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: + 0x2c250c2 (0x7f0a11c3b0c2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: + 0x2c39684 (0x7f0a11c4f684 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x258 (0x7f0a11ac2098 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: + 0x30d1017 (0x7f0a120e7017 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptrtorch::autograd::ReadyQueue const&) + 0x1400 (0x7f0a120e2860 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&) + 0x451 (0x7f0a120e3401 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x89 (0x7f0a120db579 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x4a (0x7f0a1640a99a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: + 0xc9039 (0x7f0a18f42039 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #27: + 0x9609 (0x7f0a3b0ce609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #28: clone + 0x43 (0x7f0a3aff5293 in /lib/x86_64-linux-gnu/libc.so.6)

@MengzhangLI
Contributor

Hi Maruyama-san,

Which model do you use? Transformer models?

If the batch size is already 1 and you are using FP16, you could try making the crop size smaller, like here.
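For reference, this is roughly where the crop size lives in an mmsegmentation UNet config. This is a minimal sketch; the transform names follow the usual mmsegmentation pipeline, but the file and the example values are illustrative, not copied from the repo:

```python
# Sketch of the relevant part of a UNet config (values are examples, not the
# exact defaults from configs/unet/).
crop_size = (256, 256)  # reduce this if you hit CUDA OOM

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
```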

@chiba1sonny
Author

Thank you for your quick reply. I am using the UNet model, and after setting the crop size smaller it worked.
But could you tell me how the crop size works?
Thanks in advance.

@MengzhangLI
Contributor

It seems like your data input is very big. With our standard [config](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/unet), training usually takes less than 1 GB of GPU memory.
The crop size is the size of the actual input to the model. The cropped input (and the feature maps computed from it) sits in GPU memory together with the model parameters, so making it smaller saves GPU memory.
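As a rough illustration (assuming, for the sake of the example, that activation memory scales roughly with the number of cropped pixels), halving each crop dimension cuts the per-image memory to about a quarter:

```python
# Back-of-the-envelope: memory for the raw FP16 input tensor of a single crop.
# Real usage is dominated by intermediate feature maps, but those also scale
# with crop area, so the trend is the same.
def crop_input_mib(height, width, channels=3, bytes_per_elem=2):  # FP16 = 2 bytes
    return height * width * channels * bytes_per_elem / 2**20

for h, w in [(1080, 1920), (800, 800), (512, 512), (256, 256)]:
    print(f"{h}x{w}: {crop_input_mib(h, w):6.1f} MiB (input tensor only)")
```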

@MengzhangLI
Contributor

(1) Besides making the crop size smaller, you could also try setting cudnn_benchmark = False here and check whether that avoids the CUDA out-of-memory error (see the sketch after this list).

(2) If your customized dataset is not medical images, you could also try some other models, such as BiSeNetV2 (FP16), BiSeNetV1, and PSPNet.
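For context, both switches are plain top-level entries in the config. This is a minimal sketch, assuming the usual mmsegmentation/mmcv config conventions rather than quoting an exact file:

```python
# Sketch of the relevant config flags (names follow common mmsegmentation
# conventions; check your own base configs for the exact location).
cudnn_benchmark = False  # turns off torch.backends.cudnn.benchmark, which can
                         # otherwise pick fast but memory-hungry conv algorithms

# Mixed-precision training via the FP16 hook (already set in the *_fp16 configs).
fp16 = dict(loss_scale=512.0)
```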

@MengzhangLI self-assigned this Nov 8, 2021
@chiba1sonny
Author

Thank you for your guidance.
My image size is 1920×1080, and I set the crop size to 800×800. It’s working.

Thanks!!!
I will try BiSeNet.

@MengzhangLI
Contributor

> Thank you for your guidance. My image size is 1920×1080, and I set the crop size to 800×800. It’s working.
>
> Thanks!!! I will try BiSeNet.

Not too big. If your dataset is not medical images, try one of our models that uses a pretrained backbone; it should get better results than UNet, which is trained from scratch (see the sketch below).
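To illustrate what “pretrained backbone” looks like in the configs, here is a minimal sketch along the lines of the PSPNet configs. The field values are indicative, not copied verbatim; see configs/pspnet/ for the real files:

```python
# Sketch of a model section with an ImageNet-pretrained backbone.
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',  # ImageNet-pretrained weights
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    decode_head=dict(
        type='PSPHead',
        in_channels=2048,
        channels=512,
        num_classes=19,
    ),
)
```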
