RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch)
#1021
Closed
chiba1sonny opened this issue
Nov 8, 2021
· 6 comments
Thank you for your quick reply. I am using the U-Net model, and after setting a smaller crop size it actually worked.
But could you tell me how crop size works?
Thanks in advance.
Seems like your data input is very big?
With our normal [config](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/unet), training usually takes less than 1 GB of GPU memory.
The crop size is the size of the actual input to the model. The input is held in GPU memory together with the model parameters, so making the crop smaller saves GPU memory.
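To make the effect of crop size concrete, here is a rough back-of-the-envelope sketch of how the memory for a single feature map scales with input size. The channel count of 64 and the fp32 element size are assumptions chosen for illustration, not values taken from this thread, and a real network holds many such maps plus gradients.

```python
def feature_map_mib(batch, channels, height, width, bytes_per_elem=4):
    """Memory in MiB for one dense feature map of the given shape."""
    return batch * channels * height * width * bytes_per_elem / 2**20

# Hypothetical 64-channel, full-resolution feature map in fp32 (4 bytes per element).
full = feature_map_mib(1, 64, 1080, 1920)  # whole 1920x1080 image
crop = feature_map_mib(1, 64, 800, 800)    # 800x800 crop

print(f"full image: {full:.2f} MiB, crop: {crop:.2f} MiB, ratio: {full / crop:.2f}")
```

Because the cost grows with height times width, every full-resolution feature map shrinks by the same factor when the crop gets smaller.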
Thank you for your guidance. My image size is 1920×1080, and I set the crop size to 800×800. It’s working.
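For readers looking for where this setting lives, a minimal sketch of the relevant config fragment might look like the following. It assumes the mmsegmentation 0.x style of Python config with a `RandomCrop` step in the training pipeline; the `cat_max_ratio` value and the variable name `random_crop` are illustrative and were not given in this thread.

```python
# Hypothetical fragment of an mmsegmentation-style config (not taken from this thread).
crop_size = (800, 800)  # crop size reported above for 1920x1080 images

# Training-time random crop step; the other pipeline steps are omitted here.
random_crop = dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75)
```

The same `crop_size` variable is usually reused by other steps in the config, so check the full config file before changing it in one place only.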
Thanks!!! I will try Bisenet.
That is not too big. If your data is not medical images, try one of our models that uses a pretrained backbone. It should get better results than UNet, which is trained from scratch.
I set the batch size to 1 and used fp16, but still got this error: CUDA out of memory.
RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1595629403081/work/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f09dcb9c77d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x20626 (0x7f09dcdf4626 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x214f4 (0x7f09dcdf54f4 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDACachingAllocator::raw_alloc(unsigned long) + 0x5e (0x7f09dcdee12e in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: + 0xcb2a06 (0x7f09ddcb4a06 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xcb74ec (0x7f09ddcb94ec in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xcafeba (0x7f09ddcb1eba in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xcb06ce (0x7f09ddcb26ce in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xcb0d90 (0x7f09ddcb2d90 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::native::cudnn_convolution_backward_weight(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x49 (0x7f09ddcb2fe9 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0xd119bb (0x7f09ddd139bb in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xd415f8 (0x7f09ddd435f8 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: at::cudnn_convolution_backward_weight(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x1ad (0x7f0a0ff6870d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x18a (0x7f09ddcacc0a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #14: + 0xd118c5 (0x7f09ddd138c5 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #15: + 0xd41654 (0x7f09ddd43654 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #16: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: + 0x2c250c2 (0x7f0a11c3b0c2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: + 0x2c39684 (0x7f0a11c4f684 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x258 (0x7f0a11ac2098 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: + 0x30d1017 (0x7f0a120e7017 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptrtorch::autograd::ReadyQueue const&) + 0x1400 (0x7f0a120e2860 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&) + 0x451 (0x7f0a120e3401 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x89 (0x7f0a120db579 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x4a (0x7f0a1640a99a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: + 0xc9039 (0x7f0a18f42039 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #27: + 0x9609 (0x7f0a3b0ce609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #28: clone + 0x43 (0x7f0a3aff5293 in /lib/x86_64-linux-gnu/libc.so.6)
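As a reading aid for the error message above, here is a small sketch that pulls the sizes out of the message text and converts them to MiB. The function name `parse_oom` and the labels are hypothetical, and the regular expressions assume the message wording shown in this issue, which may differ in other PyTorch versions.

```python
import re

MESSAGE = (
    "CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total "
    "capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved "
    "in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

def parse_oom(message):
    """Return the sizes found in the message, in MiB, keyed by a short label."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for label, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[label] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes

sizes = parse_oom(MESSAGE)
# Memory held by PyTorch's caching allocator but not currently used by tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]
print(f"requested {sizes['requested']:.2f} MiB, cached but unused {cached_unused:.2f} MiB")
```

Almost all of the card is already allocated to tensors, which is consistent with the advice in this thread to shrink the model input rather than only the batch size.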