
error in ms_deformable_col2im_cuda: an illegal memory access was encountered #7186

makifozkanoglu opened this issue Feb 17, 2022 · 7 comments


@makifozkanoglu

Describe the bug
I'm getting the following error when trying to train Deformable DETR.

Reproduction

  1. What command or script did you run?
    I tried to train the config file below (see the command sketch after this list):

https://github.com/open-mmlab/mmdetection/blob/7a9bc498d5cc972171ec4f7332afcd70bb50e60e/configs/deformable_detr/deformable_detr_r50_16x2_50e_coco.py

  2. Did you make any modifications to the code or config? Do you understand what you modified?
    No, I did not make any modifications.
  3. What dataset did you use?
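
A minimal sketch of the standard single-GPU MMDetection training invocation for this config, assuming it is run from the mmdetection repo root (the --work-dir path is an example, not something from the report):

python tools/train.py configs/deformable_detr/deformable_detr_r50_16x2_50e_coco.py --work-dir work_dirs/deformable_detr_r50_16x2_50e_coco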

Environment
sys.platform: linux
Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
CUDA available: True
GPU 0: TITAN RTX
CUDA_HOME: /usr/local/cuda-11.0
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.0
OpenCV: 4.5.5
MMCV: 1.4.4
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.20.0+

torch was installed via pip.

Error traceback

error in ms_deformable_col2im_cuda: an illegal memory access was encountered
Traceback (most recent call last):
  File "tools/train.py", line 200, in <module>
    main()
  File "tools/train.py", line 188, in main
    train_detector(
  File "/cta/users/mehmet/CenterNetMMCV/ssod/apis/train.py", line 206, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/cta/users/mehmet/CenterNetMMCV/thirdparty/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/cta/users/mehmet/CenterNetMMCV/thirdparty/mmcv/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/cta/users/mehmet/CenterNetMMCV/thirdparty/mmcv/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/cta/users/mehmet/CenterNetMMCV/thirdparty/mmcv/mmcv/runner/hooks/optimizer.py", line 56, in after_train_iter
    runner.outputs['loss'].backward()
  File "/cta/users/mehmet/.conda/envs/centernetmmcv/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/cta/users/mehmet/.conda/envs/centernetmmcv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: an illegal memory access was encountered.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

@PeterVennerstrom
Contributor

PeterVennerstrom commented Feb 25, 2022

Experienced the same issue and tested a few environments and GPU models.

Fixed by using an earlier version of mmcv-full. 1.4.2 is the latest version of mmcv-full that worked for me.

Running with CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the traceback points at the failing op:

CUDA_LAUNCH_BLOCKING=1 python ./tools/train.py configs/config.....
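
For reference, a sketch of the downgrade workaround, assuming the usual mmcv-full prebuilt-wheel index (replace cu110/torch1.7.0 with the CUDA and PyTorch versions in your environment):

pip uninstall -y mmcv-full
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html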

@imkzh

imkzh commented Mar 27, 2022

Exactly the same error:

error in ms_deformable_col2im_cuda: an illegal memory access was encountered
Traceback (most recent call last):
  File "./mmdetection/tools/train.py", line 209, in <module>
    main()
  File "./mmdetection/tools/train.py", line 198, in main
    train_detector(
  File "/home/user/.local/lib/python3.8/site-packages/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/user/.local/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/user/.local/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/user/.local/lib/python3.8/site-packages/mmcv/runner/hooks/optimizer.py", line 56, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/user/.local/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: an illegal memory access was encountered

I'm on:

  • Ubuntu 20.04
  • CUDA 11.2 (RTX3090)
  • torch 1.9.0+cu111
  • mmcv-full 1.4.7
  • mmdet 2.22.0
  • python 3.8.10
  • nvcc V11.2.67
  • gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

P.S.: downgrading mmcv-full to 1.4.2 solved the problem, as @PeterVennerstrom mentioned above.

@Manningchan

I met the same issue. In my environment there are 8 GPUs; if I use GPU 0 the error does not happen, but if I use any of the other GPUs it occurs.
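
A sketch of a possible workaround based on the observation above: restrict the visible devices so training only uses the GPU that works (the device index and config path here are examples, not from the report):

CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/deformable_detr/deformable_detr_r50_16x2_50e_coco.py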

@xuqingyu26

Hello, I met the same issue as you. Have you solved it?

@imkzh

imkzh commented Mar 23, 2023

@xuqingyu26 a workaround is downgrading mmcv-full to 1.4.2, which solved the problem in my case, as mentioned in my comment above.

@xbkaishui

Hi, any update on this?

@PeterVennerstrom
Contributor

It was fixed. Here's a link to the issue with a link to the PR.
