RuntimeError: nms is not compiled with GPU support #2686

Closed
Yukinoyuki1 opened this issue May 11, 2020 · 9 comments
Labels
installation/env The problem about codebase installation or running environment.

Comments

@Yukinoyuki1

I ran into the error RuntimeError: nms is not compiled with GPU support.
I ran the command "python ./tools/train.py ./configs/dcn/cascade_mask_rcnn_r101_fpn_dconv_c3-c5_1x_coco.py --work-dir ./work_dirs --no-validate", and the traceback is:
Traceback (most recent call last):
File "./tools/train.py", line 159, in <module>
main()
File "./tools/train.py", line 155, in main
meta=meta)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/apis/train.py", line 165, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/lustre/home/acct-eejxh/eejxh/.local/lib/python3.7/site-packages/mmcv/runner/runner.py", line 383, in run
epoch_runner(data_loaders[i], **kwargs)
File "/lustre/home/acct-eejxh/eejxh/.local/lib/python3.7/site-packages/mmcv/runner/runner.py", line 282, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/apis/train.py", line 74, in batch_processor
losses = model(**data)
File "/lustre/home/acct-eejxh/eejxh/.conda/envs/mmdetection/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/lustre/home/acct-eejxh/eejxh/.conda/envs/mmdetection/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/lustre/home/acct-eejxh/eejxh/.conda/envs/mmdetection/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
return old_func(*args, **kwargs)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/models/detectors/base.py", line 148, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/models/detectors/two_stage.py", line 151, in forward_train
*rpn_outs, img_metas, cfg=proposal_cfg)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func
return old_func(*args, **kwargs)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 490, in get_bboxes
scale_factor, cfg, rescale)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 119, in get_bboxes_single
dets, keep = batched_nms(proposals, scores, ids, nms_cfg)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/ops/nms/nms_wrapper.py", line 146, in batched_nms
torch.cat([bboxes_for_nms, scores[:, None]], -1), **nms_cfg
)
File "/lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/ops/nms/nms_wrapper.py", line 53, in nms
inds = nms_ext.nms(dets_th, iou_thr)
RuntimeError: nms is not compiled with GPU support (nms at mmdet/ops/nms/src/nms_ext.cpp:20)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x2b73b90fc627 in /lustre/home/acct-eejxh/eejxh/.conda/envs/mmdetection/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: nms(at::Tensor const&, float) + 0xf7 (0x2b73f6a90f77 in /lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/ops/nms/nms_ext.cpython-37m-x86_64-linux-gnu.so)
frame #2: + 0x179ed (0x2b73f6a9c9ed in /lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/ops/nms/nms_ext.cpython-37m-x86_64-linux-gnu.so)
frame #3: + 0x15858 (0x2b73f6a9a858 in /lustre/home/acct-eejxh/eejxh/yhy/mmdetection/mmdet/ops/nms/nms_ext.cpython-37m-x86_64-linux-gnu.so)

I use my own dataset in COCO format, and I modified the number of classes and the learning rate in the corresponding configs.

Environment

sys.platform: linux
Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /lustre/opt/cascadelake/linux-centos7-skylake_avx512/gcc-8.3.0/cuda-10.0.130-zzjtq46rbpziy7avb45ng3wgaexbc45j
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0: Tesla V100-SXM3-32GB
GCC: gcc (Spack GCC) 8.3.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.5.1
MMDetection: 2.0.0+93de55a
MMDetection Compiler: GCC 8.3
MMDetection CUDA Compiler: not available

@ZwwWayne
Collaborator

According to the last line of your environment report, "MMDetection CUDA Compiler: not available", it seems that your CUDA environment is not installed correctly.
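
The check described above can be done mechanically on the pasted report. This is a minimal sketch, assuming the environment report is available as plain text in the format shown in this issue. The helper name `diagnose_env_report` and its messages are hypothetical and not part of MMDetection. It also compares the local nvcc release with the CUDA runtime PyTorch was built with; a mismatch there is not necessarily fatal, but it is worth noticing.

```python
import re
from typing import List


def diagnose_env_report(report: str) -> List[str]:
    """Return warnings for an environment report in the format shown in this issue.

    Hypothetical helper for illustration: it only inspects text and does not
    import torch or mmdet.
    """
    warnings = []

    # The ops were built without nvcc if this field says "not available".
    m = re.search(r"^MMDetection CUDA Compiler:\s*(.+)$", report, re.MULTILINE)
    if m and m.group(1).strip().lower() == "not available":
        warnings.append("MMDetection ops were compiled without CUDA")

    # Compare the local nvcc release with the CUDA runtime PyTorch was built with.
    nvcc = re.search(r"^NVCC:.*release (\d+\.\d+)", report, re.MULTILINE)
    runtime = re.search(r"CUDA Runtime (\d+\.\d+)", report)
    if nvcc and runtime and nvcc.group(1) != runtime.group(1):
        warnings.append(
            "nvcc is CUDA %s but PyTorch was built with CUDA %s"
            % (nvcc.group(1), runtime.group(1))
        )
    return warnings


# Lines copied from the report in this issue.
report = """\
NVCC: Cuda compilation tools, release 10.0, V10.0.130
PyTorch: 1.4.0
  • CUDA Runtime 10.1
MMDetection CUDA Compiler: not available
"""
for w in diagnose_env_report(report):
    print(w)
```

If the first warning appears, the compiled ops need to be rebuilt on a machine where nvcc is visible at build time.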

ZwwWayne added the installation/env label on May 11, 2020.
@Yukinoyuki1
Author

Thanks. Because I compiled MMDetection first and only then rented a GPU from the computing centre, the CUDA compiler didn't work properly.

@shuangz97

I have the same problem. How did you solve it?

@Yukinoyuki1
Author

Yukinoyuki1 commented May 15, 2020 via email

@PihtaHorse

PihtaHorse commented May 26, 2020

The first time, I installed PyTorch with the wrong version of CUDA, and I got the same error.

But after deleting the "/build" folder and repeating the installation, everything now works.
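
The clean-up step can be sketched as a small script. This is only an illustration under two assumptions that the comment above does not confirm: that the folder meant is the `build` directory setuptools creates next to `setup.py` in the MMDetection checkout, and that the compiled extensions live under `mmdet/ops` (as the `nms_ext.cpython-37m-x86_64-linux-gnu.so` path in the traceback suggests). The function name `clean_build_artifacts` is hypothetical.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(repo_root, dry_run=True):
    """Delete (or, with dry_run=True, only list) stale build outputs under repo_root."""
    root = Path(repo_root)
    targets = []

    # Assumed location: the setuptools build directory next to setup.py.
    build_dir = root / "build"
    if build_dir.is_dir():
        targets.append(build_dir)

    # Extensions built in place, e.g. mmdet/ops/nms/nms_ext.cpython-37m-x86_64-linux-gnu.so
    ops_dir = root / "mmdet" / "ops"
    if ops_dir.is_dir():
        targets.extend(sorted(ops_dir.rglob("*.so")))

    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return targets
```

Call it first with the default `dry_run=True` to see what would be removed, then with `dry_run=False`, and afterwards repeat the installation step you used originally so the ops are recompiled.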

@changruowang

I ran the nms_test program and got the following output, which I think shows that cuda_nms is available:

Run NMS on device_id = 0Run NMS on device_id = 1
Run NMS on device_id = 2
Run NMS on device_id = 3
But when I run the distributed test program, I get this error:
RuntimeError: nms is not compiled with GPU support
How can I solve the problem?

@wingskh

wingskh commented Sep 4, 2020

> The first time, I installed PyTorch with the wrong version of CUDA, and I got the same error.
>
> But after deleting the "/build" folder and repeating the installation, everything now works.

Sorry, what is the location of the "/build" folder?

@Kwongrf

Kwongrf commented Sep 14, 2020

>> The first time, I installed PyTorch with the wrong version of CUDA, and I got the same error.
>> But after deleting the "/build" folder and repeating the installation, everything now works.
>
> Sorry, what is the location of the "/build" folder?

I cannot find it either. But I solved this by installing the correct mmcv-full. I had used 'pip install mmcv-full' before; then I installed the version corresponding to PyTorch 1.6 and CUDA 10.1 (which is my environment), and it works.
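
Picking the matching mmcv-full build can be sketched as follows. This is a hedged illustration, not the command used above: it assumes the prebuilt wheels are listed under an index of the form `.../mmcv/dist/cu<XY>/torch<version>/index.html`, which should be verified against the current MMCV installation guide. The function name `mmcv_full_find_links` is hypothetical. In practice the two inputs would come from `torch.__version__` and `torch.version.cuda`.

```python
def mmcv_full_find_links(torch_version: str, cuda_version: str) -> str:
    """Build a find-links URL for a given PyTorch / CUDA pair.

    Assumption: the wheel index is laid out as
    .../mmcv/dist/cu<XY>/torch<version>/index.html
    """
    # Drop any local build suffix: "1.6.0+cu101" -> "1.6.0"
    torch_tag = torch_version.split("+")[0]
    # "10.1" -> "cu101"
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return "https://download.openmmlab.com/mmcv/dist/%s/torch%s/index.html" % (
        cuda_tag,
        torch_tag,
    )


# Values matching the environment described above (PyTorch 1.6, CUDA 10.1).
url = mmcv_full_find_links("1.6.0+cu101", "10.1")
print("pip install mmcv-full -f " + url)
```

Uninstalling the previously installed mmcv or mmcv-full first avoids mixing a CPU-only build with the new one.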

@wangzhenyuan1

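I ran into the same problem. It is probably caused by an incompatibility between the torch version and the mmcv version (I had installed everything exactly as the guide describes).

Solution: reinstall PyTorch from the official PyTorch website, choosing the build that matches your CUDA version; uninstall the previous mmcv; then reinstall it as follows: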

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e . # package mmcv-full will be installed after this step
cd ..
