RuntimeError: CUDA error: no kernel image is available for execution on the device #693
python -m detectron2.utils.collect_env sys.platform linux PyTorch built with:
python detectron2/utils/collect_env.py sys.platform linux PyTorch built with:
The issue comes from the torchvision installation and is unrelated to detectron2. You're probably using a version of torchvision that was built with a different version of CUDA or for different compute capabilities.
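To make the "compute capabilities" point concrete, here is a minimal sketch of the matching rule behind "no kernel image is available". It assumes the GPUs in this report are compute capability 7.5 (Turing); the architecture lists are hypothetical inputs, not the actual build flags of any torchvision binary, and `parse_arch` / `has_kernel_image` are illustrative helpers rather than part of PyTorch. On a real machine the device tuple can be read with `torch.cuda.get_device_capability()`.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_75' or 'compute_70' into (kind, major, minor)."""
    kind, digits = tag.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def has_kernel_image(device_capability, arch_list):
    """Return True if any compiled architecture can run on the device."""
    dev_major, dev_minor = device_capability
    for tag in arch_list:
        kind, major, minor = parse_arch(tag)
        if kind == "sm":
            # Native code (cubin) runs only within the same major version,
            # on devices whose minor version is at least the compiled one.
            if major == dev_major and minor <= dev_minor:
                return True
        elif kind == "compute":
            # PTX can be JIT-compiled for any equal or newer capability.
            if (major, minor) <= (dev_major, dev_minor):
                return True
    return False


turing = (7, 5)  # assumed capability of the GPUs in this report
print(has_kernel_image(turing, ["sm_35", "sm_50", "sm_60"]))  # False
print(has_kernel_image(turing, ["sm_60", "sm_75"]))           # True
print(has_kernel_image(turing, ["sm_60", "compute_60"]))      # True
```

A build that contains only older `sm_*` targets from a different major version and no PTX would fail on such a device in the way the traceback shows, which is one plausible reading of the diagnosis above.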
@ppwwyyxx
The problem occurs when you call torchvision's nms function on a CUDA tensor.
which torchvision's version should |
As INSTALL.md says, the version that comes with the PyTorch release should work. If it does not, that is either because you are not using that version or because of a bug in torchvision/PyTorch.
according to pytorch guide |
First, you can have multiple versions and the command does not guarantee you'll run the torchvision & pytorch you just installed (http://ppwwyyxx.com/blog/2019/On-Environment-Packaging-in-Python/). |
according to pytorch |
Your original command I'll say this again: the command does not guarantee that you'll use the version you installed with this command, especially if you're on a python environment with pytorch previously installed by other means (e.g. pip). I did not say it's a pytorch issue, so your comment at pytorch/pytorch#32151 (comment) is not accurate. |
I use a virtual environment created by conda; it was created specifically for detectron2.
Conda's virtual environment does not guarantee much either. The correct way to know which version you're using is mentioned in the link I posted above:
Once you found the location, it should have a "_C.so" file there and |
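The exact commands from the linked post are not reproduced in this thread, so the following is only a rough sketch of the same idea: ask Python where it would import `torch` and `torchvision` from, then look for the compiled `_C` extension in that directory. The helper names `package_dir` and `compiled_extensions` are made up for this illustration, and only the standard library is used so the script runs even when the packages are missing.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory `import name` would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.origin:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


def compiled_extensions(directory, stem="_C"):
    """List shared-library files in `directory` whose name starts with `stem`."""
    if directory is None or not os.path.isdir(directory):
        return []
    return sorted(
        f for f in os.listdir(directory)
        if f.startswith(stem) and f.endswith((".so", ".pyd"))
    )


for pkg in ("torch", "torchvision"):
    where = package_dir(pkg)
    print(pkg, "->", where, compiled_extensions(where))
```

If the printed directory is not inside the environment you just installed into, the interpreter is picking up a different copy, which is the situation described above.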
I found the most important thing is to set Pillow==6.2.2 and pyyaml==5.1, as you wrote in https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=9_FzH13EjseR
I suggest you add this to INSTALL.md.
They are absolutely not related to your issue. Also, they are already declared as dependencies. pip will either install them automatically or warn you that it cannot. So there is no need to mention them in INSTALL.md. |
first I used conda install pytorch torchvision cudatoolkit=10.0 -c pytorch |
then it is the |
"conda install pytorch torchvision cudatoolkit=10.0 -c pytorch" had installed torch torchvision |
Then it is, again, unrelated to your original issue. If you're using Pillow 7.0, torchvision will give a different error from the one in your issue.
So if I use Pillow==6.2.2 and pyyaml==5.1, the program runs OK.
Pillow==6.2.2 does address a different error from torchvision, which does not support Pillow 7.0. But it is unrelated to your original issue, which is probably fixed before running |
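To tell the Pillow problem apart from the CUDA one, it can help to check which Pillow version is actually installed. This is a small sketch assuming Python 3.8 or newer (for `importlib.metadata`, which the Python 3.7 environment in this report would not have); `installed_version` and `major_version` are hypothetical helpers written for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Leading integer of a version string such as '6.2.2'."""
    return int(version.split(".")[0])


pillow = installed_version("Pillow")
if pillow is None:
    print("Pillow is not installed in this environment")
elif major_version(pillow) >= 7:
    print("Pillow", pillow, "found; the thread reports a torchvision error with 7.0")
else:
    print("Pillow", pillow, "found")
```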
If you do not know the root cause of the problem / bug, and wish someone to help you, please
post according to this template:
Instructions To Reproduce the Issue:
run demo
python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input input.jpg [--other-options] --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
cuda 10.0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:04:00.0 Off |                  N/A |
| 31%   34C    P0    55W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:06:00.0 Off |                  N/A |
| 31%   34C    P0    46W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:07:00.0 Off |                  N/A |
| 31%   34C    P0    51W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:08:00.0 Off |                  N/A |
| 31%   33C    P0    60W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 208...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 31%   35C    P0    60W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 208...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 30%   30C    P0    50W / 250W |      0MiB / 10989MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce RTX 208...  Off  | 00000000:0E:00.0 Off |                  N/A |
what you observed (including the full logs):
return _C.nms(boxes, scores, iou_threshold)
RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /tmp/pip-req-build-9d9zypi6/torchvision/csrc/cuda/nms_cuda.cu:127)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f3cd35c7e7d in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x8d1 (0x7f3ca5dbaece in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x183 (0x7f3ca5d7eed7 in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #3: + 0x79cf5 (0x7f3ca5d98cf5 in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #4: + 0x765b0 (0x7f3ca5d955b0 in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #5: + 0x70d1e (0x7f3ca5d8fd1e in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #6: + 0x70fc2 (0x7f3ca5d8ffc2 in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #7: + 0x5be4a (0x7f3ca5d7ae4a in /home/azuryl/anaconda3/envs/detectron2p37/lib/python3.7/site-packages/torchvision/_C.so)
frame #59: __libc_start_main + 0xf0 (0x7f3d0c2ca830 in /lib/x86_64-linux-gnu/libc.so.6)
Expected behavior:
If there is no obvious error in "what you observed" provided above,
please tell us the expected behavior.
If you expect the model to converge / work better, note that we do not give suggestions
on how to train a new model.
We will help with it only in one of two conditions:
(1) You're unable to reproduce the results in detectron2 model zoo.
(2) It indicates a detectron2 bug.
Environment:
Please paste the output of
python -m detectron2.utils.collect_env
If detectron2 hasn't been successfully installed, use
python detectron2/utils/collect_env.py
If your issue looks like an installation issue / environment issue,
please first try to solve it yourself with the instructions in
https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues