RuntimeError: CUDA error: no kernel image is available for execution on the device #247

BaophanN · 2025-01-13T09:18:28Z

I followed the instructions exactly to train the model. This is my environment:

mmcv-full                 1.5.2
mmdet                     2.26.0
mmdet3d                   1.0.0rc6       /workspace/source/WidthFormer/StreamPETR/mmdetection3d
mmsegmentation            0.29.1
pytorch 1.9.0 
cuda 11.1

on an RTX 2070. However, I got this error:

Traceback (most recent call last):
  File "tools/train.py", line 263, in <module>
    main()
  File "tools/train.py", line 251, in main
    custom_train_model(
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/core/apis/train.py", line 30, in custom_train_model
    custom_train_detector(
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/core/apis/mmdet_train.py", line 203, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 138, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 62, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 59, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 226, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/models/detectors/petr3d.py", line 216, in forward
    return self.forward_train(**data)
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/models/detectors/petr3d.py", line 268, in forward_train
    losses = self.obtain_history_memory(gt_bboxes_3d,
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/models/detectors/petr3d.py", line 129, in obtain_history_memory
    loss = self.forward_pts_train(gt_bboxes_3d[i],
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/models/detectors/petr3d.py", line 192, in forward_pts_train
    losses = self.pts_bbox_head.loss(*loss_inputs)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 226, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/models/dense_heads/streampetr_head.py", line 963, in loss
    losses_cls, losses_bbox = multi_apply(
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/workspace/source/WidthFormer/StreamPETR/projects/mmdet3d_plugin/models/dense_heads/streampetr_head.py", line 832, in loss_single
    loss_cls = self.loss_cls(
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmdet/models/losses/focal_loss.py", line 233, in forward
    loss_cls = self.loss_weight * calculate_loss_func(
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmdet/models/losses/focal_loss.py", line 139, in sigmoid_focal_loss
    loss = _sigmoid_focal_loss(pred.contiguous(), target.contiguous(), gamma,
  File "/opt/conda/envs/lanesegnet/lib/python3.8/site-packages/mmcv/ops/focal_loss.py", line 56, in forward
    ext_module.sigmoid_focal_loss_forward(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.

I do not know why this error occurs. Could the author help me fix it? Thank you.


BaophanN · 2025-01-13T09:37:56Z

Here is the full log file:
20250113_092039.log

exiawsh · 2025-01-14T08:51:11Z

Please check your mmcv version, and make sure your mmcv compiled successfully.
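One way to act on this suggestion is to print what PyTorch and mmcv report about their own builds. The sketch below is only a diagnostic aid and is not part of the StreamPETR repository; `collect_build_info` is a hypothetical helper name. It assumes mmcv-full 1.x, where `mmcv.ops` exposes `get_compiling_cuda_version` and `get_compiler_version`, and it records `None` when a package or its compiled ops cannot be imported.

```python
def collect_build_info():
    """Gather version strings reported by torch and mmcv (None if unavailable)."""
    info = {
        "torch": None,
        "torch_cuda": None,
        "mmcv": None,
        "mmcv_cuda": None,
        "mmcv_compiler": None,
    }

    try:
        import torch

        info["torch"] = torch.__version__
        # CUDA version PyTorch itself was built against (None for CPU builds).
        info["torch_cuda"] = torch.version.cuda
    except Exception:
        # Diagnostic script: any import failure just leaves the fields as None.
        pass

    try:
        import mmcv
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version

        info["mmcv"] = mmcv.__version__
        # CUDA toolkit and host compiler used when the mmcv ops were compiled.
        info["mmcv_cuda"] = get_compiling_cuda_version()
        info["mmcv_compiler"] = get_compiler_version()
    except Exception:
        # A failure here usually means mmcv or its compiled ops are missing.
        pass

    return info


if __name__ == "__main__":
    for key, value in collect_build_info().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and `mmcv_cuda` disagree, or the mmcv fields stay `None`, reinstalling or rebuilding mmcv-full against the installed PyTorch is worth trying. Note that matching versions alone do not show which GPU architectures the ops were compiled for.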

BaophanN · 2025-01-14T08:59:02Z

I do not understand why, but when I take the same code to another machine using the same Docker image, it runs fine. I also hit exactly the same error with BEVDet. The difference is that with BEVDet, training runs normally and the CUDA error only occurs while running the test. Can you help me fix this?
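A possible explanation for the same image working on one machine and failing on another is that the compiled CUDA ops inside the image were built for a different GPU architecture. This is an assumption, not a confirmed diagnosis. The RTX 2070 is a Turing card with compute capability 7.5, so it can only run binaries built for `sm_70` to `sm_75`, or PTX for `compute_75` and older that the driver can JIT-compile. The sketch below illustrates that compatibility rule; `has_kernel_image` and `check_local_pytorch` are hypothetical helper names.

```python
def has_kernel_image(capability, arch_list):
    """Return True if a build with `arch_list` can run on a GPU of `capability`.

    capability: (major, minor) tuple, e.g. (7, 5) for an RTX 2070.
    arch_list:  strings such as "sm_75" (binary) or "compute_75" (PTX).
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit() or len(digits) < 2:
            continue  # skip entries this sketch does not understand
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        # Binary code runs on the same major version with an equal or higher minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any device of equal or higher capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def check_local_pytorch():
    """Apply the rule to PyTorch's own kernels; None if torch or CUDA is missing.

    This covers PyTorch only, not separately compiled extensions such as mmcv ops.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return has_kernel_image(
        torch.cuda.get_device_capability(0), torch.cuda.get_arch_list()
    )


# A build targeting only Ampere cannot run on compute capability 7.5.
print(has_kernel_image((7, 5), ["sm_80", "sm_86"]))  # False
print(has_kernel_image((7, 5), ["sm_70", "sm_75"]))  # True
```

Extensions built through `torch.utils.cpp_extension` take their target architectures from the `TORCH_CUDA_ARCH_LIST` environment variable, or from the GPU visible at build time when it is unset. If the mmcv ops in the image were built on a machine with a newer GPU, rebuilding mmcv-full on the RTX 2070 machine, or with `TORCH_CUDA_ARCH_LIST` including `7.5`, should be worth trying.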

exiawsh · 2025-01-14T09:02:07Z

Are you using the same NVIDIA GPU driver version and CUDA version?

BaophanN · 2025-01-14T09:06:08Z

Here is the info from my computer:

(base) baogp4@VAI-baogp4-L:~/datasets/nuscenes$ nvidia-smi
Tue Jan 14 16:05:30 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   47C    P5              12W /  80W |     82MiB /  8192MiB |     25%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2298      G   /usr/lib/xorg/Xorg                           81MiB |
+---------------------------------------------------------------------------------------+
(base) baogp4@VAI-baogp4-L:~/datasets/nuscenes$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
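To compare the two machines, it can help to reduce this output to the three numbers that matter. The "CUDA Version" in the `nvidia-smi` header is the highest CUDA version the driver supports, not the installed toolkit, and the `nvcc` shown here was run from the `(base)` host shell, so it may differ from the CUDA 11.1 used inside the training environment. The sketch below parses the two outputs with regular expressions; the function names are hypothetical, and the sample strings are copied from the output above.

```python
import re


def parse_nvidia_smi_header(text):
    """Return (driver_version, max_supported_cuda) from nvidia-smi output, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    return (match.group(1), match.group(2)) if match else None


def parse_nvcc_release(text):
    """Return the toolkit release (e.g. '11.8') from `nvcc --version` output, or None."""
    match = re.search(r"release\s+([\d.]+)", text)
    return match.group(1) if match else None


smi_line = (
    "| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01"
    "   CUDA Version: 12.2     |"
)
nvcc_line = "Cuda compilation tools, release 11.8, V11.8.89"

print(parse_nvidia_smi_header(smi_line))  # ('535.183.01', '12.2')
print(parse_nvcc_release(nvcc_line))      # 11.8
```

Running the same commands inside the container on both machines, rather than on the host, gives values that are directly comparable.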
