RuntimeError: SigmoidFocalLoss is not compiled with GPU support #21
Comments
How did you set up the mmcv library? If you compiled it locally, please check whether CUDA/nvcc was enabled during compilation.
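One way to run the check suggested above is a short script like the following. It is a minimal sketch, assuming mmcv-full 1.x, where mmcv.ops exposes get_compiler_version and get_compiling_cuda_version; the helper name collect_cuda_build_info is made up for this example. On a CPU-only build the mmcv CUDA entry typically reads "not available", which matches the "not compiled with GPU support" error.

```python
import importlib


def collect_cuda_build_info():
    """Gather facts that show whether torch and mmcv were built with CUDA."""
    info = {}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # torch.version.cuda is None for CPU-only PyTorch builds.
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError, AttributeError) as exc:
        info["torch"] = f"not usable: {exc}"
    try:
        # Importing mmcv.ops fails if only the lite mmcv package
        # (without compiled ops) is installed.
        ops = importlib.import_module("mmcv.ops")
        info["mmcv_compiler"] = ops.get_compiler_version()
        info["mmcv_cuda_build"] = ops.get_compiling_cuda_version()
    except (ImportError, OSError, AttributeError) as exc:
        info["mmcv.ops"] = f"not usable: {exc}"
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_build_info().items():
        print(f"{key}: {value}")
```

If torch reports a CUDA build but mmcv does not, rebuilding or reinstalling mmcv-full against the same CUDA and torch versions is the usual next step.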
Thanks for your reply. I found the problem: when I run python mmdet3d/utils/collect_env.py, it shows
Hi, I fixed the previous bug, but when I go on to run sh ./tools/dist_train.sh ./configs/MSMDFusion_nusc_voxel_LC.py 2, it reports the following error: The installation environment is as follows:
packages in environment at /public/home/xzluo/anaconda3/envs/msmd:
Name Version Build Channel
_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
Do you know what the problem is, please?
The error "numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject" indicates that your numpy version is incompatible with another library that was compiled against a different numpy version. To solve this problem, you can refer to this site. However, since numpy is a foundation for other libraries such as torch and scipy, changing the numpy version may cause further version conflicts. Therefore, I suggest you either find the library that is incompatible with the current numpy version or set up a new environment by referring to my environment details.
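To narrow down which library is incompatible with the installed numpy, one option is to import suspects one at a time and see which raises this message. The sketch below assumes the error surfaces as a ValueError at import time, which is how numpy's ABI check normally reports it; the function name find_numpy_abi_conflicts is hypothetical.

```python
import importlib

# Text of the message raised when a compiled extension was built
# against a different numpy ABI than the one installed.
ABI_HINT = "numpy.ndarray size changed"


def find_numpy_abi_conflicts(module_names):
    """Import each module and return those that fail with the ABI error."""
    conflicts = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed or otherwise unimportable; not an ABI conflict.
            continue
        except ValueError as exc:
            # Only record the numpy ABI message; other ValueErrors are skipped.
            if ABI_HINT in str(exc):
                conflicts[name] = str(exc)
    return conflicts
```

Calling it with a list of compiled packages from your environment, for example find_numpy_abi_conflicts(["scipy", "pycocotools", "numba"]) (these names are only guesses), returns a dict of the offenders, which can then be reinstalled or rebuilt against the current numpy.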
We use the RTX 3090 with 24 GB of memory. You can try some techniques (such as fp16 training or PyTorch gradient checkpointing) to save GPU memory.
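As a rough illustration of the fp16 suggestion: in mmdet 2.x, train_detector enables mixed-precision training when the config defines an fp16 entry. The fragment below is a sketch under that assumption; the loss scale value is a commonly used one, not a setting confirmed for this repository.

```python
# Hypothetical addition to a training config such as
# configs/MSMDFusion_nusc_voxel_LC.py (assumes mmdet 2.x / mmcv 1.x).
# Defining `fp16` asks the runner to use the fp16 optimizer hook,
# which usually lowers activation memory on recent GPUs.
fp16 = dict(loss_scale=512.0)
```

Whether this is enough to fit a smaller GPU depends on the model and batch size, so treat it as a starting point.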
Hello, I downloaded the fusion_voxel0075_R50.pth you provided and ran sh ./tools/dist_train.sh ./configs/MSMDFusion_nusc_voxel_LC.py 2 for the second-stage training, which reports the error below. I tried some solutions from the Internet, but none of them worked. I hope you can point out the cause, thank you!
2023-09-14 10:43:15,801 - mmdet - INFO - Start running, host: xzluo@b5163d5d11c9, work_dir: /public/home/xzluo/zc/MSMDFusion-main/work_dirs/MSMDFusion_nusc_voxel_LC
2023-09-14 10:43:15,801 - mmdet - INFO - workflow: [('train', 1)], max: 6 epochs
Traceback (most recent call last):
File "./tools/train.py", line 283, in <module>
main()
File "./tools/train.py", line 272, in main
train_detector(
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 247, in train_step
losses = self(**data)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/detectors/base.py", line 58, in forward
return self.forward_train(**kwargs)
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/detectors/MSMDFusion.py", line 534, in forward_train
losses_pts = self.forward_pts_train(pts_feats, img_feats, gt_bboxes_3d,
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/detectors/MSMDFusion.py", line 574, in forward_pts_train
losses = self.pts_bbox_head.loss(*loss_inputs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 164, in new_func
return old_func(*args, **kwargs)
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/dense_heads/transfusion_head.py", line 1260, in loss
layer_loss_cls = self.loss_cls(layer_cls_score, layer_labels, layer_label_weights, avg_factor=max(num_pos, 1))
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/models/losses/focal_loss.py", line 170, in forward
loss_cls = self.loss_weight * calculate_loss_func(
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/models/losses/focal_loss.py", line 85, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred.contiguous(), target, gamma, alpha, None,
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/ops/focal_loss.py", line 54, in forward
ext_module.sigmoid_focal_loss_forward(
RuntimeError: SigmoidFocalLoss is not compiled with GPU support
Traceback (most recent call last):
File "./tools/train.py", line 283, in <module>
main()
File "./tools/train.py", line 272, in main
train_detector(
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 247, in train_step
losses = self(**data)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/detectors/base.py", line 58, in forward
return self.forward_train(**kwargs)
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/detectors/MSMDFusion.py", line 534, in forward_train
losses_pts = self.forward_pts_train(pts_feats, img_feats, gt_bboxes_3d,
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/detectors/MSMDFusion.py", line 574, in forward_pts_train
losses = self.pts_bbox_head.loss(*loss_inputs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 164, in new_func
return old_func(*args, **kwargs)
File "/public/home/xzluo/zc/MSMDFusion-main/mmdet3d/models/dense_heads/transfusion_head.py", line 1260, in loss
layer_loss_cls = self.loss_cls(layer_cls_score, layer_labels, layer_label_weights, avg_factor=max(num_pos, 1))
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/models/losses/focal_loss.py", line 170, in forward
loss_cls = self.loss_weight * calculate_loss_func(
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmdet/models/losses/focal_loss.py", line 85, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred.contiguous(), target, gamma, alpha, None,
File "/public/home/xzluo/anaconda3/envs/zc/lib/python3.8/site-packages/mmcv/ops/focal_loss.py", line 54, in forward
ext_module.sigmoid_focal_loss_forward(
RuntimeError: SigmoidFocalLoss is not compiled with GPU support
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 29983) of binary: /public/home/xzluo/anaconda3/envs/zc/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group