RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. In tools/test.py #1461

Open
ammaryasirnaich opened this issue May 4, 2022 · 15 comments
Labels: bug (Something isn't working)

@ammaryasirnaich

ammaryasirnaich commented May 4, 2022

Hi,
I am testing the pretrained SECOND model with visualization enabled, using the following command:

python /workspace/mmdetection3d/tools/test.py \
  /workspace/mmdetection3d/configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py\
  /workspace/working_dir/second_epoch_40.pth \
  --show --show-dir /workspace/working_dir/training_results

However, on sample 000005 it raises a RuntimeError.

UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[                                                  ] 3/3769, 0.2 task/s, elapsed: 12s, ETA: 15084s
Traceback (most recent call last):
  File "/workspace/mmdetection3d/tools/test.py", line 260, in <module>
    main()
  File "/workspace/mmdetection3d/tools/test.py", line 230, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/workspace/mmdetection3d/mmdet3d/apis/test.py", line 48, in single_gpu_test
    model.module.show_results(
  File "/workspace/mmdetection3d/mmdet3d/models/detectors/base.py", line 120, in show_results
    show_result(
  File "/workspace/mmdetection3d/mmdet3d/core/visualizer/show_result.py", line 110, in show_result
    0, 255, size=(pred_labels.max() + 1, 3)) / 256
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

My working environment is:

CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 
TorchVision: 0.12.0
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.3
MMDetection: 2.23.0
MMSegmentation: 0.22.1
MMDetection3D: 1.0.0rc2+2eed522
spconv2.0: True

Any help would be much appreciated!

@Tai-Wang added the bug (Something isn't working) label on May 18, 2022
@ApoorvaSuresh

Hi, I have the same error :( Did you find a solution for it? If so, could you please share it?
Thanks in advance :)

@ammaryasirnaich
Author

Hi, I have the same error :( Did you find a solution for it? If so, could you please share it? Thanks in advance :)

Sorry @ApoorvaSuresh, I am still waiting for help. I have no idea what is causing it!

@Tai-Wang
Member

Have you ever tried our pretrained models? Maybe your trained models are not good enough and produce no predictions, which causes the input.numel() == 0.

@ammaryasirnaich
Author

@Tai-Wang thanks for your response. I will try once again to re-check with the pre-trained model. However, the re-trained models show more than 72% mAP on Hard, medium, and easy modes.

@Tai-Wang
Member

Tai-Wang commented Jun 7, 2022

You can add a breakpoint in the show function and check why input.numel() == 0. My guess is that the evaluation code handles the case of no predictions while the visualization code does not.

@ammaryasirnaich
Author

@Tai-Wang, I am getting the same error with the pretrained model:


  File "mmdetection3d/tools/test.py", line 260, in <module>
    main()
  File "mmdetection3d/tools/test.py", line 230, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/workspace/mmdetection3d/mmdet3d/apis/test.py", line 48, in single_gpu_test
    model.module.show_results(
  File "/workspace/mmdetection3d/mmdet3d/models/detectors/base.py", line 120, in show_results
    show_result(
  File "/workspace/mmdetection3d/mmdet3d/core/visualizer/show_result.py", line 110, in show_result
    0, 255, size=(pred_labels.max() + 1, 3)) / 256
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

@ammaryasirnaich
Author

One more thing: I think the pretrained models must have been trained with spconv 1.0, but my environment has spconv 2.0. Could this cause a mismatch? When the model starts, I also get the following message in the terminal:

load checkpoint from local path: /workspace/working_dir/hv_second_secfpn.pth
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([16, 3, 3, 3, 4])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 128]).
missing keys in source state_dict: voxel_encoder.vfe_layers.0.norm.weight, voxel_encoder.vfe_layers.0.norm.bias, voxel_encoder.vfe_layers.0.norm.running_mean, voxel_encoder.vfe_layers.0.norm.running_var, voxel_encoder.vfe_layers.0.linear.weight, voxel_encoder.vfe_layers.1.norm.weight, voxel_encoder.vfe_layers.1.norm.bias, voxel_encoder.vfe_layers.1.norm.running_mean, voxel_encoder.vfe_layers.1.norm.running_var, voxel_encoder.vfe_layers.1.linear.weight
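
The size-mismatch and missing-key lines above can be reproduced offline by comparing parameter shapes before loading a checkpoint. The following is a minimal sketch in plain Python, assuming the shapes have already been collected into dictionaries (for example with `{name: tuple(t.shape) for name, t in state_dict.items()}`). The helper `diff_shapes` and the second entry in `model` are hypothetical and only for illustration; they are not part of MMDetection3D.

```python
def diff_shapes(ckpt_shapes, model_shapes):
    """Compare parameter shapes stored in a checkpoint with those a model expects.

    Returns (mismatched, missing):
      mismatched maps a name to (checkpoint_shape, model_shape),
      missing lists names the model expects but the checkpoint lacks.
    """
    mismatched = {
        name: (ckpt_shapes[name], shape)
        for name, shape in model_shapes.items()
        if name in ckpt_shapes and ckpt_shapes[name] != shape
    }
    missing = sorted(name for name in model_shapes if name not in ckpt_shapes)
    return mismatched, missing


# Shapes for the first key are taken from the log above.
ckpt = {"middle_encoder.conv_input.0.weight": (16, 3, 3, 3, 4)}
model = {
    "middle_encoder.conv_input.0.weight": (16, 3, 3, 3, 128),
    # Illustrative shape only; the log does not report it.
    "voxel_encoder.vfe_layers.0.linear.weight": (32, 10),
}

mismatched, missing = diff_shapes(ckpt, model)
for name, (in_ckpt, in_model) in mismatched.items():
    print(f"size mismatch for {name}: checkpoint {in_ckpt}, model {in_model}")
print("missing keys:", ", ".join(missing))
```

A difference in only the last dimension, together with missing voxel_encoder keys, would be consistent with the checkpoint and the config describing different voxel encoders, though the thread does not confirm this.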

@Tai-Wang
Member

The pretrained SECOND models have not been updated since the coordinate system refactoring. For now, you can try PointPillars with our provided models or train your own SECOND models with our provided configs.

@ammaryasirnaich
Author

ammaryasirnaich commented Jun 11, 2022

But @Tai-Wang, I first got the error mentioned in the issue title with my own SECOND model, trained with your provided configs!

@ammaryasirnaich
Author

@Tai-Wang, @ZCMax did you have a chance to further investigate the issue that I raised?
1) The same error occurs with the pretrained model and the given config file.
2) The same error occurs after retraining the model with the given config file.

@ammaryasirnaich
Author

ammaryasirnaich commented Jul 5, 2022

It works fine when I run it with the following command:
python tools/test.py workspace/mmdetection3d/configs/second/mmdetection3d/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car.py /workspace/mmdetection3d/working_dir/hv_second_kitti-3d-car.pth --eval 'mAP' --eval-options 'show=True' 'out_dir=/workspace/mmdetection3d/working_dir/show_results'

@jialeli1

One more thing: I think the pretrained models must have been trained with spconv 1.0, but my environment has spconv 2.0. Could this cause a mismatch? When the model starts, I also get the following message in the terminal:

load checkpoint from local path: /workspace/working_dir/hv_second_secfpn.pth
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([16, 3, 3, 3, 4])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 128]).
missing keys in source state_dict: voxel_encoder.vfe_layers.0.norm.weight, voxel_encoder.vfe_layers.0.norm.bias, voxel_encoder.vfe_layers.0.norm.running_mean, voxel_encoder.vfe_layers.0.norm.running_var, voxel_encoder.vfe_layers.0.linear.weight, voxel_encoder.vfe_layers.1.norm.weight, voxel_encoder.vfe_layers.1.norm.bias, voxel_encoder.vfe_layers.1.norm.running_mean, voxel_encoder.vfe_layers.1.norm.running_var, voxel_encoder.vfe_layers.1.linear.weight

This "mismatch" problem also happened to me. How can I fix it?

@ammaryasirnaich
Author

ammaryasirnaich commented Jul 15, 2022

@jialeli1 actually, I didn't solve my mismatch problem. That only solved the RuntimeError: max() issue. The pretrained model for the config hv_second_secfpn_6x8_80e_kitti-3d-3class.py works; however, if you retrain the model and run the evaluation, it keeps reporting a size mismatch for middle_encoder.conv_input.0.weight. I am also waiting for help.

@holtvogt

holtvogt commented Aug 9, 2022

Is it possible to hotfix this by replacing the line in


with

if pred_labels is None or pred_labels.numel() == 0

?
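
The guard proposed above can be sketched as a small palette helper that returns early when there are no predictions. This is only an illustration under stated assumptions: `make_palette` is a hypothetical name, the labels are assumed to be convertible with `np.asarray` (a CPU tensor, array, or list), and the expression after the guard mirrors the fragment visible in the traceback rather than the full `show_result` code.

```python
import numpy as np


def make_palette(pred_labels, rng=None):
    """Return a (num_classes, 3) array of RGB values in [0, 1), or None if there is nothing to colour.

    Hypothetical helper: pred_labels may be None, empty, or a sequence of
    non-negative integer class ids. A CUDA tensor would need .cpu() first.
    """
    if pred_labels is None:
        return None
    labels = np.asarray(pred_labels)
    if labels.size == 0:
        # No predictions for this sample: skip max(), which fails on empty input.
        return None
    rng = rng or np.random.default_rng()
    # One random colour per class id from 0 to the largest predicted label.
    return rng.integers(0, 255, size=(int(labels.max()) + 1, 3)) / 256


empty_palette = make_palette([])
palette = make_palette([0, 2, 1])
```

A caller would then skip colouring the predictions whenever the helper returns None, which matches the intent of the `pred_labels is None or pred_labels.numel() == 0` check.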

@Tracy-git

Is it possible to fix this problem by replacing the line in

if pred_labels is None or pred_labels.numel() == 0

I have solved this error with your suggestion, thanks so much.
