Potential Bug in CenterPoint #387

XuyangBai · 2021-03-28T10:28:26Z

Describe the bug

Training CenterPoint raises an error at line 404 of the following file:

mmdetection3d/mmdet3d/models/dense_heads/centerpoint_head.py

Lines 402 to 405 in 391a56b

        # transpose heatmaps, because the dimension of tensors in each task is
        # different, we have to use numpy instead of torch to do the transpose.
        heatmaps = np.array(heatmaps).transpose(1, 0).tolist()
        heatmaps = [torch.stack(hms_) for hms_ in heatmaps]

The error message is

*** TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I suspect this is caused by a numpy update that changed the behavior of np.array. I am currently using numpy 1.20.0. I also tried an earlier version, numpy 1.19.1, but that produces a different error, as in #301.

Currently I can work around this error with

        device = heatmaps[0][0].device
        heatmaps = [[y.cpu() for y in x] for x in heatmaps]
        heatmaps = np.array(heatmaps).transpose(1, 0).tolist()
        heatmaps = [torch.stack(hms_).to(device) for hms_ in heatmaps]

but this may increase training time, since it copies memory between the CPU and GPU. Could you please share your numpy and mmpycocotools versions, or suggest other solutions to this problem?
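One possible alternative that avoids both numpy and the CPU round trip is to transpose the nested list in pure Python with zip. The sketch below is illustrative and untested against mmdetection3d: transpose_nested is a hypothetical helper name, and it assumes heatmaps is an outer sequence whose inner lists all have the same length. Placeholder strings stand in for the tensors, since the transposition never touches the elements themselves.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def transpose_nested(nested: Sequence[Sequence[T]]) -> List[List[T]]:
    """Swap the two outer levels of a nested list without touching the elements.

    Hypothetical helper, not part of mmdetection3d.
    """
    lengths = {len(inner) for inner in nested}
    if len(lengths) > 1:
        raise ValueError(f"inner lists have different lengths: {sorted(lengths)}")
    # zip(*nested) groups the i-th element of every inner list together.
    return [list(group) for group in zip(*nested)]


# Placeholder strings stand in for the heatmap tensors.
nested_in = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
nested_out = transpose_nested(nested_in)
print(nested_out)
```

If this assumption about the layout holds, the result could stand in for the np.array(heatmaps).transpose(1, 0).tolist() line, with the following torch.stack line left unchanged, so the tensors would stay on their original device.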

Reproduction

What command or script did you run?

./tools/dist_train.sh configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_4x8_cyclic_20e_nus.py 4

Did you make any modifications to the code or config? Did you understand what you have modified?
I didn't make any modifications.
What dataset did you use?
nuScenes

Environment

Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.7.0
OpenCV: 4.5.1
MMCV: 1.2.5
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.10.0
MMDetection3D: 0.11.0+391a56b

You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)


tianweiy · 2021-03-29T20:54:51Z

I think for some reason LiDARInstance3DBoxes is converted to CUDA in the current version? It worked well a few months ago.

XuyangBai · 2021-04-05T03:46:19Z

Thanks @tianweiy for your help. I installed numpy 1.19.4 and the problem was solved.

ZwwWayne assigned xiliu8006 Mar 28, 2021

XuyangBai closed this as completed Apr 5, 2021

robin-karlsson0 mentioned this issue Aug 22, 2021

[Fix] Centerpoint head nested list transpose #879

Merged
