Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Bug in CenterPoint #387

Closed
XuyangBai opened this issue Mar 28, 2021 · 2 comments
Closed

Potential Bug in CenterPoint #387

XuyangBai opened this issue Mar 28, 2021 · 2 comments
Assignees

Comments

@XuyangBai
Copy link

Describe the bug

When training using CenterPoint, it will raise an error at L404

# transpose heatmaps, because the dimension of tensors in each task is
# different, we have to use numpy instead of torch to do the transpose.
heatmaps = np.array(heatmaps).transpose(1, 0).tolist()
heatmaps = [torch.stack(hms_) for hms_ in heatmaps]

The error message is

*** TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I suspect it is due to the update of numpy, which changes the behavior of np.array function. Now I use numpy 1.20.0, and I have tried using an earlier version of numpy like 1.19.1 but then there will be another error like this #301.

Currently I can solve this error by

        device = heatmaps[0][0].device
        heatmaps = [[y.cpu() for y in x] for x in heatmaps]
        heatmaps = np.array(heatmaps).transpose(1, 0).tolist()
        heatmaps = [torch.stack(hms_).to(device) for hms_ in heatmaps]

but it may increase the training time since it brings memory copy between CPU and GPU. Could you please share your numpy and mmpycocotools version or other solutions to this problem?

Reproduction

  1. What command or script did you run?
./tools/dist_train.sh configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_4x8_cyclic_20e_nus.py 4
  1. Did you make any modifications on the code or config? Did you understand what you have modified?
    I didn't make any modifications
  2. What dataset did you use?
    nuScenes

Environment

  1. Please run python mmdet3d/utils/collect_env.py to collect necessary environment infomation and paste it here.
sys.platform: linux
Python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.7.0
OpenCV: 4.5.1
MMCV: 1.2.5
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.10.0
MMDetection3D: 0.11.0+391a56b
  1. You may add addition that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
@tianweiy
Copy link
Contributor

I think for some reason the LiDARInstance3DBoxes is converted to cuda in the current version? It works well a few months ago

@XuyangBai
Copy link
Author

Thanks @tianweiy for your help, I install np1.19.4 and the problem was solved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants