[Bug] Failed to build mmcv in rocm/pytorch docker image: call to '__shfl_down' is ambiguous #2919

Closed
2 tasks done
choyuansu opened this issue Sep 1, 2023 · 1 comment · Fixed by #2843

choyuansu commented Sep 1, 2023

Prerequisite

Environment

OrderedDict([('sys.platform', 'linux'), ('Python', '3.8.16 (default, Jun 12 2023, 18:09:05) [GCC 11.2.0]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0', 'AMD Radeon RX 6600'), ('CUDA_HOME', '/opt/rocm'), ('NVCC', 'HIP version: 5.6.31061-8c743ae5d\nAMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)\nTarget: x86_64-unknown-linux-gnu\nThread model: posix\nInstalledDir: /opt/rocm/llvm/bin'), ('GCC', 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0'), ('PyTorch', '2.0.0a0+git70f6d0c'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.4\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - HIP Runtime 5.6.31061\n - MIOpen 2.20.0\n - Magma 2.6.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno 
-fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, \n'), ('TorchVision', '0.15.0a0+c206a47'), ('OpenCV', '4.8.0'), ('MMEngine', '0.8.4'), ('MMCV', '2.0.1'), ('MMCV Compiler', 'n/a'), ('MMCV CUDA Compiler', 'n/a')])

Reproduces the problem - code sample

version: '3'
services:
  main:
    image: rocm/pytorch:latest
    command:
      - bash
      - -c
      - |
        pip install -U openmim
        mim install mmengine
        python -c 'from mmengine.utils.dl_utils import collect_env;print(collect_env())'

        git clone --single-branch --branch=v2.0.1 --depth=1 https://github.com/open-mmlab/mmcv.git
        cd mmcv
        pip install -r requirements/optional.txt
        MMCV_WITH_OPS=1 ROCM_HOME=/opt/rocm-5.6.0 python setup.py install
        python -c "from mmcv.utils import collect_env; print(collect_env())"
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    ipc: host
    shm_size: 8G

Reproduces the problem - command or script

docker compose up

Reproduces the problem - error message

Part of the log:

In file included from /var/lib/jenkins/mmcv/mmcv/ops/csrc/pytorch/hip/carafe_hip.hip:4:
/var/lib/jenkins/mmcv/mmcv/ops/csrc/common/cuda/../hip/carafe_hip_kernel.cuh:61:21: error: call to '__shfl_down' is ambiguous
    __PHALF(val) += __shfl_down(val, offset);
                    ^~~~~~~~~~~
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:315:7: note: candidate function
float __shfl_down(float var, unsigned int lane_delta, int width = warpSize) {
      ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:322:8: note: candidate function
double __shfl_down(double var, unsigned int lane_delta, int width = warpSize) {
       ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:300:5: note: candidate function
int __shfl_down(int var, unsigned int lane_delta, int width = warpSize) {
    ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:308:14: note: candidate function
unsigned int __shfl_down(unsigned int var, unsigned int lane_delta, int width = warpSize) {
             ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:336:6: note: candidate function
long __shfl_down(long var, unsigned int lane_delta, int width = warpSize)
     ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:356:15: note: candidate function
unsigned long __shfl_down(unsigned long var, unsigned int lane_delta, int width = warpSize)
              ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:376:11: note: candidate function
long long __shfl_down(long long var, unsigned int lane_delta, int width = warpSize)
          ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h:389:20: note: candidate function
unsigned long long __shfl_down(unsigned long long var, unsigned int lane_delta, int width = warpSize)
                   ^
/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_fp16.h:1759:17: note: candidate function
         __half __shfl_down(__half var, unsigned int lane_delta, int width = warpSize) {
                ^

Entire log: mmcv-log.tar.gz

Additional information

  1. What's your expected result? The build succeeds.
  2. What dataset did you use? N/A
  3. What do you think might be the reason? No idea.

choyuansu changed the title from "[Bug] Failed to build mmcv in rocm/pytorch docker image" to "[Bug] Failed to build mmcv in rocm/pytorch docker image: call to '__shfl_down' is ambiguous" on Sep 1, 2023

choyuansu (Author) commented

This only happens with the rocm/pytorch:rocm5.6_ubuntu20.04_py3.8_pytorch_2.0.1 image and not the rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_1.13.1 image.

zhouzaida added the ROCm label on Sep 3, 2023
zhouzaida linked a pull request on Sep 3, 2023 that will close this issue