
Build fails against rocm 6.2/pytorch 2.5 #1162

Open
IMbackK opened this issue Nov 25, 2024 · 3 comments · May be fixed by #1164

Comments


IMbackK commented Nov 25, 2024

🐛 Bug

Build fails at every object with:

clang++: error: unknown argument: '--use_fast_math'
clang++: error: unknown argument: '--extended-lambda'
clang++: error: unknown argument: '--generate-line-info'
clang++: error: unknown argument '--threads'; did you mean '-mthreads'?
clang++: error: unknown argument: '--ptxas-options=-v'
clang++: error: unknown argument: '--ptxas-options=-O2'
clang++: error: unknown argument: '--ptxas-options=-allow-expensive-optimizations=true'

Indeed, /opt/rocm/lib/llvm/bin/clang++ (and clang in general) does not support these options. Looking at setup.py, it is unclear to me how this could ever have compiled against LLVM.

To Reproduce

PYTORCH_ROCM_ARCH=gfx908 python setup.py bdist_wheel

or install

Environment

Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41134-0

OS: Arch Linux (x86_64)
GCC version: (GCC) 14.2.1 20240910
Clang version: 18.1.8
CMake version: version 3.31.0
Libc version: glibc-2.40

Python version: 3.12.7 (main, Oct 1 2024, 11:15:50) [GCC 14.2.1 20240910] (64-bit runtime)
Python platform: Linux-6.10.6-arch1-1-x86_64-with-glibc2.40
Is CUDA available: True
CUDA runtime version: 12.6.77
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Radeon RX 6800 XT (gfx1030)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.2.41134
MIOpen runtime version: 3.2.0
Is XNNPACK available: True

lw (Contributor) commented Nov 26, 2024

Sorry, we're unable to provide support for ROCm as we don't use such devices ourselves. Perhaps someone from AMD will be able to weigh in.

My only comment is that the options that are "unsupported" are those that, with an NVIDIA setup, one would pass to nvcc. I don't know what the equivalent compiler is in an AMD setup. If you can find out, could you check if it's installed, and why it's not being called?

IMbackK (Author) commented Nov 26, 2024

So I have figured this out. The problem is that in my PyTorch install:
torch.cuda.is_available() is True,
torch.utils.cpp_extension.ROCM_HOME is '/opt/rocm', and
torch.utils.cpp_extension.CUDA_HOME is '/opt/cuda'.

Thus we go down the CUDA path here:

(torch.cuda.is_available() and ((CUDA_HOME is not None)))

According to the PyTorch documentation, the correct way to determine whether PyTorch was compiled against CUDA or ROCm is to check torch.version.hip / torch.version.cuda for None, and indeed this yields the correct values here.
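To make the distinction concrete, here is a minimal sketch of backend detection following that recommendation. The decision is factored into a plain function taking the two `torch.version` fields, so the logic itself is testable without a GPU install; the function name `detect_backend` is my own, not part of any API.

```python
def detect_backend(hip_version, cuda_version):
    """Classify a PyTorch build from torch.version.hip / torch.version.cuda.

    A ROCm build sets torch.version.hip to a version string and leaves
    torch.version.cuda as None (and vice versa for a CUDA build); a
    CPU-only build leaves both as None. Crucially, this does NOT consult
    torch.utils.cpp_extension.CUDA_HOME, which (per the discussion above)
    only reflects whether CUDA happened to be installed at build time.
    """
    if hip_version is not None:
        return "rocm"
    if cuda_version is not None:
        return "cuda"
    return "cpu"

# Usage against a real install (not executed here):
#   import torch
#   backend = detect_backend(torch.version.hip, torch.version.cuda)
```

With the environment reported in this issue (`torch.version.hip == '6.2.41134'`), this would return `"rocm"` even though `CUDA_HOME` points at `/opt/cuda`.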

IMbackK (Author) commented Nov 26, 2024

It seems torch.utils.cpp_extension.CUDA_HOME is set whenever torch is compiled while CUDA is installed, and carries no information about which backend PyTorch was compiled against.
