Crash when trying to compile ggml-cuda.cu from llama.cpp #83777
Comments
Reduced to:
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
target triple = "amdgcn-amd-amdhsa"
; Function Attrs: sspstrong
define amdgpu_kernel void @_ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii() #0 {
%1 = alloca [4 x [2 x float]], i32 0, align 16, addrspace(5)
call void @llvm.memset.p5.i64(ptr addrspace(5) %1, i8 0, i64 0, i1 false)
ret void
}
; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) #1
attributes #0 = { sspstrong }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
It appears that disabling the stack protector on the GPU side should avoid the problem.
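To check whether an IR dump has the same shape as the reduced case, a minimal sketch like the following can flag stack-protector attributes in a module that targets amdgcn. The function name `find_ssp_attribute_groups` and the regex-based parsing are assumptions made for illustration, not part of any LLVM tooling; it only inspects the `target triple` line and `attributes #N = { ... }` lines.

```python
import re

# Stack-protector function attributes defined by LLVM IR.
SSP_ATTRS = {"ssp", "sspstrong", "sspreq"}

def find_ssp_attribute_groups(ir_text):
    """Return the IDs of attribute groups that enable a stack protector,
    but only when the module targets amdgcn; otherwise return []."""
    triple = re.search(r'^target triple = "([^"]+)"', ir_text, re.MULTILINE)
    if not triple or not triple.group(1).startswith("amdgcn"):
        return []
    hits = []
    for m in re.finditer(r'^attributes #(\d+) = \{([^}]*)\}', ir_text, re.MULTILINE):
        # Attribute groups are whitespace-separated lists of attributes.
        if SSP_ATTRS & set(m.group(2).split()):
            hits.append(int(m.group(1)))
    return hits

# Trimmed copy of the reduced test case from this issue.
REDUCED = '''target triple = "amdgcn-amd-amdhsa"
attributes #0 = { sspstrong }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
'''

print(find_ssp_attribute_groups(REDUCED))
```

A non-empty result on device IR suggests the stack-protector flag reached the GPU compilation, which is the situation the reduced case reproduces.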
/usr/lib/llvm/17/bin/clang++ on Gentoo enables
This was previously discussed in #62066 and fixed in #70799 in the 18.1.0 release. Additionally, on the Gentoo side, multiple patches were added to hipcc and rocm-runtime to add
Regarding Clang 18 support in HIP: today I ran a few experiments, and with a few patches it worked, but I ran into huge memory consumption (#86332), which looks like a blocker. So Gentoo will probably stay on LLVM 17 for hipcc for the near future.
…ags to fix GPU compilation
Add -Xarch_host to CPU-specific flags, so that they do not affect heterogeneous code (e.g. HIP).
For stack-protector flags: fixes compiler crashes like llvm/llvm-project#83777. Clang 18.1.0 does not try to apply these flags to GPU code, but current ROCm libraries use Clang 17, so add "-Xarch_host" there too. This will eventually allow dropping the "-fno-stack-protector" patches from rocm-comgr, hip and hipcc.
For -fcf-protection: fixes "error: option 'cf-protection=return' cannot be specified on this target".
For -fPIE: do not touch, as at least since Clang 15 it only affects the host relocation model. See also: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.7/clang/test/Driver/hip-fpie-option.hip
Related upstream bug: llvm/llvm-project#86450
Closes: https://bugs.gentoo.org/927752
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Add -Xarch_host to CPU-specific flags, so that they do not affect heterogeneous code (e.g. HIP).
For stack-protector flags: fixes compiler crashes like llvm/llvm-project#83777. Clang 18.1.0 does not try to apply these flags to GPU code, but current ROCm libraries use Clang 17, so add "-Xarch_host" there too. This will eventually allow dropping the "-fno-stack-protector" patches from rocm-comgr, hip and hipcc.
For -fcf-protection: fixes "error: option 'cf-protection=return' cannot be specified on this target".
For -fPIE: do not touch, as at least since Clang 15 it only affects the host relocation model. See also: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.7/clang/test/Driver/hip-fpie-option.hip
Bug: llvm/llvm-project#86450
Closes: https://bugs.gentoo.org/927752
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Closes: #35926
Signed-off-by: Michał Górny <mgorny@gentoo.org>
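The idea in the commit message, prefixing host-only flags with `-Xarch_host` so they never reach the device compilation, can be sketched as a small flag-rewriting helper. The helper name `wrap_host_only` and the `HOST_ONLY_PREFIXES` tuple are hypothetical, and the choice of prefixes is an assumption drawn only from the flags named in the message; this is an illustration, not the actual Gentoo change.

```python
# Flags that only make sense for host (CPU) code. This list is an assumption
# based on the flags named in the commit message; -fPIE is left untouched.
HOST_ONLY_PREFIXES = ("-fstack-protector", "-fcf-protection")

def wrap_host_only(flags):
    """Prefix each host-only flag with -Xarch_host so that Clang applies it
    to the host compilation only, not to the HIP device compilation."""
    out = []
    for flag in flags:
        # Do not wrap a flag that is already preceded by -Xarch_host.
        already_wrapped = bool(out) and out[-1] == "-Xarch_host"
        if flag.startswith(HOST_ONLY_PREFIXES) and not already_wrapped:
            out.append("-Xarch_host")
        out.append(flag)
    return out

flags = ["-O3", "-fstack-protector-strong", "-fcf-protection", "-fPIE"]
print(" ".join(wrap_host_only(flags)))
```

Because `-Xarch_host` takes exactly one argument, each host-only flag needs its own prefix, which is why the helper emits the pair for every match rather than once per command line.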
I don't believe amdgpu has a stack protector either. I would guess the desired behaviour of -x cuda -fstack-protector would be to enable the stack protector on the x64 code and do nothing on the GPU code, at least until it is implemented on the GPU. Maybe emit a warning in the meantime. Do we have a general-purpose way of passing one argument to the host clang invocation and a different argument to the device invocation? OpenMP has (or had) some means of doing that, which worked in some cases.
We do not have a consistent way to handle arguments that don't have the same level of support on the host and on the GPU. So far, in the most commonly encountered cases (e.g. sanitizers), we've been filtering out such arguments on a case-by-case basis, and that's not ideal. We do have
Backtrace:
LLVM IR file: ggml-cuda.cu.ll.gz
The IR was generated using Clang 17.0.6 and hipBLAS 5.7.1, from ggml-cuda.cu in ggerganov/llama.cpp@67be2ce
Command used to generate the IR
/usr/lib/llvm/17/bin/clang++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/labs/llama.cpp/. -isystem /usr/include/rocblas --rocm-device-lib-path=/usr/lib/amdgcn/bitcode/ -O3 -DNDEBUG -std=gnu++11 -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -march=native -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip -MD -MT CMakeFiles/ggml.dir/ggml-cuda.cu.o -o ggml-cuda.cu.ll -S /labs/llama.cpp/ggml-cuda.cu -emit-llvm
clang --version