Crash when trying to compile `ggml-cuda.cu` from llama.cpp #83777

sin-ack · 2024-03-04T07:39:50Z

Backtrace:

Call parameter type does not match function signature!
  %StackGuardSlot = alloca ptr, align 8, addrspace(5)
 ptr  call void @llvm.stackprotector(ptr %8, ptr addrspace(5) %StackGuardSlot)
in function _ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: llc ggml-cuda.cu.ll
1.	Running pass 'CallGraph Pass Manager' on module 'ggml-cuda.cu.ll'.
2.	Running pass 'Module Verifier' on function '@_ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii'
 #0 0x00007f92af85a06e llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xc5a06e)
 #1 0x00007f92af857a2b llvm::sys::RunSignalHandlers() (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xc57a2b)
 #2 0x00007f92af857ba6 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xc57ba6)
 #3 0x00007f92ae675dc0 (/lib64/libc.so.6+0x3ddc0)
 #4 0x00007f92ae6c5d9c __pthread_kill_implementation /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/nptl/pthread_kill.c:44:76
 #5 0x00007f92ae675d12 gsignal /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f92ae65e4ed abort /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/stdlib/abort.c:81:7
 #7 0x00007f92af40ffb7 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0x80ffb7)
 #8 0x00007f92af7921ca (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xb921ca)
 #9 0x00007f92afa55763 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xe55763)
#10 0x00007f92af9bc3bc llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xdbc3bc)
#11 0x00007f92b0f79ca9 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0x2379ca9)
#12 0x00007f92af9bcd51 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xdbcd51)
#13 0x0000560f4c6f11e8 (/usr/lib/llvm/17/bin/llc+0x1b1e8)
#14 0x0000560f4c6e6114 main (/usr/lib/llvm/17/bin/llc+0x10114)
#15 0x00007f92ae65feea __libc_start_call_main /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#16 0x00007f92ae65ffa5 call_init /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/csu/../csu/libc-start.c:128:20
#17 0x00007f92ae65ffa5 __libc_start_main /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/csu/../csu/libc-start.c:347:5
#18 0x0000560f4c6e64e1 _start (/usr/lib/llvm/17/bin/llc+0x104e1)
[1]    15924 IOT instruction  llc ggml-cuda.cu.ll

LLVM IR file: ggml-cuda.cu.ll.gz

The IR was generated using Clang 17.0.6 and hipBLAS 5.7.1, from ggml-cuda.cu in ggerganov/llama.cpp@67be2ce

Command used to generate the IR

/usr/lib/llvm/17/bin/clang++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/labs/llama.cpp/. -isystem /usr/include/rocblas --rocm-device-lib-path=/usr/lib/amdgcn/bitcode/ -O3 -DNDEBUG -std=gnu++11 -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -march=native -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip -MD -MT CMakeFiles/ggml.dir/ggml-cuda.cu.o -o ggml-cuda.cu.ll -S /labs/llama.cpp/ggml-cuda.cu -emit-llvm

clang --version

clang version 17.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg

sin-ack · 2024-03-04T07:44:53Z

Reduced to:

target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
target triple = "amdgcn-amd-amdhsa"

; Function Attrs: sspstrong
define amdgpu_kernel void @_ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii() #0 {
  %1 = alloca [4 x [2 x float]], i32 0, align 16, addrspace(5)
  call void @llvm.memset.p5.i64(ptr addrspace(5) %1, i8 0, i64 0, i1 false)
  ret void
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) #1

attributes #0 = { sspstrong }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
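
The reduced module pairs an `amdgcn` target triple with an `sspstrong` attribute group. As a rough way to spot the same combination in other IR dumps, here is a minimal Python sketch. The function name `find_ssp_attribute_groups` and the regex-based parsing are assumptions made for illustration, not an LLVM tool, and the check only looks at textual `attributes #N = { ... }` groups in a module whose triple starts with `amdgcn` or `nvptx`.

```python
import re

# Stack-protector function attributes as spelled in textual LLVM IR.
SSP_ATTRS = {"ssp", "sspstrong", "sspreq"}
# Triple prefixes treated as GPU targets in this sketch (an assumption).
GPU_TRIPLE_PREFIXES = ("amdgcn", "nvptx")


def find_ssp_attribute_groups(ir_text):
    """Return the IDs of attribute groups carrying a stack-protector
    attribute, or an empty list if the module does not target a GPU."""
    triple = re.search(r'^target triple = "([^"]+)"', ir_text, re.MULTILINE)
    if not triple or not triple.group(1).startswith(GPU_TRIPLE_PREFIXES):
        return []
    hits = []
    # Heuristic: one attribute group per line, no '}' inside string attributes.
    for group in re.finditer(r"^attributes #(\d+) = \{([^}]*)\}", ir_text, re.MULTILINE):
        if set(group.group(2).split()) & SSP_ATTRS:
            hits.append(int(group.group(1)))
    return hits
```

Run on the reduced IR above, this reports group 0, the one attached to the `amdgpu_kernel` function.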

Artem-B · 2024-03-04T19:53:30Z

It appears that -fstack-protector somehow got enabled on the GPU side. I'm not sure whether AMDGPU supports it, but I would expect that to be a problem for NVPTX.

Disabling stack protector on the GPU side should avoid the problem.

AngryLoki · 2024-03-22T21:40:42Z

/usr/lib/llvm/17/bin/clang++ on Gentoo enables -fstack-protector-strong for all targets in /etc/clang/x86_64-pc-linux-gnu-clang.cfg -> gentoo-common.cfg -> gentoo-hardened.cfg.

This was previously discussed in #62066 and fixed in #70799 in the 18.1.0 release.

Additionally, on the Gentoo side, several patches were added to hipcc and rocm-runtime so that -fno-stack-protector is passed when code is compiled through the hipcc wrapper or from the ROCm runtime while using Clang 17 (sorry, can't do better than that; Gentoo does not backport patches for LLVM). Just use hipcc; it adds multiple flags, as described in https://wiki.gentoo.org/wiki/HIP#hipcc_.28Clang_wrapper.29

Regarding Clang 18 support in HIP: today I ran a few experiments, and with a few patches it worked, but I ran into the huge memory consumption reported in #86332, which looks like a blocker. So Gentoo will probably stay on LLVM 17 for hipcc for the near future.

Add -Xarch_host to CPU-specific flags so that they do not affect heterogeneous code (e.g. HIP).

For stack-protector flags: fixes compiler crashes like llvm/llvm-project#83777. Clang 18.1.0 does not try to apply these flags to GPU code, but current ROCm libraries use Clang 17, so add "-Xarch_host" there too. This will eventually allow dropping the "-fno-stack-protector" patches from rocm-comgr, hip and hipcc.

For -fcf-protection: fixes "error: option 'cf-protection=return' cannot be specified on this target".

For -fPIE: do not touch, as at least since Clang 15 it only affects the host relocation model. See also: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.7/clang/test/Driver/hip-fpie-option.hip

Bug: llvm/llvm-project#86450
Closes: https://bugs.gentoo.org/927752
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Closes: #35926
Signed-off-by: Michał Górny <mgorny@gentoo.org>
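
To illustrate the idea behind this commit, here is a minimal Python sketch that prefixes host-only flags with `-Xarch_host` so that Clang forwards them only to the host compilation. The helper name `guard_host_only_flags` and the prefix list are hypothetical; the actual Gentoo change edits Clang configuration files, whose exact contents are not reproduced here.

```python
# Flag prefixes the commit message names as host-only. This list is
# illustrative and is not Gentoo's actual configuration.
HOST_ONLY_PREFIXES = ("-fstack-protector", "-fcf-protection")


def guard_host_only_flags(flags):
    """Return a copy of `flags` with each host-only flag preceded by
    -Xarch_host, skipping flags that are already guarded."""
    guarded = []
    previous = None
    for flag in flags:
        if flag.startswith(HOST_ONLY_PREFIXES) and previous != "-Xarch_host":
            guarded.append("-Xarch_host")
        guarded.append(flag)
        previous = flag
    return guarded


hardened = ["-fstack-protector-strong", "-fPIE", "-fcf-protection"]
print(" ".join(guard_host_only_flags(hardened)))
```

`-fPIE` is left untouched, matching the commit's note that it only affects the host relocation model.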

JonChesterfield · 2024-04-18T20:50:51Z

I don't believe amdgpu has stack-protector either. I would guess the desired behaviour of -x cuda -fstack-protector would be to enable the stack protector on the x64 code and do nothing on the gpu code, at least until such time as that's implemented on the gpu. Maybe emit a warning in the meantime.

Do we have a general-purpose way to pass one argument to the host clang invocation and a different argument to the device invocation? OpenMP has/had some means of doing that which worked in some cases.

Artem-B · 2024-04-19T18:37:08Z

We do not have a consistent way to handle arguments that don't have the same level of support on the host and the GPU. So far, in the most commonly encountered cases (e.g. sanitizers), we've been filtering out such arguments on a case-by-case basis, and that's not ideal.

We do have -Xarch_host and -Xarch_device, which may be used to override top-level flags, but they do not always work when a top-level flag gets converted into a different set of cc1 arguments.
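
As a sketch of how the two options combine on one command line, the following Python helper composes an argument list from shared, host-only and device-only flags. The name `offload_args` and the example flag choices are assumptions for illustration; whether a given override takes effect is subject to the cc1-translation caveat above.

```python
def offload_args(common, host_only=(), device_only=()):
    """Compose one clang argument list. Each host-only flag is passed via
    -Xarch_host and each device-only flag via -Xarch_device; both options
    forward exactly one following argument."""
    args = list(common)
    for flag in host_only:
        args += ["-Xarch_host", flag]
    for flag in device_only:
        args += ["-Xarch_device", flag]
    return args


# Hypothetical split: keep the stack protector on the host, disable it on the GPU.
args = offload_args(
    common=["-x", "hip", "-O3"],
    host_only=["-fstack-protector-strong"],
    device_only=["-fno-stack-protector"],
)
print(" ".join(args))
```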

github-actions bot added the new issue label Mar 4, 2024

EugeneZelenko added cuda and removed new issue labels Mar 22, 2024

AngryLoki mentioned this issue Mar 26, 2024

sys-devel/clang-common: add -Xarch_host to CET and stack-protector flags to fix GPU compilation gentoo/gentoo#35926

Closed
