Crash when trying to compile ggml-cuda.cu from llama.cpp #83777

Open
sin-ack opened this issue Mar 4, 2024 · 5 comments

sin-ack commented Mar 4, 2024

Backtrace:

Call parameter type does not match function signature!
  %StackGuardSlot = alloca ptr, align 8, addrspace(5)
 ptr  call void @llvm.stackprotector(ptr %8, ptr addrspace(5) %StackGuardSlot)
in function _ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: llc ggml-cuda.cu.ll
1.	Running pass 'CallGraph Pass Manager' on module 'ggml-cuda.cu.ll'.
2.	Running pass 'Module Verifier' on function '@_ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii'
 #0 0x00007f92af85a06e llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xc5a06e)
 #1 0x00007f92af857a2b llvm::sys::RunSignalHandlers() (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xc57a2b)
 #2 0x00007f92af857ba6 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xc57ba6)
 #3 0x00007f92ae675dc0 (/lib64/libc.so.6+0x3ddc0)
 #4 0x00007f92ae6c5d9c __pthread_kill_implementation /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/nptl/pthread_kill.c:44:76
 #5 0x00007f92ae675d12 gsignal /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f92ae65e4ed abort /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/stdlib/abort.c:81:7
 #7 0x00007f92af40ffb7 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0x80ffb7)
 #8 0x00007f92af7921ca (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xb921ca)
 #9 0x00007f92afa55763 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xe55763)
#10 0x00007f92af9bc3bc llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xdbc3bc)
#11 0x00007f92b0f79ca9 (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0x2379ca9)
#12 0x00007f92af9bcd51 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/lib/llvm/17/bin/../lib64/libLLVM-17.so+0xdbcd51)
#13 0x0000560f4c6f11e8 (/usr/lib/llvm/17/bin/llc+0x1b1e8)
#14 0x0000560f4c6e6114 main (/usr/lib/llvm/17/bin/llc+0x10114)
#15 0x00007f92ae65feea __libc_start_call_main /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#16 0x00007f92ae65ffa5 call_init /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/csu/../csu/libc-start.c:128:20
#17 0x00007f92ae65ffa5 __libc_start_main /var/tmp/portage/sys-libs/glibc-2.38-r10/work/glibc-2.38/csu/../csu/libc-start.c:347:5
#18 0x0000560f4c6e64e1 _start (/usr/lib/llvm/17/bin/llc+0x104e1)
[1]    15924 IOT instruction  llc ggml-cuda.cu.ll

LLVM IR file: ggml-cuda.cu.ll.gz

The IR was generated using Clang 17.0.6 and hipBLAS 5.7.1, from ggml-cuda.cu in ggerganov/llama.cpp@67be2ce.

Command used to generate the IR:

/usr/lib/llvm/17/bin/clang++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/labs/llama.cpp/. -isystem /usr/include/rocblas --rocm-device-lib-path=/usr/lib/amdgcn/bitcode/ -O3 -DNDEBUG -std=gnu++11 -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -march=native -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip -MD -MT CMakeFiles/ggml.dir/ggml-cuda.cu.o -o ggml-cuda.cu.ll -S /labs/llama.cpp/ggml-cuda.cu -emit-llvm

clang --version
clang version 17.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
Author

sin-ack commented Mar 4, 2024

Reduced to:

target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
target triple = "amdgcn-amd-amdhsa"

; Function Attrs: sspstrong
define amdgpu_kernel void @_ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii() #0 {
  %1 = alloca [4 x [2 x float]], i32 0, align 16, addrspace(5)
  call void @llvm.memset.p5.i64(ptr addrspace(5) %1, i8 0, i64 0, i1 false)
  ret void
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) #1

attributes #0 = { sspstrong }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
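
For anyone reproducing this, one quick way to check whether a stack-protector attribute reached the device IR is to search an amdgcn module for `ssp`, `sspstrong`, or `sspreq` in its attribute groups. The following is only a sketch: the helper name `has_device_ssp` is made up for illustration, and it assumes the IR is a plain-text `.ll` file holding a single module.

```shell
#!/bin/sh
# has_device_ssp FILE (hypothetical helper, not part of LLVM):
# exits 0 if FILE targets amdgcn and any attribute group carries a
# stack-protector attribute (ssp, sspstrong, or sspreq); exits 1 otherwise.
has_device_ssp() {
    # Only consider modules whose target triple is an AMD GPU.
    grep -q '^target triple = "amdgcn' "$1" || return 1
    # Attribute groups look like: attributes #0 = { sspstrong }
    grep -Eq '^attributes #[0-9]+ = \{.* ssp(strong|req)?( |$)' "$1"
}
```

Calling `has_device_ssp` on a file containing the reduced IR above should succeed, because of the `attributes #0 = { sspstrong }` line.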

Member

Artem-B commented Mar 4, 2024

It appears that -fstack-protector somehow got enabled on the GPU side. I'm not sure whether AMDGPU supports it, but I would expect that to be a problem for NVPTX.

Disabling stack protector on the GPU side should avoid the problem.
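
As a rough sketch of that workaround, the compile command can be given an explicit `-fno-stack-protector`. The helper below only prints a command rather than running it. The name `hip_ir_cmd` is hypothetical, and only a handful of flags from the original invocation are shown, so the remaining `-D`, `-I`, and warning options would have to be carried over by hand. Note that passed this way the flag is not device-specific, so it would presumably turn the stack protector off for the host code as well.

```shell
#!/bin/sh
# hip_ir_cmd SRC OUT (hypothetical helper): print, but do not run, a
# HIP-to-LLVM-IR compile command with the stack protector disabled.
# CXX defaults to clang++; the flag list is abbreviated on purpose.
hip_ir_cmd() {
    printf '%s\n' "${CXX:-clang++} -x hip -O3 -fno-stack-protector -S -emit-llvm $1 -o $2"
}
```

For example, `hip_ir_cmd ggml-cuda.cu ggml-cuda.cu.ll` prints the command line to inspect before running it manually.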

Contributor

AngryLoki commented Mar 22, 2024

/usr/lib/llvm/17/bin/clang++ on Gentoo enables -fstack-protector-strong for all targets in /etc/clang/x86_64-pc-linux-gnu-clang.cfg -> gentoo-common.cfg -> gentoo-hardened.cfg.

This was previously discussed in #62066 and fixed by #70799 in the 18.1.0 release.

Additionally, on the Gentoo side, multiple patches were added to hipcc and rocm-runtime so that -fno-stack-protector is added when the user compiles code with the hipcc wrapper, or from the ROCm runtime, while using Clang 17 (sorry, can't do better than that; Gentoo does not backport patches for LLVM). Just use hipcc; it will add multiple flags, as described in https://wiki.gentoo.org/wiki/HIP#hipcc_.28Clang_wrapper.29

Regarding Clang 18 support in HIP: today I ran a few experiments, and with a few patches it worked, but I ran into the huge memory consumption reported in #86332, which looks like a blocker... So Gentoo will probably stay on LLVM 17 for hipcc for the near future.

AngryLoki added a commit to AngryLoki/gentoo that referenced this issue Mar 26, 2024
…ags to fix GPU compilation

Add -Xarch_host to CPU-specific flags, so that it does not affect heterogeneous code (e.g. HIP).

For stack-protector flags: fixes compiler crashes like llvm/llvm-project#83777
Clang 18.1.0 does not try to apply these flags to GPU code, but current ROCm libraries use Clang 17, so add "-Xarch_host" there too.
This will allow dropping the "-fno-stack-protector" patches from rocm-comgr, hip and hipcc eventually.

For -fcf-protection: fixes error: option 'cf-protection=return' cannot be specified on this target.

For -fPIE: do not touch, as at least since Clang 15 it only affects host relocation model.
See also: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.7/clang/test/Driver/hip-fpie-option.hip

Related upstream bug: llvm/llvm-project#86450
Closes: https://bugs.gentoo.org/927752
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Mar 26, 2024
Add -Xarch_host to CPU-specific flags, so that it does not affect
heterogeneous code (e.g. HIP).

For stack-protector flags: fixes compiler crashes like
llvm/llvm-project#83777.  Clang 18.1.0 does
not try to apply these flags to GPU code, but current ROCm libraries use
Clang 17, so add "-Xarch_host" there too.  This will allow dropping the
"-fno-stack-protector" patches from rocm-comgr, hip and hipcc
eventually.

For -fcf-protection: fixes error: option 'cf-protection=return' cannot
be specified on this target.

For -fPIE: do not touch, as at least since Clang 15 it only affects host
relocation model.  See also:
https://github.com/llvm/llvm-project/blob/llvmorg-15.0.7/clang/test/Driver/hip-fpie-option.hip

Bug: llvm/llvm-project#86450
Closes: https://bugs.gentoo.org/927752
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Closes: #35926
Signed-off-by: Michał Górny <mgorny@gentoo.org>
@JonChesterfield
Collaborator

I don't believe amdgpu has stack-protector either. I would guess the desired behaviour of -x cuda -fstack-protector would be to enable the stack protector on the x64 code and do nothing on the gpu code, at least until such time as that's implemented on the gpu. Maybe emit a warning in the meantime.

Do we have a general-purpose way of passing one argument to the host clang invocation and a different argument to the device invocation? OpenMP has (or had) some means of doing that, which worked in some cases.

Member

Artem-B commented Apr 19, 2024

We do not have a consistent way to handle arguments that don't have the same level of support on the host and on the GPU. So far, in the most commonly encountered cases (e.g. sanitizers), we've been filtering out such arguments on a case-by-case basis, and that's not ideal.

We do have -Xarch_host and -Xarch_device which may be used to override top-level flags, but it does not always work if top-level flags get converted into a set of different cc1 arguments.
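
To illustrate the `-Xarch_host` approach, here is a small sketch that prefixes each hardening flag with `-Xarch_host` so that it is meant to apply to the host compilation only. The helper name `host_only` is invented for this example, and, per the caveat above, this may not behave as intended for flags that expand into several cc1 arguments.

```shell
#!/bin/sh
# host_only FLAG... (hypothetical helper): print the given flags, each
# preceded by -Xarch_host, on a single line separated by spaces.
host_only() {
    out=
    for flag in "$@"; do
        # Append "-Xarch_host FLAG", adding a separating space after the first.
        out="${out:+$out }-Xarch_host $flag"
    done
    printf '%s\n' "$out"
}
```

The output could then be spliced into a compile command, for example `clang++ -x hip $(host_only -fstack-protector-strong) ...`, with the rest of the command line left as in the original invocation.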
