
master branch no longer works on AMD integrated #101

Open
unoexperto opened this issue Feb 1, 2025 · 1 comment

@unoexperto

Hi folks,

Unfortunately I made a mistake and pulled changes from master, and koboldcpp-rocm no longer works for me :( In the past I spent a significant amount of time getting it to work on my integrated AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics (gfx1103).

I've tried to do a clean build like this:

make LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1100 -j16

and I launch the app like this (with parameters that used to work before):

LD_PRELOAD=/home/xxx/work/sideprojects/force-host-alloction-APU/libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=11.0.0 AMD_SERIALIZE_KERNEL=3 python koboldcpp.py --threads 6 --blasthreads 6 --usecublas mmq lowvram --gpulayers 32 --blasbatchsize 256 --contextsize 8192 --model /home/xxx/jdata/models/dolphin-2.9-llama3-8b-q8_0.gguf
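For reference, the env-var part of that command line can be captured in a small wrapper. This is only a sketch: the `hsa_override_for` helper and the `launch.sh` name are hypothetical, and the gfx-to-override mapping reflects the usual convention that ROCm ships kernels for gfx1100 (RDNA3) and gfx1030 (RDNA2) and that nearby targets such as gfx1103 borrow them via `HSA_OVERRIDE_GFX_VERSION`:

```shell
#!/bin/sh
# Hypothetical wrapper (launch.sh): map a gfx target to the
# HSA_OVERRIDE_GFX_VERSION value that ROCm actually ships kernels for.
hsa_override_for() {
  case "$1" in
    gfx110[0-3]) echo "11.0.0" ;;  # RDNA3 GPUs/iGPUs reuse gfx1100 kernels
    gfx103[0-6]) echo "10.3.0" ;;  # RDNA2 reuses gfx1030 kernels
    *)           echo "" ;;        # unknown target: leave the override unset
  esac
}

GFX="${1:-gfx1103}"                # pass your target; defaults to the 780M
OVERRIDE="$(hsa_override_for "$GFX")"
[ -n "$OVERRIDE" ] && export HSA_OVERRIDE_GFX_VERSION="$OVERRIDE"
echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-<unset>}"
# ...then exec the same LD_PRELOAD/python koboldcpp.py command line as above.
```

The `LD_PRELOAD` of libforcegttalloc.so and the `AMD_SERIALIZE_KERNEL=3` setting from the original command would still be needed on the APU; they are orthogonal to the override above.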

It fails at runtime with the following error:

ggml_cuda_compute_forward: RMS_NORM failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at ggml/src/ggml-cuda/ggml-cuda.cu:2207
  err
ggml/src/ggml-cuda/ggml-cuda.cu:73: ROCm error

Could you please advise what I'm missing?

@arturbac

arturbac commented Feb 3, 2025

It no longer works on my gfx1100 (RX 7900 XTX) either:

rocBLAS error: Tensile solution found, but exception thrown for { a_type: "f16_r", b_type: "f16_r", c_type: "f16_r", d_type: "f16_r", compute_type: "f16_r", transA: 'T', transB: 'N', M: 128, N: 4, K: 32, alpha: 1, row_stride_a: 1, col_stride_a: 4224, row_stride_b: 1, col_stride_b: 32, row_stride_c: 1, col_stride_c: 128, row_stride_d: 1, col_stride_d: 128, beta: 0, batch_count: 40, strided_batch: false, stride_a: 540672, stride_b: 128, stride_c: 512, stride_d: 512, atomics_mode: atomics_allowed }
Alpha value -1912 doesn't match that set in problem: 1
This message will be only be displayed once, unless the ROCBLAS_VERBOSE_TENSILE_ERROR environment variable is set.
ROCm error: CUBLAS_STATUS_INTERNAL_ERROR
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at ggml/src/ggml-cuda/ggml-cuda.cu:1719
  hipblasGemmBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), HIPBLAS_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), HIPBLAS_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, HIPBLAS_GEMM_DEFAULT)
ggml/src/ggml-cuda/ggml-cuda.cu:73: ROCm error
ptrace: Operation not permitted.
No stack.
The program is not being run.
fish: Job 1, 'python koboldcpp.py --threads 6…' terminated by signal SIGABRT (Abort)
