
Eval bug: HIP gfx908 (MI100) cuBLAS error when the prompt is too long #15845

@narikm

Description

An error occurs when the prompt is longer than a few tokens. Launch arguments:

#!/bin/bash
export LD_LIBRARY_PATH=/home/tug/Desktop/bin/llama.cpp/build:$LD_LIBRARY_PATH
export HIP_VISIBLE_DEVICES=0

cd "/home/tug/Desktop/bin/llama.cpp/build"

numactl -N 0 -m 0 \
  ./llama-server \
  --n-gpu-layers 99 \
  --threads 40 \
  --threads-batch 40 \
  --ctx-size 35000 \
  --batch-size 2048 \
  -ub 510 \
  --override-tensor exps=CPU \
  --host 0.0.0.0 \
  --port 8080 \
  -fa on \
  --jinja \
  --model "/media/tug/AI NVMe/MODELS/DeepSeek-V3.1-Q4_0/DeepSeek-V3.1-Q4_0-00001-of-00008.gguf"

read -p "Press ENTER to close..."

Name and Version

llama-server b6399
version: 0 (unknown)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

HIP

Hardware

MI100 (gfx908), 2x Xeon 2300

Models

DeepSeek-V3.1-Q4_0

Problem description & steps to reproduce

Works with a small user prompt but crashes when the prompt is longer.
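
For reproduction, any request whose prompt is more than a handful of tokens appears to be enough to reach the crashing batched-GEMM path. The snippet below is only a sketch: it assumes the server's OpenAI-compatible /v1/chat/completions endpoint on the host/port from the launch script above, and the padded prompt text is a placeholder rather than the prompt from the original run.

#!/bin/bash
# Hypothetical reproduction request (not from the original report): pad the user
# message to a few hundred tokens so prompt processing goes through the batched
# cuBLAS/hipBLAS path that aborts on gfx908.
LONG_TEXT=$(printf 'Please summarize this sentence one more time. %.0s' {1..200})

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
        \"messages\": [{\"role\": \"user\", \"content\": \"${LONG_TEXT}\"}],
        \"max_tokens\": 32
      }"

With a one-line prompt the same request completes normally, which matches the behaviour described above.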

First Bad Commit

No response

Relevant log output

slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 35008, n_keep = 0, n_prompt_tokens = 417
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 417, n_tokens = 417, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 417, n_tokens = 417
/shared/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:87: ROCm error
ROCm error: CUBLAS_STATUS_NOT_SUPPORTED
current device: 0, in function ggml_cuda_op_mul_mat_cublas at /shared/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1302
hipblasGemmEx(ctx.cublas_handle(id), HIPBLAS_OP_T, HIPBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha, src0_ptr, HIPBLAS_R_16F, ne00, src1_ptr, HIPBLAS_R_16F, ne10, &beta, dst_dd_i, HIPBLAS_R_32F, ldc, HIPBLAS_R_32F, HIPBLAS_GEMM_DEFAULT)
[New LWP 947059]
[New LWP 947058]
[... remaining "New LWP" lines for the other worker threads omitted ...]
[New LWP 945406]

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.ubuntu.com
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/liblber.so.2
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlidec.so.1
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlicommon.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000079a095f107e3 in __GI___wait4 (pid=950067, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x000079a095f107e3 in __GI___wait4 (pid=950067, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000079a0965715f3 in ggml_print_backtrace () from libggml-base.so
#2 0x000079a09657179b in ggml_abort () from libggml-base.so
#3 0x000079a09153ad62 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) () from libggml-hip.so
#4 0x000079a091549b95 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t*) () from libggml-hip.so
#5 0x000079a091547fea in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t*), void (*)(float const*, int const*, void*, ggml_type, long, long, long, long, long, long, long, long, ihipStream_t*)) () from libggml-hip.so
#6 0x000079a091542d86 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from libggml-hip.so
#7 0x000079a091540bad in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from libggml-hip.so
#8 0x000079a09658be07 in ggml_backend_sched_graph_compute_async () from libggml-base.so
#9 0x000079a09669e591 in llama_context::graph_compute(ggml_cgraph*, bool) () from libllama.so
#10 0x000079a09669f994 in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from libllama.so
#11 0x000079a0966a5c6d in llama_context::decode(llama_batch const&) () from libllama.so
#12 0x000079a0966a6baf in llama_decode () from libllama.so
#13 0x000059852106b2a2 in server_context::update_slots() ()
#14 0x00005985210317ac in server_queue::start_loop() ()
#15 0x0000598520ff545b in main ()
[Inferior 1 (process 945365) detached]
/home/tug/Desktop/R1V3HIP.sh: line 20: 945365 Aborted (core dumped) numactl -N 0 -m 0 ./llama-server --n-gpu-layers 99 --threads 40 --threads-batch 40 --ctx-size 35000 --batch-size 2048 -ub 510 --override-tensor exps=CPU --host 0.0.0.0 --port 8080 -fa off --jinja --model "/media/tug/AI NVMe/MODELS/DeepSeek-V3.1-Q4_0/DeepSeek-V3.1-Q4_0-00001-of-00008.gguf"

Compiled with:
cmake -S . -B build \
  -DGGML_HIP=ON \
  -DGPU_TARGETS="gfx908" \
  -DCMAKE_BUILD_TYPE=Release \
  -Dhipblas_DIR=$HIPBLAS_DIR \
  -DLLAMA_CURL=OFF

cmake --build build --config Release -- -j16
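
If more detail is needed to pin down which GEMM configuration rocBLAS rejects with CUBLAS_STATUS_NOT_SUPPORTED, the failing run can be repeated with rocBLAS and HIP logging turned on. This is only a diagnostic sketch using the standard rocBLAS/HIP environment variables (ROCBLAS_LAYER, ROCBLAS_LOG_TRACE_PATH, AMD_LOG_LEVEL); it is not taken from the original report, and the abbreviated server flags below stand in for the full launch command above.

#!/bin/bash
# Hedged diagnostic sketch: enable rocBLAS trace logging (ROCBLAS_LAYER is a
# bitmask; 1 = trace) and verbose HIP runtime logging before launching the
# server, so the GEMM call that fails is recorded with its sizes and types.
export ROCBLAS_LAYER=1
export ROCBLAS_LOG_TRACE_PATH=/tmp/rocblas_trace.log
export AMD_LOG_LEVEL=3   # verbose HIP runtime logging

cd /home/tug/Desktop/bin/llama.cpp/build
./llama-server --n-gpu-layers 99 --ctx-size 35000 -fa on --jinja \
  --model "/media/tug/AI NVMe/MODELS/DeepSeek-V3.1-Q4_0/DeepSeek-V3.1-Q4_0-00001-of-00008.gguf"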

Metadata

Labels

AMD GPU (Issues specific to AMD GPUs), bug (Something isn't working)
