[bug] ROCm segfault when running multi-gpu inference. #3451

Closed

hxjerry opened this issue Oct 3, 2023 · 5 comments
Labels
AMD GPU Issues specific to AMD GPUs

Comments

@hxjerry

hxjerry commented Oct 3, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Expected the tensor split option to leverage multiple GPUs.

Current Behavior

Segfault after model loading when using multi-GPU. Inference works correctly on either GPU (two Vega 56s installed) when HIP_VISIBLE_DEVICES is set to force single-GPU inference.
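
For reference, a minimal sketch of the single-GPU workaround described above (the model path and prompt are illustrative, not from the original report):

$ HIP_VISIBLE_DEVICES=0 ./main -m models/llama-2-7b.Q4_0.gguf -ngl 99 -p "Hello"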

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

AMD Ryzen 7 1700X
2× Radeon RX Vega 56 (8 GB each)

  • Operating System (Ubuntu LTS):

Linux jerryxu-Inspiron-5675 6.2.0-33-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 10:33:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.13
$ make --version
GNU Make 4.3
$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Failure Information (for bugs)

See logs.

Steps to Reproduce

  1. Compile llama.cpp with ROCm support
  2. Run any model with tensor split (tried two quantizations of 7B and 13B); a sketch of the commands follows this list
  3. Get a segfault
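
A minimal sketch of these steps, assuming the ROCm Makefile build (LLAMA_HIPBLAS=1); the model path, quantization, and split values are illustrative and not part of the original report:

$ make clean && make LLAMA_HIPBLAS=1 -j
$ ./main -m models/llama-2-13b.Q4_0.gguf -ngl 99 --tensor-split 1,1 -p "Hello"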

Failure Logs

llama.cpp log:

Log start
main: build = 1310 (1c84003)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1696299120
ggml_init_cublas: found 2 ROCm devices:
  Device 0: Radeon RX Vega, compute capability 9.0
  Device 1: Radeon RX Vega, compute capability 9.0
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 256.00 MB
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size = 76.38 MB
llama_new_context_with_model: VRAM scratch buffer: 70.50 MB
llama_new_context_with_model: total VRAM used: 4801.43 MB (model: 4474.93 MB, context: 326.50 MB)
Segmentation fault (core dumped)

GDB stacktrace on segfault:

#0  0x00007ffff672582e in ?? () from /opt/rocm/lib/libamdhip64.so.5
#1  0x00007ffff672dba0 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#2  0x00007ffff672fc6d in ?? () from /opt/rocm/lib/libamdhip64.so.5
#3  0x00007ffff66f8a44 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#4  0x00007ffff65688e7 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#5  0x00007ffff65689e5 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#6  0x00007ffff6568ae0 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#7  0x00007ffff65ac7a2 in hipMemcpy2DAsync () from /opt/rocm/lib/libamdhip64.so.5
#8  0x00005555556917e6 in ggml_cuda_op_mul_mat (src0=0x7ffd240e06b0, src1=0x7f8ab9ea0860, dst=0x7f8ab9ea09b0, 
    op=0x55555569f330 <ggml_cuda_op_mul_mat_q(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t* const&)>, convert_src1_to_q8_1=true)
    at ggml-cuda.cu:6706
#9  0x000055555568cc45 in ggml_cuda_mul_mat (src0=0x7ffd240e06b0, src1=0x7f8ab9ea0860, dst=0x7f8ab9ea09b0) at ggml-cuda.cu:6895
#10 0x000055555568c754 in ggml_cuda_compute_forward (params=0x7ffffffebbb0, tensor=0x7f8ab9ea09b0) at ggml-cuda.cu:7388
#11 0x00005555555b4d1d in ggml_compute_forward (params=0x7ffffffebbb0, tensor=0x7f8ab9ea09b0) at ggml.c:16214
#12 0x00005555555b9a94 in ggml_graph_compute_thread (data=0x7ffffffebc00) at ggml.c:17911
#13 0x00005555555bb123 in ggml_graph_compute (cgraph=0x7f8ab9e00020, cplan=0x7ffffffebd00) at ggml.c:18440
#14 0x00005555555c72aa in ggml_graph_compute_helper (buf=std::vector of length 25112, capacity 25112 = {...}, graph=0x7f8ab9e00020, n_threads=1) at llama.cpp:478
#15 0x00005555555da79f in llama_decode_internal (lctx=..., batch=...) at llama.cpp:4144
#16 0x00005555555e6d41 in llama_decode (ctx=0x5555628ba020, batch=...) at llama.cpp:7454
#17 0x0000555555665dcf in llama_init_from_gpt_params (params=...) at common/common.cpp:845
#18 0x0000555555567b32 in main (argc=8, argv=0x7fffffffde08) at examples/main/main.cpp:181
@hxjerry hxjerry changed the title [User] ROCm segfault when running multi-gpu inference. [bug] ROCm segfault when running multi-gpu inference. Oct 3, 2023
@staviq staviq added the AMD GPU Issues specific to AMD GPUs label Oct 3, 2023
@VitorCaleffi

Can you describe the "Compile llama.cpp with ROCm" step in more detail?

@ccbadd

ccbadd commented Oct 3, 2023

I had to start it with the -lv option when using multiple cards. That's unfortunate, though, as it makes it a lot slower. This was on a Windows machine; it worked fine without the option on my Linux server.

@Engininja2
Contributor

What's the output of lspci -v for your GPUs, focusing on the lines about memory? lspci -v -d 1002::300 should filter just the GPUs. According to this comment ROCm/HIP#3103 (comment), if the first memory region of a GPU doesn't span the entire amount of VRAM, then peer-to-peer transfers for multi-GPU won't work. There may be a motherboard setting named something like Above 4G Decoding to help fix that.

On Windows, I think the equivalent would be Resizable BAR, and AMD limits that functionality to certain GPUs and hardware configurations.
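
For illustration only (this is not output from the reporter's system, and the address is a placeholder): with Above 4G Decoding disabled, the first memory region of each GPU typically shows only a small aperture rather than the full 8 GB of VRAM, e.g.

$ lspci -v -d 1002::300 | grep -i "memory at"
        Memory at <address> (64-bit, prefetchable) [size=256M]

With the setting enabled, that first region should instead report something like [size=8G], which is what peer-to-peer copies need.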

@hxjerry
Author

hxjerry commented Oct 5, 2023

My motherboard has no support for ReBAR or Above 4G Decoding, so this is likely the cause. Closing the issue for now.

@hxjerry hxjerry closed this as completed Oct 5, 2023
@FNsi
Contributor

FNsi commented Oct 5, 2023

Try this to get ReBAR working anyway 😄:
ReBarUEFI
