[bug] ROCm segfault when running multi-gpu inference. #3451

Closed

hxjerry opened this issue Oct 3, 2023 · 5 comments
Labels
AMD GPU Issues specific to AMD GPUs

Comments

@hxjerry

hxjerry commented Oct 3, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Expected the tensor split option to leverage multiple GPUs.

Current Behavior

Segfault after model loading when using multi-GPU. Inference works correctly on either GPU (two Vega 56s installed) when HIP_VISIBLE_DEVICES is set to force single-GPU inference.
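
For reference, a minimal sketch of the single-GPU workaround described above (the model path and prompt are illustrative, not from the original report):

$ HIP_VISIBLE_DEVICES=0 ./main -m models/llama-2-7b.Q4_0.gguf -ngl 99 -p "Hello"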

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

AMD Ryzen 7 1700X
2× Radeon RX Vega 56 (8 GB each)

  • Operating System (Ubuntu LTS):

Linux jerryxu-Inspiron-5675 6.2.0-33-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 10:33:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.13
$ make --version
GNU Make 4.3
$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Failure Information (for bugs)

See logs.

Steps to Reproduce

  1. Compile llama.cpp with ROCm support
  2. Run any model with tensor split (tried two quantizations of 7B and 13B); a sketch of the commands follows this list
  3. Get a segfault
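
A minimal sketch of these steps, assuming the ROCm Makefile build (LLAMA_HIPBLAS=1); the model path, quantization, and split values are illustrative and not part of the original report:

$ make clean && make LLAMA_HIPBLAS=1 -j
$ ./main -m models/llama-2-13b.Q4_0.gguf -ngl 99 --tensor-split 1,1 -p "Hello"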

Failure Logs

llama.cpp log:

Log start
main: build = 1310 (1c84003)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1696299120
ggml_init_cublas: found 2 ROCm devices:
  Device 0: Radeon RX Vega, compute capability 9.0
  Device 1: Radeon RX Vega, compute capability 9.0
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 256.00 MB
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size = 76.38 MB
llama_new_context_with_model: VRAM scratch buffer: 70.50 MB
llama_new_context_with_model: total VRAM used: 4801.43 MB (model: 4474.93 MB, context: 326.50 MB)
Segmentation fault (core dumped)

GDB stacktrace on segfault:

#0  0x00007ffff672582e in ?? () from /opt/rocm/lib/libamdhip64.so.5
#1  0x00007ffff672dba0 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#2  0x00007ffff672fc6d in ?? () from /opt/rocm/lib/libamdhip64.so.5
#3  0x00007ffff66f8a44 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#4  0x00007ffff65688e7 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#5  0x00007ffff65689e5 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#6  0x00007ffff6568ae0 in ?? () from /opt/rocm/lib/libamdhip64.so.5
#7  0x00007ffff65ac7a2 in hipMemcpy2DAsync () from /opt/rocm/lib/libamdhip64.so.5
#8  0x00005555556917e6 in ggml_cuda_op_mul_mat (src0=0x7ffd240e06b0, src1=0x7f8ab9ea0860, dst=0x7f8ab9ea09b0, 
    op=0x55555569f330 <ggml_cuda_op_mul_mat_q(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t* const&)>, convert_src1_to_q8_1=true)
    at ggml-cuda.cu:6706
#9  0x000055555568cc45 in ggml_cuda_mul_mat (src0=0x7ffd240e06b0, src1=0x7f8ab9ea0860, dst=0x7f8ab9ea09b0) at ggml-cuda.cu:6895
#10 0x000055555568c754 in ggml_cuda_compute_forward (params=0x7ffffffebbb0, tensor=0x7f8ab9ea09b0) at ggml-cuda.cu:7388
#11 0x00005555555b4d1d in ggml_compute_forward (params=0x7ffffffebbb0, tensor=0x7f8ab9ea09b0) at ggml.c:16214
#12 0x00005555555b9a94 in ggml_graph_compute_thread (data=0x7ffffffebc00) at ggml.c:17911
#13 0x00005555555bb123 in ggml_graph_compute (cgraph=0x7f8ab9e00020, cplan=0x7ffffffebd00) at ggml.c:18440
#14 0x00005555555c72aa in ggml_graph_compute_helper (buf=std::vector of length 25112, capacity 25112 = {...}, graph=0x7f8ab9e00020, n_threads=1) at llama.cpp:478
#15 0x00005555555da79f in llama_decode_internal (lctx=..., batch=...) at llama.cpp:4144
#16 0x00005555555e6d41 in llama_decode (ctx=0x5555628ba020, batch=...) at llama.cpp:7454
#17 0x0000555555665dcf in llama_init_from_gpt_params (params=...) at common/common.cpp:845
#18 0x0000555555567b32 in main (argc=8, argv=0x7fffffffde08) at examples/main/main.cpp:181
@hxjerry hxjerry changed the title [User] ROCm segfault when running multi-gpu inference. [bug] ROCm segfault when running multi-gpu inference. Oct 3, 2023
@staviq staviq added the AMD GPU Issues specific to AMD GPUs label Oct 3, 2023
@VitorCaleffi

Can you describe the "Compile llama.cpp with ROCm" step in more detail?

@ccbadd

ccbadd commented Oct 3, 2023

I had to start it with the -lv option when using multiple cards. That's unfortunate, though, as it makes it a lot slower. This was on a Windows machine; it worked fine without the option on my Linux server.

@Engininja2
Contributor

What's the output of lspci -v for your GPUs, focusing on the lines about memory? lspci -v -d 1002::300 should filter just the GPUs. According to this comment ROCm/HIP#3103 (comment), if the first memory region of a GPU doesn't span the entire amount of VRAM, then peer-to-peer transfers for multi-GPU won't work. There may be a motherboard setting named something like Above 4G Decoding to help fix that.

On Windows, I think the equivalent would be Resizable BAR, and AMD limits that functionality to certain GPUs and hardware configurations.
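
For illustration only (this is not output from the reporter's system, and the address is a placeholder): with Above 4G Decoding disabled, the first memory region of each GPU typically shows only a small aperture rather than the full 8 GB of VRAM, e.g.

$ lspci -v -d 1002::300 | grep -i "memory at"
        Memory at <address> (64-bit, prefetchable) [size=256M]

With the setting enabled, that first region should instead report something like [size=8G], which is what peer-to-peer copies need.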

@hxjerry
Author

hxjerry commented Oct 5, 2023

My motherboard has no support for ReBAR or Above 4G Decoding, so this is likely the cause. Closing the issue for now.

@hxjerry hxjerry closed this as completed Oct 5, 2023
@FNsi
Contributor

FNsi commented Oct 5, 2023

Try this to get ReBAR working anyway 😄:
ReBarUEFI
