WARN: Segmentation fault: address not mapped in llama.cpp #1949

honggyukim · 2024-08-18T14:24:50Z

The llama.cpp project already has an option to add -pg option with LLAMA_GPROF=1.

But it gets crashed when llama-cli is traced with uftrace as follows.

$ git clone https://github.com/ggerganov/llama.cpp.git && cd llama.cpp

$ git checkout 2339a0be1c8e31fcf4531427183b94f2ef019e56

$ make LLAMA_GPROF=1 -j

$ uftrace record --no-libcall ./llama-cli 
Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723990922
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-12-g8382 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [8] (llama_load_model_from_file[5a2eea7cfda4] <= llama_init_from_gpt_params[5a2eea86b78e])
WARN: [7] (llama_init_from_gpt_params[5a2eea86b700] <= main[5a2eea6d4392])
WARN: [6] (log_dump_cmdline_impl[5a2eea8d012d] <= main[5a2eea6d3e46])
WARN: [5] (main[5a2eea6d3bec] <= __libc_start_call_main[7aad7802a1ca])
WARN: [4] (__static_initialization_and_destruction_0[5a2eea6d28a9] <= mcount_return[7aad788657ef])
WARN: [3] (_GLOBAL__sub_I_json_schema_to_grammar.cpp[5a2eea6d3bae] <= __libc_start_main[7aad7802a304])
WARN: [2] (_GLOBAL__sub_I_console.cpp[5a2eea6d268e] <= __libc_start_main[7aad7802a304])
WARN: [1] (__static_initialization_and_destruction_0[5a2eea6d2402] <= mcount_return[7aad788657ef])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[5a2eea6d266e] <= __libc_start_main[7aad7802a304])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

It still gets crashed even with -e option.

$ uftrace record --no-libcall -e ./llama-cli 
Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723990966
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN: Backtrace from uftrace v0.16-12-g8382 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [6] (llama_load_model_from_file[5bc0d1b9ada4] <= llama_init_from_gpt_params[5bc0d1c3678e])
WARN: [5] (llama_model_default_params[5bc0d1b41bae] <= llama_model_params_from_gpt_params[5bc0d1c306af])
WARN: [4] (llama_model_params_from_gpt_params[5bc0d1c30695] <= llama_init_from_gpt_params[5bc0d1c36737])
WARN: [3] (llama_init_from_gpt_params[5bc0d1c36700] <= main[5bc0d1a9f392])
WARN: [2] (main[5bc0d1a9ebec] <= __libc_start_call_main[70fd6b82a1ca])
WARN: [1] (__static_initialization_and_destruction_0[5bc0d1a9d8a9] <= __libc_start_main[70fd6b82a304])
WARN: [0] (_GLOBAL__sub_I_json_schema_to_grammar.cpp[5bc0d1a9ebae] <= __libc_start_main[70fd6b82a304])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

The text was updated successfully, but these errors were encountered:

honggyukim · 2024-08-18T14:26:52Z

Here is the backtrace.

$ gdb -q --args uftrace record --no-libcall -e ./llama-cli
Reading symbols from uftrace...
(gdb) r

Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723991141

Thread 2.1 "llama-cli" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e868c0 (LWP 27737)]
0x000055555568bdc2 in llama_load_model_from_file (path_model=0x5555558c06e0 "models/7B/ggml-model-f16.gguf", params=...) at src/llama.cpp:16936
16936           struct llama_model_params   params) {
(gdb) bt
#0  0x000055555568bdc2 in llama_load_model_from_file (path_model=0x5555558c06e0 "models/7B/ggml-model-f16.gguf", params=...) at src/llama.cpp:16936
#1  0x000055555572778e in llama_init_from_gpt_params (params=...) at common/common.cpp:2107
#2  0x0000555555590392 in main (argc=<optimized out>, argv=<optimized out>) at examples/main/main.cpp:210

namhyung · 2024-08-26T04:36:00Z

Looks like a memory corruption on the stack. I've added a debug message and the problem was gone. :(

namhyung · 2024-08-26T05:29:03Z

It seems the llama_load_model_from_file passes a large struct llama_model_param by value and it's overwritten somewhere..

namhyung · 2024-08-26T06:57:40Z

I think I found the reason.. Strangely, it seems the llama binary didn't followed the x86_64 calling convention and used r10 register across a function call. Here's the disassembly of the llama_load_model_from_file:

(gdb) disas
Dump of assembler code for function llama_load_model_from_file(char const*, llama_model_params):
Address range 0x555555686f80 to 0x555555688d7a:
   0x0000555555686f80 <+0>:	push   %rbp
   0x0000555555686f81 <+1>:	mov    %rsp,%rbp
   0x0000555555686f84 <+4>:	push   %r15
   0x0000555555686f86 <+6>:	push   %r14
   0x0000555555686f88 <+8>:	push   %r13
   0x0000555555686f8a <+10>:	push   %r12
   0x0000555555686f8c <+12>:	push   %r10
   0x0000555555686f8e <+14>:	lea    0x10(%rbp),%r10
   0x0000555555686f92 <+18>:	push   %rbx
   0x0000555555686f93 <+19>:	sub    $0x1e0,%rsp
   0x0000555555686f9a <+26>:	call   *0x16dd28(%rip)        # 0x5555557f4cc8   (mcount)
   0x0000555555686fa0 <+32>:	mov    0x20(%r10),%r15
   0x0000555555686fa4 <+36>:	mov    0x28(%r10),%rax
   0x0000555555686fa8 <+40>:	mov    %r10,-0x1e0(%rbp)
   0x0000555555686faf <+47>:	mov    %rdi,-0x1d8(%rbp)
   0x0000555555686fb6 <+54>:	mov    %rax,-0x208(%rbp)
   0x0000555555686fbd <+61>:	mov    %r15,-0x200(%rbp)
   0x0000555555686fc4 <+68>:	call   0x5555555ad010 <ggml_time_init>
  ...

The r10 register was set at offset 14, but used after calling mcount at offset 32 and 36. The value of r10 was overwritten during mcount because it's a temporary register. I think compiler should generate code to save r10 before offset 26. Probably this is a bug in the compiler but we cannot simply blame it and done with it.

The following patch seems to work well for -e at least. And I found I did the same for the dynamic tracing already. :)

diff --git a/arch/x86_64/mcount.S b/arch/x86_64/mcount.S
index dd32b0a6..8f0f07e0 100644
--- a/arch/x86_64/mcount.S
+++ b/arch/x86_64/mcount.S
@@ -57,9 +57,13 @@ GLOBAL(mcount)
 
        /* save rax (implicit argument for variadic functions) */
        push %rax
+       push %r10
+       push %r11
 
        call mcount_entry
 
+       pop  %r11
+       pop  %r10
        pop  %rax
 
        /* restore original stack pointer */

honggyukim · 2024-08-26T10:13:39Z

Thanks for the investigation.

I see it works fine with -e, but still gets crashed even with your fix about r10 and r11 when -e is unused.

$ uftrace record --no-libcall ./llama-cli
Log start
main: build = 3601 (2339a0be)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1724667048
gguf_init_from_file: failed to open 'models/7B/ggml-model-f16.gguf': 'No such file or directory'
WARN: process crashed by signal 11: Segmentation fault (si_code: 128)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-15-gf8a7 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [8] (gguf_init_from_file[56071b6ece4d] <= llama_model_loader::llama_model_loader[56071b7cf28d])
WARN: [7] (llama_model_loader::llama_model_loader[56071b7cee08] <= llama_load_model_from_file[56071b794731])
WARN: [6] (llama_load_model_from_file[56071b794354] <= llama_init_from_gpt_params[56071b829c2c])
WARN: [5] (llama_init_from_gpt_params[56071b829b90] <= main[56071b689904])
WARN: [4] (log_dump_cmdline_impl[56071b88c43d] <= main[56071b6893a5])
WARN: [3] (main[56071b68910a] <= __libc_start_call_main[7fc4d2a29d90])
WARN: [2] (_GLOBAL__sub_I_common.cpp[56071b687fd4] <= __libc_start_main[7fc4d2a29ebb])
WARN: [1] (__static_initialization_and_destruction_0[56071b687d32] <= mcount_return[7fc4d31b6007])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[56071b687fae] <= __libc_start_main[7fc4d2a29ebb])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

namhyung · 2024-08-26T16:18:24Z

How about this too?

 ENTRY(mcount_return)
        .cfi_startproc
-       sub $48, %rsp
-       .cfi_def_cfa_offset 48
-
-       movq   %rdi,  32(%rsp)
+       sub $96, %rsp
+       .cfi_def_cfa_offset 96
+
+       /* save all caller-saved registers due to -fipa-ra */
+       movq   %r11,  80(%rsp)
+       movq   %r10,  72(%rsp)
+       movq   %r9,   64(%rsp)
+       movq   %r8,   56(%rsp)
+       movq   %rdi,  48(%rsp)
+       movq   %rsi,  40(%rsp)
+       movq   %rcx,  32(%rsp)
+
+       /* below are used to carry return value */
        movdqu %xmm0, 16(%rsp)
        movq   %rdx,   8(%rsp)
        movq   %rax,   0(%rsp)
@@ -113,14 +129,21 @@ ENTRY(mcount_return)
        movq    0(%rsp), %rsp
 
        /* restore original return address in parent */
-       movq    %rax, 40(%rsp)
+       movq    %rax, 88(%rsp)
 
        movq    0(%rsp), %rax
        movq    8(%rsp), %rdx
        movdqu 16(%rsp), %xmm0
-       movq   32(%rsp), %rdi
 
-       add $40, %rsp
+       movq   32(%rsp), %rcx
+       movq   40(%rsp), %rsi
+       movq   48(%rsp), %rdi
+       movq   56(%rsp), %r8
+       movq   64(%rsp), %r9
+       movq   72(%rsp), %r10
+       movq   80(%rsp), %r11
+
+       add    $88, %rsp
        .cfi_def_cfa_offset 8
        retq
        .cfi_endproc

honggyukim · 2024-08-27T06:00:38Z

Thanks but still not working.

$ uftrace record --no-libcall ./llama-cli
Log start
main: build = 3601 (2339a0be)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1724738400
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-17-g06b3a13 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [6] (llama_load_model_from_file[55fbf4d99354] <= llama_init_from_gpt_params[55fbf4e2ec2c])
WARN: [5] (llama_init_from_gpt_params[55fbf4e2eb90] <= main[55fbf4c8e904])
WARN: [4] (log_dump_cmdline_impl[55fbf4e9143d] <= main[55fbf4c8e3a5])
WARN: [3] (main[55fbf4c8e10a] <= __libc_start_call_main[7fe7fca29d90])
WARN: [2] (_GLOBAL__sub_I_common.cpp[55fbf4c8cfd4] <= __libc_start_main[7fe7fca29ebb])
WARN: [1] (__static_initialization_and_destruction_0[55fbf4c8cd32] <= mcount_return[7fe7fd1cafff])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[55fbf4c8cfae] <= __libc_start_main[7fe7fca29ebb])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

namhyung · 2024-08-27T06:56:31Z

For some reason, I cannot reproduce it within gdb.

honggyukim · 2024-08-31T14:21:57Z

That's weird. Do you mean you cannot reproduce it only when gdb is used?

namhyung · 2024-09-01T23:42:58Z

Yep, I can reproduce it without gdb. Anyway, I think we can merge what we found so far.

A couple of fixes and update during investigation of #1949. Signed-off-by: Namhyung Kim <namhyung@gmail.com>

honggyukim · 2024-09-02T01:40:39Z

Hmm.. the second change at #1949 (comment) doesn't fix anything. It fails even with and without -e option while the first change makes it work with -e at least.

namhyung · 2024-09-02T05:19:00Z

Yeah but I think it should be added together.

honggyukim added bug high priority labels Aug 18, 2024

honggyukim mentioned this issue Aug 18, 2024

WARN: child terminated by signal: 11: Segmentation fault fault for ( C++ ) #1948

Open

namhyung added a commit that referenced this issue Sep 2, 2024

Merge branch 'mcount-update'

9d88589

A couple of fixes and update during investigation of #1949. Signed-off-by: Namhyung Kim <namhyung@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WARN: Segmentation fault: address not mapped in llama.cpp #1949

WARN: Segmentation fault: address not mapped in llama.cpp #1949

honggyukim commented Aug 18, 2024

honggyukim commented Aug 18, 2024

namhyung commented Aug 26, 2024

namhyung commented Aug 26, 2024

namhyung commented Aug 26, 2024

honggyukim commented Aug 26, 2024

namhyung commented Aug 26, 2024

honggyukim commented Aug 27, 2024

namhyung commented Aug 27, 2024

honggyukim commented Aug 31, 2024

namhyung commented Sep 1, 2024

honggyukim commented Sep 2, 2024

namhyung commented Sep 2, 2024

WARN: Segmentation fault: address not mapped in llama.cpp #1949

WARN: Segmentation fault: address not mapped in llama.cpp #1949

Comments

honggyukim commented Aug 18, 2024

honggyukim commented Aug 18, 2024

namhyung commented Aug 26, 2024

namhyung commented Aug 26, 2024

namhyung commented Aug 26, 2024

honggyukim commented Aug 26, 2024

namhyung commented Aug 26, 2024

honggyukim commented Aug 27, 2024

namhyung commented Aug 27, 2024

honggyukim commented Aug 31, 2024

namhyung commented Sep 1, 2024

honggyukim commented Sep 2, 2024

namhyung commented Sep 2, 2024