Stack unwinding is broken in Linux 6.4 on x86-64 due to ORC format change #303

Closed
osandov opened this issue Jun 8, 2023 · 5 comments

osandov commented Jun 8, 2023

$ python3 -m vmtest.vm -k '6.4.*' python3 -Bm drgn
...
>>> prog.stack_trace(1)
Traceback (most recent call last):
  File "/usr/lib64/python3.11/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
Exception: unknown ORC entry type 3

This is because torvalds/linux@fb79944 changed the ORC entry format: the type field is now 3 bits wide, and types now range from 0 to 4. torvalds/linux@ffb1b4a also previously changed the ORC format in a way that we didn't detect.

We might be able to guess the "version" of the format, but let's see if we can make upstream include a version identifier first.
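
To illustrate why the width change matters, here is a minimal Python sketch. It assumes the type field starts at bit 8 of a 16-bit flags word; TYPE_SHIFT, orc_type, and make_flags are hypothetical names, not drgn or kernel code. Only the 2-bit versus 3-bit width comes from the commits above.

```python
# Assumption: the type field starts at bit 8 of a 16-bit flags word.
# Only the change from a 2-bit to a 3-bit type comes from the issue.
TYPE_SHIFT = 8


def orc_type(flags: int, type_bits: int) -> int:
    """Extract the entry type from a flags word using the given field width."""
    return (flags >> TYPE_SHIFT) & ((1 << type_bits) - 1)


def make_flags(entry_type: int) -> int:
    """Build a flags word with only the type field set (illustration only)."""
    return entry_type << TYPE_SHIFT


print(orc_type(make_flags(3), 3))  # new 3-bit reader: 3
print(orc_type(make_flags(3), 2))  # old 2-bit reader: 3
print(orc_type(make_flags(4), 2))  # old 2-bit reader drops the top bit: 0
```

Under these assumptions, a type 3 entry read through the old 2-bit mask still comes out as 3, which the old parser reports as unknown, while a type 4 entry is silently misread as 0.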

@osandov osandov added the bug Something isn't working label Jun 8, 2023
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jun 8, 2023
Commits ffb1b4a ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb79944 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change. Using the
kernel version is not a solution because distros frequently backport
changes.

It suffices to store a version number for the ORC format in the vmlinux
and kernel module ELF files (to use when parsing ORC sections from ELF),
and in kernel memory (to use when parsing ORC from a core dump). This
patch adds both of these by creating an .orc_header ELF section
containing a 4-byte version number and the corresponding
__start_orc_header and __stop_orc_header symbols.

The current version number is 3. Version 1 is the original version
merged in commit ee9f8fc ("x86/unwind: Add the ORC unwinder").
Version 2 is the version from commit ffb1b4a ("x86/unwind/orc: Add
'signal' field to ORC metadata"), which obviously didn't include this
header but could get it in a backport to the 6.3 stable branch.

1: https://github.com/osandov/drgn
2: osandov/drgn#303

Signed-off-by: Omar Sandoval <osandov@fb.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jun 13, 2023
Commits ffb1b4a ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb79944 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change.

It suffices to store a version identifier in the vmlinux and kernel
module ELF files (to use when parsing ORC sections from ELF), and in
kernel memory (to use when parsing ORC from a core dump+symbol table).
Rather than hard-coding a version number that needs to be manually
bumped, Peterz suggested hashing the definitions from orc_types.h. If
there is a format change that isn't caught by this, the hashing script
can be updated.

This patch adds an .orc_header allocated ELF section containing the
20-byte hash to vmlinux and kernel modules, along with the corresponding
__start_orc_header and __stop_orc_header symbols in vmlinux.

1: https://github.com/osandov/drgn
2: osandov/drgn#303

Signed-off-by: Omar Sandoval <osandov@fb.com>
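
The hashing idea in this revision of the patch can be sketched roughly as follows. This is not the kernel's hashing script: SHA-1 is assumed only because its digest is 20 bytes, and OLD_DEFS, NEW_DEFS, format_id, KNOWN_FORMATS, and identify are hypothetical stand-ins for the definitions in orc_types.h and for a consumer's lookup table.

```python
import hashlib

# Hypothetical excerpts standing in for the format-defining lines of
# orc_types.h; the real header and the real hashing script may differ.
OLD_DEFS = "unsigned type:2;\n"
NEW_DEFS = "unsigned type:3;\n"


def format_id(definitions: str) -> bytes:
    """Return a 20-byte identifier for a set of format definitions.

    Assumption: SHA-1, chosen here only because it yields 20 bytes.
    """
    return hashlib.sha1(definitions.encode()).digest()


# A consumer keeps a table of identifiers it knows how to parse.
KNOWN_FORMATS = {
    format_id(OLD_DEFS): "2-bit type",
    format_id(NEW_DEFS): "3-bit type",
}


def identify(header: bytes) -> str:
    """Map an identifier to a known format, or fail loudly."""
    try:
        return KNOWN_FORMATS[header]
    except KeyError:
        raise ValueError(
            f"unrecognized ORC format identifier {header.hex()}"
        ) from None
```

The point of the design is that any edit to the hashed definitions changes the identifier without anyone remembering to bump a number, and an unknown identifier becomes an explicit error rather than a misparse.
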
@brenns10
Contributor

According to the docs, ORC should not be used unless either (a) DRGN_PREFER_ORC_UNWINDER=1, or (b) there is no DWARF call frame information. I'm curious why the vmtest encountered this issue? The vmtest kernels have DWARF info, so do they run with DRGN_PREFER_ORC_UNWINDER=1 set?

osandov commented Jun 15, 2023

drgn falls back to using ORC if it can't find DWARF CFI for a given program counter. Most kernel stack traces end up with a frame without DWARF CFI because the kernel entry code is in assembly. For example, in this stack trace:

>>> prog.stack_trace(1)
#0  context_switch (./kernel/sched/core.c:5299:2)
#1  __schedule (./kernel/sched/core.c:6612:8)
#2  schedule (./kernel/sched/core.c:6688:3)
#3  do_wait (./kernel/exit.c:1631:4)
#4  kernel_wait4 (./kernel/exit.c:1775:8)
#5  __do_sys_wait4 (./kernel/exit.c:1803:13)
#6  do_syscall_x64 (./arch/x86/entry/common.c:50:14)
#7  do_syscall_64 (./arch/x86/entry/common.c:80:7)
#8  entry_SYSCALL_64+0xae/0x1aa (./arch/x86/entry/entry_64.S:120)
#9  0x7f00391eac37

entry_SYSCALL_64 doesn't have DWARF, so we used ORC and recovered the saved userspace registers as frame #9.
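
As a rough picture of that per-program-counter fallback, here is a simplified Python sketch. It is not drgn's implementation: choose_unwinder and cfi_ranges are hypothetical, and treating DRGN_PREFER_ORC_UNWINDER=1 as "always pick ORC" is a simplifying assumption.

```python
import bisect
import os


def choose_unwinder(pc, cfi_ranges, environ=os.environ):
    """Pick "dwarf" or "orc" for a single program counter.

    cfi_ranges is a sorted list of non-overlapping (start, end) address
    pairs covered by DWARF CFI, with end exclusive (hypothetical input).
    """
    # Simplification: the environment variable forces ORC outright.
    if environ.get("DRGN_PREFER_ORC_UNWINDER") == "1":
        return "orc"
    # Find the last range whose start is <= pc, then check its end.
    i = bisect.bisect_right(cfi_ranges, (pc, float("inf"))) - 1
    if i >= 0 and cfi_ranges[i][0] <= pc < cfi_ranges[i][1]:
        return "dwarf"
    # No DWARF CFI covers this address (e.g. assembly entry code).
    return "orc"


ranges = [(0x1000, 0x2000), (0x3000, 0x3800)]
print(choose_unwinder(0x1800, ranges, environ={}))  # dwarf
print(choose_unwinder(0x2800, ranges, environ={}))  # orc
```

The decision is made frame by frame, which is why a trace that is mostly unwound with DWARF can still hit the ORC parser at the entry code.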

@brenns10
Contributor

Ah, interesting. I hadn't realized that the function at the bottom of the stack sometimes has no DWARF. Maybe I should try out dwarfdump to see what data is present for which functions; it looks capable of doing that.

akiyks pushed a commit to akiyks/linux that referenced this issue Jun 20, 2023
Commits ffb1b4a ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb79944 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change.

It suffices to store a version identifier in the vmlinux and kernel
module ELF files (to use when parsing ORC sections from ELF), and in
kernel memory (to use when parsing ORC from a core dump+symbol table).
Rather than hard-coding a version number that needs to be manually
bumped, Peterz suggested hashing the definitions from orc_types.h. If
there is a format change that isn't caught by this, the hashing script
can be updated.

This patch adds an .orc_header allocated ELF section containing the
20-byte hash to vmlinux and kernel modules, along with the corresponding
__start_orc_header and __stop_orc_header symbols in vmlinux.

1: https://github.com/osandov/drgn
2: osandov/drgn#303

Fixes: ffb1b4a ("x86/unwind/orc: Add 'signal' field to ORC metadata")
Fixes: fb79944 ("x86,objtool: Split UNWIND_HINT_EMPTY in two")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com
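
On the ELF side, a consumer could look the section up by name and read its contents. The following is a minimal sketch that assumes a little-endian ELF64 file already loaded into memory; read_section, EHDR, and SHDR are hypothetical names, and it ignores corner cases such as extended section numbering.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, little-endian


def read_section(elf: bytes, wanted: str):
    """Return the file contents of the named section, or None if absent."""
    (ident, _type, _machine, _version, _entry, _phoff, shoff, _flags,
     _ehsize, _phentsize, _phnum, shentsize, shnum, shstrndx) = \
        EHDR.unpack_from(elf, 0)
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")

    headers = [SHDR.unpack_from(elf, shoff + i * shentsize)
               for i in range(shnum)]
    # sh_offset of the section-name string table.
    str_off = headers[shstrndx][4]

    for name_off, _stype, _sflags, _addr, offset, size, *_rest in headers:
        start = str_off + name_off
        name = elf[start:elf.index(b"\0", start)].decode()
        if name == wanted:
            return elf[offset:offset + size]
    return None


# Usage (path is illustrative):
# with open("vmlinux", "rb") as f:
#     header = read_section(f.read(), ".orc_header")
```

If the function returns None, the file predates the header, and the tool has to fall back to some other way of guessing the format.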

osandov commented Jun 22, 2023

Closed by 91ede0c. Hopefully https://lore.kernel.org/linux-debuggers/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com/ is merged so that the terrible kernel version check is limited.

@osandov osandov closed this as completed Jun 22, 2023

osandov commented Jun 27, 2023

My patch was merged in torvalds/linux@b9f174c.

Whissi pushed a commit to Whissi/linux-stable that referenced this issue Jun 28, 2023
[ Upstream commit b9f174c ]

Signed-off-by: Sasha Levin <sashal@kernel.org>
haitaohuang pushed a commit to haitaohuang/linux that referenced this issue Jul 8, 2023