Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matching on functions with similar call stacks breaks optimized backtraces even with debuginfo #134909

Open
jyn514 opened this issue Dec 30, 2024 · 13 comments
Labels
A-backtrace Area: Backtraces A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jyn514
Copy link
Member

jyn514 commented Dec 30, 2024

I tried this code:

//@ compile-flags: -O -C debuginfo=full

fn main() {
    let x = String::from("unwrap_result");
    match &*x {
        "unwrap_result" => unwrap_result(),
        // Commenting out this line fixes the problem.
        "expect_result" => expect_result(),
        _ => {}
    }
}

// This function should appear in the backtrace.
fn unwrap_result() {
    Err(()).unwrap()
}

fn expect_result() {
    // Changing this line fixes the problem.
    Err(()).expect("oops");
}

I expected to see this happen:

stack backtrace:
   0: rust_begin_unwind
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/std/src/panicking.rs:676:5
   1: core::panicking::panic_fmt
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/core/src/result.rs:1704:5
   3: core::result::Result<T,E>::unwrap
             at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1109:23
   4: main::unwrap_result
             at ./src/main.rs:14:5
   5: main::main
             at ./src/main.rs:5:28
   6: core::ops::function::FnOnce::call_once
             at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5

Instead, this happened:

stack backtrace:
   0: rust_begin_unwind
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/std/src/panicking.rs:676:5
   1: core::panicking::panic_fmt
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/core/src/result.rs:1704:5
   3: main::main
   4: core::ops::function::FnOnce::call_once
             at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5

note how the line numbers have disappeared and the frames for unwrap_result and Result::unwrap have disappeared.

any of the following things fix the problem:

  • removing -O
  • commenting out the expect_result branch of the match
  • commenting out Err().expect() in the expect_result function. Note that this function is never called; changing it should not affect runtime behavior.
  • changing let x = String::from("unwrap_result") to let x = "unwrap_result". i suspect this lets llvm do constant folding or something like that?

note that i consider this a bug even though -O is present. the whole point of debuginfo is to work with inlined functions, and extremely similar programs that are optimized will have proper backtraces.

dwarfdump -i reports that there is no debuginfo at all present for unwrap_result; doing any of the 4 things above results in debuginfo being created. i verified with a local build of the compiler that this only affects the llvm backend, not cranelift.

Meta

rustc --version --verbose:

rustc 1.85.0-nightly (14ee63a3c 2024-12-29)
binary: rustc
commit-hash: 14ee63a3c651bb7a243c8b07333749ab4b152e13
commit-date: 2024-12-29
host: x86_64-unknown-linux-gnu
release: 1.85.0-nightly
LLVM version: 19.1.6

@rustbot label A-debuginfo A-backtrace A-llvm

@jyn514 jyn514 added the C-bug Category: This is a bug. label Dec 30, 2024
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-backtrace Area: Backtraces A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Dec 30, 2024
@jyn514
Copy link
Member Author

jyn514 commented Dec 30, 2024

as far as i can tell, this is the main difference in llvm ir between the two (other than the missing debuginfo). left is the broken code, right is working.

<   %bcmp.i = tail call i32 @bcmp(ptr noundef nonnull readonly dereferenceable(NNN) %_0.i.i.i.i.i.i, ptr noundef nonnull readonly dereferenceable(NNN) @alloc_c10dc9048f42cbacafeedc1a0ea109b6, i64 NNN), !dbg !NNN, !alias.scope !NNN
---
>   %bcmp.i = tail call i32 @bcmp(ptr noundef nonnull readonly dereferenceable(NNN) %_0.i.i.i.i.i.i, ptr noundef nonnull dereferenceable(NNN) @alloc_c10dc9048f42cbacafeedc1a0ea109b6, i64 NNN), !dbg !NNN

< bb1:                                              ; preds = %bb9
<   call void @llvm.lifetime.start.p0(i64 NNN, ptr nonnull %e.i.i29), !dbg !NNN
<     #dbg_value(i8 NNN, !NNN, !DIExpression(), !NNN)
<     #dbg_declare(ptr %e.i.i29, !NNN, !DIExpression(), !NNN)
<   br label %bb1.invoke
<
< bb1.invoke:                                       ; preds = %bb3, %bb1
<   %NNN = phi ptr [ @alloc_00ae4b301f7fab8ac9617c03fcbd7274, %bb1 ], [ @alloc_95ff39041e25dfd6a4b273866c4fba8e, %bb3 ]
<   %NNN = phi i64 [ NNN, %bb1 ], [ NNN, %bb3 ]
<   %NNN = phi ptr [ @alloc_923ee864efabd7cfbe01a92263f7c073, %bb1 ], [ @alloc_f96e2b8391de8a5af489db09f1e91d0c, %bb3 ]
< ; invoke core::result::unwrap_failed
<   invoke void @_ZN4core6result13unwrap_failed17h96517441667e9d2aE(ptr noalias noundef nonnull readonly align NNN %NNN, i64 noundef %NNN, ptr noundef nonnull align NNN %e.i.
i29, ptr noalias noundef nonnull readonly align NNN dereferenceable(NNN) @vtable.NNN, ptr noalias noundef nonnull readonly align NNN dereferenceable(NNN) %NNN) #NNN
<           to label %bb1.cont unwind label %bb7, !dbg !NNN
<
< bb1.cont:                                         ; preds = %bb1.invoke
<   unreachable
---
> ; invoke core::result::unwrap_failed
>   invoke void @_ZN4core6result13unwrap_failed17h96517441667e9d2aE(ptr noalias noundef nonnull readonly align NNN @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef NNN, p
tr noundef nonnull align NNN %e.i.i, ptr noalias noundef nonnull readonly align NNN dereferenceable(NNN) @vtable.NNN, ptr noalias noundef nonnull readonly align NNN dereferenceable(NNN) @alloc_923ee864efabd7cfbe01a92263f7c073) #NNN
>           to label %.noexc unwind label %bb5, !dbg !NNN
>
> .noexc:                                           ; preds = %bb1
>   unreachable

note that the working code has readonly set for the bcmp.i tail call, and that it has an extra bb1.invoke basic block.

probably some optimization pass that looks at readonly is failing to preserve debuginfo about stack frames it optimizes out?

@jyn514
Copy link
Member Author

jyn514 commented Dec 30, 2024

the only use of .invoke i see in the llvm/lib dir is src/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp, in MergeCompatibleInvokesImpl. that would explain why there have to be two functions for this bug to trigger.

@nikic
Copy link
Contributor

nikic commented Dec 30, 2024

The two unwrap calls get merged, so they will use the merged debug location. This fundamentally has to lose the information about which of unwrap_result or expect_result panicked, as they are one and the same after optimization. I don't think there's anything that can really be done here (apart from compiling without optimizations).

@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Dec 30, 2024
@jyn514
Copy link
Member Author

jyn514 commented Dec 30, 2024

it makes sense to me the LLVM has to pick one of the two. But right now it seems to be picking neither: both of the frames involved are removed completely. shouldn’t it be showing one or the other?

@nikic
Copy link
Contributor

nikic commented Dec 30, 2024

it makes sense to me the LLVM has to pick one of the two. But right now it seems to be picking neither: both of the frames involved are removed completely. shouldn’t it be showing one or the other?

Picking one of the two would be incorrect, in case it's the other one that actually gets executed. This would result in a confusing backtrace where (for example) unwrap_result panicked, but the backtrace points to expect_result instead.

@jyn514
Copy link
Member Author

jyn514 commented Dec 30, 2024

that seems better to me than having no frames for that call at all? i agree it’s not correct but i'm confused why you think it’s worse than the current situation; at least it gives you some idea of what happened.

@programmerjake
Copy link
Member

would it be possible to list all functions that could be called for each backtrace frame, so instead of main::unwrap_result or main::expect_result, have something like (merged) main::unwrap_result, main::expect_result?

@jyn514
Copy link
Member Author

jyn514 commented Dec 31, 2024

I don’t think that would play very well with GDB and other tools that let you select frames by name, but maybe we could add two different DW_AT_name attributes to the DW_TAG_subprogram? i’m not sure if dwarf defines the behavior in that case, i would expect tools that don’t understand it to pick one at random. it would work fine for std’s backtrace printer though as long as we taught it to add the merged string in front.

@jyn514
Copy link
Member Author

jyn514 commented Dec 31, 2024

https://wiki.dwarfstd.org/ICF.md has an extension that uses .debug_dcall to disambiguate two merged functions based on the return address. maybe we could use that?

@workingjubilee
Copy link
Member

would the return address be different, here?

@workingjubilee
Copy link
Member

workingjubilee commented Dec 31, 2024

I wonder if we should treat merged functions as if one was inlined fully into the other one. (this doesn't really make sense at the source level, but)

@T-Dark0
Copy link

T-Dark0 commented Dec 31, 2024

Would it be possible/reasonable to replace merged frames with some brief explanation of what happened? For example, by printing the stacktrace in the OP as

stack backtrace:
   0: rust_begin_unwind
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/std/src/panicking.rs:676:5
   1: core::panicking::panic_fmt
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/14ee63a3c651bb7a243c8b07333749ab4b152e13/library/core/src/result.rs:1704:5
   3: [function merged]
   4: [function merged]
   5: main::main
             at ./src/main.rs:5:28
   6: core::ops::function::FnOnce::call_once
             at /home/jyn/.local/lib/rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5

Bonus points for including any additional information we can manage, such as, say, the address of the function being merged. Further bonus points for showing which functions were merged, assuming that's implementable.

If nothing else, it seems slightly more intuitive than simply removing the function entirely. I was today years old when I found out backtraces can sometimes just skip frames, I expect if I'd found out by trying to debug one rather than by reading about this issue I'd have been extremely puzzled about what on earth was going on.

@workingjubilee
Copy link
Member

@T-Dark0 We, er, eliminated the data we could use to show that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backtrace Area: Backtraces A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants