DwarfCompileUnit::createAndAddScopeChildren gobbles stack #92724

Open
workingjubilee opened this issue May 20, 2024 · 15 comments

Comments

@workingjubilee
Contributor

By far the most common stack overflow bug report nowadays for rustc is rustc dying in LLVM while emitting debuginfo. Lately DwarfCompileUnit::createAndAddScopeChildren has seemed to acquire the power of recursing on itself and destroying the stack essentially without assistance. People are setting hundreds of megabytes for RUST_MIN_STACK to get their code to compile. That's... closer to our resident set size in most executions?

Unlike other reports like #76920 this has been happening with our stock rustc distribution, as far as I can tell, and our bundled LLVM.

@llvmbot
Member

llvmbot commented May 20, 2024

@llvm/issue-subscribers-debuginfo


@dwblaikie
Collaborator

Some reproducer(s) and, if it's a regression, an example of this not being a problem on earlier versions, would be helpful.

@EugeneZelenko EugeneZelenko added the incomplete Issue not complete (e.g. missing a reproducer, build arguments, etc.) label May 20, 2024
@workingjubilee
Contributor Author

Textual LLVMIR from a... I can't call it "minimal", but it's a reproducer: createandaddsigsev.ll

It's hard to tell whether this is a "regression" or not, honestly. I hope the reason this is the most common segfault lately is because we've won the game of whack-a-mole with the others.

@EugeneZelenko EugeneZelenko removed the incomplete Issue not complete (e.g. missing a reproducer, build arguments, etc.) label May 21, 2024
@dwblaikie
Collaborator

Might end up leaving this to someone else to look at, but I got a flame graph (just CPU-based, I didn't look at the memory usage) and I didn't see a deep stack related to createAndAddScopeChildren. I did see a few related to the LLVM IR verifier (calling getSubprogram on the debugloc on each instruction, so it has to recurse a few times up for deeply inlined code; that could be linearized into a loop rather than recursion), and some other getSubprogram recursions related to DwarfDebug::collectEntityInfo... collectVariableInfoFromMFTable
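
For illustration, a minimal Rust sketch of what "linearized into a loop rather than recursion" could look like for a parent-chain lookup. The `Scope` and `ScopeKind` types and both function names are hypothetical stand-ins, not LLVM's API; the real `getSubprogram` is C++ and may differ in its details.

```rust
/// Hypothetical stand-in for the kinds of debug-info scope nodes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScopeKind {
    Subprogram,
    LexicalBlock,
}

/// Hypothetical scope node stored in a flat arena.
struct Scope {
    kind: ScopeKind,
    /// Index of the enclosing scope in the arena, if any.
    parent: Option<usize>,
}

/// Walk parent links in a loop until a subprogram is found.
/// Stack use stays constant no matter how deep the nesting is.
fn enclosing_subprogram(scopes: &[Scope], start: usize) -> Option<usize> {
    let mut current = Some(start);
    while let Some(idx) = current {
        let scope = &scopes[idx];
        if scope.kind == ScopeKind::Subprogram {
            return Some(idx);
        }
        current = scope.parent;
    }
    None
}

/// Build one subprogram with `depth` lexical blocks nested inside it,
/// each block being the child of the previous one.
fn nested_chain(depth: usize) -> Vec<Scope> {
    let mut scopes = vec![Scope { kind: ScopeKind::Subprogram, parent: None }];
    for i in 0..depth {
        scopes.push(Scope { kind: ScopeKind::LexicalBlock, parent: Some(i) });
    }
    scopes
}
```

Calling `enclosing_subprogram` on the innermost block of `nested_chain(1_000_000)` walks a million parent links without adding a single call frame.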

I guess maybe the deep recursion happened too quick/didn't get sampled. valgrind detects the recursion and diagnoses llvm::DwarfCompileUnit::applyConcreteDbgVariableAttributes(llvm::Loc::MMI const&, llvm::DbgVariable const&, llvm::DIE&) (DwarfCompileUnit.cpp:891) (or points to it at least)

I guess this is some deep inlining or many nested lexical scopes? (I haven't looked at the IR to try to find the shape of it... ) - generally LLVM won't scale well with very large functions, though I guess a lot of inlining doesn't necessarily mean a large function and deeply inlined code only really impacts DWARF handling - outside of DWARF, everything related to inlined code just "goes away"...

@dwblaikie
Collaborator

Hmm, in a non-optimized build of the compiler it's stack-overflowing in llvm::AsmPrinter::emitDwarfDIE instead.

I'm guessing this is some deeply recursive code?

@dwblaikie
Collaborator

Hmm, nope, all these nestings appear to be lexical blocks...

What is this code? We /might/ be able to make LLVM work better with it, but it seems pretty weird/not sure we'd want to do a lot of work to support it without understanding it better...

@workingjubilee
Contributor Author

Yeah, I mean, I wouldn't want you to do much more than the standard trick... I'd PR it myself except my C++ skills are bad enough I'd probably fuck it up somehow!... of hoisting recursion into iteration inside relevant functions, by having an inner function return the state of the "recursion" and then repeatedly calling that inner function to drive the "recursion" forward.
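
A rough Rust sketch of that trick, assuming a tree stored as a flat arena: an inner `step` function advances the traversal by one node and reports whether work remains, and an outer loop keeps calling it. `Node`, `step`, and `visit_all` are hypothetical names for illustration only, not anything in LLVM.

```rust
/// Hypothetical tree node standing in for a debug-info entry.
/// Children are arena indices, so dropping the tree does not recurse either.
struct Node {
    name: String,
    children: Vec<usize>,
}

/// One step of the traversal: pop a pending node, record it, and queue its
/// children. Returns `false` once no work is left.
fn step(nodes: &[Node], pending: &mut Vec<usize>, out: &mut Vec<String>) -> bool {
    match pending.pop() {
        Some(idx) => {
            let node = &nodes[idx];
            out.push(node.name.clone());
            // Push in reverse so children are visited in their original order.
            pending.extend(node.children.iter().rev().copied());
            true
        }
        None => false,
    }
}

/// Pre-order walk driven by a loop: nesting depth costs heap space in
/// `pending`, not call-stack frames.
fn visit_all(nodes: &[Node], root: usize) -> Vec<String> {
    let mut pending = vec![root];
    let mut out = Vec::new();
    while step(nodes, &mut pending, &mut out) {}
    out
}
```

The explicit `pending` vector plays the role the call stack played in the recursive version, so a chain of hundreds of thousands of nested nodes only grows a heap allocation.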

> What is this code? We /might/ be able to make LLVM work better with it, but it seems pretty weird/not sure we'd want to do a lot of work to support it without understanding it better...

A Rust stack frame!

...in real programs, instead of this somewhat decontextualized example, this is "someone asked for full debuginfo and then declares MANY variables". The problem is that the breaking points get reached in programs people actually want to build rather than purely hypothetical ones. And not, like... absurd code like entire gate-by-gate CPU simulators, though I have gotten a report of that, but "ordinary" things like cryptography algorithms. My understanding is that yes, inlining does play some part in it.

But otherwise this is just a somewhat overly literalist interpretation of Rust.

Now, it might be that there's a better way to do this, but we started doing it this specific way because we had issues with getting values to be emitted into debuginfo in a way we can easily predict. Otherwise we might do something else.

It seems quite a bit has changed over time with LLVM's debug info handling, so if you have any recommendations as to how to reduce the frequency of this occurring on our side but which might require being aware of the last... 4 years of LLVM goings-on, I'm interested!

@dwblaikie
Collaborator

Declaring a lot of variables shouldn't result in this sort of debug info - unless it's being generated somewhat esoterically.

lexical_blocks are for literal language-level lexical blocks like anything with "{}" in C++, roughly. Is rust using a separate lexical_block for every variable lifetime?

in C++, for instance, this code would have one lexical block:

int x = 3;
int a = 2;
{
  int x = 5;
  func();
  int y = 7;
  ...
}

Expressing the name overriding. One of the problems with this is that, at the call to func(), the debugger doesn't know that y doesn't exist yet - now there's a different, but unsupported-in-LLVM, DWARF feature that can be used to address this problem (DW_AT_scope_start).

But if rust is putting in new artificial scopes for every variable to avoid the "we're in the lexical scope, but the variable doesn't exist yet" problem - I'd /highly/ encourage not doing that, and accepting the less-than-ideal behavior that happens. Introducing a scope for every variable is likely not going to scale well with many pieces of the infrastructure here (compiler, object files, debuggers, symbolizers, etc)

Perhaps you could show me a smaller example of rust with a few variables, the source code, the LLVM IR, and the resulting dwarf (llvm-dwarfdump to dump it)?

@jryans
Member

jryans commented May 30, 2024

I was curious about this, so I tried a small Rust example on CE:

#[no_mangle]
pub fn square(num: i32) -> i32 {
    let x = num;
    let y = num;
    x + y
}

There are indeed separate DILexicalBlocks in the IR for x and y:

!15 = !DILocalVariable(name: "x", scope: !16, file: !8, line: 3, type: !12, align: 4)
!16 = distinct !DILexicalBlock(scope: !7, file: !8, line: 3, column: 5)
!17 = !DILocalVariable(name: "y", scope: !18, file: !8, line: 4, type: !12, align: 4)
!18 = distinct !DILexicalBlock(scope: !16, file: !8, line: 4, column: 5)

> But if rust is putting in new artificial scopes for every variable to avoid the "we're in the lexical scope, but the variable doesn't exist yet" problem - I'd /highly/ encourage not doing that, and accepting the less-than-ideal behavior that happens. Introducing a scope for every variable is likely not going to scale well with many pieces of the infrastructure here (compiler, object files, debuggers, symbolizers, etc)

Rust does seem to do this, but they are also not alone... Swift makes the same choice, as described in this 2023 blog post from @adrian-prantl and others.

@workingjubilee
Contributor Author

Yes. Rust supports shadowing, so it's possible... even common, in some styles of code I've seen... to have like 6 different declarations of a variable named x, with three different types and twelve different values.

I don't know of another way to model that so a debugger can understand except by saying "fuck it, new scope time".
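
As a small illustration of the shadowing pattern described here (the function name and its body are made up for this example): every `let x` introduces a new binding, possibly with a new type, and a debugger has to tell them apart at each line.

```rust
/// Each `let x` below introduces a new binding that shadows the previous one.
fn shadowing_demo(input: &str) -> usize {
    let x = input;                       // x: &str
    let x = x.trim();                    // x: &str, different value
    let x: i64 = x.parse().unwrap_or(0); // x: i64
    let x = x * 2;                       // x: i64, different value
    let x = x.to_string();               // x: String
    x.len()
}
```

Here `shadowing_demo(" 21 ")` passes through five distinct bindings named `x` with three different types. Going by the IR dump earlier in the thread, each of them would presumably get its own nested lexical block.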

@dwblaikie
Collaborator

I was curious about this, so I tried a small Rust example on CE:

#[no_mangle]
pub fn square(num: i32) -> i32 {
    let x = num;
    let y = num;
    x + y
}

There are indeed separate DILexicalBlocks in the IR for x and y:

!15 = !DILocalVariable(name: "x", scope: !16, file: !8, line: 3, type: !12, align: 4)
!16 = distinct !DILexicalBlock(scope: !7, file: !8, line: 3, column: 5)
!17 = !DILocalVariable(name: "y", scope: !18, file: !8, line: 4, type: !12, align: 4)
!18 = distinct !DILexicalBlock(scope: !16, file: !8, line: 4, column: 5)

But if rust is putting in new artificial scopes for every variable to avoid the "we're in the lexical scope, but the variable doesn't exist yet" problem - I'd /highly/ encourage not doing that, and accepting the less-than-ideal behavior that happens. Introducing a scope for every variable is likely not going to scale well with many pieces of the infrastructure here (compiler, object files, debuggers, symbolizers, etc)

Rust does seem to do this, but they are also not alone... Swift makes the same choice, as described in this 2023 blog post from @adrian-prantl and others.

Huh, fascinating - thanks for the link.

So, yeah, that does mean that the DWARF's going to grow, and recursion's going to get pretty deep even with a linear sequence of local variables.
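
To make the growth concrete, here is a simplified Rust model that compares the deepest scope nesting under the two schemes discussed in this thread: one scope per brace-delimited block versus one additional scope per `let`. The `Item` token stream and both function names are hypothetical, and the model ignores everything about real debug info except nesting.

```rust
/// Hypothetical, simplified token stream for a function body.
#[derive(Clone, Copy)]
enum Item {
    Open,  // `{`
    Close, // `}`
    Let,   // a `let` binding
}

/// Deepest scope nesting when only braces open scopes.
fn depth_scope_per_block(items: &[Item]) -> usize {
    let (mut depth, mut max) = (0usize, 0usize);
    for item in items {
        match item {
            Item::Open => depth += 1,
            Item::Close => depth = depth.saturating_sub(1),
            Item::Let => {}
        }
        max = max.max(depth);
    }
    max
}

/// Deepest nesting when every `let` also opens a scope that lasts until the
/// enclosing brace closes.
fn depth_scope_per_let(items: &[Item]) -> usize {
    let mut saved = Vec::new(); // depth to restore at each `}`
    let (mut depth, mut max) = (0usize, 0usize);
    for item in items {
        match item {
            Item::Open => {
                saved.push(depth);
                depth += 1;
            }
            Item::Let => depth += 1,
            Item::Close => depth = saved.pop().unwrap_or(0),
        }
        max = max.max(depth);
    }
    max
}
```

For a single block containing 1000 straight-line `let` bindings, the first function reports a depth of 1 and the second a depth of 1001, which is the depth a recursive walker over the scope tree would have to descend.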

I guess this isn't a problem you've had, @adrian-prantl ? Just haven't hit large enough inputs/enough local variables for it to be a problem?

@workingjubilee
Contributor Author

workingjubilee commented May 31, 2024

One quirk that I have been noticing regarding Rust usage is that it has become an attractive target for code generators. And many of the breaking inputs (that I've gotten reports of) include partially generated or fully generated code, often extracted from a Rocq proof. And I mean, I can tell the person who "only" needed to double rustc's stack size "you'll have to try harder", but for the person who had to double it 4~5 times, well...

> One of the problems with this is that, at the call to func(), the debugger doesn't know that y doesn't exist yet - now there's a different, but unsupported-in-LLVM, DWARF feature that can be used to address this problem (DW_AT_scope_start).

Would that address this issue mentioned in the Swift post while allowing us to bring things down to smaller numbers of lexical blocks?

> In the example below, the local variable a is not yet in scope at the call site of getInt() and will only be available after it has been assigned. With the debug information produced by previous versions of the Swift compiler, the debugger might have displayed uninitialized memory as the contents of a at the call site of getInt(). In Swift 5.9, the variable a only becomes visible after it has been initialized:

@jryans
Member

jryans commented May 31, 2024

> One of the problems with this is that, at the call to func(), the debugger doesn't know that y doesn't exist yet - now there's a different, but unsupported-in-LLVM, DWARF feature that can be used to address this problem (DW_AT_scope_start).

> Would that address this issue mentioned in the Swift post while allowing us to bring things down to smaller numbers of lexical blocks?

Potentially so (though there may be some limitation I'm missing with such an approach)... For future archaeologists, the attribute is DW_AT_start_scope (not DW_AT_scope_start). I hadn't noticed this DWARF feature before. It appears to be one of several DWARF features that aren't currently implemented by any of the major tools (it is not emitted by LLVM or GCC, and it is not used by LLDB or GDB). If it is a potential solution conceptually, it would take some work throughout the ecosystem to make it usable.

@dwblaikie
Collaborator

Yeah - DW_AT_start_scope wouldn't be cheap to add (as you say, due to the ecosystem impact/lack of support) and wouldn't be /as/ expressive as the use of fine grained lexical scopes (start scope would only give a single address - so how do you use that when a variable's values are in a fragmented scope (a scope with DW_AT_ranges) and/or a fragmented location list - which regions of that location list/fragmented scope come before the start_scope address and which ones come after?)

@workingjubilee
Contributor Author

cc @khuey this might be a vaguely interesting thread if you aren't already aware of it.
