-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ThinLTO generates bad DWARF for libtest #45511
Comments
cc'ing @tromey since he's been poking around in LLVM's DWARF generation code recently. |
Sorta smells like an LLVM bug -- but hard to say. Who should investigate? |
triage: P-medium @alexcrichton if you are able to investigate as part of ThinLTO work would be great. |
It's likely that this is the cause of #45731, which even if the underlying segfault were fixed would likely prevent backtraces from happening on OSX. |
This is unfortunately not "automagically fixed" if we upgrade to LLVM 5 |
Making progress. First up, minimized Rust source code via creduce to get:
Next up let's see if we can reduce it to one object file:
Ok looks like only one object file is invalid! Let's take a look the LLVM IR:
Ok let's see what's happening by minimizing the LLVM IR:
yielding this IR which I cleaned up a bit. I checked out LLVM trunk, frobbed the IR a bit, and looks like the error is still on LLVM trunk. |
@rust-lang/compiler I'm not sure at this point if this is an LLVM bug, a rustc bug, or a ThinLTO bug. I've managed to extract what I believe is a "minimal LLVM IR" for this which reproduces the error. I'm hoping fixing that IR will fix all other related issues to this at this point in time. For those of you more familiar with debug information (probably everyone other than me!) mind taking a look at that IR? Does it look "obviously wrong"? If it's "obviously wrong" I can try to work backwards to figure out what piece of the pipeline introduced such a wrong piece, but if it looks correct I can open an LLVM bug saying that we think it may be broken (and maybe they'll say it's "obviously wrong"!) |
Er, I went ahead and opened an upstream bug to go ahead and broaden the search. |
The DWARF is definitely badly formed: it's using a unit-relative offset to reference the parameter type, when it should be using a section-relative offset because the type is in a different unit. The bad offset is occurring for a parameter to subprogram The problem is that subprograms |
@philipc thanks for the info! I know basically nothing about DWARF though and malformed whatnots, so the IR here is heavily edited by both machines (creduce/bugpoint) and me (to make it more readable). My assumption is that if LLVM accepts IR and generates an object file it shouldn't yield such a dwarfdump error, but are you thinking that given possibly valid LLVM IR the dwarfdump error is possible? If so that may make reduction far more difficult... Or do you think that this looks like upstream LLVM needs to get updated? |
I don't think the editing of the IR caused the problem. I'll check later, but I think it's quite likely that the same problem occurs for the unedited IR. I don't know if the LLVM developers intend for this to be valid IR, or if they are missing a validation check somewhere. I think they may have never encountered this possibility. clang divides code into units based on files, so I don't think it ever splits code in the same scope across multiple units, whereas I think rust splits code into units quite differently. That's a bit of conjecture on my part though, it might be better to ask someone who knows more (I only know about DWARF, not LLVM or rustc). Reading the LLVM code a bit more, I think it will be possible to fix there by stopping it from sharing the scope (the scope creation is initiated here and the sharing check is here). It's also worth reconsidering how rust splits code into units though. It might make sense to ensure all code for a given type is kept in the same unit. |
Ah ok that all makes sense. Note though that rustc is very much like clang when it comes to debuginfo/codegen units. The major difference here is that LLVM is doing inlining across codegen units via ThinLTO, and that's likely where these new situations are arising from. |
I see, yep my understanding of how rust partitions units was wrong. So when using ThinLTO, the additional units added by ThinLTO only contain information needed for the additional inlined functions? Do these inlined functions always share scopes/types with unit
|
@philipc interesting! I'd have to look into the internals of ThinLTO to know for sure though. It seems like LLVM isn't ready (maybe?) for multiple codegen units in debug info in one LLVM module (a codegen unit), but it makes sense to me what you're thinking about having all inlined debug info share the same codegen unit information. Now where that change would go in LLVM I'm not entirely sure... |
Right now ThinLTO is generating bad dwarf which is tracked by rust-lang#45511, but this is causing issues on OSX (rust-lang#45768) where `dsymutil` is segfaulting and failing to produce output. Closes rust-lang#45768
Does firefox use fat LTO? There's been a couple of bugs that have some similarities to this (#44412 and maybe #41775). Also note that absence of dwarfdump errors doesn't mean the offsets are all okay. dwarfdump is only complaining if the offset is larger than the unit length, but even if it's less it may still incorrectly be a local ref. So there may be more instances that are unnoticed. |
cc @luser ^ |
I can reproduce using normal LTO. Library crate is:
Binary crate is:
Also here's a thin LTO version that gives smaller IR than the previous Vec test case:
|
Thanks @philipc! It definitely sounds like at this point this is an existing bug we've just never hit before. My inclination would be to paper over this if we can. Unfortunately though I don't know how to fix this at the LLVM IR layer. My naive idea would be to just point everything at one codegen unit for each LLVM module (debug-info-wise) but I don't know how to change existing debuginfo of a module. |
I'm not able to access llvm.org currently, so not sure exactly what Adrian's suggestions were, but here's a patch that fixes LLVM to use a global ref when needed. It's not as simple as comparing parent DIEs, because the variable is initially created without a parent, so the check defaults to the current CU. I don't have a lot of confidence in the patch, but maybe it's a useful starting point for someone who knows more about LLVM. |
We do use normal rustc LTO for Firefox builds, yes. |
Sure, I will do this. I think the issue is more an LLVM thing than a DWARF thing, though; not certain I know enough there to say whether the patch is good, but I will at least do the shepherding. Also I wanted to point out that @philipc knows quite a bit about DWARF; he's one of the principals on https://github.com/gimli-rs/gimli |
Actually maybe there's nothing to do here except merge this into rust-llvm? https://reviews.llvm.org/D39981 |
Yes, I think so. I'll do that. |
Cool, thanks! |
@tromey oh no worries! I've tested it locally now and will submit a PR soon (with another change or two) |
Ok, thanks @alexcrichton |
I've opened up #46008 which I think should close this out (and also re-enables ThinLTO in a few locations) |
This commit updates LLVM to fix rust-lang#45511 (https://reviews.llvm.org/D39981) and also reenables ThinLTO for libtest now that we shouldn't hit rust-lang#45768. This also opportunistically enables ThinLTO for libstd which was previously blocked (rust-lang#45661) on test failures related to debuginfo with a presumed cause of rust-lang#45511. Closes rust-lang#45511
rustbuild: Update LLVM and enable ThinLTO This commit updates LLVM to fix #45511 (https://reviews.llvm.org/D39981) and also reenables ThinLTO for libtest now that we shouldn't hit #45768. This also opportunistically enables ThinLTO for libstd which was previously blocked (#45661) on test failures related to debuginfo with a presumed cause of #45511. Closes #45511
This commit updates LLVM to fix rust-lang#45511 (https://reviews.llvm.org/D39981) and also reenables ThinLTO for libtest now that we shouldn't hit rust-lang#45768. This also opportunistically enables ThinLTO for libstd which was previously blocked (rust-lang#45661) on test failures related to debuginfo with a presumed cause of rust-lang#45511. Closes rust-lang#45511
rustbuild: Update LLVM and enable ThinLTO This commit updates LLVM to fix #45511 (https://reviews.llvm.org/D39981) and also reenables ThinLTO for libtest now that we shouldn't hit #45768. This also opportunistically enables ThinLTO for libstd which was previously blocked (#45661) on test failures related to debuginfo with a presumed cause of #45511. Closes #45511
MIR: adopt borrowck test Fix trailing whitespace span_bug! on unexpected action Make RegionVid use newtype_index! Closes rust-lang#45843 Check rvalue aggregates during check_stmt in tycheck, add initial, (not passing) test Fix failing test Remove attributes and test comments accidentally left behind, add in span_mirbugs Normalize LvalueTy for ops and format code to satisfy tidy check only normalize operand types when in an ADT constructor avoid early return handle the active field index in unions normalize types in ADT constructor Fixes rust-lang#45940 Fix borrowck compiler errors for upvars contain "spurious" dereferences Fixes rust-lang#46003 added associated function Box::leak Box::leak - improve documentation Box::leak - fixed bug in documentation Box::leak - relaxed constraints wrt. lifetimes Box::leak - updated documentation Box::leak - made an oops, fixed now =) Box::leak: update unstable issue number (46179). Add test for rust-lang#44953 Add missing Debug impls to std_unicode Also adds #![deny(missing_debug_implementations)] so they don't get missed again. Amend RELEASES for 1.22.1 and fix the date for 1.22.0 Rename param in `[T]::swap_with_slice` from `src` to `other`. The idea of ‘source’ and ‘destination’ aren’t very applicable for this operation since both slices can both be considered sources and destinations. Clarify stdin behavior of `Command::output`. Fixes rust-lang#44929. Add hints for the case of confusing enum with its variants Add failing testcases Add module population and case of enum in place of expression Use for_each_child_stable in find_module Use multiline text for crate conflict diagnostics Make float::from_bits transmute (and update the documentation to reflect this). The current implementation/documentation was made to avoid sNaN because of potential safety issues implied by old/bad LLVM documentation. These issues aren't real, so we can just make the implementation transmute (as permitted by the existing documentation of this method). Also the documentation didn't actually match the behaviour: it said we may change sNaNs, but in fact we canonicalized *all* NaNs. Also an example in the documentation was wrong: it said we *always* change sNaNs, when the documentation was explicitly written to indicate it was implementation-defined. This makes to_bits and from_bits perfectly roundtrip cross-platform, except for one caveat: although the 2008 edition of IEEE-754 specifies how to interpet the signaling bit, earlier editions didn't. This lead to some platforms picking the opposite interpretation, so all signaling NaNs on x86/ARM are quiet on MIPS, and vice-versa. NaN-boxing is a fairly important optimization, while we don't even guarantee that float operations properly preserve signalingness. As such, this seems like the more natural strategy to take (as opposed to trying to mangle the signaling bit on a per-platform basis). This implementation is also, of course, faster. Simplify an Iterator::fold to Iterator::any This method of once-diagnostics doesn't allow nesting UI tests extract the regular output from the 'rendered' field in json Merge cfail and ui tests into ui tests Add a MIR pass to lower 128-bit operators to lang item calls Runs only with `-Z lower_128bit_ops` since it's not hooked into targets yet. Include tuple projections in MIR tests Add type checking for the lang item As part of doing so, add more lang items instead of passing u128 to the i128 ones where it doesn't matter in twos-complement. Handle shifts properly * The overflow-checking shift items need to take a full 128-bit type, since they need to be able to detect idiocy like `1i128 << (1u128 << 127)` * The unchecked ones just take u32, like the `*_sh?` methods in core * Because shift-by-anything is allowed, cast into a new local for every shift incr.comp.: Make sure we don't lose unused green results from the query cache. rustbuild: Update LLVM and enable ThinLTO This commit updates LLVM to fix rust-lang#45511 (https://reviews.llvm.org/D39981) and also reenables ThinLTO for libtest now that we shouldn't hit rust-lang#45768. This also opportunistically enables ThinLTO for libstd which was previously blocked (rust-lang#45661) on test failures related to debuginfo with a presumed cause of rust-lang#45511. Closes rust-lang#45511 std: Flag Windows TLS dtor symbol as #[used] Turns out ThinLTO was internalizing this symbol and eliminating it. Worse yet if you compiled with LTO turns out no TLS destructors would run on Windows! The `#[used]` annotation should be a more bulletproof implementation (in the face of LTO) of preserving this symbol all the way through in LLVM and ensuring it makes it all the way to the linker which will take care of it. Add enum InitializationRequiringAction Fix tidy tests
Hi, I get a Dwarf Error with GDB 8.0.1 on one of the Is it related to this issue or should I open a new one?
Relevant build flags:
|
@da-x yeah I believe that's this issue, it should be fixed in 1.24 though! |
nightly-2017-10-24 contains a libtest with bad DWARF:
I can reproduce this if I recompile libtest locally:
The problem can be fixed by disabling thinlto (and keeping the codegen units is ok).
I noticed this with nightly-2017-10-24 because it is the first nightly compiled with thinlto, but the problem also occurs if I compile libtest with thinlto enabled using earlier compilers (eg nightly-2017-10-17).
Inspecting the DWARF, a DIE at that local ref offset does exist in another codegen unit, so the problem seems to be that we should be using a global ref instead.
The text was updated successfully, but these errors were encountered: