-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM ERROR: Broken module found, compilation aborted! when building Firefox with full LTO #68929
Comments
The verifier error already occurs on |
Running something like The relevant allocas look like this:
I'm not sure how to debug this further. |
This is also breaking some of our ThinLTO builds (8840da2#commitcomment-129914039) Could it be reverted while the fix is investigated? It looks like somehow the debug declare and alloca get out of sync when the IRMover/IRLinker is invoked? For regular LTO like in the case described here, the IRMover is invoked from LTO each time a module is linked: llvm-project/llvm/lib/LTO/LTO.cpp Line 995 in 844c731
|
@teresajohnson Feel free to revert, I can't do it myself right now. |
@llvm/issue-subscribers-debuginfo Author: Mike Hommey (glandium)
This didn't happen before 8840da2.
STR:
Cc: @nikic (It's also unfortunate that it doesn't tell what specific module is broken) |
I've reverted the patch in a72d88f. I've also reduced the issue to these two files:
And then to the following reasonably minimal test case with the files at https://gist.github.com/nikic/ed40c6ddfaac1a3f32b9e8e4f862e9da:
The cause is pretty obvious now: We have a Linking then picks one of those, which is then incompatible with the alloca and verification complains. Question to the debuginfo folks: What is the expected behavior in this case? Does that just mean verification is just fundamentally doomed due to possible ODR violations in debuginfo? Or should incompatible type definitions not be merged -- somehow, I have no idea if this is even possible? |
…ragment size Reapply now that generation of incorrect debuginfo for FnDef in rustc has been fixed. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in #64149. Differential Revision: https://reviews.llvm.org/D158743
@adrian-prantl @JDevlieghere @pogo59 - I /think/ historically we've let some of these things cause silent problems (like bad/weird debug info, but not a build/link break)), but hard breaks on ODR violations have been avoided? Though I think we could go the other way, if the errors were more meaningful - like perhaps we could verify that merged types have the same size, if that's important (which it is)? But that has to happen during the merging - not sure if we have an option for "extra verification" during merging or not - maybe it's cheap enough to do all the time. Such a failure should mention which input files the inconsistent definitions come from, their name, and probably/ideally, their source location (header file+line number). It's not perfect/not enough in some cases (eg: the same type, from the same header, but due to contextual preprocessor definitions or other compiler flags, the definitions come out with different sizes) but hopefully enough to make it actionable. (total aside: Could be nice to have a "real" ODR checking of some kind - at least Google dropped ODR checking when we switched from GCC to Clang due to the two compilers having a different definition of where a class started (& the ODR checking in |
The compiler is allowed to assume that ODR violations don't happen and I don't think we ever went out of our way to provide any guarantees about the behavior if they do.
You would have to, e.g., build a global DenseSet of all "name-addressed" types that is shared between all MetadataReaders and perform some form of checking (e.g., size, number of members, declcontext) there. I dont' know if we currently deserialize duplicate types from Bitcode, but if we do so already, it may not even be very expensive. |
ODR doesn't apply to types in different compilation units, and presumably, there's no function calls between these that involve the duplicate but different type. Debug info should also be able to discriminate the types (and it does when you don't use LTO) |
I believe it applies to the whole program - all the code that's linked together. To quote cppreference (in lieu of the actual spec, but it's pretty accurate):
There's some other requirements, but those are the classic ones that are relevant here - certainly a type with a different number or size of members would be a violation of this part of the ODR.
Doesn't seem like a safe assumption to me. ODR violations can cause all sorts of quite silent problems/corruption/confusion. (but, yes, most of these ODR violations that survive into a code base for long probably are "safe"-ish - but it's still not valid C++ code)
Incidentally, it doesn't discriminate even outside of LTO: with Essentially, LLVM's DWARF emission depends on the C++ ODR for both LTO and non-LTO (if you're using type units) and has done for a decade or so. It's really important for type deduplication/size optimizations for DWARF - and it's faster to do it based on the name than doing a full structural comparison to decide whether they should be deduplicated. But just checking the size would be enough to fail on the link/merge would probably be cheap enough to implement & diagnose at least some cases of ODR violations & not lead to downstream failures or excessive debug info/slower links/compiles. |
Tag @MaskRay who proposed an ODR checker approach in lld here: https://maskray.me/blog/2022-11-13-odr-violation-detection |
I remember seeing licensee reports on this before (the fragment-size error) which we tracked down to, yes, ODR violation. Licensee fixed their code and we took a request for an ODR checker. We thought about doing something like having the IR linker do a structural-equivalence test, or at least a size check, when it sees debug-info types with the same (linkage) name. But it seems a bit expensive to do all the time? @jmorse am I imagining that someone was toying with an ODR detector internally? |
This didn't happen before 8840da2.
STR:
(unfortunately large, this is Firefox)
cd repr; ld.lld @response.txt
Cc: @nikic
(It's also unfortunate that it doesn't tell what specific module is broken)
The text was updated successfully, but these errors were encountered: