Spurious (?) error in "compile-fail\rfc-2126-extern-in-paths\single-segment.rs" etc #48116
The second victim (#47657) contains nothing useful in the log: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6293/job/k1wbokf5c59o4ghj
Does |
There's some more information as well at the top of #47828 (comment), notably that this has been seen at least once on x86_64-apple-darwin. The fact that it's spurious yet deterministic about these three tests is pretty disturbing, and sort of points to maybe this being a nondeterministic miscompilation in librustc_resolve itself. I think it's pretty certain this was introduced via the LLVM 6 upgrade, but given the lack of ability to reproduce or anything else my preferred course of action here is:
@kennytm if this starts bouncing every other PR though we can certainly reconsider! |
@eddyb
Edit³: Nevermind, the ICE detection is functioning properly with #48048. The ICE in #48000 never failed with Edit⁴: The "hide ICE" thing should be fixed in #48127. |
1. When the invalid condition is hit, write out the relevant variables too 2. In compile-fail/parse-fail tests, check for ICE first, so the invalid error patterns won't mask our ICE output.
[WIP] Debug #48116. 1. When the invalid condition is hit, write out the relevant variables too 2. In compile-fail/parse-fail tests, check for ICE first, so the invalid error patterns won't mask our ICE output. r? @alexcrichton cc #48116, cc @eddyb
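A minimal sketch of the second point, checking for an ICE before matching expected error patterns. This is not compiletest's actual code: `Verdict`, `check_compile_fail`, and the `ICE_MARKER` text are hypothetical names and values chosen for illustration.

```rust
/// Outcome of checking one compile-fail test's compiler output.
/// (Hypothetical type, not part of compiletest.)
#[derive(Debug, PartialEq)]
enum Verdict {
    /// The compiler crashed; report this regardless of error patterns.
    Ice,
    /// An expected error pattern was not found in the output.
    MissingPattern(String),
    /// No ICE, and every expected pattern was found.
    Pass,
}

/// Assumed marker text that identifies an ICE in the compiler's stderr.
const ICE_MARKER: &str = "internal compiler error";

/// Check for an ICE first, then for the expected error patterns,
/// so that a missing-pattern failure cannot hide the ICE.
fn check_compile_fail(stderr: &str, expected_patterns: &[&str]) -> Verdict {
    if stderr.contains(ICE_MARKER) {
        return Verdict::Ice;
    }
    for pat in expected_patterns {
        if !stderr.contains(pat) {
            return Verdict::MissingPattern(pat.to_string());
        }
    }
    Verdict::Pass
}
```

With this ordering, output that contains both an ICE and none of the expected patterns is reported as `Verdict::Ice` rather than as a pattern mismatch.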
Ok, I've been debugging this with @eddyb today and we may (?) have made some progress. @eddyb has been able to deterministically reproduce the test failure on his machine and furthermore has a reduced test case which panics on his machine. Unfortunately I have been unable to reproduce this on my machine.

What we have found, though, is also fascinating. After lots of sharing of IR we've found that this particular snippet of IR will optimize differently on his machine than on mine (using the same version of LLVM!). To me this points to a memory access violation in LLVM or something like that.

@eddyb, however, shared the literal LLVM binaries with me and I was again unable to reproduce on my machine! I did find out, however, that @eddyb was locally using clang 4 for compiling LLVM and I was using gcc 5.4.0. After switching to clang 4 locally I was able to reproduce the

Given that we're seeing this failure across three builders (the two MinGW ones and 64-bit OSX), my current suspicion is that this is basically just undefined behavior in LLVM itself, only exploited by newer versions of the compilers we use to compile it than what I was using locally. This, I think, would explain the various symptoms of:
Unfortunately this still isn't a huge amount to go on. I'm going to try to home in on the valgrind error here and see where that leads me. I'm sort of just praying at this point that it leads to the cause of these bugs.
Bisection of the valgrind error locally points to llvm-mirror/llvm@f45aefe.
Unfortunately that LLVM commit does not cleanly revert, so I've manually reverted it instead. Reverting that commit makes the valgrind error go away locally for me, and we're currently confirming with @eddyb whether it fixes the miscompile locally. In the meantime though @eddyb also found that removing this |
Removed the `assume()` which we assumed is the cause of misoptimization in issue rust-lang#48116.
Try to fix 48116 and 48192

The bug #48116 happens because of a misoptimization of the `import_path_to_string` function, where a `names` slice is empty but the `!names.is_empty()` branch is executed.

https://github.com/rust-lang/rust/blob/4d2d3fc5dadf894a8ad709a5860a549f2c0b1032/src/librustc_resolve/resolve_imports.rs#L1015-L1042

Yesterday, @eddyb locally reproduced the bug and [came across the `position` function](https://mozilla.logbot.info/rust-infra/20180214#c14296834), where the `assume()` call was found to be suspicious. We have *not* concluded that this `assume()` causes #48116, but given [the reputation of `assume()`](#45501 (comment)), it seems highly relevant. Here we try to see if commenting it out can fix the errors.

Later, @alexcrichton bisected and found a potential bug [on the LLVM side](#48116 (comment)). We are currently testing whether reverting that LLVM commit is enough to stop the bug. If so, this PR can be reverted (keeping the `assume()`) and we could backport the LLVM patch instead.

(This PR also includes an earlier commit from #48127 to help debug ICEs happening in compile-fail/parse-fail tests.)

The PR also reverts #48059, which seems to cause #48192.

r? @alexcrichton cc @eddyb, @arthurprs (#47333)
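To make the invariant concrete, here is a simplified stand-in for the branch in question. It is not the real `import_path_to_string` (see the link above for that): the function name, the plain `&str` types, and the `::` joining are assumptions made for illustration.

```rust
/// Simplified, hypothetical stand-in for the branch described above.
/// `names` are the path segments; `subclass` is the final imported item.
fn import_path_to_string_sketch(names: &[&str], subclass: &str) -> String {
    if names.is_empty() {
        // Empty path: only the imported item is printed.
        subclass.to_string()
    } else {
        // Non-empty path: segments joined with `::`, then the item.
        // The misoptimization described above corresponds to reaching
        // a branch like this one even though `names` is empty.
        format!("{}::{}", names.join("::"), subclass)
    }
}
```

In this sketch, taking the non-empty branch with an empty `names` would produce a path with a leading `::`, which is one way a wrong branch could surface as a different error message without crashing.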
The valgrind violation seems harmless. It's from

```cpp
unsigned getNumBuckets() const {
  return Small ? InlineBuckets : getLargeRep()->NumBuckets;
}
```

What happens here is that clang happens to ultimately put the branch on That should be fine though, because See also the remark under "Preparing your program" on http://valgrind.org/docs/manual/quick-start.html which says that such bogus violations are expected at higher optimization levels.
@alexcrichton I'm pretty sure that commit just made the jump threading pass find more opportunities to perform optimizations. So without that commit, it just never gets to hit the code path where the (bogus) violation is reported.
@dotdash gah bummer! I was wondering if it'd be something like that... (I know we have tons of those "errors" in rustc with valgrind.) If that's the case, though, it's sort of fascinating because it means that a binary on @eddyb's machine optimizes IR differently than it does on my machine... That still implies to me some level of undefined behavior, but maybe not the kind flagged by valgrind?
Ok some more information on this. I've finally been able to reproduce the error that @eddyb was seeing. Given the exact binaries from @eddyb I was originally unable to reproduce the issue he had on his machine. He realized, though, that our machines differed in glibc versions. Notably @eddyb had glibc 2.26 and I had glibc 2.23. Testing more versions revealed that with the binaries @eddyb gave me glibc 2.25 worked ok and glibc 2.26 was the first bad one.

With this new information I reran bisection and it fascinatingly pointed at the same commit. I truly have no idea what is going on here. This may be a bug that only "just happens" to show up on glibc 2.26 though. The bots that are reproducing this, OSX and MinGW, are not using glibc. Still digging...
I've filed https://bugs.llvm.org/show_bug.cgi?id=36386 upstream to hopefully address this issue. I don't think we can be 100% sure that this is the exact same issue that we're seeing on OSX/Windows but I think it's as close as we're gonna get for the time being.
This comment seems to hint at the same issue: https://bugs.llvm.org/show_bug.cgi?id=32981#c2

It is also related to JumpThreading, and non-determinism due to pointer comparisons would tally with the glibc changes (some change could result in malloc giving out chunks of memory in a different order). It would also explain why this affects platforms without glibc.
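As a rough illustration of why pointer comparisons can make results depend on the allocator, the sketch below orders the same heap-allocated items once by address and once by a stable key. It does not reproduce the LLVM code path; `Node`, `order_by_address`, and `order_by_id` are hypothetical names.

```rust
/// Hypothetical heap-allocated item with a stable identifier.
struct Node {
    id: u32,
}

/// Order by address: the result depends on where the allocator placed
/// each `Node`, so it may differ between allocators or libc versions.
fn order_by_address(nodes: &[Box<Node>]) -> Vec<u32> {
    let mut refs: Vec<&Node> = nodes.iter().map(|b| &**b).collect();
    refs.sort_by_key(|n| *n as *const Node as usize);
    refs.iter().map(|n| n.id).collect()
}

/// Order by a stable key: the result is the same on every run and platform.
fn order_by_id(nodes: &[Box<Node>]) -> Vec<u32> {
    let mut ids: Vec<u32> = nodes.iter().map(|b| b.id).collect();
    ids.sort();
    ids
}
```

Both functions return the same set of ids, but only `order_by_id` guarantees the same sequence everywhere. Any logic that branches on the address-based order can behave differently when the allocator changes.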
I'm going to preemptively close this as I believe the symptom has been fixed for us with commenting out the
Backport LLVM fixes for a JumpThreading / assume intrinsic bug This fixes the original cause of #48116 and restores the assume intrinsic that was removed as a workaround. r? @alexcrichton
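For readers unfamiliar with what an `assume`-style hint looks like, here is a sketch of a `position`-like search that passes a bounds fact to the optimizer. It is not the standard library's implementation: it uses the stable `std::hint::unreachable_unchecked` as a stand-in for the `assume` intrinsic, and both the function name `position_with_hint` and the condition `i < len` are assumptions, since the thread does not show the real call.

```rust
use std::hint::unreachable_unchecked;

/// Return the index of the first element matching `pred`, telling the
/// optimizer that the index is always in bounds.
/// (Hypothetical sketch, not the standard library's `position`.)
fn position_with_hint<T>(slice: &[T], mut pred: impl FnMut(&T) -> bool) -> Option<usize> {
    let len = slice.len();
    for (i, item) in slice.iter().enumerate() {
        if pred(item) {
            if i >= len {
                // SAFETY: `enumerate()` over `slice` only yields `i < len`,
                // so this branch can never be reached. Marking it
                // unreachable plays the role of `assume(i < len)`.
                unsafe { unreachable_unchecked() }
            }
            return Some(i);
        }
    }
    None
}
```

The hint never changes the result when the stated fact is true. It only gives the optimizer more to work with, which is why a bug in a pass that consumes such facts can turn into a miscompile.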
First introduced by #47828 (comment).
Symptom: The following 3 test cases involving error E0432 will fail.
Mainly affects Windows and macOS machines (maybe because they are tested first).
Current instances: