rustc 1.64.0 crashes on riscv64gc-unknown-linux-gnu #102155
Comments
WG-prioritization assigning P-critical for the moment (Zulip discussion). Would be great if someone could identify where this regression started (also using cargo bisect). @lunasophia just curious, did you have a chance to try nightlies or betas between the two stable 1.63 and 1.64? The answer might be there somewhere. Thanks. @rustbot label -I-prioritize +P-critical E-needs-bisection |
I have not, but I'll look at writing a script to automate it over the weekend. |
While testing this (i.e. I specified the date range for my search incorrectly and pulled in a post-1.64.0 nightly), I noticed the nightly I just installed does work without issue:
If further bisecting is desired, I'd be happy to do so, though I'm a little unclear on how nightly dates match up to version numbers. I installed the 2022-08-11 nightly (expecting it to be 1.64 or 1.63), and it's 1.65 (and still broken), so a list of dates to try would help me narrow down my search. |
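On how nightly dates line up with version numbers, here is a rough sketch rather than an official rule. It assumes the six-week release cycle and that master moves to the next version about six days before each stable release. That six-day offset is an assumption, inferred from the range cargo-bisect-rustc reports later in this thread for `--start=1.63.0 --end=1.64.0` (nightly-2022-06-24 to nightly-2022-08-05). The function name `nightly_window` and both constants are hypothetical, introduced only for illustration.

```rust
/// Days since 1970-01-01 for a Gregorian date (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March-based month, 0..=11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: returns (year, month, day).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Length of one release cycle: six weeks.
const CYCLE_DAYS: i64 = 42;
/// ASSUMPTION: master is bumped about six days before a stable release.
/// Inferred from dates in this thread, not from any official schedule.
const BRANCH_LEAD_DAYS: i64 = 6;

/// Approximate first and last nightly dates that carried the version whose
/// stable release date is `stable_release`. The edges are estimates.
fn nightly_window(stable_release: (i64, i64, i64)) -> ((i64, i64, i64), (i64, i64, i64)) {
    let (y, m, d) = stable_release;
    let release = days_from_civil(y, m, d);
    let end = release - CYCLE_DAYS - BRANCH_LEAD_DAYS;
    let start = end - CYCLE_DAYS;
    (civil_from_days(start), civil_from_days(end))
}

fn main() {
    // 1.64.0 stable was released on 2022-09-22.
    let (start, end) = nightly_window((2022, 9, 22));
    println!(
        "1.64 nightlies: roughly {:04}-{:02}-{:02} to {:04}-{:02}-{:02}",
        start.0, start.1, start.2, end.0, end.1, end.2
    );
}
```

For 1.64.0 this gives 2022-06-24 to 2022-08-05, which matches the range cargo-bisect-rustc reported and is consistent with the 2022-08-11 nightly already being 1.65. Treat the boundaries as approximate and pad the range by a few days when bisecting.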
I joined the Zulip chat and ran cargo-bisect-rustc as instructed. After some trial and error with the date range I found the following:
searched nightlies: from nightly-2022-05-30 to nightly-2022-08-01
bisected with cargo-bisect-rustc v0.6.4
Host triple: riscv64gc-unknown-linux-gnu
cargo bisect-rustc 2022-05-30 --end 2022-08-01 --test-dir . |
Maybe cc @5225225 ? |
Wait what? That code shouldn't take effect unless you're running with Very strange! Maybe try running |
I'm not sure how helpful the gdb output will be. A lot of the locals have been optimized out and there's not a lot of symbol table information.
gdb run on cargo
gdb run on rustc
If it matters, syscall 98 is indeed |
Here on Arch Linux RISC-V, we can't even build the config & logs: https://gist.github.com/r-value/61aa4658ec1a1c9bb803e768cb003d19 |
To be clear, we're building natively on RISC-V machines. We also tried native builds under |
EDIT: This is a misconfigured bisection, please ignore it.
searched nightlies: from nightly-2022-06-24 to nightly-2022-08-05
bisected with cargo-bisect-rustc v0.6.4
Host triple: riscv64gc-unknown-linux-gnu
cargo bisect-rustc --start=1.63.0 --end=1.64.0 |
There are now two bisections in this issue, and two compiler crashes being reported. Is the second bisection for the ICE when building the compiler on RISC-V? |
@saethlin Sorry about the misleading information. I found a misconfiguration in my bisection and reran it just now. When building the compiler natively on RISC-V, the regressing commit is 263edd43c from #99033, the same one @lunasophia found. |
FYI, I have successfully built a functional |
Just to confirm, reverting the commit in question fixes both the compiler panic and the bad codegen? |
I didn't try the cross-compiled compiler (I'm assuming the compiler binary used by @lunasophia is cross-compiled without bootstrap, because it clearly can't compile itself). But since the native build bootstrapped successfully and works fine, I think it's reasonable to assume that reverting the commit fixes both problems. |
I'll admit that this is a part of the Rust project I'm not very familiar with, but I'm at a loss to explain how that panic is possible. So I hate to jump to this, but my best guess is that this is a miscompilation. This is a bit of a long shot, but we do have pretty rich debug assertions. Does setting
[rust]
debug = true
change how the bootstrap fails, compared with the panic backtrace above? |
I'm running Gentoo on my visionfive and see this issue with the |
Thanks for your advice! I'm setting this and rebuilding without the commit reverted. This may take some time to finish on the HiFive Unmatched :) |
Well, things become even more interesting - the 1.64.0 release passed the bootstrap with I suppose there is something wrong with the optimizer. The assertions might have interfered with some optimization procedure and prevented |
I agree with your speculation, but that's about all the help I have to offer. The commit that regresses this really shouldn't change codegen, but if it does I expect the change should be visible in MIR. So the only thing I can think of to do next is compile a bunch of Rust code to MIR with and without that commit and try to find an example that produces different MIR. That might hint at a small example that can be used to reproduce the miscompilation. I'm going to do my best to update the labels. There is a compiler team meeting in about 2 days I think, they should have something to say about this. I hope. @rustbot label -E-needs-bisection +A-LLVM +O-riscv |
I attempted my above suggestion on nextest and the standard library. Exactly the same MIR with and without the commit. Which is so incredibly consistent I almost wonder if I did it wrong. |
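The MIR comparison described above can be sketched roughly as follows. This is a minimal sketch, not the procedure actually used in this thread: it assumes two rustup toolchains that ideally differ only by the suspect commit (for example, local builds registered with `rustup toolchain link`), and it relies on `rustc --emit=mir` writing textual MIR to the path given with `-o`. The helper names `emit_mir` and `first_difference` and the usage string are hypothetical.

```rust
use std::env;
use std::fs;
use std::io;
use std::process::Command;

/// Returns the 1-based number of the first line where `a` and `b` differ,
/// or `None` if both texts have identical lines.
fn first_difference(a: &str, b: &str) -> Option<usize> {
    let mut lines_a = a.lines();
    let mut lines_b = b.lines();
    let mut line_no = 0;
    loop {
        line_no += 1;
        match (lines_a.next(), lines_b.next()) {
            (None, None) => return None,
            (x, y) if x == y => continue,
            _ => return Some(line_no),
        }
    }
}

/// Compiles `source` to textual MIR with the given rustup toolchain.
/// ASSUMPTION: `rustc` is the rustup proxy, so `+toolchain` is accepted.
fn emit_mir(toolchain: &str, source: &str) -> io::Result<String> {
    let file_name = format!("mir-{}.mir", toolchain.replace('/', "_"));
    let out = env::temp_dir().join(file_name);
    let status = Command::new("rustc")
        .arg(format!("+{toolchain}"))
        .arg("--emit=mir")
        .arg("-o")
        .arg(&out)
        .arg(source)
        .status()?;
    if !status.success() {
        let msg = format!("rustc +{toolchain} failed");
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    fs::read_to_string(&out)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().skip(1).collect();
    if args.len() != 3 {
        eprintln!("usage: mirdiff <file.rs> <toolchain-a> <toolchain-b>");
        return Ok(());
    }
    let (source, a, b) = (&args[0], &args[1], &args[2]);
    let mir_a = emit_mir(a, source)?;
    let mir_b = emit_mir(b, source)?;
    match first_difference(&mir_a, &mir_b) {
        Some(n) => println!("MIR differs starting at line {n}"),
        None => println!("MIR is identical for both toolchains"),
    }
    Ok(())
}
```

Identical MIR from both toolchains, as reported here for nextest and the standard library, would point away from MIR generation and toward a later stage such as LLVM. Comparing two unrelated nightlies instead of two builds that differ by one commit is likely to show differences that have nothing to do with this bug.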
This partially reverts commit 53f2e77. Issue: rust-lang/rust#102155 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
FYI I am seeing the same issues in Fedora 37. Native compile on SiFive HiFive Unmatched. |
@davidlt Per this comment: #102155 (comment) can you confirm that nightly works? Or, ideally, beta? The current stable is now 1.65 so it's possible that the referenced nightly that fixes the issue is now beta. If that is the case, stable should work again for you in a few weeks. Though we still have no idea what's going on here so it's possible this was "fixed" by some unrelated change which causes us to no longer tickle a miscompilation in LLVM, which means this bug could resurface at any time. |
What's the actual status for this issue? Thanks @rustbot label -P-critical +P-medium +T-compiler |
I haven't seen this problem in Fedora/RISC-V land with 1.66 or 1.67. We will soon update to 1.68. |
The problem also disappeared in Arch Linux RISC-V since version 1.65.0. |
Thanks for the comments! I'll tentatively close this issue. Please feel free to reopen if that's not the case (cc: @lunasophia ) |
Sorry to bump an issue that's already been closed, but I am seeing exactly this error still happen, also on a VisionFive (V1) board. I debugged with GDB and the segfault is happening in I don't think it's entirely deterministic, and sometimes if I repeatedly run the same rustc command from cargo, it consistently happens, but then when I run cargo again from the beginning I get rustc failing on a different crate instead. For example, I failed to build ripgrep twice, but then on the third time it actually succeeded. When there is a segfault it's always in
For reference I've tried on both 1.65.0 and 1.71.0, and if anything, 1.71.0 seemed to be worse. I'd love to bisect this issue again to see if I can narrow it down, but I gather it would require me setting up a script to copy the cross compiled rustc over to the board and test it a few times to ensure it is/isn't an issue. I also don't believe it's a hardware issue because I haven't experienced any weird segfaults in anything else, and I've also been able to use software that contains Rust without anything like this happening - for example, Firefox runs great.
I would really love to believe that it's just an issue with insufficient power causing weird things to happen, but it's so consistently happening on that specific futex_wait syscall, even though it happens inconsistently.
I've been doing a bit more debugging to try to figure out what's going on. This appears to be something that can only happen if there's contention on the atomic word being waited on - sensibly, otherwise we wouldn't wait - so trying to get GDB to catch the value being passed to syscall fails completely, because the breakpoint interrupts any kind of contention and we don't see that happen anymore. I've looked at the disassembly of
I wonder if strace might let me see what's happening? One example of a segfault causing
I have noticed that the pointer starting with
but on a failure I saw:
It could be nothing, but it could be something. I examined the core dump and found that the address passed to Part of me now wants to blame the kernel, as it doesn't feel like rustc is doing anything it shouldn't. |
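One way to test the contention theory without rustc in the loop is a small stress program. The sketch below assumes that the standard library `Mutex` on Linux is futex-based (true since Rust 1.62), so heavily contended `lock()` calls can end up in the futex wait path. Whether this reproduces the segfault described here is unknown, and the function name `hammer` and the thread and iteration counts are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each take the same lock `iters` times.
/// Returns the final counter value, which should be `threads * iters`.
fn hammer(threads: usize, iters: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    // Contended lock: a thread that loses the race may have
                    // to block in the kernel until the lock is released.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // Arbitrary values: more threads than cores increases contention.
    let total = hammer(8, 200_000);
    println!("final count: {total}");
}
```

If a cross-compiled build of this program crashes on the board in the same futex wait path, that would suggest the problem is not specific to rustc. If it runs cleanly many times, it says little either way, because the contention pattern inside rustc may differ.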
@devyn I don't think this is related to the issue. This issue describes a serious ICE that segfaults almost unconditionally, and our logs show no sign of a futex problem. So far it has not recurred in newer releases. You should probably file a new issue that fully describes your scenario. |
I upgraded my stable-riscv64gc-unknown-linux-gnu Rust toolchain on my StarFive VisionFive board running the official Ubuntu release this afternoon. I then tried to use the compiler, but I received a reproducible compiler error. I can confirm that version 1.63.0 works without issue.
Code
I tried this code:
I expected to see this happen: the code to build
Instead, this happened: I receive a reproducible error (see backtrace below).
Version it worked on
It most recently worked on: Rust 1.63
Version with regression
rustc --version --verbose:
Backtrace
Backtrace
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged