-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust nightly 2018-08-17 or later causes random segmentation faults or panics #53529
Comments
There's a lot of questionable unsafe code in that project. For example, this function should really be marked as |
@jonastepe Simply marking functions as The various
|
Yes, but marking a function as
Check out the byteorder crate. |
This wouldn't achieve much, as a lot would have to be marked unsafe as VMs and GCs inherently perform many operations Rust may consider unsafe. There are a few cases where I want to get rid of more risky code (e.g. the borrowing of
I avoided it in the past because it also just used |
To better understand what is going on I dumped the MIR output using |
It appears the LLVM IR uses a different order for things when using the different nightly versions. This generates a lot of diff noise, so I'm not sure how useful this information will be. I tried to clean it up a bit by replacing various SHA hashes with placeholders, but even then there is a lot of noise. |
The ASM output was a bit easier to clean up, but there are still lots of differences. To be exact,
Again this is probably because of ordering. The new version has 99 lines more, but I'm really not sure if that's expected or not. |
Using https://github.com/rust-lang-nursery/cargo-bisect-rustc I found that nightly-2018-08-19 is the first version that introduces the regression, while nightly-2018-08-18 works fine. The commits included into the 18th nightly can be found at 1fa9449...33b923f (if I got things right at least). |
@jonas-schievink I don't fully understand the first mentioned PR. What would I need to do to verify if that has any impact? For the second one, I'll see if I can somehow get my code to use a version of |
A custom version of |
I ended up replacing all my uses of cc @Pazzaz |
@yorickpeterse The first PR changes details about the generated code that is passed to LLVM. In fact, it just removes unneeded information. The easiest way to check if it's at fault is to manually build the compiler from It would also be helpful if you could reduce the code as much as possible so that we can get a few hundred line example that shows the issue. |
Specifically, the patch I applied is: https://gist.github.com/YorickPeterse/940db2613c77bec9738124e6df273320 |
I narrowed this down to the following file being broken by the changes: https://gitlab.com/inko-lang/inko/blob/master/vm/src/mailbox.rs. If I switch back to There are two possibilities here:
The Mailbox data structure has multiple threads accessing it concurrently. This is synchronised using a mutex of type
I'll see if I can refactor this code a bit so it's safer, that way we can at least rule out my own code being the problem (or not). |
Upon closer inspection, the VecDeque changes specifically affect self.internal.append(&mut self.external.drain(0..).collect()); I don't remember why this is using Next I changed this code to the following: self.internal.extend(self.external.drain(..)); This is based on the old implementation of self.extend(other.drain(..)); This then produces output like:
This is a bit odd considering the implementation is the same as before, but at least now part of the program is running. |
There was another place where I relied on This suggests the failure is definitely due to the changes made to |
Good news: Thanks to the great investigation of @yorickpeterse and @jonas-schievink, I think I found the reason for the segfault: the new |
@MaloJaffre Aha, thanks! I was looking at the code to see how I might be able to trigger this problem, but I hadn't made my way to |
@MaloJaffre That could be it, nice find! |
The problem AFAIK is that this function is triggering Undefined Behavior, and so can cause the program to segfault randomly, and so it may be difficult to reproduce it reliably. |
I can verify that this is because of my PR. Minimal example that segfaults for me: use std::collections::VecDeque;
fn main() {
let mut dst = VecDeque::new();
dst.push_front(Box::new(1));
dst.push_front(Box::new(2));
dst.pop_back();
let mut src = VecDeque::new();
src.push_front(Box::new(0));
dst.append(&mut src);
for a in dst {
}
} And the reason for the segfault is apparent when run with simpler types use std::collections::VecDeque;
fn main() {
let mut dst = VecDeque::new();
dst.push_front(12);
dst.push_front(1234);
dst.pop_back();
let mut src = VecDeque::new();
src.push_front(1);
dst.append(&mut src);
for (i, a) in dst.iter().enumerate() {
println!("{}, {}", i, a);
}
} This will print
and continue in such a loop forever, printing the buffer over and over again. This happens because of line 1911 hitting an edge case when if dst_high.len() == src_total {
0
} else {
original_head + src_total
} I think that is the only thing that is causing these segfaults. |
…onSapin" This partially reverts commit d5b6b95, reversing changes made to 6b1ff19. Fixes rust-lang#53529. Cc: rust-lang#53564.
Reoptimize VecDeque::append ~Unfortunately, I don't know if these changes fix the unsoundness mentioned in #53529, so it is stil a WIP. This is also completely untested. The VecDeque code contains other unsound code: one example : [reading unitialized memory](https://play.rust-lang.org/?gist=6ff47551769af61fd8adc45c44010887&version=nightly&mode=release&edition=2015) (detected by MIRI), so I think this code will need a bigger refactor to make it clearer and safer.~ Note: this is based on #53571. r? @SimonSapin Cc: #53529 #52553 @yorickpeterse @jonas-schievink @Pazzaz @shepmaster.
This should stop CI from failing until rust-lang/rust#53529 is resolved in Rust.
https://gitlab.com/inko-lang/inko is a programming language that I am working on, and the VM is written in Rust. Up until Rust nightly 2018-08-17, everything works fine. Starting with the nightly from the 17th, I'm observing various crashes and different program behaviour. For example:
The last nightly that did not suffer from these problems was Rust 2018-08-16. Stable Rust also works fine. When the segmentation faults happen, they are usually in different places. For example, for one segmentation fault the backtrace is as follows:
While for another segfault the backtrace is instead:
And a third segfault:
In case of the last segfault, it seems certain local variables that are used are NULL pointers, when this should be impossible. Debugging this in GDB proves to be quite difficult, as a variety of variables are reported as even when debugging symbols are included. For example, for the last backtrace the output of "info locals" is:
The VM test suite passes, even when running
cargo test --release
. I'm wondering if perhaps code is optimised in the wrong way, and this is somehow not triggered in the test suite (certainly possible, code coverage is not 100%).Reproducing this is a bit weird. If we leave the code as-is, the segmentation faults rarely occur, instead the VM panics with the following:
However, if we apply the following patch things will start to segfault really quick:
git clone https://gitlab.com/inko-lang/inko.git
cd inko
make -C vm profile
curl https://gist.githubusercontent.com/YorickPeterse/2be478ab617ad02e9e2495130e8f32f0/raw/38ca8bcab963d5b9fc4d192e126f546bff0f6aa9/crash.patch | patch -p1 -N
env RUBYLIB=./compiler/lib ./compiler/bin/inko-test -d runtime --vm vm/target/release/ivm
Note that the last command requires Ruby 2.3 or newer. This will run the test suite of the standard library, which is where all the crashes happen rather frequently (probably because they run much more than the VM's own test suite).
The text was updated successfully, but these errors were encountered: