"died due to signal 11" in collectionstests on arm-android #55861
Comments
Added #55052 (comment) to the list |
I suspect #56869 (comment) on i686-musl is the same as this issue. Perhaps we should run the collectionstests under Miri.
The stacktrace doesn't seem meaningful. |
We've asked for help on internals: https://internals.rust-lang.org/t/help-wanted-to-debug-a-segfault-in-a-standard-library-test-on-android/9428 |
I've managed to reproduce this locally, using the same Android emulator used by CI. I'm working on creating a self-contained script to make it easy to test this locally. |
I believe I've obtained a backtrace from the emulator:
Interestingly enough, it appears that the process actually hangs in the emulator, rather than dying. This allowed me to install gdb and connect the process after the test had failed. |
I strongly suspect that this is related to the |
Some additional results: I added the following script as a test (it contains some logic from
use std::cell::Cell;
use std::thread;
use std::panic;

thread_local!(static SILENCE_PANIC: Cell<bool> = Cell::new(false));

#[test]
fn test_panic_hook() {
    let prev = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        if !SILENCE_PANIC.with(|s| s.get()) {
            prev(info);
        }
    }));
    for i in 0..1000 {
        let _ = thread::spawn(move || {
            SILENCE_PANIC.with(|s| s.set(true));
            panic!("Panicked from thread: {}", i);
        }).join();
    }
    println!("All done!");
}
I then manually uploaded it to the emulator, and invoked it in a loop with the following Bash script (run from
After running this for about 15-20 minutes, I stopped the loop. For reasons I don't yet understand, several of the spawned processes had segfaulted without stopping the loop. I then uploaded a static GDB binary from here to the emulator (this was the only way I could manage to get GDB to work). I managed to obtain the following backtrace from my test program:
From what I can see, the panic hook appears to be executing, and then somehow jumping back into the body of the loop. EDIT: I believe that this is actually showing the normal invocation of my test function. Note that this was the only thread running. I'm not sure if this is related to the original issue, but it suggests that something weird is going on with the panicking threads. My current hypothesis is that there's some sort of concurrency and/or codegen issue with the |
@Aaron1011 Do you still have instructions for reproducing this bug locally? I will look into this issue. |
@Amanieu: Unfortunately, I don't think that I do (I really should have documented what I previously did...) I'll see if I can reconstruct what I did, and post instructions. |
I ran the @pietroalbini: Could you link to some of the failed Github Actions jobs? |
It turns out that I was accidentally running the @Amanieu: Run crash.sh from this branch. This script runs the |
I was finally able to obtain an Android tombstone file: https://gist.github.com/Aaron1011/94e97e3b5a7ba62f024ce4fda8211bdc This was quite difficult to obtain - the emulator appears to get force-killed when a test fails (the outer Docker container shuts down with the emulator still running), which caused the tombstone written to |
I came across the following code: rust/src/libstd/sys/unix/alloc.rs Lines 55 to 76 in 83f8c02
I don't know if it's related, but it's something to keep in mind. |
API level 9 means Android 2.3-2.3.3, released 9 years ago, which seems overly conservative. |
It might be worth switching it to plain |
Unlike on x86, alignment is very important on ARM. Normal load and store instructions can't access unaligned memory. |
Sure but the default malloc alignment is enough for that. |
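To make the alignment point concrete, here is a minimal sketch of when a request would need the aligned path at all, rather than plain `malloc`. It only models the decision and does not call libc. The `MIN_ALIGN` value of 8 is an assumption for a 32-bit target, and `AllocPath` and `choose_path` are hypothetical names invented for this illustration.

```rust
use std::alloc::Layout;

/// Assumed minimum alignment that plain `malloc` guarantees on a 32-bit
/// target such as arm-android. This is an assumption of the sketch.
const MIN_ALIGN: usize = 8;

/// Which allocation path a request would take (hypothetical type).
#[derive(Debug, PartialEq)]
enum AllocPath {
    PlainMalloc,
    AlignedMalloc,
}

/// Plain `malloc` is sufficient when the requested alignment does not
/// exceed the assumed default alignment and does not exceed the size.
/// Everything else would go through `memalign` / `posix_memalign`.
fn choose_path(layout: Layout) -> AllocPath {
    if layout.align() <= MIN_ALIGN && layout.align() <= layout.size() {
        AllocPath::PlainMalloc
    } else {
        AllocPath::AlignedMalloc
    }
}

fn main() {
    // An ordinary 8-byte value stays on the plain path.
    let ordinary = Layout::from_size_align(8, 8).unwrap();
    // An over-aligned request (for example a 64-byte aligned buffer) does not.
    let over_aligned = Layout::from_size_align(64, 64).unwrap();

    println!("{:?} -> {:?}", ordinary, choose_path(ordinary));
    println!("{:?} -> {:?}", over_aligned, choose_path(over_aligned));
}
```

Under these assumptions, only over-aligned requests, or requests smaller than their alignment, ever reach the aligned allocation function.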
I think it's going to be extremely difficult to debug this further in the android vm. I'm currently trying to reproduce this without android, using a combination of |
So, here's what I got so far. I've narrowed it down to this command line, which runs only 35 tests:
Crashes are rare, but when it does crash it seems to always be in
The segmentation fault is misleading, that's just what Android's libc uses for
From the looks of it, there is no problem in our code: this is purely a bug in Android's dlmalloc. My guess is that there is some integer overflow issue in dlmalloc's internal size calculations, but dependent on some internal malloc state (hence the non-determinism). |
Looking at the memory dumps from the tombstone, we can see that the bug is somewhere in
Note the chunk prefix of Note that this bug is not an issue for modern Android versions since those use jemalloc instead of dlmalloc for the default libc allocator. In light of this I think we should just disable the |
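For anyone reading the memory dumps, a rough sketch of how a dlmalloc chunk head word can be decoded may help. It assumes the dlmalloc 2.8 convention that the low bits of the head hold in-use flags and the remaining bits hold the chunk size. Masking the low three bits is an assumption that is safe for 8-byte-aligned chunk sizes. `ChunkHead` and `decode_head` are hypothetical names, and the sample value is made up rather than taken from the tombstone.

```rust
/// Flag bits in the low bits of a dlmalloc chunk `head` word
/// (dlmalloc 2.8 convention, as assumed by this sketch).
const PINUSE_BIT: usize = 1; // previous chunk is in use
const CINUSE_BIT: usize = 2; // this chunk is in use
const FLAG_MASK: usize = 7; // low three bits are never part of the size

/// Decoded view of a chunk head word (hypothetical type).
#[derive(Debug, PartialEq)]
struct ChunkHead {
    size: usize,
    prev_in_use: bool,
    in_use: bool,
}

/// Split a raw head word into its size and flag bits.
fn decode_head(head: usize) -> ChunkHead {
    ChunkHead {
        size: head & !FLAG_MASK,
        prev_in_use: (head & PINUSE_BIT) != 0,
        in_use: (head & CINUSE_BIT) != 0,
    }
}

fn main() {
    // Made-up head word for illustration: size 16 with both flags set.
    let head: usize = 0x13;
    println!("{:#x} -> {:?}", head, decode_head(head));
}
```

A head word whose decoded size or flags are inconsistent with the neighbouring chunks is the kind of thing that makes dlmalloc's internal checks fail.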
Thanks for investigating this ❤️ |
EDIT: I misunderstood how @Amanieu: How do you know it's caused by
It looks like Rust isn't actually calling into |
Symptom: The arm-android test failed with the following messages:
Previous instances:
This might be caused by a mis-optimization like #49775.