Tests segfault with latest stable rustc (1.62.0) and nightly #956

Open
HackerFoo opened this issue Jul 9, 2022 · 7 comments

@HackerFoo

On macOS 12.4, M1 (aarch64):

$ rustc --version
rustc 1.64.0-nightly (06754d885 2022-07-08)

$ git rev-parse HEAD
a92f91bf43aa3fd7f37f57bf603122a315255b9e

$ cargo test
   Compiling autocfg v1.1.0
   Compiling cfg-if v1.0.0
   Compiling libc v0.2.126
   Compiling crossbeam-utils v0.8.8
   Compiling lazy_static v1.4.0
   Compiling scopeguard v1.1.0
   Compiling rayon-core v1.9.3 (/Users/dusty/src/rayon/rayon-core)
   Compiling ppv-lite86 v0.2.16
   Compiling either v1.6.1
   Compiling memoffset v0.6.5
   Compiling crossbeam-epoch v0.9.8
   Compiling rayon v1.5.3 (/Users/dusty/src/rayon)
   Compiling crossbeam-channel v0.5.4
   Compiling getrandom v0.2.6
   Compiling num_cpus v1.13.1
   Compiling rand_core v0.6.3
   Compiling rand_chacha v0.3.1
   Compiling rand_xorshift v0.3.0
   Compiling crossbeam-deque v0.8.1
   Compiling rand v0.8.5
    Finished test [unoptimized + debuginfo] target(s) in 8.99s
     Running unittests src/lib.rs (target/debug/deps/rayon-18b37e9b79a21e5d)

running 193 tests
test iter::collect::test::left_produces_too_many_items - should panic ... ok
test iter::collect::test::only_right_result - should panic ... ok
test iter::collect::test::left_produces_items_with_no_complete - should panic ... ok
test iter::collect::test::produce_fewer_items - should panic ... ok
test iter::collect::test::left_produces_fewer_items - should panic ... ok
test iter::collect::test::only_left_result - should panic ... ok
test iter::collect::test::left_produces_fewer_items_drops ... ok
test iter::collect::test::produce_too_many_items - should panic ... ok
test iter::collect::test::produces_items_with_no_complete ... ok
test iter::collect::test::reducer_does_not_preserve_order - should panic ... ok
test iter::collect::test::right_produces_fewer_items - should panic ... ok
test iter::collect::test::right_produces_items_with_no_complete - should panic ... ok
test iter::collect::test::right_produces_too_many_items - should panic ... ok
test iter::find_first_last::test::find_last_folder_yields_last_match ... ok
test iter::find_first_last::test::find_first_folder_does_not_clobber_first_found ... ok
test iter::find_first_last::test::same_range_first_consumers_return_correct_answer ... ok
test iter::find_first_last::test::same_range_last_consumers_return_correct_answer ... ok
test iter::test::check_chunks_empty ... ok
test delegate::unindexed_example ... ok
test delegate::indexed_example ... ok
test iter::test::check_chunks_even_size ... ok
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `.../src/rayon/target/debug/deps/rayon-18b37e9b79a21e5d` (signal: 11, SIGSEGV: invalid memory reference)
@HackerFoo
Author

They also fail with the stable version on my machine: rustc 1.62.0 (a8314ef7d 2022-06-27)

HackerFoo changed the title from "Tests segfault with rustc nightly" to "Tests segfault with latest stable rustc (1.62.0) and nightly" on Jul 9, 2022
@HackerFoo
Author

HackerFoo commented Jul 9, 2022

I ran git bisect and found that the tests break right after v1.4.1 when the crossbeam dependencies were updated: e81835c

So it may be a problem in crossbeam, although I still can't get all tests to pass with v1.4.1 either.

@cuviper
Member

cuviper commented Jul 12, 2022

Can you get backtraces from the crashed program? Preferably limit it to testing one at a time with cargo test -j1, if that still reproduces, and get backtraces for all threads at the time of the crash.

I'm also interested if older versions of rustc have the same symptom, in case it's a compiler issue.

@HackerFoo
Author

Using -j1 and RUST_BACKTRACE=1 still doesn't produce a backtrace. I think the segfault happens before the program can panic. Using rust-lldb I got this:

(lldb) settings set -- target.run-args  "--test-threads" "1"
(lldb) run
Process 50446 launched: '/Users/dusty/src/rayon/target/debug/deps/rayon-841cdd372d40099f' (arm64)

running 193 tests
test delegate::indexed_example ... Process 50446 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=257, address=0x321031a0313030d)
    frame #0: 0x0000031a0313030d
error: memory read failed for 0x31a03130200
Target 0: (rayon-841cdd372d40099f) stopped.
(lldb) bt
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
* thread #2, stop reason = EXC_BAD_ACCESS (code=257, address=0x321031a0313030d)
  * frame #0: 0x0000031a0313030d
    frame #1: 0x000000010070aef8 rayon-841cdd372d40099f`rayon_core::job::JobRef::execute::hef083bb297ddfbb8(self=JobRef @ 0x0000000170005ef0) at job.rs:57:9
    frame #2: 0x0000000100714e5c rayon-841cdd372d40099f`rayon_core::registry::WorkerThread::execute::h6b475fdee06cf4ea(self=0x0000000170006200, job=JobRef @ 0x0000000170005f20) at registry.rs:752:9
    frame #3: 0x000000010078c4f0 rayon-841cdd372d40099f`rayon_core::registry::WorkerThread::wait_until_cold::h06d8af84470f2e16(self=0x0000000170006200, latch=0x0000000101b06020) at registry.rs:729:17
    frame #4: 0x00000001006f6818 rayon-841cdd372d40099f`rayon_core::registry::WorkerThread::wait_until::h533f59f5b25a144c(self=0x0000000170006200, latch=0x0000000101b06020) at registry.rs:703:13
    frame #5: 0x00000001007152c8 rayon-841cdd372d40099f`rayon_core::registry::main_loop::h4e024679d38df59c(worker=<unavailable>, registry=strong=11, weak=0, index=0) at registry.rs:836:5
    frame #6: 0x0000000100714094 rayon-841cdd372d40099f`rayon_core::registry::ThreadBuilder::run::hdf210ee90ec5ba7a(self=ThreadBuilder @ 0x00000001700066c8) at registry.rs:55:18
    frame #7: 0x00000001006f5750 rayon-841cdd372d40099f`_$LT$rayon_core..registry..DefaultSpawn$u20$as$u20$rayon_core..registry..ThreadSpawn$GT$::spawn::_$u7b$$u7b$closure$u7d$$u7d$::hfe17f534d6418e95 at registry.rs:100:20
    frame #8: 0x0000000100706b84 rayon-841cdd372d40099f`std::sys_common::backtrace::__rust_begin_short_backtrace::h4aef08b0801fd3be(f=<unavailable>) at backtrace.rs:122:18
    frame #9: 0x00000001006ed700 rayon-841cdd372d40099f`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h82938bf9e87dbc2f at mod.rs:505:17
    frame #10: 0x00000001006fe788 rayon-841cdd372d40099f`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h5b4399d67e30d242(self=<unavailable>, _args=<unavailable>) at unwind_safe.rs:271:9
    frame #11: 0x000000010070e2f0 rayon-841cdd372d40099f`std::panicking::try::do_call::hfe8228d56bd19987(data="") at panicking.rs:492:40
    frame #12: 0x000000010070e6f8 rayon-841cdd372d40099f`__rust_try + 32
    frame #13: 0x000000010070d944 rayon-841cdd372d40099f`std::panicking::try::h039485c9ba3650b2(f=<unavailable>) at panicking.rs:456:19
    frame #14: 0x00000001007131f8 rayon-841cdd372d40099f`std::panic::catch_unwind::h9a6cb21655d455b9(f=<unavailable>) at panic.rs:137:14
    frame #15: 0x00000001006ecbe0 rayon-841cdd372d40099f`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::h08a10205cf231394 at mod.rs:504:30
    frame #16: 0x0000000100718a44 rayon-841cdd372d40099f`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hdf85d15302edf554((null)=0x0000600002900000, (null)=<unavailable>) at function.rs:248:5
    frame #17: 0x000000010075dc98 rayon-841cdd372d40099f`std::sys::unix::thread::Thread::new::thread_start::h9203b921991254d0 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h0b34b264ef2d91e2 at boxed.rs:1934:9 [opt]
    frame #18: 0x000000010075dc8c rayon-841cdd372d40099f`std::sys::unix::thread::Thread::new::thread_start::h9203b921991254d0 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h7a89a45f7a1d4561 at boxed.rs:1934:9 [opt]
    frame #19: 0x000000010075dc88 rayon-841cdd372d40099f`std::sys::unix::thread::Thread::new::thread_start::h9203b921991254d0 at thread.rs:108:17 [opt]
    frame #20: 0x0000000194e0426c libsystem_pthread.dylib`_pthread_start + 148

Frame 0 changes each time and is garbage like 0xffffffff0300ffa9.

I'll try older versions of rustc.

@HackerFoo
Author

I don't think it's a compiler bug - I tried each stable version back to 1.58.0. All tests pass if I downgrade crossbeam-deque to 0.7.4, so I'm going to write an issue there, and just patch that for now.

diff --git a/Cargo.toml b/Cargo.toml
index a3e0bff..92abea7 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -20,7 +20,7 @@ exclude = ["ci"]

 [dependencies]
 rayon-core = { version = "1.9.2", path = "rayon-core" }
-crossbeam-deque = "0.8.1"
+crossbeam-deque = "0.7"

 # This is a public dependency!
 [dependencies.either]
diff --git a/rayon-core/Cargo.toml b/rayon-core/Cargo.toml
index db0bb48..87b29a4 100644
--- a/rayon-core/Cargo.toml
+++ b/rayon-core/Cargo.toml
@@ -18,7 +18,7 @@ categories = ["concurrency"]
 [dependencies]
 num_cpus = "1.2"
 crossbeam-channel = "0.5.0"
-crossbeam-deque = "0.8.1"
+crossbeam-deque = "0.7"
 crossbeam-utils = "0.8.0"

@HackerFoo
Author

I started to write an issue for crossbeam, but I don't think I really have enough information for a useful issue yet.

@cuviper
Member

cuviper commented Jul 16, 2022

> Using -j1 and RUST_BACKTRACE=1 still doesn't produce a backtrace. I think the segfault happens before the program can panic.

Sorry, yes I meant a backtrace from a debugger. The panic machinery doesn't run for signal exits like SIGSEGV.
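
For illustration, a minimal toy (not from this thread; it assumes the libc crate, which already appears in the build output above) of why RUST_BACKTRACE=1 prints nothing: the OS delivers SIGSEGV and kills the process directly, so no panic is ever raised and neither the panic hook nor the backtrace machinery gets a chance to run.

use std::panic;

fn main() {
    // This hook never fires: a SIGSEGV is not a panic.
    panic::set_hook(Box::new(|_| eprintln!("panic hook ran")));
    unsafe {
        // Deliver SIGSEGV to ourselves; the process exits with
        // "(signal: 11, SIGSEGV)" before any Rust-level unwinding happens.
        libc::raise(libc::SIGSEGV);
    }
    eprintln!("never reached");
}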

> * frame #0: 0x0000031a0313030d
>   frame #1: 0x000000010070aef8 rayon-841cdd372d40099f`rayon_core::job::JobRef::execute::hef083bb297ddfbb8(self=JobRef @ 0x0000000170005ef0) at job.rs:57:9

> Frame 0 changes each time and is garbage like 0xffffffff0300ffa9.

OK, JobRef is just a data pointer and a function pointer, and JobRef::execute calls the function with the data argument. We get those JobRefs by stealing from the deques. So that bogus frame #0 looks like we're getting a garbage function pointer and ending up in the weeds when we call it.
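
For context, a simplified sketch of the shape described above, paraphrased for illustration rather than taken verbatim from rayon-core: JobRef pairs a type-erased data pointer with a function pointer, and execute just calls one with the other, so a corrupted JobRef popped or stolen from a deque jumps straight to a garbage address.

// Simplified sketch, not the exact rayon-core definition.
struct JobRef {
    pointer: *const (),               // type-erased pointer to the job's data
    execute_fn: unsafe fn(*const ()), // function that knows how to run that data
}

impl JobRef {
    unsafe fn execute(&self) {
        // If either field is corrupted (e.g. a garbage value popped or stolen
        // from a deque), this indirect call jumps to an arbitrary address and
        // the process dies with SIGSEGV and a nonsense frame #0, as above.
        (self.execute_fn)(self.pointer)
    }
}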

> I started to write an issue for crossbeam, but I don't think I really have enough information for a useful issue yet.

They're well aware of rayon too... if you just summarize and link to this issue, and especially note that it seems to depend on the version of crossbeam-deque, that should be enough to get started.
