This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Deadlock in verification queue #3686

Closed
arkpar opened this issue Nov 30, 2016 · 1 comment
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust.

Comments

@arkpar
Collaborator

arkpar commented Nov 30, 2016

Test execution sometimes halts during verification queue cleanup:

#0  0x00007fbf0ad6f404 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000561165a3b367 in std::sys::imp::condvar::{{impl}}::wait (self=0x7fbf09543210, mutex=0x7fbf09543030) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sys/unix/condvar.rs:64
#2  0x00005611659e1b85 in std::sys_common::condvar::{{impl}}::wait (self=0x7fbf09543210, mutex=0x7fbf09543030) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sys_common/condvar.rs:51
#3  0x0000561165a3d7d7 in std::sync::condvar::{{impl}}::wait<()> (self=0x7fbef4815070, guard=...) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sync/condvar.rs:125
#4  0x00005611661ea9f8 in ethcore::verification::queue::{{impl}}::verify<ethcore::verification::queue::kind::blocks::Blocks> (verification=..., engine=..., wait=..., ready=..., deleting=..., empty=..., sleep=...)
    at /builds/Mirrors/ethcore-parity/ethcore/src/verification/queue/mod.rs:309
#5  0x0000561166349277 in ethcore::verification::queue::{{impl}}::new::{{closure}}::{{closure}}<ethcore::verification::queue::kind::blocks::Blocks> () at /builds/Mirrors/ethcore-parity/ethcore/src/verification/queue/mod.rs:256
#6  0x000056116571f952 in ethcore_io::panics::{{impl}}::catch_panic<closure,()> (self=0x7fbf09543280, g=...) at /builds/Mirrors/ethcore-parity/util/io/src/panics.rs:85
#7  0x00005611663a0b00 in ethcore::verification::queue::{{impl}}::new::{{closure}}<ethcore::verification::queue::kind::blocks::Blocks> () at /builds/Mirrors/ethcore-parity/ethcore/src/verification/queue/mod.rs:255
#8  0x000056116605e3b3 in std::panic::{{impl}}::call_once<(),closure> (self=..., _args=0) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panic.rs:295
#9  0x0000561165a4d048 in std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure>,()> (data=0x7fbee2dfd5b0 "p2T\t\277\177") at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panicking.rs:356
#10 0x0000561166c513eb in panic_unwind::__rust_maybe_catch_panic () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libpanic_unwind/lib.rs:97
#11 0x0000561165a4c2b1 in std::panicking::try<(),std::panic::AssertUnwindSafe<closure>> (f=...) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panicking.rs:332
#12 0x0000561165a3e577 in std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure>,()> (f=...) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panic.rs:351
#13 0x00005611663851f2 in std::thread::{{impl}}::spawn::{{closure}}<closure,()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/thread/mod.rs:287
#14 0x0000561165ba1bff in alloc::boxed::{{impl}}::call_box<(),closure> (self=0x7fbf094950c0, args=0) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/liballoc/boxed.rs:595
#15 0x0000561166c47145 in alloc::boxed::{{impl}}::call_once<(),()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/liballoc/boxed.rs:605
#16 std::sys_common::thread::start_thread () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sys_common/thread.rs:21
#17 std::sys::imp::thread::{{impl}}::new::thread_start () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sys/unix/thread.rs:84
#18 0x00007fbf0ad6b184 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007fbf0a88237d in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

Thread 14 (LWP 6091):
#0  0x00007fbf0ad6c65b in pthread_join () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000561165a6d835 in std::thread::{{impl}}::join<()> (self=0x7fbeceff8a60) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/thread/mod.rs:719
#2  0x0000561165a720ef in std::thread::{{impl}}::join<()> (self=...) at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/thread/mod.rs:779
#3  0x00005611661e795a in ethcore::verification::queue::{{impl}}::join (self=...) at /builds/Mirrors/ethcore-parity/ethcore/src/verification/queue/mod.rs:94
#4  0x00005611661ef9d9 in ethcore::verification::queue::{{impl}}::drop<ethcore::verification::queue::kind::blocks::Blocks> (self=0x7fbeceff8f50) at /builds/Mirrors/ethcore-parity/ethcore/src/verification/queue/mod.rs:678
#5  0x0000561165b79731 in drop::h085a75d54dd7cbfb () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libcore/iter/iterator.rs:132
#6  0x00005611661f025c in ethcore::verification::queue::tests::can_be_created () at /builds/Mirrors/ethcore-parity/ethcore/src/verification/queue/mod.rs:707
#7  0x00005611663b6dbf in test::run_test::{{closure}} () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libtest/lib.rs:1265
#8  test::{{impl}}::call_box<(),closure> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libtest/lib.rs:141
#9  0x0000561166c513eb in panic_unwind::__rust_maybe_catch_panic () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libpanic_unwind/lib.rs:97
#10 0x00005611663abed0 in std::panicking::try<(),std::panic::AssertUnwindSafe<closure>> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panicking.rs:332
#11 std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure>,()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panic.rs:351
#12 test::run_test::run_test_inner::{{closure}} () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libtest/lib.rs:1210
#13 std::panic::{{impl}}::call_once<(),closure> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panic.rs:295
#14 std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure>,()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panicking.rs:356
#15 0x0000561166c513eb in panic_unwind::__rust_maybe_catch_panic () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libpanic_unwind/lib.rs:97
#16 0x00005611663b21d3 in std::panicking::try<(),std::panic::AssertUnwindSafe<closure>> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panicking.rs:332
#17 std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure>,()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/panic.rs:351
#18 std::thread::{{impl}}::spawn::{{closure}}<closure,()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/thread/mod.rs:287
#19 alloc::boxed::{{impl}}::call_box<(),closure> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/liballoc/boxed.rs:595
#20 0x0000561166c47145 in alloc::boxed::{{impl}}::call_once<(),()> () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/liballoc/boxed.rs:605
#21 std::sys_common::thread::start_thread () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sys_common/thread.rs:21
#22 std::sys::imp::thread::{{impl}}::new::thread_start () at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/obj/../src/libstd/sys/unix/thread.rs:84
#23 0x00007fbf0ad6b184 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#24 0x00007fbf0a88237d in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
@arkpar arkpar added F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. labels Nov 30, 2016
@rphmeier
Contributor

rphmeier commented Dec 2, 2016

The verifier seems to be blocked on the wait condvar rather than on anything else, given that the blocked frame is the pthread (std) condvar implementation and not parking_lot.

I think this one might be my fault: a subtle bug introduced in the adjustable verifiers PR.
I believe this is a race condition between drop and VerifierHandle::conclude. Under the right scheduling, the last verifier may begin waiting on the more_to_verify condvar after notify_all has already been called. It then never receives the wakeup, and the join in drop blocks forever.
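This is a classic lost wakeup: a condvar keeps no state, so Condvar::notify_all only wakes threads that are already waiting, and a notification sent before a verifier calls wait is simply gone. The sketch below shows the usual std::sync remedy, assuming the shutdown flag is stored inside the same mutex the condvar waits on and is re-checked in a loop. Shared, run_verifier, shutdown and spawn_and_shutdown are hypothetical names, not the actual queue code.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical shared state, loosely named after this thread; not the real queue.
struct Shared {
    // `deleting` lives under the same mutex that `more_to_verify` waits on.
    deleting: Mutex<bool>,
    more_to_verify: Condvar,
}

fn run_verifier(shared: &Shared) {
    let mut deleting = shared.deleting.lock().unwrap();
    // Check the predicate before (and after every) wait. If shutdown was
    // requested before this thread got here, it never calls `wait`, so a
    // `notify_all` that already happened cannot be missed.
    while !*deleting {
        deleting = shared.more_to_verify.wait(deleting).unwrap();
    }
}

fn shutdown(shared: &Shared) {
    // Set the flag while holding the lock, then wake every waiter.
    *shared.deleting.lock().unwrap() = true;
    shared.more_to_verify.notify_all();
}

/// Spawns `n` verifiers, shuts them down (optionally before they even start),
/// and returns how many were joined.
fn spawn_and_shutdown(n: usize, shutdown_first: bool) -> usize {
    let shared = Arc::new(Shared {
        deleting: Mutex::new(false),
        more_to_verify: Condvar::new(),
    });
    if shutdown_first {
        // Simulates the problematic schedule: notify_all before any verifier waits.
        shutdown(&shared);
    }
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || run_verifier(&shared))
        })
        .collect();
    if !shutdown_first {
        shutdown(&shared);
    }
    let mut joined = 0;
    for handle in handles {
        handle.join().unwrap();
        joined += 1;
    }
    joined
}

fn main() {
    println!("{}", spawn_and_shutdown(4, true));
    println!("{}", spawn_and_shutdown(4, false));
}
```

With shutdown_first set, every verifier finds the flag already set and returns without waiting; a loop that waited first and checked afterwards would hang in exactly that schedule.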

A simple fix would be to ensure the deleting flag is set before waking the thread up. I'm not 100% sure how Thread::unpark is ordered with respect to compiler or CPU reordering, so a fence may be required to guarantee this.
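A minimal sketch of the suggested fix, assuming deleting is an AtomicBool and the verifiers sleep on a separate mutex/condvar pair (Queue, sleep_lock, verifier_loop, begin_shutdown and run are hypothetical names). It uses Condvar rather than Thread::unpark and avoids the fence question: the flag is set first, then the condvar's mutex is briefly acquired before notifying. The mutex's lock/unlock ordering then guarantees that a verifier either observes deleting == true or is already waiting when notify_all runs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical state: an atomic `deleting` flag plus a mutex/condvar pair used
// only for sleeping. Names are illustrative, not the real ethcore types.
struct Queue {
    deleting: AtomicBool,
    sleep_lock: Mutex<()>,
    more_to_verify: Condvar,
}

impl Queue {
    fn verifier_loop(&self) {
        let mut guard = self.sleep_lock.lock().unwrap();
        // The flag is read while holding `sleep_lock`, so "check, then wait"
        // cannot be interleaved with the shutdown sequence below.
        while !self.deleting.load(Ordering::SeqCst) {
            guard = self.more_to_verify.wait(guard).unwrap();
        }
    }

    fn begin_shutdown(&self) {
        // 1. Set the flag before waking anyone.
        self.deleting.store(true, Ordering::SeqCst);
        // 2. Briefly take the lock. A verifier that read `false` still holds it
        //    until it is inside `wait`; one that locks after us is guaranteed to
        //    see `true`. The mutex supplies the ordering, so no extra fence.
        drop(self.sleep_lock.lock().unwrap());
        // 3. Wake every verifier.
        self.more_to_verify.notify_all();
    }
}

/// Spawns `n` verifiers, shuts them down, and returns how many were joined.
fn run(n: usize) -> usize {
    let queue = Arc::new(Queue {
        deleting: AtomicBool::new(false),
        sleep_lock: Mutex::new(()),
        more_to_verify: Condvar::new(),
    });
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let q = Arc::clone(&queue);
            thread::spawn(move || q.verifier_loop())
        })
        .collect();
    queue.begin_shutdown();
    let mut joined = 0;
    for handle in handles {
        handle.join().unwrap();
        joined += 1;
    }
    joined
}

fn main() {
    println!("joined {} verifiers", run(8));
}
```

Skipping step 2 reintroduces the race: a verifier could load false, get preempted, and only enter wait after notify_all has already fired.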

rphmeier added a commit that referenced this issue Dec 2, 2016
arkpar pushed a commit that referenced this issue Dec 5, 2016
* possible fix for #3686

* queue: simplify conclusion, don't block on joining

* queue: park verifiers with timeout to prevent race

* more robust verification loop

* queue: re-introduce wait for verifier joining
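The commit list mentions parking verifiers with a timeout and re-introducing the wait for verifier joining on drop. The sketch below approximates that idea with Condvar::wait_timeout. The actual change may have used thread parking (for example std::thread::park_timeout) instead, and VerificationQueue, verifier_loop and POLL_INTERVAL are hypothetical. The timeout does not remove the race, it bounds it: a verifier that misses notify_all re-checks deleting after at most one POLL_INTERVAL, so drop can no longer block forever in join.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical names; not the real ethcore types.
struct Shared {
    deleting: AtomicBool,
    lock: Mutex<()>,
    more_to_verify: Condvar,
}

struct VerificationQueue {
    shared: Arc<Shared>,
    verifiers: Vec<JoinHandle<()>>,
}

// Upper bound on how long a verifier can sleep through a missed notification.
const POLL_INTERVAL: Duration = Duration::from_millis(10);

impl VerificationQueue {
    fn new(num_verifiers: usize) -> Self {
        let shared = Arc::new(Shared {
            deleting: AtomicBool::new(false),
            lock: Mutex::new(()),
            more_to_verify: Condvar::new(),
        });
        let verifiers = (0..num_verifiers)
            .map(|_| {
                let shared = Arc::clone(&shared);
                thread::spawn(move || verifier_loop(&shared))
            })
            .collect();
        VerificationQueue { shared, verifiers }
    }
}

fn verifier_loop(shared: &Shared) {
    while !shared.deleting.load(Ordering::SeqCst) {
        // ... verify pending items here ...
        let guard = shared.lock.lock().unwrap();
        // Sleep with a timeout: even if notify_all fired before we started
        // waiting, we wake after POLL_INTERVAL and re-check `deleting`.
        let _ = shared.more_to_verify.wait_timeout(guard, POLL_INTERVAL).unwrap();
    }
}

impl Drop for VerificationQueue {
    fn drop(&mut self) {
        self.shared.deleting.store(true, Ordering::SeqCst);
        self.shared.more_to_verify.notify_all();
        // Block until every verifier has observed `deleting` and exited.
        for handle in self.verifiers.drain(..) {
            handle.join().expect("verifier thread panicked");
        }
    }
}

fn main() {
    let queue = VerificationQueue::new(4);
    drop(queue); // returns once all four verifier threads have been joined
    println!("queue dropped cleanly");
}
```

The trade-off is a periodic wakeup per idle verifier in exchange for a shutdown that cannot hang; checking the flag under the lock as well removes the race entirely rather than just bounding it.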
@rphmeier rphmeier closed this as completed Dec 5, 2016