Move public callback methods to internal implementation #4
Labels
improvement
Improves an existing functionality
Comments
rapids-bot (bot) pushed a commit that referenced this issue on Nov 28, 2023
#123 introduced timeouts to the generic callbacks to prevent failures to acquire the lock due to GIL competition. However, those timeouts were not exposed to Python, which is at least one of the reasons it still times out. Notice how the default `period=0` (never unblock) is used:

```cpp
Thread 1 (Thread 0x7f36d675f740 (LWP 155586) "pytest"):
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fff45058e58) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fff45058e08, cond=0x7fff45058e30) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7fff45058e30, mutex=0x7fff45058e08) at pthread_cond_wait.c:647
#3 0x00007f36d43634d4 in std::condition_variable::wait<ucxx::utils::CallbackNotifier::wait(uint64_t)::<lambda()> > (__p=..., __lock=..., this=0x7fff45058e30) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/condition_variable:103
#4 ucxx::utils::CallbackNotifier::wait (this=this@entry=0x7fff45058e00, period=period@entry=0) at /datasets/pentschev/src/ucxx-deadlock/cpp/src/utils/callback_notifier.cpp:66
#5 0x00007f36d43470e1 in ucxx::Endpoint::close (this=0x7f369c701a90, period=0, maxAttempts=1) at /datasets/pentschev/src/ucxx-deadlock/cpp/src/endpoint.cpp:171
#6 0x00007f36d4753381 in __pyx_pw_4ucxx_4_lib_7libucxx_11UCXEndpoint_9close(_object*, _object* const*, long, _object*) () from /opt/conda/envs/test/lib/python3.10/site-packages/ucxx/_lib/libucxx.cpython-310-x86_64-linux-gnu.so
```

This PR exposes those arguments to Python and specifies a default for the Python async API `Endpoint.abort()` to prevent such deadlocks from occurring.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #136
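To illustrate why a `period=0` default can block forever, here is a minimal sketch of a notifier whose `wait()` only gives up when a non-zero period is passed. `TimedNotifier` and its members are hypothetical names, not the actual `ucxx::utils::CallbackNotifier` implementation, and treating `period` as nanoseconds is an assumption made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical notifier for illustration only; NOT the UCXX implementation.
class TimedNotifier {
 public:
  // Mark the event as done and wake a waiter.
  void set() {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _flag = true;
    }
    _cv.notify_one();
  }

  // Wait until set() has been called.
  // period: assumed to be nanoseconds; 0 means block indefinitely.
  // Returns true if the flag was set, false if the wait timed out.
  bool wait(uint64_t period = 0) {
    std::unique_lock<std::mutex> lock(_mutex);
    auto ready = [this] { return _flag; };
    if (period == 0) {
      _cv.wait(lock, ready);  // never unblocks unless set() runs
      return true;
    }
    return _cv.wait_for(lock, std::chrono::nanoseconds(period), ready);
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  bool _flag{false};
};

// A bounded wait returns false until set() is called, so the caller can retry.
inline void timedNotifierExample() {
  TimedNotifier notifier;
  assert(!notifier.wait(1000000));  // 1 ms, nothing has signalled yet
  notifier.set();
  assert(notifier.wait(1000000));   // flag already set, returns immediately
}
```

With a bounded period the caller regains control and can retry or release other locks, which is the behavior a Python-side default would rely on.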
rapids-bot (bot) pushed a commit that referenced this issue on Dec 5, 2023
It is unclear why, but in some situations `notify_all()` causes futexes never to return. This occurs very frequently in CI and is also reproducible locally, though less often. The typical stack trace for the blocked thread is shown below:

```cpp
Thread 6 (Thread 0x7f13ec84f700 (LWP 2823667) "pytest"):
#0 futex_wait (private=<optimized out>, expected=32765, futex_word=0x7ffd5186a874) at ../sysdeps/nptl/futex-internal.h:141
#1 futex_wait_simple (private=<optimized out>, expected=32765, futex_word=0x7ffd5186a874) at ../sysdeps/nptl/futex-internal.h:172
#2 __condvar_quiesce_and_switch_g1 (private=<optimized out>, g1index=<synthetic pointer>, wseq=<optimized out>, cond=0x7ffd5186a860) at pthread_cond_common.c:416
#3 __pthread_cond_broadcast (cond=0x7ffd5186a860) at pthread_cond_broadcast.c:73
#4 0x00007f140fe5f23c in ucxx::BaseDelayedSubmissionCollection<std::function<void ()> >::process() (this=0x560d0effafd0) at /repo/cpp/include/ucxx/delayed_submission.h:154
#5 0x00007f140fe5f399 in ucxx::DelayedSubmissionCollection::processPost (this=<optimized out>) at /repo/cpp/src/delayed_submission.cpp:84
#6 0x00007f140fe7ed71 in ucxx::WorkerProgressThread::progressUntilSync(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>) (progressFunction=..., stop=@0x560d0f6527f8: false, startCallback=..., startCallbackArg=<optimized out>, delayedSubmissionCollection=...) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/shared_ptr_base.h:1295
#7 0x00007f140fe7f3ee in std::__invoke_impl<void, void (*)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>, std::reference_wrapper<bool>, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection> >(std::__invoke_other, void (*&&)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>&&, std::reference_wrapper<bool>&&, std::function<void (void*)>&&, void*&&, std::shared_ptr<ucxx::DelayedSubmissionCollection>&&) (__f=<optimized out>, __f=<optimized out>) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:61
#8 std::__invoke<void (*)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>, std::reference_wrapper<bool>, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection> >(void (*&&)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>&&, std::reference_wrapper<bool>&&, std::function<void (void*)>&&, void*&&, std::shared_ptr<ucxx::DelayedSubmissionCollection>&&) (__fn=<optimized out>) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:96
#9 std::thread::_Invoker<std::tuple<void (*)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>, std::reference_wrapper<bool>, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection> > >::_M_invoke<0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(std::_Index_tuple<0ul, 1ul, 2ul, 3ul, 4ul, 5ul>) (this=<optimized out>) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_thread.h:259
#10 std::thread::_Invoker<std::tuple<void (*)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>, std::reference_wrapper<bool>, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection> > >::operator()() (this=<optimized out>) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_thread.h:266
#11 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::function<bool ()>, bool const&, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection>), std::function<bool ()>, std::reference_wrapper<bool>, std::function<void (void*)>, void*, std::shared_ptr<ucxx::DelayedSubmissionCollection> > > >::_M_run() (this=<optimized out>) at /opt/conda/envs/test/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_thread.h:211
#12 0x00007f140f92fe95 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#13 0x00007f1412647609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f1412412133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #140
Perhaps as part of #3, or even before a complete pImpl implementation, callbacks should be moved to an internal object so that we don't need to keep them as `public` members of the public API.
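One possible shape for this, short of a full pImpl, is a privately declared nested type that owns the callbacks. The sketch below is illustrative only: `Transfer`, `Callbacks`, `onDone`, and `complete()` are hypothetical names and do not reflect the actual UCXX class layout.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical public class; names are assumptions for this sketch.
class Transfer {
 public:
  explicit Transfer(std::function<void(int)> onDone);
  ~Transfer();

  // The only public entry point; callback plumbing is not exposed.
  void complete(int status);

 private:
  struct Callbacks;                       // declared here, defined out of the interface
  std::unique_ptr<Callbacks> _callbacks;  // internal object that owns the callbacks
};

// In a real project this definition would live in the .cpp file,
// so users of the header never see the callback methods.
struct Transfer::Callbacks {
  std::function<void(int)> onDone;

  // Formerly a public method on the outer class in this hypothetical design.
  void completed(int status) {
    if (onDone) onDone(status);
  }
};

Transfer::Transfer(std::function<void(int)> onDone)
  : _callbacks(std::make_unique<Callbacks>(Callbacks{std::move(onDone)})) {}

// Defined after Callbacks is complete so unique_ptr can destroy it.
Transfer::~Transfer() = default;

void Transfer::complete(int status) { _callbacks->completed(status); }
```

Because `Callbacks` is a private nested type, external code cannot name it or call `completed()` directly, and the same structure can later be folded into a complete pImpl without changing the public interface.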