-
-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure distributed.comm.core.connect can always be cancelled #6064
Conversation
Unit Test Results 16 files ± 0 16 suites ±0 7h 27m 30s ⏱️ - 11m 23s For more details on these failures, see this check. Results for commit 676bb18. ± Comparison against base commit fc5b460. ♻️ This comment has been updated with latest results. |
b544ac6
to
bda34ea
Compare
5d2fce9
to
73a4da2
Compare
task.cancel() | ||
await watcher.wait() | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure if I need to await the task itself here. There are no warnings
cc @graingert
73a4da2
to
8d125a0
Compare
], | ||
) | ||
@gen_test() | ||
async def test_connection_pool_cancellation(monkeypatch, closing, backend): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests do not capture this race condition but the rewrite is much cleaner and covers more ground so I'd like to keep it
8d125a0
to
c4e1120
Compare
There is a CPython issue which looks like the root cause, see python/cpython#86296 |
c4e1120
to
676bb18
Compare
I noticed that tests like
test_worker_waits_for_scheduler
are not properly running even though the test logic is extremely simple.The test
test_worker_waits_for_scheduler
particularly hangs in theasyncio.wait_for
statement waiting for the worker to come up.After some debugging, it appears the worker is stuck in the connect retry implementation trying to connect to the scheduler. This connect should've been cancelled after the
wait_for
hits its timeout but it wasn't.It appears there is a race condition when cancelling a task. If the Task is cancelled right after it is finished and the exception is set, it will not raise a CancelledError but the original Exception. I don't know if this is a bug in CPython but it is not the behaviour I anticipated. I wrote the
ensure_cancellation
wrapper to make sure this behaviour is deterministiccc @graingert
Closes #5861