-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paused workers shouldn't steal tasks #5665
Conversation
idle = s.idle.values() | ||
saturated = s.saturated | ||
# Paused and closing workers must never become thieves | ||
idle = [ws for ws in s.idle.values() if ws.status == Status.running] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the big-o from O(1) to O(n) of the number of workers, for tasks without restrictions, and only if there are some saturated workers (otherwise the time of topk below, which is O(n), dominates).
I'm unconcerned about this - even with 1k~10k workers, this method runs every 100ms so this should be inconsequential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention here is to filter out paused workers. I'm wondering why a paused worker would be classified as idle in the first place. Is paused and idle not counterintuitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An idle worker is a worker that isn't processing any tasks.
A paused worker is a worker that isn't going to process any new tasks beyond the ones currently running (not queued) on it.
According to this definition, a paused worker may or may not be idle.
We could, of course, change the definition of idle to "not processing any tasks AND a prime candidate for new tasks", but I feel that would be counter-intuitive.
@@ -816,28 +817,47 @@ async def test_steal_twice(c, s, a, b): | |||
|
|||
while len(s.tasks) < 100: # tasks are all allocated | |||
await asyncio.sleep(0.01) | |||
# Wait for b to start stealing tasks | |||
while len(b.tasks) < 30: | |||
await asyncio.sleep(0.01) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this, the title of the test, "steal twice", makes no sense!
|
||
# Army of new workers arrives to help | ||
workers = await asyncio.gather(*(Worker(s.address, loop=s.loop) for _ in range(20))) | ||
workers = await asyncio.gather(*(Worker(s.address) for _ in range(20))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the changes to stealing.py mean that these new workers will start stealing slightly later than before, since they now have to wait to reach running status. It remains to be seen if this causes flakiness.
|
24387cb
to
a26c770
Compare
The flakiness in test_close_gracefully is unrelated to this change - see #5677. |
commit 14bddba1482de035ca416368f918d94b115d3660 Merge: df079fc 9a66b71 Author: crusaderky <crusaderky@gmail.com> Date: Tue Jan 25 17:16:58 2022 +0000 Merge branch 'main' into AMM/RetireWorker commit df079fc Merge: 9676934 5835ce3 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 15:08:51 2022 +0000 Merge branch 'AMM/staging' into AMM/RetireWorker commit 5835ce3 Merge: ee68b02 b0b8e95 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 15:07:27 2022 +0000 Merge branch 'AMM/test_close_gracefully' into AMM/staging commit ee68b02 Merge: 7d4e6ee ccad288 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 15:06:50 2022 +0000 Merge branch 'main' into AMM/staging commit b0b8e95 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 14:32:29 2022 +0000 Code review commit 9676934 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:50:53 2022 +0000 AMM to manage retire_workers() commit 7d4e6ee Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:39:41 2022 +0000 Fix flaky test_close_gracefully and test_lifetime (dask#5677) commit 7faab51 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:39:41 2022 +0000 harden test commit aef3b71 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:29:10 2022 +0000 Increase resilience on slow CI commit af84e40 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:39:18 2022 +0000 Dump cluster state on all test failures (dask#5674) commit 5054c19 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:38:38 2022 +0000 Paused workers shouldn't steal tasks (dask#5665) commit eadb35f Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:37:48 2022 +0000 transition flight to missing if no who_has (dask#5653) commit 581aee8 Merge: 940bb45 60c0d60 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:36:09 2022 +0000 Merge branch 'main' into AMM/test_close_gracefully commit 940bb45 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:20:10 2022 +0000 tweak comment commit 731d132 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:12:03 2022 +0000 Fix flaky test_close_gracefully and test_lifetime
Closes #5664