Fix regression where unknown tasks were allowed to be stolen #5392

fjetter · 2021-10-05T14:22:24Z

The test test_dont_steal_unknown_functions has been marked flaky in #3574. The problem is that it is written at such a high level that it may pass regardless of whether we allow unknown functions to be stolen or not. In particular, we set duration estimates for every task once it is in state processing. If we do not explicitly check whether a task is unknown, we cannot blacklist it from stealing.
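The pitfall can be sketched in plain Python. This is a hypothetical model, not distributed's API: the Task and TaskPrefix classes, the UNKNOWN sentinel of -1 and the DEFAULT_UNKNOWN_DURATION fallback are assumptions made for illustration. Because every processing task receives some duration estimate, a check on "has an estimate" also passes for unknown tasks. Only an explicit check on whether the duration has actually been measured can exclude them.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN = -1.0  # hypothetical sentinel: no runtime measured for this prefix yet
DEFAULT_UNKNOWN_DURATION = 0.5  # hypothetical fallback estimate in seconds


@dataclass
class TaskPrefix:
    name: str
    duration_average: float = UNKNOWN  # mean observed runtime, or UNKNOWN


@dataclass
class Task:
    key: str
    prefix: TaskPrefix
    duration_estimate: Optional[float] = None  # filled in when processing starts


def start_processing(task: Task) -> None:
    # Every processing task gets an estimate, measured or fallback,
    # so the mere presence of an estimate carries no information.
    if task.prefix.duration_average == UNKNOWN:
        task.duration_estimate = DEFAULT_UNKNOWN_DURATION
    else:
        task.duration_estimate = task.prefix.duration_average


def is_stealable_naive(task: Task) -> bool:
    # True for every processing task: cannot tell unknown tasks apart.
    return task.duration_estimate is not None


def is_stealable(task: Task) -> bool:
    # Explicitly ask whether the runtime is known before allowing a steal.
    return task.prefix.duration_average != UNKNOWN


unknown = Task("inc-1", TaskPrefix("inc"))
known = Task("add-1", TaskPrefix("add", duration_average=0.2))
for t in (unknown, known):
    start_processing(t)

print(is_stealable_naive(unknown), is_stealable(unknown))  # True False
print(is_stealable_naive(known), is_stealable(known))  # True True
```

The naive check accepts both tasks, which is why a high-level test can pass whether or not unknown tasks are actually blacklisted.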

fjetter · 2021-10-06T16:52:59Z

distributed/scheduler.py

-                if s:
-                    for tts in s:
-                        if tts._processing_on is not None:
-                            wws = tts._processing_on
-                            comm: double = self.get_comm_cost(tts, wws)
-                            old: double = wws._processing[tts]
-                            new: double = avg_duration + comm
-                            diff: double = new - old
-                            wws._processing[tts] = new
-                            wws._occupancy += diff
-                            self._total_occupancy += diff


This regression was partially introduced because the occupancy update here would not put a previously unknown task onto the stealing whitelist. I didn't want to add more logic on top of this function and decided to refactor it instead.
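One way to picture the refactoring is to route every cost update through a single helper instead of the inline loop removed in the diff above, so that the occupancy bookkeeping and the stealing notification cannot drift apart. This is a simplified, hypothetical sketch: WorkerSketch, SchedulerSketch, StealingStub and set_task_cost are illustrative names, not the merged code, and the sketch notifies the stealing stub on every update rather than applying any threshold.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerSketch:
    name: str
    processing: dict = field(default_factory=dict)  # task key -> estimated cost (s)
    occupancy: float = 0.0  # running sum of the costs in `processing`


class StealingStub:
    """Hypothetical stand-in for the stealing extension."""

    def __init__(self):
        self.recalculated = []

    def recalculate_cost(self, key):
        self.recalculated.append(key)


class SchedulerSketch:
    def __init__(self, stealing=None):
        self.total_occupancy = 0.0
        self.extensions = {"stealing": stealing} if stealing is not None else {}

    def set_task_cost(self, key, ws, duration, comm=0.0):
        """Single place that updates a task's cost and all derived totals."""
        new = duration + comm
        old = ws.processing.get(key, 0.0)
        diff = new - old
        ws.processing[key] = new
        ws.occupancy += diff
        self.total_occupancy += diff
        steal = self.extensions.get("stealing")
        if steal is not None:
            # Re-evaluating here lets a task whose duration just became
            # known enter stealing consideration.
            steal.recalculate_cost(key)


steal = StealingStub()
scheduler = SchedulerSketch(steal)
ws = WorkerSketch("w1")
scheduler.set_task_cost("inc-1", ws, duration=0.5)  # fallback estimate while unknown
scheduler.set_task_cost("inc-1", ws, duration=0.1, comm=0.02)  # measured duration
```

Keeping the arithmetic in one helper means that a later change, such as deciding when stealing should be re-evaluated, only has to be made in one place.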

fjetter · 2021-10-06T16:54:12Z

distributed/scheduler.py

+    steal = state.extensions.get("stealing")
+    if not steal:
+        return
+    if ws._occupancy > old * 1.3 or old > ws._occupancy * 1.3:


There are also cases where occupancy dropped significantly but we may still want to re-evaluate stealing. Until #5379 is fixed, this might increase the likelihood of that deadlock, but we should allow deviations in both directions.
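The difference between a one-sided and a two-sided threshold can be shown with a small sketch. The function names are hypothetical; only the 1.3 factor and the shape of the comparison come from the diff above.

```python
def occupancy_grew_significantly(old: float, new: float, factor: float = 1.3) -> bool:
    # One-sided check: only reacts when occupancy increases.
    return new > old * factor


def occupancy_changed_significantly(old: float, new: float, factor: float = 1.3) -> bool:
    # Two-sided check mirroring the diff: reacts to large increases or decreases.
    return new > old * factor or old > new * factor


for old, new in [(10.0, 14.0), (10.0, 7.0), (10.0, 12.0)]:
    print(
        old,
        new,
        occupancy_grew_significantly(old, new),
        occupancy_changed_significantly(old, new),
    )
```

A drop from 10 to 7 is ignored by the one-sided check but triggers the two-sided one, while a moderate change from 10 to 12 triggers neither.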

fjetter · 2021-10-07T10:39:37Z

Test failures appear to be unrelated

fjetter · 2021-10-13T13:07:11Z

The remaining failures are the known cythonization errors (see #5406) and a hard timeout with an interpreter shutdown in a Windows run. I'll go ahead and merge since I don't think either is related.

fjetter mentioned this pull request Oct 5, 2021

Resolve work stealing deadlock caused by race in move_task_confirm #5379

Merged

fjetter force-pushed the delete_tests_steal_unknown_functions branch from 85a0180 to b9d7ad7 October 5, 2021 15:14

crusaderky changed the title ~~Fix regression where unknown tasks where allowed to be stolen~~ Fix regression where unknown tasks were allowed to be stolen Oct 6, 2021

fjetter force-pushed the delete_tests_steal_unknown_functions branch from 359f3bb to 3dd94ac October 6, 2021 16:51

fjetter commented Oct 6, 2021

fjetter force-pushed the delete_tests_steal_unknown_functions branch from 68e2731 to 9db9d9a October 7, 2021 09:15

fjetter mentioned this pull request Oct 7, 2021

Long running occupancy #5395

Merged

fjetter requested a review from crusaderky October 7, 2021 16:06

jrbourbeau mentioned this pull request Oct 7, 2021

Release 2021.10.0 dask/community#189

Closed

fjetter mentioned this pull request Oct 8, 2021

Fix a race condition which would allow a rescheduled task to be reported missing even though it is not #5160

Merged

fjetter force-pushed the delete_tests_steal_unknown_functions branch from e06033a to 15a14a5 October 8, 2021 10:07

crusaderky approved these changes Oct 11, 2021

fjetter added 2 commits October 13, 2021 13:45

Delete tests test_dont_steal_unknown_functions

717b79a

refactor occupancy update in scheduler.py

7303aa2

fjetter force-pushed the delete_tests_steal_unknown_functions branch from 15a14a5 to 7303aa2 October 13, 2021 11:53

fjetter merged commit 0959f50 into dask:main Oct 13, 2021

gjoseph92 mentioned this pull request Dec 6, 2021

Task stealing regression in 2021-11-0+ (preventing task load balancing) #5564

Closed

fjetter mentioned this pull request Dec 8, 2021

Allow unknown tasks to be stolen #5572

Merged

fjetter added the stealing label Jun 20, 2022

fjetter mentioned this pull request Aug 11, 2022

Root-ish tasks all schedule onto one worker #6573

Closed
