-
-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regression where unknown tasks were allowed to be stolen #5392
Fix regression where unknown tasks were allowed to be stolen #5392
Conversation
85a0180
to
b9d7ad7
Compare
359f3bb
to
3dd94ac
Compare
if s: | ||
for tts in s: | ||
if tts._processing_on is not None: | ||
wws = tts._processing_on | ||
comm: double = self.get_comm_cost(tts, wws) | ||
old: double = wws._processing[tts] | ||
new: double = avg_duration + comm | ||
diff: double = new - old | ||
wws._processing[tts] = new | ||
wws._occupancy += diff | ||
self._total_occupancy += diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regression was partially introduced since the occupancy update here would not put a previously unknown task into the stealing whitelist. I didn't wan tto add on top of this function and decided to go for a refactoring instead
steal = state.extensions.get("stealing") | ||
if not steal: | ||
return | ||
if ws._occupancy > old * 1.3 or old > ws._occupancy * 1.3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also cases where the occupancy actually significantly reduced but we might still want to reapply stealing. Until #5379 is fixed, this might increase the likelihood of that deadlock but we should allow deviations in both directions
68e2731
to
9db9d9a
Compare
Test failures appear to be unrelated |
e06033a
to
15a14a5
Compare
15a14a5
to
7303aa2
Compare
There are the known cythonization errors in #5406 and a hard timeout with an interpreter shutdown in a windows run. I'll go on and merge since I don't think either is related |
This has been marked flaky in #3574. The problem is that the test is written on such a high level that it may be true regardless of whether we allow unknown functions or not. In particular, we're setting duration estimates for every task once they are in state processing. If we do not explicitly verify if a task is unknown, we cannot blacklist it from stealing.