Refactor gather_dep #6388

Merged
merged 14 commits into from
Jun 10, 2022

crusaderky
Collaborator

@crusaderky crusaderky commented May 20, 2022

Changes in transition logic

Simplify transitions both in case of missing keys in a successful response and in case of network failure.
This will slightly delay transitions to missing, but it allows for much cleaner and simpler code.
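
As a rough illustration of what a "slightly delayed" transition to missing could look like, here is a toy model. All names in it (`Task`, `on_gather_incomplete`, `transition_to_fetch`) are hypothetical and this is not the actual worker code; it only assumes that both failure modes drop the peer and recommend fetch, and that a later step decides whether the task is really missing.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Toy stand-in for a task record; not distributed's TaskState."""
    key: str
    who_has: set = field(default_factory=set)
    state: str = "flight"


def on_gather_incomplete(tasks, worker):
    """Handle keys that did not arrive from `worker` (hypothetical helper).

    The same rule is applied whether the peer replied without the key
    or the connection failed: forget that peer and recommend "fetch".
    """
    recs = {}
    for ts in tasks:
        ts.who_has.discard(worker)
        recs[ts.key] = "fetch"
    return recs


def transition_to_fetch(ts):
    """Resolve a "fetch" recommendation; only here can it become "missing"."""
    ts.state = "fetch" if ts.who_has else "missing"
    return ts.state


x = Task("x", who_has={"tcp://a", "tcp://b"})
y = Task("y", who_has={"tcp://a"})
recs = on_gather_incomplete([x, y], worker="tcp://a")
states = {ts.key: transition_to_fetch(ts) for ts in (x, y)}
print(recs)
print(states)
```

In this sketch the handler never emits "missing" itself, so the task takes one extra hop through fetch before it is declared missing.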

@crusaderky crusaderky requested review from fjetter and gjoseph92 May 20, 2022 00:42
@crusaderky crusaderky self-assigned this May 20, 2022
@crusaderky crusaderky marked this pull request as ready for review May 20, 2022 00:42
@github-actions
Contributor

github-actions bot commented May 20, 2022

Unit Test Results

       15 files  +       15         15 suites  +15   6h 54m 37s ⏱️ + 6h 54m 37s
  2 857 tests +  2 857    2 733 ✔️ +  2 733    81 💤 +  81  39 +39  4 🔥 +4 
21 165 runs  +21 165  20 157 ✔️ +20 157  945 💤 +945  59 +59  4 🔥 +4 

For more details on these failures and errors, see this check.

Results for commit d985e6b. ± Comparison against base commit bde90af.

♻️ This comment has been updated with latest results.

crusaderky added a commit to crusaderky/distributed that referenced this pull request May 21, 2022
crusaderky added a commit to crusaderky/distributed that referenced this pull request May 21, 2022
crusaderky added a commit to crusaderky/distributed that referenced this pull request May 31, 2022
@crusaderky crusaderky marked this pull request as draft May 31, 2022 14:11
crusaderky added a commit to crusaderky/distributed that referenced this pull request Jun 1, 2022
crusaderky added a commit to crusaderky/distributed that referenced this pull request Jun 1, 2022
@github-actions
Contributor

github-actions bot commented Jun 8, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  +       15         15 suites  +15   6h 32m 18s ⏱️ + 6h 32m 18s
  2 857 tests +  2 857    2 773 ✔️ +  2 773    81 💤 +  81    3 +  3 
21 165 runs  +21 165  20 204 ✔️ +20 204  944 💤 +944  17 +17 

For more details on these failures, see this check.

Results for commit de0adb0. ± Comparison against base commit bde90af.

@crusaderky crusaderky marked this pull request as ready for review June 8, 2022 18:05
@crusaderky crusaderky requested a review from fjetter June 8, 2022 18:05
@crusaderky
Collaborator Author

@fjetter Ready for review!

Collaborator

@gjoseph92 gjoseph92 left a comment

Overall I'm excited about this, and it seems good to me. I'm excited to see that finally clause gone, finally.

distributed/worker.py
else:
    refetch.add(ts)

return merge_recs_instructions(
Collaborator

Do you think we'll eventually be able to refactor all these merge_recs_instructions out, or is this here to stay because of the pattern of using the helper functions and ensure_communicating?

Collaborator Author

@crusaderky crusaderky Jun 8, 2022

It's here to stay. The alternative would be to pass recommendations, transitions to the helper functions and let them write back into them.
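
For readers unfamiliar with the trade-off, the two styles can be sketched side by side. This is a simplified illustration: `merge_recs_instrs` and the `helper_*` functions are hypothetical stand-ins, and the string values are placeholders rather than real task states or instruction objects.

```python
def merge_recs_instrs(*pairs):
    """Simplified stand-in for merging (recommendations, instructions) pairs."""
    recs, instrs = {}, []
    for r, i in pairs:
        for key, state in r.items():
            # Two helpers may recommend for the same key only if they agree.
            if key in recs and recs[key] != state:
                raise ValueError(f"conflicting recommendations for {key!r}")
            recs[key] = state
        instrs.extend(i)
    return recs, instrs


# Style 1: each helper returns its own pair and the caller merges them.
def helper_a():
    return {"x": "fetch"}, ["instr-a"]


def helper_b():
    return {"y": "missing"}, ["instr-b"]


merged = merge_recs_instrs(helper_a(), helper_b())


# Style 2 (the alternative mentioned above): helpers write into
# accumulators that the caller passes in.
def helper_a_inplace(recs, instrs):
    recs["x"] = "fetch"
    instrs.append("instr-a")


def helper_b_inplace(recs, instrs):
    recs["y"] = "missing"
    instrs.append("instr-b")


recs, instrs = {}, []
helper_a_inplace(recs, instrs)
helper_b_inplace(recs, instrs)
print(merged == (recs, instrs))
```

Both styles end up with the same result here. The first keeps helpers free of side effects on the caller's data and gives one place to detect conflicting recommendations; the second avoids the merge call at the cost of shared mutable state.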

@github-actions
Contributor

github-actions bot commented Jun 8, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  +       15         15 suites  +15   6h 27m 52s ⏱️ + 6h 27m 52s
  2 861 tests +  2 861    2 779 ✔️ +  2 779    80 💤 +  80  1 +1  1 🔥 +1 
21 196 runs  +21 196  20 256 ✔️ +20 256  938 💤 +938  1 +1  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit b31c3fc. ± Comparison against base commit 059798a.

♻️ This comment has been updated with latest results.

Member

@fjetter fjetter left a comment

I do not like us reusing code in the different handlers. I think we can simplify this significantly, see #6544

Comment on lines 3371 to 3373
def _refetch_missing_data(
    self, worker: str, tasks: Iterable[TaskState], stimulus_id: str
) -> RecsInstrs:
Member

@fjetter fjetter Jun 9, 2022

I strongly dislike the idea of reusing this code. This "helper" already shows signs of too much complexity: we're dealing with many individual ts.state values to make a decision. If we restrict ourselves to the specialized invocation, this should be much less complex.

Reusing code was one of the major reasons why the except/except/finally block caused so many problems.

Particularly with the MissingDataMsg signal in here (#6445), I do not trust this to be the correct answer for both success-but-missing and network-failure responses; see also #6112 (comment)

Comment on lines 3470 to 3471
refetch = set(self._gather_dep_done_common(ev))
refetch |= {self.tasks[key] for key in self.has_what[ev.worker]}
Member

I don't think these two sets should be mixed, see my earlier comment about the shared "helper" function.
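
To make the distinction between the two sets concrete, here is a minimal sketch. The function and parameter names are hypothetical and this is not the `worker.py` implementation; how each handler would then act on each group is left open and would be a further assumption.

```python
def classify_after_gather(requested, received, held_by_worker):
    """Hypothetical helper: keep the two groups of keys apart.

    requested:      keys asked for in this gather call
    received:       keys that actually arrived
    held_by_worker: every key the peer was believed to hold
    """
    # Keys from this request that still need to be fetched again.
    not_received = set(requested) - set(received)
    # Keys never requested here, but recorded as living on the same peer.
    other_on_peer = set(held_by_worker) - set(requested)
    return not_received, other_on_peer


not_received, other_on_peer = classify_after_gather(
    requested={"x", "y"},
    received={"x"},
    held_by_worker={"x", "y", "z"},
)
print(sorted(not_received))
print(sorted(other_on_peer))
```

Returning the groups separately lets a success-but-missing handler and a network-failure handler each apply its own rule, instead of folding everything into a single refetch set.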

@crusaderky
Collaborator Author

@fjetter code review incorporated (with amendments) in 10031c8 + #6546

Member

@fjetter fjetter left a comment

❤️

@crusaderky crusaderky merged commit df1eaba into dask:main Jun 10, 2022
@crusaderky crusaderky deleted the WSMR/gather_dep branch June 10, 2022 16:31
Development

Successfully merging this pull request may close these issues.

Migrate ensure_communicating transitions to new WorkerState event mechanism
3 participants