
Investigate and remove unusual scheduler transitions to memory #7210

Closed
gjoseph92 opened this issue Oct 27, 2022 · 0 comments · Fixed by #7205
There are a few strange ways the scheduler lets tasks end up in memory:

  • transition_processing_memory (with unexpected worker)
  • transition_waiting_memory
  • transition_no_worker_memory

These paths don't have much test coverage, and it's hard to think of situations where it would be valid for them to occur.

They all revolve around the idea of a task completing on multiple workers at once. This is of course possible (anything is possible in a distributed system), but since worker reconnect was removed in #6361, a task should no longer be able to complete on multiple *connected* workers at once. That is, before the scheduler could receive a second task-finished message, the BatchedStream carrying that message should already have been disconnected, so the message would never actually be processed.
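As a rough illustration of that reasoning, here is a minimal toy model (hypothetical names, not distributed's actual code): once a worker's stream is closed, its in-flight task-finished message is dropped rather than processed, so only the currently assigned worker can move a task to memory, and a report from any other worker hits the "unexpected worker" path that this issue questions.

```python
# Toy model of the scheduler's message handling (hypothetical, for
# illustration only -- not distributed's real implementation).

class MiniScheduler:
    def __init__(self):
        self.task_state = {}   # key -> state ("processing" / "memory")
        self.assigned = {}     # key -> worker address currently processing it
        self.connected = set() # workers with a live message stream

    def add_worker(self, addr):
        self.connected.add(addr)

    def remove_worker(self, addr):
        # Without worker reconnect, closing the stream means any in-flight
        # messages from this worker are discarded, never processed.
        self.connected.discard(addr)

    def handle_task_finished(self, key, worker):
        if worker not in self.connected:
            return "dropped"  # stream already closed; message never processed
        if self.assigned.get(key) != worker:
            return "unexpected-worker"  # the strange transition path
        self.task_state[key] = "memory"
        return "memory"


s = MiniScheduler()
s.add_worker("w1")
s.add_worker("w2")
s.task_state["x"] = "processing"
s.assigned["x"] = "w1"

# w2 was never assigned "x"; its completion report hits the odd path.
print(s.handle_task_finished("x", "w2"))  # unexpected-worker

# After w1 disconnects, its late message is dropped rather than processed,
# so the task cannot be marked in-memory by a disconnected worker.
s.remove_worker("w1")
print(s.handle_task_finished("x", "w1"))  # dropped
```

In this toy model, the only way to reach the "unexpected-worker" branch is a completion report from a connected worker that isn't the assigned one, which is exactly the situation that should no longer arise now that reconnect is gone.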

So far, the only "valid" way we've found to trigger these strange transitions is Scheduler.reschedule, which shouldn't be used anyway #7209.

See discussions for background in:

We should investigate whether these transitions are actually still valid, and if not, remove them.

cc @crusaderky @fjetter
