
Investigate and remove unusual scheduler transitions to memory #7210

Closed
gjoseph92 opened this issue Oct 27, 2022 · 0 comments · Fixed by #7205
There are a few strange ways the scheduler lets tasks end up in memory:

  • transition_processing_memory (with unexpected worker)
  • transition_waiting_memory
  • transition_no_worker_memory

These paths don't have much test coverage, and it's hard to think of situations where it would be valid for them to occur.

They all revolve around the idea of a task completing on multiple workers at once. This is of course possible (anything is possible in a distributed system), but since worker reconnect was removed in #6361, a task should no longer be able to complete on multiple *connected* workers at once. That is, before the scheduler could receive a second task-finished message, the BatchedStream carrying that message should already have been disconnected, so the message would never actually be processed.
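As a rough illustration of that reasoning, here is a minimal toy model (hypothetical names, not distributed's actual code): once a worker's stream is closed, its in-flight task-finished message is dropped rather than processed, so only the currently assigned worker can move a task to memory, and a report from any other worker hits the "unexpected worker" path that this issue questions.

```python
# Toy model of the scheduler's message handling (hypothetical, for
# illustration only -- not distributed's real implementation).

class MiniScheduler:
    def __init__(self):
        self.task_state = {}   # key -> state ("processing" / "memory")
        self.assigned = {}     # key -> worker address currently processing it
        self.connected = set() # workers with a live message stream

    def add_worker(self, addr):
        self.connected.add(addr)

    def remove_worker(self, addr):
        # Without worker reconnect, closing the stream means any in-flight
        # messages from this worker are discarded, never processed.
        self.connected.discard(addr)

    def handle_task_finished(self, key, worker):
        if worker not in self.connected:
            return "dropped"  # stream already closed; message never processed
        if self.assigned.get(key) != worker:
            return "unexpected-worker"  # the strange transition path
        self.task_state[key] = "memory"
        return "memory"


s = MiniScheduler()
s.add_worker("w1")
s.add_worker("w2")
s.task_state["x"] = "processing"
s.assigned["x"] = "w1"

# w2 was never assigned "x"; its completion report hits the odd path.
print(s.handle_task_finished("x", "w2"))  # unexpected-worker

# After w1 disconnects, its late message is dropped rather than processed,
# so the task cannot be marked in-memory by a disconnected worker.
s.remove_worker("w1")
print(s.handle_task_finished("x", "w1"))  # dropped
```

In this toy model, the only way to reach the "unexpected-worker" branch is a completion report from a connected worker that isn't the assigned one, which is exactly the situation that should no longer arise now that reconnect is gone.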

So far, the only "valid" way we've found to trigger these strange transitions is Scheduler.reschedule, which shouldn't be used anyway #7209.

See discussions for background in:

We should investigate whether these transitions are actually still valid, and if not, remove them.

cc @crusaderky @fjetter
