Internal worker state transitions #4413
Comments
Thank you for investigating this and for writing this up.
In the scheduler we do this as a catch-all. If a transition doesn't exist we transition to `released` and then transition to the desired state. Ideally we implement all valid transitions, but this allows us to be robust to cases that we have not anticipated.
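For illustration, a minimal sketch of that catch-all fallback; this is not the actual dask.distributed code, and the transition table and names are made up for the example:

```python
# A minimal, illustrative sketch of the catch-all described above: if there is
# no registered transition from the current state to the desired state, first
# transition the task to "released" and then to the desired state.

class TaskState:
    def __init__(self, key, state="released"):
        self.key = key
        self.state = state

# (start, finish) pairs for which a dedicated transition exists; hypothetical.
VALID_TRANSITIONS = {
    ("released", "waiting"),
    ("waiting", "ready"),
    ("ready", "executing"),
    ("executing", "memory"),
    ("memory", "released"),
    ("waiting", "released"),
    ("ready", "released"),
    ("executing", "released"),
}

def transition(ts: TaskState, finish: str) -> None:
    start = ts.state
    if (start, finish) in VALID_TRANSITIONS:
        ts.state = finish               # dedicated transition exists
    elif (start, "released") in VALID_TRANSITIONS and ("released", finish) in VALID_TRANSITIONS:
        ts.state = "released"           # catch-all: release first ...
        ts.state = finish               # ... then move to the desired state
    else:
        raise RuntimeError(f"Impossible transition {start!r} -> {finish!r}")

ts = TaskState("x", state="memory")
transition(ts, "waiting")   # no direct memory -> waiting, so it goes via released
print(ts.state)             # "waiting"
```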
I've been working on a more complete definition of the worker state machine. It is still WIP, but I'd like to share it already to get some feedback. This proposal is in some respects much more complex than what we have right now, but in other respects much simpler, and I believe the bottom line is a more robust state machine which is also easier to extend. In particular, I can see much more room for optimistic task assignments, or even the possibility of graph reconstruction upon scheduler failure (I won't go into this other than saying that by not deleting stuff prematurely we have all necessary information on the cluster). I will try not to discuss implementation unless absolutely required but rather focus on the desired state of how things should look. Below you can see the state transition diagram for what I would propose.
(Note: the PNG might be out of date. For the most recent version check the link below; feel free to leave comments, but it's still WIP.) https://lucid.app/lucidchart/invitations/accept/inv_6298eb14-8172-4eb8-9f68-e4f6cb0b36ef
cc @gforsyth
This is great, @fjetter -- thanks for sharing it!

> We will never remove or delete information from a TaskState instance. In particular, the *intention* is never inferred by whether or not a given attribute exists, is null, etc. This is particularly important for the runnable vs not-runnable transition, which should be replaced with a dedicated attribute instead of basing this decision on `runspec is None`. This allows for easier recovery for the state machine by simply "starting from the beginning".

I'm very much on board with this (and the rest of your points) -- clarifying question here: the state of a `TaskState` instance changes -- these are currently tracked in `self.story`, but would we want to include a history of previous states / transitions in the `TaskState` instance itself?
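To make the question concrete, here is a hypothetical sketch of a worker-side task state with a dedicated "runnable" attribute (instead of inferring intent from `runspec is None`) and a per-task transition history kept on the instance itself; none of these names are existing distributed API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class WorkerTaskState:
    key: str
    runspec: object = None          # how to run the task, if the worker knows it
    runnable: bool = False          # explicit intent, never inferred from runspec
    state: str = "new"
    state_history: list = field(default_factory=list)  # per-task transition log

    def transition(self, new_state: str) -> None:
        # record (old, new, timestamp) so the history is never thrown away
        self.state_history.append((self.state, new_state, time.time()))
        self.state = new_state

ts = WorkerTaskState("x", runnable=True)
ts.transition("waiting_for_dependencies")
ts.transition("ready")
print(ts.state_history)
```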
I would love to see that image (or some future version of that image) end up in developer docs at distributed.dask.org. I think that would help future folks.
Short summary of an offline discussion with @gforsyth, including some hints about possible implementations.
We've recently merged a big PR which addresses some of the deadlock situations we've seen lately; see #4784. The deadlock fixes will be released later today, see dask/community#165.
In #4360 various deadlock situations are investigated, some of which are connected to the way valid states and state transitions of the worker's [1] `TaskState` objects are defined.

The current documentation is slightly outdated since dependencies and runnable tasks were consolidated in #4107. The current state transitions now follow the pipeline (omitting `long-running`, `error` and `constrained` for the sake of clarity [2]).
What's remarkable about this transition pipeline is that virtually all states allow a transition to `memory`, and there are multiple transition paths which are only allowed in very specific circumstances and only upon intervention via the scheduler. For instance, a task in state `flight` may be transitioned via `ready` to `executing`, but this is only possible if the worker actually possesses the knowledge about how to execute the given task, i.e. the `TaskState` object possesses a set attribute `runspec`. This attribute is usually only known to the worker if the scheduler intends for this worker to actually execute the task. This transition path is, however, allowed because a dependency, i.e. a task without the knowledge of how to compute it, may be reassigned by the scheduler for computation on this worker. This may happen if the worker on which the task was originally intended to be computed is shut down.
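For illustration, here is a minimal sketch of the guard described above, allowing a `flight` task to move towards execution only when `runspec` is set. The class and function names are assumptions for the example, not the actual Worker implementation:

```python
# A task in "flight" may only be moved towards execution if the worker
# actually knows how to run it, i.e. its runspec is set.
class TaskState:
    def __init__(self, key, runspec=None, state="flight"):
        self.key = key
        self.runspec = runspec   # "how to run this task", if the worker knows it
        self.state = state

def transition_flight_ready(ts: TaskState) -> None:
    if ts.runspec is None:
        # The worker only holds this key as a dependency; it cannot execute it.
        raise RuntimeError(f"Cannot transition {ts.key!r} to ready: runspec is not set")
    ts.state = "ready"

dep = TaskState("dep-1")                           # fetched dependency, no runspec
job = TaskState("task-1", runspec=(sum, [1, 2]))   # reassigned task we can run

transition_flight_ready(job)    # ok: the worker knows how to execute it
# transition_flight_ready(dep)  # would raise: the worker has no runspec for it
```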
This ambiguity is essentially introduced by no longer distinguishing between dependencies and runnable tasks. What I would propose is to make this distinction explicit in the `state` of the tasks. Consider the following pipeline:

Every task starts off in `new`. This is effectively a dummy state and could be omitted. It represents a known task which hasn't been classified into "can the worker compute this task or not". Based on the answer to this question it is put into one of the following states (a short sketch of this classification follows the list):

- `waiting_for_dependencies`: This task is intended to be computed by this worker. Once all dependencies are available on this worker, it is supposed to be transitioned to `ready` to be queued up for execution. (No dependencies is a special subset of this case.)
- `waiting_to_fetch`: This task is not intended to be computed on this worker; the `TaskState` on this worker is merely a reference to a remote data key we are about to fetch.
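A minimal sketch of that classification step, assuming a hypothetical `should_compute` flag as the explicit signal from the scheduler (not existing Worker API):

```python
# Every task starts in "new" and is then put into either
# "waiting_for_dependencies" or "waiting_to_fetch", based on an explicit
# signal from the scheduler rather than on whether `runspec` happens to be set.
class TaskState:
    def __init__(self, key, runspec=None, should_compute=False):
        self.key = key
        self.runspec = runspec
        self.should_compute = should_compute  # explicit intent from the scheduler
        self.state = "new"

def classify_new_task(ts: TaskState) -> str:
    assert ts.state == "new"
    if ts.should_compute:
        # This worker is meant to run the task; wait for its dependencies
        # (no dependencies is just the trivial sub-case of this state).
        ts.state = "waiting_for_dependencies"
    else:
        # We only hold a reference to remote data we are about to fetch.
        ts.state = "waiting_to_fetch"
    return ts.state

print(classify_new_task(TaskState("x", runspec=(sum, [1, 2]), should_compute=True)))
print(classify_new_task(TaskState("dep-of-x")))
```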
The red transition is only possible via scheduler interference, once the scheduler reassigns a task to be computed on this worker. This is relatively painless as long as the `TaskState` is in a valid state (in particular `runspec` is set).

Purple is similar, but in this case the worker was already trying to fetch a dependency. It is similar to the red transition with the exception that a `gather_dep` was already scheduled and this worker is currently trying to fetch a result. If that was actually successful, we might be in a position where we can fast-track the "to be executed" task.

I believe defining these transitions properly is essential and we should strive to set up a similar, if not identical, state machine as in the scheduler (with recommendations / chained state transitions). This is especially important since there are multiple data structures to keep synchronized (e.g. `Worker.ready`, `Worker.data_needed`, `Worker.in_flight_workers`, to name a few) on top of the tasks themselves.
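As a rough illustration of the "recommendations / chained state transitions" pattern referred to above, here is a sketch loosely modelled on how the scheduler chains transitions; the function names and the toy transition table are assumptions for the example, not the actual Worker code:

```python
# Each transition function may return further recommendations ({key: state}),
# which are processed until the recommendation dict is empty.
from typing import Callable, Dict, Tuple

class TaskState:
    def __init__(self, key, state="new"):
        self.key = key
        self.state = state

def transitions(tasks: Dict[str, TaskState],
                table: Dict[Tuple[str, str], Callable],
                recommendations: Dict[str, str]) -> None:
    while recommendations:
        key, finish = recommendations.popitem()
        ts = tasks[key]
        func = table[(ts.state, finish)]
        more = func(ts)                 # each transition may recommend more
        ts.state = finish
        recommendations.update(more or {})

# Toy example: once "x" is ready, recommend that it start executing.
tasks = {"x": TaskState("x", state="waiting_for_dependencies")}
table = {
    ("waiting_for_dependencies", "ready"): lambda ts: {ts.key: "executing"},
    ("ready", "executing"): lambda ts: {},
}
transitions(tasks, table, {"x": "ready"})
print(tasks["x"].state)   # "executing"
```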
Last but not least, there have been questions around how `Worker.release_key` works, when it is called, and what data is actually stored in `Worker.data` (is it always a subset of tasks or not). I believe settling the allowed state transitions should help settle these questions.

Alternative: Instead of implementing red/purple we could just reset to `new` and start all transitions from scratch. That would help reduce the number of edges/allowed transitions but would pose similar problems as the purple path in case the `gather_dep` is still running.

My open questions:
cc @gforsyth
[1] The `TaskState` objects of the scheduler follow a different definition and allow different state transitions. I consider the consolidation of the two out of scope for this issue.

[2] Especially the state `error` is very loosely defined and tasks can be transitioned to `error` from almost every start state.

Possibly related issues:
#4724
#4587
#4439
#4550
#4133
#4721
#4800
#4446