Alternative scheduling for new tasks #2940
Conversation
Rather than placing new tasks with no dependencies on the first idle worker, we try placing them on a worker executing tasks they're a co-dependency with.
```python
# If time weren't an issue, we might find the worker with the
# most siblings. But that's expensive.
#
for sts in dts.dependencies:
```
There are several situations where a single task has very many dependents. In these cases I think that we'll hit N^2 scaling and bring things down.
What about cases where we don't have siblings, but cousins nth removed?

```
a1
|
a2  b1
 \  /
  c
```
Yes, my initial version got a full count of where each of our co-dependencies was running. That blew up very quickly. The early break once we find a co-dependency was a first attempt to avoid that.
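For illustration, a rough sketch of the two variants under discussion, assuming TaskState-like objects with `dependencies`/`dependents` sets and a `processing_on` worker attribute (hypothetical helper names, not the actual PR code):

```python
from collections import Counter

def sibling_worker_full_count(ts):
    # Initial attempt: count where every co-dependency (sibling) is
    # currently running and pick the most common worker.  When a
    # dependency has very many dependents this scan blows up quickly.
    counts = Counter(
        sib.processing_on
        for dep in ts.dependencies
        for sib in dep.dependents
        if sib is not ts and sib.processing_on is not None
    )
    return counts.most_common(1)[0][0] if counts else None

def sibling_worker_early_break(ts):
    # Cheaper variant: stop as soon as any running sibling is found.
    for dep in ts.dependencies:
        for sib in dep.dependents:
            if sib is not ts and sib.processing_on is not None:
                return sib.processing_on
    return None
```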
This approach won't help in that case (I think `a-1` and `b-1` are niblings 😄).

```
a-1  a-2  a-3  a-4
  \  /      \  /
   b-1       b-2
```
Note that all ASCII art diagrams in the codebase so far have computation going from bottom to top. This is also the way that `visualize` works.
A long while ago we used to schedule things differently if they all came in in the same batch. We wouldn't do things one by one; we would take all of the initially free tasks, sort them by their `dask.order` value, and then partition them among the workers in that order. This worked well because nodes that have similar ordering values are likely to be closely related. However, this works poorly if...
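As a rough sketch of that batch strategy (a hypothetical function, not the historical scheduler code): sort the initially free tasks by their `dask.order` priority and hand out contiguous blocks to workers, so tasks with nearby ordering values, which are usually closely related, land on the same worker.

```python
from dask.order import order

def partition_new_tasks(dsk, ready_keys, workers):
    """Assign the initially-free tasks of a raw graph ``dsk`` to ``workers``.

    Tasks are sorted by their dask.order priority and split into
    contiguous blocks, one block per worker.
    """
    priorities = order(dsk)                      # key -> priority (int)
    ready = sorted(ready_keys, key=priorities.get)
    block = -(-len(ready) // len(workers))       # ceiling division
    return {
        w: ready[i * block:(i + 1) * block]
        for i, w in enumerate(workers)
    }
```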
As an aside, a common cause of the graphs that you're dealing with is, I think, not doing high-level-graph fusion aggressively enough. I think that if we had data ingestion operations fused as we currently fuse blockwise, then this situation would occur much less frequently. This is a less general solution, but handling it well would be an unambiguous benefit, while core scheduling always has tradeoffs. I don't know the exact operation that you're trying to deal with, but it might be better handled by bringing operations like read_parquet, from_array, and others under the Blockwise banner.
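As a small illustration of why fusion reduces the scheduling pressure here (exact behaviour depends on the dask version and optimization settings, so treat this as a sketch): chained elementwise operations on a dask array are Blockwise layers, and optimization fuses them so each chunk is produced by fewer tasks.

```python
import dask
import dask.array as da

x = da.ones((8,), chunks=4)   # 2 chunks
y = (x + 1) * 2               # two elementwise (blockwise) operations

before = len(dict(y.__dask_graph__()))
(opt,) = dask.optimize(y)
after = len(dict(opt.__dask_graph__()))
print(before, after)          # typically fewer tasks after fusion
```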
That's interesting to hear. I briefly looked into trying to fix things earlier on since it's so hard to satisfy the "schedule co-dependencies together" goal this late in the scheduling process (at the single-task level). I didn't explore it much, since it seems to go against how things are done currently.
Does too aggressive fusion have a negative impact when you have multiple threads per worker? e.g. with […] we might want to ensure that […].

I'll look into blockwise a bit. Perhaps updating Xarray's […].
Maybe, but it's not common with collections (which is where we'll get high level blockwise fusion), where we commonly have far more partitions than we have threads.
The challenge is that blockwise currently expects to operate on Dask collections. There isn't a clean way of using it to start up a new graph.
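To make that concrete, `da.blockwise` maps a function over the chunks of an existing collection, but there is no analogous entry point that starts a brand-new graph from scratch. A minimal example of the supported, collection-based use (meant as representative, not exhaustive):

```python
import numpy as np
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))

# Apply a function chunk-by-chunk to an existing collection; the
# index strings describe how input and output block dimensions map.
y = da.blockwise(np.transpose, "ji", x, "ij", dtype=x.dtype)

y.compute()
```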
I'm not actively working on this at the moment. Closing to clear the backlog.
Rather than placing new tasks with no dependencies on the first idle worker, we try placing them on a worker executing tasks they're a co-dependency with. This helps to reduce memory usage of graphs like […].
This is meant to address #2602. Will require some testing.
I'm writing up a bunch of benchmarks on synthetic workloads now. Will try out on some real workloads as well.