Dynamically change task priorities #2860
Comments
Short term I have no personal plans to implement it. You could do it yourself if you were interested. Current policy is here: http://distributed.readthedocs.io/en/latest/scheduling-policies.html?highlight=priority#choosing-tasks In code you would want to search for |
Would be interested to hear your thought process as to how you would go about reprioritizing tasks. |
We would probably just send an The real challenge here is that the scheduler currently auto-increments priority on every new update_graph call, preferring older tasks to newer ones. It's not clear to me how we would change this policy. Maybe there is a third policy that overrides the other two.
|
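To make the generation-based policy described above concrete, here is a minimal sketch. It is a toy model, not the scheduler's actual code: `ToyScheduler`, its methods, and the `(generation, order)` tuple layout are assumptions made for illustration. Each `update_graph` call receives a larger generation number, and lower tuples run first, so tasks from older graphs always win.

```python
import heapq
import itertools


class ToyScheduler:
    """Hypothetical, simplified model of generation-based priorities."""

    def __init__(self):
        self._generation = itertools.count()  # incremented once per update_graph call
        self.ready = []  # min-heap of (priority, key); lowest priority tuple runs first

    def update_graph(self, keys):
        generation = next(self._generation)
        for order, key in enumerate(keys):
            # Older graphs get smaller generation numbers, so they sort first.
            heapq.heappush(self.ready, ((generation, order), key))

    def next_task(self):
        _priority, key = heapq.heappop(self.ready)
        return key


sched = ToyScheduler()
sched.update_graph(["long-1", "long-2"])   # submitted first
sched.update_graph(["quick-summary"])      # submitted later, so it waits
run_order = [sched.next_task() for _ in range(3)]
print(run_order)
```

In this model a task submitted later can never overtake an earlier one, which is exactly the behavior that makes reprioritization awkward.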
In my implementation, I subclassed the scheduler and added a new message that just changes priorities stored on all workers. It doesn't influence already assigned tasks and it doesn't change priorities in previously submitted graphs, but in my case, at the start of my project, that's perfectly fine. |
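The approach described in this comment might look roughly like the toy model below. This is a guess at the general shape, not the commenter's code and not the `distributed` API: `ToyWorker`, `handle_set_priorities`, and the prefix-based lookup are all hypothetical. The point it illustrates is that a priority-changing message only affects tasks that arrive after it.

```python
import heapq


class ToyWorker:
    """Hypothetical worker that keeps user priorities set via a message."""

    def __init__(self):
        self.overrides = {}  # key prefix -> user priority (higher runs sooner)
        self.ready = []      # min-heap of (priority, key)
        self._counter = 0    # arrival order, used as a tie-breaker

    def handle_set_priorities(self, overrides):
        # Assumed message handler: stores new priorities but does not
        # touch tasks that are already queued.
        self.overrides.update(overrides)

    def add_task(self, key):
        prefix = key.split("-")[0]
        boost = self.overrides.get(prefix, 0)
        heapq.heappush(self.ready, ((-boost, self._counter), key))
        self._counter += 1

    def next_task(self):
        return heapq.heappop(self.ready)[1]


worker = ToyWorker()
worker.add_task("analysis-1")  # queued before any priority message
worker.handle_set_priorities({"summary": 10, "analysis": 5})
worker.add_task("analysis-2")
worker.add_task("summary-1")
order = [worker.next_task() for _ in range(3)]
print(order)
```

Here `analysis-1` keeps its original priority even though the `analysis` prefix was boosted later, which mirrors the stated limitation that already assigned tasks are not influenced.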
Copying this comment over as it is relevant to this discussion.
|
Not sure I see the use case for altering the generation priority. Do you know of one @p-himik? However, for disabling it, this use case seems clear. It would be pretty easy to add a flag like |
@jakirkham The only thing I can think of is some high-priority tasks that appeared after a bunch of low-priority tasks have already been scheduled. E.g. all workers may be loaded with some week-long analysis, but we decided that we want some quick summary statistics on the data right now. |
Sure I understand the need for prioritizing tasks independently of when they are submitted. Just think that should be represented with a priority independent of their generation. |
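One way to picture a priority that is independent of generation is to put a user-supplied component in front of the generation number. The sketch below is a toy illustration under that assumption; `make_priority`, the tuple layout, and the sign convention are hypothetical and not taken from the scheduler source.

```python
import heapq


def make_priority(user_priority, generation, order):
    """Hypothetical priority tuple: user priority first, then generation.

    The user priority is negated so that larger values sort first in a min-heap.
    """
    return (-user_priority, generation, order)


ready = []
# Generation 0: a long-running analysis submitted with default priority.
heapq.heappush(ready, (make_priority(0, 0, 0), "week-long-analysis-1"))
heapq.heappush(ready, (make_priority(0, 0, 1), "week-long-analysis-2"))
# Generation 1: submitted later, but with a higher user priority.
heapq.heappush(ready, (make_priority(10, 1, 0), "quick-summary"))

run_order = [heapq.heappop(ready)[1] for _ in range(3)]
print(run_order)
```

With equal user priorities the ordering falls back to generation, so the default behavior of preferring older tasks is preserved.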
There is a start to this here: dask/distributed#1651 I did it while I had some free time on a plane. I am not planning on continuing this work short term (my todo list is somewhat long) but if someone else wants to take it on that would be very welcome. |
Thanks for starting this. Happy to give it a look once we wrap up PR ( #2980 ). |
Opened issue ( dask/distributed#1753 ), which is similar to this one except that it would auto-propagate priority changes via user operations on Dask collections that have |
Dask is great for the current workflow that I'm using - read a number of tables, run some functions on each, output the result as a table where each cell either shows the initial table ID or the result of one of the functions in the pipeline.
Now that I'm implementing a UI for this table, the need to sort and filter it interactively becomes apparent, as does the need to show intermediate results as soon as they're ready.
Judging by the source of distributed.scheduler, the current implementation doesn't allow for dynamic task priority. Are there any plans to implement it? Or maybe any ideas on how I can implement it in the best way possible?