Dynamically change task priorities #2860

p-himik · 2017-11-03T07:24:50Z

Dask is great for the current workflow that I'm using - read a number of tables, run some functions on each, output the result as a table where each cell either shows the initial table ID or the result of one of the functions in the pipeline.

Now that I'm implementing a UI for this table, the need to sort and filter it interactively has become apparent, as has the need to show intermediate results as soon as they're ready.

Judging by the source of distributed.scheduler, the current implementation doesn't allow for dynamic task priority. Are there any plans to implement it? Or maybe any ideas on how I can implement it in the best way possible?


mrocklin · 2017-11-06T14:35:11Z

Short term I have no personal plans to implement it. You could do it yourself if you were interested. Current policy is here: http://distributed.readthedocs.io/en/latest/scheduling-policies.html?highlight=priority#choosing-tasks

In code you would want to search for priority in client.py and scheduler.py.

jakirkham · 2017-11-22T00:02:02Z

Would be interested to hear your thought process as to how you would go about reprioritizing tasks.

mrocklin · 2017-11-22T02:45:57Z

We would probably just send an update-graph message with new priority values without any graph. This almost certainly doesn't work now, but could be made to.

The real challenge here is that the scheduler currently auto-increments priority on every new update_graph call, preferring older tasks to newer ones. It's not clear to me how we would change this policy. Maybe there is a third policy that overrides the other two.

(user-defined-priority, scheduler first come first served priority, priority from graph placement)

p-himik · 2017-11-23T14:08:33Z

In my implementation, I subclassed the scheduler and added a new message that just changes priorities stored on all workers. It doesn't influence already assigned tasks and it doesn't change priorities in previously submitted graphs, but in my case, at the start of my project, that's perfectly fine.
However, as more users start to use the service, I think it would make sense to have the ability to alter and/or completely disable the increment of the generation priority, so even if a user submits a graph a bit later than another user, he won't have to wait till all of the workers are finished with the tasks from that user.

jakirkham · 2017-12-13T19:34:28Z

Copying this comment over as it is relevant to this discussion.

Pointers as to what to look at and try wouldn't hurt. Whether I'll be able to follow through is another question. At least it would give me an opportunity to familiarize myself more with how job scheduling works.

I would start at distributed/scheduler.py::Scheduler.update_graph, in particular this line
           self.priority[key] = (generation, new_priority[key])  # prefer old
This is where we decide to first prefer first-come-first-served (the generation variable), and then to prefer the graph-based priority. Probably we want to add a third element just before generation that is user-defined priorities.

The user provides information to this function in the distributed/client.py::Client._graph_to_futures method, which is used from methods like compute and persist.
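To make the tuple ordering above concrete, here is a minimal sketch in plain Python, with no Dask involved. The helper `make_priority`, the task names, and the convention that a smaller number runs sooner are all assumptions for illustration; they are not taken from the scheduler source. It only shows that putting a user-defined element ahead of `generation` lets a later submission overtake earlier ones, because tuples compare element by element.

```python
def make_priority(user_priority, generation, graph_priority):
    # Hypothetical helper: tuples compare left to right, so the first
    # element dominates. Assumed convention: smaller sorts (and runs) first.
    return (user_priority, generation, graph_priority)


# Two tasks from an early, long-running graph (generation 1) and one task
# submitted later (generation 2) with a more urgent user-defined priority.
priorities = {
    "weekly-analysis-0": make_priority(0, 1, 0),
    "weekly-analysis-1": make_priority(0, 1, 1),
    "quick-summary": make_priority(-1, 2, 0),
}

# Sorting by the tuple gives the order in which tasks would be preferred.
run_order = sorted(priorities, key=priorities.get)
print(run_order)
```

With only `(generation, graph_priority)`, the `quick-summary` task would sort last because its generation is higher; the extra leading element is what moves it to the front.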

jakirkham · 2017-12-15T20:15:35Z

> it would make sense to have the ability to alter and/or completely disable the increment of the generation priority

Not sure I see the use case for altering the generation priority. Do you know of one @p-himik?

However, for disabling it, the use case seems clear. It would be pretty easy to add a flag like inc_generation to update_graph, which defaults to True but could be set to False to skip increasing the generation when calling update_graph. This assumes that we agree update_graph is the place where we want to address this issue.
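A rough sketch of how such a flag might behave, using a toy class rather than the real scheduler. `ToyScheduler` and its simplified `update_graph` signature are hypothetical; only the `(generation, new_priority[key])` tuple and the `inc_generation` name come from the discussion above.

```python
class ToyScheduler:
    """Toy stand-in for illustration; not distributed's Scheduler class."""

    def __init__(self):
        self.generation = 0
        self.priority = {}

    def update_graph(self, new_priority, inc_generation=True):
        # new_priority maps task key -> graph-based priority.
        # When inc_generation is False, the new tasks share the generation
        # of the previous submission instead of queueing behind it.
        if inc_generation:
            self.generation += 1
        for key, graph_priority in new_priority.items():
            self.priority[key] = (self.generation, graph_priority)  # prefer old


scheduler = ToyScheduler()
scheduler.update_graph({"a-0": 0, "a-1": 1})              # generation 1
scheduler.update_graph({"b-0": 0}, inc_generation=False)  # stays in generation 1
scheduler.update_graph({"c-0": 0})                        # generation 2

run_order = sorted(scheduler.priority, key=scheduler.priority.get)
print(run_order)
```

In this toy model, `b-0` competes with the `a-*` tasks on graph priority alone, while `c-0` still waits behind everything from the earlier generation.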

p-himik · 2017-12-17T12:48:39Z

@jakirkham The only thing I can think of is some high-priority tasks that appeared after a bunch of low-priority tasks have already been scheduled. E.g. all workers may be loaded with some week-long analysis, but we decided that we want some quick summary statistics on the data right now.

jakirkham · 2017-12-17T18:14:44Z

Sure, I understand the need for prioritizing tasks independently of when they are submitted. I just think that should be represented by a priority independent of their generation.

mrocklin · 2017-12-21T14:09:49Z

There is a start to this here: dask/distributed#1651

I did it while I had some free time on a plane. I am not planning on continuing this work short term (my todo list is somewhat long) but if someone else wants to take it on that would be very welcome.

jakirkham · 2018-01-02T21:02:02Z

Thanks for starting this. Happy to give it a look once we wrap up PR ( #2980 ).

jakirkham · 2018-02-12T19:14:08Z

Opened issue ( dask/distributed#1753 ), which is similar to this one except that it would auto-propagate priority changes via user operations on Dask collections that have Futures in them.

jakirkham mentioned this issue Nov 22, 2017

ENH: Delayed variant of persist (pin?) #2156

Open

jakirkham added the scheduler label Dec 4, 2017

mrocklin mentioned this issue Dec 21, 2017

Add user-defined priorities dask/distributed#1651

Merged

jakirkham mentioned this issue Jan 30, 2018

Dask image subpackage #3111

Closed

mrocklin closed this as completed in dask/distributed#1651 Feb 7, 2018

jakirkham mentioned this issue Feb 12, 2018

Dynamically adjusting priority via Futures dask/distributed#1753

Open
