Dynamic annotations #5207

Draft: madsbk wants to merge 7 commits into main from dynamic_annotations
Conversation

@madsbk (Contributor) commented on Aug 12, 2021

As part of the Heterogeneous Computing Design discussion, this PR implements dynamic annotations, inspired by @mrocklin's option 4 in #4656.
The idea is that a user can specify a function that updates the annotations and restrictions of a task after its execution:

  1. The worker runs the user-specified annotation functions on the task output.
  2. The worker sends the updated attributes back to the scheduler using a new "annotate-task" message.
  3. The scheduler updates the attributes of the task and all tasks depending on it.
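
For illustration, a minimal sketch of the worker-side send in step 2; the exact field names of the "annotate-task" payload are an assumption, not this PR's final wire format:

# Hypothetical "annotate-task" payload (field names are assumptions),
# sent over the worker's batched stream after the annotator runs.
self.batched_stream.send({
    "op": "annotate-task",
    "key": ts.key,                                      # the finished task
    "annotations": ts.annotations,                      # e.g. {"executor": "gpu"}
    "resource_restrictions": ts.resource_restrictions,
})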

The following is an example of a dynamic annotation function that sets the worker executor to "gpu" when the task output is a CUDA device object. Together with a default GPU executor (#5084), this ensures that all tasks downstream of the first CUDA task use the GPU executor.

import dask_cuda  # provides is_device_object()
from distributed.worker import TaskState

def set_gpu_executor(ts: TaskState, value: object) -> bool:
    # Pin the task to the "gpu" executor when its output is a CUDA device object.
    if dask_cuda.is_device_object(value):
        ts.annotations["executor"] = "gpu"
        return True
    return False

Notice that this will not annotate the first GPU task itself; only its dependent tasks are annotated automatically. The user can of course still annotate tasks manually.

  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

@madsbk force-pushed the dynamic_annotations branch from 5247116 to c56bc9b on August 12, 2021
@madsbk force-pushed the dynamic_annotations branch from c56bc9b to dae7d59 on August 16, 2021
worker=None,
key: str = None,
annotations: Mapping = None,
resource_restrictions: Mapping = None,
Member

I recommend making restrictions a separate route. I don't think that there is any reason for annotations and restrictions to be intertwined.

However, I do think that it would be useful to support a variety of different kinds of restrictions. You should probably roll this into the Scheduler.set_restrictions method. I think that it has a worker= keyword today as a hint that host= and resource= should come later.
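
For concreteness, a sketch of what a widened route could look like, assuming the worker= keyword that exists today and treating host= and resource= as the hypothetical future keywords hinted at above:

def set_restrictions(self, comm=None, worker: dict = None, host: dict = None, resource: dict = None):
    # worker= exists today; host= and resource= are hypothetical here.
    for key, workers in (worker or {}).items():
        ts = self.tasks[key]
        ts.worker_restrictions = set(workers)
    # host= and resource= handling would follow the same pattern,
    # writing ts.host_restrictions and ts.resource_restrictions.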

@madsbk (Contributor, Author) commented on Aug 19, 2021

Good point.
A related question: what is the difference between regular handlers and stream handlers?
Can I use self.batched_stream.send for an "op": "set_restrictions" message when set_restrictions is in Scheduler.handlers?

Member

Stream handlers are used with BatchedSend objects. They are strictly fire-and-forget and very cheap to send: we accumulate several small messages in a list and then send that list off periodically (with a very short period).

The handlers are what receive the await rpc.foo(...) commands. They wait for and collect a response.

If you need a response, use await rpc.foo(...). If you want to send lots of small messages without much overhead, use a stream handler. You can certainly put a method in both stream handlers and handlers, although if the caller needs a response this might require some cleverness.

Does that provide enough information?
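
A sketch of the two patterns from the sender's side, assuming the batched_stream and scheduler RPC handles a worker already holds, with set_restrictions as the example op:

# Fire-and-forget: batched with other small messages, dispatched via
# the receiver's stream_handlers, no response comes back.
self.batched_stream.send({"op": "set_restrictions", "worker": {key: workers}})

# Request/response: dispatched via the receiver's handlers, awaits a reply.
response = await self.scheduler.set_restrictions(worker={key: workers})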

@madsbk (Contributor, Author)

Yes, thanks!

key: str = None,
annotations: Mapping = None,
resource_restrictions: Mapping = None,
annotate_dependents=True,
Member

I'm curious about the motivation behind this keyword. Can you expand here?

@madsbk (Contributor, Author)

I am considering whether an annotator function should be able to enable/disable annotation of a task's dependents (see the sketch below). Alternatively, we could always apply annotations to task dependents.
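
For illustration, one way the flag could surface in the new message; annotate_dependents=False here is hypothetical usage, not settled API:

self.batched_stream.send({
    "op": "annotate-task",
    "key": ts.key,
    "annotations": ts.annotations,
    "annotate_dependents": False,  # update only this task, not its dependents
})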

Member

I agree that it's possibly in scope. I'm curious whether there are active use cases for it today that you have in mind. If so, I would be curious to hear them (for example, maybe the results of gpu tasks are likely to also be gpu tasks). If not, then I would recommend waiting until we know more.

My gut reaction is not to do this until we have a motivating use case.

@madsbk (Contributor, Author) commented on Aug 16, 2021

for example, maybe the results of gpu tasks are likely to also be gpu tasks

Yes, to implement automatic use of a GPU ThreadPoolExecutor (#5084). If the initial array or dataframe creation is annotated with the GPU executor, all the following operations are as well.

This will also be very helpful when mixing data types. By tracking the GPU use of each individual partition, we can utilize GPU and CPU workers simultaneously.

@madsbk force-pushed the dynamic_annotations branch from 519ce1c to 8ad9cee on August 19, 2021
@madsbk force-pushed the dynamic_annotations branch from 449ca4b to 7494179 on August 19, 2021
@mrocklin (Member) left a comment

Some small comments. In general this looks fine to me though.

}
res1, res2 = c.get(dsk, ["g1", "g2"], sync=False, asynchronous=True)
assert "Executor1" in await res1
assert "Executor2" in await res2
Member

The fact that you had to use get/compute here is interesting. There are interesting challenges with doing this with futures / dependencies that might change. I still think that this is a useful feature to have, but it seems like this approach might not be comprehensive if we want to solve things across update_graph calls. Agree or disagree?

@madsbk (Contributor, Author)

Agree! I hadn't thought about this issue before today. As you say, it is still useful but I think we should put this PR on hold for a bit and see if we can come up with a better approach.


# Separate annotation updates into two cases:
a1 = {} # when the task's dependents should also be updated
a2 = {} # when only the task itself should be updated
Member

Do you have thoughts on better names here?

@madsbk (Contributor, Author)

Heh, yeah I can do that :)
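
For instance, one possible renaming (the names are just a suggestion):

# Separate annotation updates into two cases:
annotations_with_dependents = {}  # also applied to the task's dependents
annotations_task_only = {}        # applied to the task itself only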
