
In-process communication transports #887

Merged (12 commits, Feb 27, 2017)
Conversation

@pitrou (Member) commented Feb 21, 2017:

This adds a new communication transport named inproc that is backed by in-process queues. The aim is to later allow local cluster instances to use such communication channels instead of TCP, to reduce the I/O overhead. That will be a bit more involved than I initially assumed, though, due to the notion of host and port being ingrained in some places (especially the scheduler).
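The queue-backed design described above can be sketched roughly as follows. This is an illustrative minimal sketch, not the actual PR code: the class and function names here are invented, and the real transport is asynchronous. Each endpoint reads from its own queue and writes to its peer's queue, so messages never touch a socket:

```python
# Minimal sketch of an in-process comm backed by queues (illustrative
# names; the actual PR implements an async Comm with addresses like
# "inproc://<host>/<pid>/<n>").
from queue import Queue


class InProcComm:
    def __init__(self, read_q, write_q):
        self._read_q = read_q    # messages sent by the peer
        self._write_q = write_q  # messages destined for the peer

    def write(self, msg):
        # No socket and no wire serialization: both endpoints live in
        # the same process, so the object is handed over directly.
        self._write_q.put(msg)

    def read(self):
        return self._read_q.get()


def connect_inproc():
    """Return a connected pair of in-process comms."""
    a_to_b, b_to_a = Queue(), Queue()
    return InProcComm(b_to_a, a_to_b), InProcComm(a_to_b, b_to_a)
```

With such a pair, `c1.write(msg)` makes `msg` available to `c2.read()` with no I/O overhead, which is the point of the transport.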

raise QueueEmpty


def _maybe_deserialize(msg):
pitrou (Member Author) commented:

I would rather avoid such code in the communication layer. @mrocklin, do you think it's ok to put this in distributed.protocol?

pitrou (Member Author) commented:

Or perhaps you want to suggest another approach.

mrocklin (Member) commented:

I don't have a strong opinion on this.

@gen.coroutine
def check_deserialize(addr):
    # Create a valid Serialized object
    # (if using serialize(), it will lack a compression header)
pitrou (Member Author) commented:

Is this by design? Do you think we can change distributed.protocol.loads to assume no compression if the compression header is absent?

mrocklin (Member) commented:

If no compression information is present in the header I would assume that the data is not compressed. Is this not normally default behavior?

pitrou (Member Author) commented:

No, loads() always calls decompress(), which fails with a KeyError if 'compression' is not found in the header.

mrocklin (Member) commented:

Avoiding that behavior seems sensible to me
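The fix being agreed on can be sketched like this. This is illustrative, not the actual distributed.protocol code: the idea is simply to read the 'compression' entry with a default instead of indexing it, so an absent header entry means "not compressed" rather than a KeyError:

```python
# Sketch of "assume no compression when the header entry is absent"
# (illustrative; the real decompress() lives in distributed.protocol).
import zlib

decompressors = {None: lambda b: b, 'zlib': zlib.decompress}


def decompress(header, frames):
    # header.get(...) instead of header[...]: a missing 'compression'
    # key is treated as "no frame is compressed".
    compression = header.get('compression', [None] * len(frames))
    return [decompressors[c](frame)
            for c, frame in zip(compression, frames)]
```

A header produced without serialize() can then be loaded unchanged, e.g. `decompress({}, [b'abc'])` simply returns the frames as-is.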

@pitrou added the 'enhancement' label (Improve existing functionality or make things work better) on Feb 23, 2017
@mrocklin (Member) left a review comment:

It would be useful to include an integration test that actually uses inproc for some trivial dask.array work. It would also be useful to make it easier for naive users to set up a local cluster using inproc. I think that some basic user-focused documentation would go a long way here.



def run_coro(func, *args, **kwargs):
    return func(*args, **kwargs)
mrocklin (Member) commented:

from .compatibility import apply ?

pitrou (Member Author) commented:

Uh, why not indeed. I guess I never use apply :-)
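For context, the helper being suggested has roughly this shape. The exact signature of distributed.compatibility.apply is assumed here for illustration; the point is that it already does what run_coro was reimplementing:

```python
# Approximate shape of the `apply` helper (signature assumed; see
# distributed.compatibility for the real definition).
def apply(func, args, kwargs=None):
    if kwargs:
        return func(*args, **kwargs)
    return func(*args)
```

For example, `apply(int, ['ff'], {'base': 16})` calls `int('ff', base=16)`.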

                other.frames == self.frames)

    def __ne__(self, other):
        return not (self == other)
mrocklin (Member) commented:

Why was defining __eq__ necessary? Should we also now define __hash__?

pitrou (Member Author) commented:

Just for the unit tests. It has no use in the rest of the code base. __hash__ isn't necessary as we don't use those objects as keys in associative containers.

mrocklin (Member) commented:

We don't in this codebase, that's true, but including objects in containers is generally useful. I'm fairly confident that I've put serialized objects in containers before while testing. I'm somewhat confident that this will come up for someone in the future when building something atypical. If it is cheap to keep this door open then I would prefer to do so.

pitrou (Member Author) commented:

I see. For Serialize objects we can easily piggyback on the original object's __hash__ method. For Serialized objects, it's quite dangerous to assume hashability while the original object may not be hashable.

mrocklin (Member) commented:

Concerned about mutation? I'm generally willing to assume that the user won't change values after giving them to us. We don't really have a reasonable way of proceeding correctly if we don't make this assumption.

pitrou (Member Author) commented:

My reasoning was that if a Serialized object is somehow meant to represent the original object, then its hashability should also reflect the original object's hashability.

To be frank, I'm not sure what hashability really brings for Serialize and Serialized objects, but I'm happy to defer to your judgement.

mrocklin (Member) commented:

I don't have a specific example concern in this case, I've just been bitten by projects in the past where people define __eq__ for convenience and then suddenly I can't use those objects with other python containers. I now avoid __eq__ by a strong habit unless there is a good API reason to implement it (such as with dask.array). So this may just be an irrational avoidance on my part, but it's still something I'd prefer to avoid if it's not particularly costly to avoid having an __eq__ method.

pitrou (Member Author) commented:

Ok, so I think I'll remove Serialized.__eq__ (which doesn't seem to be needed actually) and also implement Serialize.__hash__.
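The outcome of this thread can be sketched as follows. The class name mirrors distributed.protocol's Serialize, but the bodies here are illustrative: __hash__ piggybacks on the wrapped object, so a Serialize wrapper is hashable exactly when the wrapped object is:

```python
# Sketch of Serialize keeping __eq__ and gaining __hash__ (illustrative;
# the real class lives in distributed.protocol).
class Serialize:
    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        return isinstance(other, Serialize) and other.data == self.data

    def __ne__(self, other):
        return not (self == other)

    def __hash__(self):
        # Piggyback on the wrapped object: hashable iff `data` is, so
        # Serialize(obj) can go in sets/dicts whenever obj can.
        return hash(self.data)
```

With this, `{Serialize((1, 2)): 'x'}` works as a dict key, while wrapping an unhashable object (e.g. a list) still raises TypeError on hashing, mirroring the original object's behavior.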

def f():
    server = Server({'ping': pingpong})
    server.listen(8883)
with listen_on(Server, 'inproc://') as server:
mrocklin (Member) commented:

The inproc stuff still seems pretty experimental to me. I hesitate to integrate it into our core tests. It would be nice to make sure that we can extract this cheaply in the future if we decide we don't want to keep it around.

@pitrou (Member Author) commented Feb 23, 2017:

That was meant to verify that the basic core infrastructure works with inproc. I think a grep -r inproc distributed would easily point to the places where it's used :-)

mrocklin (Member) commented:

Ah, I see. I was misreading things because of how the diff was laid out. I thought that you were replacing existing tests with inproc tests. Looking at the flat file makes it more clear that this isn't occurring.

@pitrou (Member Author) commented Feb 23, 2017:

It would be useful to include an integration test that actually uses inproc for some trivial dask.array work.

As stated, it doesn't work yet. The followup PRs start doing the cleanup and refactor work needed to make it happen.

@mrocklin (Member) commented:

As stated, it doesn't work yet. The followup PRs start doing the cleanup and refactor work needed to make it happen.

How would you like to go about this then? Refactor out the small pieces of those PRs into clean commits and merge them before this? Merge all of the PRs into a single PR? Merge this and then merge those later?

@pitrou (Member Author) commented Feb 23, 2017:

Merge this and then merge those later?

This would have my preference. Reordering is not really doable: the refactors are based on this PR.

@mrocklin (Member) commented:

I would also still like to release before merging this if possible. I will try to get that done tonight.

@mrocklin (Member) commented:

This has a merge conflict. I'm +1 otherwise.

@pitrou pitrou merged commit 8f620a1 into dask:master Feb 27, 2017