Merge frames in deserialize_bytes
#3639
Conversation
```
@@ -440,6 +441,8 @@ def replace_inner(x):


def serialize_bytelist(x, **kwargs):
    header, frames = serialize(x, **kwargs)
    if "lengths" not in header:
```
We (you :) ) recently added `lengths` to CUDA object headers. When is `lengths` not in `header`? Is this something we should be requiring?
Yeah, still wrapping my head around this. My understanding is that the header from the object has already gone through msgpack at this point, so it is actually a frame as well. So it may be that we always need to set the `lengths` here.

@mrocklin, do you have any thoughts on this? 🙂
I have no particular thoughts on this.

Does this need a test?

This appears to be stalled. @jakirkham @quasiben, are you all still active here? Should we close this?

Yeah, still working on this.

Another friendly ping and status request, just to keep this thread going.

I've been out on PTO for the last week.
In some cases where the frames are particularly large, we may opt to split them into smaller frames. This may be for performance reasons when transmitting data, or due to limitations like those of the compressors used to compact frames. So include a test case that we know will get split, to make sure it is handled correctly; or at least make sure we are catching errors that would cause it to be mishandled.
It appears that we are splitting frames in `serialize_bytelist` so that we can compress them. However, we are not merging them back together afterwards during deserialization. This can cause an exception to be raised by serializers that expected their frames to be structured in a particular way. To fix this, we make sure to call `merge_frames` after `decompress` in `deserialize_bytes`. Further, we make sure to pack the `lengths` of the original `frames` in the `header` (if not already present) in `serialize_bytelist`. This should ensure deserializers get the original frame structuring back when they operate on them.
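The round trip described here can be sketched in miniature. This is illustrative only, not the actual `distributed` code: the helper names and the tiny 4-byte split limit are made up for the example.

```python
# Sketch of the fix described above: record original frame lengths before
# splitting, then merge split chunks back to those lengths on the way out.

def split(frame, limit=4):
    """Split one frame into chunks of at most `limit` bytes."""
    return [frame[i:i + limit] for i in range(0, len(frame), limit)] or [frame]

def serialize_bytelist_sketch(frames, header):
    # Pack the original lengths into the header (if not already present)
    # so deserialization knows how to reassemble the split chunks.
    if "lengths" not in header:
        header["lengths"] = tuple(len(f) for f in frames)
    return header, [chunk for f in frames for chunk in split(f)]

def merge_frames_sketch(header, frames):
    """Rejoin chunks so each output frame has its originally recorded length."""
    out, it = [], iter(frames)
    for length in header["lengths"]:
        buf = b""
        while len(buf) < length:
            buf += next(it)
        out.append(buf)
    return out

header, chunks = serialize_bytelist_sketch([b"abcdefgh"], {})
assert chunks == [b"abcd", b"efgh"]          # the frame was split
assert merge_frames_sketch(header, chunks) == [b"abcdefgh"]  # structure restored
```

Without the `lengths` entry (and the merge on the way out), a deserializer that expected one frame would instead see two, which matches the exception described above.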
@michaelnarodovitch @gshimansky, could you please give this a try?

Also, sorry for the long delay, Martin and Matt. This now has a test. The first commit demonstrates this fails without the change. The second commit demonstrates the change fixes it.

Inasmuch as this fixes something that was shown broken, I am happy; but I can't pretend to understand the reason.

Pulled via `pip install git+https://github.com/jakirkham/distributed.git@33594d3c22dfa3fac4fdb82f7860da3492dafd49` and reran my (previously failing) use case with that version. Works flawlessly now. The fix reveals the failure mode in my use case: the dataframe, read from disk after it was spilled, ended up being split into multiple frames, which, in turn, caused

Yes! I checked your branch

I'd just like to repeat, @jakirkham:

Yeah, I'm just looking at this now, Martin. Will see if we have a sensible way to fast-path it. Had asked folks to try this out to confirm it actually works for them (before spending more time). Sounds like it is working and therefore worth spending more time on.

Perfect. I imagine an
As using the `bytes` constructor will cause a copy (even if it is provided a `bytes` object), special case handling a `bytes` object and just return.
Have pushed a commit to short-circuit where we want a single frame from merging. Also added another test case where there are two frames that both get split. Finally, realized that
```
@@ -931,7 +931,9 @@ def ensure_bytes(s):
    >>> ensure_bytes(b'123')
    b'123'
    """
    if hasattr(s, "encode"):
    if isinstance(s, bytes):
```
I don't believe `bytes(s)` does make a copy if `s` is already `bytes`.
Hmm...maybe I'm misremembering. In any event we seem to have similar code in Dask.
OK, well never mind - this version shouldn't hurt. I wonder why not just call the dask version? I suppose this one can work on memoryviews and bytearrays too.
Yeah I was thinking about that as well, but didn't want to go down a rabbit hole here. Am ok pulling this out into a separate PR so we can explore orthogonally. WDYT?
I'm saying your version is fine :)
FWIW, updating the Dask implementation to contain the code from Distributed in PR (dask/dask#9050). Then we should be able to switch to the Dask implementation in Distributed.

Edit: Switched over to using the Dask implementation in Distributed with PR (#6295).
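For what it's worth, the copy question above is easy to check empirically. This is behavior observed in CPython, where `bytes()` applied to an exact `bytes` instance returns the same object:

```python
# In CPython, bytes() on an object that is already bytes returns the same
# object (no copy); bytearray and memoryview inputs are copied into a new
# bytes object.
b = b"123"
assert bytes(b) is b                                 # no copy for a bytes input

ba = bytearray(b"123")
assert bytes(ba) == b"123" and bytes(ba) is not ba   # converted (copied)

mv = memoryview(b"123")
assert bytes(mv) == b"123"                           # also copied into new bytes
```

So the `isinstance(s, bytes)` fast path in `ensure_bytes` mainly documents intent; either way, no extra copy is made for `bytes` inputs.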
distributed/protocol/utils.py
Outdated
```
@@ -67,6 +67,9 @@ def merge_frames(header, frames):
    if all(len(f) == l for f, l in zip(frames, lengths)):
        return frames

    if len(lengths) == 1:
        return [b"".join(map(ensure_bytes, frames))]
```
Does it not imply only one frame, so you can return `ensure_bytes(frames[0])`?
No, it doesn't. It implies there was only one frame originally. However, we may already have split it up during serialization. Noted the line where this happens below.
Is it worth, then, checking for exactly one frame, to avoid the copy during `join`?
Not sure. Was trying to integrate your feedback above. Happy to revert it if you prefer 🙂
Tried to respond inline, but GitHub didn't like it. Moved here: #3639 (review)
```
@@ -473,6 +474,8 @@ def replace_inner(x):


def serialize_bytelist(x, **kwargs):
    header, frames = serialize(x, **kwargs)
    if "lengths" not in header:
        header["lengths"] = tuple(map(nbytes, frames))
    frames = sum(map(frame_split_size, frames), [])
```
Note that the frames are already split here. This is due to constraints caused by compression (which happens below).
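A hedged sketch of what "already split here" means in practice. The 2 KiB limit and the helper name are invented for the example; the real split size in `distributed` is configurable and much larger:

```python
import zlib

# Illustrative splitting of oversized frames before compression.
LIMIT = 2048  # made-up limit for the example

def frame_split_size_sketch(frame, limit=LIMIT):
    """Split a frame into chunks of at most `limit` bytes."""
    view = memoryview(frame)
    if view.nbytes == 0:
        return [b""]
    return [bytes(view[i:i + limit]) for i in range(0, view.nbytes, limit)]

frames = [b"x" * 5000]
chunks = [c for f in frames for c in frame_split_size_sketch(f)]
assert [len(c) for c in chunks] == [2048, 2048, 904]

# Each (smaller) chunk can now be compressed independently.
compressed = [zlib.compress(c) for c in chunks]
assert b"".join(chunks) == frames[0]
```

After this step the frame count no longer matches the serializer's original structure, which is exactly why the recorded `lengths` are needed at merge time.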
distributed/protocol/utils.py
Outdated
```python
if len(lengths) == 1:
    return [b"".join(map(ensure_bytes, frames))]
```
Or do you mean you would like to see something like this?
```
-if len(lengths) == 1:
-    return [b"".join(map(ensure_bytes, frames))]
+if len(lengths) == 1 and len(frames) == 1:
+    return frames
```
Yes, return `ensure_bytes(frames[0])` (although I don't see how it could not be bytes).
I think we have `bytearray` in the TCP code path and NumPy arrays in the UCX code path. So coercing to `bytes` can make sense. Used a `list` to wrap it as well, since that seems to be expected (based on related testing before).
```
-if len(lengths) == 1:
-    return [b"".join(map(ensure_bytes, frames))]
+if len(lengths) == 1 and len(frames) == 1:
+    return [ensure_bytes(frames[0])]
```
yep, agree
That said, I guess `merge_frames` is collecting some frames unaltered, some as `memoryview`s, and some as `bytes`. So maybe we can just pass `frames` through as-is? Would potentially save us a copy.
I don't know about that, depends if all callers explicitly expect bytes or not.
Just for context, here are the lines I'm looking at:

My guess is the only expectation is that frames must be `bytes`-like, which they are when inputted or when coerced to `bytes` or `memoryview`s. So I think we are safe to pass the frames as-is.
Maybe a good use case for typing, even though I'm not sure how you say "bytes-like".
(that was just musing, not suggesting a change)
Good question. I guess it's an open issue ( python/typing#593 ).
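For what it's worth, one pragmatic way to annotate "bytes-like" today is a union of the common buffer types (Python 3.12's `collections.abc.Buffer`, from PEP 688, is the more general answer). A sketch, with the alias and helper names invented for illustration:

```python
from typing import Union

# "Bytes-like" spelled as a union of the common buffer types; anything in
# this union can be wrapped in a memoryview without copying.
BytesLike = Union[bytes, bytearray, memoryview]

def total_nbytes(frames: list[BytesLike]) -> int:
    """Sum the byte sizes of a list of bytes-like frames."""
    return sum(memoryview(f).nbytes for f in frames)

assert total_nbytes([b"ab", bytearray(b"cd"), memoryview(b"ef")]) == 6
```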
Alright, I think the latest changes address the last round of reviews. Please let me know if there's anything else 🙂
distributed/protocol/utils.py
Outdated
```
@@ -67,6 +67,9 @@ def merge_frames(header, frames):
    if all(len(f) == l for f, l in zip(frames, lengths)):
        return frames

    if len(lengths) == 1 and len(frames) == 1:
```
I think I may have been stupid here: is there any way that this condition can be triggered, given the assert and the previous if? If there is only one frame and one length, that length must be the same as the length of the one frame.
If this block came before the previous, then we would avoid calculating lengths again - which should be instantaneous in any case, so should we just get rid of it?
(Indeed, the `if all...` condition also catches the `if not frames` above, and would be fast to iterate over a zero-length list; and the assert is only needed after that if...)
Actually, I think you were right to request this, and it can come up. For frames that are not too large, and for situations where only a single frame is needed, this could be quite common. For example, NumPy arrays would pass through this path. So having a fast path makes sense. This saves us going through the `while`, which does a fair bit of work.

Well, we would hope the frame is the same size as the length specified. However, that might not be true if the message was truncated or otherwise corrupted (and I think this is why the `assert`s are there). So having the `assert`s makes sense. If we moved this before the `assert`s, I think we should add a check to ensure the frame is the expected length.

I might not be following this last bit, but I think it could make sense to move the `if not frames` after the `assert`s or drop it altogether. No strong thoughts here. Happy to leave it as-is.
Ah I see what you mean (I think). This check already acts as a fast path. Yeah we can drop our case.
Ok, pushed two commits that drop these:
- This is already handled in the check above.
- This is already handled by the fast path for more than zero frames.
+1 here, sorry for the back-and-forth.
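The conclusion above, that the existing length check already serves as the fast path, can be sketched like this (illustrative rather than the actual `merge_frames`):

```python
# Sketch: if every frame already matches its recorded length, nothing was
# split, so the frames pass through unaltered (no copies). Otherwise chunks
# are rejoined to the recorded lengths. Not the actual distributed code.
def merge_frames_sketch(header, frames):
    lengths = header["lengths"]
    if len(frames) == len(lengths) and all(
        len(f) == l for f, l in zip(frames, lengths)
    ):
        return frames  # fast path: covers the common single-frame case too

    out, it = [], iter(frames)
    for length in lengths:
        buf = b""
        while len(buf) < length:
            buf += bytes(next(it))
        out.append(buf)
    return out

frames = [b"abcd", b"efgh"]
assert merge_frames_sketch({"lengths": (4, 4)}, frames) is frames  # untouched
assert merge_frames_sketch({"lengths": (8,)}, frames) == [b"abcdefgh"]
```

Note the fast path returns the input list itself, so unsplit frames (including a lone unsplit frame) incur no join and no copy.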
```python
2 ** 26 * b"ab",
(2 ** 25 * b"ab", 2 ** 25 * b"ab"),
```
Beware here that I've seen Python pre-compute literals like these as an unfortunate optimization, which results in large constant values in the program code. It might be wise to call a function around `2 ** 26`, like `int(2 ** 26)`, even though it has no semantic effect.

My bad experience with this was a long time ago, so maybe things have improved.
Hmm... yeah, I'm not sure. Did push some `int` wrappers around these though. One of these cases was already here (so we may have already been ok).
Ah ok, probably not a problem then. Thanks again for resolving the full issue quickly.
Of course 🙂
FWIW these changes seem fine to me.

Not at all. Thanks for the feedback! 😄

At least for me, the Travis CI status is not being shown on GitHub (not sure why), but it did run and pass.

It is visible and has a green check mark for me. Merging.

Great! Yeah, it now shows up for me too. Not sure what was up. Thanks all! 😄
Fixes #3851

It appears that we are splitting frames in `serialize_bytelist` so that we can compress them. However, we are not merging them back together afterwards during deserialization. This can cause an exception to be raised by serializers that expected their frames to be structured in a particular way.

To fix this, we make sure to call `merge_frames` after `decompress` in `deserialize_bytes`. Further, we make sure to pack the `lengths` of the original `frames` in the `header` (if not already present) in `serialize_bytelist`. This should ensure deserializers get the original frame structuring back when they operate on them.