Track mutable frames #4004
Conversation
Force-pushed from 1607d5e to e3d3d8c
Keeps logic simple here in exchange for potentially copying later if this is writeable.
We don't need to bother making these frames writeable as they will be transferred to the GPU anyway, where they will then be writeable. As a result this saves us a copy that we don't otherwise need.
Force-pushed from e3d3d8c to 7ffa2c2
Only try to figure out `writeable` if it is not otherwise specified. This is important for CUDA objects for example, which are not `memoryview` coercible.
Make sure all frames are `bytes` as they are read-only. This should test whether the deserialization logic is smart enough to coerce them back into something that is writeable.
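As a minimal illustration (not the project's test code) of why read-only `bytes` frames exercise this path: a deserializer that needs a mutable result has to copy them into something writeable.

```python
import numpy as np

frame = bytes(8)                     # a read-only buffer, like a frame read back from storage
assert memoryview(frame).readonly

# A deserializer wanting a mutable result must copy into a writable buffer first.
writable = bytearray(frame)          # copy; mutation is now allowed
arr = np.frombuffer(writable, dtype="u1")
arr[0] = 255                         # np.frombuffer(frame, ...) would be read-only and raise on write
```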
As we are only using `"dask"` or `"pickle"` serialization when spilling here, we can be confident that there will not be any CUDA frames. So drop this unneeded check.
Additionally, have managed to back out the requirement that all frames be
As we now guarantee frames are readonly or writeable as appropriate when splitting/merging frames, there is no need to run this code in all cases. We only need it for the fast path, so add it there.
Make sure we force readonly frames to be readonly.
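A small sketch of what forcing read-only can look like at the buffer level, assuming Python 3.8+ `memoryview.toreadonly`; not necessarily the exact mechanism used in this PR:

```python
buf = bytearray(b"abc")              # writeable to begin with
ro = memoryview(buf).toreadonly()    # read-only view over the same memory (Python 3.8+)
assert ro.readonly

try:
    ro[0] = 0                        # writes through the read-only view are rejected
except TypeError:
    pass
```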
Force-pushed from 5b82157 to 9f83aaa
Force-pushed from 9f83aaa to e36b592
This special case of `None` will bypass both cases.
Force-pushed from e36b592 to 09c1ce7
This is very slightly faster than `.append()`

```python
In [1]: %timeit L = []
17.4 ns ± 0.0668 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [2]: L2 = [5]

In [3]: %timeit L = []; L.extend(L2)
73.7 ns ± 1.72 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %timeit L = []; L.append(L2[0])
84.5 ns ± 0.211 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
Force-pushed from ba49387 to 7012d85
Thanks for continuing to push here @jakirkham. I have two comments:
Try to coerce the frame through `memoryview` and check the `readonly` attribute. If the frame cannot be coerced through `memoryview`, use `None` to indicate the frame is ambiguous, which means we will bypass any special handling of the frame.
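A minimal sketch of that logic (the helper name is hypothetical, not necessarily the function in this PR):

```python
def frame_writeable(frame):
    """Return True/False for buffer-protocol frames, or None when the frame
    cannot be coerced through ``memoryview`` (e.g. CUDA device objects),
    in which case any read-only/writeable handling is bypassed."""
    try:
        return not memoryview(frame).readonly
    except TypeError:
        return None
```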
Hopefully
I've been running the full protocol test suite on a DGX-1, which has worked well. Haven't been testing comms though. However it seems there are other issues without this change as evidenced by PR ( rapidsai/ucx-py#567 ).
Ok the UCX comms tests pass for me locally too. Though we should still figure out what is up with UCX-Py's CI.
Turns out to be related to how the test was written. PR ( rapidsai/ucx-py#568 ) should fix this.
Thanks John. Merging
Fixes #3994
Closes #3995 (as it's included)
Instead of making all frames writeable as was done in PR ( #3967 ), this tracks which frames were writeable and only makes sure those are writeable. This also fixes a performance regression. Thanks @jsignell for the suggestion 🙂
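A rough sketch of the idea, with hypothetical names, assuming frames are flattened to read-only `bytes` on the way down and only originally-writeable ones are copied back into mutable buffers:

```python
def _writeable(frame):
    # None means "could not tell" (e.g. not memoryview-coercible); leave such frames alone
    try:
        return not memoryview(frame).readonly
    except TypeError:
        return None

def pack(frames):
    # remember which frames were writeable before flattening them to read-only bytes
    flags = [_writeable(f) for f in frames]
    payload = [bytes(f) if w is not None else f for w, f in zip(flags, frames)]
    return flags, payload

def unpack(flags, payload):
    # restore mutability only for frames that were writeable to begin with
    return [bytearray(f) if w else f for w, f in zip(flags, payload)]

flags, payload = pack([bytearray(b"ab"), b"cd"])
restored = unpack(flags, payload)
assert isinstance(restored[0], bytearray) and isinstance(restored[1], bytes)
```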
Edit: Had a typo in my benchmark before. It has been updated.
Note: The `merge_frames` code in this PR is now closer to what it looked like in 2.21.0 (thanks to some simplifications :). So it may be easier to compare what is here against the code in that version for a cleaner diff.

cc @quasiben