-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure writable frames #3967
Ensure writable frames #3967
Conversation
Should make it easier to extend these tests to more cases.
User code working with NumPy or Pandas objects often expects the objects to be mutable. However if read-only frames (like `bytes`) objects are used, this is not true. So add a test to check for this so that we can make sure this is true and we can catch and fix cases where that may not be true.
38a7d14
to
2fb24e5
Compare
This makes it easier to operate with the frames in later parts of the code.
Instead of having a fast path, guard against going through the slow path. Should make it easier to do some common post-processing work.
Instead of using `b""`, create the `bytes` object explicitly.
Instead of using a `bytes` object to merge frames together, use a `bytearray`. This ensures the result is mutable, which downstream objects from serialization may care about.
As users expect objects that are deserialized like NumPy arrays and Pandas objects are mutable, make sure the frames used to build them up are also mutable. Do this by checking if they are read-only (as is the case for `bytes` as an example). If they are, copy them into a `bytearray` backed `memoryview`. Otherwise pass them through as-is.
Thanks for handling this @jakirkham . I'm only seeing tests in this PR so far. Is that correct? |
Yeah I'm just trying to demonstrate the failure on CI. Will push the rest of the fix after. |
Alright have pushed the changes to fix the tests. Please let me know what you think 🙂 |
out.append(memoryview(bytearray().join(L))) | ||
frames = out | ||
|
||
frames = [memoryview(bytearray(f)) if f.readonly else f for f in frames] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fun I decided to micro-benchmark this:
In [1]: L = [b'0' * i for i in [10, 100, 10, 100, 1000] * 2]
In [2]: %timeit b"".join(L)
203 ns ± 1.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: %timeit memoryview(bytearray().join(L))
463 ns ± 5.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
260ns doesn't really matter today I don't think
In principle this looks good to me. Thank you for identifying and resolving this @jakirkham ! |
Thanks Matt! 😄 |
Adding a test for a Pandas Series as well in PR ( #3995 ). |
Fixes #1978
Fixes #3943
Closes #3918 (superseded)
User code often expects NumPy and Pandas objects to be writable. However this fails in various cases. After some investigation this appears to be due to some frames being read-only (as they are coerced to
bytes
via compression, logic inmerge_frames
, etc.). To fix this we update the logic inmerge_frames
to check frames are writable. If frames are read-only, we copy them to a writable object (likebytearray
). We include tests ofmerge_frames
and NumPy array serialization to make sure this is working correctly.