
Support de/compress_into(ptr, len) #23

Closed · milesgranger opened this issue Feb 15, 2021 · 4 comments · Fixed by #26

Comments

@milesgranger (Owner)
de/compress directly into a Python buffer

@milesgranger (Owner, Author) commented Feb 15, 2021

@martindurant will something like this work for you?

>>> import numpy as np
>>> from cramjam import snappy
>>> values = np.zeros(100, dtype=np.uint8)
>>> snappy.compress_into(b"bytes", values)
>>> values
array([255,   6,   0,   0, 115,  78,  97,  80, 112,  89,   1,   9,   0,
         0, 181, 139, 168, 219,  98, 121, 116, 101, 115,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0], dtype=uint8)
>>> values.tobytes()
b'\xff\x06\x00\x00sNaPpY\x01\t\x00\x00\xb5\x8b\xa8\xdbbytes\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

then decompress_into(bytes, array) as well?

@martindurant

Yes, perfect. In practice I expect to use only decompress_into, although compress_into might have an interesting application with memory-mapped files. I would have either function return the number of bytes written.

Am I right in assuming that the zero padding left over when the output length isn't exactly right would cause the decompress to panic?

@milesgranger (Owner, Author) commented Feb 15, 2021

Ok. When prototyping this, it felt like one might want compress_into(source_array, dst_array), but if the typical call is compress_into(bytes, array) then I'm good with that as well. 👍

And yes, I just noticed that myself; it should return the number of bytes written. Good catch.

Most of the de/compression APIs implement std::io::Read, so when passed a slice we read the input and de/encode until we reach the end of the output buffer, whereas a Vec<u8> would keep growing until all bytes are de/encoded via read_to_end. The point being, with a reference to the Python array we can only get a slice, so we can only write up to the end of that buffer.

So long as it successfully gets the references, it will not panic, regardless of whether the output buffer is too short or too long.
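As an aside, the slice-vs-Vec<u8> distinction above has a rough Python stdlib analogy (this is illustration only, not cramjam code): io.BytesIO.readinto plays the role of reading into a fixed-size slice and stops at the end of the output buffer, while read() plays the role of read_to_end into a growable Vec<u8>:

```python
import io

src = io.BytesIO(b"hello world")

# Fixed-size buffer: like a Rust slice, the read stops when the end of
# the output buffer is reached, whether or not the input is exhausted.
buf = bytearray(5)
n = src.readinto(buf)   # n == 5, buf == bytearray(b"hello")

# Growable read: like read_to_end into a Vec<u8>, this consumes the
# remaining input in full.
rest = src.read()       # b" world"
```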

@martindurant

> The point being, with a reference to the array, we can only have a slice, thus only have the ability to write to the end of the buffer.

There's no particular reason to handle this on the Rust side, since making a slice/view/memoryview on the Python side is copy-free and almost zero-cost. This was more idle curiosity. It probably just needs a couple of comments in the eventual test suite - i.e., we get the number of bytes back at compression, then slice the resulting data before decompression and get the original back.
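The slice-then-decompress test pattern described above can be sketched with stdlib zlib standing in for cramjam's codecs (zlib has no real *_into variant, so the buffer write and byte count are simulated here; all names are illustrative, not the eventual API):

```python
import zlib

data = b"bytes to round-trip"

# Preallocate an oversized output buffer, as with the numpy array above.
out = bytearray(100)

# Simulate the proposed compress_into: write into the caller-owned
# buffer and report the number of bytes written.
compressed = zlib.compress(data)
nbytes = len(compressed)
out[:nbytes] = compressed

# The test-suite pattern: slice off the zero padding using the returned
# byte count before decompressing, and get the original data back.
assert zlib.decompress(bytes(out[:nbytes])) == data
```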
