Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support de/compress_into #26

Merged
merged 11 commits into from
Feb 18, 2021
Merged

Support de/compress_into #26

merged 11 commits into from
Feb 18, 2021

Conversation

milesgranger
Copy link
Owner

Will close #23

@milesgranger milesgranger marked this pull request as ready for review February 18, 2021 13:52
@milesgranger milesgranger mentioned this pull request Feb 18, 2021
@martindurant
Copy link

Does adding this significantly increase the install size? I am hoping no, since numpy itself contains all the called code.
At some point, someone might request this kind of thing for arbitrary buffers, not just numpy - be warned.

@milesgranger
Copy link
Owner Author

milesgranger commented Feb 18, 2021

I couldn't find a reliable way to pass the primitive pointer from python/numpy as a double *ptr to rust as a ptr as *mut u8 that rust wanted, without doing some more adventurous things.. this was so simple, and it added ~11kb in size on my machine:

❯ maturin build -i $(which python) --release --out wheels
🔗 Found pyo3 bindings
🐍 Found CPython 3.7m at /home/milesg/Projects/pyrus-cramjam/venv/bin/python
📦 Built source distribution to wheels/cramjam-2.0.0_rc1.tar.gz
warning: version requirement `0.6.0+zstd.1.4.8` for dependency `zstd` includes semver metadata which will be ignored, removing the metadata is recommended to avoid confusion
   Compiling pyo3 v0.13.2
   Compiling numpy v0.13.0
   Compiling cramjam v2.0.0-rc1 (/home/milesg/Projects/pyrus-cramjam)
    Finished release [optimized] target(s) in 7.62s
📦 Built wheel for CPython 3.7m to wheels/cramjam-2.0.0_rc1-cp37-cp37m-manylinux2010_x86_64.whl

pyrus-cramjam on  support-de-compress-into [⇕] is 📦 v2.0.0-rc1 via 🐍 v3.7.4 (venv) via 🦀 v1.50.0 on ☁️  (eu-north-1) took 8s 
❯ ls -ls wheels/
1672 -rw-rw-r-- 1 milesg milesg 1711444 feb.  18 15:20 cramjam-2.0.0_rc1-cp37-cp37m-manylinux2010_x86_64.whl

vs current master

❯ maturin build -i $(which python) --release --out wheels
🔗 Found pyo3 bindings
🐍 Found CPython 3.7m at /home/milesg/Projects/pyrus-cramjam/venv/bin/python
📦 Built source distribution to wheels/cramjam-2.0.0_rc1.tar.gz
warning: version requirement `0.6.0+zstd.1.4.8` for dependency `zstd` includes semver metadata which will be ignored, removing the metadata is recommended to avoid confusion
   Compiling cramjam v2.0.0-rc1 (/home/milesg/Projects/pyrus-cramjam)
    Finished release [optimized] target(s) in 2.01s
📦 Built wheel for CPython 3.7m to wheels/cramjam-2.0.0_rc1-cp37-cp37m-manylinux2010_x86_64.whl

pyrus-cramjam on  master is 📦 v2.0.0-rc1 via 🐍 v3.7.4 (venv) via 🦀 v1.50.0 on ☁️  (eu-north-1) took 2s 
❯ ls -ls wheels/
1664 -rw-rw-r-- 1 milesg milesg 1700482 feb.  18 15:20 cramjam-2.0.0_rc1-cp37-cp37m-manylinux2010_x86_64.whl

It's a good note about perhaps requests for other buffer bytes, but IMO, numpy is so pervasive it's a big plus to support, and it's an eventuality that supporting python file-like objects is a goal; as in #14

@milesgranger milesgranger merged commit 67f368c into master Feb 18, 2021
@milesgranger milesgranger deleted the support-de-compress-into branch February 18, 2021 14:47
@martindurant
Copy link

martindurant commented Feb 18, 2021

arr.ctypes.data or arr..__array_interface__['data'][0] gives you the raw pointer as a python int.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ctypes.html

x = np.ones(1000, dtype='uint8')
ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
ptr[0] = 2
x[:10]
array([2, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=uint8)

I'm not sure if the ctypes pointer is what you want. The point is, that for any numpy array you can do .view("uint8") (and ensure contiguity, if necessary), but memoryviews of python buffers should be simpler.

class Py_buffer(ctypes.Structure):
	_fields_ = [
		("buf", c_void_p),
		("obj", py_object),
		("len", c_ssize_t),
		("itemsize", c_ssize_t),
		("readonly", c_int),
		("ndim", c_int),
		("format", c_char_p),
		("shape", c_ssize_p),
		("strides", c_ssize_p),
		("suboffsets", c_ssize_p),
		("internal", c_void_p)
	]
buf = Py_buffer.get_from(data)
ptr = ct.cast(buf.buf, ct.POINTER(ct.c_uint8))  # buf.buf is the pointer, as an integer
ptr[0] = 100
assert data[0] == 100

@milesgranger
Copy link
Owner Author

Thanks for that! 💯

I suppose there is a useability portion to the API as well; passing in the array directly to de/compress_into.

However, to support memoryviews/pointer specific ability, we can do what was done with bytes/bytearray and accept an enum of pointer / numpy, that seem reasonable?

If you need this support vs what this PR did with numpy, feel free to make an issue. 👍

@martindurant
Copy link

No, I don't immediately need this. You can, after all, just wrap any python buffer in a numpy array anyway! Yes, I think it would be reasonable for the same _into function to accept numpy or a buffer, but probably users don't want to extract the pointer from the buffer themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support de/compress_into(ptr, len)
2 participants