Support resize on PyBytes when initialised via with_new #4003
Comments
If the first proposal is the chosen one, then it would probably be good to have a version that doesn't zero out the bytes first. Something like:
fn with_new_uninit<F>(py: Python<'_>, len: usize, init: F) -> PyResult<&Self>
where
    F: FnMut(&mut [MaybeUninit<u8>]) -> PyResult<()> |
I think that is already the case?
|
Yes, but PyO3 writes zeros to the entire slice. |
Ah, yes, missed that. |
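To make the "no zeroing" idea concrete, here is a minimal sketch that uses a plain `Vec<u8>` as a stand-in for the bytes object. It is not PyO3 code: the names `vec_new_with_uninit` and `fill_counting` are hypothetical, and the helper is marked `unsafe` on the assumption that the closure writes every byte it is handed.

```rust
use std::mem::MaybeUninit;

/// Hypothetical std-only analogue of the suggested `with_new_uninit`:
/// allocate `len` bytes without zeroing them and let `init` fill them in.
///
/// # Safety
/// `init` must write every element of the slice before returning `Ok(())`.
unsafe fn vec_new_with_uninit<F, E>(len: usize, init: F) -> Result<Vec<u8>, E>
where
    F: FnOnce(&mut [MaybeUninit<u8>]) -> Result<(), E>,
{
    let mut buf: Vec<u8> = Vec::with_capacity(len);
    // The spare capacity is allocated but uninitialised memory; the capacity
    // may exceed `len`, so only the first `len` slots are handed out.
    init(&mut buf.spare_capacity_mut()[..len])?;
    // Sound only because of the contract documented above.
    buf.set_len(len);
    Ok(buf)
}

/// Example caller: fills the buffer with 0, 1, 2, ... (wrapping at 256).
fn fill_counting(len: usize) -> Result<Vec<u8>, String> {
    unsafe {
        vec_new_with_uninit(len, |slots| {
            for (i, slot) in slots.iter_mut().enumerate() {
                slot.write(i as u8);
            }
            Ok(())
        })
    }
}
```

If `init` fails, the `Vec` is dropped with length 0, so no uninitialised byte is ever read. A PyBytes version would presumably need the same guarantee before handing the object to the interpreter.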
There's some previous discussion to this in #1074 We could explore making it work but |
What's the process to get that going? I assumed that the function had an underscore just because it's only usable in a very specific context. |
Could we do this by adding an alternative utility function, something like:
pub fn resizeable_new_with<F>(py: Python<'_>, len: usize, init: F) -> PyResult<&PyBytes>
where
    F: FnOnce(&mut [u8]) -> PyResult<usize>,
{
    unsafe {
        let mut pyptr =
            ffi::PyBytes_FromStringAndSize(std::ptr::null(), len as ffi::Py_ssize_t);
        // Check for an allocation error and return it
        if pyptr.is_null() {
            return Err(PyRuntimeError::new_err(format!(
                "failed to allocate python bytes object of size {}",
                len
            )));
        }
        let buffer = ffi::PyBytes_AsString(pyptr) as *mut u8;
        debug_assert!(!buffer.is_null());
        // pyptr is still a raw owned pointer here, so release it by hand if init returns an Err
        let new_len = match init(std::slice::from_raw_parts_mut(buffer, len)) {
            Ok(new_len) => new_len,
            Err(err) => {
                ffi::Py_DECREF(pyptr);
                return Err(err);
            }
        };
        // On failure _PyBytes_Resize deallocates the original object and sets pyptr to NULL
        if _PyBytes_Resize(ptr::addr_of_mut!(pyptr), new_len as ffi::Py_ssize_t) != 0 {
            return Err(PyRuntimeError::new_err("failed to resize bytes object"));
        }
        let pybytes: Py<PyBytes> = Py::from_owned_ptr_or_err(py, pyptr)?;
        Ok(pybytes.into_ref(py))
    }
} |
@damelLP: The @AudriusButkevicius and @davidhewitt: I'm talking about things I'm not very familiar with, but wouldn't it be possible to construct the data with PyByteArrays and then simply make a PyBytes object from the array? I'm not fully certain how much overhead that would add but it would be a stable / public way to do this. |
That would perform a copy (as far as I understand), which is what I want to avoid. As it stands, I don't think there is a zero-copy way to allocate PyBytes. |
It uses the buffer protocol which, if I understand correctly, doesn't copy the underlying data.
|
It should also be possible to create a new type that has the buffer protocol and pass that to |
I assumed it was copying. Do you have docs that suggest it's zero copy? |
Scattered throughout the buffer protocol documentation
These are not things I've used before, but the documentation heavily suggests, if not says outright, that it should be zero copy. |
Considering the resizeable |
Sorry that I fell off this thread a little.
There's probably good reasons why a general
👍 on this, and in particular we plan to make this easier with #3148. |
I'm building a Rust library that I expose to Python via PyO3. The library generates quite a lot of data in Rust and hands it over to Python (100 GB, for example).
One problem with how I generate the data is that I don't know exactly how much there will be until it has been generated (it is decompressed), so I cannot allocate a correctly sized PyBytes object ahead of time. I first need to decompress the data into a Rust-allocated buffer, check its size, then allocate a PyBytes of the right size and copy the data over. Given the amount of data, the copy from Rust storage to PyBytes becomes a large part of the overall cost of the API.
There also seems to be no CPython API to allocate a bytes object from an existing block of memory without copying the data. This is because the object definition has to sit immediately before the memory that holds the object's content.
However, the C API supports _PyBytes_Resize.
There are some constraints on when it can and cannot be used, but I think that, if used from within (or shortly after) the PyBytes::with_new closure, it passes all the checks.
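The allocate-then-shrink pattern described above can be sketched without any Python involved. This is a std-only analogue built on `Vec<u8>`, not the PyO3 or CPython API: `vec_new_with_resize` and `expand_runs` are hypothetical names, and `expand_runs` merely stands in for a decompressor whose output size is only known once it has run.

```rust
/// Hypothetical helper: over-allocate `max_len` bytes, let `init` report how
/// many bytes it actually wrote, then shrink the buffer to that length.
fn vec_new_with_resize<F, E>(max_len: usize, init: F) -> Result<Vec<u8>, E>
where
    F: FnOnce(&mut [u8]) -> Result<usize, E>,
{
    let mut buf = vec![0u8; max_len];
    let written = init(&mut buf)?;
    // Guard against a closure that reports more than it was given.
    assert!(written <= max_len, "init reported more bytes than allocated");
    // Shrinks the length in place; the bytes already written are not copied.
    buf.truncate(written);
    Ok(buf)
}

/// Stand-in for a decompressor: expands (byte, count) runs into `out` and
/// returns the number of bytes produced.
fn expand_runs(runs: &[(u8, usize)], out: &mut [u8]) -> Result<usize, String> {
    let mut pos = 0;
    for &(byte, count) in runs {
        let end = pos + count;
        if end > out.len() {
            return Err(format!("output needs at least {end} bytes, have {}", out.len()));
        }
        out[pos..end].fill(byte);
        pos = end;
    }
    Ok(pos)
}
```

A caller would write something like `vec_new_with_resize(16, |buf| expand_runs(&[(b'a', 3), (b'b', 2)], buf))`. The data is produced directly in its final buffer, which is the property the resize-based PyBytes proposal is after.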
My proposal is to either:
Currently I'm working around this by allocating a PyObject myself and doing the object pointer lifetime juggling myself. It seems to work, but I'm not sure I understand who owns the object at what time, or how to free it properly in case ownership never gets handed over to the interpreter. It is a lot of unsafe code that I don't feel comfortable with.
I also cannot use bytearray because I genuinely do not want the data to be modified from Python.
I'd be happy to implement 1; however, I feel I might be too new to Rust to implement 2 (I could do with some hand-holding, I guess).