Improve performance for PyByteArray::new
#4058
I don't think that is possible, especially with PyByteArray, since the bytearray has to own its data. I think a real zero-copy solution would be to implement a custom pyclass which implements the buffer protocol and is directly backed by the mmap.
@adamreichold
Does it mean the data will always be copied since we pass data from Rust to Python?
I checked how the numpy crate creates an array from existing memory:

```rust
let mut dims = dims.into_dimension();
let ptr = PY_ARRAY_API.PyArray_NewFromDescr(
    py,
    PY_ARRAY_API.get_type_object(py, npyffi::NpyTypes::PyArray_Type),
    T::get_dtype_bound(py).into_dtype_ptr(),
    dims.ndim_cint(),
    dims.as_dims_ptr(),
    strides as *mut npy_intp,    // strides
    data_ptr as *mut c_void,     // data
    npyffi::NPY_ARRAY_WRITEABLE, // flag
    ptr::null_mut(),             // obj
);
// The container becomes the array's base object, so it stays alive as long
// as the array does and is responsible for freeing the data afterwards.
PY_ARRAY_API.PyArray_SetBaseObject(
    py,
    ptr as *mut npyffi::PyArrayObject,
    container as *mut ffi::PyObject,
);
Bound::from_owned_ptr(py, ptr).downcast_into_unchecked()
```

Can I use your approach from the numpy crate to do the same thing for a bytearray?
Basically because someone has to free the data and know how to do that, i.e. in our case that is the main task of the container object that is set as the array's base object.
I don't think so, because this type is pretty much tied to backing a NumPy array by a Rust-allocated buffer.

I fear you will have to figure this out via https://pyo3.rs/v0.21.1/class/protocols.html?highlight=__getbuffer__#buffer-objects, which is a thin layer over the CPython FFI, until we provide a more ergonomic API to achieve this.

Thinking outside of the box, I am not sure how exactly you are using Rust here, but you could also just use Python's mmap module and map the file on the Python side.
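A minimal sketch of such a pyclass, assuming PyO3 0.21; the type name and the `Vec<u8>` field are placeholders (a real implementation would hold the memory map itself), so this only shows the shape of the buffer-protocol hooks rather than a finished solution:

```rust
use std::os::raw::{c_int, c_void};

use pyo3::exceptions::PyBufferError;
use pyo3::ffi;
use pyo3::prelude::*;

// Hypothetical exporter type: owns the bytes (here a Vec, in the real case
// the mmap) and hands out zero-copy views via the buffer protocol.
#[pyclass]
struct MappedBytes {
    data: Vec<u8>,
}

#[pymethods]
impl MappedBytes {
    unsafe fn __getbuffer__(
        slf: Bound<'_, Self>,
        view: *mut ffi::Py_buffer,
        flags: c_int,
    ) -> PyResult<()> {
        let inner = slf.borrow();
        // Fill a one-dimensional, read-only, C-contiguous view over our
        // memory. PyBuffer_FillInfo stores a reference to `slf` in the view,
        // keeping the backing storage alive while the buffer is exported;
        // no bytes are copied here.
        if ffi::PyBuffer_FillInfo(
            view,
            slf.as_ptr(),
            inner.data.as_ptr() as *mut c_void,
            inner.data.len() as ffi::Py_ssize_t,
            1, // read-only
            flags,
        ) != 0
        {
            return Err(PyBufferError::new_err("failed to fill buffer view"));
        }
        Ok(())
    }

    unsafe fn __releasebuffer__(&self, _view: *mut ffi::Py_buffer) {
        // Nothing to free: PyBuffer_FillInfo allocated nothing extra, and
        // CPython drops its reference to the exporter after this call.
    }
}
```

On the Python side, `memoryview(obj)` or `numpy.frombuffer(obj, dtype=numpy.uint8)` would then give views over the mapped data without copying it.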
Thanks for your advice. In my case copying out of the mmap is acceptable, since the weights need to be loaded into memory anyway. The core problem is that memcpy from mmap-backed memory is very slow, see https://stackoverflow.com/questions/52845387/improving-mmap-memcpy-file-read-performance; in my case it is much slower than reading the file directly.
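For illustration, a minimal sketch of the two load paths being compared, assuming the memmap2 crate and a caller-supplied path (this is not the safetensors code, just the pattern under discussion):

```rust
use std::fs;
use std::io;

use memmap2::Mmap;

// read(2) path: the kernel copies the file contents into one freshly
// allocated buffer in a single pass.
fn load_via_read(path: &str) -> io::Result<Vec<u8>> {
    fs::read(path)
}

// mmap path: `to_vec` touches every page of the mapping, so pages are
// faulted in on demand and then copied, which is the pattern the linked
// StackOverflow question identifies as slower when the whole file is read.
fn load_via_mmap_copy(path: &str) -> io::Result<Vec<u8>> {
    let file = fs::File::open(path)?;
    let map = unsafe { Mmap::map(&file)? };
    Ok(map.to_vec())
}
```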
This seems reasonable. If you are going to read the whole file anyway, I guess you could reduce the difference by advising the kernel about your access pattern, e.g. via madvise, before copying the data out.
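A sketch of what that could look like with the memmap2 crate; the advise calls (Unix-only) and the path argument are assumptions for illustration, not something prescribed in this thread:

```rust
use std::fs::File;
use std::io;

use memmap2::{Advice, Mmap};

fn load_with_advice(path: &str) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let map = unsafe { Mmap::map(&file)? };
    // Hint that the mapping will be read sequentially and in full, so the
    // kernel can ramp up readahead instead of faulting one page at a time.
    map.advise(Advice::Sequential)?;
    map.advise(Advice::WillNeed)?;
    Ok(map.to_vec())
}
```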
Hello, recently I found that the loading speed of safetensors is more than 50% slower than loading with pickle: huggingface/safetensors#460. For a 1GB numpy array, loading with pickle only needs 1s, but safetensors needs 1.8s.
Then I profiled the performance and found that the slowest part is the call to PyByteArray::new, which consumes almost all of the time: https://github.com/ZHUI/safetensors/tree/performance/load_speed ZHUI/safetensors@889c432
For PyByteArray::new(py, data), the data is an mmap address, and PyByteArray::new simply calls PyByteArray_FromStringAndSize: https://github.com/python/cpython/blob/733e56ef9656dd79055acc2a3cecaf6054a45b6c/Objects/bytearrayobject.c#L134-L142
It seems to just memcpy the mmap data into memory, so the data is copied disk->mem. I don't know why it is so slow; does it do an additional mem->mem copy, i.e. disk->mem->mem?
Is there any way to create a PyByteArray from existing memory without copying the data?
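For reference, the hot call in the profiling above is roughly this on the Rust side (a simplified sketch, not the actual safetensors code; the slice is assumed to borrow the memory-mapped file):

```rust
use pyo3::prelude::*;
use pyo3::types::PyByteArray;

// `data` borrows the mmap'd file contents. PyByteArray::new forwards to
// PyByteArray_FromStringAndSize, which allocates a fresh bytearray and
// memcpys the slice into it, so every page of the mapping is faulted in
// and copied during this single call.
fn tensor_bytes<'py>(py: Python<'py>, data: &[u8]) -> &'py PyByteArray {
    PyByteArray::new(py, data)
}
```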