-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support new Arrow PyCapsule Interface for Python FFI #5067
Comments
I took a quick stab at this but got stuck. In particular, I tried: use arrow::array::ArrayData;
use arrow::ffi;
use pyo3::exceptions::{PyTypeError, PyValueError};
use pyo3::prelude::*;
use pyo3::types::{PyCapsule, PyTuple};
use pyo3::{PyAny, PyResult};
pub fn pyobj_to_array(ob: &'_ PyAny) -> PyResult<ArrayData> {
if ob.hasattr("__arrow_c_array__")? {
let tuple = ob.getattr("__arrow_c_array__")?.call0()?;
if !tuple.is_instance_of::<PyTuple>() {
return Err(PyTypeError::new_err(
"Expected __arrow_c_array__ to return a tuple.",
));
}
let schema_capsule = tuple.get_item(0)?;
if !schema_capsule.is_instance_of::<PyCapsule>() {
return Err(PyTypeError::new_err(
"Expected __arrow_c_array__ first element to be PyCapsule.",
));
}
let schema_capsule: &PyCapsule = PyTryInto::try_into(schema_capsule)?;
let schema_capsule_name = schema_capsule.name()?;
if schema_capsule_name.is_none() {
return Err(PyValueError::new_err(
"Expected PyCapsule to have name set.",
));
}
let schema_capsule_name = schema_capsule_name.unwrap().to_str()?;
if schema_capsule_name != "arrow_schema" {
return Err(PyValueError::new_err(
"Expected name 'arrow_schema' in PyCapsule.",
));
}
let array_capsule = tuple.get_item(1)?;
if !array_capsule.is_instance_of::<PyCapsule>() {
return Err(PyTypeError::new_err(
"Expected __arrow_c_array__ second element to be PyCapsule.",
));
}
let array_capsule: &PyCapsule = PyTryInto::try_into(array_capsule)?;
let array_capsule_name = array_capsule.name()?;
if array_capsule_name.is_none() {
return Err(PyValueError::new_err(
"Expected PyCapsule to have name set.",
));
}
let array_capsule_name = array_capsule_name.unwrap().to_str()?;
if array_capsule_name != "arrow_array" {
return Err(PyValueError::new_err(
"Expected name 'arrow_array' in PyCapsule.",
));
}
let arr = unsafe {
ffi::from_ffi(
*array_capsule.reference::<ffi::FFI_ArrowArray>(),
schema_capsule.reference::<ffi::FFI_ArrowSchema>(),
)
.unwrap()
};
return Ok(arr);
}
Err(PyValueError::new_err(
"Expected an object with dunder __arrow_c_array__",
))
} but this doesn't work because of
This has a different setup because instead of provisioning FFI structs ourselves for python to export into, python exports its own ffi structs inside the capsule. Maybe I'm missing something in the pyo3 api to get an owned pointer, but I assume that's not possible. I figure here that the arrow-rs ffi code would need to be changed to allow a reference to a |
The arrow-rs FFI interfaces expect to take ownership of the provided data, as per the C data specification. That is when the last reference within arrow-rs created data structures is freed it should invoke release. If this is what the PyCapsule is providing, then it should just be a case of std::ptr::read to obtain an owned value. However, if this is not how ownership in PyCapsule works, in particular if the ownership of the PyCapsule itself needs to be transfered and not the underlying C data array, then a new FFI mechanism would be required. |
Awesome! Sorry, I'm still learning low-level FFI stuff in Rust and didn't know With that I got it to work! #[pyfunction]
pub fn read_array(ob: &'_ PyAny) -> PyResult<()> {
let arr = pyobj_to_array(ob)?;
println!("{:?}", arr);
Ok(())
}
pub fn pyobj_to_array(ob: &'_ PyAny) -> PyResult<ArrayData> {
if ob.hasattr("__arrow_c_array__")? {
let tuple = ob.getattr("__arrow_c_array__")?.call0()?;
if !tuple.is_instance_of::<PyTuple>() {
return Err(PyTypeError::new_err(
"Expected __arrow_c_array__ to return a tuple.",
));
}
let schema_capsule = tuple.get_item(0)?;
if !schema_capsule.is_instance_of::<PyCapsule>() {
return Err(PyTypeError::new_err(
"Expected __arrow_c_array__ first element to be PyCapsule.",
));
}
let schema_capsule: &PyCapsule = PyTryInto::try_into(schema_capsule)?;
let schema_capsule_name = schema_capsule.name()?;
if schema_capsule_name.is_none() {
return Err(PyValueError::new_err(
"Expected PyCapsule to have name set.",
));
}
let schema_capsule_name = schema_capsule_name.unwrap().to_str()?;
if schema_capsule_name != "arrow_schema" {
return Err(PyValueError::new_err(
"Expected name 'arrow_schema' in PyCapsule.",
));
}
let array_capsule = tuple.get_item(1)?;
if !array_capsule.is_instance_of::<PyCapsule>() {
return Err(PyTypeError::new_err(
"Expected __arrow_c_array__ second element to be PyCapsule.",
));
}
let array_capsule: &PyCapsule = PyTryInto::try_into(array_capsule)?;
let array_capsule_name = array_capsule.name()?;
if array_capsule_name.is_none() {
return Err(PyValueError::new_err(
"Expected PyCapsule to have name set.",
));
}
let array_capsule_name = array_capsule_name.unwrap().to_str()?;
if array_capsule_name != "arrow_array" {
return Err(PyValueError::new_err(
"Expected name 'arrow_array' in PyCapsule.",
));
}
let array_ptr = array_capsule.pointer();
let array_ptr = array_ptr as *const ffi::FFI_ArrowArray;
let owned_array_ptr = unsafe { std::ptr::read(array_ptr) };
let arr = unsafe {
ffi::from_ffi(
owned_array_ptr,
schema_capsule.reference::<ffi::FFI_ArrowSchema>(),
)
.unwrap()
};
return Ok(arr);
}
Err(PyValueError::new_err(
"Expected an object with dunder __arrow_c_array__",
))
} Then in Python: >>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3, 4])
>>> geoarrow.rust.rust.read_array(arr)
ArrayData { data_type: Int64, len: 4, offset: 0, buffers: [Buffer { data: Bytes { ptr: 0x596540300c0, len: 32, data: [1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0] }, ptr: 0x596540300c0, length: 32 }], child_data: [], nulls: None } I'm still not 100% understanding the ownership model of the pycapsule, but at least the proof of concept works |
Well, I take that back, if I call
So clearly I'm doing something wrong with the ownership model |
|
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There's existing support for interop with Python via the C data interface in
arrow::pyarrow
.As of pyarrow v14, there's a new specification for Python-based interop, called the Arrow PyCapsule Interface. In particular, this allows interfacing with any arrow-based Python library, instead of only pyarrow. It also ensures memory doesn't leak if the capsule isn't used.
Describe the solution you'd like
Updates to the Python interop to support the new capsule-based approach.
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: