-
Notifications
You must be signed in to change notification settings - Fork 874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(api!): make ArrowArrayStreamReader Send #4232
Conversation
06753ea
to
c51d589
Compare
c51d589
to
0f4aa77
Compare
I'm not sure about the changes to the python interface, we've run into challenges in the past because python can't transfer ownership. I'll have a play later today and see if we can't make it Send without requiring a breaking change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, I like where this is headed. I think there is one memory safety issue, but otherwise mostly just nits
//! | ||
//! // consumed and used by something else... | ||
//! | ||
//! // import it | ||
//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(stream_ptr).unwrap() }; | ||
//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(&mut stream).unwrap() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! let stream_reader = unsafe { ArrowArrayStreamReader::from_raw(&mut stream).unwrap() }; | |
//! let stream_reader = ArrowArrayStreamReader::try_new(stream).unwrap(); |
I think we could deprecate from_raw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still relevant. For example, there are cases where we need to implement a C API where we will be passed a pointer to an ArrowArrayStreamReader. For example, this is done in the "bind stream" function in ADBC:
@@ -231,8 +227,7 @@ impl ExportedArrayStream { | |||
let struct_array = StructArray::from(batch); | |||
let array = FFI_ArrowArray::new(&struct_array.to_data()); | |||
|
|||
unsafe { std::ptr::copy(addr_of!(array), out, 1) }; | |||
std::mem::forget(array); | |||
unsafe { std::ptr::write_unaligned(out, array) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will cause the array to be released when array
goes out of scope, i.e. immediately. I think eventually this will result in a double-free
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. However, the docs for write_unaligned
say that src
is guaranteed not to be dropped:
https://doc.rust-lang.org/std/ptr/fn.write_unaligned.html
Internally, it's calling intrinsics::forget()
. So I think this is sound.
https://doc.rust-lang.org/src/core/ptr/mod.rs.html#1447
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL, I had no idea that was the semantic - makes sense
let stream = Box::new(FFI_ArrowArrayStream::empty()); | ||
let stream_ptr = Box::into_raw(stream) as *mut FFI_ArrowArrayStream; | ||
|
||
unsafe { export_reader_into_raw(Box::new(self.clone()), stream_ptr) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why IntoPyArrow is needed, this needs to "consume" the stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. It's a little messy, but I did want to convey we are moving ownership, while all the other conversion are either creating entirely new objects (data types) or sharing ownership (arrays).
|
||
impl<'source, T: PyArrowConvert> FromPyObject<'source> for PyArrowType<T> { | ||
impl<'source, T: FromPyArrow + IntoPyArrow> FromPyObject<'source> for PyArrowType<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl<'source, T: FromPyArrow + IntoPyArrow> FromPyObject<'source> for PyArrowType<T> { | |
impl<'source, T: FromPyArrow> FromPyObject<'source> for PyArrowType<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason both traits are there is because PyArrowType<T>
requires. If I remove one we get
the trait bound `T: FromPyArrow` is not satisfied
fn extract(value: &'source PyAny) -> PyResult<Self> { | ||
Ok(Self(T::from_pyarrow(value)?)) | ||
} | ||
} | ||
|
||
impl<'a, T: PyArrowConvert> IntoPy<PyObject> for PyArrowType<T> { | ||
impl<T: FromPyArrow + IntoPyArrow> IntoPy<PyObject> for PyArrowType<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl<T: FromPyArrow + IntoPyArrow> IntoPy<PyObject> for PyArrowType<T> { | |
impl<T: IntoPyArrow> IntoPy<PyObject> for PyArrowType<T> { |
} | ||
} | ||
|
||
impl<T: PyArrowConvert> From<T> for PyArrowType<T> { | ||
impl<T: FromPyArrow + IntoPyArrow> From<T> for PyArrowType<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl<T: FromPyArrow + IntoPyArrow> From<T> for PyArrowType<T> { | |
impl<T> From<T> for PyArrowType<T> { |
The integration test failure looks legitimate and would suggest something is off with this PR somewhere...
|
Yes, thanks for the reminder. I'll look into this soon. |
* feat(api make ArrowArrayStreamReader Send * simplify ptr handling * rename pyarrow traits to conform to guidelines * pr feedback * remove dangling Box::from_raw
Which issue does this PR close?
Closes #4222
Rationale for this change
The underlying implementation is arguably Send, and making it not Send makes it extremely hard to work with. This drops support for Clone, but if someone wants to sacrifice Send for Clone, they can wrap it in an Arc themselves.
What changes are included in this PR?
ffi_stream
code to remove some unnecessary heap allocations.Are there any user-facing changes?
This is a breaking change to
PyArrowConvert
.