-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup FFI interface (#3684) (#3683) #3685
Conversation
@@ -911,6 +928,7 @@ impl ArrowArray { | |||
} | |||
|
|||
/// exports [ArrowArray] to the C Data Interface | |||
#[deprecated(note = "Use FFI_ArrowArray and FFI_ArrowSchema directly")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics of this method are really unclear, it is never documented that the returned pointers need to be fed to Arc::from_raw
in order to free them
@@ -878,6 +894,7 @@ impl ArrowArray { | |||
/// on managing the allocation of the structs by themselves. | |||
/// # Error | |||
/// Errors if any of the pointers is null | |||
#[deprecated(note = "Use ArrowArray::new")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has a very misleading signature, it actually mutates the data pointed to by array
and schema
} | ||
|
||
// (simulate consumer) import it | ||
unsafe { | ||
let array = ArrowArray::try_from_raw(out_array_ptr, out_schema_ptr).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without try_from_raw
, how do we import raw pointers from external into ArrowArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You either do something like in this doc - https://github.com/apache/arrow-rs/pull/3685/files#diff-c6cab492853dc6dec63fbf818294b78b87bdef628515910d91873084ff32d6d7R61
Or you use std::ptr::replace
.
We could add something like try_from_raw
that accepts *mut
pointers, but I figured it was better to push people towards using Rust ownership instead of passing pointers around with unclear ownership
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, do you mean to use std::ptr::replace
in ForeignArray.export
? Isn't it jut like try_from_raw
did now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I mean import from external pointers, not export.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it jut like try_from_raw did now
Yes, this does not change what can be achieved. Instead it clarifies both the ownership and mutation by making the user be explicit about what they want. If they want to zero out the data, mutating it, they can use std::ptr::replace
on a *mut T
, if they just want to read the data, they can use std::ptr::read
, if the pointer isn't aligned they can use std::ptr::read_unaligned
, etc... The intention is to put the responsibility onto the caller as to what the pointer is and how to "correctly" read it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Sounds good to me.
For the tests which use try_from_raw
, could you try to make them use std::ptr::replace
to import from raw pointers? This is the one of purposes of the tests to test importing from raw pointers as ArrowArray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test using std::ptr::read
as that is actually all that is necessary, assuming the foreign API is transferring ownership.
//! | ||
//! // import it | ||
//! let array = unsafe { make_array_from_raw(array_ptr, schema_ptr)? }; | ||
//! let array = ArrowArray::new(out_array, out_schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_ptr
, schema_ptr
are somehow to simulate that the raw pointers from external (e.g. JVM in our usecase). Without make_array_from_raw
, how can we import them as ArrowArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let array = std::ptr::replace(array_ptr, FFI_ArrowSchema::empty());
Perhaps?
Or depending on the semantics
let array = std::ptr::read(array_ptr);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::ptr::replace
? Isn't it something we did in try_from_raw
nowadays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the argument to try_from_raw
is *const
so there is a pretty reasonable assumption that it shouldn't mutate the provided data 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. How about make them as *mut
? It sounds like we just remove std::ptr::replace
usage and leave it to users like us. Although it is straightforward to write a importer function in our codebase, just wondering why we don't have it in upstream like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I wanted to reduce the number of methods that pass around raw pointers, and wanted to avoid making a breaking change. I can change it if you feel strongly, but having two methods one of which has to have a long explanation of its ownership semantics, as it does a sort of DIY indirect aligned move, seemed like a code smell
arrow/src/ffi.rs
Outdated
//! impl ForeignArray { | ||
//! /// Export to C Data interface | ||
//! fn export(&self, array: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) { | ||
//! // ... | ||
//! } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import? I assume that idea is to import from foreign array into C Data Interface.
//! impl ForeignArray { | |
//! /// Export to C Data interface | |
//! fn export(&self, array: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) { | |
//! // ... | |
//! } | |
//! impl ForeignArray { | |
//! /// Import to C Data interface | |
//! fn import(&self, array: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) { | |
//! // ... | |
//! } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was modelling it after the export_to_c function found in python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tweaked it to hopefully be a bit clearer, PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks.
Benchmark runs are scheduled for baseline = 98ce68f and contender = bb4fc59. bb4fc59 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3684
Closes #3683
Rationale for this change
The current interface is very hard to use correctly, with it being incredibly easy to accidentally leak memory. This reworks the interface to avoid this, and make better use of Rust's ownership semantics.
What changes are included in this PR?
Are there any user-facing changes?