Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup FFI interface (#3684) (#3683) #3685

Merged
merged 4 commits into from
Feb 9, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Feb 9, 2023

Which issue does this PR close?

Closes #3684
Closes #3683

Rationale for this change

The current interface is very hard to use correctly, with it being incredibly easy to accidentally leak memory. This reworks the interface to avoid this, and make better use of Rust's ownership semantics.

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold requested a review from viirya February 9, 2023 16:50
@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 9, 2023
@@ -911,6 +928,7 @@ impl ArrowArray {
}

/// exports [ArrowArray] to the C Data Interface
#[deprecated(note = "Use FFI_ArrowArray and FFI_ArrowSchema directly")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of this method are really unclear, it is never documented that the returned pointers need to be fed to Arc::from_raw in order to free them

@@ -878,6 +894,7 @@ impl ArrowArray {
/// on managing the allocation of the structs by themselves.
/// # Error
/// Errors if any of the pointers is null
#[deprecated(note = "Use ArrowArray::new")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has a very misleading signature, it actually mutates the data pointed to by array and schema

}

// (simulate consumer) import it
unsafe {
let array = ArrowArray::try_from_raw(out_array_ptr, out_schema_ptr).unwrap();
Copy link
Member

@viirya viirya Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without try_from_raw, how do we import raw pointers from external into ArrowArray?

Copy link
Contributor Author

@tustvold tustvold Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You either do something like in this doc - https://github.com/apache/arrow-rs/pull/3685/files#diff-c6cab492853dc6dec63fbf818294b78b87bdef628515910d91873084ff32d6d7R61

Or you use std::ptr::replace.

We could add something like try_from_raw that accepts *mut pointers, but I figured it was better to push people towards using Rust ownership instead of passing pointers around with unclear ownership

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, do you mean to use std::ptr::replace in ForeignArray.export? Isn't it jut like try_from_raw did now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I mean import from external pointers, not export.

Copy link
Contributor Author

@tustvold tustvold Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it jut like try_from_raw did now

Yes, this does not change what can be achieved. Instead it clarifies both the ownership and mutation by making the user be explicit about what they want. If they want to zero out the data, mutating it, they can use std::ptr::replace on a *mut T, if they just want to read the data, they can use std::ptr::read, if the pointer isn't aligned they can use std::ptr::read_unaligned, etc... The intention is to put the responsibility onto the caller as to what the pointer is and how to "correctly" read it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Sounds good to me.

For the tests which use try_from_raw, could you try to make them use std::ptr::replace to import from raw pointers? This is the one of purposes of the tests to test importing from raw pointers as ArrowArray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test using std::ptr::read as that is actually all that is necessary, assuming the foreign API is transferring ownership.

//!
//! // import it
//! let array = unsafe { make_array_from_raw(array_ptr, schema_ptr)? };
//! let array = ArrowArray::new(out_array, out_schema);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_ptr, schema_ptr are somehow to simulate that the raw pointers from external (e.g. JVM in our usecase). Without make_array_from_raw, how can we import them as ArrowArray?

Copy link
Contributor Author

@tustvold tustvold Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let array = std::ptr::replace(array_ptr, FFI_ArrowSchema::empty());

Perhaps?

Or depending on the semantics

let array = std::ptr::read(array_ptr);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::ptr::replace ? Isn't it something we did in try_from_raw nowadays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the argument to try_from_raw is *const so there is a pretty reasonable assumption that it shouldn't mutate the provided data 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. How about make them as *mut? It sounds like we just remove std::ptr::replace usage and leave it to users like us. Although it is straightforward to write a importer function in our codebase, just wondering why we don't have it in upstream like this.

Copy link
Contributor Author

@tustvold tustvold Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I wanted to reduce the number of methods that pass around raw pointers, and wanted to avoid making a breaking change. I can change it if you feel strongly, but having two methods one of which has to have a long explanation of its ownership semantics, as it does a sort of DIY indirect aligned move, seemed like a code smell

arrow/src/ffi.rs Outdated
Comment on lines 70 to 74
//! impl ForeignArray {
//! /// Export to C Data interface
//! fn export(&self, array: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) {
//! // ...
//! }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import? I assume that idea is to import from foreign array into C Data Interface.

Suggested change
//! impl ForeignArray {
//! /// Export to C Data interface
//! fn export(&self, array: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) {
//! // ...
//! }
//! impl ForeignArray {
//! /// Import to C Data interface
//! fn import(&self, array: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) {
//! // ...
//! }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was modelling it after the export_to_c function found in python

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaked it to hopefully be a bit clearer, PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks.

@tustvold tustvold merged commit bb4fc59 into apache:master Feb 9, 2023
@ursabot
Copy link

ursabot commented Feb 9, 2023

Benchmark runs are scheduled for baseline = 98ce68f and contender = bb4fc59. bb4fc59 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArrowArray::try_from_raw Misleading Signature PyArrowConvert Leaks Memory
3 participants