Decouple buffer deallocation from ffi and allow creating buffers from rust vec #1494
…tion from rust vectors or strings
I think this is really cool -- nice work @jhorstmann
In terms of naming, I would like to suggest we keep it backwards compatible for a while (so as not to have to bump major versions for a bit).
So that would mean:
- Keep the old interface named from_unowned, mark it deprecated, change impl to call from_foreign
- Add new from_unowned() that does what you suggest
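The deprecate-and-delegate step in the first bullet could look roughly like the sketch below. It uses a hypothetical stand-in Buffer type (not the arrow one) with a made-up `len` parameter; only the two function names come from the suggestion above.

```rust
/// Hypothetical stand-in for a buffer type; not the arrow `Buffer`.
#[derive(Debug, PartialEq)]
pub struct Buffer {
    len: usize,
}

impl Buffer {
    /// New entry point; the name comes from the suggestion above.
    pub fn from_foreign(len: usize) -> Self {
        Buffer { len }
    }

    /// Old entry point, kept so existing callers still compile.
    /// It only forwards to the new function.
    #[deprecated(note = "use `from_foreign` instead")]
    pub fn from_unowned(len: usize) -> Self {
        Self::from_foreign(len)
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

/// Simulates a caller that has not migrated yet; the lint is silenced
/// here only so the sketch compiles without warnings.
#[allow(deprecated)]
pub fn call_legacy(len: usize) -> Buffer {
    Buffer::from_unowned(len)
}

fn main() {
    // Old and new entry points produce the same value.
    assert_eq!(call_legacy(8), Buffer::from_foreign(8));
    assert_eq!(call_legacy(8).len(), 8);
}
```

Callers of the old name keep compiling and get a deprecation warning pointing at the replacement, which is what allows the change to ship without an immediate major version bump.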
Buffer::from_foreign(
    NonNull::new_unchecked(strings.as_mut_ptr()),
    strings.len(),
    Arc::new(strings),
It seems pretty neat to me that the ownership of the Vec is transferred to Arc, which is then stored (as dyn Allocation).
BTW I ran this under MIRI and it looks good:
cargo +nightly miri test -p arrow -- test_string_data_from_foreign
...
running 1 test
test array::data::tests::test_string_data_from_foreign ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1072 filtered out
Doc-tests arrow
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 145 filtered out; finished in 0.03s
🎉
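For readers who want to see the ownership transfer in isolation, here is a minimal std-only sketch. VecBackedBuffer and its methods are hypothetical names, not the arrow API; only the Allocation trait shape is taken from the diff under review.

```rust
use std::panic::RefUnwindSafe;
use std::ptr::NonNull;
use std::sync::Arc;

/// Marker trait for the owner of an allocation (same shape as in the diff).
pub trait Allocation: RefUnwindSafe {}

impl Allocation for Vec<u8> {}

/// Hypothetical, simplified buffer; not the arrow `Buffer`.
pub struct VecBackedBuffer {
    ptr: NonNull<u8>,
    len: usize,
    // Keeps the original owner alive. When the last Arc is dropped,
    // the Vec is dropped and frees its memory as usual.
    _owner: Arc<dyn Allocation>,
}

impl VecBackedBuffer {
    pub fn from_vec(mut bytes: Vec<u8>) -> Self {
        // Read pointer and length before giving the Vec away. Moving the
        // Vec into the Arc moves only its (ptr, len, cap) header, not the
        // heap storage, so the pointer stays valid.
        let ptr = NonNull::new(bytes.as_mut_ptr()).expect("Vec pointers are never null");
        let len = bytes.len();
        VecBackedBuffer {
            ptr,
            len,
            _owner: Arc::new(bytes),
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is valid for `len` bytes while `_owner` is alive,
        // and the Vec is never accessed or mutated through the owner.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

fn main() {
    let buf = VecBackedBuffer::from_vec(vec![1u8, 2, 3]);
    assert_eq!(buf.as_slice(), &[1u8, 2, 3][..]);
}
```

The buffer itself never needs to know that the owner is a Vec; any type implementing the marker trait could be stored the same way.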
/// The owner of an allocation, that is not natively allocated.
/// The trait implementation is responsible for dropping the allocations once no more references exist.
pub trait Allocation: RefUnwindSafe {}
The trait bound on RefUnwindSafe was needed to make one test compile. I'll need to check that in more detail and document why it is required.
I added an explicit test that Buffer is UnwindSafe to make this requirement clearer.
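A compile-time check of this kind can be sketched as follows. SketchBuffer and assert_unwind_safe are hypothetical names, and this is not the test that was added to the PR. The idea is that Arc<T> is UnwindSafe only when T is RefUnwindSafe, and a trait object only gets that auto trait if the trait declares it as a supertrait.

```rust
use std::panic::{RefUnwindSafe, UnwindSafe};
use std::ptr::NonNull;
use std::sync::Arc;

/// Same shape as the trait in the diff. Without the `RefUnwindSafe`
/// supertrait, `Arc<dyn Allocation>` would not be `UnwindSafe`.
pub trait Allocation: RefUnwindSafe {}

/// Hypothetical struct with the same kinds of fields a buffer would hold.
#[allow(dead_code)]
pub struct SketchBuffer {
    ptr: NonNull<u8>,
    len: usize,
    owner: Arc<dyn Allocation>,
}

/// Compiles only when `T: UnwindSafe`; the return value exists just so
/// the check can be used inside an assertion.
pub fn assert_unwind_safe<T: UnwindSafe>() -> bool {
    true
}

fn main() {
    // Removing `: RefUnwindSafe` from the trait turns this line into a
    // compile error, which makes the requirement visible.
    assert!(assert_unwind_safe::<SketchBuffer>());
}
```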
This looks like a good idea to me. Thanks.
Codecov Report
@@            Coverage Diff             @@
##           master    #1494      +/-   ##
==========================================
+ Coverage   82.72%   82.74%   +0.01%
==========================================
  Files         188      188
  Lines       54286    54389     +103
==========================================
+ Hits        44908    45002      +94
- Misses       9378     9387       +9
I think this looks great @jhorstmann -- thank you. @viirya / @tustvold / @sunchao any last thoughts before I merge it in?
I believe it is an API change, so we will bump the major version of arrow to use it (but that is probably ok given stuff like #1510, which will require a new version as well).
///
/// # Safety
///
/// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
/// bytes and that the foreign deallocator frees the region.
#[deprecated(
👍
There is a minor doctest failure: https://github.com/apache/arrow-rs/runs/5756218032?check_suite_focus=true
Looks good to me. Thanks @jhorstmann
LGTM2
I merged this PR with
I believe the clippy errors are due to Rust 1.60 being released. I will create a new PR.
Which issue does this PR close?
Closes #1516
Rationale for this change
I'd like to process data that is coming from outside arrow using arrow compute kernels without copying the data.
The FFI API does not seem to support this use case at the moment. The only way I found to create an FFI_ArrowArray was with an already existing ArrayData struct. Creating ArrayData requires at least one Buffer, and creating a Buffer from foreign memory again needs an FFI_ArrowArray.
What changes are included in this PR?
The owner of an allocation is tracked via a trait object instead of an FFI_ArrowArray. This allows zero-copy interoperability between arrow buffers and most collection types from the Rust stdlib or other crates. The most common examples would be Vec or String, and it should also be possible to access arrow2 buffers this way.
Are there any user-facing changes?
The function from_unowned was marked as deprecated and delegates to a new function from_custom_allocation. The "unowned" part is a bit misleading since the data is owned by the FFI_ArrowArray.
The other changes should be API compatible. The bytes module is pub(crate), so moving the Deallocation enum should be fine.
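As a rough illustration of tracking the owner through a trait object, the sketch below stores a String behind Arc<dyn Allocation> inside an enum and checks that dropping the enum releases the String. The variant names and the helper function are assumptions made for this example and are not taken from the PR.

```rust
use std::panic::RefUnwindSafe;
use std::sync::Arc;

/// Marker trait for the owner of a non-native allocation.
pub trait Allocation: RefUnwindSafe {}

impl Allocation for String {}

/// How a region of memory gets released (variant names are assumed).
#[allow(dead_code)]
pub enum Deallocation {
    /// Memory allocated by the library itself; holds its capacity in bytes.
    Native(usize),
    /// Memory owned by an arbitrary owner; dropping the Arc releases it.
    Custom(Arc<dyn Allocation>),
}

/// Returns whether the owner is alive before and after the
/// `Deallocation` value is dropped.
pub fn owner_released_on_drop() -> (bool, bool) {
    let owner: Arc<dyn Allocation> = Arc::new(String::from("hello"));
    // A weak handle lets us observe the owner without keeping it alive.
    let probe = Arc::downgrade(&owner);
    let dealloc = Deallocation::Custom(owner);

    let alive_before = probe.upgrade().is_some();
    drop(dealloc);
    let alive_after = probe.upgrade().is_some();
    (alive_before, alive_after)
}

fn main() {
    assert_eq!(owner_released_on_drop(), (true, false));
}
```

Because the enum holds only a trait object, the same variant could carry a Vec, a String, or a buffer type from another crate, as long as that type implements the marker trait.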