Remove Clone and copy source structs internally #1449
let array_mut = array as *mut FFI_ArrowArray;
let schema_mut = schema as *mut FFI_ArrowSchema;

let array_data = std::ptr::replace(array_mut, FFI_ArrowArray::empty());
Actually, after thinking more about this, it seems this won't address the original problem either. It basically just calls drop on FFI_ArrowArray (which is empty), but doesn't free the memory pointed to by array and schema.
+-------+
| array |
+-------+          +----------------------------+
    |              |                            |
    +------------->|       FFI_ArrowArray       | <- memory leaked
                   |                            |
                   +----------------------------+
For instance, if array and schema are from Arc::into_raw, then the memory allocated for the Arc is never reclaimed after this, and is thus leaked.
I'm wondering whether we'll need two APIs: one where we can take ownership of the memory allocated for the array and schema (e.g., exported by Arc::into_raw from Rust itself), and one where we cannot take ownership (e.g., the memory was allocated by another language such as Java), which requires the exporter to free the memory itself later.
For the latter, we can clone the content of FFI_ArrowArray and FFI_ArrowSchema, and set the content of the original array and schema to FFI_ArrowArray::empty() and FFI_ArrowSchema::empty(), so that the exporter can safely free the memory later.
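To make the second case concrete, here is a minimal sketch of taking the contents out of memory the importer does not own and leaving an empty struct behind. `MockFfiArray`, `mock_release`, and `take_from_foreign` are hypothetical stand-ins invented for illustration; they are not the arrow crate's types or the code in this PR.

```rust
use std::ptr;

/// Hypothetical stand-in for `FFI_ArrowArray`; the real struct has more fields.
struct MockFfiArray {
    length: i64,
    /// In this sketch, a struct without a release callback counts as empty.
    release: Option<fn(&mut MockFfiArray)>,
}

impl MockFfiArray {
    fn empty() -> Self {
        MockFfiArray { length: 0, release: None }
    }
}

/// Hypothetical release callback that marks the struct as released.
fn mock_release(array: &mut MockFfiArray) {
    array.length = 0;
    array.release = None;
}

/// Moves the contents out of memory we do not own and leaves an empty
/// struct in its place. The allocation itself is never freed here.
///
/// # Safety
/// `array` must be non-null, aligned, and point to an initialized struct
/// that nothing else reads or writes during the call.
unsafe fn take_from_foreign(array: *mut MockFfiArray) -> MockFfiArray {
    unsafe { ptr::replace(array, MockFfiArray::empty()) }
}

fn main() {
    // Stand-in for a struct allocated by the exporter (e.g. the Java side).
    let mut foreign = MockFfiArray { length: 3, release: Some(mock_release) };

    let taken = unsafe { take_from_foreign(&mut foreign) };

    // The importer now holds the contents...
    assert_eq!(taken.length, 3);
    assert!(taken.release.is_some());
    // ...and the exporter's struct is empty, so freeing it later is harmless.
    assert_eq!(foreign.length, 0);
    assert!(foreign.release.is_none());
}
```

The point of the design is that the exporter keeps responsibility for the allocation, while responsibility for the contents moves to the importer.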
> For instance, if array and schema are from Arc::into_raw, then the memory allocated for the Arc will become dangling after this, and thus memory leak.
Currently, if users export using into_raw and don't import using from_raw (which we can assume is the normal case, since they export data to be used by others and don't need to import it again), they might have a memory leak.
After checking the C++ import implementation, I think this change is fine. We can even remove the two drop_in_place calls, as they seem unnecessary.
What we need is to redesign ArrowArray::into_raw(); we can't use Arc::into_raw in the implementation.
Yeah, the drop_in_place here seems unnecessary.
I'm not sure if it's possible to redesign ArrowArray::into_raw though, since after exporting the array, we need to free the memory allocated for FFI_ArrowArray. However, this can only be done after the exported array is imported via FFI_ArrowArray::try_from_raw, and we don't know when that will happen.
It is correct that with a single API, we cannot deal with both cases: raw pointers that come from an Arc and raw pointers that do not.
I'm not sure two separate APIs are a good idea. With a single API, we can ask users to take care of releasing the raw pointers (either Arc or not) themselves.
> However this can only be done after the exported array is imported via FFI_ArrowArray::try_from_raw, which we don't know when.
- It may not be imported in Rust via FFI_ArrowArray::try_from_raw; it can be imported by another language's SDK.
- We don't need to know. The user should import it somewhere or free it if needed. That's why we can't use Arc::into_raw: we don't know how users might use the pointers. This API should be fire-and-forget and shouldn't expect the user to always do something like try_from_raw.
> another one allowing importer to allocate memory for exporter
If the Rust side is the importer, we already have this: we can create empty structs and pass raw pointers to the exporter.
If the Java side is the importer, we may need an export API which takes raw pointers from Java and replaces their content.
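As a rough sketch of that direction, the consumer can allocate an empty struct and hand a raw pointer to the exporter, which overwrites the content in place. `MockFfiArray` and `export_into` are hypothetical names used only for illustration; the real export API in the arrow crate may look different.

```rust
use std::ptr;

/// Hypothetical stand-in for `FFI_ArrowArray`.
struct MockFfiArray {
    length: i64,
    released: bool,
}

impl MockFfiArray {
    /// An empty struct, marked as already released.
    fn empty() -> Self {
        MockFfiArray { length: 0, released: true }
    }
}

/// Hypothetical export function: writes the exported struct into memory
/// that was allocated by the consumer.
///
/// # Safety
/// `out` must be non-null, aligned, and valid for writes.
unsafe fn export_into(src_length: i64, out: *mut MockFfiArray) {
    // `ptr::write` overwrites the destination without reading or dropping
    // whatever was stored there before.
    unsafe { ptr::write(out, MockFfiArray { length: src_length, released: false }) };
}

fn main() {
    // The consumer allocates an empty struct and passes a pointer to it.
    let mut slot = MockFfiArray::empty();

    unsafe { export_into(5, &mut slot) };

    // The consumer's struct now carries the exported content.
    assert_eq!(slot.length, 5);
    assert!(!slot.released);
}
```

With this shape, the exporter never allocates the struct itself, so there is no Arc allocation for the consumer to hand back.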
Yes, agreed. Do you plan to add the export API in this PR, or separately?
It should be straightforward to add, let me add it here. Thanks.
Added a new API export_into_raw. Please check it. Thanks.
I'm trying to catch up with these discussions since I'll soon need to create Buffers from foreign memory. The cast to *mut followed by std::ptr::replace here doesn't look safe to me. When the pointer comes from an Arc, that seems to violate Rust's unique ownership rules.
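One way to see the ownership concern is with the standard library's `Arc::get_mut`, which only grants mutable access when no other handle to the value exists. This is a small illustration using std only; it is not code from this PR.

```rust
use std::sync::Arc;

fn main() {
    let mut only = Arc::new(10_i32);

    // Sole owner: exclusive access is granted.
    if let Some(value) = Arc::get_mut(&mut only) {
        *value += 1;
    }
    assert_eq!(*only, 11);

    // A second handle exists, so exclusive access is refused.
    let shared = Arc::clone(&only);
    assert!(Arc::get_mut(&mut only).is_none());

    // Once the other handle is gone, mutation is allowed again.
    drop(shared);
    assert!(Arc::get_mut(&mut only).is_some());
}
```

Writing through a `*mut` obtained by casting the pointer from `Arc::into_raw` skips this check, so the caller has to guarantee by other means that no other handle is observing the value.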
Codecov Report
@@            Coverage Diff             @@
##           master    #1449      +/-   ##
==========================================
+ Coverage   82.67%   82.71%   +0.03%
==========================================
  Files         185      187       +2
  Lines       53866    54175     +309
==========================================
+ Hits        44535    44811     +276
- Misses       9331     9364      +33
arrow/src/ffi.rs
Outdated
@@ -802,6 +810,41 @@ impl ArrowArray {
    pub fn into_raw(this: ArrowArray) -> (*const FFI_ArrowArray, *const FFI_ArrowSchema) {
So should we delete this API, since it is no longer paired with the try_from_raw() method?
If we leave it here, the only way to avoid a memory leak is for users to call Arc::from_raw() twice themselves.
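For reference, a minimal sketch of what "calling Arc::from_raw() twice" means for a caller. `Tracked` and `mock_into_raw` are hypothetical stand-ins for the array and schema structs and for `ArrowArray::into_raw`; the drop counter exists only to make the reclamation observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts how many `Tracked` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the FFI array and schema structs.
struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for an `into_raw` built on `Arc::into_raw`.
fn mock_into_raw(
    array: Arc<Tracked>,
    schema: Arc<Tracked>,
) -> (*const Tracked, *const Tracked) {
    (Arc::into_raw(array), Arc::into_raw(schema))
}

fn main() {
    let before = DROPS.load(Ordering::SeqCst);
    let (array, schema) =
        mock_into_raw(Arc::new(Tracked("array")), Arc::new(Tracked("schema")));

    // Nothing is dropped yet; if the pointers were discarded here,
    // both allocations would leak.
    assert_eq!(DROPS.load(Ordering::SeqCst), before);

    // Rebuilding each Arc exactly once hands ownership back, so dropping
    // the Arcs frees both allocations.
    unsafe {
        drop(Arc::from_raw(array));
        drop(Arc::from_raw(schema));
    }
    assert_eq!(DROPS.load(Ordering::SeqCst), before + 2);
}
```

Each pointer must be passed to `Arc::from_raw` at most once; doing it twice for the same pointer would free the allocation twice.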
I'd rather not delete it, as its usage is different. We actually use this API internally and manage the Arc pointers manually. This saves us a round trip from Java.
Good work, this seems to be the minimum change needed to make this right.
It'd be nice if we could include this in 11.0.0 and 10.0.1, although 11.0.0 RC1 is already being voted on.
arrow/src/ffi.rs
Outdated
@@ -802,6 +810,35 @@ impl ArrowArray {
    pub fn into_raw(this: ArrowArray) -> (*const FFI_ArrowArray, *const FFI_ArrowSchema) {
        (Arc::into_raw(this.array), Arc::into_raw(this.schema))
    }

    /// exports [ArrowArray] to raw pointers of the C Data Interface provided by the consumer.
nit: "Exports"
arrow/src/ffi.rs
Outdated
    /// this [ArrowArray] to the location pointed by the raw pointers. Usually the raw pointers are
    /// provided by the array data consumer.
    pub unsafe fn export_into_raw(
        this: ArrowArray,
Can we have this in src/array/array.rs and make it a top-level function, paired with make_array_from_raw?
The signature could be something like:
pub unsafe fn export_array_into_raw(
    src: &ArrayRef,
    out_array: *mut FFI_ArrowArray,
    out_schema: *mut FFI_ArrowSchema)
It would also be good to update the usage doc at the top of this file.
Moved. Please take another look. Thanks.
Btw, I changed src: &ArrayRef to src: ArrayRef. It is already an Arc, so it seems unnecessary to take a borrowed type.
LGTM, thanks @viirya !
Thanks @sunchao @wangfenjin for the detailed review!
Merged, thanks!
Thanks @sunchao
Since we'll release a new arrow version in ~2 weeks, this will be included there. Hopefully we can avoid API changes and release it as 11.1.
Which issue does this PR close?
Closes #1425.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?