-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10149: [Rust] Improved support for externally owned memory regions #8316
Conversation
@@ -253,20 +156,6 @@ impl Buffer { | |||
self.len() / mem::size_of::<T>(), | |||
) | |||
} | |||
|
|||
/// Returns an empty buffer. | |||
pub fn empty() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not being used, and thus I dropped it.
rust/arrow/src/bytes.rs
Outdated
let a = Box::new(b"hello"); | ||
|
||
let dealloc = Arc::new(|bytes: &mut Bytes| { | ||
// println!(""); seems to be the only way to validate that this is actually called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps using a closure that mutates a captured variable?
Apparently you need to use either FnMut
or a Cell
:
https://stackoverflow.com/questions/38677736/passing-a-closure-that-modifies-its-environment-to-a-function-in-rust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I had to introduce a Mutex
, because a FnMut
is now mutable, while the data itself is not. I am not very happy with this, but it makes sense as we cannot assume that the C data interface is thread-safe, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted this. I do not think that that Fn
should be mutable, as it is just performing an FFI call, over which Rust does not need to know about mutability. I am still trying to test it, but I think that something like Arc<dyn Fn(&mut Bytes)>
is a better signature,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use a Cell
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let b = Cell::new(false);
let dealloc = Arc::new(|bytes: &mut Bytes| {
*b.get_mut() = true;
assert_eq!(bytes.as_slice(), &b"hello"[1..4]);
});
does not compile because it requires moving b
to inside the closure: if we move b
to inside the closure (using move
), the closure is no longer Fn
, but FnMut
. If the closure is FnMut
, we can no longer wrap it inside an Arc
, as Arc
is immutable. To make it immutable, we need to wrap it around a Mutex
.
The difference between the code we are going here and the example in SO is that we have an immutable function, as the underlying resource that this function acts upon is outside rust. I.e. from rust's perspective, the function is Fn
.
One way to test this would be to make the closure to write something to a file, and verify that that was written. I.e. test that the function mutated something outside of Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After trying out varying things, I got the following to work:
use std::cell::Cell;
use std::sync::Arc;
pub type VoidFn = Arc<dyn Fn()>;
fn main() {
let integer = Arc::new(Cell::new(5));
let inner = integer.clone();
let closure = Arc::new(move || {
inner.set(inner.get() + 1);
});
execute_closure(closure);
println!("After closure: {}", integer.get());
}
fn execute_closure(func: VoidFn)
{
func();
}
I may be missing something though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, by definition a destructor will mutate some state (visible or not), so it seems FnMut
may be fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is awesome! Thanks a lot for the help and insight @pitrou .
The destructor mutates state. Shouldn't that state be only the state of self
? IMO that is the reason rust's Drop
requires a mutable ref of self, fn drop(&mut self)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea, I am out of my depth here (not a Rust developer :-)).
The existing implementation was not useful to support FFI as it did not specify how to release memory.
Closing in favor of #8401 |
Background
Currently, a memory region (
arrow::buffer::BufferData
) always knows its capacity, that it uses todrop
itself once it is no longer needed. It also knows whether it needs to be dropped or not viaBufferData::owner: bool
.However, this is insufficient for the purposes of supporting the C Data Interface, which requires informing the owner that the region is no longer needed, typically via a function call (
release
), for reference counting by the owner of the region.This PR
This PR generalizes
BufferData
(and renames it toBytes
, which is more natural name for this structure, a-labytes::Bytes
) to support foreign deallocators. Specifically, it accepts two deallocation modes:Native(usize)
: the current implementationForeign(Fn)
: an implementation that calls a function (which can be used to call a FFI)FYI @pitrou , @nevi-me @paddyhoran @sunchao
Related to #8052 , which IMO is blocked by this functionality.