-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[discuss] A proposal for passing around object instances #419
Comments
A note to self, if this reverse identity map is going to conceptually map multiple |
This sounds great and makes alot of sense. Ryan and I chatted a little and where I landed was: My initial reaction was concern about dropping the safety features of handle-maps, particularly the panic handling and "handle poisoning". However, as #244 mentions, this is less of a problem for generated code. I've a vague concern that not all languages will prevent "accidental misuse" (eg, managing to take a copy of the raw pointer without the generated code knowing about it would be catastrophic) but Ryan reminded me that all our target languages already have libraries which wrap pointers and manage to do so safely. Panic safely can still be generated. So this sounds good to me. Ryan also admitted that ideally he'd like to see non-threadsafe support dropped, so all locking primitives end up in explicit, hand-written rust code - most likely via mutexes. The pros of this include:
The cons are:
On balance, I'm inclined to support dropping support for the "non thread-safe" case, and if we can agree to that, we should just do it (ie, instead of it being an asprirational "eventually we'd like to", we should just do it now). We can always add stuff later. Re the identity map: I'd be inclined to put this in the "nice to have" rather than making it an absolute requirement - primarily because it doesn't seem like it would be a breaking change - ie, if 2 objects that should, but not not, compare correctly with I guess my final thought is that this is still somewhat abstract, and that we should try and work on the canonical demo now - an actual UDL and document the semantics. OTOH, it's not clear this will be hugely valuable because some of the edge-cases or things we haven't considered might remain hidden. However, it's likely to give us something more concrete to hang our opinions on, which can't hurt. I said I'd have a poke at this, but I'm not sure I'll get far this week. |
This looks very promising/exciting! Off the bat, I will third the decision to drop support for non- I'm a little bit confused around the safety of passing
If we are leaving the locking to the hand-written Rust crate, i.e. I'm also not sure what the default would be: what does If that's right, then we can reduce the UDL surface to
So far, so good. Getting rid of handlemaps seems sensible, but if I have a constructor (or a factory method):
i.e. a Rust Now with no handle maps, who holding the Conceptually, I'm arriving at a place where we have only constructors handing ownership to the foreign languages side, and if Rust needs it back, it will need to be done via an The Reverse Identity MapI have a few thoughts on this; I like @mhammond 's YAGNI noises here; however, we should try and take steps to aggressively document this. e.g. forcing an
Alternative spellings for I think I share the unease about the handle/identity map; without further annotation of the objects being passed in or returned out related to their use, I think it'll be hard to make a good general solution. Without thinking too hard about this, I can imagine an annotation to guide uniffi on which side of the FFI the object (or its twin) will spend most of its time.
|
For example, if one foreign-language thread is doing a method call while another thread destroys the object? I think ideally the foreign-language bindings would not allow this, perhaps because they can only destroy the object in the foreign-language destructor. But you're right, we'll have to have a story about how to prevent this As long as the foreign-languge code doesn't destroy the object while a method-call is in flight, we should be OK. I reckon we could guard against this with some foreign-language side locking around the cloning of the arc if we need to.
We also have the option of rejecting that syntax, and insisting on an explicit annotation.
Conceptually, calling Concretely, putting the
This sounds right to me, yes.
TBH this doesn't sound that useful to me at first glance, but then again, I spend a lot more time on the other side of the boundary. It's not clear to me what the consumer would do with this information - assuming we don't support the reverse handlemap at all, then every single object instance returned from the UDL is going to be a fresh object, and the annotation doesn't tell you what it's a copy of. Maybe we need a more concrete consumer example here to help make the discussion more concrete? |
I could've been clearer: this would be in the context where we're don't provide any bookkeeping of the foreign language objects that have been passed into rust, and then out again, i.e. reverse identity maps. Without the identity mapAs @mhammond points out; these objects would be equal ( My
With the identity map.The other attributes ( |
I played with this primarily to better understand how uniffi actually worked and to try and get enough context to understand the challenges. It actually worked out surprisingly well (although is nowhere near mergable) The context for this is Ryan's [proposal for passing around object instances]( mozilla#419) but I've only played with "step 1" - replacing Handlemaps with `Arc<>` or `Arc<Mutex<>>`. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. It turns out the difference between `[Threadsafe]` and not is quite trivial. This might change in the future when we start supporting those other attributes, but maybe not. I'm still on the fence about this. There are some nasty hacks. The most obvious is using both `usize` and `u64` for pointer values. `usize` is probably correct, but `u64`is still handed around due to the lack of `usize` in Types. I tried using `const *TypeName` but this meant that `TypeName` needed to be `pub`, which isn't true in all the examples. You could argue that forcing these types to be `pub` is actually the right thing, but I don't care enough to actually do that :) It's also incomplete. I'm currently failing to run the kotlin tests (they fail in the same way in both this branch and on main - WSL FTW! :) (I very lightly edited the code blocks below - removing comments and logging) What it generates: in the non-threadsafe case, the constructor is: ```rust pub extern "C" fn threadsafe_ffd6_Counter_new( err: &mut uniffi::deps::ffi_support::ExternError, ) -> usize /* *const Counter */ { match std::panic::catch_unwind(|| { let _new = std::sync::Mutex::new(Counter::new()); let _arc = std::sync::Arc::new(_new); std::sync::Arc::into_raw(_arc) }) { Ok(ptr) => { *err = uniffi::deps::ffi_support::ExternError::default(); ptr as usize } Err(e) => { *err = e.into(); 0 as usize /*`std::ptr::null()` is a compile error */ } } } ``` The threadsafe case is identical except for 1 line: ```rust let _new = ThreadsafeCounter::new(); ``` (ie, no mutex) The `free()` function is also quite trivial and identical in both cases ```rust pub extern "C" fn ffi_threadsafe_ffd6_Counter_object_free(ptr: u64) { if let Err(e) = std::panic::catch_unwind(|| { assert_ne!(ptr, 0); // turn it into an Arc and let the value naturally drop. unsafe { std::sync::Arc::from_raw(ptr as *const Counter) }; }) { uniffi::deps::log::error!("ffi_threadsafe_ffd6_Counter_object_free panicked: {:?}", e); } } ``` The generated code for methods does have more variation between the 2, but not much. A non-threadsafe method: ```rust pub extern "C" fn threadsafe_ffd6_Counter_busy_wait( ptr: u64, ms: i32, err: &mut uniffi::deps::ffi_support::ExternError, ) -> () { uniffi::deps::ffi_support::call_with_output(err, || { let _arc = unsafe { std::sync::Arc::from_raw(ptr as *const std::sync::Mutex<Counter>) }; // This arc now "owns" the reference but we need an outstanding reference still. std::sync::Arc::into_raw(std::sync::Arc::clone(&_arc)); let _retval = Counter::busy_wait( &mut *_arc.lock().unwrap(), <i32 as uniffi::ViaFfi>::try_lift(ms).unwrap(), ); _retval }) } ``` for contrast, the thread-safe version: ```rust pub extern "C" fn threadsafe_ffd6_ThreadsafeCounter_busy_wait( ptr: u64, ms: i32, err: &mut uniffi::deps::ffi_support::ExternError, ) -> () { uniffi::deps::ffi_support::call_with_output(err, || { let _arc = unsafe { std::sync::Arc::from_raw(ptr as *const ThreadsafeCounter) }; // This arc now "owns" the reference but we need an outstanding reference still. std::sync::Arc::into_raw(std::sync::Arc::clone(&_arc)); let _retval = ThreadsafeCounter::busy_wait(&*_arc, <i32 as uniffi::ViaFfi>::try_lift(ms).unwrap()); _retval }) } ``` (As in the current version, there are small differences depending on whether an error is returned or not, but that's not particularly interesting here) The "keep the Arc alive by leaking a clone" wasn't immediately obvious to me until Python kept crashing :)
The context for this is Ryan's [proposal for passing around object instances]( mozilla#419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. This turned out to be quite easy in general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>` for free. The mutex semantics for using these as params etc is almost certainly going to do the wrong thing - eg, acquiring a mutex when the object is passed as a param is going to end in tears at some stage. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
The context for this is Ryan's [proposal for passing around object instances]( mozilla#419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. This turned out to be quite easy in general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>` for free. The mutex semantics for using these as params etc is almost certainly going to do the wrong thing - eg, acquiring a mutex when the object is passed as a param is going to end in tears at some stage. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
The context for this is Ryan's [proposal for passing around object instances]( mozilla#419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. This turned out to be quite easy in general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>` for free. The mutex semantics for using these as params etc is almost certainly going to do the wrong thing - eg, acquiring a mutex when the object is passed as a param is going to end in tears at some stage. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
Combined with [ADR-0004)(https://github.com/mozilla/uniffi-rs/blob/main/docs/adr/0004-only-threadsafe-interfaces.md) this is the last significant decision to be made on the path to allowing us to [pass around object instances](mozilla#419).
Combined with [ADR-0004](https://github.com/mozilla/uniffi-rs/blob/main/docs/adr/0004-only-threadsafe-interfaces.md ) this is the last significant decision to be made on the path to allowing us to [pass around object instances](mozilla#419).
The context for this is Ryan's [proposal for passing around object instances]( mozilla#419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. This turned out to be quite easy in general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>` for free. The mutex semantics for using these as params etc is almost certainly going to do the wrong thing - eg, acquiring a mutex when the object is passed as a param is going to end in tears at some stage. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
The context for this is Ryan's [proposal for passing around object instances]( mozilla#419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. This turned out to be quite easy in general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>` for free. The mutex semantics for using these as params etc is almost certainly going to do the wrong thing - eg, acquiring a mutex when the object is passed as a param is going to end in tears at some stage. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
The context for this is Ryan's [proposal for passing around object instances]( #419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. I kept support for both threadsafe and non-threadsafe interfaces, primarily to avoid touching all the examples. This turned out to be quite easy in general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>` for free. The mutex semantics for using these as params etc is almost certainly going to do the wrong thing - eg, acquiring a mutex when the object is passed as a param is going to end in tears at some stage. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
The context for this is Ryan's [proposal for passing around object instances]( #419) By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality for free - `Arc` wrappers around Objects can suddenly appear in dictionaries and function params, and be returned from functions once we remove the explicit code that disallows it. It's also incomplete (needs docs, kotlin tests for the new test cases, swift hasn't been touched at all, etc)
Combined with [ADR-0004](https://github.com/mozilla/uniffi-rs/blob/main/docs/adr/0004-only-threadsafe-interfaces.md ) this is the last significant decision to be made on the path to allowing us to [pass around object instances](#419).
This work shipped in v0.12.0, via #462. |
Riffing on some conversations today with @mhammond, here are some partly-formed thoughts on how we might work towards enabling object instances to be passed as arguments and returned from functions/methods, resolving #40 and #197.
Broadly, we have two big complexities to grapple with here - the way that handlemaps make lifting/lowering codegen for object instances difficult, and the management of lifetimes and object identity across the ffi boundary. I think we may be able to resolve both of these with some judicious use of
Arc
.Step 1: Remove Handlemaps
First, we replace our current use of handlemaps with
Arc
. I expect this will be more performant, but really I'm interested in simplifying codegen in the bindings.For a threadsafe interface
T
, we put it in anArc<T>
just like we do today. But instead of putting theArc<T>
into a handlemap to turn it into an integer handle, we useArc::into_raw
to turn it directly into a pointer, and pass the pointer over the ffi to the foreign-language code. When the foreign-language code wants to call a method on the instance, it passes the pointer back to the Rust code and we useArc::from_raw
to reconstitute theArc
, and can use it like we currently do with the one in the handlemap. (We have to be careful about not dropping theArc
after reconstituting it, but this seems tractable).For a non-threadsafe interface
T
, we first make it synchronous by putting it in aMutex<T>
, then putting that inside anArc<Mutex<T>>
. We can then treat it the same way as for threadsafe instances above, converting to and from a raw pointer to pass over the FFI.I need to dig more into the memory-safety and thread-safety requirements of the above, but it seems like it should be tractable.
The aim of this is to remove the use of handlemap closures in order to get object references in Rust. Rather than generating code like:
We'd generate code like:
This resolves one of the big issues in #40 - if we pass multiple object instances as arguments over the FFI, we don't need to generate any nested closures to read items out of the handlemap, we can just generate straight-line code for each one.
For non-threadsafe interfaces, this doesn't solve the potential for deadlocking on the hidden
Mutex
es, but TBH, I think the right way to resolve that is to push all the locking down into Rust code where it's visible, and use threadsafe interfaces instead.Step 2: Allow objects in argument position, with annotations.
Given the above, there are three ways that a Rust function exposed via uniffi could accept an object instance as an argument:
&ObjectName
.&mut ObjectName
.Arc<ObjectName>
orArc<Mutex<ObjectName>>
.We'd need to represent each of these differently in the UDL so that we know how to codegen them, perhaps expanding our use of attributes like so:
method([ByRef] ObjectName argname)
method([ByMut] ObjectName argname)
method([ByArc] ObjectName argname)
The names are not great, but you get the idea.
Note that Rust's lifetime/ownership system helps us avoid doing anything silly here. If you accept a
[ByRef] ObjectName
, you can't mutate it and you can't stash a reference to it anywhere that would outlive the function call. We don't attempt to provide any cleverness in handling lifetimes at the FFI boundary, and the only way for Rust code to accept aT
that it can stash internally for later use, it to take its own reference-count-already-incremented owned copy of anArc<T>
.I expect most of the complexity here would be in deciding on how to expose this to consumers.
Note that the Rust code at this stage cannot return existing object instances back to the foreign-language code, so we don't get #197 yet, and also we don't have the instance-identity-mapping problem yet.
Step 3: Allow objects in return position, with a reverse identity map.
Since we don't attempt to provide any cleverness in handling lifetimes at the FFI boundary, I don't think we can allow uniffied functions to return references to objects. However, we could correctly handle two cases:
func() -> T
.func() -> Arc<T>
.In the first case, we know that the return value is the unique existing reference to the object. We can handle this in the same way we handle values returned from a constructor - by putting it in an
Arc<T>
, turning it into a pointer, and handing that back to the foreign-language code to create a new object wrapper around it. (This is the same safety argument I was trying to make over in #384 (comment), which perhaps is easier to form a cohesive mental model of in this more general setting).The second case is more interesting. The
Arc
might contain aT
that has never been seen before for the foreign-language bindings, or it might be a clone of an existingArc
that the foreign language bindings already hold a reference to. After turning thisArc
into a pointer and passing it back to the foreign-language code, I think we need some sort of reverse identity map that - check if the foreign-language code already has an Object wrapper for this pointer, return the existing Object if so, or create a new one if not. (N.B. you can have many clones of anArc<T>
but when you turn them into raw pointers, they all turn into the same pointer, being the address at which the sharedT
is allocated on the heap).Using
Arc
s, we can implement passing of object instances back and forth across the FFI while preserving object identity. Suppose we have something like this proposed interface for nimbus:The foreign-language code could do something like this:
In the Rust implementation, we'd have functions like:
And UniFFI would generate the necessary plumbing to connect them together.
That's a lot, but I think this would work, and can be tackled in a few incremental pieces. Curious to hear what others think, and especially if you can poke any holes in the hand-waving above. /cc @mhammond @jhugman in particular based on previous discussions, but all input welcome.
┆Issue is synchronized with this Jira Task
The text was updated successfully, but these errors were encountered: