Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC for step 1 of #419 - replace Handlemaps with Arcs. #420

Closed
wants to merge 8 commits into from

Conversation

mhammond
Copy link
Member

@mhammond mhammond commented Mar 30, 2021

The context for this is Ryan's proposal for passing around object instances - which has steps 1 and 2 in a rough way - replacing Handlemaps with Arc<> (or to-be-deprecated Arc<Mutex<>>) and allowing them to be args and results.

This is turning out better than I hoped - for example, see this new test which exercises most of what this does.

I think the Kotlin generation works, but the Swift generation most certainly does not. @jhugman now you are back from PTO I'd love your feedback and to gauge your interest in collaborating on this? Note that my intent is to iterate slowly, just in time for Ryan to get back and review.

@mhammond mhammond requested review from rfk and jhugman March 30, 2021 04:26
Copy link
Collaborator

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty neat! I'm kind of strangely glad to see a SIGSEGV in the circle failures here, means we're definitely in raw pointer territory :-)

I'd like to consider whether we can embed the use of Arc here more thoroughly in the mechanics of how we work with object instances, so will share some rambling thoughts about how that might look architecturally.

First, observe that when we generate the support code for records, we generate an implementation of the ViaFfi trait that knows how to read/write then in a buffer. We currently do not implement ViaFfi for object types at all.

What if we did update ObjectTemplate.rs to produce a ViaFfi implementation for object types? It might look something like this:

// This won't work as written IIRC, because this code doesn't define `ViaFfi` and
// doesn't define `Arc`, so Rust's trait coherence rules will stop us from impling
// the trait on `Arc<ObjectName>`. But it's interesting to think about architecturally.
// Maybe we could define a newtype wrapper that would let this work?

unsafe impl uniffi::ViaFfi for Arc<ObjectName> {

    // When passing object instances over the FFI, we pass a pointer
    // that is the raw form of an Arc<ObjectName> owned by the foreign-language code.
    type FfiType = *const ObjectName;

    // If we return an `Arc<ObjectName>` to the foreign-language code, then it
    // owns the reference and is responsible for freeing it.
    fn lower(self) -> Self::FfiType {
        self.into_raw()
    }

    // If we want to receive an `Arc<ObjectName>` from the foreign-language code,
    // we have to make our own clone of it.
    fn try_lift(v: Self::FfiType) -> Result<Self> {
        // The received pointer is an Arc owned by foreign-language code, don't drop it.
        let foreign_arc = ManuallyDrop::new(unsafe { Self::from_raw(v) });
        // Take a (cheap) clone for our own use.
        Ok(Arc::clone(foreign_arc))
    }

    // If we want to send a data structure with an `Arc<ObjectName>` in one of its
    // fields, we pass ownership of a clone by passing the pointer.
    fn write<B: bytes::BufMut>(&self, buf: &mut B) {
        let foreign_arc = Arc::clone(self);
        buf.put_u64(foreign_arc.into_raw() as u64) // Probably need some sizeof() checks here :-/
    }

    // If we receive a serialized data structure with an `Arc<ObjectName>` in one of its
    // fields, we need to take a clone for our own purposes.
    fn try_read<B: bytes::Buf>(buf: &mut B) -> Result<Self> {
      Self::try_lift(buf.get_u64() as const* ObjectName)
    }
}

I wonder if something like this would help make the generated code any cleaner?

@@ -232,6 +232,7 @@ impl Constructor {
self.ffi_func.name.push_str("_");
self.ffi_func.name.push_str(&self.name);
self.ffi_func.arguments = self.arguments.iter().map(Into::into).collect();
// XXX - we need this to be some kind of pointer abstraction - `usize`?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to work with raw pointers, I think the right thing to do here is add an FFIType::Pointer variant and then map it explicitly to the corresponding pointer abstractions in each foreign language. (The existing FFITypeRustCString variant actually already is sort of this, but specifically for a pointer-to-char).

uniffi_bindgen/src/templates/ObjectTemplate.rs Outdated Show resolved Hide resolved
&*{{- arg_name }}
{% else %}
&mut *{{- arg_name }}.lock().unwrap()
{% endif %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dereferencing/unwrapping logic...I wonder if it makes sense to move some or all of that into the definition of the lift_rs filter. It seems pretty analogous to me - this is the stuff that you have to do to go from the value that you get over the FFI, to the value that the Rust code needs to use at runtime.

{% endmatch %}
{% endmacro %}

{% macro get_arc(obj) -%}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think calling this clone_arc would be clearer about your intent.

let _arc = unsafe { std::sync::Arc::from_raw(ptr as *const std::sync::Mutex<{{ obj.name() }}>) };
{% endif %}
// This arc now "owns" the reference but we need an outstanding reference still.
std::sync::Arc::into_raw(std::sync::Arc::clone(&_arc));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using ManuallyDrop rather than leaking it back via into_raw, as I think it better captures the intent. If I had to write this code out by hand with lots of comments it would look something like:

// The pointed-to Arc is owned by the foreign-language code, so we mustn't drop it.
let _foreign_arc = ManuallyDrop::new(unsafe { Arc::from_raw(ptr as *const {{ obj.name() }}) });
// We want an owned copy of the Arc to pass internally to the Rust code.
let _arc = Arc::clone(&*_foreign_arc);

uniffi::deps::ffi_support::call_with_result(
err,
|| -> Result<_, {{ e }}> {
{% call get_arc(obj) %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a macro for this, what if this was done via the existing "lift" mechanics, something like:

let _arc = {{meth.first_argument().name()|lift_rs(meth.first_argument().type_()) }}

Broadly, I think the more we can treat these object-typed arguments in the same way as we treat other arguments, the better.

let _new = std::sync::Mutex::new({{ obj.name() }}::{% call to_rs_call(cons) %});
{%- endif %}
let _arc = std::sync::Arc::new(_new);
std::sync::Arc::into_raw(_arc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I know this isn't intended for merge, but I'm commenting to share my mental model of how the final code might look).

This needs a comment along the lines of "This copy of the Arc is leaked, and is conceptually now owned by the foreign-language code. It will only be dropped when the foreign-language code calls the matching desctructor".

{% let ffi_free = obj.ffi_object_free() -%}
#[doc(hidden)]
#[no_mangle]
pub extern "C" fn {{ ffi_free.name() }}(ptr: u64) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I really think we want to update the FFIType enum to let us use a pointer type in the argument here.

@rfk
Copy link
Collaborator

rfk commented Mar 30, 2021

// This won't work as written IIRC, because this code doesn't define ViaFfi and
// doesn't define Arc, so Rust's trait coherence rules will stop us from impling
// the trait on Arc<ObjectName>. But it's interesting to think about architecturally.
// Maybe we could define a newtype wrapper that would let this work?

Actually, if this seems useful then we might be able to have uniffi provide a default impl<T> ViaFfi for Arc<T> in the same way that it provides one for Vec<T>.

@mhammond mhammond force-pushed the 419-step1 branch 2 times, most recently from b83d272 to dde07a5 Compare April 13, 2021 03:27
@mhammond
Copy link
Member Author

Callbacks are broken - possibly for many reasons, but at least one is due to the constructors creating a Arc<Box<T>> but the dtor assuming a simple Arc<T>. It's not yet obvious to me why callbacks use a Box<> and whether an Arc might work ('cos if it did, I imagine the extra symmetry might be worthwhile).

I'm going to put this down for a few days, so let me know if anyone is interested in picking any of this up.

@mhammond
Copy link
Member Author

Now with an "identity map" for Python :) Assuming #431 and #429 aren't too controversial, once they land I'll rebase this to help make this simpler to review.

mhammond added 6 commits May 17, 2021 18:15
The context for this is Ryan's [proposal for passing around object instances]( mozilla#419)

By ipmlementing `ViaFfi` for `Arc<T>` we get a huge amount of functionality
for free - `Arc` wrappers around Objects can suddenly appear in dictionaries
and function params, and be returned from functions once we remove the explicit
code that disallows it.

I kept support for both threadsafe and non-threadsafe interfaces, primarily
to avoid touching all the examples. This turned out to be quite easy in
general, because obviously `Arc<T>` ends up getting support for `Arc<Mutex<T>>`
for free. The mutex semantics for using these as params etc is almost certainly
going to do the wrong thing - eg, acquiring a mutex when the object is passed
as a param is going to end in tears at some stage.

It's also incomplete (needs docs, kotlin tests for the new test cases,
swift hasn't been touched at all, etc)
This is just the rust side of the world, and functions must
return `std::sync::Arc<Self>` rather than plain `Self` - not
sure if that's a bug (not seamless and natural) or a "feature"
(by closing existing `Arc`s offers a path to an "identity" concept)

Note that it would be fairly simple to allow objects returned
by Rust to not be `Arc<>` because `Arc<T>` automatically supplies
`From<T>` - so we just need to generate a few `.into()` calls.
Obviously though, this will create a new `Arc` and not a clone of
an existing one, so identity will be lost.

It also more difficult to fix the other direction (ie, when the
FFI layer has a ptr to an `Arc<T>` but the rust code expects just `&T`)

It's possibly doable, but as above, it's not clear whether we actually
*want to* do that - asking the author to explicitly manage `Arc<>`s
seems less likely to cause surprises down the track - we made that
mistake before with Mutexes.
Copy link
Collaborator

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this is coming together, thanks for pushing on it! In particular I'm really happy with how small the diff seems to be, given the conceptual impact of the change. (Other language bindings still to come, obviously, but still...).

I'll try to find some time to try my hand at the Kotlin bindings to help move this forward and shake out any other issues.

How should our work here relate to the ADR in #430? Right now I can convince myself that we're just prototyping to help inform a decision in that ADR, but if we keep going we'll eventually get to a point where we're actually assuming the outcome of that discussion.

fn take_other(&self, other: Option<Arc<Self>>) {
*self.other.lock().unwrap() = match other {
None => None,
Some(arc) => Some(Arc::clone(&arc)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we already own the Arc<Self> in other at this point, so I don't think we need to clone here, and could just *self.other.lock().unwrap() = other. The current setup will increment the refcount for the Arc::clone, then immediately decrement it when other is dropped at the end of the function.

# Any test not terminating with zero objects alive will cause others to
# fail - this helps us work out which test kept things alive.
def tearDown(self):
self.assertEqual(get_num_alive(), 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 nice!

{# Look in our weak map for an instance already associated with this
pointer. If it exists we can return it, but the pointer we have
is a clone of an `Arc<>` so has a reference we need to destroy.
#}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we probably want a mutex or other concurrency-control here around the updates to the identity map.

/// When lowering, we have an owned `Arc<T>` and we transfer that ownership
/// to the foreign-language code, "leaking" it out of Rust's ownership system
/// as a raw pointer. The foreign-language code is responsible for freeing
/// this via TODO calling a special destructor for `T`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah! We already have one of these in obj.ffi_object_free().name() and it just needs to be updated, so there's not really anything "TODO" about this except for document it here.

@@ -108,10 +108,12 @@ def writeString(self, v):

{% when Type::Object with (object_name) -%}
# The Object type {{ object_name }}.
# Objects cannot currently be serialized, but we can produce a helpful error.
# We write the pointer value directly - what could possibly go wrong?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞🏻😅🤞🏻

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"What could possibly go wrong?" they ask, tempting fate...#457

/// This supports the `[ByRef]` attribute for arguments that should be passed
/// by reference in the generated Rust scaffolding.
/// This supports the `[ByRef]` and `[ByArc]` attribute for arguments that
/// should be passed by reference in the generated Rust scaffolding.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the "passed by reference" part of this comment will need re-wording.

}

impl TryFrom<&weedle::attribute::ExtendedAttributeList<'_>> for ArgumentAttributes {
type Error = anyhow::Error;
fn try_from(
weedle_attributes: &weedle::attribute::ExtendedAttributeList<'_>,
) -> Result<Self, Self::Error> {
// XXX - how do we ensure both `ByRef` and `ByArc` aren't specified?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the InterfaceAttributes struct has an extremely dumb check to ensure that [Threadsafe] and [Enum] are not applied to the same interface.

@@ -184,10 +188,12 @@ impl APIConverter<Argument> for weedle::argument::SingleArgument<'_> {
Some(v) => Some(convert_default_value(&v.value, &type_)?),
};
let by_ref = ArgumentAttributes::try_from(self.attributes.as_ref())?.by_ref();
let by_arc = ArgumentAttributes::try_from(self.attributes.as_ref())?.by_arc();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually wondering if [ByArc] is really necessary on arguments, given that we've simplified the proposal down quite a bit. If you're passing something that's not an Object instance then [ByArc] is invalid. If you're passing something that is an Object instance then you have to be passing it in an Arc. So I'm not sure what value the annotation provides, except maybe for reminding the component author that they need to accept an Arc<T> instead of a T. (But even then, the Rust compiler will also remind them of this fact).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're passing something that is an Object instance then you have to be passing it in an Arc.

Hrm, I guess this isn't necessarily true - the bindings could take an &T out of the Arc<T> and pass it to the hand-written Rust code. But we could consider having f(T arg) mean "receives an owned value in an Arc" and f([ByRef] T arg) mean "receives a borrowed reference", which is how it works for non-Object types.

@@ -327,6 +332,10 @@ impl Method {
self.attributes.get_throws_err()
}

pub fn self_by_arc(&self) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parsing center of my brain wants the name of this to be worded as a predicate about the object, e.g. takes_self_by_arc or similar. But then I can't really articulate a cohesive style guide for the way things are named in here, so I'm just going to note it and move on.

@mhammond
Copy link
Member Author

How should our work here relate to the ADR in #430?

Funny you should ask - I was just making note about that. When we chatted in person, I mentioned that I thought it would be possible to still use a HandleMap in this world (ie, we'd not strictly need #430) - we'd still use Arc as the values in the map, but could hand out handles to consumers. I think we were both in agreement that there's not a huge amount of value there, but left the door open.

Thinking more though, I think that would work OK right until the last commit in the set - the one that provides "identity" for Python. That relies on the exact same value for the same objects, and the existing handle maps do not offer that - it probably could with more complex hoops, but I don't think that's worthwhile. So assuming we want to lean in on offering identity for languages which make it possible, I'm inclined to suggest we use this justification to push forward with #430.

@rfk
Copy link
Collaborator

rfk commented May 20, 2021

So assuming we want to lean in on offering identity for languages which make it possible,
I'm inclined to suggest we use this justification to push forward with #430.

Sold! I'll take another look at that ADR next, and if it's welcome, perhaps put up a PR to your PR with some more thoughts.

@mhammond
Copy link
Member Author

Sold! I'll take another look at that ADR next, and if it's welcome, perhaps put up a PR to your PR with some more thoughts.

Obviously welcome! We could collaborate on the same branch, but I guess PRs on PRs gives us some kind of audit-trail for reviews - I review your PRs into my PR, you review my PR into main :)

InternalError,
_UniFFILib.{{ obj.ffi_object_free().name() }},
handle
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I just understood what you're doing here, and it's neat.

In my head I was worried about the asymmetry where we return a fresh clone of the Arc<T> on the Rust side but collapse them all into a single instance on the foreign-language side. I thought we might have to keep track of the number of Arc<T> that had been merged into this instance, so that we could call ffi_object_free the appropriate number of times. But immediately destroying the clones is much simpler (and really reinforces the "The Prestige" metaphor that @jhugman used to describe this identity map idea to me at one point...).

In the future we might be motivated to keep a count and destroy them all at once for performance reasons, but that seems likely to be a long ways off.

rfk and others added 2 commits May 20, 2021 18:26
This is a nice little example of how to use object references,
and also happens to help us test a way that we want to use them
in a real consumer.
Add a singleton-style feature to the "todolist" example.
def test_arcs(self):
coveralls = Coveralls("test_arcs")
self.assertEqual(get_num_alive(), 1)
self.assertEqual(coveralls.strong_count(), 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check my understanding: the strong count is 2 because 1 reference is held by the foreign-language code, and one reference is held by the Rust code for the duration of the call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in this specific example, 1 is in coveralls._handle and the other is a new Arc created for self: Arc<Self>

@rfk
Copy link
Collaborator

rfk commented May 25, 2021

I'm going to close this in favour of a rebased and squashed commit in #462, which incorporates some parts of this that have already been taken onto main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants