Add a doc comparing UniFFI with diplomat #1146

Merged Jan 6, 2022 (7 commits).
`docs/diplomat-and-macros.md` (186 additions, 0 deletions)

# Comparing UniFFI with Diplomat

[Diplomat](https://github.com/rust-diplomat/diplomat/) and [UniFFI](https://github.com/mozilla/uniffi-rs/)
are both tools that expose a Rust-implemented API over an FFI.
At face value, these tools solve exactly the same problem, but their approaches
are significantly different.

This document attempts to describe these different approaches and discuss the pros and cons of each.
It does not try to declare one better than the other; it just notes how they differ.
If you are reading this hoping to find an answer to "which one should I use?", then that's easy:
each tool currently supports a different set of foreign language bindings, so use
the one that supports the languages you care about!

(There may even be a future where these two tools converge. That seems like a lot of work, but
might also provide a large payoff; more on this later.)

Disclaimer: This document was written by one of the UniFFI developers, who has never used
diplomat in anger. Please feel free to open PRs if anything here misrepresents diplomat.

# The type systems

The key difference between these two tools is the "type system". While both expose Rust
code (which obviously comes with its own type system), the foreign bindings need to know
lots of details about all the types expressed by the tool.

For the sake of this document, we will use the term "type universe" to mean the set of
all types known to each of the tools. Both tools build their own type universe, then
use it to generate both Rust code and foreign bindings.

## UniFFI's type universe
UniFFI's model is to parse an external FFI description from a `.udl` file, which describes the
entire type universe. This type universe is then used to generate both the Rust scaffolding
(on disk as a `.rs` file) and the foreign bindings.

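**What's good about this** is that the entire type universe is known when generating both the
Rust code and the foreign bindings. For example, the foreign bindings know the name and type of
every field in a struct, so they can recreate the same struct on the other side.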

**What's bad about this** is that the external UDL is ugly, and it largely duplicates the
implemented Rust API it describes.
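
As a rough illustration of that duplication, here is a minimal sketch of a record and a function exposed through UniFFI. The UDL is shown as a comment. The namespace name `example` and the `create` function are made up for illustration, and the UniFFI scaffolding include and build setup are omitted.

```rust
// The UDL file (example.udl) has to restate the shape of everything below:
//
//     namespace example {
//         MyFFIType create();
//     };
//
//     dictionary MyFFIType {
//         i32 a;
//         boolean b;
//     };
//
// The Rust implementation already says the same thing, and the two must be
// kept in sync by hand.

/// Exposed as a UniFFI record (a UDL `dictionary`).
#[derive(Debug, Clone, PartialEq)]
pub struct MyFFIType {
    pub a: i32,
    pub b: bool,
}

/// Exposed as a namespace-level function in the UDL above.
pub fn create() -> MyFFIType {
    MyFFIType { a: 42, b: true }
}
```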

## Diplomat's type universe

Diplomat defines its "type universe" (i.e., the external FFI) using macros.

**What's good about this** is that the "ffi module" defines the canonical API and it is defined in
terms of Rust types - the redundant UDL is removed. The Rust scaffolding can also be generated
by the macros, meaning there are no generated `.rs` files involved.

Ryan even tried this for UniFFI in [#416](https://github.com/mozilla/uniffi-rs/pull/416), but we
struck **what's bad about this**: the context in which the macro runs doesn't know about types defined
outside of that macro, and those are exactly the types we need to expose.
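
To show what it means for the macros to generate the Rust scaffolding, here is a hand-written sketch of the kind of `extern "C"` shim such a macro could emit for a `repr(C)` struct. The shim name `MyFFIType_create` is an assumption for illustration and is not necessarily what Diplomat actually generates.

```rust
/// A `repr(C)` struct can be passed across the FFI by value, so foreign code
/// can read its fields directly without an extra conversion step.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MyFFIType {
    pub a: i32,
    pub b: bool,
}

impl MyFFIType {
    pub fn create() -> MyFFIType {
        MyFFIType { a: 42, b: true }
    }
}

/// Hypothetical macro-generated shim: a C-ABI wrapper around the Rust method.
/// A real shim would also disable name mangling so the foreign side can find
/// the symbol by name. That attribute is omitted from this sketch.
#[allow(non_snake_case)]
pub extern "C" fn MyFFIType_create() -> MyFFIType {
    MyFFIType::create()
}
```

Because the macro writes this code during compilation, nothing like it ever needs to exist as a `.rs` file on disk.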

## Limitations in the macro approach

Let's look at diplomat's simple example:

```rust
#[diplomat::bridge]
mod ffi {
    pub struct MyFFIType {
        pub a: i32,
        pub b: bool,
    }

    impl MyFFIType {
        pub fn create() -> MyFFIType {
            MyFFIType { a: 42, b: true }
        }
        // ... other methods
    }
}
```

This works fine, but starts to come unstuck if you want the types defined somewhere else. In this trivial example, something like:

```rust
pub struct MyFFIType {
    pub a: i32,
    pub b: bool,
}

#[diplomat::bridge]
mod ffi {
    // The struct is defined outside the bridge module, so the macro never
    // sees its fields.
    use super::MyFFIType;

    impl MyFFIType {
        pub fn create() -> MyFFIType {
            MyFFIType { a: 42, b: true }
        }
        // ... other methods
    }
}
```

fails: diplomat can't handle this scenario, in the same way and for the same reason that Ryan's
[#416](https://github.com/mozilla/uniffi-rs/pull/416) can't. The contents of the struct aren't known.

From the Rust side of the world, this is probably solvable by sprinkling more macros around, e.g., something like:

```rust
#[uniffi::magic]
pub struct MyFFIType {
    pub a: i32,
    pub b: bool,
}
```

might be enough to generate the Rust scaffolding. However, the real problem lies in generating the foreign bindings.
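
One way to picture this is that a hypothetical `#[uniffi::magic]` attribute could expand into a trait impl that records each field's name and type, which Rust-side scaffolding can use at compile time. The `FfiRecord` trait and `describe` helper below are invented for illustration. The catch is that this metadata only exists once the crate is compiled. A bindings generator that only parses the `ffi` module's source never sees it.

```rust
/// Invented trait describing a record's layout for FFI purposes.
pub trait FfiRecord {
    const NAME: &'static str;
    /// (field name, type name) pairs, in declaration order.
    const FIELDS: &'static [(&'static str, &'static str)];
}

pub struct MyFFIType {
    pub a: i32,
    pub b: bool,
}

// What `#[uniffi::magic]` might expand to (hypothetical).
impl FfiRecord for MyFFIType {
    const NAME: &'static str = "MyFFIType";
    const FIELDS: &'static [(&'static str, &'static str)] = &[("a", "i32"), ("b", "bool")];
}

/// Rust code that is generic over any annotated record can read the metadata.
pub fn describe<T: FfiRecord>() -> String {
    let fields: Vec<String> = T::FIELDS
        .iter()
        .map(|(name, ty)| format!("{name}: {ty}"))
        .collect();
    format!("{} {{ {} }}", T::NAME, fields.join(", "))
}
```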

## How the type universe is constructed for the macro approach

In both diplomat and [#416](https://github.com/mozilla/uniffi-rs/pull/416), the generation process
is given a path to the Rust source file containing the module in question: in the example above,
the `ffi` module annotated with `#[diplomat::bridge]`. (Diplomat actually takes a whole crate as
input, and that crate may contain several `#[diplomat::bridge]` modules, but the point below still
applies.) Both tools use the `syn` crate to parse the Rust code inside these modules, build their
type universe, then generate the foreign bindings.

In our problematic example above, this process never sees the layout of the `MyFFIType` struct,
nor does it see any macros annotating it.

For this approach to work, the generation process would need to compile the entire crate,
including the crates it depends on, since the actual definitions of the types might appear anywhere.
Not only would this be slow, it's not clear it could be made to work. It might be reasonable to
impose constraints on what can appear in just the `ffi` module, but if we started imposing
constraints on the entire crate, the tool would become far less useful.

This is exactly the problem that caused us to stop working on
[#416](https://github.com/mozilla/uniffi-rs/pull/416). The current world, where the type universe
is described externally, doesn't have this problem: only the UDL file needs to be parsed when
generating the foreign bindings, and Rust code isn't considered. The application-services team has
concluded that none of our non-trivial use-cases for UniFFI could be described using macros,
so supporting both mechanisms would be pain for no gain.

As noted in #416, `wasm-bindgen` has a similarly shaped problem, and solves it by having
the Rust macro arrange for the resulting library to have an extra data section with the
serialized "type universe" - foreign binding generation would then read this information from the
already built binary. This sounds more complex than the UniFFI team has appetite for at
the current time.
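
The idea behind that approach can be sketched in a few lines. The macro embeds a serialized description of each type in the compiled artifact, and the bindings generator later reads it back from the binary instead of parsing Rust source. In this sketch the "binary" is just a static string, and the line-based encoding and the `TYPE_UNIVERSE` and `read_type_universe` names are invented for illustration. wasm-bindgen's real format and section layout are different.

```rust
/// What a macro might embed in the compiled library: one record per line,
/// `record <Name> <field>:<type> ...`. `#[used]` asks the compiler to keep
/// the static in the object file even if nothing in the crate reads it.
#[used]
pub static TYPE_UNIVERSE: &str = "record MyFFIType a:i32 b:bool\nrecord Point x:f64 y:f64";

#[derive(Debug, PartialEq)]
pub struct RecordDesc {
    pub name: String,
    pub fields: Vec<(String, String)>,
}

/// What the bindings generator would do after extracting the embedded data
/// from the built library: decode it back into a type universe.
pub fn read_type_universe(data: &str) -> Vec<RecordDesc> {
    data.lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            if parts.next()? != "record" {
                return None; // skip anything that isn't a record entry
            }
            let name = parts.next()?.to_string();
            let fields = parts
                .filter_map(|f| f.split_once(':'))
                .map(|(n, t)| (n.to_string(), t.to_string()))
                .collect();
            Some(RecordDesc { name, fields })
        })
        .collect()
}
```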

## Is this a problem for users of diplomat? Will diplomat solve it?

I couldn't find real examples using diplomat, so it's difficult to know if this
is a problem in practice. UniFFI came from a world where we had Rust crates and
a hand-written FFI that exposed types from all over the crate. If these tools
had started with the limitations of the macro approach in mind, it's possible
a different, acceptable design could have been made to work. Perhaps duplicating
some structs and supplying suitable `Into` implementations would make things workable?
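
Here is a sketch of that workaround, assuming the internal type lives elsewhere in the crate. The FFI module declares its own copy of the struct, and `From` impls (which give `Into` for free) convert at the boundary. The `internal` module, `Settings`, `with_defaults` and `load_defaults` are hypothetical names. The `#[diplomat::bridge]` attribute is left out so the example compiles without Diplomat.

```rust
/// The "real" type, defined wherever the crate's design wants it.
pub mod internal {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Settings {
        pub retries: i32,
        pub verbose: bool,
    }
}

/// In real use this module would carry `#[diplomat::bridge]`.
pub mod ffi {
    use super::internal;

    /// A duplicate of `internal::Settings` that the bridge macro can see.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Settings {
        pub retries: i32,
        pub verbose: bool,
    }

    impl From<internal::Settings> for Settings {
        fn from(s: internal::Settings) -> Self {
            Settings { retries: s.retries, verbose: s.verbose }
        }
    }

    impl From<Settings> for internal::Settings {
        fn from(s: Settings) -> Self {
            internal::Settings { retries: s.retries, verbose: s.verbose }
        }
    }

    impl Settings {
        /// Exposed method: delegates to internal code and converts via `Into`.
        pub fn with_defaults() -> Settings {
            super::load_defaults().into()
        }
    }
}

/// Internal logic that only knows about `internal::Settings`.
pub fn load_defaults() -> internal::Settings {
    internal::Settings { retries: 3, verbose: false }
}
```

The obvious cost is that each such struct is declared twice.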

Diplomat comes from a very smart team, and they already have plans in this area (see
rust-diplomat/diplomat#34). They may well come up with a novel solution, so UniFFI should track
the progress of that project to see what we can gleefully steal in the future. As discussed below,
a kind of "hybrid" approach might even be possible.

# Looking forward

Before looking forward, let's step back a little: both UniFFI and diplomat solve exactly the
same use-cases, just with a different approach to defining the type universe.
Setting that difference aside, the tools take the same basic approach: they both build the
type universe, then use that representation to generate both the Rust scaffolding
and the foreign bindings.

The type universe described by diplomat is somewhat "leaner" than that described by UniFFI,
because Rust types are the first-class citizens in the universe. UniFFI instead defines an external
type model: for example, there's a `Type` enum where `Type::Record(Record)` represents a
Rust struct. In other words, diplomat's type universe cannot be divorced from Rust,
whereas UniFFI's already is.
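
To illustrate what "divorced from Rust" means, here is a heavily simplified, hypothetical take on such a model. It is not UniFFI's actual `Type` definition. Nothing in it refers to Rust syntax, so a backend can map it directly to another language's types, as the toy Kotlin mapping shows.

```rust
/// A simplified, hypothetical language-neutral type model.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int32,
    Boolean,
    String,
    Optional(Box<Type>),
    Sequence(Box<Type>),
    /// A record; on the Rust side this corresponds to a struct.
    Record(Record),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub name: String,
    pub fields: Vec<(String, Type)>,
}

/// A toy Kotlin backend: maps the neutral model to Kotlin type names.
pub fn kotlin_type(t: &Type) -> String {
    match t {
        Type::Int32 => "Int".to_string(),
        Type::Boolean => "Boolean".to_string(),
        Type::String => "String".to_string(),
        Type::Optional(inner) => format!("{}?", kotlin_type(inner)),
        Type::Sequence(inner) => format!("List<{}>", kotlin_type(inner)),
        Type::Record(r) => r.name.clone(),
    }
}

/// Renders a record as a Kotlin data class declaration.
pub fn kotlin_data_class(r: &Record) -> String {
    let params: Vec<String> = r
        .fields
        .iter()
        .map(|(name, ty)| format!("val {}: {}", name, kotlin_type(ty)))
        .collect();
    format!("data class {}({})", r.name, params.join(", "))
}
```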

That said, there might be a future where merging these type universes, or otherwise making
them interoperable, makes sense. You could imagine a world where you use diplomat to describe
your type universe, but use UniFFI's foreign-binding generators to generate the Kotlin bindings.
Similarly, you could use UniFFI and UDL files to describe your type universe, but then use
diplomat to generate the NodeJS bindings.

Or to put it another way, you could imagine a world where both tools are split into a
"describe the type universe" portion and a "build the bindings" portion, and these tools
could be used together.

Sadly, that looks like a lot of work, so someone would probably need to find a compelling
real-world use-case to justify it.
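
At its core, the split might look something like the sketch below. All names here are hypothetical and this is not either tool's actual API. A "frontend" produces a type universe, and any "backend" that understands that universe can emit bindings. The real work lies in agreeing on a shared type universe rich enough for both tools, which this toy version sidesteps.

```rust
/// A deliberately tiny type universe: records as (name, [(field, type)]).
#[derive(Debug, Default)]
pub struct TypeUniverse {
    pub records: Vec<(String, Vec<(String, String)>)>,
}

/// "Describe the type universe": e.g. a UDL parser or a macro-based scanner.
pub trait Frontend {
    fn type_universe(&self) -> TypeUniverse;
}

/// "Build the bindings": e.g. a Kotlin or C++ generator.
pub trait Backend {
    fn generate(&self, universe: &TypeUniverse) -> String;
}

/// A stand-in frontend that returns a fixed universe.
pub struct FixedFrontend;

impl Frontend for FixedFrontend {
    fn type_universe(&self) -> TypeUniverse {
        TypeUniverse {
            records: vec![(
                "MyFFIType".to_string(),
                vec![
                    ("a".to_string(), "i32".to_string()),
                    ("b".to_string(), "bool".to_string()),
                ],
            )],
        }
    }
}

/// A stand-in backend that emits C struct declarations.
pub struct CHeaderBackend;

impl Backend for CHeaderBackend {
    fn generate(&self, universe: &TypeUniverse) -> String {
        let mut out = String::new();
        for (name, fields) in &universe.records {
            out.push_str(&format!("typedef struct {name} {{\n"));
            for (field, ty) in fields {
                let c_ty = match ty.as_str() {
                    "i32" => "int32_t",
                    "bool" => "bool",
                    other => other, // unknown types are passed through unchanged
                };
                out.push_str(&format!("    {c_ty} {field};\n"));
            }
            out.push_str(&format!("}} {name};\n"));
        }
        out
    }
}

/// Any frontend can be paired with any backend.
pub fn run(frontend: &dyn Frontend, backend: &dyn Backend) -> String {
    backend.generate(&frontend.type_universe())
}
```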

# Next steps for UniFFI

As much as some of the UniFFI team dislike the external UDL file, there's no clear path to
moving away from it. The macro approach is too limiting, and no other promising opportunities
have presented themselves. There is no obvious alternative to UDL that can describe a complex
type universe, and at this stage any replacement would need to be compelling enough to
justify the change, which is hard to imagine.

In the short term, the best we can probably do is to enumerate the perceived problems
with the UDL file and try to make it more ergonomic. For example, avoiding repetition of
`[Throws=SomeError]` would remove a lot of noise, and some strategy for generating
documentation might go a long way.