[Propsal] Updating FFI naming and checksums #1271

bendk · 2022-06-10T13:50:32Z

This PR represents a proposal for changing how scaffolding functions are named. I believe it represents the consensus from the planning meeting and also will work with the new proc-macro code being developed. Please tell me if I'm wrong on either of those parts.

The main point of the PR is just to start a discussion and hopefully get agreement on the general plan. The motivation for this all is that I want to get the initial proc-macro PR that @jplatte put up merged. I think the FFI naming change is the most controversial part of it and if we can agree that this is the direction we want to go, then that PR can be merged (we can discuss more in that PR of course).

Updated how we calculate FFI names and especially how we calculate the
checksum part for functions. Now the checksum is only derived from the
function definition, not the entire component interface. This is needed
so the proc-macros can generate the checksums. The new system should
still prevent the possibility of the bindings using one function
definition when the scaffolding was build using a different one.

Added the concept of a uniffi scaffolding contract. We can bump that
whenever we make changes to how scaffolding calls work. The FFI names
depend on this value rather than the UniFFI version, which should
prevent linker errors when uniffi and uniffi-bindgen don't have the
exact same version.

Consolidated the code into one module. I think it's easier to
understand what's going on if you can see it in one place.

Removed some unneeded Result<> returns.

jplatte · 2022-06-10T14:06:57Z

uniffi_bindgen/src/interface/mod.rs

-    pub fn checksum(&self) -> u64 {
-        let mut hasher = DefaultHasher::new();
-        // Our implementation of `Hash` mixes in all of the public API of the component,
-        // as well as the version string of uniffi.
-        self.hash(&mut hasher);
-        hasher.finish()
-    }


Given you removed this, I think you can also remove the Hash impl further down in this file (and maybe a few others, too).

Good call. I removed that one and also the ComponentInterface.uniffi_version field. All the other manual Hash impls were on the very things we need (functions, methods, etc).

Updated how we calculate FFI names and especially how we calculate the checksum part for functions. Now the checksum is only derived from the function definition, not the entire component interface. This is needed so the proc-macros can generate the checksums. The new system should still prevent the possibility of the bindings using one function definition when the scaffolding was build using a different one. Added the concept of a uniffi scaffolding contract. We can bump that whenever we make changes to how scaffolding calls work. The FFI names depend on this value rather than the UniFFI version, which should prevent linker errors when uniffi and uniffi-bindgen don't have the exact same version. Consolidated the code into one module. I think it's easier to understand what's going on if you can see it in one place. Removed some unneeded Result<> returns. Updated the CI Rust version to 1.59, which is the current Firefox minimum version.

rfk

Some things to think about below, but FWIW I think this is a good direction to head in on balance 👍🏻

(Sorry for the sporadic drive-bys, but I do still enjoy poking in here from time to time 😁)

rfk · 2022-06-11T05:24:05Z

uniffi_bindgen/src/interface/ffi_names.rs

+//!     checksums that depend on things like function signatures to avoid the possibility of the
+//!     bindings calling into the scaffolding when they don't agree on the interface. The result
+//!     will be an ugly inscrutable link-time error, but that is a lot better than triggering
+//!     potentially arbitrary memory unsafety!


IIUC, there are some ComponentInterface changes that will no longer be covered by the checksum but might still lead to unsafe behaviour in the event of a version mismatch.

For example, suppose you have a function foo(bar: Bar) for some record type Bar, and you add or remove a field from Bar. The checksum calculated for foo will not change (it still takes a single argument of record type Bar) but its internal Bar-lifting code may expect to read a different set of fields than what the calling code sends in. This would probably result in a clean error given the strict length checks that we do in the deserialization code, but there are likely cases where it would silently do the wrong thing.

I haven't been following the proc-macro changes very closely, but I expect there's not much you can do to avoid that under the constraints of a proc-macro approach. Given just the tokens of a rust-level function signature, I don't think there's a good way to find out any of the internal details of its argument types.

It may be worth trying to enumerate any known cases that are not covered by the checksum as part of the docstring here, for completeness. It's already a best-effort attempt so that's probably good enough in practice.

Alternately, we could try to add additional guards for such cases. Some thought bubbles I've had in the past along these lines:

The serialization format could check the checksum at runtime; for example by insisting that the first two bytes of the serialized representation of Bar match the compiled-time-calculated checksum of Bar.

Every user-defined type Bar could produce a stub function _uniffi_typecheck_<checksum>_Bar that doesn't actually do anything, but that gets linked against at module load time. A linker error here would indicate that some internal details of the Bar type do not match the foreign code's expectations, triggering an early abort.

This is a really good point. In addition to those 2 solutions, I think we might be able to use some macro magic to include the checksum of the type definition in the scaffolding function name itself.

I think the macro magic approach could maybe work, you'd need to do something like this:

Define checksum macros that input a list of other macros followed by a list of arguments a list of arguments

Each macro would:

Append their local checksum to the list of arguments

Prepend checksum macros for any dependent types

Call the first macro in the list with the arguments

The final macro would append the checksum to the function name

To define a scaffolding function, you'd put together a stack like this: first_type_checksum!(second_type_checksum, append_checksum_to_function_name, { function def })

So, it's possible but I don't think it's a good idea. In addition to the complications on the macro side, if there was ever a cycle between the types then it would go into infinite recursion.

I think adding checksums to RustBuffers could present issues down the road if we ever wanted to move to using C structs or something to store those. Also it wouldn't handle the Object.

Adding the stub functions and checking when we load the dylib seems like a great way to handle this to me. Maybe they could just be a static u8s?

Indeed, no need for them to be actual functions, just symbols of some sort.

Also it wouldn't handle the Object.

Hrm, indeed...if we wanted to check that at runtime we'd have to either widen the pointers or use some pointer-tagging shenanigans to encode the checksum, which seems likely to be more complexity than it's worth.

tarikeshaq

Thanks a ton, @bendk!! I'm hopping in to say that this aligns with what we discussed when we chatted about versioning and that I'm +1 to moving forward here 🚀 (and tackling Ryan's comment 😝)

tarikeshaq · 2022-06-15T06:58:46Z

uniffi_bindgen/src/interface/ffi_names.rs

+/// API contract version.  Bump this if you make any changes to how the foreign bindings calls the
+/// scaffolding functions, for example the semantics of `RustCallStatus` or how types are
+/// lifted/lowered.
+const UNIFFI_API_VERSION: u32 = 0;


I really like this, and when the proposal is ready to land, it would be great if we can document this somewhere where uniffi developers can easily see it as a reminder

Totally agree, but I'm not sure where it should be. Do you have any suggestions?

I imagine there are a few ways to do this, but one thing we can do is explicitly identify what our contracts are, then add a warning around them:

// ⚠️ If you modify this struct/function/etc you need to update the `UNIFFI_API_VERSION` in `ffi_names.rs` ⚠️

(which makes me wonder if ffi_names.rs is the right spot to keep the UNIFFI_API_VERSION)

We can also modify the pull-request template to remind developers that if they changed any of the contracts they should bump the version - at the end of the day I think it'll be best-effort, and ideally, if we forget, the ~~tests would probably fail~~ well not our tests, but consumers will get some wild behaviour, so ya lets make sure this can't happen 😅

mhammond

Ryan makes a couple of great points:

Because this isn't going to cover as much as it ideally would, there's a risk of the checksum not changing when it should. Even if that's partially true in the existing scheme, that existing scheme would still have been more likely to change the checksum anyway in practice - any change to a UDL would often be in multiple places, and it previously only needed one of those changes to trigger it for the entire library. Now, each checksum is more isolated.
That it seems impossible (or at least unreasonable) to consider the entire exposed interface everywhere this checksum is needed, so it might be necessary just accept the limitation.

And a couple of things that push me into an r+

Nothing we do here is set in concrete - we can improve this as we discover ways to do so.
The checksumming is valuable, but hasn't actually saved us in practice very often (and I don't think has ever saved us in practice in a production environment?) - making improvements to this be a blocker for the procmacro work doesn't seem the correct tradeoff.

So as far as I'm concerned, 🚢 (although I also note you are looking for a new uniffi version for vendoring into m-c, so it might be prudent to cut that before merging? I don't care really though.

mhammond · 2022-06-15T05:22:56Z

uniffi_bindgen/src/interface/ffi_names.rs

+
+    /// Name for a UniFFI global function, like `rustbuffer_alloc()`
+    ///
+    /// This name doesn't include a checksum.  If we make changes to those functions, then we need to


Can you maybe comment why it can't use the checksum?

(unrelated, but is there any reason you aren't using named arguments directly in format strings given you also bumped that min rust version to 1.59?)

mhammond · 2022-06-15T05:32:00Z

uniffi_bindgen/src/interface/ffi_names.rs

+/// This is an enum because we calculate our prefix differently depending on if we're using a UDL
+/// file or a proc-macro.  It's is a bit unfortunate, but it should work out okay.
+#[derive(Clone, Debug, Hash)]
+pub enum FFINames {


I wonder if FFINameBuilder is a better name? (I don't object to the existing name and it's far from a big deal, but it did confuse me a little until I read more of the impl).

mhammond · 2022-06-15T05:32:41Z

uniffi_bindgen/src/interface/ffi_names.rs

+    // Module path of the item, including the crate name.  This is comes the proc-macro code when
+    // run on nightly.  We don't currently use the extra parts yet, but maybe we could eventually
+    // add support for nested modules in the bindings code.
+    ModulePath(String),


Also minor, but I guess I don't understand why the last 2 variants can't be rolled into 1? (Well, not even sure why this code cares about the distinction at all given I can't see where any variants are treated differently? But I can see a better argument for having 2 than these 3)

badboy

r+.
I don't have much more to add than what was already mentioned by others.
In my testing I more than once was bitten by the checksum when I just wanted to swap out the library purely for testing and avoiding a full compile cycle. So in that regard this change is welcome and would make that possible.

bendk · 2023-03-29T18:53:35Z

I forget why I never merged this initially, but #1469 implements a similar system. Let's close this one and use that instead.

bendk requested a review from a team as a code owner June 10, 2022 13:50

bendk requested review from badboy and a team and removed request for a team June 10, 2022 13:50

bendk mentioned this pull request Jun 10, 2022

Add initial proc-macro frontend bits #1259

Merged

bendk force-pushed the new-function-checksum-policy branch from a7e1b4c to 82cb27d Compare June 10, 2022 14:00

jplatte reviewed Jun 10, 2022

View reviewed changes

bendk force-pushed the new-function-checksum-policy branch from 82cb27d to 0bb2751 Compare June 10, 2022 14:09

bendk force-pushed the new-function-checksum-policy branch from 0bb2751 to 46985af Compare June 10, 2022 14:19

rfk reviewed Jun 11, 2022

View reviewed changes

tarikeshaq reviewed Jun 15, 2022

View reviewed changes

mhammond approved these changes Jun 16, 2022

View reviewed changes

badboy approved these changes Jun 16, 2022

View reviewed changes

tarikeshaq mentioned this pull request Jun 29, 2022

Adding Glean back to the iOS megazord mozilla/application-services#5009

Closed

bendk mentioned this pull request Jul 7, 2022

Split out the interface definition code from uniffi_bindgen #1294

Open

tarikeshaq mentioned this pull request Jul 22, 2022

FFI extern function declaration is different from compiled library exports #1300

Closed

bendk closed this Mar 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Propsal] Updating FFI naming and checksums #1271

[Propsal] Updating FFI naming and checksums #1271

bendk commented Jun 10, 2022

jplatte Jun 10, 2022

bendk Jun 10, 2022

rfk left a comment

rfk Jun 11, 2022 •

edited

Loading

bendk Jun 12, 2022

bendk Jun 14, 2022

rfk Jun 14, 2022

rfk Jun 14, 2022

tarikeshaq left a comment

tarikeshaq Jun 15, 2022

bendk Jun 15, 2022

tarikeshaq Jun 15, 2022 •

edited

Loading

mhammond left a comment

mhammond Jun 15, 2022

mhammond Jun 15, 2022

mhammond Jun 15, 2022

badboy left a comment

bendk commented Mar 29, 2023

[Propsal] Updating FFI naming and checksums #1271

[Propsal] Updating FFI naming and checksums #1271

Conversation

bendk commented Jun 10, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rfk left a comment

Choose a reason for hiding this comment

rfk Jun 11, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarikeshaq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tarikeshaq Jun 15, 2022 • edited Loading

Choose a reason for hiding this comment

mhammond left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

badboy left a comment

Choose a reason for hiding this comment

bendk commented Mar 29, 2023

rfk Jun 11, 2022 •

edited

Loading

tarikeshaq Jun 15, 2022 •

edited

Loading