Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Propsal] Updating FFI naming and checksums #1271

Closed
wants to merge 1 commit into from

Conversation

bendk
Copy link
Contributor

@bendk bendk commented Jun 10, 2022

This PR represents a proposal for changing how scaffolding functions are named. I believe it represents the consensus from the planning meeting and also will work with the new proc-macro code being developed. Please tell me if I'm wrong on either of those parts.

The main point of the PR is just to start a discussion and hopefully get agreement on the general plan. The motivation for this all is that I want to get the initial proc-macro PR that @jplatte put up merged. I think the FFI naming change is the most controversial part of it and if we can agree that this is the direction we want to go, then that PR can be merged (we can discuss more in that PR of course).


Updated how we calculate FFI names and especially how we calculate the
checksum part for functions. Now the checksum is only derived from the
function definition, not the entire component interface. This is needed
so the proc-macros can generate the checksums. The new system should
still prevent the possibility of the bindings using one function
definition when the scaffolding was build using a different one.

Added the concept of a uniffi scaffolding contract. We can bump that
whenever we make changes to how scaffolding calls work. The FFI names
depend on this value rather than the UniFFI version, which should
prevent linker errors when uniffi and uniffi-bindgen don't have the
exact same version.

Consolidated the code into one module. I think it's easier to
understand what's going on if you can see it in one place.

Removed some unneeded Result<> returns.

@bendk bendk requested a review from a team as a code owner June 10, 2022 13:50
@bendk bendk requested review from badboy and a team and removed request for a team June 10, 2022 13:50
@bendk bendk force-pushed the new-function-checksum-policy branch from a7e1b4c to 82cb27d Compare June 10, 2022 14:00
Comment on lines -303 to -309
pub fn checksum(&self) -> u64 {
let mut hasher = DefaultHasher::new();
// Our implementation of `Hash` mixes in all of the public API of the component,
// as well as the version string of uniffi.
self.hash(&mut hasher);
hasher.finish()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given you removed this, I think you can also remove the Hash impl further down in this file (and maybe a few others, too).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I removed that one and also the ComponentInterface.uniffi_version field. All the other manual Hash impls were on the very things we need (functions, methods, etc).

@bendk bendk force-pushed the new-function-checksum-policy branch from 82cb27d to 0bb2751 Compare June 10, 2022 14:09
Updated how we calculate FFI names and especially how we calculate the
checksum part for functions.  Now the checksum is only derived from the
function definition, not the entire component interface.  This is needed
so the proc-macros can generate the checksums.  The new system should
still prevent the possibility of the bindings using one function
definition when the scaffolding was build using a different one.

Added the concept of a uniffi scaffolding contract.  We can bump that
whenever we make changes to how scaffolding calls work.  The FFI names
depend on this value rather than the UniFFI version, which should
prevent linker errors when uniffi and uniffi-bindgen don't have the
exact same version.

Consolidated the code into one module.  I think it's easier to
understand what's going on if you can see it in one place.

Removed some unneeded Result<> returns.

Updated the CI Rust version to 1.59, which is the current Firefox
minimum version.
@bendk bendk force-pushed the new-function-checksum-policy branch from 0bb2751 to 46985af Compare June 10, 2022 14:19
Copy link
Collaborator

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things to think about below, but FWIW I think this is a good direction to head in on balance 👍🏻

(Sorry for the sporadic drive-bys, but I do still enjoy poking in here from time to time 😁)

//! checksums that depend on things like function signatures to avoid the possibility of the
//! bindings calling into the scaffolding when they don't agree on the interface. The result
//! will be an ugly inscrutable link-time error, but that is a lot better than triggering
//! potentially arbitrary memory unsafety!
Copy link
Collaborator

@rfk rfk Jun 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, there are some ComponentInterface changes that will no longer be covered by the checksum but might still lead to unsafe behaviour in the event of a version mismatch.

For example, suppose you have a function foo(bar: Bar) for some record type Bar, and you add or remove a field from Bar. The checksum calculated for foo will not change (it still takes a single argument of record type Bar) but its internal Bar-lifting code may expect to read a different set of fields than what the calling code sends in. This would probably result in a clean error given the strict length checks that we do in the deserialization code, but there are likely cases where it would silently do the wrong thing.

I haven't been following the proc-macro changes very closely, but I expect there's not much you can do to avoid that under the constraints of a proc-macro approach. Given just the tokens of a rust-level function signature, I don't think there's a good way to find out any of the internal details of its argument types.

It may be worth trying to enumerate any known cases that are not covered by the checksum as part of the docstring here, for completeness. It's already a best-effort attempt so that's probably good enough in practice.

Alternately, we could try to add additional guards for such cases. Some thought bubbles I've had in the past along these lines:

  • The serialization format could check the checksum at runtime; for example by insisting that the first two bytes of the serialized representation of Bar match the compiled-time-calculated checksum of Bar.
  • Every user-defined type Bar could produce a stub function _uniffi_typecheck_<checksum>_Bar that doesn't actually do anything, but that gets linked against at module load time. A linker error here would indicate that some internal details of the Bar type do not match the foreign code's expectations, triggering an early abort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. In addition to those 2 solutions, I think we might be able to use some macro magic to include the checksum of the type definition in the scaffolding function name itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the macro magic approach could maybe work, you'd need to do something like this:

  • Define checksum macros that input a list of other macros followed by a list of arguments a list of arguments
  • Each macro would:
    • Append their local checksum to the list of arguments
    • Prepend checksum macros for any dependent types
    • Call the first macro in the list with the arguments
  • The final macro would append the checksum to the function name
  • To define a scaffolding function, you'd put together a stack like this: first_type_checksum!(second_type_checksum, append_checksum_to_function_name, { function def })

So, it's possible but I don't think it's a good idea. In addition to the complications on the macro side, if there was ever a cycle between the types then it would go into infinite recursion.

I think adding checksums to RustBuffers could present issues down the road if we ever wanted to move to using C structs or something to store those. Also it wouldn't handle the Object.

Adding the stub functions and checking when we load the dylib seems like a great way to handle this to me. Maybe they could just be a static u8s?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, no need for them to be actual functions, just symbols of some sort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it wouldn't handle the Object.

Hrm, indeed...if we wanted to check that at runtime we'd have to either widen the pointers or use some pointer-tagging shenanigans to encode the checksum, which seems likely to be more complexity than it's worth.

Copy link
Contributor

@tarikeshaq tarikeshaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton, @bendk!! I'm hopping in to say that this aligns with what we discussed when we chatted about versioning and that I'm +1 to moving forward here 🚀 (and tackling Ryan's comment 😝)

/// API contract version. Bump this if you make any changes to how the foreign bindings calls the
/// scaffolding functions, for example the semantics of `RustCallStatus` or how types are
/// lifted/lowered.
const UNIFFI_API_VERSION: u32 = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this, and when the proposal is ready to land, it would be great if we can document this somewhere where uniffi developers can easily see it as a reminder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, but I'm not sure where it should be. Do you have any suggestions?

Copy link
Contributor

@tarikeshaq tarikeshaq Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine there are a few ways to do this, but one thing we can do is explicitly identify what our contracts are, then add a warning around them:

// ⚠️  If you modify this struct/function/etc you need to update the `UNIFFI_API_VERSION` in `ffi_names.rs` ⚠️ 

(which makes me wonder if ffi_names.rs is the right spot to keep the UNIFFI_API_VERSION)

We can also modify the pull-request template to remind developers that if they changed any of the contracts they should bump the version - at the end of the day I think it'll be best-effort, and ideally, if we forget, the tests would probably fail well not our tests, but consumers will get some wild behaviour, so ya lets make sure this can't happen 😅

Copy link
Member

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ryan makes a couple of great points:

  • Because this isn't going to cover as much as it ideally would, there's a risk of the checksum not changing when it should. Even if that's partially true in the existing scheme, that existing scheme would still have been more likely to change the checksum anyway in practice - any change to a UDL would often be in multiple places, and it previously only needed one of those changes to trigger it for the entire library. Now, each checksum is more isolated.
  • That it seems impossible (or at least unreasonable) to consider the entire exposed interface everywhere this checksum is needed, so it might be necessary just accept the limitation.

And a couple of things that push me into an r+

  • Nothing we do here is set in concrete - we can improve this as we discover ways to do so.
  • The checksumming is valuable, but hasn't actually saved us in practice very often (and I don't think has ever saved us in practice in a production environment?) - making improvements to this be a blocker for the procmacro work doesn't seem the correct tradeoff.

So as far as I'm concerned, 🚢 (although I also note you are looking for a new uniffi version for vendoring into m-c, so it might be prudent to cut that before merging? I don't care really though.


/// Name for a UniFFI global function, like `rustbuffer_alloc()`
///
/// This name doesn't include a checksum. If we make changes to those functions, then we need to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe comment why it can't use the checksum?

(unrelated, but is there any reason you aren't using named arguments directly in format strings given you also bumped that min rust version to 1.59?)

/// This is an enum because we calculate our prefix differently depending on if we're using a UDL
/// file or a proc-macro. It's is a bit unfortunate, but it should work out okay.
#[derive(Clone, Debug, Hash)]
pub enum FFINames {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if FFINameBuilder is a better name? (I don't object to the existing name and it's far from a big deal, but it did confuse me a little until I read more of the impl).

// Module path of the item, including the crate name. This is comes the proc-macro code when
// run on nightly. We don't currently use the extra parts yet, but maybe we could eventually
// add support for nested modules in the bindings code.
ModulePath(String),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also minor, but I guess I don't understand why the last 2 variants can't be rolled into 1? (Well, not even sure why this code cares about the distinction at all given I can't see where any variants are treated differently? But I can see a better argument for having 2 than these 3)

Copy link
Member

@badboy badboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+.
I don't have much more to add than what was already mentioned by others.
In my testing I more than once was bitten by the checksum when I just wanted to swap out the library purely for testing and avoiding a full compile cycle. So in that regard this change is welcome and would make that possible.

@bendk
Copy link
Contributor Author

bendk commented Mar 29, 2023

I forget why I never merged this initially, but #1469 implements a similar system. Let's close this one and use that instead.

@bendk bendk closed this Mar 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants