-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Propsal] Updating FFI naming and checksums #1271
Conversation
a7e1b4c
to
82cb27d
Compare
pub fn checksum(&self) -> u64 { | ||
let mut hasher = DefaultHasher::new(); | ||
// Our implementation of `Hash` mixes in all of the public API of the component, | ||
// as well as the version string of uniffi. | ||
self.hash(&mut hasher); | ||
hasher.finish() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given you removed this, I think you can also remove the Hash
impl further down in this file (and maybe a few others, too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I removed that one and also the ComponentInterface.uniffi_version
field. All the other manual Hash
impls were on the very things we need (functions, methods, etc).
82cb27d
to
0bb2751
Compare
Updated how we calculate FFI names and especially how we calculate the checksum part for functions. Now the checksum is only derived from the function definition, not the entire component interface. This is needed so the proc-macros can generate the checksums. The new system should still prevent the possibility of the bindings using one function definition when the scaffolding was build using a different one. Added the concept of a uniffi scaffolding contract. We can bump that whenever we make changes to how scaffolding calls work. The FFI names depend on this value rather than the UniFFI version, which should prevent linker errors when uniffi and uniffi-bindgen don't have the exact same version. Consolidated the code into one module. I think it's easier to understand what's going on if you can see it in one place. Removed some unneeded Result<> returns. Updated the CI Rust version to 1.59, which is the current Firefox minimum version.
0bb2751
to
46985af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things to think about below, but FWIW I think this is a good direction to head in on balance 👍🏻
(Sorry for the sporadic drive-bys, but I do still enjoy poking in here from time to time 😁)
//! checksums that depend on things like function signatures to avoid the possibility of the | ||
//! bindings calling into the scaffolding when they don't agree on the interface. The result | ||
//! will be an ugly inscrutable link-time error, but that is a lot better than triggering | ||
//! potentially arbitrary memory unsafety! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, there are some ComponentInterface
changes that will no longer be covered by the checksum but might still lead to unsafe behaviour in the event of a version mismatch.
For example, suppose you have a function foo(bar: Bar)
for some record type Bar
, and you add or remove a field from Bar
. The checksum calculated for foo
will not change (it still takes a single argument of record type Bar
) but its internal Bar
-lifting code may expect to read a different set of fields than what the calling code sends in. This would probably result in a clean error given the strict length checks that we do in the deserialization code, but there are likely cases where it would silently do the wrong thing.
I haven't been following the proc-macro changes very closely, but I expect there's not much you can do to avoid that under the constraints of a proc-macro approach. Given just the tokens of a rust-level function signature, I don't think there's a good way to find out any of the internal details of its argument types.
It may be worth trying to enumerate any known cases that are not covered by the checksum as part of the docstring here, for completeness. It's already a best-effort attempt so that's probably good enough in practice.
Alternately, we could try to add additional guards for such cases. Some thought bubbles I've had in the past along these lines:
- The serialization format could check the checksum at runtime; for example by insisting that the first two bytes of the serialized representation of
Bar
match the compiled-time-calculated checksum ofBar
. - Every user-defined type
Bar
could produce a stub function_uniffi_typecheck_<checksum>_Bar
that doesn't actually do anything, but that gets linked against at module load time. A linker error here would indicate that some internal details of theBar
type do not match the foreign code's expectations, triggering an early abort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good point. In addition to those 2 solutions, I think we might be able to use some macro magic to include the checksum of the type definition in the scaffolding function name itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the macro magic approach could maybe work, you'd need to do something like this:
- Define checksum macros that input a list of other macros followed by a list of arguments a list of arguments
- Each macro would:
- Append their local checksum to the list of arguments
- Prepend checksum macros for any dependent types
- Call the first macro in the list with the arguments
- The final macro would append the checksum to the function name
- To define a scaffolding function, you'd put together a stack like this:
first_type_checksum!(second_type_checksum, append_checksum_to_function_name, { function def })
So, it's possible but I don't think it's a good idea. In addition to the complications on the macro side, if there was ever a cycle between the types then it would go into infinite recursion.
I think adding checksums to RustBuffers could present issues down the road if we ever wanted to move to using C structs or something to store those. Also it wouldn't handle the Object
.
Adding the stub functions and checking when we load the dylib seems like a great way to handle this to me. Maybe they could just be a static u8
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, no need for them to be actual functions, just symbols of some sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it wouldn't handle the Object.
Hrm, indeed...if we wanted to check that at runtime we'd have to either widen the pointers or use some pointer-tagging shenanigans to encode the checksum, which seems likely to be more complexity than it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton, @bendk!! I'm hopping in to say that this aligns with what we discussed when we chatted about versioning and that I'm +1 to moving forward here 🚀 (and tackling Ryan's comment 😝)
/// API contract version. Bump this if you make any changes to how the foreign bindings calls the | ||
/// scaffolding functions, for example the semantics of `RustCallStatus` or how types are | ||
/// lifted/lowered. | ||
const UNIFFI_API_VERSION: u32 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this, and when the proposal is ready to land, it would be great if we can document this somewhere where uniffi developers can easily see it as a reminder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, but I'm not sure where it should be. Do you have any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine there are a few ways to do this, but one thing we can do is explicitly identify what our contracts are, then add a warning around them:
// ⚠️ If you modify this struct/function/etc you need to update the `UNIFFI_API_VERSION` in `ffi_names.rs` ⚠️
(which makes me wonder if ffi_names.rs
is the right spot to keep the UNIFFI_API_VERSION
)
We can also modify the pull-request template to remind developers that if they changed any of the contracts they should bump the version - at the end of the day I think it'll be best-effort, and ideally, if we forget, the tests would probably fail well not our tests, but consumers will get some wild behaviour, so ya lets make sure this can't happen 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ryan makes a couple of great points:
- Because this isn't going to cover as much as it ideally would, there's a risk of the checksum not changing when it should. Even if that's partially true in the existing scheme, that existing scheme would still have been more likely to change the checksum anyway in practice - any change to a UDL would often be in multiple places, and it previously only needed one of those changes to trigger it for the entire library. Now, each checksum is more isolated.
- That it seems impossible (or at least unreasonable) to consider the entire exposed interface everywhere this checksum is needed, so it might be necessary just accept the limitation.
And a couple of things that push me into an r+
- Nothing we do here is set in concrete - we can improve this as we discover ways to do so.
- The checksumming is valuable, but hasn't actually saved us in practice very often (and I don't think has ever saved us in practice in a production environment?) - making improvements to this be a blocker for the procmacro work doesn't seem the correct tradeoff.
So as far as I'm concerned, 🚢 (although I also note you are looking for a new uniffi version for vendoring into m-c, so it might be prudent to cut that before merging? I don't care really though.
|
||
/// Name for a UniFFI global function, like `rustbuffer_alloc()` | ||
/// | ||
/// This name doesn't include a checksum. If we make changes to those functions, then we need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe comment why it can't use the checksum?
(unrelated, but is there any reason you aren't using named arguments directly in format strings given you also bumped that min rust version to 1.59?)
/// This is an enum because we calculate our prefix differently depending on if we're using a UDL | ||
/// file or a proc-macro. It's is a bit unfortunate, but it should work out okay. | ||
#[derive(Clone, Debug, Hash)] | ||
pub enum FFINames { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if FFINameBuilder
is a better name? (I don't object to the existing name and it's far from a big deal, but it did confuse me a little until I read more of the impl).
// Module path of the item, including the crate name. This is comes the proc-macro code when | ||
// run on nightly. We don't currently use the extra parts yet, but maybe we could eventually | ||
// add support for nested modules in the bindings code. | ||
ModulePath(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also minor, but I guess I don't understand why the last 2 variants can't be rolled into 1? (Well, not even sure why this code cares about the distinction at all given I can't see where any variants are treated differently? But I can see a better argument for having 2 than these 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r+.
I don't have much more to add than what was already mentioned by others.
In my testing I more than once was bitten by the checksum when I just wanted to swap out the library purely for testing and avoiding a full compile cycle. So in that regard this change is welcome and would make that possible.
I forget why I never merged this initially, but #1469 implements a similar system. Let's close this one and use that instead. |
This PR represents a proposal for changing how scaffolding functions are named. I believe it represents the consensus from the planning meeting and also will work with the new proc-macro code being developed. Please tell me if I'm wrong on either of those parts.
The main point of the PR is just to start a discussion and hopefully get agreement on the general plan. The motivation for this all is that I want to get the initial proc-macro PR that @jplatte put up merged. I think the FFI naming change is the most controversial part of it and if we can agree that this is the direction we want to go, then that PR can be merged (we can discuss more in that PR of course).
Updated how we calculate FFI names and especially how we calculate the
checksum part for functions. Now the checksum is only derived from the
function definition, not the entire component interface. This is needed
so the proc-macros can generate the checksums. The new system should
still prevent the possibility of the bindings using one function
definition when the scaffolding was build using a different one.
Added the concept of a uniffi scaffolding contract. We can bump that
whenever we make changes to how scaffolding calls work. The FFI names
depend on this value rather than the UniFFI version, which should
prevent linker errors when uniffi and uniffi-bindgen don't have the
exact same version.
Consolidated the code into one module. I think it's easier to
understand what's going on if you can see it in one place.
Removed some unneeded Result<> returns.