Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding twin ADRS on wrapper code and async Rust #5910

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 91 additions & 0 deletions docs/adr/0008-wrapper-code.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# Wrapper code

* Status: draft
* Deciders:
* Date: ???

## Context and Problem Statement

Application-services components currently consist of a Rust core, and UniFFI-generated bindings for Swift and Kotlin.
Additionally, some of our components have hand-written Swift and Kotlin code, containing wrappers and extensions for the UniFFI bindings.
In the past, these wrappers were strictly necessary because of deficiencies in our FFI strategy.
However, UniFFI is reaching the point where it can support all our requirements and it's possible to remove the wrapper code.

We should decide how much effort, if any, we want to invest into getting rid of the wrapper code now that it's technically possible.
We should also decide if there are places where wrapper code makes sense and we don't want to replace it, even ignoring the time investment.

### Possible changes

This section explores some possibilities for removing wrapper code.
Deciding on any particular possibility listed here is out of scope for the ADR, the decision is if we should invest our time investigating some of these.

#### Async

One of the main reasons for our current wrapper code is to wrap our sync API and present an async interface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's true for wrappers in our repo - eg, places seems most concerned with capturing telemetry. From memory the wrappers in a-c are mostly like this, but it does raise the question: if uniffi gave us the capability of presenting the async interface, would this actually allow us to remove the wrappers, or just thin them out?

Conversely though, these telemetry wrappers exist because of some weird architectural considerations - ie, it's not our desire to have things set up this way and if our goal was to remove all wrappers, we'd just do something different here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote this up, I was incorrectly picturing the a-c wrappers as living in a-s. This raises a question that I think the ADRs should be more explicit about: who owns the wrappers in a-c and firefox-ios? In practice, it's been us and if that's true I think we should try to move them into our repo -- or at least split out some part and move that in.

I think this would clarify/improve the relationship between our teams. One thing I noticed doing the FxA work is that I had lots of opinions on how the a-c code should be organized. But once I figured out what part of the API should move to a-s, I was willing and happy to let those opinions go and let the firefox-android team own those decisions.

Going back to the ADR: what kind of API should we own and export to our consumers? Sync? Async? Sync with an async wrapper?

ADR-0009 lays out a couple of alternatives to this.

#### Feature flags for breaking changes

Another main reason for our current wrapper code is to mitigate breaking API changes.
The wrapper layer allows us to make breaking changes at the Rust level, but keep the same API at the wrapper layer.

An alternative to this would be to use Rust feature flags to manage breaking changes.
Any breaking change, or large change in general, would be behind a feature flag.
We would wait to enable the feature flag on the megazord until consumer application was ready to take the change.
Maybe we could have a transition period where we built two megazords, for example `megazord` and `megazord-with-logins-local-encrytion` and the consumer app could pick between the two.
This would simplify the consumer PRs since they could run CI normally.

Some potential uses of of features flags would be:

- **Cosmetic changes**. If we want to rename a field/function name, we could put that rename behind a feature flag.
- **Architectural changes**. For bigger changes, like the logins local encryption rework, we could maintain two pieces of code at once. For example, copy db.rs to db/old.rs and db/new.rs. Create db/mod.rs which runs `use old::*` or `use new::*` depending on the feature flag. Then we do our work in db/new.rs.

Maintaining many feature flags at once would require significant effort, so we should aim to get our consumers to use the new feature flag soon after our work is complete.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a concern to me (ie, that stuff will hand around). It's also interesting to think about who our "consumers" are in the medium term. It seems unlikely they will include Android or desktop, so really this is all about iOS.

That said though, it might be interesting to experiment with - or maybe to look back at some of the breaking changes we did make and see whether this approach would actually have worked in practice.

I wonder if another alternative would be to see if we can make the components more modular - eg, just allow iOS to stay on an "old" version of places even though they take a newer (say) tabs. This will have its own challenges though with crate versions - eg, if we update sqlite - but this might be managable (eg, I think the sqlite example would be managable by rearranging some dependencies, and for other crates it might occasionally mean some duplicate crates lower in the depdendency graph, but that might be ok)


#### UniFFI-supported interfaces

One last reason for wrapper code is to present idiomatic interfaces to our Rust code -- especially for callback interfaces.
For example, it's possible to define a UniFFI callback interface for notifications, but the FxA wrapper code uses the `NotificationCenter` on Swift which is not supported by UniFFI.
If we wanted to remove all wrapper code we would need to commit to only using interfaces that UniFFI could support.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the implications of this, but if "only using interfaces that UniFFI could support" implies "our swift code consumers can't use the 'NotificationCentre` then it's probably a non-starter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more like "our swift consumers need to do the work to hook up a callback interface to the NotificationCenter themselves". Definitely a drawback but not totally out of the question. I do prefer C over B at this point though.


## Decision Drivers

## Considered Options

* **(A) Keep using our current strategy**
Don't invest time into removing wrapper code.

* **(B) Remove all wrapper code**
Invest time into removing wrapper code with the intention of removing all of it.

* **(C) Remove most wrapper code, keep an additive wrapper layer**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with other comments that suggest we need to analyze our current wrapper code to see how much of it we can remove and that the result will inform our future strategy. I for one am assuming that we won't be able to remove all of it so we will end up at option C. We also might want to have a one-to-one wrapper to consumer strategy so that we can put the more platform-specific, opinionated code there. But in order to reduce our API surface I think we should do what we can to reduce the wrapper code we have.

Invest time into removing wrapper code, but keep some wrapper code.
In particular, keep a wrapper layer that adds new API surfaces rather than replaces existing ones.
For example, we could define an `FxaEventListener` interface with UniFFI, then add a `IosEventListener` class that implemented that interface by forwarding the messages to the `NotificationCenter`.

## Decision Outcome

## Pros and Cons of the Options

### (A) Keep using our current strategy

* Good, because we can spend our time on other improvements
* Good, because there's no chance of wasting time on implementing solutions that may not work out in practice.

### (B) Remove all wrapper code
* Good, because it simplifies our documentation strategy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully agree with this one - I agree the active work in UniFFI will truly improve the documentation strategy, but I'm not sure that removing wrapper code makes the documentation story any better. Both javadocs and swift docs have very rich documentation systems that would be hard to fully conform to (found this great blog post about swift's documentation)

I still believe what we're doing in UniFFI is many times better than the status quo, where the generated code has no doc comments at all, but I doubt that story alone makes the documentation story better than hand-written docs in swift

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to add this as a new bullet. Maybe it's most precise to say that auto-generating the docs would simplify things, but we might end up with worse docs if we don't hand-write the wrapper doc-strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the bullet below, but I should say that I'm a bit skeptical of the hand-written approach. IMO, a much bigger issue than specifics with javadoc/swiftdoc is the risk of documentation drift as we make changes. If we could make it work, I'd prefer using rustdoc for everything and rename the types (UInt32/UInt instead of u32). Yes it won't look as idiomatic, but rustdoc is still pretty good and it's much harder for the docs to get out-of-sync with each other and with the code.

There is active work in UniFFI to auto-generate the bindings documentation from the Rust docstrings (https://github.com/mozilla/uniffi-rs/pull/1498, https://github.com/mozilla/uniffi-rs/pull/1493).
If there are no wrappers, then we could potentially use this to auto-generate a high-level documentation site and/or docstrings in the generated bindings code.
If there are wrappers, then this probably isn't going to work.
In general, wrappers mean we are have multiple public APIs which makes documentation harder.
* Bad, because it may lead to worse documentation.
Hand-written documents can be better than auto-generated ones and especially since they can specifically target javadoc and swiftdoc.
* Good, because a "vanilla" API may be easier to integrate with multiple consumer apps.
`NotificationCenter` might be the preferred choice for firefox-ios, but another Swift app may want to use a different system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great point, and I think this kind of wrapping is best done in our consuming applications (Android Components and Firefox for iOS) than in Application Services. Those wrappers in Android Components / Firefox for Android, and Firefox for iOS, can bridge the UniFFI interfaces to the the libraries that the consuming apps use.

We have one example of that in the Suggest component: our UniFFI binding returns a Vec<u8> for favicon data, which is decoded into an android.graphics.Bitmap on Android, and a UIImage on iOS. These are Android and iOS-specific types—not part of the Kotlin or Swift core language—so it makes a bit more sense for that logic to live in Fenix and Firefox for iOS, than in a wrapper in Application Services. (External types aren't quite suitable there, either, for a handful of reasons).

Background scheduling is another example: the WorkManager library in Android works completely differently than BGTaskScheduler on iOS, and the consuming application might not even want to call into the Rust component in the background at all. A "vanilla" API would make that integration easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not wanting to distract from the main point here, but it would be super interesting to know how uniffi "CustomTypes" work here - eg, we could probably use something like Image in the UDL, which is just an alias for Vec<u8>, and Android and iOS each has special code that knows how to turn that vec into the local type.

By only using UniFFI-supported types we can be fairly sure that our code will work with any system.

### (C) Remove most wrapper code, keep an additive wrapper layer
* Good, because most documentation can be auto-generated and some can be hand-written.
The hand-written documentation would be the language-specific parts, which probably need to be written by hand.
* Bad because it may lead to worse documentation, for the same reasons an (B).
* Good, because consumer apps can choose to use the wrapper interfaces or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd be leaning towards an option described as something like "as few wrappers as possible, but no less" - ie, a perfect world has no wrappers, but this world is going to force some on us. Deciding "no wrappers" would be a fool's errand and not survive reality.

I guess this is closest to (C) Is there some feature uniffi is missing to enable this without the "bad" parts? It almost gets back to the "decorators" idea we had all that time ago - ie, something more like "augmentation" than "wrappers"?

151 changes: 151 additions & 0 deletions docs/adr/0009-async-rust.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
# Using Async Rust

* Status: draft
* Deciders:
* Date: ???

## Context and Problem Statement

Our Rust components are currently written using synchronous Rust.
The components are then wrapped in Kotlin to present an async interface.
Swift also wraps them to present an async-style interface, although it currently uses `DispatchQueue` and completion handlers rather than `async` functions.

UniFFI has been adding async capabilities in the last year and it seems possible to switch to using async Rust and not having a hand-written async wrapper.
It also seems possible to auto-generate the async wrapper with UniFFI.

What should our async strategy be?

### Scope

This ADR discusses what our general policy on wrapper code should be.
It does not cover how we should plan our work.
If we decide to embrace async Rust, we do not need to commit to any particular timeline for actually switching to it.

### Desktop and gecko-js

On desktop, we can't write async wrappers because it's not possible in Javascript.
Instead we use a strategy where every function is automatically wrapped as async in the C++ layer.
Using a config file, it's possible to opt-out of this strategy for particular functions/methods.

### Android-components

In Kotlin, the async wrapper layer currently lives in `android-components`.
For the purposes of this ADR, it doesn't really matter, and this ADR will not make a distinction between wrapper code in our repo and `android-components`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I was wondering if this was a bit inconsistent with your "Wrapper code" ADR, because that ADR (seems to?) explicitly exclude wrappers outside of Application Services.

But thinking about it a bit more, the way you've written this makes sense to me! All our wrappers—wherever they live—do two things:

  1. Wrap the synchronous Rust component interface in an asynchronous foreign interface.
  2. Bridge the Rust interface to platform-specific Android and iOS interfaces (WorkManager, BGTaskScheduler, UIKit, etc.)

We can lift (1) into Application Services, and still keep the wrappers in our consuming applications for (2).


## How it would work

### SQLite queries

One of the reasons our code currently blocks is to run SQLite queries.
https://github.com/mozilla/uniffi-rs/pull/1837 has a system to run blocking code inside an async function.
It would basically mean replacing code like this:

```kotlin
override suspend fun wipeLocal() = withContext(coroutineContext) {
conn.getStorage().wipeLocal()
}
```

with code like this:
```rust

async fn wipe_local() {
self.queue.execute(|| self.db.wipe_local()).await
}
```

We would need to merge #1837, which is currently planned for the end of 2023.

### Locks

Another reason our code blocks is to wait on a `Mutex` or `RWLock`.
There are a few ways we could handle this:

* The simplest is to continue using regular locks, inside a `execute()` call, which would be very similar to our current system.
* We could also consider switching to `async_lock` and reversing the order: lock first, then make a `execute()` call.
This may be more efficient since the async task would suspend while waiting for the lock rather than blocking a thread
* We could also ditch locks and use [actors and channels](https://ryhl.io/blog/actors-with-tokio/) to protect our resources.
It's probably not worth rewriting our current components to do this, but this approach might be useful for new components.

### Network requests

The last reason we block is for network requests.
To support that we would probably need some sort of "async viaduct" that would allow consumer applications to choose either:
- Use async functions from the `reqwest` library.
This matches what we currently do for `firefox-ios`.
- Use the foreign language's network stack via an async callback interface.
This matches what we currently do for `firefox-android`.
This would require implemnenting https://github.com/mozilla/uniffi-rs/issues/1729, which is currently planed for the end of 2023.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 that if we embrace async, having async callback support will be critical for things like network support, I'd almost want to see how that system works in practice before making a final call here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! This one might also need some support from Android Components; IIRC, it currently wraps GeckoView's async networking client in a blocking interface. It would be really compelling (and a fair bit of work! 😅) to have networking be one of our first uses of async callbacks, though!


## Decision Drivers

## Considered Options

* **(A) Experiment with async Rust**

* Pick a small component like `tabs` or `push` and use it to test our async Rust.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙋🏼‍♀️ I volunteer remote_settings as a tribute another test component!

  • It has a small API surface.
  • It could be a good "launch customer" for async networking.
  • It uses a lock to keep track of some client state, and races there shouldn't be catastrophic.
  • It doesn't do any persistence (e.g., to SQLite) directly.
  • It has a small Android Components wrapper, that won't be too disruptive to migrate.

Copy link
Contributor Author

@bendk bendk Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off: Thanks for offering this! Secondly: maybe we should be thinking about this component-by-component.

Async makes sense for remote_settings so that's a natural place to start exploring.

Async seems to make the least sense for our syncable-storage components. I wonder if we should just lean into saying the non-async API is the "real" API. We could still have an async wrapper class, but this would be more of an add-on. In practice, this would mean that when publish documentation, we would create detailed documentation for the sync API while the async wrapper would just have some text like "all methods work the same as LoginsStore, but are wrapped with withContext to present an async API". I'm think that might work fine, although I wonder if that would make the IDE tooltips worse.

* Use async Rust for new components.
* Consider slowly switching existing components to use async Rust.

* **(B) Keep hand-written Async wrappers**

Don't change the status quo.

* **(C) Auto-generate Async wrappers**

We could also make the `gecko-js` model the official model and switch other languages to use it as well.
For example, we could support something like this in `uniffi.toml`:

```toml
[[bindings.async_wrappers]]
# Class to wrap, methods wrapped with an async version
wrapped = "LoginStore"
# Name of the wrapper class.
# UniFFI would generate async wrapper methods that worked exactly like the current hand-written code.
# For most languages, the wrapper class constructors would input an extra parameter to handle the async wrapping (for example `CoroutineContext` or `DispatchQueue`).
wrapper = "LoginStoreAsync"
# methods to skip wrapping and keep sync (optional)
sync_methods = [...]
```

We could also support async wrappers for callback interfaces.
These would allow the foreign code to implement an sync callback interface using async code
The Rust code would block while waiting for the result.

## Decision Outcome

## Pros and Cons of the Options

### (A) Experiment with async Rust

* Good, if we decide to avoid wrappers in `ADR-0008` because it allows us to remove the async wrappers.
* Bad, because there's a risk that the UniFFI async code will cause issues and our current async strategy is working okay.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the largest Bad I see here, the cost of sticking to our existing, non-async Rust is cheap (no work!) but it might be expensive engineering time spent to make all the async bits work (async callback interfaces, migrating away from sync calls and the breaking changes that follow, async viaduct, etc)

There is also the cost of having to handle possible inconsistencies we never had to handle before (if we use async mutexes that drop locks on awaits, allowing different tasks to interleave in ways we never had to deal with before)

This almost makes me want to take the position of: "It's too costly and risky to embrace because of uncertainty - however if we can quantify possible wins, let's try it on a small surface and realize those wins to give us the confidence we're doing the right thing"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the cost of having to handle possible inconsistencies we never had to handle before (if we use async mutexes that drop locks on awaits, allowing different tasks to interleave in ways we never had to deal with before)

I think you are describing my concern here from the embedder perspective: we have our platform async system (Coroutines) and we will have the native rust async system which have to both sync (heh) together correctly because the race bugs from this would be painful to debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this is the biggest concern.

FWIW, UniFFI uses the platform async system to schedule the Rust work, so races don't seem that much more likely to me. If there's a bug, I would expect either hangs if the task gets lost somehow or just regular exceptions / crashes.

Even if we pick a small component to experiment with, it would be bad if that component crashes or stops responding because of async issues.
* Good because it allows us to be more efficient with our thread usage.
When an async task is waiting on a lock or network request, it can suspend itself and release the thread for other async work.
Currently, we need to block a thread while we are waiting for this.
However, it's not clear this would meaningfully affect our consumers since we don't run that many blocking operations.
We would be saving maybe 1-2 threads at certain points.
* Good, because it makes it easier to integrate with new languages that expect async.
For example, WASM integration libraries usually returns `Future` objects to Rust which can only be evaluated in an async context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To provide an example for this, I've experimented with having the fxa_client compile to WASM and the dependency on NSS was a problem, I've opted to web apis exposed through the JS runtime the wasm is running in, and those web apis often return promises, that have to be wrapped in Futures.

The above makes it so it's difficult to expose our fxa_client to web applications without exposing an async Rust API.

Note: this is a separate issue from UniFFI adding WASM support.
If we switched our component code to using async Rust, it's possible that we could use `wasm-bindgen` instead.
* Bad, because it makes it harder to provide bindings on new languages that don't support async, like C and C++.
Maybe we could bridge the gap with some sort of callback-based async system, but it's not clear how that would work.

### (B) Keep hand-written Async wrappers

* Good, this is the status quo, and doesn't require any work

### (C) Auto-generate Async wrappers

* Good, if we decide to avoid wrappers in `ADR-0008` because it allows us to remove the hand-written async wrappers.
* Good, because we could copy over auto-generated documentation and simply add `async` or `suspend` to the function signature.
* Good, because it's less risky than (A)
* Bad, because we would continue to have inefficiencies in our threading strategy.
* Good, because this is a more flexible async strategy.
We could use async wrappers to integrate with languages like WASM and not use them to integrate with languages like C and C++.
However, it's not clear at all how this would work in practice.
* Bad, because it's less flexible than (A).
For example, with (A) it would be for the main API interface to return another interface with async methods from Rust code.
That wouldn't be possible with this system, although it's not clear how big of an issue that would be.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a lot of time in H2 investigating ways to make UniFFI work with async Rust, but I have to say this one seems pretty attractive.