Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexedDB support for wasm/browser environments #414

Merged
merged 119 commits into from
Feb 3, 2022

Conversation

gnunicorn
Copy link
Contributor

@gnunicorn gnunicorn commented Nov 17, 2021

Hi,

this PR attempts to implement permanent storage solutions on webbrowser environments using the browser own IndexedDB.

Current progress:

  • state store using indexeddb
    • implement state storage
    • implement storing data files
    • implement storing of custom types
    • pass sled tests (all but the custom types)
  • crypto store using indexeddb
    • implement backup_v1
  • additional
    • reusable integration test
    • async-result-helper for wasm32 environments\
    • MilliSecondsSinceUnixEpoch-support for web
    • put browser-tests into CI
    • put nodesjs wasm32 tests into CI, too
    • ensure all crates do compile for wasm32 in isolation
    • improve security (currently keys leak information)

Things I'd very much like an opinion/input on:

  1. this is currently hidden behind a target+feature-flag. Do we want to keep it this way or assume the target also means we are in the browser and thus assume that indexeddb should be used (effectively dropping the feature-flag)?
  2. the current base client uses a sync new to instantiate, but for indexeddb it must be async. Right now, this goes for a feature-specific-approach, meaning the API is async for the browser but sync otherwise. Do we want to streamline the API and make it async both cases?
  3. do we still want the option of having an in-memory-store in wasm, too? Right now the feature flag is either-or
  4. I am re-using the sled-crypto to encrypt on the fly, maybe we should streamline that over multiple database implementations?

@poljar
Copy link
Contributor

poljar commented Nov 18, 2021

this is currently hidden behind a target+feature-flag. Do we want to keep it this way or assume the target also means we are in the browser and thus assume that indexeddb should be used (effectively dropping the feature-flag)?

Yeah the browser might be a special place where everybody just wants to use the indexeddb store, I don't think anyone will want to run bots/bridges in the browser.

the current base client uses a sync new to instantiate, but for indexeddb it must be async. Right now, this goes for a feature-specific-approach, meaning the API is async for the browser but sync otherwise. Do we want to streamline the API and make it async both cases?

Would use runBlocking be horrible there? I try to avoid async constructors and new() shouldn't do any real writing either way. The feature specific approach would be hard to document, so we most likely want to have the same function signature.

do we still want the option of having an in-memory-store in wasm, too? Right now the feature flag is either-or

I don't think so, no. The memory store is there for two reasons:

  1. Because some platforms may not support the sled store
  2. So people don't need to set a store path for ephemeral clients, i.e. scripts that may not even sync.

Number 1 is largely going away and people won't use number 2. in a browser.

I am re-using the sled-crypto to encrypt on the fly, maybe we should streamline that over multiple database implementations?

We could put the StoreKey into a separate crate so each store can depend on that crate. We should also move each store implementation, except the memory store, into a separate crate.

It was always the intention to have only the memory store be part of the SDK crates but due to time constraints shortcuts were taken.

improve security (currently keys leak information)

This is a bit of a tricky one and we'll likely need to measure the performance impact. It was also skipped due to time constraints and some things are deliberately left unencrypted. For example, the user ids that are part of a room, those need to be fetched every time a message is sent. We likely could keep the performance good enough using some smarter caching.

A scheme that avoid leaking the keys I had in mind is this:

  1. Remove any occurrence of key decoding
  2. Generate a random salt when we create the store and store it as well.
  3. Modify EncodeKey to hash each part of the key separately and join the keys as usual using a null byte (this ensures that we can still use scan_prefix().

The salt, of course, will be used for the key hashing.

That all being said, thanks for working on this.

@jplatte
Copy link
Collaborator

jplatte commented Nov 18, 2021

Re. feature flag, I think the question isn't "does somebody want to run a bridge / bot in a browser", it's "does somebody want to use the SDK with state storage in non-browser wasm environment". I think it's definitely realistic somebody would want that so keeping this behind a feature flag seems like a good idea.

@poljar
Copy link
Contributor

poljar commented Nov 18, 2021

Re. feature flag, I think the question isn't "does somebody want to run a bridge / bot in a browser", it's "does somebody want to use the SDK with state storage in non-browser wasm environment". I think it's definitely realistic somebody would want that so keeping this behind a feature flag seems like a good idea.

That's a fair point, I'm a bit biased since I would just use it natively in that case, but someone is bound to be interested in the sandboxing features a WASM environment provides.

@gnunicorn
Copy link
Contributor Author

Re feature-flag: yeah, @jplatte, that was my thinking. Especially since more and more non-browser platforms offer some form of WASM support, I don't think we can assume one by the other. Like node has wasm support and that won't have indexeddb support... Keeping it as a feature-flag than. Thanks for the input!

In light of this way of thinking of it, I'd leave the memory store as an option for wasm32 non-indexeddb environments (like nodejs), too...

Would use runBlocking be horrible there? I try to avoid async constructors and new() shouldn't do any real writing either way. The feature specific approach would be hard to document, so we most likely want to have the same function signature.

while I agree that new being async is a bit akward and something that should generally be avoided, the API level we are talking about (the client) already has a bunch of async constructors already _ namely new_from_user_id and new_from_user_id_with_config, which can't be converted easily. In hence of these I'd argue that it actually makes usage of the API more consistent when all constructors where async and/or drop the generic new in favor of specific async constructors only. I fear that a runBlocking for indexeddb won't stand the test of time anyways if we want to support more and other storages... (and blocking in startup procedures sounds really bad, though I am not sure whether asking for indexeddb would ever be actually blocking - but afaik a browser could prompt the user before allowing usage and thus significantly delay startup process...)

improve security (currently keys leak information)

This, as a well as

We should also move each store implementation, except the memory store, into a separate crate.

I'll leave for another PR another time. For simplicity I'll try to get the indexeddb to feature parity with sled store first and then we can investigate these further.

@jplatte
Copy link
Collaborator

jplatte commented Nov 18, 2021

👍🏼 from me for async constructor, regardless of feature flags. I don't really see what harm there would be in the additional .await.

@poljar
Copy link
Contributor

poljar commented Nov 18, 2021

In light of this way of thinking of it, I'd leave the memory store as an option for wasm32 non-indexeddb environments (like nodejs), too...

nodejs would probably use neon to create bindings, this way they get access to native threads. I was thinking more of something like wasmer, which provides an embedable WASM runtime.

while I agree that new being async is a bit akward and something that should generally be avoided, the API level we are talking about (the client) already has a bunch of async constructors already _ namely new_from_user_id and new_from_user_id_with_config, which can't be converted easily. In hence of these I'd argue that it actually makes usage of the API more consistent when all constructors where async and/or drop the generic new in favor of specific async constructors only.

The difference is that those methods send out requests. I prefer the non-async constructor because a client might want to create the object in the UI thread which might not be a Tokio thread. Then again a more feature-full client will certainly use Client::new_from_user_id() which makes my point moot. If people need to create the object on such an UI thread, they can certainly use runBlocking themselves.

@stoically stoically mentioned this pull request Nov 18, 2021
8 tasks
@gnunicorn
Copy link
Contributor Author

@ara4n yes, I was messing with that in the CI as well. It appears this only works with Node 14 and webpack5 reliably because of various factors (the long-name-stack-size-problem as well as the env-runtime-problem that is coming from libolm). As we have a truly-rust-olm on the horizon we've agreed to not waste more time on this version (other than the CI job for Node 14 showing it working) and instead switch to the other lib asap.

@ara4n
Copy link
Member

ara4n commented Dec 27, 2021

turns out that the magic to make this work for me is:

  ignoreWarnings: [
    (warning) =>
      warning.message ===
      "Critical dependency: the request of a dependency is an expression",
  ],

Alternatively, one can manually snip out the unneeded code with:

for i in _ZN3olm7Session8describeEPcm olm_session_describe aes_encrypt_ccm aes_decrypt_ccm
do
	wasm-snip index_bg.wasm -o index_bg_fixed.wasm $i
	mv index_bg_fixed.wasm index_bg.wasm
done

@gnunicorn
Copy link
Contributor Author

note: you can get wasm-snip here.

@gnunicorn gnunicorn requested a review from poljar February 2, 2022 14:26
@gnunicorn
Copy link
Contributor Author

gnunicorn commented Feb 2, 2022

Updated to latest main, including the remove-room-feature added. The only test failing is marked as continue-on-error to remind us that default wasm-crypto doesn't work (though this isn't blocking merges), while there's a test within an environment (node14, emcc2.0) that is working.

should be good to merge now.

# hence the tests
name: ${{ matrix.name }} WASM test
runs-on: ubuntu-latest
continue-on-error: ${{ matrix.experimental }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will still turn our CI state into a ❌, oh well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, github actions don't have a proper ignore-failure switch ... what I could offer is to comment the specific test (which would also mean we don't waste these resources every time) until we have it fixed up ...?!?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds good.

@gnunicorn gnunicorn merged commit 1286357 into matrix-org:main Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants