Multithreaded crypto #170
Co-authored-by: Jonas Platte <jplatte@users.noreply.github.com>
Some real-world-ish measurements follow. The measurements were done using complement, which creates a room with a configured number of users, each having a single E2EE-capable device. The measurement shows the time it takes to create outbound Olm sessions, encrypt a room key for each Olm session, and finally send out all the to-device requests carrying the encrypted room key. This is done by inspecting when a keys claim request is sent out and when the last to-device request is sent out; the duration between those two events is our recorded time.

The x86_64 measurements were done using an 8-core Ryzen 7 4750U, while the aarch64 measurements were done using an 8-core Snapdragon 665. Please note that the old measurement for Element-Android was not done using the rust-sdk, so only the green line, the rust-sdk on x86_64, is an apples-to-apples comparison.

The measurement from before:

And now after applying this PR:

Testing this also revealed that the current slow path for such large groups seems to be this method, which is called right before sending out a keys claim request. We should fix this while we're here, making it feasible to normally use encrypted rooms containing 2k members, at least on a modern x86_64 CPU.
…d store

This removes a massive performance issue, since getting sessions is part of every message send cycle. Namely, every time we check that all our devices inside a certain room have established 1-to-1 Olm sessions, we load the account from the store. This means we go through an account unpickling phase, which includes AES decryption, every time we load sessions. Instead, cache the account info that we're going to attach to sessions when we initially save or load the account.
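The caching idea in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `Store`, `AccountInfo`, `load_account`, `account_info` and the `user|device` pickle format are hypothetical names invented for the example, not the SDK's actual types, and the string split stands in for the real unpickling and AES decryption.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical subset of the account data that sessions need.
#[derive(Debug)]
struct AccountInfo {
    user_id: String,
    device_id: String,
}

/// Hypothetical store. `unpickle_calls` exists only to make the cost visible.
struct Store {
    pickle: String,
    cached_info: Mutex<Option<Arc<AccountInfo>>>,
    unpickle_calls: Mutex<u32>,
}

impl Store {
    fn new(pickle: &str) -> Self {
        Store {
            pickle: pickle.to_owned(),
            cached_info: Mutex::new(None),
            unpickle_calls: Mutex::new(0),
        }
    }

    /// Stand-in for the expensive unpickling step.
    /// The account info is cached as a side effect of loading the account.
    fn load_account(&self) -> Arc<AccountInfo> {
        *self.unpickle_calls.lock().unwrap() += 1;
        let (user_id, device_id) = self
            .pickle
            .split_once('|')
            .unwrap_or((self.pickle.as_str(), ""));
        let info = Arc::new(AccountInfo {
            user_id: user_id.to_owned(),
            device_id: device_id.to_owned(),
        });
        *self.cached_info.lock().unwrap() = Some(info.clone());
        info
    }

    /// Session loading path: reuse the cached info and only fall back to
    /// unpickling when nothing has been cached yet.
    fn account_info(&self) -> Arc<AccountInfo> {
        // The lock guard is a temporary that is released at the end of this
        // statement, so calling `load_account` below does not deadlock.
        let cached = self.cached_info.lock().unwrap().clone();
        match cached {
            Some(info) => info,
            None => self.load_account(),
        }
    }
}
```

With this shape, repeated calls to `account_info()` share one `Arc` and trigger the unpickling stand-in at most once.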
We were merging the to-device messages using the extend() method, while our data has the shape of BTreeMap<UserId, BTreeMap<_, _>>. Extending such a map replaces the value for an existing key, so the inner BTreeMap would get dropped if both maps contain the same UserId. We need to extend the inner maps instead; those are guaranteed to contain unique device ids.
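A minimal sketch of the fix described in this commit message, assuming simplified stand-in types: the `UserId`, `DeviceId` and `Messages` aliases and the `merge_messages` helper are hypothetical and use plain strings rather than the SDK's real types.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the SDK's types; the real inner key and value
// types are elided in the commit message.
type UserId = String;
type DeviceId = String;
type Messages = BTreeMap<UserId, BTreeMap<DeviceId, String>>;

/// Merge `other` into `target` by extending the inner per-device maps.
///
/// Calling `target.extend(other)` directly would insert each
/// `(user_id, devices)` pair and thereby replace any inner map already
/// stored under the same `user_id`.
fn merge_messages(target: &mut Messages, other: Messages) {
    for (user_id, devices) in other {
        target.entry(user_id).or_default().extend(devices);
    }
}
```

The `entry(..).or_default()` call creates an empty inner map for users that are not yet present, so the same loop handles both new and already known users.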
To create an executor abstraction we'll probably use the async_executors crate, which supports WASM.
This PR aims to parallelize the main heavy paths of the crypto that we are able to parallelize:
`get_missing_sessions()` method

Notably, the key claiming and Olm session creation path (which does the triple Diffie-Hellman calculation) is missing here. The key claiming path can't efficiently be parallelized yet, since creating a session needs to mutably borrow the libolm `Account`.

The method that is used in the C library can be found here. It takes a reference to the `Account`, but realistically it doesn't need to take a mutable reference to it, considering that it only needs to read the curve25519 identity key from the `Account`, as described here.

I'm trying to benchmark the concrete improvements we're going to get; the image below shows how much this PR improves room key encryption.
The next image shows the improvement for key queries. Please note that the key query response that the benchmark uses is heavy on devices, light on users, and very light on cross signing keys. Parallelizing the handling of cross signing keys thus hasn't been done yet, and likely won't land as part of this PR.
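The borrowing constraint described for the key claiming path can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Account` struct, `create_session` and `create_sessions_parallel` are hypothetical and do not reflect the libolm bindings, and plain scoped threads (Rust 1.63 or newer) stand in for whatever executor the PR ends up using.

```rust
use std::thread;

/// Hypothetical stand-in for the libolm account, not the real binding.
struct Account {
    identity_key: String,
}

impl Account {
    /// Only reads the identity key, so a shared reference suffices.
    /// If this took `&mut self`, the parallel version below would not
    /// compile, because only one mutable borrow may exist at a time.
    fn create_session(&self, one_time_key: &str) -> String {
        format!("{}:{}", self.identity_key, one_time_key)
    }
}

/// Create one placeholder session per one-time key, one thread per key.
fn create_sessions_parallel(account: &Account, one_time_keys: &[&str]) -> Vec<String> {
    thread::scope(|scope| {
        // Spawn all threads first so they run concurrently.
        let handles: Vec<_> = one_time_keys
            .iter()
            .map(|key| scope.spawn(move || account.create_session(key)))
            .collect();
        // Join in spawn order so results line up with the input keys.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Because `create_session` takes `&self`, every thread can hold the same shared borrow of the account, which is what would make the session creation path parallelizable.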