Placeholder: e2ee is slow in massive rooms #16043
We can probably make a lot of this more concurrent and persist session keys somewhere safe, so that messages send faster after the app is reloaded. We can also probably increase the number of devices per to_device message when sending out keys, and defer the withheld notifications until after the message has actually been sent.
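A minimal sketch of the two ideas above, assuming a hypothetical `send_batch` transport that delivers one to-device request addressed to many devices. The batch size of 250 and the limit of 4 requests in flight are illustrative guesses, not values from element-web or the spec. The key share is split into batches that are sent concurrently, and withheld notices are sent only after the event itself.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical transport: one to-device request addressed to many devices.
SendBatch = Callable[[List[str]], Awaitable[None]]


def chunk(devices: Sequence[str], size: int) -> List[List[str]]:
    """Split recipients into batches of at most `size` devices."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(devices[i:i + size]) for i in range(0, len(devices), size)]


async def share_room_key(
    devices: Sequence[str],
    send_batch: SendBatch,
    batch_size: int = 250,     # illustrative, not a spec limit
    max_in_flight: int = 4,    # illustrative concurrency cap
) -> int:
    """Send every batch, with at most `max_in_flight` requests outstanding.

    Returns the number of to-device requests made.
    """
    limit = asyncio.Semaphore(max_in_flight)

    async def send_one(batch: List[str]) -> None:
        async with limit:
            await send_batch(batch)

    batches = chunk(devices, batch_size)
    await asyncio.gather(*(send_one(b) for b in batches))
    return len(batches)


async def send_encrypted(
    devices: Sequence[str],
    withheld: List[str],
    send_batch: SendBatch,
    send_event: Callable[[], Awaitable[None]],
    send_withheld: Callable[[List[str]], Awaitable[None]],
) -> None:
    """Share the key, send the event, and only then send withheld notices."""
    await share_room_key(devices, send_batch)
    await send_event()
    # Withheld notices are off the critical path, so they go out last.
    if withheld:
        await send_withheld(withheld)
```

With 1000 devices and a batch size of 250, this makes 4 requests instead of one per small group, and the user-visible event never waits on the withheld notices.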
Yup, that's the long timeout from fetching keys over federation; it basically means that some other homeserver is timing out on its reply. It's also supposed to do a keys claim with a shorter (2s) timeout first, and the long-timeout claim isn't supposed to hold up sending the message. But maybe it only does the long-timeout claim when it's automatically creating sessions as you start typing a message? It's been a while since I've looked at the code. (#11836)
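A sketch of the intended two-phase behaviour described above, not of the actual element-web code. It assumes a hypothetical `claim(devices, timeout_ms)` coroutine modelled on `/keys/claim`, where the server waits up to `timeout_ms` for remote homeservers and returns whatever keys it received. Device IDs are simplified to plain strings. The message goes out right after the short claim; the long-timeout retry runs in the background and only helps later messages.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Set, Tuple

# Hypothetical claim call: waits up to timeout_ms for remote servers and
# returns {device: one_time_key} for every device it heard back about.
Claim = Callable[[Set[str], int], Awaitable[Dict[str, str]]]
# Hypothetical send: encrypts and sends the message to the given devices.
Send = Callable[[Set[str]], Awaitable[None]]


async def claim_then_send(
    devices: Set[str],
    claim: Claim,
    send: Send,
    short_timeout_ms: int = 2000,
    long_timeout_ms: int = 10000,
) -> Tuple[Dict[str, str], Optional[asyncio.Task]]:
    # Phase 1: short claim; devices on slow homeservers miss this round.
    quick = await claim(devices, short_timeout_ms)
    # Send immediately to the devices we could reach.
    await send(set(quick))
    missing = devices - set(quick)
    if not missing:
        return quick, None
    # Phase 2: long-timeout retry in the background, off the send path.
    return quick, asyncio.create_task(claim(missing, long_timeout_ms))
```

Devices that miss phase 1 would then get a withheld notice or a later key share, which is the trade-off this ordering accepts.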
"Withheld" could also be due to failures in creating olm sessions, either because some devices have run out of one-time keys (which is likely if you sent a message in a large public room) or because some servers failed to send keys (which seems likely given that the keys claim call took >10s, meaning some servers timed out). Another thing I had been thinking about was batching some of the IndexedDB operations so that, for example, it fetches the olm sessions for multiple devices at once rather than individually. My suspicion is that it won't produce enough speedup to be worth the effort (since it will probably involve lots of refactoring and added code complexity), but maybe your magic graphs can give some more insight.
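To make the batching idea above concrete, here is a toy model in which each store call stands for one IndexedDB transaction. `SessionStore`, `get_session` and `get_sessions` are hypothetical names, not element-web's crypto store API. The sketch only counts transactions; whether fewer transactions translate into a worthwhile real-world speedup is exactly the open question raised above.

```python
from typing import Dict, Iterable, List, Optional


class SessionStore:
    """Toy stand-in for an IndexedDB object store of olm sessions.

    Each public method call models one transaction; `transactions`
    counts them so the per-device and batched paths can be compared.
    """

    def __init__(self, sessions: Dict[str, str]) -> None:
        self._sessions = dict(sessions)
        self.transactions = 0

    def get_session(self, device_key: str) -> Optional[str]:
        self.transactions += 1  # one transaction per device
        return self._sessions.get(device_key)

    def get_sessions(self, device_keys: Iterable[str]) -> Dict[str, Optional[str]]:
        self.transactions += 1  # one transaction covers the whole batch
        return {k: self._sessions.get(k) for k in device_keys}


def fetch_individually(store: SessionStore, device_keys: List[str]) -> Dict[str, Optional[str]]:
    return {k: store.get_session(k) for k in device_keys}


def fetch_batched(
    store: SessionStore, device_keys: List[str], batch_size: int = 100
) -> Dict[str, Optional[str]]:
    result: Dict[str, Optional[str]] = {}
    for i in range(0, len(device_keys), batch_size):
        result.update(store.get_sessions(device_keys[i:i + batch_size]))
    return result
```

For 1000 devices, the per-device path costs 1000 transactions and the batched path (batch size 100) costs 10, while returning the same sessions.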
Downgrading, as this does not prevent work; it merely makes it slower.
Not clear to me from this issue:
@turt2live, I don't suppose you have any of these details?
This issue is a bit of PS-sponsored work and has context that GitHub can't surface (nor can I share here). To answer as best I can:
There are significant gains to be had by changing element-web, though some are down to the chosen technologies, design patterns, etc. The biggest one is storing outbound sessions to reduce the impact of having to send out to-device messages, though the to-device message approach may need spec changes.
In the context of this issue, 1000+ devices (roughly 300 users). The PS-sensitive side of this is focused on even larger rooms; however, we needed a line in the sand to measure reliably against. At the time, this was the megolm test room with thousands of devices available.
Minutes, with a feeling of it taking days. This is also seen in large rooms like Element Internal, where sending a message can take 2 minutes on a strong CPU/server, or 5 minutes on a weak one.
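A rough sketch of why storing outbound sessions helps: if the client persists which devices already hold the current outbound megolm session key, a reload only needs to-device messages for devices that are new since then. `OutboundSession` and `plan_key_share` are hypothetical, JSON stands in for real persistent storage, and the "rotate when a device leaves" rule is an assumed policy for the sketch, not a statement of what element-web does.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class OutboundSession:
    """Hypothetical persisted record of an outbound megolm session."""
    session_id: str
    shared_with: Set[str] = field(default_factory=set)

    def to_json(self) -> str:
        return json.dumps(
            {"session_id": self.session_id, "shared_with": sorted(self.shared_with)}
        )

    @staticmethod
    def from_json(raw: str) -> "OutboundSession":
        data = json.loads(raw)
        return OutboundSession(data["session_id"], set(data["shared_with"]))


def plan_key_share(
    saved_raw: Optional[str], current: Set[str], new_session_id: str
) -> Tuple[OutboundSession, Set[str]]:
    """Return the session to use and the devices that still need its key."""
    session = OutboundSession.from_json(saved_raw) if saved_raw else None
    if session is None or session.shared_with - current:
        # No saved session, or a device has left: start a fresh session
        # (assumed rotation policy for this sketch).
        session = OutboundSession(new_session_id)
    to_send = current - session.shared_with
    # A real client would record this only after the send succeeds.
    session.shared_with |= to_send
    return session, to_send
```

In a room of 1000 devices where one new device has joined since the last send, the saved session means one to-device message instead of 1001.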
Duplicate of #15476.