[MM-57737] Improve client side call state consistency #681
@@ -714,59 +715,14 @@ export default class Plugin {
}
};

const fetchChannelData = async (channelID: string) => {
We can finally get rid of this; it was totally redundant, as we'd be fetching the same data again in fetchChannels.
Awesome
if (skipChannelID === data[i].channel_id) {
    logDebug('skipping channel from state loading', skipChannelID);
    continue;
}
This is important as it avoids messing up the state for the current call, which should only come from the websocket from now on.
Can we add a comment in the code to remind us about the what and why? I can imagine forgetting and then spending time trying to reason through it.
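A minimal sketch of what such a commented guard could look like. Only `skipChannelID` and `channel_id` come from the diff; the `ChannelCallState` shape and the `filterLoadableStates` helper are hypothetical names used for illustration, not the plugin's actual code.

```typescript
// Hypothetical shape of one entry in the HTTP response (assumption).
interface ChannelCallState {
    channel_id: string;
    enabled?: boolean;
}

// Hypothetical helper: returns the entries whose state may safely be
// loaded from the HTTP response.
function filterLoadableStates(data: ChannelCallState[], skipChannelID: string): ChannelCallState[] {
    const loadable: ChannelCallState[] = [];
    for (let i = 0; i < data.length; i++) {
        // What: skip the channel of the call we are currently connected to.
        // Why: its state should only come from the websocket. Applying an
        // HTTP snapshot here could overwrite newer websocket events.
        if (skipChannelID === data[i].channel_id) {
            continue;
        }
        loadable.push(data[i]);
    }
    return loadable;
}
```

Passing an empty `skipChannelID` loads every entry, which would match the case where the user is not in a call.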
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           MM-42464     #681    +/-   ##
============================================
+ Coverage     15.91%   16.14%   +0.22%
============================================
  Files            38       38
  Lines          6414     6453     +39
============================================
+ Hits           1021     1042     +21
- Misses         5271     5286     +15
- Partials        122      125      +3
☔ View full report in Codecov by Sentry.
This is kind of a weird one isn't it. But if it works 🤷 :)
if err != nil {
    return fmt.Errorf("failed to lock call: %w", err)
}
defer p.unlockCall(channelID)
Would it make much of a difference if we immediately unlocked? Looks like there's no reason not to, and it might reduce some of the non-idealness (and prevent the publish ws event from delaying the unlock).
We have ways to get the call state without locking, but the point here is that we need to queue the event to be sent through WS before unlocking; otherwise we are subject to a race again.
Ah, of course 🤦
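To make the ordering argument concrete, here is a single-threaded simulation of the two options. It is a sketch under stated assumptions, not the plugin's Go code: `CallServer`, `sendCallState`, `join`, `replay`, and the event names are all hypothetical, and the `onUnlock` callback stands in for whichever other handler acquires the call lock next.

```typescript
// Hypothetical websocket events (names are assumptions).
type WSEvent =
    | {type: 'call_state'; sessions: string[]}
    | {type: 'user_joined'; sessionID: string};

class CallServer {
    private sessions: string[];
    // Events in the order they are queued for the websocket.
    outbox: WSEvent[] = [];

    constructor(initial: string[]) {
        this.sessions = initial.slice();
    }

    // Another handler mutating the call and publishing its own event.
    join(sessionID: string): void {
        this.sessions.push(sessionID);
        this.outbox.push({type: 'user_joined', sessionID});
    }

    // Reads the state "under lock". onUnlock runs at the moment the lock
    // would be released, simulating a concurrent handler taking over.
    sendCallState(publishBeforeUnlock: boolean, onUnlock: () => void): void {
        const snapshot: WSEvent = {type: 'call_state', sessions: this.sessions.slice()};
        if (publishBeforeUnlock) {
            this.outbox.push(snapshot); // queued while still holding the lock
            onUnlock();
        } else {
            onUnlock(); // lock released first: a newer event can jump the queue
            this.outbox.push(snapshot);
        }
    }
}

// What a client ends up with after applying events in queue order.
function replay(events: WSEvent[]): string[] {
    let sessions: string[] = [];
    for (const ev of events) {
        if (ev.type === 'call_state') {
            sessions = ev.sessions.slice();
        } else if (sessions.indexOf(ev.sessionID) === -1) {
            sessions.push(ev.sessionID);
        }
    }
    return sessions;
}

const racy = new CallServer(['alice']);
racy.sendCallState(false, () => racy.join('bob'));

const safe = new CallServer(['alice']);
safe.sendCallState(true, () => safe.join('bob'));
```

In the racy variant the stale snapshot is queued after the join event, so a client replaying the queue loses bob; in the safe variant the snapshot is queued first and the join is applied on top of it.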
switch msg.Type {
case clientMessageTypeJoin, clientMessageTypeLeave, clientMessageTypeReconnect, clientMessageTypeCallState:
default:
    return
As an aside, this is just bad language design (I mean all C derivatives here, not just Go). Imagine you are a non-programmer (or a programmer who's used to fallthrough, or not used to fallthrough, really): is this quick to read through and know immediately what's happening? Sheesh.
Yeah, I can revert to crazy if conditions if you prefer; it just looked cleaner, but not a big deal.
Nah, no big deal, just complaining.
// making a potentially racy HTTP call and should guarantee
// a consistent state.
logDebug('requesting call state through ws');
this.context.sendMessage('custom_com.mattermost.calls_call_state', {channelID: callsClient.channelID});
I wonder if the websocket message isn't a bit confusing. Are you sending the call's state, or requesting it...?
Just being consistent with other events there. We are using the direction (from/to client) to implicitly define whether it's request or response. I know you don't love it :p
I guess HTTP fixes the confusion through the Method. Here we use a bit of context as it doesn't make any sense for the client to ever send the call state. Please let me know if this is blocking :)
Nope, not blocking :)
// We pass currentCallChannelID so that we
// can skip loading its state as a result of the HTTP calls in
// fetchChannels since it would be racy.
Ah, here it is. Can we maybe add a similar comment to the function as well?
// A dummy React component so we can access webapp's
// WebSocket client through the provided hook. Just lovely.
😂
Will this cause us to depend on a new min server version?
This has been there for a long time (before monorepo) so I think we are good. But of course I'd like to fix it properly one day (pass the client object directly to the init function) in which case we'll have to add some backwards compatibility check.
Why weird? It's a rather simple race condition: our redux state is consistent only if events are received and dispatched in the order they are generated on the server side. With a mixture of HTTP calls and websockets both affecting state, we make it racy by design :p
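The race can be illustrated with a toy reducer. This is only a sketch: the action names, `sessionsReducer`, and `dispatchAll` are hypothetical and do not mirror the plugin's actual redux code.

```typescript
// Hypothetical actions: a full snapshot (e.g. from an HTTP response) and an
// incremental update (e.g. from a websocket event).
type Action =
    | {type: 'CALL_STATE_LOADED'; sessions: string[]}
    | {type: 'USER_LEFT'; sessionID: string};

function sessionsReducer(state: string[], action: Action): string[] {
    switch (action.type) {
    case 'CALL_STATE_LOADED':
        return action.sessions.slice();
    case 'USER_LEFT': {
        const leftID = action.sessionID;
        return state.filter((id) => id !== leftID);
    }
    default:
        return state;
    }
}

// Applies actions in the order the client received them.
function dispatchAll(actions: Action[]): string[] {
    let state: string[] = [];
    for (const action of actions) {
        state = sessionsReducer(state, action);
    }
    return state;
}

// On the server: the snapshot is taken first, then bob leaves.
const snapshot: Action = {type: 'CALL_STATE_LOADED', sessions: ['alice', 'bob']};
const bobLeft: Action = {type: 'USER_LEFT', sessionID: 'bob'};

// Same order as generated on the server: bob is removed.
const inOrder = dispatchAll([snapshot, bobLeft]);

// The HTTP response arrives after the websocket event: bob comes back.
const outOfOrder = dispatchAll([bobLeft, snapshot]);
```

When both updates travel over the same websocket connection, arrival order matches server order, so the second case cannot occur.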
Thanks!
Summary
This PR should fix most of the remaining call state data races originally mentioned in #512, specifically when loading the pop out window and in case of a websocket reconnect.
This is done by implementing a websocket request handler that returns the call state under lock. Using the websocket channel, as opposed to making an HTTP request, should guarantee a valid sequencing of potentially concurrent events (e.g. users joining, leaving, etc.).
Ticket Link
https://mattermost.atlassian.net/browse/MM-57737