Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-57737] Improve client side call state consistency #681

Merged
merged 3 commits into from
Apr 11, 2024
Merged

Conversation

streamer45
Copy link
Collaborator

Summary

PR should fix most of the remaining call state data races originally mentioned in #512, specifically when loading the pop out window and in case of a websocket reconnect.

This is done by implementing a websocket request handler that returns the calls state under lock. Using the websocket channel as opposed to making HTTP request should guarantee a valid sequencing of potentially concurrent events (e.g. users joining, leaving, etc).

Ticket Link

https://mattermost.atlassian.net/browse/MM-57737

@streamer45 streamer45 added the 2: Dev Review Requires review by a core committer label Apr 5, 2024
@streamer45 streamer45 requested a review from cpoile April 5, 2024 20:56
@streamer45 streamer45 self-assigned this Apr 5, 2024
@@ -714,59 +715,14 @@ export default class Plugin {
}
};

const fetchChannelData = async (channelID: string) => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can finally get rid of this, it was totally redundant as we'd be fetching the same again in fetchChannels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

Comment on lines +640 to +646
if (skipChannelID === data[i].channel_id) {
logDebug('skipping channel from state loading', skipChannelID);
continue;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important as it will avoid messing the state for the current call which should only come from websocket from now on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment in the code to remind us about the what and why? I can imagine forgetting and then spending time trying to reason through it.

@codecov-commenter
Copy link

codecov-commenter commented Apr 5, 2024

Codecov Report

Attention: Patch coverage is 43.18182% with 25 lines in your changes are missing coverage. Please review.

Project coverage is 16.14%. Comparing base (788d6eb) to head (bdcf3f2).
Report is 1 commits behind head on MM-42464.

Files Patch % Lines
server/websocket.go 43.18% 23 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           MM-42464     #681      +/-   ##
============================================
+ Coverage     15.91%   16.14%   +0.22%     
============================================
  Files            38       38              
  Lines          6414     6453      +39     
============================================
+ Hits           1021     1042      +21     
- Misses         5271     5286      +15     
- Partials        122      125       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@streamer45 streamer45 added this to the v0.27.0 / MM 9.9 milestone Apr 9, 2024
Copy link
Member

@cpoile cpoile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a weird one isn't it. But if it works 🤷 :)

if err != nil {
return fmt.Errorf("failed to lock call: %w", err)
}
defer p.unlockCall(channelID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make much of a difference if we immediately unocked? Looks like there's no reason no to, and it might reduce some of the non-idealness (and prevent the publish ws event from delaying the unlock).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have ways to get the call state without locking but the point here is that we need to queue the event to be sent through WS before unlocking otherwise we are subject to a race again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course 🤦

Comment on lines +936 to +939
switch msg.Type {
case clientMessageTypeJoin, clientMessageTypeLeave, clientMessageTypeReconnect, clientMessageTypeCallState:
default:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, this is just bad language design. (I mean all c-derivatives here, not just go). Imagine you are a non-programmer (or a programmer who's used to fallthrough, or not used to fallthrough, really), is this quick to read through and know immediately what's happening? sheesh.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can revert to crazy if conditions if you prefer, just looked cleaner but not a big deal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, no big deal, just complaining.

// making a potentially racy HTTP call and should guarantee
// a consistent state.
logDebug('requesting call state through ws');
this.context.sendMessage('custom_com.mattermost.calls_call_state', {channelID: callsClient.channelID});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the websocket message isn't a bit confusing. Are you sending the call's state, or requesting it...?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just being consistent with other events there. We are using the direction (from/to client) to implicitly define whether it's request or response. I know you don't love it :p

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess HTTP fixes the confusion through the Method. Here we use a bit of context as it doesn't make any sense for the client to ever send the call state. Please let me know if this is blocking :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, not blocking :)

Comment on lines +640 to +646
if (skipChannelID === data[i].channel_id) {
logDebug('skipping channel from state loading', skipChannelID);
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment in the code to remind us about the what and why? I can imagine forgetting and then spending time trying to reason through it.

@@ -714,59 +715,14 @@ export default class Plugin {
}
};

const fetchChannelData = async (channelID: string) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

Comment on lines +744 to +749
// We pass currentCallChannelID so that we
// can skip loading its state as a result of the HTTP calls in
// fetchChannels since it would be racy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, here it is. Maybe can we add a similar comment to the function also?

Comment on lines +777 to +781
// A dummy React component so we can access webapp's
// WebSocket client through the provided hook. Just lovely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause us to depend on a new min server version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been there for a long time (before monorepo) so I think we are good. But of course I'd like to fix it properly one day (pass the client object directly to the init function) in which case we'll have to add some backwards compatibility check.

@streamer45
Copy link
Collaborator Author

This is kind of a weird one isn't it.

Why weird? It's a rather simple race condition, our redux state is consistent only if events are received and dispatched in the order they are generated on the server side. With a mixture of HTTP calls and websockets both affecting state we make it racy by design :p

@streamer45 streamer45 requested a review from cpoile April 10, 2024 16:14
Copy link
Member

@cpoile cpoile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cpoile cpoile added 3: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer labels Apr 10, 2024
Base automatically changed from MM-42464 to main April 11, 2024 19:05
@streamer45 streamer45 merged commit 5095fd5 into main Apr 11, 2024
7 checks passed
@streamer45 streamer45 deleted the MM-57737 branch April 11, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3: Reviews Complete All reviewers have approved the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants