[MM-58085] Improve calls load balancing logic #721
// Fallback to random choice if we couldn't get system info.
if hostWithMinLoad == nil {
	hostWithMinLoad = hostsAvailable[rand.Intn(len(hostsAvailable))]
}
This and the `continue` in case of error above keep the change backward compatible: if we can't get load info from the hosts, we fall back to a randomized approach.
Looks great, nicely done!
@cpoile Asking for re-review since I slightly changed the logic after noticing that the 1-minute average wasn't as reactive as I would have liked. We are now using a 2-second instant load (see https://github.com/mattermost/rtcd/tree/MM-54335-improvements).
Cool, looks great.
Summary
This PR implements (hopefully) more efficient load-balancing logic for calls. Until now, we used a simple round-robin approach, which works well for many smaller calls but can be quite inefficient for larger ones.
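For contrast, here is a minimal sketch of a round-robin picker of the kind this PR replaces. It assumes one host is chosen per new call. The `roundRobin` type and its `pick` method are hypothetical names, not the plugin's actual code, and the sketch is not safe for concurrent use without a mutex.

```go
package main

import "fmt"

// roundRobin cycles through hosts in order, ignoring how loaded each one is.
type roundRobin struct {
	hosts []string
	next  int
}

// pick returns the next host in the rotation.
func (r *roundRobin) pick() string {
	if len(r.hosts) == 0 {
		return ""
	}
	h := r.hosts[r.next%len(r.hosts)]
	r.next++
	return h
}

func main() {
	rr := &roundRobin{hosts: []string{"rtcd-a", "rtcd-b"}}
	for i := 0; i < 3; i++ {
		fmt.Println(rr.pick()) // rtcd-a, rtcd-b, rtcd-a
	}
}
```

Every pick counts the same, so a host already serving one very large call keeps receiving its share of new calls.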
The proposed changes fetch actual system load (CPU) info from the rtcd instances and select the host with the lowest load.
The rationale is that CPU is known to be the main performance bottleneck. Exposing this information avoids having to estimate load in more complex ways, such as figuring out how many connections and tracks (and of what type) we are sending at any given time.
Ticket Link
https://mattermost.atlassian.net/browse/MM-58085