fix(balancer) use a FIFO for eventual consistency updates #6833
Conversation
Force-pushed from 4b1980d to aed6c5a
end
if kong.configuration.worker_consistency == "strict" then
  create_balancers()
  return
I know this is not about this PR, but I have one question. When eventual was first introduced for router and plugin iterator rebuilds, it meant only "eventual": it was always on, and things were prepared in the background. strict meant that at request time we additionally check whether those are up to date; we skip that check with eventual, hence the name. I am not sure whether the balancer takes the same approach, but I think it should, if possible.
strict is the default for the balancer, based on the default kong.conf values; the same should hold true for the router and plugins iterator.
The following explanation only applies to the balancer.
For both strict and eventual, we create all balancers by loading all upstreams from the DB into memory on a timer, in the background, right after the init_worker phase finishes. A large number of queries are issued to the DB, causing a high load on the DB after Kong starts. I'd suggest we try not to load all upstreams from the DB into memory in the init() function here in this PR; this way we can improve the situation at least for eventual consistency. Then we can go back to strict once we're done with eventual.
The only difference between eventual and strict is that with strict, at request time we always get the most up-to-date upstream. On upstream entity admin API events we immediately invalidate the upstream cache to reflect the admin API entity change, so that the next requests get the most recently updated entity.
With eventual, at request time we just get whatever is available in the cache right now. A background timer runs periodically; on upstream entity admin API events we schedule the change to be applied after the timer expires. This scheduling is done through a FIFO queue of events (implemented in this PR).
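The FIFO scheduling described above can be sketched roughly as follows. This is a minimal illustrative sketch in Python, not Kong's actual Lua implementation: the class and method names (UpstreamEventQueue, enqueue, drain) are assumptions invented for this example.

```python
from collections import deque

class UpstreamEventQueue:
    """Sketch of the FIFO idea: admin API events are enqueued in arrival
    order and applied later by a background timer, so each worker
    converges eventually instead of refreshing at request time."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, event):
        # Called when an admin API change arrives (e.g. upstream updated).
        self._queue.append(event)

    def drain(self, apply):
        # Called by the periodic timer; applies events in FIFO order.
        applied = 0
        while self._queue:
            apply(self._queue.popleft())
            applied += 1
        return applied
```

Because events are drained strictly in arrival order, a worker that falls behind still applies updates in the same sequence the admin API produced them.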
I think most of my comments fall into two bigger problems:
1) A high number of DB queries when sending a request to the Admin API upstream endpoints;
2) A large number of SELECT * FROM upstreams queries issued to the DB during a Kong restart, causing a high load on the DB.
Problem 1) is more systemic; it is present in the router (for routes endpoints) and the plugin iterator (for plugins endpoints) as well. But I think we should take this opportunity to solve it here in the balancer, in this PR, by addressing my comments.
Problem 2) is present with and without this PR, unfortunately. Right now, for both strict and eventual, we create all balancers by loading all upstreams from the DB into memory on a timer, in the background, right after the init_worker phase finishes. A large number of queries are issued to the DB, causing a high load on the DB after Kong starts. Can we optimize that in this PR as well? I'd suggest we try not to load all upstreams from the DB into memory in the init() function here in this PR; this way we can improve the situation at least for eventual consistency. Then we can go back to strict once we're done with eventual.
Force-pushed from b507a48 to c9f9fa2
Force-pushed from c9f9fa2 to 4941165
This change looks good to me; I have gone over it with @locao . I would appreciate a ✅ from @murillopaula before the merge, since he is more involved than me in this part of the code.
* When using eventual worker consistency, instead of trying to synchronize all workers using a shared dictionary, share the events and let each worker deal with its own updates.
* Fix several IPv6 issues in balancer tests.
* The balancer stress tests file was renamed, so those tests must be run on demand from now on, as they take a long time to run and most of them seem to be flaky on the CI environment.
4941165
to
2f2bb9f
Compare
LGTM, reviewed it IRL!
When using eventual worker consistency, instead of trying to synchronize
all workers using a shared dictionary, share the events, and let each
worker deal with its updates.
This PR also fixes several IPv6 issues in balancer tests.