
fix(balancer) use a FIFO for eventual consistency updates #6833

Merged
merged 1 commit into master from fix/balancer_eventual_consistency_improve on Mar 3, 2021

Conversation

locao
Contributor

@locao locao commented Feb 11, 2021

When using eventual worker consistency, instead of trying to synchronize
all workers through a shared dictionary, share the events and let each
worker apply its own updates.

This PR also fixes several IPv6 issues in the balancer tests.
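
As a rough illustration of the approach (a minimal sketch with made-up names, not the code in this PR): each worker subscribes to the same upstream events and applies them to its own in-memory state, instead of reading shared state from a dictionary.

-- Minimal sketch (illustrative names, not the PR's actual code): broadcast
-- upstream changes as worker events; every worker handles them locally.
local worker_events = kong.worker_events

-- Hypothetical handler: runs in every worker and touches only that worker's
-- own in-memory balancer state.
local function handle_upstream_change(data)
  ngx.log(ngx.DEBUG, "worker ", ngx.worker.id(),
          " handling change for upstream ", data.id)
  -- ...rebuild or patch this worker's balancer for the given upstream...
end

-- Each worker registers the same handler during init_worker.
worker_events.register(handle_upstream_change, "balancer", "upstream_changed")

-- Wherever a change is detected (e.g. a CRUD event on an upstream),
-- it is broadcast to all workers:
-- worker_events.post("balancer", "upstream_changed", { id = "..." })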

@locao locao requested a review from a user February 11, 2021 19:50
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch 6 times, most recently from 4b1980d to aed6c5a Compare February 12, 2021 07:15
end
if kong.configuration.worker_consistency == "strict" then
  create_balancers()
  return

Member

I know this is not about this PR, but I have one question about this.

When eventual was first introduced, for router and plugin iterator rebuilds, it meant only eventual, which also meant it was always on: things were prepared in the background. strict meant that at request time we check whether those are up to date; we don't do that check with eventual, hence the only eventual.

I am not sure whether the balancer takes the same approach, but I think it should, if possible.
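
For context, the setting being discussed is worker_consistency in kong.conf. A minimal excerpt is shown below for illustration only; per the reply that follows, strict is the default for the balancer.

# kong.conf (illustrative excerpt)
# strict   - check at request time that the rebuilt state is up to date
# eventual - rebuild in the background; requests use whatever is cached
worker_consistency = eventual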

@ghost ghost Feb 18, 2021

strict is the default for the balancer, based on the default kong.conf values... the same should hold true for the router and the plugins iterator.

The following explanation only applies to the balancer.

For both strict and eventual, we create all balancers by loading all upstreams from the DB into memory on a timer, in the background, right after the init_worker phase finishes. A large number of queries is issued to the DB, causing high DB load after Kong starts. I'd suggest we try not to load all upstreams from the DB into memory in the init() function in this PR; this way we can improve the situation at least for eventual consistency, and then go back to strict once we're done with eventual.

The only difference between eventual and strict is that with strict, at request time, we always get the most up-to-date upstream: on upstream entity Admin API events we immediately invalidate the upstream cache to reflect the entity change, so that the next requests get the most recently updated entity.

With eventual, at request time, we just get whatever is available in the cache right now. A background timer runs periodically and updates upstreams in response to upstream entity Admin API events: the change is scheduled to be applied when the timer fires. This scheduling is done through a FIFO queue of events (implemented in this PR).
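
A minimal sketch of that scheduling idea (illustrative names and interval, not the actual code from this PR): events are appended to a per-worker FIFO, and a background timer drains the queue in order.

-- Per-worker FIFO of pending upstream events (sketch, not the PR's code).
local queue = {}
local first, last = 1, 0

local function enqueue(operation, upstream)
  last = last + 1
  queue[last] = { operation = operation, upstream = upstream }
end

-- Hypothetical: apply one event to this worker's in-memory balancer state.
local function apply_upstream_event(event)
  ngx.log(ngx.DEBUG, "applying ", event.operation,
          " for upstream ", event.upstream.id)
end

-- Drain the queue in FIFO order on a recurring background timer.
local function drain(premature)
  if premature then
    return
  end
  while first <= last do
    local event = queue[first]
    queue[first] = nil
    first = first + 1
    apply_upstream_event(event)
  end
end

ngx.timer.every(1, drain)  -- the 1-second interval is just for illustration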

@ghost ghost left a comment

I think most of my comments fall into the bigger bucket of the following problems:

  1. A high number of DB queries when a request hits the Admin API upstream endpoints;
  2. A large number of SELECT * FROM upstreams queries issued to the DB during a Kong restart, causing high DB load.

Problem 1) is more systemic: it is present in the router (for the routes endpoints) and in the plugins iterator (for the plugins endpoints) as well. But I think we should take this opportunity to solve it here in the balancer, in this PR, by addressing my comments.

Problem 2) is present with and without this PR, unfortunately. Right now, for both strict and eventual, we create all balancers by loading all upstreams from the DB into memory on a timer, in the background, right after the init_worker phase finishes. A large number of queries is issued to the DB, causing high DB load after Kong starts. Can we optimize this in this PR as well? I'd suggest we try not to load all upstreams from the DB into memory in the init() function in this PR; this way we can improve the situation at least for eventual consistency, and then go back to strict once we're done with eventual.
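
One possible shape of that suggestion (a rough sketch with hypothetical helpers, not the code in this PR): instead of loading every upstream in init(), create each balancer lazily the first time a request needs it.

-- Sketch only: lazily create balancers on first use instead of loading
-- every upstream from the DB during init().
local balancers_by_upstream = {}  -- upstream id -> balancer, per worker

local function get_balancer(upstream_id)
  local balancer = balancers_by_upstream[upstream_id]
  if balancer then
    return balancer
  end

  -- Hit the DB (or cache) for this single upstream only when it is needed.
  local upstream, err = kong.db.upstreams:select({ id = upstream_id })
  if not upstream then
    return nil, err or "upstream not found"
  end

  balancer = create_balancer(upstream)  -- hypothetical constructor
  balancers_by_upstream[upstream_id] = balancer
  return balancer
end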

Review threads on kong/runloop/balancer.lua (outdated, resolved)
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch 2 times, most recently from b507a48 to c9f9fa2 Compare February 26, 2021 22:49
@locao locao requested a review from a user February 27, 2021 00:06
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch from c9f9fa2 to 4941165 Compare March 3, 2021 15:57
@locao locao changed the base branch from next to master March 3, 2021 15:58
Member

@kikito kikito left a comment

This change looks good to me; I have gone over it with @locao. I would appreciate a ✅ from @murillopaula before the merge, since he's more involved than I am in this part of the code.

* When using eventual worker consistency, instead of trying to
  synchronize all workers using a shared dictionary, share the events
  and let each worker deal with its updates.
* Fix several IPv6 issues in balancer tests.
* The balancer stress tests file was renamed, so those tests must be run
  on demand from now on, as they take a long time to run and most of them
  seem to be flaky in the CI environment.
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch from 4941165 to 2f2bb9f Compare March 3, 2021 18:04
@ghost ghost left a comment

LGTM, we reviewed it IRL!

@locao locao merged commit a78e7f9 into master Mar 3, 2021
@locao locao deleted the fix/balancer_eventual_consistency_improve branch March 3, 2021 20:16