
fix(balancer) use a FIFO for eventual consistency updates #6833

Merged
merged 1 commit into master from fix/balancer_eventual_consistency_improve on Mar 3, 2021

Conversation

locao
Contributor

@locao locao commented Feb 11, 2021

When using eventual worker consistency, instead of trying to synchronize
all workers through a shared dictionary, share the events and let each
worker apply its own updates.

This PR also fixes several IPv6 issues in the balancer tests.
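
As a rough illustration of the approach (a minimal sketch with made-up names, not the code in this PR): each worker subscribes to the same upstream events and applies them to its own in-memory state, instead of reading shared state from a dictionary.

-- Minimal sketch (illustrative names, not the PR's actual code): broadcast
-- upstream changes as worker events; every worker handles them locally.
local worker_events = kong.worker_events

-- Hypothetical handler: runs in every worker and touches only that worker's
-- own in-memory balancer state.
local function handle_upstream_change(data)
  ngx.log(ngx.DEBUG, "worker ", ngx.worker.id(),
          " handling change for upstream ", data.id)
  -- ...rebuild or patch this worker's balancer for the given upstream...
end

-- Each worker registers the same handler during init_worker.
worker_events.register(handle_upstream_change, "balancer", "upstream_changed")

-- Wherever a change is detected (e.g. a CRUD event on an upstream),
-- it is broadcast to all workers:
-- worker_events.post("balancer", "upstream_changed", { id = "..." })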

@locao locao requested a review from a user February 11, 2021 19:50
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch 6 times, most recently from 4b1980d to aed6c5a Compare February 12, 2021 07:15
end
if kong.configuration.worker_consistency == "strict" then
  create_balancers()
  return

Member

I know this is not about this PR, but I have one question about this.

When eventual was first introduced, for router and plugin iterator rebuilds, it meant only eventual, which also meant it was always on: things were prepared in the background. strict meant that at request time we check whether those are up to date; we don't do that check with eventual, hence the only eventual.

I am not sure whether the balancer takes the same approach, but I think it should, if possible.
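
For context, the setting being discussed is worker_consistency in kong.conf. A minimal excerpt is shown below for illustration only; per the reply that follows, strict is the default for the balancer.

# kong.conf (illustrative excerpt)
# strict   - check at request time that the rebuilt state is up to date
# eventual - rebuild in the background; requests use whatever is cached
worker_consistency = eventual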

@ghost ghost Feb 18, 2021

strict is the default for the balancer, based on the default kong.conf values... the same should hold true for the router and the plugins iterator.

The following explanation only applies to the balancer.

For both strict and eventual, we create all balancers by loading all upstreams from the DB into memory on a timer, in the background, right after the init_worker phase finishes. A large number of queries is issued to the DB, causing high DB load after Kong starts. I'd suggest we try not to load all upstreams from the DB into memory in the init() function in this PR; this way we can improve the situation at least for eventual consistency, and then go back to strict once we're done with eventual.

The only difference between eventual and strict is that with strict, at request time, we always get the most up-to-date upstream: on upstream entity Admin API events we immediately invalidate the upstream cache to reflect the entity change, so that the next requests get the most recently updated entity.

With eventual, at request time, we just get whatever is available in the cache right now. A background timer runs periodically and updates upstreams in response to upstream entity Admin API events: the change is scheduled to be applied when the timer fires. This scheduling is done through a FIFO queue of events (implemented in this PR).
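
A minimal sketch of that scheduling idea (illustrative names and interval, not the actual code from this PR): events are appended to a per-worker FIFO, and a background timer drains the queue in order.

-- Per-worker FIFO of pending upstream events (sketch, not the PR's code).
local queue = {}
local first, last = 1, 0

local function enqueue(operation, upstream)
  last = last + 1
  queue[last] = { operation = operation, upstream = upstream }
end

-- Hypothetical: apply one event to this worker's in-memory balancer state.
local function apply_upstream_event(event)
  ngx.log(ngx.DEBUG, "applying ", event.operation,
          " for upstream ", event.upstream.id)
end

-- Drain the queue in FIFO order on a recurring background timer.
local function drain(premature)
  if premature then
    return
  end
  while first <= last do
    local event = queue[first]
    queue[first] = nil
    first = first + 1
    apply_upstream_event(event)
  end
end

ngx.timer.every(1, drain)  -- the 1-second interval is just for illustration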

@ghost ghost left a comment

I think most of my comments fall into the bigger bucket of the following problems:

  1. A high number of DB queries when a request hits the Admin API upstream endpoints;
  2. A large number of SELECT * FROM upstreams queries issued to the DB during a Kong restart, causing high DB load.

Problem 1) is more systemic: it is present in the router (for the routes endpoints) and in the plugins iterator (for the plugins endpoints) as well. But I think we should take this opportunity to solve it here in the balancer, in this PR, by addressing my comments.

Problem 2) is present with and without this PR, unfortunately. Right now, for both strict and eventual, we create all balancers by loading all upstreams from the DB into memory on a timer, in the background, right after the init_worker phase finishes. A large number of queries is issued to the DB, causing high DB load after Kong starts. Can we optimize this in this PR as well? I'd suggest we try not to load all upstreams from the DB into memory in the init() function in this PR; this way we can improve the situation at least for eventual consistency, and then go back to strict once we're done with eventual.
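
One possible shape of that suggestion (a rough sketch with hypothetical helpers, not the code in this PR): instead of loading every upstream in init(), create each balancer lazily the first time a request needs it.

-- Sketch only: lazily create balancers on first use instead of loading
-- every upstream from the DB during init().
local balancers_by_upstream = {}  -- upstream id -> balancer, per worker

local function get_balancer(upstream_id)
  local balancer = balancers_by_upstream[upstream_id]
  if balancer then
    return balancer
  end

  -- Hit the DB (or cache) for this single upstream only when it is needed.
  local upstream, err = kong.db.upstreams:select({ id = upstream_id })
  if not upstream then
    return nil, err or "upstream not found"
  end

  balancer = create_balancer(upstream)  -- hypothetical constructor
  balancers_by_upstream[upstream_id] = balancer
  return balancer
end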

Review threads on kong/runloop/balancer.lua (outdated, resolved)
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch 2 times, most recently from b507a48 to c9f9fa2 Compare February 26, 2021 22:49
@locao locao requested a review from a user February 27, 2021 00:06
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch from c9f9fa2 to 4941165 Compare March 3, 2021 15:57
@locao locao changed the base branch from next to master March 3, 2021 15:58
Member

@kikito kikito left a comment

This change looks good to me; I have gone over it with @locao. I would appreciate a ✅ from @murillopaula before the merge, since he's more involved than I am in this part of the code.

* When using eventual worker consistency, instead of trying to
  synchronize all workers using a shared dictionary, share the events
  and let each worker deal with its updates.
* Fix several IPv6 issues in balancer tests.
* The balancer stress tests file was renamed, so those tests must be run
  on demand from now on, as they take a long time to run and most of them
  seem to be flaky in the CI environment.
@locao locao force-pushed the fix/balancer_eventual_consistency_improve branch from 4941165 to 2f2bb9f Compare March 3, 2021 18:04
@ghost ghost left a comment

LGTM, we reviewed it IRL!

@locao locao merged commit a78e7f9 into master Mar 3, 2021
@locao locao deleted the fix/balancer_eventual_consistency_improve branch March 3, 2021 20:16