-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(clustering) config export on demand #9362
Conversation
0a5d9d2
to
23ff5ae
Compare
This reminds me of this: |
fix FT-3308
93c431c
to
030df2d
Compare
yeah, we reduced 4 times of the process to 2, and now it should be 1 |
This also reminds me of this, @suika-kong please read carefully all the comments there: |
While I think That is:
|
That would be hard. We've discussed that earlier: the format of the 2 protocols are very different, and the most we could do here is to share the config table exported. I think that was the reason why we did not do it that time. |
Last time I checked it was not that hard. The wRPC is no different, it just wraps the above in its own protocol. What I said above is common, AFAIK. It means though a lot of code needs to be moved. |
I stand corrected, only the 1. and 2. are easily shareable. The 3. and 4. are not. |
@suika-kong if remove the duplicate code is too much work to do, let's not it do it in this PR (but sure to have a ticket to record it). As CP mostly takes to DP in either of the protocol, making it "on demand" should solve the CP 2x latency issue in most cases, let's focus on ship that and also solve the caching issue Aapo mentioned in 8840. |
|
||
elseif err ~= "timeout" then | ||
if err ~= "timeout" then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ok
is truth-y but self.clients
is empty, this branch will be followed because err
will be nil
and generate a confusing error log entry (I think).
Maybe it would be more straightforward to put the self.clients
check into push_config
?
function _M:push_config()
if not next(self.clients) then
return true
end
local start = ngx_now()
local payload, err = self:export_deflated_reconfigure_payload()
if not payload then
-- ...
end
...otherwise I think this needs to be changed to if err and err ~= "timeout" then
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, maybe the self.clients
check could be pushed to the event handler? This would be simple for both control_plane
and wrpc_control_plane
:
kong.worker_events.register(function(_)
if next(self.clients) and push_config_semaphore:count() <= 0 then
-- the following line always executes immediately after the `if` check
-- because `:count` will never yield, end result is that the semaphore
-- count is guaranteed to not exceed 1
push_config_semaphore:post()
end
end, "clustering", "push_config")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@flrgh The first approach has some issues, if we return true
inside push_config
, then the timer of push_config_loop
is still reset which means that loop will wait for db_update_frequency
before even pushing out another updated config.
The second issue might work (I did not spot immediate issue with it), but it seems to be a bit more dangerous as there are a lot of asynchronous going on between worker events and the semaphore. It's a bit harder to argue the correctness.
My thought:
If we change the push_config_loop
function to do the following:
while not exiting() do
local ok, err = push_config_semaphore:wait(1)
if exiting() then
return
end
if ok then
if next(self.clients) do
ok, err = pcall(self.push_config, self)
...
Does this also work? It seems to be simpler than the current approach and does not affects the coalescing timer. Basically, we consume the semaphore, but pretends nothing happened. The current code does the same thing, it just seems to be more complex on the error handling part.
@@ -305,7 +308,10 @@ local function push_config_loop(premature, self, push_config_semaphore, delay) | |||
if exiting() then | |||
return | |||
end | |||
if ok then | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com>
Co-authored-by: Michael Martin <flrgh@protonmail.com>
if ok then | ||
-- we still need to wait for config for if no clients another run, just in case new client is in | ||
if ok and next(self.clients) then | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com>
@@ -540,7 +547,9 @@ local function push_config_loop(premature, self, push_config_semaphore, delay) | |||
if exiting() then | |||
return | |||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end | |
end | |
fix FT-3308