
Chore/wrpc inline version negotiation #8775

Closed

Conversation

@javierguerragiraldez (Contributor) commented May 10, 2022

Move the services/versions negotiation to the wRPC connection.

Because of limitations in the lua-resty-websocket library, the HTTP-based version negotiation and the websocket connection used separate TCP connections instead of reusing one via keep-alive. So, in the (very common) case of a load balancer in front of a cluster of control plane nodes, a data plane could easily negotiate against one node and then connect to another. To avoid that situation, this PR changes the implementation of the version negotiation into a wRPC service with a single RPC method. The DP is responsible for making this call as soon as it is connected, so the agreed result of the negotiation is stored on both sides in the context of the peer connection.
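In other words, the negotiation becomes the first RPC on the freshly opened connection, and its result lives with that connection. A rough sketch of the data-plane side of that sequence is below; every function and field name in it is hypothetical and for illustration only, not code from this PR.

-- Hypothetical sketch of the DP-side sequence described above; all names
-- are made up for illustration and do not come from the PR.
local function dp_communicate(self)
  -- open the websocket and wrap it as a wRPC peer
  local peer, err = connect_wrpc_peer(self)
  if not peer then
    return nil, err
  end

  -- first call on the fresh connection: negotiate services/versions inline,
  -- so the agreement is necessarily with the CP node we are actually talking to
  local negotiated, nerr = negotiate_services(peer, self.conf)
  if not negotiated then
    return nil, nerr
  end

  -- both ends keep the agreed versions in the context of this peer
  -- connection; here that is simply a field on the peer object
  peer.negotiated_services = negotiated

  -- only then start the normal config-sync traffic
  return config_sync_loop(self, peer)
end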

Remove the version negotiation url and immediate handlers
First try on the wrpc url. If not there, try with the old protocol.
Replicate version negotiation data in wrpc call and handler
Add version negotiation service handler to the connection.
Split data plane connection into a separate step.

Add the initial negotiation call in the wrpc data plane.
@javierguerragiraldez force-pushed the chore/wrpc_inline_version_negotiation branch from 26e6ec5 to 6789286 on May 10, 2022 21:08
@javierguerragiraldez (Author) commented May 10, 2022

Still things to do, but it actually works and the error flows are in place.

  • try/retry/fallback structure (see the sketch after this list)
    • try wRPC first
    • fall back to the old protocol on 404
      • reliable detection of the 404
    • on a connection/no-host error, go to the loop of timed retries
  • data plane starts with the NegotiateServices RPC call
  • RPC handler on the control plane performs the negotiation
    • control plane saves the result in the data plane's context (same as the old HTTP negotiation)
  • data plane stores the result (same as the old HTTP negotiation)
  • remove unused code
  • fix/replace old tests
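
A minimal sketch of the try/retry/fallback structure in the checklist, with all helper names hypothetical (this is not the actual patch in kong/clustering/init.lua):

-- Hypothetical sketch of the checklist above; none of these helpers exist
-- under these names in the PR.
local RETRY_DELAY = 5  -- seconds, placeholder value

local function communicate(self)
  while not ngx.worker.exiting() do
    -- 1. try the wRPC endpoint first
    local conn, err = open_wrpc_connection(self)
    if conn then
      -- 2. the wRPC session starts with the NegotiateServices RPC call
      return run_wrpc_protocol(self, conn)
    end

    if is_404(err) then
      -- 3. no wRPC endpoint on this CP: fall back to the old protocol
      return run_old_protocol(self)
    end

    -- 4. connection refused / no host / anything else: timed retry loop
    ngx.sleep(RETRY_DELAY)
  end
end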

@javierguerragiraldez marked this pull request as ready for review May 10, 2022 21:23
@javierguerragiraldez requested a review from a team as a code owner May 10, 2022 21:23
  end

  if not ws_conn then
    ngx_log(ngx_DEBUG, _log_prefix, "ugh retrying from the start")
A reviewer (Member) commented:

ugh?

kong/clustering/init.lua (outdated review thread; resolved)
  local ws_conn, err = implementation.open_connection(self)
  ngx_log(ngx_DEBUG, _log_prefix, "got conn: ", tostring(ws_conn), ", err: ", tostring(err))

  -- TODO: find how to detect a 404 response
A reviewer (Member) commented:

?

@javierguerragiraldez (Author) replied:

The websocket library doesn't return the HTTP status code; instead, it returns the whole header line if it doesn't match the expected pattern. This, added to some inherited fallbacks in the specific server context in the nginx config, makes reliably detecting a 404 response a bit more involved than expected. I left the specifics for after I get the negotiation itself working (that is, now).

The "got conn:" log messages are just there to see the exact returned values; they won't stay in the final code.

@javierguerragiraldez (Author) replied:

As it turns out, any HTTP result is "good enough" for the library, so the user never sees a 404 or any other error.

I've just hacked a peeking socket to work around it.
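
For readers following along: once the first line of the HTTP response can be seen at all (which is what the peeking socket provides), extracting the status code is straightforward. The helper below is a hypothetical illustration of that parsing step, not the PR's peeking-socket code.

-- Hypothetical helper (not the PR's code): extract the HTTP status code from
-- the first response line, so the caller can tell a 404 ("fall back to the
-- old protocol") apart from other failures ("retry later").
local function status_from_response_line(line)
  -- e.g. "HTTP/1.1 404 Not Found" -> 404
  local status = line and line:match("^HTTP/%d%.%d (%d%d%d)")
  return status and tonumber(status) or nil
end

assert(status_from_response_line("HTTP/1.1 404 Not Found") == 404)
assert(status_from_response_line("HTTP/1.1 101 Switching Protocols") == 101)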

kong/clustering/init.lua (two more outdated review threads; resolved)
@@ -60,6 +61,7 @@ end
local function get_config_service(self)
  if not wrpc_config_service then
    wrpc_config_service = wrpc.new_service()
    version_negotiation.add_negotiation_service(wrpc_config_service, self.conf)
    wrpc_config_service:add("kong.services.config.v1.config")
A reviewer (Member) commented:

What prevents a CP node from executing a ConfigService.SyncConfig before the negotiation is finished?

A reviewer (Member) added:

The same question for the DP.

@javierguerragiraldez (Author) replied on May 11, 2022:

Nothing yet. I could put a "negotiated" flag in the peer object and just drop (or reject) everything else until then.
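
A rough sketch of that "negotiated" flag idea, purely hypothetical (the PR does not contain this code): wrap every other handler so it rejects calls until the flag is set.

-- Hypothetical sketch; not code from this PR.
local function guard_until_negotiated(peer, handler)
  return function(...)
    if not peer.negotiated then
      -- drop/reject anything arriving before NegotiateServices has finished
      return nil, "version negotiation not completed yet"
    end
    return handler(...)
  end
end

-- The negotiation handler itself would set `peer.negotiated = true`
-- (and record the agreed service versions) when it succeeds.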

Switch to the old websocket protocol if the wrpc connection returns a 404 error; any other error does the usual "retry in X seconds".
Remove http-specific code from version negotiation. Protocol itself is a special case of service `protocol.json`/`protocol.wrpc`. Default is to try both; if only one is mentioned, the other is skipped. Use services override in tests to specify the protocol.
Close all pending semaphores when the connection is closed.
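
Purely as an illustration of the "protocol as a special case of a service" idea in the commit message above, the requested-services data might look like the table below. This shape is a guess for illustration, not taken from the PR.

-- Assumed shape, for illustration only (not from the PR).
local services_requested = {
  { name = "protocol", versions = { "json", "wrpc" } },  -- default: offer both
  { name = "config",   versions = { "v1" } },
}

-- A test that wants to pin the old protocol would mention only one flavour,
-- so the other is skipped:
local json_only_override = {
  { name = "protocol", versions = { "json" } },
}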
@RobSerafini added this to the 3.0 milestone Jun 2, 2022
@StarlightIbuki (Contributor) commented:

The wRPC refactoring causes too many conflicts with this PR. We will create another PR and take this one as a reference.

}

message DPNodeDescription {
  string id = 1;
A reviewer (Contributor) commented:

The id should already be known from the URI parameters. Do we need to include it here?

}

message CPNodeDescription {
  string id = 1;
@StarlightIbuki (Contributor) commented on Jun 9, 2022:

Currently, the CP's ID (the cluster ID) is synced with the config.

A reviewer (Contributor) added:

Maybe a better way to do this is to restructure the config and get the ID from the negotiation.

@chronolaw (Contributor) commented:

Since we have merged another PR (#8926), I think we should close this.

@chronolaw closed this Jun 22, 2022
@bungle deleted the chore/wrpc_inline_version_negotiation branch October 14, 2022 06:25