-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
peering: support peering establishment over mesh gateways, remove ability to pass external addresses to token generation endpoint #1610
Conversation
…rom expose service and use mesh gw image
when running with these images: go test -run TestPeering_Connect -p 1 -timeout 15m -kubeconfig=/Users/nitya/.kube/gke-peering-mesh-1 -secondary-kubeconfig=/Users/nitya/.kube/gke-peering-mesh-2 -enable-enterprise -enable-multi-cluster -consul-k8s-image=ndhanushkodi/consul-k8s-dev:pmgw1 -consul-image=ndhanushkodi/consul-dev:peermeshgw2 -failfast -no-cleanup-on-failure on clusters peering-mesh-1 and peering-mesh-2
…to the right ones
@@ -237,7 +237,7 @@ spec: | |||
{{- if .Values.global.adminPartitions.enabled }} | |||
-service-partition={{ .Values.global.adminPartitions.name }} \ | |||
{{- end }} | |||
-log-level={{ default .Values.global.logLevel }} \ | |||
-log-level="trace" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove
…ervices are being received only around 3 min into trying out the static server connection
@@ -96,7 +96,7 @@ func CheckStaticServerConnectionMultipleFailureMessages(t *testing.T, options *k | |||
expectedOutput = expectedSuccessOutput | |||
} | |||
|
|||
retrier := &retry.Timer{Timeout: 160 * time.Second, Wait: 2 * time.Second} | |||
retrier := &retry.Timer{Timeout: 320 * time.Second, Wait: 2 * time.Second} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because I noticed that by the time the peering is established and the exported service is actually replicated to the other side, this curl was timing out. It is taking a bit longer for the peering to get set up-- things that cause this are dialing a follower node and needing to retry the peering, a delta xds bug that causes the mesh gateways to take extra time to get their config.
@@ -53,10 +53,14 @@ func TestPeering_Connect(t *testing.T) { | |||
|
|||
for _, c := range cases { | |||
t.Run(c.name, func(t *testing.T) { | |||
fmt.Println("starting test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace the Println's with Logger.Log or t.Log or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops thanks i forgot to remove those, I was just using those locally so i could see what was going on while running the test!
@@ -90,14 +95,13 @@ func TestPeering_Connect(t *testing.T) { | |||
staticServerPeerHelmValues["server.exposeGossipAndRPCPorts"] = "true" | |||
staticServerPeerHelmValues["meshGateway.service.type"] = "NodePort" | |||
staticServerPeerHelmValues["meshGateway.service.nodePort"] = "30100" | |||
staticServerPeerHelmValues["server.exposeService.type"] = "NodePort" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ: why are we removing the grpc port configs for each of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer need the exposeService at all for peering connections, so this removes any configuration of it.
@@ -237,7 +237,7 @@ spec: | |||
{{- if .Values.global.adminPartitions.enabled }} | |||
-service-partition={{ .Values.global.adminPartitions.name }} \ | |||
{{- end }} | |||
-log-level={{ default .Values.global.logLevel }} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitya mentioned above that she will remove it before merge. I have not idea why it didn't show up in the diffs for review. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic! I just have a few comments but nothing breaking/blocking!
staticServerPeerClusterContext := env.DefaultContext(t) | ||
staticClientPeerClusterContext := env.Context(t, environment.SecondaryContextName) | ||
|
||
commonHelmValues := map[string]string{ | ||
"global.imageK8S": "ndhanushkodi/consul-k8s-dev:pmgw1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget these before merge!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
CHANGELOG.md
Outdated
* Peering: Add support for `PeerThroughMeshGateways` in Mesh CRD. [[GH-1478](https://github.com/hashicorp/consul-k8s/pull/1478)] | ||
* Peering: | ||
* Add support for `PeerThroughMeshGateways` in Mesh CRD. [[GH-1478](https://github.com/hashicorp/consul-k8s/pull/1478)] | ||
* move support for customizing the server addresses in peering token generation. Instead, mesh gateways should be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure but should the token changes from the helm chart be listed as a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup added it to breaking instead!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have not had a chance to test it yet but assuming we're 🍏 everywhere I don't have any blocking comments beyond seconding Curt's comment about probably marking this as a breaking change in the changelog!
…lity to pass external addresses to token generation endpoint (#1610)
Changes proposed in this PR:
In Consul 1.13, the way to establish a peering across 2 different networks was to use the expose-servers service to make the Consul servers reachable for peering establishment, and pass the externally reachable addresses of that service into the generate-token endpoint. This way, the peering token would contain these externally reachable server addresses rather than the Consul server pod ips.
Now, with support for peering establishment over mesh gateways, we can use the mesh gateways to reach the servers instead. So this PR removes support entirely for using the expose-servers service for peering. However, the service itself should remain since that will be used for admin partitions.
Now, the peering test uses mesh gateways for peering establishment by configuring the mesh config entry.
How I've tested this PR:
Ran the test locally, and verified via logs that peerings are being established both through the local and remote mesh gateways.
Ran the acceptance tests against all the clouds, passing build: https://app.circleci.com/pipelines/github/hashicorp/consul-k8s/8588/workflows/8fdab6f0-0fc6-48d4-abbc-5d86535ce55b
How I expect reviewers to test this PR:
👀
Checklist: