Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peering: support peering establishment over mesh gateways, remove ability to pass external addresses to token generation endpoint #1610

Merged
merged 22 commits into from
Oct 20, 2022
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
2f52a5e
start adding grpc-tls port
ndhanushkodi Oct 7, 2022
e002fe4
try running with grpc tls port and explicitly disable getting addrs f…
ndhanushkodi Oct 11, 2022
908a585
the tls port shouldn't be separate, and peering should require tls
ndhanushkodi Oct 11, 2022
3caf84a
not sure if tls and acls must both be enabled
ndhanushkodi Oct 11, 2022
e696145
use mesh resource so conn goes through mesh gateways and run only
ndhanushkodi Oct 12, 2022
bfc0bf0
failing test run with backwards compatible grpc tls port
ndhanushkodi Oct 13, 2022
eefa004
this works if you remove the lb sourceRange
ndhanushkodi Oct 18, 2022
3485e26
this config passes the first test case
ndhanushkodi Oct 19, 2022
42994da
remove the static ip, put back in the test cases, and set the images …
ndhanushkodi Oct 19, 2022
2fde3bf
bring back ent test and set the right replicas
ndhanushkodi Oct 19, 2022
e66b034
remove server.exposeService from peering tests
ndhanushkodi Oct 19, 2022
abd53e1
remove external address support from generate token endpoint
ndhanushkodi Oct 20, 2022
cefc722
add bats test
ndhanushkodi Oct 20, 2022
5ea134f
remove more unused code
ndhanushkodi Oct 20, 2022
fb6d811
run all tests with latest consul image and add changelog
ndhanushkodi Oct 20, 2022
688bf18
try with my image to compare to consul dev image
ndhanushkodi Oct 20, 2022
f3782ae
run latest image, peering tests, with longer timeout since exported s…
ndhanushkodi Oct 20, 2022
6a97224
set ci back to normal
ndhanushkodi Oct 20, 2022
da61b7d
try with xds fix
ndhanushkodi Oct 20, 2022
5e4f6c8
address review comments
ndhanushkodi Oct 20, 2022
aa7391a
can't use xds image since it's oss only
ndhanushkodi Oct 20, 2022
06a9ec7
revert log level
ndhanushkodi Oct 20, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -520,7 +520,7 @@ jobs:
- ~/.go_workspace/pkg/mod
- run: mkdir -p $TEST_RESULTS
- run-acceptance-tests:
failfast: true
failfast: false
additional-flags: -use-kind -kubecontext="kind-dc1" -secondary-kubecontext="kind-dc2" -consul-image=hashicorppreview/consul-enterprise:1.14-dev
- store_test_results:
path: /tmp/test-results
Expand Down Expand Up @@ -1065,7 +1065,7 @@ jobs:
- run: mkdir -p $TEST_RESULTS

- run-acceptance-tests:
additional-flags: -kubeconfig="$primary_kubeconfig" -secondary-kubeconfig="$secondary_kubeconfig" -enable-openshift -enable-transparent-proxy -consul-image=hashicorppreview/consul-enterprise:1.14-dev
additional-flags: -kubeconfig="$primary_kubeconfig" -secondary-kubeconfig="$secondary_kubeconfig" -enable-openshift -enable-transparent-proxy -consul-image=hashicorppreview/consul-enterprise:1.14-dev -run TestPeering_Connect

- store_test_results:
path: /tmp/test-results
Expand Down Expand Up @@ -1296,6 +1296,21 @@ workflows:
# context: consul-ci
# requires:
# - dev-upload-docker
- cleanup-gcp-resources
- cleanup-azure-resources
- cleanup-eks-resources
- acceptance-gke-1-23:
requires:
- cleanup-gcp-resources
- dev-upload-docker
- acceptance-eks-1-21:
requires:
- cleanup-eks-resources
- dev-upload-docker
- acceptance-aks-1-22:
requires:
- cleanup-azure-resources
- dev-upload-docker
nightly-acceptance-tests:
triggers:
- schedule:
Expand Down
5 changes: 4 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
## UNRELEASED

FEATURES:
* Peering: Add support for `PeerThroughMeshGateways` in Mesh CRD. [[GH-1478](https://github.com/hashicorp/consul-k8s/pull/1478)]
* Peering:
* Add support for `PeerThroughMeshGateways` in Mesh CRD. [[GH-1478](https://github.com/hashicorp/consul-k8s/pull/1478)]
* move support for customizing the server addresses in peering token generation. Instead, mesh gateways should be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure but should the token changes from the helm chart be listed as a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup added it to breaking instead!

to establish peering connections if the server pods are not directly reachable. [[GH-1610](https://github.com/hashicorp/consul-k8s/pull/1610)]

BREAKING CHANGES:
* Helm:
Expand Down
2 changes: 1 addition & 1 deletion acceptance/framework/k8s/deploy.go
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ func CheckStaticServerConnectionMultipleFailureMessages(t *testing.T, options *k
expectedOutput = expectedSuccessOutput
}

retrier := &retry.Timer{Timeout: 160 * time.Second, Wait: 2 * time.Second}
retrier := &retry.Timer{Timeout: 320 * time.Second, Wait: 2 * time.Second}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because I noticed that by the time the peering is established and the exported service is actually replicated to the other side, this curl was timing out. It is taking a bit longer for the peering to get set up-- things that cause this are dialing a follower node and needing to retry the peering, a delta xds bug that causes the mesh gateways to take extra time to get their config.


args := []string{"exec", "deploy/" + sourceApp, "-c", sourceApp, "--", "curl", "-vvvsSf"}
args = append(args, curlArgs...)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
resources:
- meshpeering.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
apiVersion: consul.hashicorp.com/v1alpha1
kind: Mesh
metadata:
name: mesh
spec:
peering:
peerThroughMeshGateways: true
21 changes: 17 additions & 4 deletions acceptance/tests/peering/peering_connect_namespaces_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,9 @@ func TestPeering_ConnectNamespaces(t *testing.T) {
staticClientPeerClusterContext := env.Context(t, environment.SecondaryContextName)

commonHelmValues := map[string]string{
"global.imageK8S": "ndhanushkodi/consul-k8s-dev:pmgw1",
"global.image": "ndhanushkodi/consul-dev:peermeshgw2",

"global.peering.enabled": "true",
"global.enableConsulNamespaces": "true",

Expand Down Expand Up @@ -135,8 +138,6 @@ func TestPeering_ConnectNamespaces(t *testing.T) {
staticServerPeerHelmValues["server.exposeGossipAndRPCPorts"] = "true"
staticServerPeerHelmValues["meshGateway.service.type"] = "NodePort"
staticServerPeerHelmValues["meshGateway.service.nodePort"] = "30100"
staticServerPeerHelmValues["server.exposeService.type"] = "NodePort"
staticServerPeerHelmValues["server.exposeService.nodePort.grpc"] = "30200"
}

releaseName := helpers.RandomName()
Expand All @@ -159,8 +160,6 @@ func TestPeering_ConnectNamespaces(t *testing.T) {
staticClientPeerHelmValues["server.exposeGossipAndRPCPorts"] = "true"
staticClientPeerHelmValues["meshGateway.service.type"] = "NodePort"
staticClientPeerHelmValues["meshGateway.service.nodePort"] = "30100"
staticClientPeerHelmValues["server.exposeService.type"] = "NodePort"
staticClientPeerHelmValues["server.exposeService.nodePort.grpc"] = "30200"
}

helpers.MergeMaps(staticClientPeerHelmValues, commonHelmValues)
Expand All @@ -169,6 +168,20 @@ func TestPeering_ConnectNamespaces(t *testing.T) {
staticClientPeerCluster := consul.NewHelmCluster(t, staticClientPeerHelmValues, staticClientPeerClusterContext, cfg, releaseName)
staticClientPeerCluster.Create(t)

// Create Mesh resource to use mesh gateways.
logger.Log(t, "creating mesh config")
kustomizeMeshDir := "../fixtures/bases/mesh-peering"

k8s.KubectlApplyK(t, staticServerPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
k8s.KubectlDeleteK(t, staticServerPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
})

k8s.KubectlApplyK(t, staticClientPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
k8s.KubectlDeleteK(t, staticClientPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
})

// Create the peering acceptor on the client peer.
k8s.KubectlApply(t, staticClientPeerClusterContext.KubectlOptions(t), "../fixtures/bases/peering/peering-acceptor.yaml")
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
Expand Down
34 changes: 29 additions & 5 deletions acceptance/tests/peering/peering_connect_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,10 +53,14 @@ func TestPeering_Connect(t *testing.T) {

for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
fmt.Println("starting test")
Copy link
Contributor

@kschoche kschoche Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace the Println's with Logger.Log or t.Log or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops thanks i forgot to remove those, I was just using those locally so i could see what was going on while running the test!

staticServerPeerClusterContext := env.DefaultContext(t)
staticClientPeerClusterContext := env.Context(t, environment.SecondaryContextName)

commonHelmValues := map[string]string{
"global.imageK8S": "ndhanushkodi/consul-k8s-dev:pmgw1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget these before merge!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

"global.image": "ndhanushkodi/consul-dev:peermeshgw2",

"global.peering.enabled": "true",

"global.tls.enabled": "true",
Expand All @@ -73,6 +77,7 @@ func TestPeering_Connect(t *testing.T) {

"dns.enabled": "true",
"dns.enableRedirection": strconv.FormatBool(cfg.EnableTransparentProxy),
"peering.tokenGeneration.serverAddresses.source": "consul",
}

staticServerPeerHelmValues := map[string]string{
Expand All @@ -90,14 +95,13 @@ func TestPeering_Connect(t *testing.T) {
staticServerPeerHelmValues["server.exposeGossipAndRPCPorts"] = "true"
staticServerPeerHelmValues["meshGateway.service.type"] = "NodePort"
staticServerPeerHelmValues["meshGateway.service.nodePort"] = "30100"
staticServerPeerHelmValues["server.exposeService.type"] = "NodePort"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: why are we removing the grpc port configs for each of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need the exposeService at all for peering connections, so this removes any configuration of it.

staticServerPeerHelmValues["server.exposeService.nodePort.grpc"] = "30200"
}

releaseName := helpers.RandomName()

helpers.MergeMaps(staticServerPeerHelmValues, commonHelmValues)

fmt.Println("install first peer")
// Install the first peer where static-server will be deployed in the static-server kubernetes context.
staticServerPeerCluster := consul.NewHelmCluster(t, staticServerPeerHelmValues, staticServerPeerClusterContext, cfg, releaseName)
staticServerPeerCluster.Create(t)
Expand All @@ -107,23 +111,38 @@ func TestPeering_Connect(t *testing.T) {
}

if !cfg.UseKind {
staticServerPeerHelmValues["server.replicas"] = "3"
staticClientPeerHelmValues["server.replicas"] = "3"
}

if cfg.UseKind {
staticClientPeerHelmValues["server.exposeGossipAndRPCPorts"] = "true"
staticClientPeerHelmValues["meshGateway.service.type"] = "NodePort"
staticClientPeerHelmValues["meshGateway.service.nodePort"] = "30100"
staticClientPeerHelmValues["server.exposeService.type"] = "NodePort"
staticClientPeerHelmValues["server.exposeService.nodePort.grpc"] = "30200"
}

helpers.MergeMaps(staticClientPeerHelmValues, commonHelmValues)

fmt.Println("install second (client) peer")
// Install the second peer where static-client will be deployed in the static-client kubernetes context.
staticClientPeerCluster := consul.NewHelmCluster(t, staticClientPeerHelmValues, staticClientPeerClusterContext, cfg, releaseName)
staticClientPeerCluster.Create(t)

fmt.Println("configure mesh resource so peering token will have mesh gateway addrs")
// Create Mesh resource to use mesh gateways.
logger.Log(t, "creating mesh config")
kustomizeMeshDir := "../fixtures/bases/mesh-peering"

k8s.KubectlApplyK(t, staticServerPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
k8s.KubectlDeleteK(t, staticServerPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
})

k8s.KubectlApplyK(t, staticClientPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
k8s.KubectlDeleteK(t, staticClientPeerClusterContext.KubectlOptions(t), kustomizeMeshDir)
})

fmt.Println("create acceptor on (client) peer")
// Create the peering acceptor on the client peer.
k8s.KubectlApply(t, staticClientPeerClusterContext.KubectlOptions(t), "../fixtures/bases/peering/peering-acceptor.yaml")
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
Expand All @@ -138,9 +157,13 @@ func TestPeering_Connect(t *testing.T) {
require.NotEmpty(r, acceptorSecretName)
})

fmt.Println("copying secret from client peer to server peer, printing contents:")
// Copy secret from client peer to server peer.
k8s.CopySecret(t, staticClientPeerClusterContext, staticServerPeerClusterContext, "api-token")
resp, _ := k8s.RunKubectlAndGetOutputE(t, staticClientPeerClusterContext.KubectlOptions(t), "secret", "api-token", "-o", "yaml")
fmt.Println(resp)
ndhanushkodi marked this conversation as resolved.
Show resolved Hide resolved

fmt.Println("creating peering dialer on server peer")
// Create the peering dialer on the server peer.
k8s.KubectlApply(t, staticServerPeerClusterContext.KubectlOptions(t), "../fixtures/bases/peering/peering-dialer.yaml")
helpers.Cleanup(t, cfg.NoCleanupOnFailure, func() {
Expand Down Expand Up @@ -261,6 +284,7 @@ func TestPeering_Connect(t *testing.T) {
} else {
k8s.CheckStaticServerConnectionSuccessful(t, staticClientOpts, staticClientName, "http://localhost:1234")
}

})
}
}
20 changes: 1 addition & 19 deletions charts/consul/templates/connect-inject-deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,7 @@
{{ template "consul.validateVaultWebhookCertConfiguration" . }}
{{- template "consul.reservedNamesFailer" (list .Values.connectInject.consulNamespaces.consulDestinationNamespace "connectInject.consulNamespaces.consulDestinationNamespace") }}
{{- $serverEnabled := (or (and (ne (.Values.server.enabled | toString) "-") .Values.server.enabled) (and (eq (.Values.server.enabled | toString) "-") .Values.global.enabled)) -}}
{{- $serverExposeServiceEnabled := (or (and (ne (.Values.server.exposeService.enabled | toString) "-") .Values.server.exposeService.enabled) (and (eq (.Values.server.exposeService.enabled | toString) "-") (or .Values.global.peering.enabled .Values.global.adminPartitions.enabled))) -}}
{{- if not (or (eq .Values.global.peering.tokenGeneration.serverAddresses.source "") (or (eq .Values.global.peering.tokenGeneration.serverAddresses.source "static") (eq .Values.global.peering.tokenGeneration.serverAddresses.source "consul"))) }}{{ fail "global.peering.tokenGeneration.serverAddresses.source must be one of empty string, 'consul' or 'static'" }}{{ end }}
{{- $serverExposeServiceEnabled := (or (and (ne (.Values.server.exposeService.enabled | toString) "-") .Values.server.exposeService.enabled) (and (eq (.Values.server.exposeService.enabled | toString) "-") .Values.global.adminPartitions.enabled)) -}}
{{- if and .Values.externalServers.enabled (not .Values.externalServers.hosts) }}{{ fail "externalServers.hosts must be set if externalServers.enabled is true" }}{{ end -}}
{{ template "consul.validateCloudConfiguration" . }}
# The deployment for running the Connect sidecar injector
Expand Down Expand Up @@ -141,23 +140,6 @@ spec:
-enable-cni={{ .Values.connectInject.cni.enabled }} \
{{- if .Values.global.peering.enabled }}
-enable-peering=true \
{{- if (eq .Values.global.peering.tokenGeneration.serverAddresses.source "") }}
{{- if (and $serverEnabled $serverExposeServiceEnabled) }}
-read-server-expose-service=true \
{{- else }}
{{- if .Values.externalServers.enabled }}
{{- $port := .Values.externalServers.grpcPort }}
{{- range $h := .Values.externalServers.hosts }}
-token-server-address="{{ $h }}:{{ $port }}" \
{{- end }}
{{- end }}
{{- end }}
{{- end }}
{{- if (eq .Values.global.peering.tokenGeneration.serverAddresses.source "static") }}
{{- range $addr := .Values.global.peering.tokenGeneration.serverAddresses.static }}
-token-server-address="{{ $addr }}" \
{{- end }}
{{- end }}
{{- end }}
{{- if .Values.global.openshift.enabled }}
-enable-openshift \
Expand Down
2 changes: 1 addition & 1 deletion charts/consul/templates/mesh-gateway-deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -237,7 +237,7 @@ spec:
{{- if .Values.global.adminPartitions.enabled }}
-service-partition={{ .Values.global.adminPartitions.name }} \
{{- end }}
-log-level={{ default .Values.global.logLevel }} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitya mentioned above that she will remove it before merge. I have not idea why it didn't show up in the diffs for review. 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird!

-log-level="trace" \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

-log-json={{ .Values.global.logJSON }}
livenessProbe:
tcpSocket:
Expand Down
5 changes: 5 additions & 0 deletions charts/consul/templates/server-config-configmap.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,12 @@ data:
"data_dir": "/consul/data",
"domain": "{{ .Values.global.domain }}",
"ports": {
{{- if not .Values.global.tls.enabled }}
"grpc": 8502,
{{- end }}
{{- if .Values.global.tls.enabled }}
"grpc_tls": 8502,
{{- end }}
ndhanushkodi marked this conversation as resolved.
Show resolved Hide resolved
"serf_lan": {{ .Values.server.ports.serflan.port }}
},
"recursors": {{ .Values.global.recursors | toJson }},
Expand Down
Loading