Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix consul-telemetry-collector deployment when enableConsulNamespaces #3192

Closed
wants to merge 23 commits into from

Conversation

jjti
Copy link
Contributor

@jjti jjti commented Nov 8, 2023

Issues

@david-yu reported multiple issues with the Consul Telemetry Collector deployment. A different PR fixed a couple of consul-dataplane args: #3184

This attempts to fix some of the connect-inject issues with namespaces:

Once I create the namespace manually, the connect-init container is not able find the service because there are a couple of things in the deployment template that I think is not right: ... should be the namespace where the OTEL is deployed, that by default is consul. It is taking the namespace used the the SyncCatalog which is wrong because I am using just the connectInject.consulNamespaces.mirroringK8s=true, so the SyncCatalog is false and that value is default then

My guess at what happened when this was written was, we were trying to replicate the namespace selection logic of the Consul Dataplane Sidecar webhook:

We run into issues tho, doing that. Partly because that's using syncCatalog. But also because, if CTC is the first service in a namespace, we fail to inject it because the namespace does not exist.

Changes

This PR is an attempt at fixing some of those namespace/partition issues.

  1. adds a new pod label. If set, we ignore all other namespace logic and just deploy the Consul Telemetry Collector to the default namespace. This is the same as what mesh-gateway does: https://github.com/hashicorp/consul-k8s/blob/e11143561551f86e0e7897a3f2830daa67ef5d74/control-plane/connect-inject/controllers/endpoints/endpoints_controller.go#L692
  2. adds -service-partition arg to the consul-dataplane container

How I've tested this PR:

  • bats
  • the scenarios below
Non-default namespaces
$ k get pods --namespace test-namespace
NAME                                           READY   STATUS    RESTARTS   AGE
consul-connect-injector-74c6b4b7dc-5twg9       1/1     Running   0          14m
...
consul-telemetry-collector-68b7b446fc-nzwgb    2/2     Running   0          2m10s
Non-default partitions

How I expect reviewers to test this PR:

  • fact check my assumptions + approaches about consul-k8s-auth-method
  • fact check whether I missed some combination of logic in connect-inject so this will fail with some combo of namespace/partition settings that I'm ignorant of

Checklist:

{{- if .Values.global.enableConsulNamespaces }}
- name: CONSUL_NAMESPACE
value: {{ .Values.syncCatalog.consulNamespaces.consulDestinationNamespace }}
Copy link
Contributor Author

@jjti jjti Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this removes all ref to .Values.syncCatalog.consulNamespaces.consulDestinationNamespace. AFAICT, .Values.syncCatalog.enabled might be false but we'll still insert it here.

My guess is that this is a bug that was from copying this setting from the Sync Catalog Deployment:

-consul-destination-namespace={{ .Values.syncCatalog.consulNamespaces.consulDestinationNamespace }} \

@jjti jjti changed the title Jjtimmons/fix ctc namespace default Fix consul-telemetry-collector deployment's setting of CONSUL_LOGIN_NAMESPACE Nov 8, 2023
Base automatically changed from jjtimmons/fix-ctc-with-acls to main November 8, 2023 21:35
@jjti jjti force-pushed the jjtimmons/fix-ctc-namespace-default branch from 26240ad to e2fc3e9 Compare November 8, 2023 23:30
Add test for v2

Fix bats tests

Fix CONSUL_NAMESPACE to be default

Fix consul telemetry collector login namespace

Finish namespace/partition logic changes

Update v2 too

Remove more $roots
@jjti jjti force-pushed the jjtimmons/fix-ctc-namespace-default branch from e2fc3e9 to b2f7f00 Compare November 8, 2023 23:31
@jjti jjti changed the title Fix consul-telemetry-collector deployment's setting of CONSUL_LOGIN_NAMESPACE Fix consul-telemetry-collector deployment namespace/partition when non-default Nov 9, 2023
@jjti jjti added backport/1.1.x Backport to release/1.1.x branch backport/1.2.x This release branch is no longer active. backport/1.3.x labels Nov 9, 2023
@jjti jjti marked this pull request as ready for review November 9, 2023 17:41
Comment on lines 292 to 313
- -login-namespace={{ .Values.syncCatalog.consulNamespaces.consulDestinationNamespace }}
- -login-namespace={{ .Values.connectInject.consulNamespaces.consulDestinationNamespace }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the fixes: login-name is from connectInject destination namespace not syncCatalog

@david-yu
Copy link
Contributor

david-yu commented Nov 9, 2023

@jjti When you say you failed to test with non-default partitions is this because non-default partition support is not available in the telemetry collector? Or is it because you would need to help with partitions on K8s?

@jjti
Copy link
Contributor Author

jjti commented Nov 9, 2023

@jjti When you say you failed to test with non-default partitions is this because non-default partition support is not available in the telemetry collector? Or is it because you would need to help with partitions on K8s?

At this point it's a me problem. If you have a ref values.yaml for that that'd be appreciated. It think it'll work, but the changes here are copied from adjacent deployments like ingress-gateways

{{- if .Values.global.enableConsulNamespaces }}
{{- if .Values.syncCatalog.consulNamespaces.mirroringK8S }}
{{- if .Values.connectInject.consulNamespaces.mirroringK8S }}
- -service-namespace={{ .Values.connectInject.consulNamespaces.mirroringK8SPrefix }}{{ .Release.Namespace }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change here to use k8s namespace prefixing like we do in the webhook

@david-yu
Copy link
Contributor

david-yu commented Nov 9, 2023

@jjti
Copy link
Contributor Author

jjti commented Nov 9, 2023

@jjti Does this help? https://developer.hashicorp.com/consul/docs/enterprise/admin-partitions#install-on-the-non-default-partition-clusters-running-workloads or do you need a writeup specifically written for HCP?

It does and thanks. I've been trying to hack helm templates together but lack the background knowledge to be efficient. Thanks! I'll draft this until that's done

@jjti jjti marked this pull request as draft November 9, 2023 18:01
@jjti jjti changed the title Fix consul-telemetry-collector deployment namespace/partition when non-default Fix consul-telemetry-collector deployment in non-default partitions/namespaces Nov 14, 2023
jjti and others added 3 commits November 14, 2023 12:59
…the latest Consul submodules (#3194)

* Add replace directive in Go mod for Control Plane so that we pull in the latest Consul submodules

* Add a comment to explain why we need a replace directive
Thomas Eckert and others added 8 commits November 14, 2023 12:59
Add MeshGateway to `mesh.consul.hashicorp.com/v2beta1`
add validation to account for type change
* checkpoint

* checkpoint, passing test

* kitchen sink, NET-4992

* lint issue

* clean up unneeded calls

---------

Co-authored-by: Sarah Alsmiller <sarah.alsmiller@sarah.alsmiller-RQQ26PQ2L5>
…eway-reso… (#3200)

* Adds GatewayClassConfig and MeshGateway resources to the gateway-resources-configmap.yaml in the Helm chart

* Updates the configmap to include more fields for the gatewayClassConfig for mesh gateways
* chore: skaffold build experiment

* feedback: add experiment comments
* NET-6401 Stub MeshGateway controller

* Add MeshGateways resource to connect-inject-clusterrole

* Setup v2controller for MeshGateway

* Add bats test assertion for connect-inject-clusterrole

* Regenerate control-plane/config/rbac/role.yaml
@jjti jjti force-pushed the jjtimmons/fix-ctc-namespace-default branch from 42e19e4 to 83ca25c Compare November 14, 2023 17:59
@jjti jjti changed the title Fix consul-telemetry-collector deployment in non-default partitions/namespaces Fix consul-telemetry-collector deployment when enableConsulNamespaces Nov 14, 2023
@jjti jjti closed this Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1.x Backport to release/1.1.x branch backport/1.2.x This release branch is no longer active. backport/1.3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants