Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow DNS resolution of the runner pod for all k8s setup #886

Merged
merged 6 commits into from
Sep 7, 2023

Conversation

syalioune
Copy link
Contributor

Draft PR related to #843 (comment) which content I'll replicate here.

First of all, thanks for the great work on this controller.
We've tried deploying recently the tf-controller in a standard GKE clusters and hit a wall because of DNS resolution issues. The controller is not able to resolve DNS names like 10-40-129-81.podinfo.pod.cluster.local.

This issue is best described here #365 (comment) and here #462 (comment). GKE uses Cloud DNS which does not provide IP based DNS name resolution like CoreDNS.

It's still very rough but revolves around :

  • Setup a tf-runner service in each allowed namespaces
  • Add hostname=terraform_crd_name, subdomain=tf-runner fields to the runner pod so that an A record : terraform_crd_name.tf-runner.namespace.svc.cluster_domain is automatically created
  • Add a SAN into the runner generated certificate

Preliminary tests shows that it works.

Before going further, I'm looking for community/maintainer feedbacks 😄

  • Is this proposal in line with the project direction ?
  • Should that method replace entirely pod's ip based resolution or be an alternative with an opt-in flag/option ?
  • Should the controller be responsible for : checking the service existence in the allowed namespace ? creating the service if it does not exist ?

Cheers

Comment on lines 1 to 18
{{- range include "tf-controller.runner.allowedNamespaces" . | fromJsonArray }}
---
apiVersion: v1
kind: Service
metadata:
name: tf-runner
namespace: {{ . }}
spec:
clusterIP: None
ports:
- name: grpc
port: 30000
selector:
app.kubernetes.io/created-by: tf-controller
app.kubernetes.io/name: tf-runner
{{- end }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @syalioune

We want to rely only on Pod hostnames without introducing the Service layer.
Would it be possible to do so with your hacks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A service is mandatory in my proposal to have a consistent DNS entry generated regardless of the k8s cluster setup.
A service with pod hostnames/subdomains is the best compromise since it requires only one service per namespace instead of a service per runner pod.

@@ -233,6 +234,8 @@ func (r *TerraformReconciler) runnerPodSpec(terraform infrav1.Terraform, tlsSecr
}

return v1.PodSpec{
Hostname: terraform.Name,
Subdomain: "tf-runner",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we introduce --use-pod-subdomain-resolution to guard here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, will do !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two fields are now guarded by --use-pod-subdomain-resolution

mtls/rotator.go Outdated
cert, key, err := cr.createCertPEM(caArtifacts, hostname, time.Now().Add(-1*time.Hour), caArtifacts.validUntil)
hostnames := []string{
fmt.Sprintf("*.%s.pod.%s", namespace, cr.ClusterDomain),
fmt.Sprintf("*.tf-runner.%s.svc.%s", namespace, cr.ClusterDomain),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we introduce --use-pod-subdomain-resolution to also guard here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, will do !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 13 to 15
selector:
app.kubernetes.io/created-by: tf-controller
app.kubernetes.io/name: tf-runner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
selector:
app.kubernetes.io/created-by: tf-controller
app.kubernetes.io/name: tf-runner
selector:
app.kubernetes.io/created-by: tf-controller
app.kubernetes.io/name: tf-runner

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please introduce the new chart value called usePodSubdomainResolution to

  1. add --use-pod-subdomain-resolution flag to the controller (which requires you to create that flag for it too)
  2. guard generation of Service objects here.

Please note that we need tf-runner Service by default for flux-system namespace too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifications performed 😄

@chanwit
Copy link
Collaborator

chanwit commented Aug 21, 2023

Thank you for your contributions @syalioune

Maybe we want to go with the subdomain resolution as you suggested.
And we also need a new flag (see above comments) for the controller to make sure this new feature does not affect the current users.

Signed-off-by: Alioune SY <sy_alioune@yahoo.fr>
Signed-off-by: syalioune <sy_alioune@yahoo.fr>
@syalioune syalioune force-pushed the feature/gke-dns-connectivity branch from e1d4b00 to 574b05e Compare September 2, 2023 22:24
@syalioune syalioune marked this pull request as ready for review September 2, 2023 22:26
@syalioune syalioune requested a review from chanwit September 2, 2023 22:27
chanwit
chanwit previously approved these changes Sep 4, 2023
Copy link
Collaborator

@chanwit chanwit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thank you so much @syalioune 🥇

cmd/manager/main.go Outdated Show resolved Hide resolved
app.kubernetes.io/created-by: tf-controller
app.kubernetes.io/name: tf-runner
{{- end }}
{{- end }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the yaml file needs a new line I believe.

Co-authored-by: souleb <bah.soule@gmail.com>
@chanwit chanwit self-assigned this Sep 6, 2023
charts/tf-controller/values.yaml Outdated Show resolved Hide resolved
@chanwit chanwit merged commit 293a6c7 into flux-iac:main Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants