Skip to content
This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

Fail scheduling all pods that are not part of consul when the webhook is offline #1024

Merged
merged 13 commits into from
Jul 13, 2021

Conversation

kschoche
Copy link
Contributor

@kschoche kschoche commented Jul 6, 2021

Fail scheduling all pods that are not labeled with app: consul.name.
This ensures that no user apps are inadvertently scheduled and skip mutation while the webhook is offline (after consul has been installed).
I chose to match on app: consul.name so we do not fail to schedule our own pods in case the webhook object is applied to k8s before the rest of our consul components are scheduled.

Changes proposed in this PR:

  • Set failure policy of the webhook to Fail instead of Ignore
  • Add a new field to connectInject stanza: connectInject.failurePolicy default to Fail which controls the behaviour.

How I've tested this PR:

  1. Manually tested by applying this patch which sets the webhook to unready, deploy consul.
diff --git a/templates/connect-inject-deployment.yaml b/templates/connect-inject-deployment.yaml
index 9c35728..e0745d9 100644
--- a/templates/connect-inject-deployment.yaml
+++ b/templates/connect-inject-deployment.yaml
@@ -39,6 +39,13 @@ spec:
       serviceAccountName: {{ template "consul.fullname" . }}-connect-injector-webhook-svc-account
       containers:
         - name: sidecar-injector
+          readinessProbe:
+            exec:
+              command:
+                - cat
+                - /tmp/healthy
+            initialDelaySeconds: 30
+            periodSeconds: 5
           image: "{{ default .Values.global.imageK8S .Values.connectInject.image }}"
           ports:
           - containerPort: 8080
  1. Wait for everything to come "online" and the readinessProbe to fail so that both copies of the webhook are unready:
% k get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
consul-consul-4v7ml                                               1/1     Running   0          66s
consul-consul-5sgcd                                               1/1     Running   0          66s
consul-consul-connect-injector-webhook-deployment-84cb5c9758crk   0/1     Running   0          65s
consul-consul-connect-injector-webhook-deployment-84cb5c97llxd9   0/1     Running   0          65s
consul-consul-controller-7dbd5c45d4-6wbt7                         1/1     Running   0          65s
consul-consul-p4wq4                                               1/1     Running   0          65s
consul-consul-server-0                                            1/1     Running   0          65s
consul-consul-webhook-cert-manager-d4598f84-t8qg9                 1/1     Running   0          65s
consul-consul-x9k46                                               1/1     Running   0          66s
  1. Deploy a sample app that is connect injected and see that it does not get scheduled.
  2. Set a copy of the webhook to healthy: kubectl exec -it consul-consul-connect-injector-webhook-deployment-84cb5c97llxd9 -- touch /tmp/healthy
  3. Sample app gets scheduled:
 % k get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
consul-consul-4v7ml                                               1/1     Running   0          7m13s
consul-consul-5sgcd                                               1/1     Running   0          7m13s
consul-consul-connect-injector-webhook-deployment-84cb5c9758crk   0/1     Running   0          7m12s
consul-consul-connect-injector-webhook-deployment-84cb5c97llxd9   1/1     Running   0          7m12s
consul-consul-controller-7dbd5c45d4-6wbt7                         1/1     Running   0          7m12s
consul-consul-p4wq4                                               1/1     Running   0          7m12s
consul-consul-server-0                                            1/1     Running   0          7m12s
consul-consul-webhook-cert-manager-d4598f84-t8qg9                 1/1     Running   0          7m12s
consul-consul-x9k46                                               1/1     Running   0          7m13s
whoami-75f5b5f654-4vtcs                                           2/2     Running   0          5m28s

How I expect reviewers to test this PR:
Code review
CI run against GKE: https://app.circleci.com/pipelines/github/hashicorp/consul-helm/3370/workflows/2ae29e9c-a234-407a-a4c9-84a96fad0979
CI run against Kind: https://app.circleci.com/pipelines/github/hashicorp/consul-helm/3369/workflows/5a0f3727-e04a-4b03-902a-0bc0137e45f4

Checklist:

  • Bats tests added
  • CHANGELOG entry added (HashiCorp engineers only, community PRs should not add a changelog entry)

@kschoche kschoche added enhancement New feature or request area/connect Related to Connect, e.g. injection labels Jul 6, 2021
@kschoche kschoche requested a review from a team July 6, 2021 16:45
@kschoche kschoche self-assigned this Jul 6, 2021
@kschoche kschoche requested review from ndhanushkodi and ishustava and removed request for a team July 6, 2021 16:45
@kschoche kschoche marked this pull request as draft July 6, 2021 23:01
@kschoche kschoche marked this pull request as ready for review July 9, 2021 22:51
Copy link
Contributor

@ishustava ishustava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this 🎉 🎉 🎉

test/acceptance/framework/config/config.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
values.yaml Outdated Show resolved Hide resolved
kschoche and others added 2 commits July 9, 2021 16:54
Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com>
CHANGELOG.md Outdated Show resolved Hide resolved
@david-yu david-yu changed the title Fail scehduling all pods that are not part of consul when the webhook is offline Fail scheduling all pods that are not part of consul when the webhook is offline Jul 12, 2021
Copy link
Contributor

@ndhanushkodi ndhanushkodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work Kyle!!

values.yaml Outdated Show resolved Hide resolved
values.yaml Outdated Show resolved Hide resolved
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/connect Related to Connect, e.g. injection enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants