Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure system recovers quickly from failures or drift in state #326

Merged
merged 4 commits into from
Sep 22, 2020

Conversation

thisisnotashwin
Copy link
Contributor

@thisisnotashwin thisisnotashwin commented Sep 17, 2020

Changes proposed in this PR:

  • helm upgrades will cause the caBundle to get reset on the mutating webhooks. By "reconciling" the state of the system every second, we ensure the drift in this state has a minimal impact on the uptime of the system. it will now verify that the certificates as well as the CA bundle are "correct" every second and update them if they arent.

  • this will also ensure any edits made to the secret or the webhook configuration, which could lead to downtime in the system, are recovered from within a second.

How I've tested this PR:

  • helm installed consul with the image.
  • performed a helm upgrade once the system had reached a stable state.
  • attempted the creation of a service default and it did not throw an x509 error.

How I expect reviewers to test this PR:

  • kubectl edit the secret with the certificate and/or the MWC for the webhook certs. The system should recover within a few seconds and the webhook and the webhook configuration should be able to communicate with each other successfully.

Image for testing:
ashwinvenkatesh/consul-k8s:webhook-certs@sha256:25b85fd6ccbe7e141cd1541264c40cb15f28c1ba815a4df58c6a3949b583e91f

Checklist:

  • Tests added

@thisisnotashwin thisisnotashwin requested review from lkysow, a team and ishustava and removed request for a team September 17, 2020 22:08
subcommand/webhook-cert-manager/command.go Show resolved Hide resolved
subcommand/webhook-cert-manager/command.go Outdated Show resolved Hide resolved
subcommand/webhook-cert-manager/command_test.go Outdated Show resolved Hide resolved
subcommand/webhook-cert-manager/command_test.go Outdated Show resolved Hide resolved
- helm upgrades will cause the caBundle to get reset on the mutating
  webhooks. By "reconciling" the state of the system every second, we
ensure the drift in this state has a minimal impact on the uptime of the
system. it will now verify that the certificates as well as the CA
bundle are "correct" every second and update them if they arent.
Copy link
Member

@lkysow lkysow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I haven't run this code.

@thisisnotashwin thisisnotashwin added type/bug Something isn't working theme/crds labels Sep 18, 2020
Copy link
Contributor

@ishustava ishustava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good! I've left some comments about improving the tests, but otherwise, this looks good to me. Let me know what you think!

subcommand/webhook-cert-manager/command_test.go Outdated Show resolved Hide resolved
subcommand/webhook-cert-manager/command.go Show resolved Hide resolved
subcommand/webhook-cert-manager/command.go Show resolved Hide resolved
subcommand/webhook-cert-manager/command_test.go Outdated Show resolved Hide resolved
subcommand/webhook-cert-manager/command.go Outdated Show resolved Hide resolved
subcommand/webhook-cert-manager/command.go Outdated Show resolved Hide resolved
Ashwin Venkatesh and others added 2 commits September 21, 2020 18:43
Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com>
Copy link
Contributor

@ishustava ishustava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, @ashwin-venkatesh !!

@thisisnotashwin thisisnotashwin merged commit cc1d16d into crd-controller-base Sep 22, 2020
@thisisnotashwin thisisnotashwin deleted the webhook-cert-recover branch September 22, 2020 17:14
ndhanushkodi pushed a commit to ndhanushkodi/consul-k8s that referenced this pull request Jul 9, 2021
Add PodSecurityPolicies for server-acl-init
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/crds type/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants