Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to delete namespace for HelmReleases impersonating ServiceAccount #270

Closed
darryl-sw opened this issue Jun 1, 2021 · 5 comments · Fixed by #738
Closed

Unable to delete namespace for HelmReleases impersonating ServiceAccount #270

darryl-sw opened this issue Jun 1, 2021 · 5 comments · Fixed by #738

Comments

@darryl-sw
Copy link

darryl-sw commented Jun 1, 2021

When:

  1. A HelmRelease sets .spec.serviceAccountName to "default"
  2. A Namespace is set for termination
  3. The ServiceAccount is terminated before the HelmRelease automatically
  4. The HelmController attempts to delete the HelmRelease
  5. The Namespace never terminates
{"level":"error","ts":"2021-06-01T01:30:41.625Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-296eaaf2","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
{"level":"error","ts":"2021-06-01T01:31:22.901Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-a2852804","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
{"level":"error","ts":"2021-06-01T01:32:59.594Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-d4cc7ecb","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
{"level":"error","ts":"2021-06-01T01:36:09.305Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-296eaaf2","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}

Hypothesis:
The Helm Controller is behaving as expected by attempting to impersonate the SA, and failing the operation because the SA does not exist. This behaviour is unexpected as a a termination of a namespace should result in the deletion of helm resources as well.

Alternatives:

  1. Impersonate only for creation of resources, and not for deletion of resources
  2. If a SA cannot be found, detect if a namespace is marked for deletion, and delete the helm resources
@hiddeco
Copy link
Member

hiddeco commented Jun 1, 2021

Impersonate only for creation of resources, and not for deletion of resources

This would completely defeat the purpose of allowing a service account to be defined, as the idea is that this provides fine grain configuration over the RBAC scope for the HelmRelease.

If a SA cannot be found, detect if a namespace is marked for deletion, and delete the helm resources

This would be better, but still open some room to fiddle around with e.g. the DeletionTimestamp to try to trick the controller into doing something it shouldn't.


I think the better solution would be to add our finalizer to the ServiceAccount, so that it is not terminated before the HelmRelease, but only after we have performed a clean uninstall and removed our finalizer. The tricky part here is de-registration when a user changes the service account configuration, as keeping it around would result in the service account never terminating.

@stefanprodan
Copy link
Member

@hiddeco we can't add our finalizer to the ServiceAccount as it will break the new impersonation, Kubernetes Users are not objects so there is nothing to finalize.

@hiddeco
Copy link
Member

hiddeco commented Jun 1, 2021

To add a bit more context: we are in the process of reshaping our security model (details here: fluxcd/flux2#582, first PoC: fluxcd/kustomize-controller#349). With this model, the service account does not actually exist, but is just a name/group that exists as a RoleBinding.

Stefan's point is that we should not make any more changes to the current impersonation features as they stand, as there is a high chance of breaking changes in the (near) future.

@stealthybox
Copy link
Member

In this example, the HelmRelease is dependent on some other in-namespace resources. (default SA)
The issue is that an entire Namespace is being deleted before helm-controller has the ability to use the dependency.

It is probably important to note how or why the namespace is being deleted. There are a few use-cases.

  1. A Namespace is set for termination

The Namespace/defaultSA and dependent HelmRelease can be easily applied to the cluster using Flux's dependency features.
They cannot be removed from the cluster in the proper order in an atomic way.
This is a limitation with a related issue:
fluxcd/kustomize-controller#301

Solving that issue can help, but if you are just manually deleting the Namespace without respect for ordered, in-namespace dependency, the only things I can imagine that would help is Finalizers on the parent objects or an Admission Control Webhook that prevents deletes using some object DAG.


Regarding the missing SA in the new implementation, you would likely want to block on the existence of the pertinent RoleBindings.

WRT SA's, finalizing the SA is a good start but it's not enough if the necessary RoleBindings and Roles are deleted.
Those should be considered part of the same lifecycle as the principal itslef.


It's possible to protect resources from deletion using Finalizers.
Perhaps helm-controller could look up RoleBindings pertinent to its impersonated principal and attempt to de-register any matching finalizers? ( ex: helm.toolkit.fluxcd.io/hr-namespace/hr-name )

It's hard to think of a U/X that makes this work well and is also performant.
Perhaps helm-controller has an additional informer on RoleBindings and the behavior is opt-in via RoleBinding annotation(s).

Perhaps the Deletion Admission Controller or Webhook is an easier solution to reason about?
Some admission controllers combine Finalizers into their implementation: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#storageobjectinuseprotection

@stealthybox
Copy link
Member

stealthybox commented Jun 30, 2021

It's worth reading this comment if you're parsing this thread: fluxcd/kustomize-controller#301 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants