Checksum label is missing when prune is false #305
Comments
We used to have the checksum labels no matter if GC was enabled, and that caused many issues for our users: controllers like Kafka, MySQL and others would delete the stateful sets every time the label changed, and we also spammed users with alerts since all objects were marked as changed.
Fair enough, although that sounds like the behavior with prune is suboptimal in those cases.
An issue with the above suggestion is that the GC gets slower linearly as garbage grows, but maybe it's worth the tradeoff. Is the issue with the Kafka/MySQL controllers that the labels are being copied down to the child STS objects from a parent CR (and the labels aren't stable)?
I don't think you can query based on annotations, and fetching all objects of a kind could be expensive; due to rate limiting, on some distributions it will result in timeouts.
If the label selector is only the Kustomization name + namespace, we can still query more efficiently, but it would include all of the garbage.
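For illustration, a minimal sketch of such a query, assuming the label keys kustomize-controller sets are kustomize.toolkit.fluxcd.io/name and kustomize.toolkit.fluxcd.io/namespace, and that the Kustomization is named apps in flux-system (all of these names are assumptions for the example):

```sh
# Select every object labelled as belonging to the 'apps' Kustomization in 'flux-system'.
# Without a checksum label this also matches stale objects ("garbage") from earlier revisions.
kubectl get all --all-namespaces \
  -l kustomize.toolkit.fluxcd.io/name=apps,kustomize.toolkit.fluxcd.io/namespace=flux-system
```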
Is the biggest problem here that some controllers don't handle updates of labels well? What if we kept the default behavior of including checksums and made it possible to disable that per resource using an annotation?
Noted in conversation with Stefan: a concern with merging #348 with the current state of kustomize-controller is that it will cause Events to be generated whenever the checksum changes. The exact desired Object Tracking behavior and constraints are not super clear to me yet.
a. Annotations are intended to be arbitrary extensions, so adding them when the labels are already changing in the Prune Enabled case seems inconsequential to me.

b. Adding annotations when Prune is Disabled is less likely to cause undesirable downstream reconciles, but there is additional overhead to consider now that every child object of the Kustomization must be updated, which currently doesn't happen. Using annotations but omitting labels prevents clients interested in stale objects from using label-selectors the way kustomize-controller can.

c. The only consequence I can think of:

If we could clarify together what sort of behavioral constraints we want from kustomize-controller's Object Tracking, we'll come up with the right solution. I'm almost inclined to suggest that we just use Prune-labels (not annotations) all the time and include a boolean annotation or spec field so that people who really need to opt out of Object Tracking checksums can do so.
@stealthybox Are we able to control these Kubernetes events in some way, or are they just a construct of updating the object in general that can't be avoided? I'm assuming this is a construct of Kubernetes, but just clarifying.
For users that have GC disabled this is what currently happens:
If we merge Jonathan's PR:
As you can see, with that PR we'll be sending "bogus" events: the service didn't change in Git nor in the cluster, but the apply output says otherwise. To resolve this issue, instead of relying on the apply output, we could run a diff and report changes based on that.
Hopefully cache hits in the apply for all of the objects from the diff will make adding the diff unnoticeable. It would be really ideal if the kubectl output had structured diff information already.
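As a point of reference, kubectl diff already signals whether anything changed through its exit code, which is what a wrapper around it would key off; a minimal sketch (podinfo.yaml is a placeholder manifest):

```sh
# Exit code 0: live state matches the manifests; 1: differences were found;
# >1: an error occurred (for example a webhook rejecting the server-side dry-run).
kubectl diff -f podinfo.yaml
case $? in
  0) echo "no drift, skip event" ;;
  1) echo "changes detected, emit event" ;;
  *) echo "diff failed" >&2 ;;
esac
```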
So one thing I found when testing out kubectl diff in one of my clusters is that it failed. The reason it failed is that I have the OPA Gatekeeper mutating webhook in the cluster, and it returns an error message.
The reason this occurs is that the mutating webhook has set a sideEffects value that is not dry-run safe, so the API server rejects the dry-run request. We need to document that Flux depends on dry run working and not being limited by a webhook. I will try to get this solved in the Gatekeeper project, but I am unsure if there are other projects which may cause Flux to stop working.
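For anyone debugging this, server-side dry-run (which kubectl diff uses) is rejected by the API server when a matching webhook does not declare its side effects as safe; a quick diagnostic sketch (not part of Flux itself):

```sh
# Webhooks whose sideEffects is not None or NoneOnDryRun cause the API server to
# refuse dry-run requests that would reach them, which breaks kubectl diff.
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations \
  -o custom-columns='NAME:.metadata.name,SIDE_EFFECTS:.webhooks[*].sideEffects'
```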
kustomize-controller uses dry run when validation is enabled.
Yes, but validation is an optional setting that is disabled by default? This feature would run a dry run for every apply.
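For context, a sketch of what enabling validation on a single Kustomization could look like, assuming the v1beta1 spec.validation field with a server value (the field name, value, and the apps/flux-system names are assumptions for the example):

```sh
# Enable server-side validation, which makes the controller dry-run manifests before applying them.
kubectl -n flux-system patch kustomization apps \
  --type merge -p '{"spec":{"validation":"server"}}'
```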
Validation is enforced at bootstrap, so anyone running bootstrap via the CLI or Terraform will run into this issue. I can see how this doesn't apply to Azure... I guess we'll have to make diff optional too, but then we are back where we started: without diff you get event spam and so on.
@phillebaba I'm for dropping
I think responsibility should fall on the end user to make this work. I don't normally use the diff or dry run commands, which is why I totally missed this. I think we should move forward with having dry run as a required feature and document this requirement, as I bet it will become more obvious when v1beta2 is released. I created an issue in the Gatekeeper repository, and my guess is that Kyverno does not have this problem, but I will have a look just to be sure.
Every Kyverno release is e2e tested with every Flux release :) See fluxcd/flux2-multi-tenancy#30
@phillebaba you may want to review #362 and express your concerns there. Maybe in that PR we should run the diff only if validation is enabled, so that Flux doesn't crash when used with OPA Gatekeeper?
@phillebaba I looked it up; here is what happens with Kyverno:

```sh
$ k apply --dry-run=client -f fails-policy.yaml
kustomization.kustomize.toolkit.fluxcd.io/podinfo2 created (dry run)

$ k apply --dry-run=server -f fails-policy.yaml

$ k diff -f fails-policy.yaml
Error from server: admission webhook "validate.kyverno.svc" denied the request:
resource Kustomization/apps/podinfo was blocked due to the following policies

flux-multi-tenancy:
  serviceAccountName: 'validation error: .spec.serviceAccountName is required. Rule
    serviceAccountName failed at path /spec/serviceAccountName/'
```
This has now been fixed in Gatekeeper, so there should not be any issues when the next version is released: open-policy-agent/gatekeeper/pull/1360
We have in #379 an implementation that uses diff, but it falls short in numerous cases (see #379 (comment)). Given this situation, I'm inclined to close this issue.
I'm going to go ahead and close the issue. Thanks for the discussion.
When prune is true, child objects of Kustomizations have these labels:
When you set prune to false, the checksum label is dropped:
Even if the user does not have garbage collection enabled, it is still very useful to have these checksum labels because you can use them to identify stale resources.
This would, for example, be useful in a flux tree view or in a web interface like the Flux UI that shows which objects are managed and which are not. I don't see any reason not to include these checksums.
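As an illustration of the stale-resource use case, a sketch that assumes the checksum is exposed as a kustomize.toolkit.fluxcd.io/checksum label and that <current-checksum> stands in for the value from the latest reconciliation (the label key and names are assumptions):

```sh
# Objects still labelled for the 'apps' Kustomization but carrying an old checksum are stale:
# they were applied by a previous revision and are no longer present in the current source.
kubectl get all --all-namespaces \
  -l 'kustomize.toolkit.fluxcd.io/name=apps,kustomize.toolkit.fluxcd.io/checksum!=<current-checksum>'
```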
It should be a relatively safe change for existing Kustomizations when kustomize-controller is updated, but it will cause every object to be patched with new labels.