pkg/helm: use status conditions, update status for failures #814
Conversation
var release *rpb.Release
for _, condition := range status.Conditions {
	if condition.Type == types.ConditionDeployed && condition.Status == types.StatusTrue {
		release = condition.Release
We might have had this conversation already, so sorry if I'm rehashing, but is there no way we can do without relying on the installed/latest release stored in the CR status?
I might be wrong, but this seems like we're using the conditions as a state machine for our operator logic (to sync the storage backend to the latest release).
Is it possible to get the latest release for this CR directly from the server?
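For context, the sync logic under discussion scans the CR's status conditions for the `Deployed` condition and pulls the release object off it. A minimal self-contained sketch of that scan (the types below are simplified stand-ins for the operator's `types` package and Helm's release proto, not the real definitions):

```go
package main

import "fmt"

// Stand-in types mirroring the shape discussed in the PR
// (hypothetical; the real types live in the operator's types package
// and in Helm's release protobuf package).
type ConditionType string
type ConditionStatus string

const (
	ConditionDeployed ConditionType   = "Deployed"
	StatusTrue        ConditionStatus = "True"
)

type Release struct {
	Name    string
	Version int32
}

type Condition struct {
	Type    ConditionType
	Status  ConditionStatus
	Release *Release
}

// deployedRelease returns the release attached to the Deployed=True
// condition, or nil if the CR has never successfully deployed.
func deployedRelease(conditions []Condition) *Release {
	for _, c := range conditions {
		if c.Type == ConditionDeployed && c.Status == StatusTrue {
			return c.Release
		}
	}
	return nil
}

func main() {
	conds := []Condition{
		{Type: "Initialized", Status: StatusTrue},
		{Type: ConditionDeployed, Status: StatusTrue, Release: &Release{Name: "my-app", Version: 2}},
	}
	if r := deployedRelease(conds); r != nil {
		fmt.Println(r.Name, r.Version)
	}
}
```

This is exactly the pattern the reviewer is questioning: the condition doubles as persistent storage, which is only necessary because the in-memory storage backend is empty after an operator restart.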
Of course, it's no different from what we were already doing with status.Release, so it's not an issue with this PR, just something I'm wondering if we can address.
Yeah, we did have that conversation, but I don't recall if we discussed this particular aspect of it.
My guess is that the syncing was done this way because we're using Tiller's in-memory storage driver (here and here), and this is necessary to bootstrap the operator if it has been restarted while there are active CRs.
Generally, syncing would be a no-op because the releases are all already stored in the storage backend.
To solve the problem, we could switch to a ConfigMap or Secret implementation of the storage backend.
Due to this, I think the Release should potentially still be at the top level of the status.
We have a few options on where to put the release object:
- In the condition, like it is now. The benefit here is that we can put the most recent successful release in the Deployed condition and the most recent failed release in the ReleaseFailed condition.
  - This could be helpful in the case where an install is successful and a subsequent update fails. In that situation, the previously installed release is still active.
  - It's unclear from the status convention doc whether extra fields like this are allowed in the Condition. That may be an argument in favor of top-level fields.
- In the top level, as you suggest. Would we put just the successful release, or have top-level fields for both successful and failed releases?
- In a separate object entirely (which we may do anyway, based on my previous comment), with an object reference in the CR status, either in the condition or at the top level, as suggested in the convention doc:

> Status information that may be large (especially proportional in size to collections of other resources, such as lists of references to other objects -- see below) and/or rapidly changing, such as resource usage, should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
> It's unclear from the status convention doc whether extra fields like this are allowed in the Condition. That may be an argument in favor of top-level fields.

I think you can add some, but they should be small. What about just adding the version, with the top level of the status holding only the active release? So say you have a failed upgrade from Y to X: the condition would be a failure with the X release version (doesn't Helm store this, or is that point 3 above?), and the status object would be for Y, because that is actually still the active release.
For the issue of not needing to rely on the status to sync the in-memory storage driver, we can revisit this in a follow-up PR, most likely by using the ConfigMap driver.
For this PR I would lean towards keeping the conditions as they are.
Once we're using a ConfigMap driver to store the release objects, we can remove them from the conditions and probably just store the release version, either in the condition or as a top-level field, as Shawn suggested.
@joelanford WDYT?
Overall SGTM apart from the sync release bit.
pkg/helm/controller/reconcile.go
Outdated
	return reconcile.Result{}, err
}
status.RemoveCondition(types.ConditionInitializing)
Should you keep the Initializing condition (the same way that a pod keeps the ContainerCreating condition) to show that the initialization completed successfully?
log.Info("Reconciled release")
return reconcile.Result{RequeueAfter: r.ResyncPeriod}, nil
err = r.updateResourceStatus(o, status)
I just realized this is causing immediate reconciliations even when nothing has changed, so I'll need to come up with a way to prevent this.
Are we updating the CRD scaffold to use the status subresource when we move to 1.12?
@joelanford We're trying to do it before 1.12 actually, as it seems CRD subresources are already beta in 1.11 and on by default.
#787 (comment)
https://github.com/operator-framework/operator-sdk/pull/787/files#diff-f44ebe3a96e65181844d62f38ed5a148R60
We can use the status client to only update the status subresource, and have a predicate to filter out status updates by checking metadata.generation.
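For context, enabling the status subresource in the CRD scaffold only requires a `subresources` stanza in the CRD manifest (beta and on by default as of Kubernetes 1.11). A minimal sketch, where the group and kind names are placeholders, not the SDK's actual scaffold:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: examples.example.com   # hypothetical CRD name
spec:
  group: example.com
  names:
    kind: Example
    listKind: ExampleList
    plural: examples
    singular: example
  scope: Namespaced
  version: v1alpha1
  subresources:
    status: {}
```

With `subresources.status` enabled, writes through the status client touch only `.status`, and the API server bumps `metadata.generation` only on spec changes, which is what makes the generation-based predicate below workable.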
This is getting added here: #787
LGTM
We can follow up with trying to use either a Configmap or Secrets storage driver to avoid relying on the status conditions to sync the in memory backend.
/hold cancel now that #787 is merged. There were a few more required fixes that needed to be added, namely:
@hasbro17 PTAL
	return reconcile.Result{RequeueAfter: r.ResyncPeriod}, err
}

func (r HelmOperatorReconciler) updateResource(o *unstructured.Unstructured) error {
This is just my sensibilities, but why do we have this function if it isn't abstracting anything but a different function call?
Fair question. I was on the fence about this, but I went with it to emphasize the distinction between updating the non-subresources (e.g. metadata, spec, etc.) and the status subresource. So with this we'd have updateResource() and updateResourceStatus() functions. I wouldn't be opposed to using r.Client.Update() directly, though.
}

// Update implements default UpdateEvent filter for validating generation change
func (GenerationChangedPredicate) Update(e event.UpdateEvent) bool {
Could you add this to the ansible operator as well?
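The idea behind the predicate is simple: the API server increments `metadata.generation` only when the spec changes, so an update event whose old and new generations match is a status-only (or metadata-only) write and can be skipped. A self-contained sketch of that logic, using simplified stand-in types rather than controller-runtime's real `event.UpdateEvent`:

```go
package main

import "fmt"

// object is a hypothetical stand-in for the metadata a controller-runtime
// UpdateEvent carries; the real predicate reads ObjectOld/ObjectNew metadata.
type object struct {
	generation int64
}

type UpdateEvent struct {
	ObjectOld, ObjectNew object
}

// GenerationChangedPredicate filters out updates where only status (or other
// non-spec fields) changed: equal generations mean the spec did not change,
// so the reconciler is not re-triggered by its own status writes.
type GenerationChangedPredicate struct{}

func (GenerationChangedPredicate) Update(e UpdateEvent) bool {
	return e.ObjectOld.generation != e.ObjectNew.generation
}

func main() {
	p := GenerationChangedPredicate{}
	fmt.Println(p.Update(UpdateEvent{object{1}, object{1}})) // status-only update: filtered out
	fmt.Println(p.Update(UpdateEvent{object{1}, object{2}})) // spec change: reconcile
}
```

This depends on the status subresource being enabled; without it, a status write would not bump the generation either way, but other clients' metadata-only updates could still slip through differently than expected.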
LGTM
The predicates are fine in this PR.
…-framework#814)

* pkg/helm: use status conditions, update status for failures
* pkg/helm: keep Initialized status condition
* hack/test/e2e-helm.sh: fixing e2e test
* pkg/helm/controller/reconcile.go: fix finalizer resource updates
* pkg/predicate,pkg/helm: adding GenerationChangedPredicate
* pkg/ansible/controller/controller.go: use GenerationChangedPredicate
Description of the change:
Updating helm operator to use conditions and to update status for failures.
Motivation for the change:
See previous PR and related issue from helm-app-operator-kit repo.