pkg/helm: use status conditions, update status for failures #814

Merged
merged 7 commits into operator-framework:master from the helm-status branch on Dec 17, 2018

Conversation

joelanford
Member

Description of the change:
Updating helm operator to use conditions and to update status for failures.

Motivation for the change:

  • The use of conditions is the current convention for status objects.
  • The operator currently only updates the status if reconciliation is successful.

See previous PR and related issue from helm-app-operator-kit repo.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 4, 2018
@joelanford joelanford added the language/helm Issue is related to a Helm operator project label Dec 4, 2018
var release *rpb.Release
for _, condition := range status.Conditions {
	if condition.Type == types.ConditionDeployed && condition.Status == types.StatusTrue {
		release = condition.Release
	}
}
Contributor

We might have had this conversation already, so sorry if I'm rehashing, but is there no way we can do without relying on the installed/latest release stored in the CR status?

I might be wrong, but this seems like we're using the conditions as a state machine for our operator logic (to sync the storage backend to the latest release).

Is it possible to get the latest release for this CR directly from the server?

Contributor

Of course, it's no different from what we were already doing with status.Release, so it's not an issue with this PR; it's just something I'm wondering if we can address.

Member Author
@joelanford joelanford Dec 5, 2018

Yeah, we did have that conversation, but I don't recall if we discussed this particular aspect of it.

My guess is that the syncing was done this way because we're using Tiller's memory storage driver (here and here), and this syncing is necessary to bootstrap the operator if it has been restarted while there are active CRs.

Generally, syncing would be a no-op because the releases are all already stored in the storage backend.

To solve the problem, we could switch to a ConfigMap or Secret implementation of the storage backend.
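
For illustration, a minimal sketch of what that switch could look like with Helm v2's storage drivers, assuming access to a Kubernetes clientset and the namespace the in-process Tiller runs against; the function name and wiring here are illustrative, not this PR's code:

package release

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/helm/pkg/storage"
	"k8s.io/helm/pkg/storage/driver"
)

// newReleaseStorage builds a release storage backend for the in-process Tiller.
// Backing it with Secrets (or ConfigMaps via driver.NewConfigMaps) instead of
// driver.NewMemory() would let the operator recover release state after a
// restart without reading releases back out of the CR status.
func newReleaseStorage(clientset kubernetes.Interface, namespace string) *storage.Storage {
	return storage.Init(driver.NewSecrets(clientset.CoreV1().Secrets(namespace)))
}

Note that a Secret- or ConfigMap-backed driver would also need corresponding RBAC permissions for the operator in that namespace.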

Member

Due to this, I think the Release should potentially still be at the top level of the status.

Member Author

We have a few options on where to put the release object:

  1. In the condition, like it is now. The benefit here is that we can put the most recent successful release in the Deployed condition and the most recent failed release in the ReleaseFailed condition (see the sketch after the quote below).
    • This could be helpful in the case where an install is successful and a subsequent update fails. In that situation, the previous installed release is still active.
    • It's unclear from the status convention doc whether extra fields like this are allowed in the Condition. That may be an argument in favor of top-level fields.
  2. At the top level, as you suggest. Would we put just the successful release there, or have top-level fields for both successful and failed releases?
  3. In a separate object entirely (which we may do anyway, based on my previous comment) with an object reference in the CR status either in the condition or at the top level, as suggested in the convention doc:

Status information that may be large (especially proportional in size to collections of other resources, such as lists of references to other objects -- see below) and/or rapidly changing, such as resource usage, should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
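
For reference, a rough sketch of what option 1's shape could look like; the type and field names (HelmAppCondition, HelmAppStatus, the example constants in the comments) are illustrative, not necessarily this PR's exact types:

package types

import (
	rpb "k8s.io/helm/pkg/proto/hapi/release"
)

// HelmAppCondition sketches option 1: each condition can carry the release it
// refers to, so Deployed holds the most recent successful release and
// ReleaseFailed holds the most recent failed one.
type HelmAppCondition struct {
	Type    string       `json:"type"`   // e.g. "Deployed", "ReleaseFailed"
	Status  string       `json:"status"` // "True", "False", or "Unknown"
	Reason  string       `json:"reason,omitempty"`
	Message string       `json:"message,omitempty"`
	Release *rpb.Release `json:"release,omitempty"`
}

// HelmAppStatus holds the conditions. Option 2 would instead (or additionally)
// expose a top-level field for the active release; option 3 would replace the
// embedded release with a reference to a separate object.
type HelmAppStatus struct {
	Conditions []HelmAppCondition `json:"conditions"`
}

Embedding the full release keeps the history visible on the CR itself, but it also makes the status large, which is exactly what the convention quote above warns about.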

Member

It's unclear from the status convention doc whether extra fields like this are allowed in the Condition. That may be an argument in favor of top-level fields.

I think you can add some, but they should be small. What about just adding the Version to the condition, and having the top level of the status hold only the active release? Say you have a failed upgrade from Y to X: the condition would be a failure with the X release version (doesn't Helm store this, or is that point 3 above?), and the top-level status object would be for Y, because that is actually still the active release.

Contributor

For the issue of not needing to rely on the status to sync the in-memory storage driver, we can revisit this in a follow-up PR, most likely by using the ConfigMap driver.

For this PR I would lean towards keeping the conditions as they are.
Once we're using a ConfigMap driver to store the release objects, we can remove them from the conditions and probably just store the release version, either in the condition or as a top-level field as Shawn suggested.
@joelanford WDYT?

@hasbro17
Contributor

hasbro17 commented Dec 5, 2018

Overall SGTM apart from the sync release bit.

return reconcile.Result{}, err
}
status.RemoveCondition(types.ConditionInitializing)
Member

Should you keep the initializing condition (the same way that a pod keeps the ContainerCreating condition), to show that the initialization completed successfully?


log.Info("Reconciled release")
return reconcile.Result{RequeueAfter: r.ResyncPeriod}, nil
err = r.updateResourceStatus(o, status)
Member Author

I just realized this is causing immediate reconciliations even when nothing has changed, so I'll need to come up with a way to prevent this.

Are we updating the CRD scaffold to use the status subresource when we move to 1.12?

Contributor

@joelanford We're trying to do it before 1.12 actually, as it seems CRD subresources are already beta in 1.11 and on by default.
#787 (comment)
https://github.com/operator-framework/operator-sdk/pull/787/files#diff-f44ebe3a96e65181844d62f38ed5a148R60

We can use the status client to only update the status subresource, and have a predicate to filter out status updates by checking metadata.generation.
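
A rough sketch of that combination against the controller-runtime API of that era (event.UpdateEvent exposing MetaOld/MetaNew); the helper names updateStatus and watchWithGenerationFilter are made up for illustration:

package helm

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// updateStatus writes only the status subresource, which does not bump
// metadata.generation on the custom resource.
func updateStatus(c client.Client, o *unstructured.Unstructured) error {
	return c.Status().Update(context.TODO(), o)
}

// watchWithGenerationFilter enqueues update events only when
// metadata.generation changes (i.e. the spec changed), so status-only
// updates do not immediately re-trigger reconciliation.
func watchWithGenerationFilter(c controller.Controller, o *unstructured.Unstructured) error {
	return c.Watch(&source.Kind{Type: o}, &handler.EnqueueRequestForObject{}, predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration()
		},
	})
}

This is essentially the behavior of the GenerationChangedPredicate added later in this PR, packaged inline rather than as a reusable predicate type.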

Member

This is getting added here: #787

@joelanford
Member Author

/hold

With #787, we'll get status subresource support, which will simplify this PR. I'll hold off on this until #787 is merged.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 6, 2018
Contributor
@hasbro17 hasbro17 left a comment

LGTM

We can follow up with trying to use either a ConfigMap or Secrets storage driver to avoid relying on the status conditions to sync the in-memory backend.

@joelanford
Member Author

joelanford commented Dec 11, 2018

/hold cancel now that #787 is merged.

There were a few more fixes that needed to be added, namely:

  • Making sure finalizers and status updates were both applied correctly now that we're using the status subresource.
  • Adding a watch predicate for generation changes (should we push for this to be in controller-runtime instead?).

@hasbro17 PTAL

@joelanford joelanford removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2018
return reconcile.Result{RequeueAfter: r.ResyncPeriod}, err
}

func (r HelmOperatorReconciler) updateResource(o *unstructured.Unstructured) error {
Member

This is just my sensibility, but why do we have this function if it isn't abstracting anything beyond a different function call?

Member Author

Fair question. I was on the fence about this, but I went with it to emphasize the distinction between updating the non-subresources (e.g. metadata, spec, etc.) and the status subresource. So with this we'd have updateResource() and updateResourceStatus() functions. I wouldn't be opposed to using r.Client.Update() directly though.
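
For concreteness, a sketch of that split, assuming the reconciler holds a controller-runtime client; the trimmed-down types and the use of the unstructured converter here are assumptions for illustration, not this PR's exact code:

package helm

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// HelmOperatorReconciler is trimmed to the single field used below; the real
// reconciler in this PR has more fields.
type HelmOperatorReconciler struct {
	Client client.Client
}

// HelmAppStatus stands in for the PR's status type; its fields don't matter
// for this sketch.
type HelmAppStatus struct {
	Conditions []map[string]interface{} `json:"conditions,omitempty"`
}

// updateResource updates everything except the status subresource
// (metadata, spec, finalizers, and so on).
func (r HelmOperatorReconciler) updateResource(o *unstructured.Unstructured) error {
	return r.Client.Update(context.TODO(), o)
}

// updateResourceStatus writes only the status subresource.
func (r HelmOperatorReconciler) updateResourceStatus(o *unstructured.Unstructured, status *HelmAppStatus) error {
	statusMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(status)
	if err != nil {
		return err
	}
	o.Object["status"] = statusMap
	return r.Client.Status().Update(context.TODO(), o)
}

The value of the wrappers is mostly naming: they make it explicit at each call site which of the two update paths is being taken.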

}

// Update implements default UpdateEvent filter for validating generation change
func (GenerationChangedPredicate) Update(e event.UpdateEvent) bool {
	// Nil checks on e.MetaOld and e.MetaNew are omitted in this excerpt.
	return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration()
}
Member

Could you add this to the ansible operator as well?

Contributor
@hasbro17 hasbro17 left a comment

LGTM

The predicates are fine in this PR.

@joelanford joelanford merged commit 209ac08 into operator-framework:master Dec 17, 2018
@joelanford joelanford deleted the helm-status branch December 18, 2018 17:31
fabianvf pushed a commit to fabianvf/operator-sdk that referenced this pull request Dec 21, 2018
…-framework#814)

* pkg/helm: use status conditions, update status for failures

* pkg/helm: keep Initialized status condition

* hack/test/e2e-helm.sh: fixing e2e test

* pkg/helm/controller/reconcile.go: fix finalizer resource updates

* pkg/predicate,pkg/helm: adding GenerationChangedPredicate

* pkg/ansible/controller/controller.go: use GenerationChangedPredicate