pkg/helm: use status conditions, update status for failures #814

Merged
merged 7 commits into operator-framework:master from the helm-status branch on Dec 17, 2018

Conversation

joelanford
Member

Description of the change:
Updating helm operator to use conditions and to update status for failures.

Motivation for the change:

  • The use of conditions is the current convention for status objects.
  • The operator currently only updates the status if reconciliation is successful.

See previous PR and related issue from helm-app-operator-kit repo.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 4, 2018
@joelanford joelanford added the language/helm Issue is related to a Helm operator project label Dec 4, 2018
var release *rpb.Release
for _, condition := range status.Conditions {
	if condition.Type == types.ConditionDeployed && condition.Status == types.StatusTrue {
		release = condition.Release
	}
}
Contributor

We might have had this conversation already, so sorry if I'm rehashing, but is there no way we can do without relying on the installed/latest release stored in the CR status?

I might be wrong, but this seems like we're using the conditions as a state machine for our operator logic (to sync the storage backend to the latest release).

Is it possible to get the latest release for this CR directly from the server?

Contributor

Of course, it's no different from what we were already doing with status.Release, so it's not an issue with this PR; it's just something I'm wondering if we can address.

Member Author
@joelanford joelanford Dec 5, 2018

Yeah, we did have that conversation, but I don't recall if we discussed this particular aspect of it.

My guess is that the syncing was done this way because we're using Tiller's memory storage driver (here and here), and this syncing is necessary to bootstrap the operator if it has been restarted while there are active CRs.

Generally, syncing would be a no-op because the releases are all already stored in the storage backend.

To solve the problem, we could switch to a ConfigMap or Secret implementation of the storage backend.
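
For illustration, a minimal sketch of what that switch could look like with Helm v2's storage drivers, assuming access to a Kubernetes clientset and the namespace the in-process Tiller runs against; the function name and wiring here are illustrative, not this PR's code:

package release

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/helm/pkg/storage"
	"k8s.io/helm/pkg/storage/driver"
)

// newReleaseStorage builds a release storage backend for the in-process Tiller.
// Backing it with Secrets (or ConfigMaps via driver.NewConfigMaps) instead of
// driver.NewMemory() would let the operator recover release state after a
// restart without reading releases back out of the CR status.
func newReleaseStorage(clientset kubernetes.Interface, namespace string) *storage.Storage {
	return storage.Init(driver.NewSecrets(clientset.CoreV1().Secrets(namespace)))
}

Note that a Secret- or ConfigMap-backed driver would also need corresponding RBAC permissions for the operator in that namespace.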

Member

Due to this, I think the Release should potentially still be at the top level of the status.

Member Author

We have a few options on where to put the release object:

  1. In the condition, like it is now. The benefit here is that we can put the most recent successful release in the Deployed condition and the most recent failed release in the ReleaseFailed condition (see the sketch after the quote below).
    • This could be helpful in the case where an install is successful and a subsequent update fails. In that situation, the previous installed release is still active.
    • It's unclear from the status convention doc whether extra fields like this are allowed in the Condition. That may be an argument in favor of top-level fields.
  2. At the top level, as you suggest. Would we put just the successful release there, or have top-level fields for both successful and failed releases?
  3. In a separate object entirely (which we may do anyway, based on my previous comment) with an object reference in the CR status either in the condition or at the top level, as suggested in the convention doc:

Status information that may be large (especially proportional in size to collections of other resources, such as lists of references to other objects -- see below) and/or rapidly changing, such as resource usage, should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
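
For reference, a rough sketch of what option 1's shape could look like; the type and field names (HelmAppCondition, HelmAppStatus, the example constants in the comments) are illustrative, not necessarily this PR's exact types:

package types

import (
	rpb "k8s.io/helm/pkg/proto/hapi/release"
)

// HelmAppCondition sketches option 1: each condition can carry the release it
// refers to, so Deployed holds the most recent successful release and
// ReleaseFailed holds the most recent failed one.
type HelmAppCondition struct {
	Type    string       `json:"type"`   // e.g. "Deployed", "ReleaseFailed"
	Status  string       `json:"status"` // "True", "False", or "Unknown"
	Reason  string       `json:"reason,omitempty"`
	Message string       `json:"message,omitempty"`
	Release *rpb.Release `json:"release,omitempty"`
}

// HelmAppStatus holds the conditions. Option 2 would instead (or additionally)
// expose a top-level field for the active release; option 3 would replace the
// embedded release with a reference to a separate object.
type HelmAppStatus struct {
	Conditions []HelmAppCondition `json:"conditions"`
}

Embedding the full release keeps the history visible on the CR itself, but it also makes the status large, which is exactly what the convention quote above warns about.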

Member

It's unclear from the status convention doc whether extra fields like this are allowed in the Condition. That may be an argument in favor of top-level fields.

I think you can add some, but they should be small. What about just adding the Version to the condition, and having the top level of the status hold only the active release? Say you have a failed upgrade from Y to X: the condition would be a failure with the X release version (doesn't Helm store this, or is that point 3 above?), and the top-level status object would be for Y, because that is actually still the active release.

Contributor

For the issue of not needing to rely on the status to sync the in-memory storage driver, we can revisit this in a follow-up PR, most likely by using the ConfigMap driver.

For this PR I would lean towards keeping the conditions as they are.
Once we're using a ConfigMap driver to store the release objects, we can remove them from the conditions and probably just store the release version, either in the condition or as a top-level field as Shawn suggested.
@joelanford WDYT?

@hasbro17
Contributor

hasbro17 commented Dec 5, 2018

Overall SGTM apart from the sync release bit.

return reconcile.Result{}, err
}
status.RemoveCondition(types.ConditionInitializing)
Member

Should you keep the initializing condition (the same way that a pod keeps the ContainerCreating condition), to show that the initialization completed successfully?


log.Info("Reconciled release")
return reconcile.Result{RequeueAfter: r.ResyncPeriod}, nil
err = r.updateResourceStatus(o, status)
Member Author

I just realized this is causing immediate reconciliations even when nothing has changed, so I'll need to come up with a way to prevent this.

Are we updating the CRD scaffold to use the status subresource when we move to 1.12?

Contributor

@joelanford We're trying to do it before 1.12 actually, as it seems CRD subresources are already beta in 1.11 and on by default.
#787 (comment)
https://github.com/operator-framework/operator-sdk/pull/787/files#diff-f44ebe3a96e65181844d62f38ed5a148R60

We can use the status client to only update the status subresource, and have a predicate to filter out status updates by checking metadata.generation.
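
A rough sketch of that combination against the controller-runtime API of that era (event.UpdateEvent exposing MetaOld/MetaNew); the helper names updateStatus and watchWithGenerationFilter are made up for illustration:

package helm

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// updateStatus writes only the status subresource, which does not bump
// metadata.generation on the custom resource.
func updateStatus(c client.Client, o *unstructured.Unstructured) error {
	return c.Status().Update(context.TODO(), o)
}

// watchWithGenerationFilter enqueues update events only when
// metadata.generation changes (i.e. the spec changed), so status-only
// updates do not immediately re-trigger reconciliation.
func watchWithGenerationFilter(c controller.Controller, o *unstructured.Unstructured) error {
	return c.Watch(&source.Kind{Type: o}, &handler.EnqueueRequestForObject{}, predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration()
		},
	})
}

This is essentially the behavior of the GenerationChangedPredicate added later in this PR, packaged inline rather than as a reusable predicate type.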

Member

This is getting added here: #787

@joelanford
Member Author

/hold

With #787, we'll get status subresource support, which will simplify this PR. I'll hold off on this until #787 is merged.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 6, 2018
Contributor
@hasbro17 hasbro17 left a comment

LGTM

We can follow up with trying to use either a ConfigMap or Secrets storage driver to avoid relying on the status conditions to sync the in-memory backend.

@joelanford
Member Author

joelanford commented Dec 11, 2018

/hold cancel now that #787 is merged.

There were a few more fixes that needed to be added, namely:

  • Making sure finalizers and status updates were both applied correctly now that we're using the status subresource.
  • Adding a watch predicate for generation changes (should we push for this to be in controller-runtime instead?).

@hasbro17 PTAL

@joelanford joelanford removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2018
return reconcile.Result{RequeueAfter: r.ResyncPeriod}, err
}

func (r HelmOperatorReconciler) updateResource(o *unstructured.Unstructured) error {
Member

This is just my sensibility, but why do we have this function if it isn't abstracting anything beyond a different function call?

Member Author

Fair question. I was on the fence about this, but I went with it to emphasize the distinction between updating the non-subresources (e.g. metadata, spec, etc.) and the status subresource. So with this we'd have updateResource() and updateResourceStatus() functions. I wouldn't be opposed to using r.Client.Update() directly though.
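
For concreteness, a sketch of that split, assuming the reconciler holds a controller-runtime client; the trimmed-down types and the use of the unstructured converter here are assumptions for illustration, not this PR's exact code:

package helm

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// HelmOperatorReconciler is trimmed to the single field used below; the real
// reconciler in this PR has more fields.
type HelmOperatorReconciler struct {
	Client client.Client
}

// HelmAppStatus stands in for the PR's status type; its fields don't matter
// for this sketch.
type HelmAppStatus struct {
	Conditions []map[string]interface{} `json:"conditions,omitempty"`
}

// updateResource updates everything except the status subresource
// (metadata, spec, finalizers, and so on).
func (r HelmOperatorReconciler) updateResource(o *unstructured.Unstructured) error {
	return r.Client.Update(context.TODO(), o)
}

// updateResourceStatus writes only the status subresource.
func (r HelmOperatorReconciler) updateResourceStatus(o *unstructured.Unstructured, status *HelmAppStatus) error {
	statusMap, err := runtime.DefaultUnstructuredConverter.ToUnstructured(status)
	if err != nil {
		return err
	}
	o.Object["status"] = statusMap
	return r.Client.Status().Update(context.TODO(), o)
}

The value of the wrappers is mostly naming: they make it explicit at each call site which of the two update paths is being taken.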

}

// Update implements default UpdateEvent filter for validating generation change
func (GenerationChangedPredicate) Update(e event.UpdateEvent) bool {
	// Nil checks on e.MetaOld and e.MetaNew are omitted in this excerpt.
	return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration()
}
Member

Could you add this to the ansible operator as well?

Contributor
@hasbro17 hasbro17 left a comment

LGTM

The predicates are fine in this PR.

@joelanford joelanford merged commit 209ac08 into operator-framework:master Dec 17, 2018
@joelanford joelanford deleted the helm-status branch December 18, 2018 17:31
fabianvf pushed a commit to fabianvf/operator-sdk that referenced this pull request Dec 21, 2018
…-framework#814)

* pkg/helm: use status conditions, update status for failures

* pkg/helm: keep Initialized status condition

* hack/test/e2e-helm.sh: fixing e2e test

* pkg/helm/controller/reconcile.go: fix finalizer resource updates

* pkg/predicate,pkg/helm: adding GenerationChangedPredicate

* pkg/ansible/controller/controller.go: use GenerationChangedPredicate