Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Managed reconciler should set the finalizer only after resource is created #63

Closed
muvaf opened this issue Oct 28, 2019 · 6 comments · Fixed by #70
Closed

Managed reconciler should set the finalizer only after resource is created #63

muvaf opened this issue Oct 28, 2019 · 6 comments · Fixed by #70
Assignees
Labels
enhancement New feature or request

Comments

@muvaf
Copy link
Member

muvaf commented Oct 28, 2019

What problem are you facing?

Currently, for some reason, if the steps before Create call fails and the reason also causes Delete to fail, the resource is stuck and we can't delete it. Examples:

This started to happen after the merge of https://github.com/crossplaneio/crossplane-runtime/pull/45/files#diff-c7df5a255a05fb18975e474f9af689a9L400 since we set the finalizer no matter what on this PR.

How could Crossplane help solve your problem?

We can set the finalizer only after the Observe call returns that the resource exists and remove it after deletion signal is received and resource doesn't exist. But I think these calls should be native part of the managed reconciler reconcile loop instead of yet another Initializer or Finalizer objects. There has been some discussion around this #45 (comment) but I think we should fix it for 0.5 of Crossplane.

@negz
Copy link
Member

negz commented Oct 28, 2019

But I think these calls should be native part of the managed reconciler reconcile loop instead of yet another Initializer or Finalizer objects.

I can see why we might want to do this, because all the Initializer and Finalizer terminology kind of papers and over and abstracts the fact that we're adding and removing finalizers here. I implemented finalizer addition this way originally because:

  1. Hypothetically a controller might not need or want to set a finalizer? In practice I can't really imagine any controller dealing with an external system that would not need to use a finalizer, though.
  2. Hypothetically a controller might want to set more than one finalizer? We haven't seen this in practice so far.
  3. Updating finalizers requires updating the managed resource itself, not its status. This means one more call every managed resource reconciler must make, one more thing we must mock out for every test, etc.

@negz negz self-assigned this Oct 28, 2019
@negz
Copy link
Member

negz commented Oct 28, 2019

I'm going to play with this today to see if there's something we could do to fix this and #62 in time for possible 0.4.1 release.

@negz
Copy link
Member

negz commented Oct 28, 2019

We can set the finalizer only after the Observe call returns that the resource exists

I feel like it's a little risky only adding the finalizer after we've created our external resource. I like that in the current implementation we don't create the managed resource until we know we've set a finalizer to ensure we'll try to clean it up at delete time. If we reverse the ordering here we could open ourselves to situations in which we've created an external resource but not yet recorded a finalizer, in which case we would not do cleanup when deleted.

@muvaf
Copy link
Member Author

muvaf commented Oct 29, 2019

but not yet recorded a finalizer, in which case we would not do cleanup when deleted.

Well, deletion is still called but only once since you get the deletion event only once when there is no finalizer set.

Ideally, at any point, if managed resource(MR) exists, external resource(ER) exists and if MR disappears so does ER. However, we know that this isn't possible since the user interacts with MR; MR gets created and then ER creation is called, MR gets deleted and only then ER gets deletion call. So, there are necessary creation and deletion lags between these two entities, which is why we use k8s finalizers in the first place.

You either set too early and get a hanging MR with no ER or set too late and you leak ER with no MR. I believe hanging MR is usually much better than leaking ER since the ER is what you're paying for in the end. Setting the finalizer;

  • Current case: After the Connect call. There are common failure cases between this point and the actual creation of the ER. One is insufficient permissions; all calls to provider could fail but you don't delete the finalizer hence hanging MR. The other one is references; they may not resolve for any kind of reason and the function exits early without having a chance to call Finalize. The latter is easier to solve but for the former we need a code path that removes the finalizer of the MR in case all external client functions return error.
  • After the ResourceExists: true signal. You miss the time between the succesful Create call and the next Observe call that returns this signal. If you get Delete meanwhile, you still get the event and try to delete but that would be your only chance since MR disappears immediately.
  • Right before Create call. If Create call fails and its reason causes Delete to fail as well, then you've got a hanging MR with no corresponding ER .
  • After Create returns success. Covers only the resources we create, not the ones that already exists.
  • Right after if !observation.ResourceExists block. There is no call between setting the finalizer and the creation call that could fail and leave the ER hanging, except the kube update call that sets the finalizer, which exists in any case we can come up with. I think this seems to cover the most cases, something like this.
if !observation.ResourceExists {
	creation, err := external.Create(ctx, managed)
	if err != nil {
		// We'll hit this condition if we can't create our external
		// resource, for example if our provider credentials don't have
		// access to create it. If this is the first time we encounter this
		// issue we'll be requeued implicitly when we update our status with
		// the new error condition. If not, we want to try again after a
		// short wait.
		managed.SetConditions(v1alpha1.ReconcileError(err))
		return reconcile.Result{RequeueAfter: r.shortWait}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)
	}
}
// SET FINALIZER
if err := r.managed.PublishConnection(ctx, managed, creation.ConnectionDetails); err != nil {
	// If this is the first time we encounter this issue we'll be
	// requeued implicitly when we update our status with the new error
	// condition. If not, we want to try again after a short wait.
	managed.SetConditions(v1alpha1.ReconcileError(err))
	return reconcile.Result{RequeueAfter: r.shortWait}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)
}

@negz
Copy link
Member

negz commented Oct 30, 2019

Right after if !observation.ResourceExists block.

I don't think there's a good way to add a finalizer right after the if !observation.ResourceExists block. Currently the last thing that block does is return and requeue a new reconcile, because otherwise it would fall through to immediately trying to update the external resource it just created. If we maintained this behaviour and added the finalizer after said block we would not really be adding it until after an entire requeue had happened (including an Observe call that may fail).

I think you spotted this because your example looks like a refactored version of the current if !observation.ResourceExists block that leaves the block immediately after creating the external resource and without requeuing, but I feel like this would run into similar problems of proceeding to update without adding some fairly hairy logic.

At this point my feeling is the best option is to add the finalizer right before the if !observation.ResourceExists block. I don't think we need to worry about the case you highlighted in which the Create call fails, and that failure reason also causes the Delete call to fail, because:

  • By the time we get here we know our Observe call worked. If (for example) our cloud provider credentials were completely wrong, we'd never proceed far enough to add the finalizer.
  • If Observe works but Create fails (for example because we had RO cloud provider credentials) we would already have added the finalizer, but...
  • When the managed resource was deleted we'd be able to Observe that the external resource does not exist (because we were never able to Create it) and thus would not call Delete on the external resource and go straight to unpublishing credentials and removing the finalizer.

@muvaf
Copy link
Member Author

muvaf commented Oct 31, 2019

By the time we get here we know our Observe call worked. If (for example) our cloud provider credentials were completely wrong, we'd never proceed far enough to add the finalizer.

This makes sense. I agree that setting the finalizer before Create call is safer and given that Observe, which is the call that convinces us the deletion succeeded, is called before initializer in your proposal, we wouldn't get into the credential problem stated above. As you pointed out, it's still possible to go with my initial proposal but this one seems better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants