Managed reconciler should set the finalizer only after resource is created #63

muvaf · 2019-10-28T14:25:18Z

What problem are you facing?

Currently, for some reason, if the steps before Create call fails and the reason also causes Delete to fail, the resource is stuck and we can't delete it. Examples:

Insufficient permissions of the provider's credential causes even the first Observe call to fail, but since finalizer is already set, user cannot delete the CR.
If references cannot be solved forever, there is no way to delete the resource since the finalizer is set already and Delete is never executed. Managed Reconciler: A resource with unresolved references never gets deleted #62

This started to happen after the merge of https://github.com/crossplaneio/crossplane-runtime/pull/45/files#diff-c7df5a255a05fb18975e474f9af689a9L400 since we set the finalizer no matter what on this PR.

How could Crossplane help solve your problem?

We can set the finalizer only after the Observe call returns that the resource exists and remove it after deletion signal is received and resource doesn't exist. But I think these calls should be native part of the managed reconciler reconcile loop instead of yet another Initializer or Finalizer objects. There has been some discussion around this #45 (comment) but I think we should fix it for 0.5 of Crossplane.

The text was updated successfully, but these errors were encountered:

negz · 2019-10-28T22:42:34Z

But I think these calls should be native part of the managed reconciler reconcile loop instead of yet another Initializer or Finalizer objects.

I can see why we might want to do this, because all the Initializer and Finalizer terminology kind of papers and over and abstracts the fact that we're adding and removing finalizers here. I implemented finalizer addition this way originally because:

Hypothetically a controller might not need or want to set a finalizer? In practice I can't really imagine any controller dealing with an external system that would not need to use a finalizer, though.
Hypothetically a controller might want to set more than one finalizer? We haven't seen this in practice so far.
Updating finalizers requires updating the managed resource itself, not its status. This means one more call every managed resource reconciler must make, one more thing we must mock out for every test, etc.

negz · 2019-10-28T22:43:21Z

I'm going to play with this today to see if there's something we could do to fix this and #62 in time for possible 0.4.1 release.

negz · 2019-10-28T22:52:28Z

We can set the finalizer only after the Observe call returns that the resource exists

I feel like it's a little risky only adding the finalizer after we've created our external resource. I like that in the current implementation we don't create the managed resource until we know we've set a finalizer to ensure we'll try to clean it up at delete time. If we reverse the ordering here we could open ourselves to situations in which we've created an external resource but not yet recorded a finalizer, in which case we would not do cleanup when deleted.

muvaf · 2019-10-29T10:10:30Z

but not yet recorded a finalizer, in which case we would not do cleanup when deleted.

Well, deletion is still called but only once since you get the deletion event only once when there is no finalizer set.

Ideally, at any point, if managed resource(MR) exists, external resource(ER) exists and if MR disappears so does ER. However, we know that this isn't possible since the user interacts with MR; MR gets created and then ER creation is called, MR gets deleted and only then ER gets deletion call. So, there are necessary creation and deletion lags between these two entities, which is why we use k8s finalizers in the first place.

You either set too early and get a hanging MR with no ER or set too late and you leak ER with no MR. I believe hanging MR is usually much better than leaking ER since the ER is what you're paying for in the end. Setting the finalizer;

Current case: After the Connect call. There are common failure cases between this point and the actual creation of the ER. One is insufficient permissions; all calls to provider could fail but you don't delete the finalizer hence hanging MR. The other one is references; they may not resolve for any kind of reason and the function exits early without having a chance to call Finalize. The latter is easier to solve but for the former we need a code path that removes the finalizer of the MR in case all external client functions return error.
After the ResourceExists: true signal. You miss the time between the succesful Create call and the next Observe call that returns this signal. If you get Delete meanwhile, you still get the event and try to delete but that would be your only chance since MR disappears immediately.
Right before Create call. If Create call fails and its reason causes Delete to fail as well, then you've got a hanging MR with no corresponding ER .
After Create returns success. Covers only the resources we create, not the ones that already exists.
Right after if !observation.ResourceExists block. There is no call between setting the finalizer and the creation call that could fail and leave the ER hanging, except the kube update call that sets the finalizer, which exists in any case we can come up with. I think this seems to cover the most cases, something like this.

if !observation.ResourceExists {
	creation, err := external.Create(ctx, managed)
	if err != nil {
		// We'll hit this condition if we can't create our external
		// resource, for example if our provider credentials don't have
		// access to create it. If this is the first time we encounter this
		// issue we'll be requeued implicitly when we update our status with
		// the new error condition. If not, we want to try again after a
		// short wait.
		managed.SetConditions(v1alpha1.ReconcileError(err))
		return reconcile.Result{RequeueAfter: r.shortWait}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)
	}
}
// SET FINALIZER
if err := r.managed.PublishConnection(ctx, managed, creation.ConnectionDetails); err != nil {
	// If this is the first time we encounter this issue we'll be
	// requeued implicitly when we update our status with the new error
	// condition. If not, we want to try again after a short wait.
	managed.SetConditions(v1alpha1.ReconcileError(err))
	return reconcile.Result{RequeueAfter: r.shortWait}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)
}

negz · 2019-10-30T19:35:05Z

Right after if !observation.ResourceExists block.

I don't think there's a good way to add a finalizer right after the if !observation.ResourceExists block. Currently the last thing that block does is return and requeue a new reconcile, because otherwise it would fall through to immediately trying to update the external resource it just created. If we maintained this behaviour and added the finalizer after said block we would not really be adding it until after an entire requeue had happened (including an Observe call that may fail).

I think you spotted this because your example looks like a refactored version of the current if !observation.ResourceExists block that leaves the block immediately after creating the external resource and without requeuing, but I feel like this would run into similar problems of proceeding to update without adding some fairly hairy logic.

At this point my feeling is the best option is to add the finalizer right before the if !observation.ResourceExists block. I don't think we need to worry about the case you highlighted in which the Create call fails, and that failure reason also causes the Delete call to fail, because:

By the time we get here we know our Observe call worked. If (for example) our cloud provider credentials were completely wrong, we'd never proceed far enough to add the finalizer.
If Observe works but Create fails (for example because we had RO cloud provider credentials) we would already have added the finalizer, but...
When the managed resource was deleted we'd be able to Observe that the external resource does not exist (because we were never able to Create it) and thus would not call Delete on the external resource and go straight to unpublishing credentials and removing the finalizer.

muvaf · 2019-10-31T12:59:28Z

By the time we get here we know our Observe call worked. If (for example) our cloud provider credentials were completely wrong, we'd never proceed far enough to add the finalizer.

This makes sense. I agree that setting the finalizer before Create call is safer and given that Observe, which is the call that convinces us the deletion succeeded, is called before initializer in your proposal, we wouldn't get into the credential problem stated above. As you pointed out, it's still possible to go with my initial proposal but this one seems better.

muvaf added the enhancement New feature or request label Oct 28, 2019

muvaf mentioned this issue Oct 28, 2019

Make 'Observe' method conditional on having all the references resolved #64

Closed

4 tasks

negz self-assigned this Oct 28, 2019

negz mentioned this issue Oct 31, 2019

Managed resource reconciler refactoring #65

Closed

4 tasks

negz mentioned this issue Nov 1, 2019

Add managed resource finalizer immediately before creating #70

Merged

4 tasks

negz closed this as completed in #70 Nov 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Managed reconciler should set the finalizer only after resource is created #63

Managed reconciler should set the finalizer only after resource is created #63

muvaf commented Oct 28, 2019 •

edited

Loading

negz commented Oct 28, 2019

negz commented Oct 28, 2019

negz commented Oct 28, 2019

muvaf commented Oct 29, 2019

negz commented Oct 30, 2019 •

edited

Loading

muvaf commented Oct 31, 2019

Managed reconciler should set the finalizer only after resource is created #63

Managed reconciler should set the finalizer only after resource is created #63

Comments

muvaf commented Oct 28, 2019 • edited Loading

What problem are you facing?

How could Crossplane help solve your problem?

negz commented Oct 28, 2019

negz commented Oct 28, 2019

negz commented Oct 28, 2019

muvaf commented Oct 29, 2019

negz commented Oct 30, 2019 • edited Loading

muvaf commented Oct 31, 2019

muvaf commented Oct 28, 2019 •

edited

Loading

negz commented Oct 30, 2019 •

edited

Loading