This repository has been archived by the owner on Oct 7, 2020. It is now read-only.
Remove finalizer in controller #656
Open: ostromart wants to merge 1 commit into istio:master from ostromart:remove-finalizer
Without the finalizer, there's no guarantee that this code will be executed. If it is truly unnecessary, it should be removed, too. If the controller adds an ownerReference to every object it creates, this code is indeed unnecessary, as the garbage collector will delete everything that this code is supposed to delete. In that case, this code will just cause race conditions between the GC and the controller and might cause conflicts to be logged in the controller's log.
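A minimal sketch of the cascade described above, assuming a toy object model: the `Object` type, `cascadeDelete`, `leftovers` and all object names are hypothetical, not operator or Kubernetes APIs. It shows that deleting the owner only reaches objects that carry an ownerReference chain back to it; anything without one is left behind and would still need explicit cleanup. As in Kubernetes, a dependent with several owners is only collected once all of them are gone.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a hypothetical, heavily simplified stand-in for a Kubernetes
// object: only its UID, kind and the UIDs in its ownerReferences are kept.
type Object struct {
	UID    string
	Kind   string
	Owners []string
}

// allOwnersGone reports whether o has at least one owner and every one of
// them has been deleted; only then does the GC collect o.
func allOwnersGone(o Object, deleted map[string]bool) bool {
	if len(o.Owners) == 0 {
		return false
	}
	for _, owner := range o.Owners {
		if !deleted[owner] {
			return false
		}
	}
	return true
}

// cascadeDelete returns the sorted UIDs removed when rootUID is deleted:
// the root itself plus every object whose owners are transitively all gone.
func cascadeDelete(objs []Object, rootUID string) []string {
	deleted := map[string]bool{rootUID: true}
	for changed := true; changed; {
		changed = false
		for _, o := range objs {
			if !deleted[o.UID] && allOwnersGone(o, deleted) {
				deleted[o.UID] = true
				changed = true
			}
		}
	}
	out := make([]string, 0, len(deleted))
	for uid := range deleted {
		out = append(out, uid)
	}
	sort.Strings(out)
	return out
}

// leftovers returns the sorted UIDs of objects the cascade did not reach.
func leftovers(objs []Object, gone []string) []string {
	removed := make(map[string]bool, len(gone))
	for _, uid := range gone {
		removed[uid] = true
	}
	var out []string
	for _, o := range objs {
		if !removed[o.UID] {
			out = append(out, o.UID)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	objs := []Object{
		{UID: "icp", Kind: "IstioControlPlane"},
		{UID: "pilot", Kind: "Deployment", Owners: []string{"icp"}},
		{UID: "pilot-rs", Kind: "ReplicaSet", Owners: []string{"pilot"}},
		{UID: "mesh-clusterrole", Kind: "ClusterRole"}, // cluster-scoped: no ownerRef
		{UID: "citadel-cert", Kind: "Secret"},          // created at runtime: no ownerRef
	}
	gone := cascadeDelete(objs, "icp")
	fmt.Println("collected:", gone)                    // collected: [icp pilot pilot-rs]
	fmt.Println("left behind:", leftovers(objs, gone)) // left behind: [citadel-cert mesh-clusterrole]
}
```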
On second thought, this code might not be safe to remove: it also prunes cluster-scoped resources, which must not be given an ownerReference, since a namespaced object cannot be the owner of a cluster-scoped object.
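To make the scoping rule concrete, here is a sketch of the owner/dependent combinations that the Kubernetes garbage-collection documentation describes as valid. `Ref` and `validOwner` are hypothetical helpers; an empty namespace stands for a cluster-scoped object, and the example names are made up.

```go
package main

import "fmt"

// Ref is a hypothetical identifier for an object; an empty Namespace means
// the object is cluster-scoped.
type Ref struct {
	Kind, Namespace, Name string
}

func (r Ref) clusterScoped() bool { return r.Namespace == "" }

// validOwner reports whether owner may be listed in dependent's
// ownerReferences under the rules in the Kubernetes GC docs:
//   - cluster-scoped owners are allowed for any dependent;
//   - a cluster-scoped dependent may only have cluster-scoped owners;
//   - a namespaced owner must share the dependent's namespace.
func validOwner(dependent, owner Ref) bool {
	switch {
	case owner.clusterScoped():
		return true
	case dependent.clusterScoped():
		return false
	default:
		return dependent.Namespace == owner.Namespace
	}
}

func main() {
	icp := Ref{Kind: "IstioControlPlane", Namespace: "istio-system", Name: "example"}
	deploy := Ref{Kind: "Deployment", Namespace: "istio-system", Name: "istio-pilot"}
	role := Ref{Kind: "ClusterRole", Name: "example-clusterrole"}

	fmt.Println(validOwner(deploy, icp)) // true: same namespace
	fmt.Println(validOwner(role, icp))   // false: namespaced owner, cluster-scoped dependent
}
```

So a namespaced CR can own the control plane's namespaced workloads, but cluster-scoped objects such as ClusterRoles or webhook configurations fall outside the GC cascade and need explicit deletion.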
I see that you have a fix for the finalizer, but I think we'd still have the race problem. If you have some time to look into it, we can put this PR on hold to see whether a fix is possible.
Yes, my finalizer fix is for a different problem. Regarding the race itself, I don't see any good solutions.
I don't even think we should be fixing this problem. Users install an operator if they want to automate things. By removing the operator, they are saying they no longer want the automation and would instead like to manage things manually. Removing an operator doesn't mean the operator should remove everything it has deployed: the user might want to redeploy the operator in a different namespace or outside the cluster, or deploy a different version of it, and keep the mesh running in the meantime.
Right, agreed, but this is a different scenario. Here, the user is deleting the CR, in which case I think it's reasonable for Istio to be deleted and the operator to be left untouched.
If the operator is deleted, nothing should happen to either Istio or the CR.
The finalizer made the CR's deletion wait until Istio had been deleted. Unfortunately, in a common scenario users also delete the operator at the same time, in which case the CR is stuck in a frozen state because nothing removes the finalizer.
In that case, is it reasonable to proceed with this PR? The practical effect is that the CR will be deleted before all of Istio is. Hopefully users will read the instructions for removing everything cleanly, but at least we won't have this "stuck because of finalizer" situation.
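The stuck state can be modeled with a toy API server, assuming simplified semantics (`CR`, `Store`, `reconcileDeletion` and the finalizer name are all hypothetical): deleting an object that still has finalizers only sets its deletion timestamp, and the object disappears once something removes the last finalizer. If the controller that owns the finalizer is gone, nothing ever does.

```go
package main

import (
	"fmt"
	"time"
)

// CR is a hypothetical, minimal model of a custom resource's metadata.
type CR struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy stand-in for the API server, keyed by object name.
type Store map[string]*CR

// Delete mimics API-server semantics: with finalizers present the object is
// only marked for deletion; without them it is removed immediately.
func (s Store) Delete(name string) {
	cr, ok := s[name]
	if !ok {
		return
	}
	if len(cr.Finalizers) == 0 {
		delete(s, name)
		return
	}
	if cr.DeletionTimestamp == nil {
		now := time.Now()
		cr.DeletionTimestamp = &now
	}
}

// RemoveFinalizer drops one finalizer; an object already marked for
// deletion is removed once its last finalizer is gone.
func (s Store) RemoveFinalizer(name, finalizer string) {
	cr, ok := s[name]
	if !ok {
		return
	}
	kept := cr.Finalizers[:0]
	for _, f := range cr.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	cr.Finalizers = kept
	if cr.DeletionTimestamp != nil && len(cr.Finalizers) == 0 {
		delete(s, name)
	}
}

const cleanupFinalizer = "example.istio.io/cleanup" // hypothetical finalizer name

// reconcileDeletion is what a running controller would do: clean up the
// managed resources, then release the CR by removing its finalizer.
func reconcileDeletion(s Store, name string, cleanup func()) {
	cr, ok := s[name]
	if !ok || cr.DeletionTimestamp == nil {
		return
	}
	cleanup()
	s.RemoveFinalizer(name, cleanupFinalizer)
}

func main() {
	// Controller running: Istio is cleaned up, then the CR goes away.
	s := Store{"icp": {Name: "icp", Finalizers: []string{cleanupFinalizer}}}
	s.Delete("icp")
	reconcileDeletion(s, "icp", func() { fmt.Println("deleting Istio resources") })
	_, exists := s["icp"]
	fmt.Println("controller running, CR still present:", exists) // false

	// Controller deleted first: nothing ever removes the finalizer.
	s = Store{"icp": {Name: "icp", Finalizers: []string{cleanupFinalizer}}}
	s.Delete("icp")
	_, exists = s["icp"]
	fmt.Println("controller gone, CR still present:", exists) // true
}
```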
But the linked issue (istio/istio#18815) says that the operator was deleted, too.
If the user had only deleted the ICP, the operator would have undeployed Istio just fine. Well, maybe not, since the operator and the k8s GC both race to delete the resources that have an ownerReference set. When deleting a resource, the operator treats any error returned from client.Delete() as an actual error. It should really treat a NotFound error as a successful deletion, since the end state is what the operator wants, regardless of whether the operator deleted the object itself or someone else did.
This PR would remove that race, but since there would be no guarantee that reconciler.Delete() always gets called, any resources that the k8s GC doesn't delete would remain in the cluster.
We may still need ownerRefs from the ICP CR to Istio resources (we should think this through, since there may be subtle problems). But delete is not guaranteed to be called here regardless of whether we have the finalizer or not, since we cannot prevent the user from deleting the controller.
So I'm not claiming this solves all the problems, just improves a bad situation. The stuck finalizer is very hard for users to get out of, whereas it's much simpler to delete any leftover orphaned resources.
OwnerRefs are not set for all objects created by Istio's deployments (e.g. Citadel creates certs, Galley creates a ValidatingWebhookConfiguration, etc.). The project needs to take a longer look at ownerRefs across the project.
See here for an example of a dangling object that has been causing severe trouble in the operator controller: istio/istio#19164 (comment)
There are other dangling objects not GCed by K8s...
The bottom line is:
If you remove the finalizer-managing code from the controller, resources won't get cleaned up properly even if the controller is running.
If you keep the finalizer code, the controller will clean up everything properly, but things will get stuck if you delete the controller.
The point of using a controller is to have fully automated management of the control plane. If the controller doesn't clean up everything it should, it's useless. If a user changes their mind and removes the controller before letting it finish cleaning up, they should be prepared to manually clean things up.
Could you explain why the resources won't get cleaned up properly? I may be missing something...
The flow we have here without the finalizer is: