Signal cluster deletion #265
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: abutcher. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
#### Upgrade

Create `ClusterAlive` object for existing clusters. (Is this the right place to create for existing clusters?)
Can this be done as a release image manifest?
That would risk racing against the CVO which will always attempt to create release image content, no?
This is always going to be best-effort, because there's no guarantee that the cluster is alive enough to be able to clean itself up when the aliveness CR is removed. So races are probably acceptable, although it would be good to minimize them where possible.
### Test Plan

Deletion of the `ClusterAlive` resource will occur in uninstall by default and will be tested by e2e as part of regular cluster teardown.
Per the flags proposed above, the delete failure policy would be helpful here.
```yaml
kind: ClusterAlive
metadata:
  name: cluster
  namespace: openshift-config
```
Does this need to be namespaced or should it be global?
I don't think this needs to be namespaced.
Namespace removed in last update.
I don't love non-namespaced things. Especially if we are going to make it a generic Kind like `Alive` instead of `ClusterAlive`, then having something namespaced that I could use the same way for my own purposes would be nice.
Re-added namespace to the object.
Another thing we should include is some mechanism to transfer information about the `ClusterAlive` delete. The send-delete-and-wait approach, without any information about what's happening or what's failing, is going to be too frustrating for the installer team or for users to debug.

Also, we should allow the admin to modify the `ClusterAlive` in such a way that destroy cannot proceed, just as a safety check.

Admin being able to block seems useful, possibly for Hive as well. Is examining the finalizers left on the `ClusterAlive` sufficient for reporting what's blocking it? Admins could inject their own.

As long as it's documented.
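To illustrate the finalizer-inspection idea above: an `Alive` object stuck mid-deletion would carry a `deletionTimestamp` and whatever finalizers have not yet been removed, so listing them would report what is blocking teardown. The finalizer names below are hypothetical, not part of the proposal:

```yaml
apiVersion: v1alpha1
kind: Alive
metadata:
  name: cluster
  deletionTimestamp: "2020-01-01T00:00:00Z"
  finalizers:
    # hypothetical names; each owner removes its entry
    # once its external cleanup completes
    - olm.operators.coreos.com/teardown
    - example.admin/hold
```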
### Non-Goals

Attaching finalizers to operator resources based on the `ClusterAlive` object, which would facilitate removal of operator resources.
Somewhere in this I'd like to flesh out what we expect to happen here. I believe the proposal from OLM was that operators set finalizers on their CRs, and OLM would set a finalizer on `ClusterAlive`. When deleted, it would delete every CR for every CRD for every operator, giving them their chance to clean up resources.

This would still work with regard to @abhinavdahiya's request for an admin override to block this. Assuming `ClusterAlive.Spec.BlockTeardown = true`, OLM could respect this flag and refuse to start its teardown process, so we only need to check `BlockTeardown` in one place.

If, however, operators start adding their finalizers directly to `ClusterAlive`, we would need to guarantee that they all also knew to respect this additional `BlockTeardown` field. The OLM layer of indirection seems required, so I'd like to make sure they're still on board with that plan to use CRs+CRDs.
cc @ecordell
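A minimal sketch of the cascade described above, using hypothetical in-memory types rather than real client-go calls: on `Alive` deletion, walk every operator's CRDs and delete each CR so the owning operator's finalizers can run, and check `blockTeardown` in that single place.

```go
package main

import "fmt"

// Hypothetical in-memory stand-ins for the real API objects.
type CRD struct {
	Kind string
	CRs  []string // names of custom resources of this kind
}

type Operator struct {
	Name string
	CRDs []CRD
}

// teardownOrder returns the "kind/name" of every CR that would be
// deleted when the Alive object is removed, skipping everything
// when blockTeardown is set (the single place the flag is checked).
func teardownOrder(operators []Operator, blockTeardown bool) []string {
	if blockTeardown {
		return nil
	}
	var deletions []string
	for _, op := range operators {
		for _, crd := range op.CRDs {
			for _, cr := range crd.CRs {
				deletions = append(deletions, crd.Kind+"/"+cr)
			}
		}
	}
	return deletions
}

func main() {
	ops := []Operator{{Name: "db-operator", CRDs: []CRD{{Kind: "Database", CRs: []string{"prod"}}}}}
	fmt.Println(teardownOrder(ops, false))
}
```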
* `ClusterAlive` deletion will be requested by flag
* Default, non-configurable timeout
* CI attempts `ClusterAlive` deletion and does normal destroy after
```yaml
metadata:
  name: cluster
spec:
  blockTeardown: true # block teardown of operator resources
```
What is this? What does it do? How does it relate?
> Also we should allow the admin to modify the clusteralive in such a way that destroy cannot proceed, just as a safety check.

This was suggested by @abhinavdahiya as a mechanism for admins to block teardown in destroy. I'll add this description to the enhancement.
I don't understand the use case or why I'd want what is being described, so hopefully you can lay that out as well. Who is responding to this value in spec? Whatever set the finalizers? Why delete the object if you don't want those things to respond?
I read those, but I don't think I get it. Is this supposed to prevent someone from 'accidentally' deleting the Alive object and triggering teardown? Are you trying to make it a 2-step process? What's the use case?
Say `blockTeardown: true` and I try to delete the `clusterAlive` object:

* Do we expect all finalizers on the object to pay attention to spec and just do nothing?
* Should they remove themselves from the finalizer list?
* Should the `ClusterAlive` object get deleted? Or should it just live forever with a `deletionTimestamp` but never be deleted?

And why am I doing this? I'm perfectly happy to understand answers to these questions inside the enhancement instead of here, when ready.
So currently, when deleting the cluster using `destroy cluster`, we clean up all the openshift-core-cluster-created resources, like those tagged `kubernetes.io/cluster/infra-id: owned`. We don't delete any user-created resources whose ownership users didn't opt in to, and that is also best-effort.

Now, with cluster alive and `destroy cluster`, there is a higher degree of safety concern, as OLM might delete user-created resources, like databases etc. Therefore, I think it's necessary that admins be allowed to prevent any such action from taking place.

My initial recommendation was to add this to spec, such that when the user runs `destroy cluster --shutdown`, the installer can fail saying that the admin has rejected such action. We could also write an admission plugin that would reject any deletes on the object when this field is set.

Operators like OLM probably shouldn't need to actively reconcile this, i.e. removing finalizers when this is set, but I don't see why that would be bad...

If we can have all operators always add their finalizers, and the installer check this, or even better have the admission plugin, I think that would be very helpful @eparis
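The admission check proposed above is small enough to sketch. This is a hypothetical, self-contained stand-in for the decision logic only; a real implementation would be wired up as a validating admission plugin or webhook:

```go
package main

import "fmt"

// AliveSpec is a hypothetical stand-in for the proposed CRD's spec.
type AliveSpec struct {
	BlockTeardown bool
}

// admitAliveDelete sketches the proposed admission check: a DELETE on
// the Alive object is rejected while spec.blockTeardown is set, so the
// installer fails fast with a clear reason instead of waiting on
// finalizers that will never run.
func admitAliveDelete(spec AliveSpec) (allowed bool, reason string) {
	if spec.BlockTeardown {
		return false, "spec.blockTeardown is set: an admin has blocked cluster teardown"
	}
	return true, ""
}

func main() {
	allowed, reason := admitAliveDelete(AliveSpec{BlockTeardown: true})
	fmt.Println(allowed, reason)
}
```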
I've added these details to the enhancement for `blockTeardown`.
### Test Plan

In OpenShift CI, clusters will try to delete their `ClusterAlive` object and, if that fails, all infra resources will be destroyed.
How do I get a record of every time the `--cluster-alive-delete` failed?
What do you think about a field in `.spec` or `.status` that we increment when we attempt the destroy?
I meant in CI: how do I get that record? How do we know it's broken in CI and not just getting ignored by the `|| true` forever?
> ...how do we know it's broken in CI...

We have lots of sub-components in the installer that are not independently tested, so I don't think we need to block on closing this one. If you consider it especially vulnerable, are you worried about the installer breaking, or about a particular component forgetting to register a finalizer or flubbing the finalizer implementation? If you're worried about the installer, you could have CI ask a trusted-enough operator to create an unknown-to-the-installer resource which would block teardown, and then see if the backstopping reapers light up with lots of leaked resources.
The `|| true` would only hit the case of a flubbed implementation from something setting a finalizer, which would be a real bug.

Your suggestion, I think, is that the AWS reaper might eventually notice we were wasting resources/money and we could then do some spelunking to figure out what left it behind.

Could we instead do something like (obviously pseudocode):

```sh
if ! openshift-install destroy cluster --cluster-alive-delete; then
  openshift-install destroy cluster
  exit 1
fi
```

This moves the failure to where it is introduced, not to some point later after DPP realizes we're wasting money. I do agree we don't have to block closing on this issue, but it doesn't seem that hard.
Updated test plan to include returning the retval of the cluster alive delete test if it failed.
```yaml
apiVersion: v1
kind: ClusterAlive
```
The name is already expressing `Cluster`. Maybe the kind should just be `Alive` so it can be used as a top-level finalizer collector for different sets of resources as well? Like, I dunno:

```yaml
apiVersion: v1alpha1
kind: Alive
metadata:
  name: human-resources
spec:
  blockTeardown: true
```

If you wanted to collect a bunch of resources created for HR, you could easily delete them all if HR moved off to a different cluster later.
Name has been changed to `Alive`.
Introduce a new CRD `ClusterAlive` to which operator resources could attach a finalizer, initiating cascading delete when the `ClusterAlive` resource is deleted.

```yaml
apiVersion: v1
```
Probably start with v1alpha1, right? Probably some kind of generic CRD stabilization procedure whose docs we can link in the Graduation Criteria section below about how long we expect to cook before moving to beta and stable.
Updated object example to `v1alpha1` and added a bit of text to graduation criteria re: usage and testing. If you have any docs for a pre-existing CRD graduation criterion, I could add them.
* Make `Alive` object namespaced.
* Move API to `v1alpha1`; the GA plan will include a move from beta to stable following usage & testing.
* Include pseudocode for the test plan that returns the retval of an `Alive` cleanup uninstall failure.
* Add details for `blockTeardown` implementation.
/assign @pmorie
@@ -68,6 +68,8 @@ spec:

Add an admission plugin that prevents delete when `blockTeardown` has been set.

We believe it would be best to add the CRD and admission plugin to the cluster version operator. We don't foresee this requiring a controller, and a separate operator seems like overkill for this object.
@LalatenduMohanty Heads up regarding this enhancement. We think it would make the most sense to add this to the CVO. Please let me know if you have any questions or concerns.
No. Since the CVO is instrumental to the working of the cluster, the pattern of "hey, there is no owner for this object, let's put it in the CVO" should be immediately stopped.
@abhinavdahiya Where do you suggest this live? In its own separate operator?
@abhinavdahiya this resource would be used to manage the lifecycle of operators. It seems logical that it belongs with the thing that manages operators. I could see an argument for OLM, since no SLO has external things it would want to destroy, but either the CVO or OLM is what would make the most sense to me.

@eparis thoughts?
> @abhinavdahiya this resource would be used to manage the lifecycle of operators.

Can you explain how? Because to me this helps manage the operands of the operators and not the operators themselves. It's like saying operators and their operands need the k8s API, and therefore the CVO should manage the k8s API directly.

Some operator should handle this object and everything required around it. That operator should be able to use the CVO to create the resources for it, like CRDs and an empty initial resource instance, like others.
> can you explain how?
This is a signal telling you that you are about to be destroyed. In my mind, something similar to https://developer.android.com/reference/android/app/Activity#onDestroy() or https://developer.apple.com/documentation/uikit/uiapplicationdelegate/1623111-applicationwillterminate
It seems logical that the same thing that created you will tell you that you're going away. Admittedly not a perfect analogy. It can be used by either operators or operands.
IMHO creating a separate SLO just for this signal seems overkill.
AFAIU the CVO does not manage the lifecycle of operators. I agree with @abhinavdahiya that it "manages the operands of the operators and not the operators themselves". So that's a big shift in the CVO's scope if this comes to the CVO.
I am wondering if we will have status conditions and history for this custom resource. It might be useful for debugging purposes when we want to track the events that preceded a deletion. I think we should ask @smarterclayton about this as it involves the CVO.
I see the `blockTeardown` thread debating the utility of that property. So there is some contention between:

1. `blockTeardown` would be nice to have, but needs someone to own the admission hook behind it.
2. The CVO doesn't want to own that hook.
3. Creating a new second-level operator seems heavy.

Personally, I come down with "so let's drop `blockTeardown` and suggest users scope their admin auth to not include the ability to delete `Alive` unless they assume some special role they reserve for super-dangerous things". Then we don't need any code behind this new CRD and CR, just manifests that live somewhere that ends up in the release image for the CVO to reconcile, like it does all the other CRD and CR manifests it already manages.
Also, if folks are wondering where the manifest would live, we have precedent in cluster-update-keys for a Git repo with a thin OWNERS file whose sole purpose is to hold manifests that get stuffed into the release image. Spinning up a new repo for openshift/alive or whatever seems like it would be slightly tedious on the internal-process side, but lightweight for everyone after it gets set up.
### Goals

Signal that cluster deletion has begun through deletion of a new CRD `Alive`, and wait for that object to be deleted during uninstall before continuing with removing cluster resources such as instances, storage and networking.
This begs the implementation a bit - it's not clear at first glance why a new CRD would be needed to solve this problem.
### User Stories

#### Story 1
I think it's important to nail these down for readers whose introduction to the topic is this document. I would suggest the following stories, but I may not have everyone's interests in mind:

* As an OpenShift administrator, I would like any external resources created in service of the cluster to be removed when I delete the cluster, so that:
  * I do not have to track down those resources and remove them manually.
  * I do not have to pay for resources that are not being used.
* As an OpenShift (software, big-O) Operator, I would like to have a way to know that the cluster is being shut down, so that I can perform any necessary cleanup steps and remove external resources.
* As an OpenShift administrator, I would like to have a way to indicate that certain external resources that would normally be cleaned up on cluster deletion be preserved on cluster delete, so that I don't lose important resources stored externally.
#### API

Introduce a new CRD `Alive` to which operator resources could attach a finalizer, initiating cascading delete when the `Alive` resource is deleted.
Is there a reason this couldn't piggy-back on existing resources, like `ClusterVersion`?
Deletion of the `ClusterVersion` and other cluster resources is blocked by validation.
This validation gave us the impression that it could not be used.
ping @jeremyeder, KB suggested pinging you here; this is driven by SD's need to have operators that get a chance to clean up.
It seems better, IMO, for the service control plane to uninstall operators/operands when initiating cluster deletion, rather than putting this concept directly into the cluster. Can the enhancement weigh that option and why it's not preferred?
## Summary

Operators can create external resources in clouds and other services that have no opportunity to be cleaned up when a cluster is uninstalled, causing extra work for cluster maintainers to track down and remove these resources. Signalling cluster deletion would allow operators to clean up external resources during uninstallation and would ensure that total cleanup is an automatic process.
This is probably inherently a best-effort process, and there is a risk that people might interpret it as a guaranteed process. An in-cluster mechanism cannot survive people getting fed up waiting for clean-up to happen, taking things into their own hands and hard deleting the cluster themselves. (Think when you last kill -9'ed a process that was taking too long to quit).
#### Uninstall

Delete a cluster's `Alive` resource during uninstall when requested via a flag such as `openshift-install destroy cluster --cluster-alive-delete`. Cluster destroy will wait for a default amount of time and fail if `Alive` deletion was not successful. The default timeout will not be configurable and users will be expected to attempt shutdown multiple times upon failure.
Do enough OCP users use `openshift-install destroy`, rather than deleting the relevant cluster resources directly, to make this viable? What about customers or automated systems (e.g. customers' CI systems) that might delete cluster resources directly rather than calling `openshift-install destroy`?

These arguments make me think that attempting to solve this problem inside the cluster might not be wise.
FWIW, ARO does not use `openshift-install destroy` and would need implementation work to support this enhancement if it happened.
I will rework this to be less openshift-install specific. We do leverage openshift-install destroy code in Hive, and indeed other provisioning infra would need to do the same if they want to make use of this cleanup functionality.

It would be difficult to manage externally because we simply don't know the details for cleanup of random things that might be in the cluster. The idea here was to allow operators (or any other component) to maintain their own cleanup logic; we just want a signal to tell them a teardown is imminent.
As an OpenShift (software, big-O) Operator, I would like to have a way to know that the cluster is being shut down, so that I can perform any necessary cleanup steps and remove external resources.

#### Story 3

As an OpenShift administrator, I would like to have a way to indicate that certain external resources that would normally be cleaned up on cluster deletion be preserved on cluster delete, so that I don't lose important resources stored externally.
This could easily be a foot gun. If the external resource is incorrectly marked and holds customer data, it could be quite bad.
I agree with @jim-minter here: we should not rely on label and artifact markup outside the cluster. Each resource in-cluster should track the artifacts it needs or wants to lifecycle, and our experience should optimize for that.
Agree. The interface for preserving anything off cluster should be a CR owned by the operator that created the off cluster resource(s).
Updates to expand focus to other providers.
Enhancement updated for feedback: a little less openshift-install specific, explicitly called out the need for support in our managed cloud offerings that do not use it, and dropped the admin-blocking functionality per arch call discussions.
@derekwaynecarr, to answer the question around whether this can be a service trigger and not a product-instance trigger: it's not possible to actively track, in cluster, how an operator or an operated resource has evolved in its own lifecycle. We're really going to need a way for an operator to mature along the path of being able to self-service and self-lifecycle its own assets, based on signals for cluster lifecycle that we can deliver.
For reference, this has nothing to do with our managed offerings. This RFE is for OCP to signal to OCM (effectively Red Hat Subscription Manager) that the cluster is being destroyed; otherwise OCM doesn't know if the cluster was destroyed or is simply no longer reporting telemetry data (at least for the use case I care about).
Actually, the original intent is to signal RHMI that it can clean up. Agree that we could also tap into the deletion signal for OCM purposes, but the original intent of this enhancement was for managed-services add-ons to know when they should clean themselves up.
@abhinavdahiya @abutcher @dgoodwin Is this proposal abandoned or still under discussion? The reason I'm asking is that I'm working on another enhancement where OLM needs a signal for cluster destruction to initiate the cleanup of OLM-managed operands and operators.
That's the reason we opened this. It's not abandoned, but it needs new ownership. @dgoodwin and @abutcher don't have capacity to work on this any longer.
The best practice would then become:

1. Delete the `Alive` custom resource before destroying the cluster itself.
At which point of the cluster de-provision process is `Alive` deleted?

Volumes in some (most?) cloud platforms can't be deleted until all pods that use the volumes are deleted and the volumes are unmounted and detached from nodes. Should a new storage operator find these pods and start killing them in order to remove all storage resources when `Alive` is marked for deletion? The pods are likely to be recreated by StatefulSets or even other operators outside of our control. Or can we expect that `Alive` is deleted when all user workloads (Pods, StatefulSets, ...) or even Nodes are already deleted? What if a cluster component uses storage that needs to be cleaned out (e.g. monitoring or image registry)?
This may already be here, but in a realtime discussion one thing that came up is that this should basically work a bit like GitHub repository deletion, which requires you to type the name of the repository. Scott suggested that there's a CRD that you create, a core controller (CVO) adds something to its spec (a random key), and you have to edit its status to match that key. Basically, the thing performing the deletion needs to be a controller; you can't just create (or delete) an object and have the cluster start deleting itself.
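The confirmation-key handshake suggested above could look roughly like this. The type and field names are hypothetical; a real implementation would live in a controller reconciling the CR:

```go
package main

import "fmt"

// DeletionRequest is a hypothetical shape for the proposed
// deletion-confirmation CR.
type DeletionRequest struct {
	SpecKey   string // random key written by the controller (CVO)
	StatusAck string // key echoed back by the user to confirm
}

// confirmed reports whether teardown may proceed: the controller has
// issued a key and the user has echoed it into status, mirroring
// GitHub's "type the repository name to delete it" flow.
func confirmed(r DeletionRequest) bool {
	return r.SpecKey != "" && r.SpecKey == r.StatusAck
}

func main() {
	r := DeletionRequest{SpecKey: "a1b2c3"}
	fmt.Println(confirmed(r)) // user has not acknowledged yet
	r.StatusAck = "a1b2c3"
	fmt.Println(confirmed(r)) // acknowledged; teardown may proceed
}
```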
@openshift-bot: Closed this PR.
/cc @sdodson @dgoodwin @abhinavdahiya