Signal cluster deletion #265
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: abutcher. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
#### Upgrade

Create `ClusterAlive` object for existing clusters. (Is this the right place to create for existing clusters?)
Can this be done as a release image manifest?
That would risk racing against the CVO which will always attempt to create release image content, no?
This is always going to be best-effort, because there's no guarantee that the cluster is alive enough to be able to clean itself up when the aliveness CR is removed. So races are probably acceptable, although it would be good to minimize them where possible.
### Test Plan

Deletion of the `ClusterAlive` resource will occur in uninstall by default and will be tested by e2e as part of regular cluster teardown.
Per the flags proposed above, the delete failure policy would be helpful here.
```yaml
kind: ClusterAlive
metadata:
  name: cluster
  namespace: openshift-config
```
Does this need to be namespaced or should it be global?
I don't think this needs to be namespaced.
Namespace removed in last update.
I don't love non-namespaced things. Especially if we are going to make it a generic Kind like `Alive` instead of `ClusterAlive`, then having something namespaced that I could use the same way for my own purposes would be nice.
Re-added namespace to the object.
Another thing we should include is some mechanism to transfer information about the `ClusterAlive` delete. The send-delete-and-wait approach, without any information about what's happening or what's failing, is going to be too frustrating for the installer team or for users to debug.

Also, we should allow the admin to modify the `ClusterAlive` in such a way that destroy cannot proceed, just as a safety check.

Admin being able to block seems useful, possibly for Hive as well. Is examining the finalizers left on the `ClusterAlive` sufficient for reporting what's blocking it? Admins could inject their own.

As long as it's documented.
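To illustrate the finalizer-inspection idea above: an `Alive` object stuck mid-deletion would carry a `deletionTimestamp` and whatever finalizers have not yet been removed, so listing them would report what is blocking teardown. The finalizer names below are hypothetical, not part of the proposal:

```yaml
apiVersion: v1alpha1
kind: Alive
metadata:
  name: cluster
  deletionTimestamp: "2020-01-01T00:00:00Z"
  finalizers:
    # hypothetical names; each owner removes its entry
    # once its external cleanup completes
    - olm.operators.coreos.com/teardown
    - example.admin/hold
```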
### Non-Goals

Attaching finalizers to operator resources based on the `ClusterAlive` object, which would facilitate removal of operator resources.
Somewhere in this I'd like to flesh out what we expect to happen here. I believe the proposal from OLM was that operators set finalizers on their CRs, and OLM would set a finalizer on `ClusterAlive`. When deleted, it would delete every CR for every CRD for every operator, giving them their chance to clean up resources.

This would still work with regard to @abhinavdahiya's request for an admin override to block this. Assuming `ClusterAlive.Spec.BlockTeardown = true`, OLM could respect this flag and refuse to start its teardown process, so we only need to check `BlockTeardown` in one place.

If, however, operators start adding their finalizers directly to `ClusterAlive`, we would need to guarantee that they all also knew to respect this additional `BlockTeardown` field. The OLM layer of indirection seems required, so I'd like to make sure they're still on board with that plan to use CRs+CRDs.
cc @ecordell
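A minimal sketch of the cascade described above, using hypothetical in-memory types rather than real client-go calls: on `Alive` deletion, walk every operator's CRDs and delete each CR so the owning operator's finalizers can run, and check `blockTeardown` in that single place.

```go
package main

import "fmt"

// Hypothetical in-memory stand-ins for the real API objects.
type CRD struct {
	Kind string
	CRs  []string // names of custom resources of this kind
}

type Operator struct {
	Name string
	CRDs []CRD
}

// teardownOrder returns the "kind/name" of every CR that would be
// deleted when the Alive object is removed, skipping everything
// when blockTeardown is set (the single place the flag is checked).
func teardownOrder(operators []Operator, blockTeardown bool) []string {
	if blockTeardown {
		return nil
	}
	var deletions []string
	for _, op := range operators {
		for _, crd := range op.CRDs {
			for _, cr := range crd.CRs {
				deletions = append(deletions, crd.Kind+"/"+cr)
			}
		}
	}
	return deletions
}

func main() {
	ops := []Operator{{Name: "db-operator", CRDs: []CRD{{Kind: "Database", CRs: []string{"prod"}}}}}
	fmt.Println(teardownOrder(ops, false))
}
```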
* `ClusterAlive` deletion will be requested by flag
* Default, non-configurable timeout
* CI attempts `ClusterAlive` deletion and does normal destroy after
```yaml
metadata:
  name: cluster
spec:
  blockTeardown: true # block teardown of operator resources
```
What is this? What does it do? How does it relate?
> Also we should allow the admin to modify the clusteralive in such a way that destroy cannot proceed, just as a safety check.

This was suggested by @abhinavdahiya as a mechanism for admins to block teardown in destroy. I'll add this description to the enhancement.
I don't understand the use case or why I'd want what is being described, so hopefully you can lay that out as well. Who is responding to this value in spec? Whatever set the finalizers? Why delete the object if you don't want those things to respond?
I read those, but I don't think I get it. Is this supposed to prevent someone from 'accidentally' deleting the Alive object and triggering teardown? Are you trying to make it a 2-step process? What's the use case?
Say `blockTeardown: true` and I try to delete the `clusterAlive` object:

* Do we expect all finalizers on the object to pay attention to spec and just do nothing?
* Should they remove themselves from the finalizer list?
* Should the `ClusterAlive` object get deleted? Or should it just live forever with a `deletionTimestamp` but never be deleted?

And why am I doing this? I'm perfectly happy to understand answers to these questions inside the enhancement instead of here, when ready.
So currently, when deleting the cluster using `destroy cluster`, we clean up all the openshift-core-cluster-created resources, like those tagged `kubernetes.io/cluster/infra-id: owned`. We don't delete any user-created resources whose ownership users didn't opt in to, and that is also best-effort.

Now, with cluster alive and `destroy cluster`, there is a higher degree of safety concern, as OLM might delete user-created resources, like databases etc. Therefore, I think it's necessary that admins be allowed to prevent any such action from taking place.

My initial recommendation was to add this to spec, such that when the user runs `destroy cluster --shutdown`, the installer can fail saying that the admin has rejected such action. We could also write an admission plugin that would reject any deletes on the object when this field is set.

Operators like OLM probably shouldn't need to actively reconcile this, i.e. removing finalizers when this is set, but I don't see why that would be bad...

If we can have all operators always add their finalizers, and the installer check this, or even better have the admission plugin, I think that would be very helpful @eparis
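The admission check proposed above is small enough to sketch. This is a hypothetical, self-contained stand-in for the decision logic only; a real implementation would be wired up as a validating admission plugin or webhook:

```go
package main

import "fmt"

// AliveSpec is a hypothetical stand-in for the proposed CRD's spec.
type AliveSpec struct {
	BlockTeardown bool
}

// admitAliveDelete sketches the proposed admission check: a DELETE on
// the Alive object is rejected while spec.blockTeardown is set, so the
// installer fails fast with a clear reason instead of waiting on
// finalizers that will never run.
func admitAliveDelete(spec AliveSpec) (allowed bool, reason string) {
	if spec.BlockTeardown {
		return false, "spec.blockTeardown is set: an admin has blocked cluster teardown"
	}
	return true, ""
}

func main() {
	allowed, reason := admitAliveDelete(AliveSpec{BlockTeardown: true})
	fmt.Println(allowed, reason)
}
```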
I've added these details to the enhancement for `blockTeardown`.
### Test Plan

In OpenShift CI, clusters will try to delete their `ClusterAlive` object and, if that fails, all infra resources will be destroyed.
How do I get a record of every time the `--cluster-alive-delete` failed?
What do you think about a field in `.spec` or `.status` that we increment when we attempt the destroy?
I meant in CI: how do I get that record? How do we know it's broken in CI and not just getting ignored by the `|| true` forever?
> ...how do we know it's broken in CI...

We have lots of sub-components in the installer that are not independently tested, so I don't think we need to block on closing this one. If you consider it especially vulnerable, are you worried about the installer breaking, or about a particular component forgetting to register a finalizer or flubbing the finalizer implementation? If you're worried about the installer, you could have CI ask a trusted-enough operator to create an unknown-to-the-installer resource which would block teardown, and then see if the backstopping reapers light up with lots of leaked resources.
The `|| true` would only hit the case of a flubbed implementation from something setting a finalizer, which would be a real bug.

Your suggestion, I think, is that the AWS reaper might eventually notice we were wasting resources/money and we could then do some spelunking to figure out what left it behind.

Could we instead do something like (obviously pseudocode):

```sh
if ! openshift-install destroy cluster --cluster-alive-delete; then
  openshift-install destroy cluster
  exit 1
fi
```

This moves the failure to where it is introduced, not to some point later after DPP realizes we're wasting money. I do agree we don't have to block closing on this issue, but it doesn't seem that hard.
Updated test plan to include returning the retval of the cluster alive delete test if it failed.
```yaml
apiVersion: v1
kind: ClusterAlive
```
The name is already expressing `Cluster`. Maybe the kind should just be `Alive` so it can be used as a top-level finalizer collector for different sets of resources as well? Like, I dunno:

```yaml
apiVersion: v1alpha1
kind: Alive
metadata:
  name: human-resources
spec:
  blockTeardown: true
```

If you wanted to collect a bunch of resources created for HR, you could easily delete them all if HR moved off to a different cluster later.
Name has been changed to `Alive`.
Introduce a new CRD `ClusterAlive` to which operator resources could attach a finalizer, initiating cascading delete when the `ClusterAlive` resource is deleted.

```yaml
apiVersion: v1
```
Probably start with v1alpha1, right? Probably some kind of generic CRD stabilization procedure whose docs we can link in the Graduation Criteria section below about how long we expect to cook before moving to beta and stable.
Updated object example to `v1alpha1` and added a bit of text to graduation criteria re: usage and testing. If you have any docs for a pre-existing CRD graduation criterion, I could add them.
* Make `Alive` object namespaced.
* Move API to `v1alpha1`; the GA plan will include a move from beta to stable following usage & testing.
* Include pseudocode for the test plan that returns the retval of an `Alive` cleanup uninstall failure.
* Add details for `blockTeardown` implementation.
/assign @pmorie
@@ -68,6 +68,8 @@ spec:

Add an admission plugin that prevents delete when `blockTeardown` has been set.

We believe it would be best to add the CRD and admission plugin to the cluster version operator. We don't foresee this requiring a controller, and a separate operator seems like overkill for this object.
@LalatenduMohanty Heads up regarding this enhancement. We think it would make the most sense to add this to the CVO. Please let me know if you have any questions or concerns.
No. Since the CVO is instrumental to the working of the cluster, the pattern of "hey, there is no owner for this object, let's put it in the CVO" should be immediately stopped.
@abhinavdahiya Where do you suggest this live? In its own separate operator?
@abhinavdahiya this resource would be used to manage the lifecycle of operators. It seems logical that it belongs with the thing that manages operators. I could see an argument for OLM, since no SLO has external things it would want to destroy, but either the CVO or OLM is what would make the most sense to me.

@eparis thoughts?
> @abhinavdahiya this resource would be used to manage the lifecycle of operators.

Can you explain how? Because to me this helps manage the operands of the operators and not the operators themselves. It's like saying operators and their operands need the k8s API, and therefore the CVO should manage the k8s API directly.

Some operator should handle this object and everything required around it. That operator should be able to use the CVO to create the resources for it, like CRDs and an empty initial resource instance, like others.
> can you explain how?
This is a signal telling you that you are about to be destroyed. In my mind, something similar to https://developer.android.com/reference/android/app/Activity#onDestroy() or https://developer.apple.com/documentation/uikit/uiapplicationdelegate/1623111-applicationwillterminate
It seems logical that the same thing that created you will tell you that you're going away. Admittedly not a perfect analogy. It can be used by either operators or operands.
IMHO creating a separate SLO just for this signal seems overkill.
AFAIU the CVO does not manage the lifecycle of operators. I agree with @abhinavdahiya that it "manages the operands of the operators and not the operators themselves". So that's a big shift in the CVO's scope if this comes to the CVO.
I am wondering if we will have status conditions and history for this custom resource. It might be useful for debugging purposes when we want to track the events that preceded a deletion. I think we should ask @smarterclayton about this as it involves the CVO.
I see the `blockTeardown` thread debating the utility of that property. So there is some contention between:

1. `blockTeardown` would be nice to have, but needs someone to own the admission hook behind it.
2. The CVO doesn't want to own that hook.
3. Creating a new second-level operator seems heavy.

Personally, I come down with "so let's drop `blockTeardown` and suggest users scope their admin auth to not include the ability to delete `Alive` unless they assume some special role they reserve for super-dangerous things". Then we don't need any code behind this new CRD and CR, just manifests that live somewhere that ends up in the release image for the CVO to reconcile, like it does all the other CRD and CR manifests it already manages.
Also, if folks are wondering where the manifest would live, we have precedent in cluster-update-keys for a Git repo with a thin OWNERS file whose sole purpose is to hold manifests that get stuffed into the release image. Spinning up a new repo for openshift/alive or whatever seems like it would be slightly tedious on the internal-process side, but lightweight for everyone after it gets set up.
### Goals

Signal that cluster deletion has begun through deletion of a new CRD `Alive`, and wait for that object to be deleted during uninstall before continuing with removing cluster resources such as instances, storage and networking.
This begs the implementation a bit - it's not clear at first glance why a new CRD would be needed to solve this problem.
### User Stories

#### Story 1
I think it's important to nail these down for readers whose introduction to the topic is this document. I would suggest the following stories, but I may not have everyone's interests in mind:

* As an OpenShift administrator, I would like any external resources created in service of the cluster to be removed when I delete the cluster, so that:
  * I do not have to track down those resources and remove them manually.
  * I do not have to pay for resources that are not being used.
* As an OpenShift (software, big-O) Operator, I would like to have a way to know that the cluster is being shut down, so that I can perform any necessary cleanup steps and remove external resources.
* As an OpenShift administrator, I would like to have a way to indicate that certain external resources that would normally be cleaned up on cluster deletion be preserved on cluster delete, so that I don't lose important resources stored externally.
#### API

Introduce a new CRD `Alive` to which operator resources could attach a finalizer, initiating cascading delete when the `Alive` resource is deleted.
Is there a reason this couldn't piggy-back on existing resources, like `ClusterVersion`?
Deletion of the `ClusterVersion` and other cluster resources is blocked by validation.
This validation gave us the impression that it could not be used.
ping @jeremyeder, KB suggested pinging you here; this is driven by SD's need to have operators that get a chance to clean up.
It seems better, IMO, for the service control plane to uninstall operators/operands when initiating cluster deletion, rather than putting this concept directly into the cluster. Can the enhancement weigh that option and why it's not preferred?
## Summary

Operators can create external resources in clouds and other services that have no opportunity to be cleaned up when a cluster is uninstalled, causing extra work for cluster maintainers to track down and remove these resources. Signalling cluster deletion would allow operators to clean up external resources during uninstallation and would ensure that total cleanup is an automatic process.
This is probably inherently a best-effort process, and there is a risk that people might interpret it as a guaranteed process. An in-cluster mechanism cannot survive people getting fed up waiting for clean-up to happen, taking things into their own hands and hard deleting the cluster themselves. (Think when you last kill -9'ed a process that was taking too long to quit).
#### Uninstall

Delete a cluster's `Alive` resource during uninstall when requested via a flag such as `openshift-install destroy cluster --cluster-alive-delete`. Cluster destroy will wait for a default amount of time and fail if `Alive` deletion was not successful. The default timeout will not be configurable and users will be expected to attempt shutdown multiple times upon failure.
Do enough OCP users use `openshift-install destroy`, rather than deleting the relevant cluster resources directly, to make this viable? What about customers or automated systems (e.g. customers' CI systems) that might delete cluster resources directly rather than calling `openshift-install destroy`?

These arguments make me think that attempting to solve this problem inside the cluster might not be wise.
FWIW, ARO does not use `openshift-install destroy` and would need implementation work to support this enhancement if it happened.
I will rework this to be less openshift-install specific. We do leverage openshift-install destroy code in Hive, and indeed other provisioning infra would need to do the same if they want to make use of this cleanup functionality.

It would be difficult to manage externally because we simply don't know the details for cleanup of random things that might be in the cluster. The idea here was to allow operators (or any other component) to maintain their own cleanup logic; we just want a signal to tell them a teardown is imminent.
As an OpenShift (software, big-O) Operator, I would like to have a way to know that the cluster is being shut down, so that I can perform any necessary cleanup steps and remove external resources.

#### Story 3

As an OpenShift administrator, I would like to have a way to indicate that certain external resources that would normally be cleaned up on cluster deletion be preserved on cluster delete, so that I don't lose important resources stored externally.
This could easily be a foot gun. If the external resource is incorrectly marked and holds customer data, it could be quite bad.
I agree with @jim-minter here: we should not rely on label and artifact markup outside the cluster. Each resource in-cluster should track the artifacts it needs or wants to lifecycle, and our experience should optimize for that.
Agree. The interface for preserving anything off cluster should be a CR owned by the operator that created the off cluster resource(s).
Updates to expand focus to other providers.
Enhancement updated for feedback: a little less openshift-install specific, explicitly called out the need for support in our managed cloud offerings that do not use it, and dropped the admin-blocking functionality per arch call discussions.
@derekwaynecarr, to answer the question around whether this can be a service trigger and not a product-instance trigger: it's not possible to actively track, in cluster, how an operator or an operated resource has evolved in its own lifecycle. We're really going to need a way for an operator to mature along the path of being able to self-service and self-lifecycle its own assets, based on signals for cluster lifecycle that we can deliver.
For reference, this has nothing to do with our managed offerings. This RFE is for OCP to signal to OCM (effectively Red Hat Subscription Manager) that the cluster is being destroyed; otherwise OCM doesn't know if the cluster was destroyed or is simply no longer reporting telemetry data (at least for the use case I care about).
Actually, the original intent is to signal RHMI that it can clean up. Agree that we could also tap into the deletion signal for OCM purposes, but the original intent of this enhancement was for managed-services add-ons to know when they should clean themselves up.
@abhinavdahiya @abutcher @dgoodwin Is this proposal abandoned or still under discussion? The reason I'm asking is that I'm working on another enhancement where OLM needs a signal for cluster destruction to initiate the cleanup of OLM-managed operands and operators.
That's the reason we opened this. It's not abandoned, but it needs new ownership. @dgoodwin and @abutcher don't have capacity to work on this any longer.
The best practice would then become:

1. Delete the `Alive` custom resource before destroying the cluster itself.
At which point of the cluster de-provision process is `Alive` deleted?

Volumes in some (most?) cloud platforms can't be deleted until all pods that use the volumes are deleted and the volumes are unmounted and detached from nodes. Should a new storage operator find these pods and start killing them in order to remove all storage resources when `Alive` is marked for deletion? The pods are likely to be recreated by StatefulSets or even other operators outside of our control. Or can we expect that `Alive` is deleted when all user workloads (Pods, StatefulSets, ...) or even Nodes are already deleted? What if a cluster component uses storage that needs to be cleaned out (e.g. monitoring or image registry)?
This may already be here, but in a realtime discussion one thing that came up is that this should basically work a bit like GitHub repository deletion, which requires you to type the name of the repository. Scott suggested that there's a CRD that you create, a core controller (CVO) adds something to its spec (a random key), and you have to edit its status to match that key. Basically, the thing performing the deletion needs to be a controller; you can't just create (or delete) an object and have the cluster start deleting itself.
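The confirmation-key handshake suggested above could look roughly like this. The type and field names are hypothetical; a real implementation would live in a controller reconciling the CR:

```go
package main

import "fmt"

// DeletionRequest is a hypothetical shape for the proposed
// deletion-confirmation CR.
type DeletionRequest struct {
	SpecKey   string // random key written by the controller (CVO)
	StatusAck string // key echoed back by the user to confirm
}

// confirmed reports whether teardown may proceed: the controller has
// issued a key and the user has echoed it into status, mirroring
// GitHub's "type the repository name to delete it" flow.
func confirmed(r DeletionRequest) bool {
	return r.SpecKey != "" && r.SpecKey == r.StatusAck
}

func main() {
	r := DeletionRequest{SpecKey: "a1b2c3"}
	fmt.Println(confirmed(r)) // user has not acknowledged yet
	r.StatusAck = "a1b2c3"
	fmt.Println(confirmed(r)) // acknowledged; teardown may proceed
}
```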
@openshift-bot: Closed this PR.
/cc @sdodson @dgoodwin @abhinavdahiya