
Add Machine Deletion Hooks Proposal #862

Merged

Conversation

@JoelSpeed (Contributor)

This proposal was originally implemented upstream by @michaelgugino in the Cluster API project.

We would like to include this downstream to allow other controllers to hook into the Machine removal process.
In particular, this will be used by the etcd team to prevent the Machine controller deleting control plane machines when etcd is in a bad state.

CC @enxebre @elmiko @michaelgugino @alexander-demichev for review from the Machine API/Cluster API side

CC @hexfusion @jeremyeder as this relates to the managed control plane instances work

@hexfusion (Contributor) left a comment

Thanks a lot for putting this PR together. I added a few notes/questions on accountability and the durability of using an annotation.

@elmiko (Contributor) left a comment

+1, i think this reads well.

my only suggestion might be a mention of the existing alert we have for MachineNotYetDeleted, and how this will continue to operate even if a user has opted-in to the hooks. i think it's good to be explicit that using these hooks will not sidestep the alerting we have in place currently.

@enxebre (Member) commented Aug 12, 2021

Proposal lgtm and it's also already in CAPI. Though I'm only aware of one use case where we'll want to leverage this, i.e. automated control plane management, and I'm not sure why we couldn't simply use a finaliser?

@dhellmann (Contributor) left a comment

I like the idea of aligning with the upstream API on this.

to just using annotations in the first place.

### Spec Field
We probably don’t want other controllers dynamically adding and removing spec fields on an object.
Contributor

I can't tell if this is a concern about granting the permission to another controller or a claim that server-side-apply doesn't allow for "I want this field to exist in spec" (I think it does).

Contributor

setting annotations requires spec update powers, so I don't think you're gaining anything rbac-wise.
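To make the RBAC point concrete, here is a minimal sketch of the kind of rule a hook implementing controller would need either way; the API group is the Machine API's, but the exact verb list is an assumption for illustration.

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

func main() {
	// Whether a hook is stored as an annotation or as a spec field, it lives on
	// the Machine resource itself, so the controller needs the same update/patch
	// permission on machines in both cases.
	rule := rbacv1.PolicyRule{
		APIGroups: []string{"machine.openshift.io"},
		Resources: []string{"machines"},
		Verbs:     []string{"get", "list", "watch", "patch", "update"},
	}
	fmt.Printf("a hook implementing controller needs a rule like: %+v\n", rule)
}
```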

Contributor Author

I'll expand this once I've double-checked the upstream conversation that happened around this, but as I mentioned in another comment, I think this was more about trying to avoid mixing controller-owned fields into the spec, which is normally owned by a user. Will get back to this.

Contributor

I'll expand this once I've double-checked the upstream conversation that happened around this, but as I mentioned in another comment, I think this was more about trying to avoid mixing controller-owned fields into the spec, which is normally owned by a user. Will get back to this.

I'm interested in seeing a link from here. Given that server-side-apply is based on per-field ownership, a subdivided spec is tractable.

Contributor Author

Looking through the upstream conversation on the enhancement PR (I haven't found the original google doc yet):

Part of the reason to use annotations is that the keys have to be unique, so it's less likely that there would be conflicts between different controllers dealing with the hooks. (kubernetes-sigs/cluster-api#3132 (comment))

Another comment specifically on this title is leading to a conversation about possibly making this a spec field in a future iteration, though I don't think anything has been actioned from that (kubernetes-sigs/cluster-api#3132 (comment))

There is some more concern later in the thread about the fact that using annotations is unversioned, but agreement that this is OK for an alpha API. Interestingly, the CAPI project is looking to move to beta in the next few weeks; there has been no further conversation around this annotation API, so I'll poke them to remind them. (kubernetes-sigs/cluster-api#3132 (comment))

If we can convince the upstream community to revisit how this mechanism works and it then moves to a spec field, that's great; I think that would work well for us. I'm just a little concerned about the timing here. I'd like to provide something for the control plane scaling project to use, preferably with the same API as CAPI so that there's an easy conversion in the future. Perhaps we can implement the annotations temporarily and then move to a spec field before we cut a release, assuming we can reach upstream consensus 🤔
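For illustration, a minimal sketch of how a machine controller could honour annotation-based hooks, and why key uniqueness limits conflicts. The prefixes follow the upstream Cluster API convention; the hook names and owners are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// Hook annotation prefixes as used by the upstream Cluster API implementation;
// a hook is expressed as "<prefix>/<hook-name>: <owner>" on the Machine.
const (
	preDrainHookPrefix     = "pre-drain.delete.hook.machine.cluster.x-k8s.io"
	preTerminateHookPrefix = "pre-terminate.delete.hook.machine.cluster.x-k8s.io"
)

// pendingHooks returns the hook annotations for a given lifecycle point. While
// any are present, the machine controller pauses at that point. Because
// annotation keys are unique per object, two controllers can only clash if they
// deliberately choose the same hook name under the same prefix.
func pendingHooks(annotations map[string]string, prefix string) []string {
	var hooks []string
	for key, owner := range annotations {
		if strings.HasPrefix(key, prefix+"/") {
			hooks = append(hooks, key+"="+owner)
		}
	}
	return hooks
}

func main() {
	// Illustrative annotations: two independent controllers holding pre-drain hooks.
	annotations := map[string]string{
		preDrainHookPrefix + "/etcd-quorum": "clusteroperator/etcd",
		preDrainHookPrefix + "/migrate-app": "my-app-operator",
	}
	if hooks := pendingHooks(annotations, preDrainHookPrefix); len(hooks) > 0 {
		fmt.Println("pausing before drain, waiting on:", hooks)
	}
	if hooks := pendingHooks(annotations, preTerminateHookPrefix); len(hooks) == 0 {
		fmt.Println("no pre-terminate hooks present")
	}
}
```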

Contributor

This still isn't clear to me. Ownership in the spec can be separated; that's what server-side-apply is for. In addition, the permissions required to modify annotations are identical to the permissions required to modify spec, so I don't see a benefit permissions-wise.

Duplicates are prevented by setting the appropriate list merge key and enforcing uniqueness as we do for other fields.

Conflicts can be managed via server-side apply, patches, or (default) conditional updates. Using a spec field increases expressivity.

Using an annotation for evaluation guarantees that there will be a migration later when a real field is developed.
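To make the structured alternative concrete, here is a rough sketch of what such a spec field could look like, with `name` as the list merge key so server-side apply tracks per-entry ownership and duplicate names are rejected. Type and field names are illustrative, not the proposal's final API.

```go
// Package v1beta1 sketches a structured lifecycle hooks field (illustrative only).
package v1beta1

// LifecycleHooks could be embedded in the Machine spec instead of using annotations.
type LifecycleHooks struct {
	// PreDrain hooks prevent the machine from being drained while present.
	// +listType=map
	// +listMapKey=name
	// +optional
	PreDrain []LifecycleHook `json:"preDrain,omitempty"`

	// PreTerminate hooks prevent the instance from being terminated while present.
	// +listType=map
	// +listMapKey=name
	// +optional
	PreTerminate []LifecycleHook `json:"preTerminate,omitempty"`
}

// LifecycleHook is a single named hook held by an owner.
type LifecycleHook struct {
	// Name uniquely identifies the hook within its lifecycle point.
	Name string `json:"name"`

	// Owner identifies who is holding the hook, e.g. "clusteroperator/etcd".
	Owner string `json:"owner"`
}
```

With `name` as the map key, two owners applying entries with different names never conflict, and the API server can reject duplicate names on update.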

@elmiko (Contributor) left a comment

this is reading really well to me, seems like there are just a couple open questions that need to be resolved:

@JoelSpeed (author)

In my last commit I've pushed the updates to the details in the summary, the finalizers, and the status reporting.

I've also taken a look at the upstream conversation and left some feedback here. Upstream is still using annotations for the moment, and it looks like they had already considered changing how these would work in a later API, but they haven't got to that point yet. That makes it difficult for us, given that we want to keep parity: should we keep parity now and deal with the API change later, or do we try to guess what the new API might look like and deal with a potential delta later?

@dhellmann (Contributor)

In my last commit I've pushed the updates to the details in the summary, the finalizers, and the status reporting.

I've also taken a look at the upstream conversation and left some feedback here. Upstream is still using annotations for the moment, and it looks like they had already considered changing how these would work in a later API, but they haven't got to that point yet. That makes it difficult for us, given that we want to keep parity: should we keep parity now and deal with the API change later, or do we try to guess what the new API might look like and deal with a potential delta later?

One benefit of keeping parity is that whatever upgrade mechanism they develop upstream could be used downstream, too.

@dhellmann (Contributor) left a comment

I'm happy with this version of the proposal.

@enxebre (Member) commented Oct 11, 2021

I've also taken a look at the upstream conversation and left some feedback here. Upstream is still using annotations for the moment, and it looks like they had already considered changing how these would work in a later API, but they haven't got to that point yet. That makes it difficult for us, given that we want to keep parity: should we keep parity now and deal with the API change later, or do we try to guess what the new API might look like and deal with a potential delta later?

As far as I'm aware, the activity around this feature has been very scarce. It was only mentioned here kubernetes-sigs/cluster-api#4707 and we decided to go with core support. I'm personally not aware of any use cases where this is being leveraged (though that doesn't mean there aren't any).

I think the problem that is motivating this proposal could be scoped and solved with finalisers. But if we are going with a more generic solution with a wider scope, we should share our vision for the long term and (re-)drive the implementation effort upstream. To this end, using the current existing annotations and consolidating back later seems reasonable to me. This might be a good starting place to bring this up: kubernetes-sigs/cluster-api#5175
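For comparison, a minimal sketch of the finaliser alternative mentioned here: an etcd operator would hold a finalizer on control plane Machines and drop it once the member is safe to remove. The finalizer name and the machine controller's own finalizer shown below are hypothetical.

```go
package main

import "fmt"

// Hypothetical finalizer an etcd operator could place on control plane Machines;
// the Machine object cannot be removed from the API until it is dropped again.
const etcdProtectionFinalizer = "etcd.openshift.io/member-removal-protection"

func addFinalizer(finalizers []string, finalizer string) []string {
	for _, f := range finalizers {
		if f == finalizer {
			return finalizers
		}
	}
	return append(finalizers, finalizer)
}

func removeFinalizer(finalizers []string, finalizer string) []string {
	var out []string
	for _, f := range finalizers {
		if f != finalizer {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// "machine.example.io/machine" stands in for the machine controller's own finalizer.
	finalizers := []string{"machine.example.io/machine"}
	finalizers = addFinalizer(finalizers, etcdProtectionFinalizer)
	fmt.Println("while the etcd member is not yet safe to remove:", finalizers)
	finalizers = removeFinalizer(finalizers, etcdProtectionFinalizer)
	fmt.Println("once the member has been removed from the cluster:", finalizers)
}
```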

@JoelSpeed (author)

I'm personally not aware of any use cases where this is being leveraged (though that doesn't mean there aren't any).

The only place I'm aware of it being used today upstream is within their kubeadm control plane implementation, which uses the mechanism to protect etcd, much as intended within OpenShift.

I think the problem that is motivating this proposal could be scoped and solved with finalisers. But if we are going with a more generic solution with a wider scope, we should share our vision for the long term and (re-)drive the implementation effort upstream. To this end, using the current existing annotations and consolidating back later seems reasonable to me. This might be a good starting place to bring this up: kubernetes-sigs/cluster-api#5175

I think it may also be worth starting an email thread to gather feedback and see if there are other people using it. With that feedback we can start to work out what we want the API to look like in the future. I can get a thread going to gather this feedback.

@JoelSpeed (author)

I have posted a thread on the cluster lifecycle mailing list requesting feedback about how the upstream feature has been used; let's see if we get any interesting notes added here: https://groups.google.com/g/kubernetes-sig-cluster-lifecycle/c/UkautWVDVKM

@JoelSpeed (author) commented Oct 14, 2021

Some feedback from the community about how they are using this feature upstream:

We do some workload specific changes on pre-drain to ensure that the transition
when removing the node goes smoothly, and on pre-terminate, we do some
management actions on the hardware (firmware related).

And

just to clarify, it seems like you're using the feature when removing a collection of machines, to make sure that any workloads on a machine that is going away get scheduled onto a node that isn't going away in the near future, which in turn reduces the interruptions to the workloads?

Yes, that's totally it. The pre-deletion hook was the easiest point for us to interrupt and add that feature without the need to maintain a fork of capi for that.
I think, if wanted, upstream this would be more of a feature of the machinedeployment or the machineset controller instead.
We do that by determining the machine's machinedeployment and all outdated machinesets of the machinedeployment. We then set the taint for all machines of outdated machinesets.

For the Pre-Terminate hook to be more specific:

  • we first set the `node.kubernetes.io/exclude-from-external-load-balancers` label on the node in the destination cluster
  • this triggers the CCM in the destination cluster to remove the node from the existing load balancers
  • our controller then queries the IaaS API and waits for the removal

after that it also checks whether the drain detached all volumes (normally it is just okay to delete the VM; however, due to bugs in our IaaS provider we started to clean these disks up prior to deleting the machines, which did help)

So it seems that at least some people are using the feature upstream to be more careful about how their machines are removed. In particular, the use case about checking for other machines going away to minimise disruptions seems quite interesting; I guess when the drain starts there is nothing to guarantee that other deleted machines have already been cordoned.
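As an illustration only (not the quoted contributor's code), here is a minimal sketch of the first step of that pre-terminate flow: labelling the node so the CCM pulls it out of external load balancers before the instance is terminated. The node name is a hypothetical example; a real hook controller would take it from the Machine's status.nodeRef.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// excludeNodeFromLoadBalancers labels the node so that the cloud controller
// manager removes it from external load balancers before the VM is terminated.
func excludeNodeFromLoadBalancers(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels["node.kubernetes.io/exclude-from-external-load-balancers"] = "true"
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Hypothetical node name for illustration.
	if err := excludeNodeFromLoadBalancers(ctx, client, "worker-0"); err != nil {
		panic(err)
	}
	fmt.Println("node excluded from external load balancers; waiting on the IaaS API would follow here")
}
```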


That aside, I'm not sure if there's anything else left for me to address here. Is there anything blocking from anyone, or are we happy to go ahead with the annotation approach until we work out a more formal API with the upstream?

@elmiko (Contributor) left a comment

this reads really well to me, no criticisms or suggestions
/lgtm

openshift-ci bot added the lgtm label on Oct 18, 2021
@JoelSpeed (author)

Keen to move this enhancement forward as there hasn't been any further comment recently.

Anyone on the owners list happy to add their approval?

JoelSpeed added a commit to JoelSpeed/api that referenced this pull request Nov 4, 2021
This is to implement openshift/enhancements#862 
as it currently stands. These consts will be used as annotation key 
prefixes on machines as an OpenShift internal API for the time being. 
Once the upstream creates their new API for this feature, we will copy 
their changes and work on a migration path between the internal 
annotation API and the first-class API.
@JoelSpeed (author)

I have now started the implementation work for this in case anyone would like to review; it comes as three PRs for the moment:

@elmiko (Contributor) commented Nov 8, 2021

thanks again Joel
/lgtm

openshift-ci bot added the lgtm label on Nov 8, 2021
@deads2k (Contributor) left a comment

Using annotations instead of structured fields is not recommended because of

  1. lack of expressivity
  2. lack of versioning for future changes
  3. guaranteed migration pain during API evolution

You can see the history of pain we had in kube proper here: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#new-field-in-existing-api-version

The Machine status will contain conditions identifying why the Machine controller
has not progressed in removing the Machine. It is expected that hook implementing
controllers should signal to users when they are having issues via some mechanism
(eg. their `ClusterOperator` status) and that the conditions on the Machine should
Contributor

The clusteroperator status isn't likely to be an entry point given the cardinality differences between the hook provider and the number of machines.

Is there any mechanism to be included in the machine status? These hooks are pretty deep and condition names can be scope-limited. What about encouraging these hooks to place a condition aligned to their name when they fail?

Contributor Author

A well behaved operator could update the Machine status if they so desired, there's nothing to prevent them from doing so. We have a Conditions list which fits the normal pattern, so they could introduce their own condition. Our controllers will leave unrecognised conditions alone so this should be safe.

Encouraging a condition specific to the operator seems reasonable. I guess we were thinking of the etcd operator use case as an example: the issue would be that the etcd operator can't remove the hook because it is waiting for the etcd member to be safe to remove from the cluster. Would it not also make sense to report issues with this process at the etcd operator level as well?
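As a sketch of what that could look like for a hook implementing controller, using the generic metav1.Condition helpers (the Machine API's own condition type differs slightly, and the condition type, reason, and message below are invented, loosely following the hook-name-based naming discussed later in this review):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A hook implementing controller could surface progress on the Machine's
	// conditions list while it is holding up deletion.
	var conditions []metav1.Condition

	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:    "PreDrainEtcdQuorumSucceeded",                                  // illustrative name
		Status:  metav1.ConditionFalse,                                          // hook is still blocking
		Reason:  "WaitingForEtcdMemberRemoval",                                  // illustrative reason
		Message: "etcd member has not yet been safely removed from the cluster", // illustrative message
	})

	for _, c := range conditions {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```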

DeletionTimestamp or enters the "Deleting" phase.

Fine-tuned coordination is not possible at this time; eg, it's not
possible to execute a pre-terminate hook only after a node has been drained.
Contributor

This isn't obvious to me. Why can a hook implementing controller not detect that the node has been drained?

Contributor Author

It's not impossible, but I guess it's more difficult. Today we don't annotate or signify anywhere on a machine that we have completed a drain operation. We add a taint when we start, but the drain could fail for a period due to PDBs. To be sure the drain has been completed, the hook implementing controller would have to, today at least, identify the same set of pods that the machine controller is removing and work out whether that list is empty.

This could of course be better expressed by a Machine condition where we mark the machine drained once we think it's drained.
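For illustration, a rough sketch of the check a hook implementing controller would have to do today to decide whether a drain has completed; the pod-filtering rules and the node name are simplifying assumptions, not the machine controller's exact drain logic.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// nodeLooksDrained lists the pods bound to the node and ignores the ones a
// drain would leave behind (DaemonSet-managed pods, static/mirror pods, and
// pods that have already finished).
func nodeLooksDrained(ctx context.Context, client kubernetes.Interface, nodeName string) (bool, error) {
	pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		if _, isMirror := pod.Annotations[corev1.MirrorPodAnnotationKey]; isMirror {
			continue
		}
		if owner := metav1.GetControllerOf(&pod); owner != nil && owner.Kind == "DaemonSet" {
			continue
		}
		if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
			continue
		}
		// At least one pod a drain would evict is still present.
		return false, nil
	}
	return true, nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	drained, err := nodeLooksDrained(context.Background(), client, "worker-0") // hypothetical node name
	if err != nil {
		panic(err)
	}
	fmt.Println("node looks drained:", drained)
}
```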

openshift-ci bot removed the lgtm label on Nov 11, 2021
Owner string `json:"owner"`
}
```

Member

can we please flesh out the new condition types and reasons here?

When errors occur, hook implementing controllers are encouraged to add a condition
to Machines with details of the error. The condition `Type` should be formed as the
hook name, prefixed by the lifecycle point. For example, for a `preDrain` hook with name
`MigrateImportantApp`, the condition `Type` would be `PreDrainMigrateImportantApp`.
Contributor

PreDrainMigrateImportantAppSucceeded? So there's a clear meaning to true and false.

// Owner defines the owner of the lifecycle hook.
// This should be descriptive enough so that users can identify
// who/what is responsible for blocking the lifecycle.
// This could be the name of a controller or an administrator
Contributor

nit: example with clusteroperator/etcd perhaps?

@deads2k (Contributor) commented Nov 12, 2021

api, failure handling, and debugging lgtm

@elmiko (Contributor) left a comment

content looks good to me, i made some suggestions to help with consistency and also calling out a few mentions of annotations. thanks Joel!

### Deletion Phase
Describes when a machine has been marked for deletion but is still present
in the API. Various actions happen during this phase, such as draining a node,
deleting an instance from a cloud provider, and deleting the node object.
Contributor

minor nit for consistency (we seem to use "cloud provider" or "infrastructure provider" interchangeably)

Suggested change
deleting an instance from a cloud provider, and deleting the node object.
deleting an instance from an infrastructure provider, and deleting the node object.


## Summary

Defines a set of lifecycle hookos that can be applied to a Machine which can be used
Contributor

minor nit

Suggested change
Defines a set of lifecycle hookos that can be applied to a Machine which can be used
Defines a set of lifecycle hooks that can be applied to a Machine which can be used

### Implementation Details/Notes/Constraints

For each defined lifecycle point, one or more hooks may be applied within the spec
of the machine object. These annotations will pause reconciliation of a Machine
Contributor

still referencing annotations here

```

Hooks defined at this point will prevent the machine-controller from
removing/terminating the instance in the cloud provider until the hooks are
Contributor

minor nit for consistency

Suggested change
removing/terminating the instance in the cloud provider until the hooks are
removing/terminating the instance in the infrastructure provider until the hooks are

removed.

"pre-terminate" has been chosen over "pre-delete" because "terminate" is more
easily associated with an instance being removed from the cloud or
Contributor

i think we could just drop "cloud" here

Suggested change
easily associated with an instance being removed from the cloud or
easily associated with an instance being removed from the

* Coordinate with other Hook Implementing Controllers through any means
possible, such as using common annotations, CRDs, etc. For example, one hook
controller could set an annotation indicating it has finished its work, and
another hook controller could wait for the presence of the annotation before
Contributor

still referencing annotations here

Contributor Author

I think this one is a legitimate reference to an annotation, WDYT?

Contributor

yes, sorry, my confusion

@JoelSpeed (author)

/override ci/prow/markdownlint

The linter is complaining about missing fields that were added to the template after the bulk of the review of this PR had happened. The new sections won't add much for this proposal, so I'm not going to add them at this late stage.

I think this is ready to merge now as all of the feedback has been addressed and the API has been updated and approved by David (not sure how to dismiss the "changes requested" block).

The PRs linked in #862 (comment) are now also up to date for this version of the proposal.

@elmiko @enxebre Could you add the LGTM and Approve labels respectively if you are happy with the final changes

openshift-ci bot commented Nov 18, 2021

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/markdownlint


@elmiko (Contributor) left a comment

this reads well to me, thanks for all the updates @JoelSpeed
/lgtm

openshift-ci bot added the lgtm label on Nov 19, 2021
This creates a tight coupling between the Machine controller and other components within OpenShift.
The upstream solution today matches the proposal set out in this document. As we are experimenting
with Cluster API in HyperShift, Central Machine Management and for future IPI clusters, it is
preferred to not create a tight coupling with these components and to copy the upstream solution to
Member

To clarify, none of these fields exists upstream today, though nothing precludes us from proposing this approach. A hook mechanism is to be discussed as part of kubernetes-sigs/cluster-api#5556 (comment)

@enxebre (Member) commented Nov 22, 2021

I still think this could have been hidden from users by using a particular finalizer to satisfy the main known and pursued use case, which is the smooth etcd removal (it would also be nice to elaborate on that in this proposal), so we could keep API changes small without exposing more changes to users given the existing plans to introduce CAPI. Nevertheless, I think this field strategy is also solid and provides a flexibility that may enable other use cases, so I'm not opposed to it. Let's implement it and revisit later if necessary.

/lgtm
for @deads2k to approve since he has some unresolved comments, I think.

@deads2k (Contributor) commented Nov 22, 2021

/lgtm
for @deads2k to approve since he has some unresolved comments, I think.

Thanks. With the API changes and condition specification, I like this.

/approve

openshift-ci bot commented Nov 22, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, hexfusion


openshift-ci bot added the approved label on Nov 22, 2021
@JoelSpeed (author)

/override ci/prow/markdownlint

Based on the above comments, the new headings were added late in the review process for this enhancement.

openshift-ci bot commented Nov 22, 2021

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/markdownlint


openshift-merge-robot merged commit 291f25a into openshift:master on Nov 22, 2021
JoelSpeed deleted the machine-deletion-hooks branch on November 22, 2021
Labels: approved, lgtm, priority/important-soon