[0.1] Machine controller: drain node before machine deletion #1103
Conversation
// Drain node before deletion
// If a machine is not linked to a node, just delete the machine.
if _, exists := m.ObjectMeta.Annotations[clusterv1.ExcludeNodeDrainingAnnotation]; !exists && m.Status.NodeRef != nil {
Can't combine this check with the m.Status.NodeRef != nil branch below, because we delete the instance before deleting the node.
(Force-pushed from 44dd5ab to bc50543.)
@michaelgugino does the OpenShift drain library respect pod disruption budgets? What happens if some pod sticks around and won't go away?
/lgtm
@ncdc yes, it does respect them. If a pod cannot be evicted due to disruption budgets, the drain cannot complete, and the request is requeued by the machine-controller. In cases where it can never complete (in our case, we have etcd quorum on 3 masters with a PDB of 1, so if you try to delete more than one master, the drain fails), it will just keep requeuing and will not forcefully delete anything.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: michaelgugino The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
We should find an alternative library or pull in the drain functionality from kubectl if the imports aren't terrible (note: they probably are)
"github.com/go-log/log/info"
kubedrain "github.com/openshift/kubernetes-drain"
I have some reservations about this import. The project itself looks abandoned: there is no README, the last commit was 10 months ago, and it is not structured like a standard Go library. It does not use any dependency management, and it pulls in an additional logging library that we don't want.
We should avoid importing repos like this that are likely to go stale or become unmaintained. I don't want sig-SCL to adopt maintenance of this library; we already have plenty to keep up to date.
It's not likely to go stale or become unmaintained, because we ship it in OpenShift. In any case, this is no different from vendoring any number of dependencies that you see in the wild. Here's an example of a similar 'non-standard' library: https://github.com/petar/GoLLRB
It's not pulling in an additional logging library; that library is just a simple interface (discussed previously).
I've looked into adding support from the kubectl bits as well; we decided against that already, if you check out the resolved conversations.
If there's another library you can suggest, I'm happy to use that instead. Or we could vendor the code here so it doesn't go stale. Or we can be good consumers, consume from the community, and contribute fixes back.
How hard would it be to vendor both the cmd and library portions from k/k?
@ncdc I looked into that as well. The cmd portion is the tricky part: it imports a lot of other things from k/k, so it would need a total refactor. I started down this road, but it quickly became a mess.
I think we should either use it as-is, or we should copy the code into cluster-api. I'm in favor of the former; then, when the kubectl code gets pulled into its own project, we can consume that instead.
@ncdc I confirm this is a bug. We're not actually waiting for the pod to be deleted it seems, and this is not what we want.
Fine with me to merge the OpenShift library for this version.
/hold I am reviewing the OpenShift drain code for comparison with kubernetes/kubernetes. I'll remove the hold once I'm done.
@michaelgugino do you know why this is ignoring the pod's namespace and switching to use the one defined in the drain options?
getPodFn := func(namespace, name string) (*corev1.Pod, error) {
	return client.CoreV1().Pods(options.Namespace).Get(name, metav1.GetOptions{})
}
@ncdc based on the git blame, it looks like it's a bug, but it's definitely the code we're using. I'll confirm.
The node-draining code itself is imported from github.com/openshift/kubernetes-drain.
This commit updates kubernetes-drain to remove the namespace bug. kubernetes-drain PR: openshift/kubernetes-drain#1
@ncdc updated the kubernetes-drain library. PTAL.
@michaelgugino: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@michaelgugino: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Now that kubernetes/kubernetes#80045 has merged, can you please update to using a temporary copy of that implementation? |
@detiber will do. I'll try to get to it this afternoon. |
@michaelgugino hi! Do you think you'll be able to get to this today or early next week? |
We are no longer working on v0.1.x, and #1096 will add this to v0.2.x. |
@ncdc: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What this PR does / why we need it:
Centralizes node-drain behavior into cluster-api instead of down-stream actuators.
Which issue(s) this PR fixes (will close the issue(s) when the PR is merged): Fixes #994
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: