
MachineDeployment rolls out Unhealthy Machines with Ready Nodes #11023

Closed
sm4ll-3gg opened this issue Aug 6, 2024 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@sm4ll-3gg

sm4ll-3gg commented Aug 6, 2024

What steps did you take and what happened?

We have Node Problem Detector deployed, which reports additional conditions for workload cluster Nodes. We want to avoid scaling down the old MachineSet until the new Nodes are completely available (including the additional conditions), so we tried to use MachineHealthCheck to mark Machines as not ready when their Node doesn't satisfy the required conditions.
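For reference, a minimal sketch of the MachineHealthCheck we used; the cluster, deployment, and NPD condition names are hypothetical:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: workload-npd-mhc          # hypothetical name
  namespace: default
spec:
  clusterName: workload           # hypothetical cluster name
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: workload-md-0  # hypothetical MD name
  unhealthyConditions:
    # Standard kubelet readiness signal
    - type: Ready
      status: "False"
      timeout: 300s
    # Additional condition reported by Node Problem Detector (type is hypothetical)
    - type: NPDProblemDetected
      status: "True"
      timeout: 60s
```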

Here is how it plays out (a sketch of the resulting conditions follows the list):

  • MD scales a new MS up
  • A Node starts broken (but Ready = True)
  • Machine becomes Ready = False because of HealthCheckSucceeded = False
  • The MS shows that all Machines are Ready
  • MD scales old MS down
  • All new Machines are broken, all old healthy Machines are deleted
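A sketch of the condition states we observed at that point (names illustrative):

```yaml
# Node in the workload cluster: kubelet still reports Ready
kind: Node
status:
  conditions:
    - type: Ready
      status: "True"
---
# Corresponding Machine: marked unhealthy by the MHC, yet still
# counted in MachineSet.status.availableReplicas
kind: Machine
status:
  conditions:
    - type: Ready
      status: "False"
    - type: HealthCheckSucceeded
      status: "False"
```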

What did you expect to happen?

Given that a MachineSet orchestrates Machines, and a Machine has its own status (which is broader than the related Node status), I'd expect MachineSet.AvailableReplicas to equal the count of underlying Ready Machines (instead of Nodes).

Cluster API version

latest (for now)

Kubernetes version

No response

Anything else you would like to add?

The original decision to count Ready Nodes for MachineSet.AvailableReplicas was made here: #47. Is it possible to reassess it?

I'd suggest one of the following solutions:

  • Replace the Node Ready condition with both the Ready and NodeHealthy conditions of the Machine (to preserve backward compatibility)
  • Change both the Machine and MachineSet controllers
    • M: Add the NodeHealthy condition to the Machine summary calculation (to propagate readiness from the Node to the Machine)
    • MS: Replace the Node Ready condition with a single Machine Ready one

Code references:

By the way, the MachinePool controller behaves the same way.

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. labels Aug 6, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 6, 2024
@sm4ll-3gg
Author

@sbueringer hi! Could you please take a look when you have a bit of time? 🙏

P.S. Sorry if the mention is incorrect; I chose you because you seemed to be the one doing triage for most of the PRs.

@chrischdi
Member

The original decision to count Ready Nodes for MachineSet.AvailableReplicas was made here: #47. Is it possible to reassess it?

This is already being reassessed in this proposal:

Via the proposal, a new ExperimentalAvailableReplicas field gets introduced in v1beta1 (it will later become AvailableReplicas in v1beta2, and the old field will be removed).

AvailableReplicas will also determine a Machine's availability via the Machine's Available condition, instead of computing availability as it does today (based on the Node Ready condition).

The proposal also changes how the Ready condition is calculated.

Question: Could it help to set MachineDeployment.spec.minReadySeconds to a value that allows the health check to add the condition before the Machine is marked as ready?
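A sketch of what that suggestion would look like; the names, the 120s value, and the bootstrap/infrastructure references are all illustrative:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-md-0            # hypothetical name
spec:
  clusterName: workload          # hypothetical cluster name
  # Only count a Machine as available once its Node has been Ready for this
  # long, leaving the MHC a window to evaluate the NPD conditions first
  # (the value is illustrative).
  minReadySeconds: 120
  selector:
    matchLabels: {}              # defaulted by CAPI; shown empty for brevity
  template:
    spec:
      clusterName: workload
      version: v1.30.2           # illustrative
      bootstrap:
        configRef:               # hypothetical bootstrap reference
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: workload-md-0
      infrastructureRef:         # hypothetical infrastructure provider
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: workload-md-0
```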

@sm4ll-3gg
Author

Oh, that's exactly what we need. Thank you!
Looking forward to these changes being released :)

Could it help that setting a proper MachineDeployment.spec.minReadySeconds to a value which would allow the healthcheck to add the condition before it is marked as ready?

Unfortunately, no, not until the proposal is accepted and implemented. That's because MHC makes a Machine unready, but the MS doesn't take that into account when calculating AvailableReplicas.

@enxebre
Member

enxebre commented Aug 26, 2024

We have Node Problem Detector deployed, which reports additional conditions for workload cluster Nodes. We want to avoid scaling down the old MachineSet until the new Nodes are completely available (including the additional conditions)

fwiw, one thought for a potential way to delay scale down until MachineReadinessGate and the proposal Christian mentions get implemented: you could run your own controller with a blocking PDB for your pool of Nodes that only increases the disruption budget once your desired condition is set.
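To illustrate the idea, a minimal sketch assuming hypothetical guard pods pinned to the pool's Nodes; a fully blocking PDB makes drains (and hence old-MachineSet scale-down) fail until your controller relaxes it:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pool-drain-guard         # hypothetical name
spec:
  # Fully blocking by default: Nodes running the guard pods cannot be
  # drained, so the old MachineSet cannot be scaled down. A custom
  # controller would raise maxUnavailable once the desired NPD condition
  # is set on the new Nodes (sketch only, not an existing CAPI feature).
  maxUnavailable: 0
  selector:
    matchLabels:
      app: pool-drain-guard      # hypothetical label on guard pods
```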

Also, could you elaborate on the use case? You might also be able to leverage taints and tolerations on your workloads to delay the scale down until the running workload can be reallocated.

@sm4ll-3gg
Author

Also could you elaborate on the use case? You might also be able to leverage taints and tolerations on your workloads to achieve delaying the scale down until running workload can be re allocated.

I guess we are talking about different things. Let me clarify our intent :)

TL;DR: We want MachineDeployment to pause the rollout of unhealthy Machines based on more than just Node readiness.

Let's imagine we have a bug in the CCM installed in our cluster. It creates improperly configured Nodes, and NPD (for example) detects this, but the kubelet decides the Node is Ready. With the current implementation, a rollout of the MachineDeployment replaces all old healthy Machines with new unhealthy ones (even though all the new Machines could be Ready = false).

If I understand you correctly, PDBs and taints/tolerations can't help us because they only affect the workload lifecycle. We won't have pods running on broken Nodes, but cluster capacity will still be undesirably reduced. Please let me know if you meant something else!

@chrischdi
Member

cc @fabriziopandini because this is related to recent discussions about "Availability".

@fabriziopandini
Member

IMO, using MHC is definitely not the way to solve this issue, because MHC will trigger machine deletion.
Also, as of today, Machine.Ready is not considered during rollout.

Also, based on my understanding of this issue, the proposal linked above will make it possible to solve it by introducing two changes:

  • The Machine.Available condition (and thus the Machine.Ready condition) will be considered during rollout
  • The Machine.Ready condition will be extensible via ReadinessGates

This will give you the option to add additional conditions into the mix, surfacing at the Machine level whatever is relevant for you; in this case, a condition added by the node problem detector on the corresponding Node. Please note that this will require writing your own controller.
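If the proposal lands as described, a sketch of how that could look in the MachineDeployment's Machine template; the field names follow the proposal and may change, and the condition type (plus the controller that sets it) would be yours:

```yaml
spec:
  template:
    spec:
      # Proposed ReadinessGates: Machine.Ready only becomes True once this
      # condition, set on the Machine by your own controller mirroring the
      # NPD condition from the Node, is True. The name is hypothetical.
      readinessGates:
        - conditionType: NodeProblemDetectorHealthy
```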

If that's OK, I would suggest closing this issue, since this is already being tracked elsewhere.

@sm4ll-3gg
Author

IMO, using MHC is definitely not the way to solve this issue, because MHC will trigger machine deletion.

Actually, MHC supports skipping remediation, so I thought it would be a nice way to reflect Node conditions on the Machine.
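For context, skipping remediation is done with an annotation on the Machine, so the MHC still sets HealthCheckSucceeded = False without deleting the Machine; a sketch (the Machine name is hypothetical):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: workload-md-0-abc123     # hypothetical Machine name
  annotations:
    # The MHC marks this Machine unhealthy but never remediates (deletes) it.
    cluster.x-k8s.io/skip-remediation: ""
```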


Overall, it seems that the proposal does solve the issue. If you think it's better to close it, I'm OK with that. I left it open just to highlight that we hit this problem, so you can take it into account during the discussion of the proposal! :)

@fabriziopandini
Member

/close

As per comment above, this should be addressed by #10897

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close

As per comment above, this should be addressed by #10897

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
