
MachineDeployment rolls out Unhealthy Machines with Ready Nodes #11023

Closed
sm4ll-3gg opened this issue Aug 6, 2024 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@sm4ll-3gg

sm4ll-3gg commented Aug 6, 2024

What steps did you take and what happened?

We have Node Problem Detector deployed, which reports additional conditions for workload cluster Nodes. We want to avoid scaling down the old MachineSet until the new Nodes are completely available (including the additional conditions), so we tried to use MachineHealthCheck to mark Machines as not ready when their Node doesn't satisfy the required conditions.
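For reference, a minimal sketch of the MachineHealthCheck we used; the cluster, deployment, and NPD condition names are hypothetical:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: workload-npd-mhc          # hypothetical name
  namespace: default
spec:
  clusterName: workload           # hypothetical cluster name
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: workload-md-0  # hypothetical MD name
  unhealthyConditions:
    # Standard kubelet readiness signal
    - type: Ready
      status: "False"
      timeout: 300s
    # Additional condition reported by Node Problem Detector (type is hypothetical)
    - type: NPDProblemDetected
      status: "True"
      timeout: 60s
```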

Here is how it plays out (a sketch of the resulting conditions follows the list):

  • MD scales a new MS up
  • A Node starts broken (but Ready = True)
  • Machine becomes Ready = False because of HealthCheckSucceeded = False
  • The MS shows that all Machines are Ready
  • MD scales old MS down
  • All new Machines are broken, all old healthy Machines are deleted
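A sketch of the condition states we observed at that point (names illustrative):

```yaml
# Node in the workload cluster: kubelet still reports Ready
kind: Node
status:
  conditions:
    - type: Ready
      status: "True"
---
# Corresponding Machine: marked unhealthy by the MHC, yet still
# counted in MachineSet.status.availableReplicas
kind: Machine
status:
  conditions:
    - type: Ready
      status: "False"
    - type: HealthCheckSucceeded
      status: "False"
```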

What did you expect to happen?

Given that a MachineSet orchestrates Machines, and a Machine has its own status (which is broader than the related Node status), I'd expect MachineSet.AvailableReplicas to equal the count of underlying Ready Machines (instead of Nodes).

Cluster API version

latest (for now)

Kubernetes version

No response

Anything else you would like to add?

The original decision to count Ready Nodes for MachineSet.AvailableReplicas was made here: #47. Is it possible to reassess it?

I'd suggest one of the following solutions:

  • Replace the Node Ready condition with both the Ready and NodeHealthy conditions of the Machine (to preserve backward compatibility)
  • Change both the Machine and MachineSet controllers
    • M: Add the NodeHealthy condition to the Machine summary calculation (to propagate readiness from the Node to the Machine)
    • MS: Replace the Node Ready condition with a single Machine Ready one

Code references:

By the way, the MachinePool controller behaves the same way.

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. labels Aug 6, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 6, 2024
@sm4ll-3gg
Author

@sbueringer hi! Could you please take a look when you have a bit of time? 🙏

P.S. Sorry if the mention is incorrect; I chose you because you seemed to be the one doing triage for most of the PRs.

@chrischdi
Member

The original decision to count Ready Nodes for MachineSet.AvailableReplicas was made here: #47. Is it possible to reassess it?

This is already being reassessed in this proposal:

Via the proposal, a new ExperimentalAvailableReplicas field gets introduced in v1beta1 (it will later become AvailableReplicas in v1beta2, and the old field will be removed).

AvailableReplicas will also determine a Machine's availability via the Machine's Available condition, instead of computing availability as it does today (based on the Node Ready condition).

The proposal also changes how the Ready condition is calculated.

Question: Could it help to set MachineDeployment.spec.minReadySeconds to a value that allows the health check to add the condition before the Machine is marked as ready?
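A sketch of what that suggestion would look like; the names, the 120s value, and the bootstrap/infrastructure references are all illustrative:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-md-0            # hypothetical name
spec:
  clusterName: workload          # hypothetical cluster name
  # Only count a Machine as available once its Node has been Ready for this
  # long, leaving the MHC a window to evaluate the NPD conditions first
  # (the value is illustrative).
  minReadySeconds: 120
  selector:
    matchLabels: {}              # defaulted by CAPI; shown empty for brevity
  template:
    spec:
      clusterName: workload
      version: v1.30.2           # illustrative
      bootstrap:
        configRef:               # hypothetical bootstrap reference
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: workload-md-0
      infrastructureRef:         # hypothetical infrastructure provider
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: workload-md-0
```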

@sm4ll-3gg
Author

Oh, that's exactly what we need. Thank you!
Looking forward to these changes being released :)

Could it help that setting a proper MachineDeployment.spec.minReadySeconds to a value which would allow the healthcheck to add the condition before it is marked as ready?

Unfortunately, no, not until the proposal is accepted and implemented. That's because MHC makes a Machine unready, but the MS doesn't take that into account when calculating AvailableReplicas.

@enxebre
Member

enxebre commented Aug 26, 2024

We have Node Problem Detector deployed, which reports additional conditions for workload cluster Nodes. We want to avoid scaling down the old MachineSet until the new Nodes are completely available (including the additional conditions)

fwiw, one thought for a potential way to delay scale down until MachineReadinessGate and the proposal Christian mentions get implemented: you could run your own controller with a blocking PDB for your pool of Nodes that only increases the disruption budget once your desired condition is set.
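To illustrate the idea, a minimal sketch assuming hypothetical guard pods pinned to the pool's Nodes; a fully blocking PDB makes drains (and hence old-MachineSet scale-down) fail until your controller relaxes it:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pool-drain-guard         # hypothetical name
spec:
  # Fully blocking by default: Nodes running the guard pods cannot be
  # drained, so the old MachineSet cannot be scaled down. A custom
  # controller would raise maxUnavailable once the desired NPD condition
  # is set on the new Nodes (sketch only, not an existing CAPI feature).
  maxUnavailable: 0
  selector:
    matchLabels:
      app: pool-drain-guard      # hypothetical label on guard pods
```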

Also, could you elaborate on the use case? You might also be able to leverage taints and tolerations on your workloads to delay the scale down until the running workload can be reallocated.

@sm4ll-3gg
Author

Also could you elaborate on the use case? You might also be able to leverage taints and tolerations on your workloads to achieve delaying the scale down until running workload can be re allocated.

I guess we are talking about different things. Let me clarify our intent :)

TL;DR: We want MachineDeployment to pause the rollout of unhealthy Machines based on more than just Node readiness.

Let's imagine we have a bug in the CCM installed in our cluster. It creates improperly configured Nodes, and NPD (for example) detects this, but the kubelet decides the Node is Ready. With the current implementation, a rollout of the MachineDeployment replaces all old healthy Machines with new unhealthy ones (even though all the new Machines could be Ready = false).

If I understand you correctly, PDBs and taints/tolerations can't help us because they only affect the workload lifecycle. We won't have pods running on broken Nodes, but cluster capacity will still be undesirably reduced. Please let me know if you meant something else!

@chrischdi
Member

cc @fabriziopandini because this is related to recent discussions about "Availability".

@fabriziopandini
Member

IMO, using MHC is definitely not the way to solve this issue, because MHC will trigger machine deletion.
Also, as of today, Machine.Ready is not considered during rollout.

Also, based on my understanding of this issue, the proposal linked above will make it possible to solve it by introducing two changes:

  • The Machine.Available condition (and thus the Machine.Ready condition) will be considered during rollout
  • The Machine.Ready condition will be extensible via ReadinessGates

This will give you the option to add additional conditions into the mix, surfacing at the Machine level whatever is relevant for you; in this case, a condition added by the node problem detector on the corresponding Node. Please note that this will require writing your own controller.
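If the proposal lands as described, a sketch of how that could look in the MachineDeployment's Machine template; the field names follow the proposal and may change, and the condition type (plus the controller that sets it) would be yours:

```yaml
spec:
  template:
    spec:
      # Proposed ReadinessGates: Machine.Ready only becomes True once this
      # condition, set on the Machine by your own controller mirroring the
      # NPD condition from the Node, is True. The name is hypothetical.
      readinessGates:
        - conditionType: NodeProblemDetectorHealthy
```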

If that's OK, I would suggest closing this issue, since this is already being tracked elsewhere.

@sm4ll-3gg
Author

IMO, using MHC is definitely not the way to solve this issue, because MHC will trigger machine deletion.

Actually, MHC supports skipping remediation, so I thought it would be a nice way to reflect Node conditions on the Machine.
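For context, skipping remediation is done with an annotation on the Machine, so the MHC still sets HealthCheckSucceeded = False without deleting the Machine; a sketch (the Machine name is hypothetical):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: workload-md-0-abc123     # hypothetical Machine name
  annotations:
    # The MHC marks this Machine unhealthy but never remediates (deletes) it.
    cluster.x-k8s.io/skip-remediation: ""
```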


Overall, it seems that the proposal does solve the issue. If you think it's better to close it, I'm OK with that. I left it open just to highlight that we hit this problem, so you can take it into account during the discussion of the proposal! :)

@fabriziopandini
Member

/close

As per comment above, this should be addressed by #10897

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close

As per comment above, this should be addressed by #10897

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
