
Add support for AWS EKS auto repair feature #7591

Open
Atom-oh opened this issue Jan 14, 2025 · 1 comment
Labels
feature New feature or request

Comments

Atom-oh commented Jan 14, 2025

Description

What problem are you trying to solve?

The EKS Auto Repair feature has been released, but it is only supported for EKS node groups, not for Karpenter. With auto repair, when a node runs into problems its status changes to NotReady. Please make it possible to terminate such nodes after a specified amount of time.

How important is this feature to you?
Sometimes, when a node's disk encounters issues or the kubelet malfunctions and the problem does not resolve on its own, pod operations can be severely disrupted. This situation is very critical.

In EKS node groups, this issue is addressed through the Auto Scaling Group mechanism, which terminates the problematic node and replaces it with a new one. However, this functionality is not currently supported in Karpenter NodePools, which makes it a critical gap.
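To make the ASG mechanism concrete: when an instance is marked Unhealthy, its Auto Scaling Group terminates it and launches a replacement. That same trigger can be reproduced manually; here is a minimal sketch using the AWS SDK for Go v2 (the instance ID is a placeholder):

```go
// Sketch of the ASG repair path used by EKS node groups: marking an
// instance Unhealthy causes its Auto Scaling Group to terminate it and
// launch a replacement.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/autoscaling"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("load AWS config: %v", err)
	}
	client := autoscaling.NewFromConfig(cfg)

	// Mark the instance Unhealthy; the ASG health check then replaces it.
	// "i-0123456789abcdef0" is a placeholder instance ID.
	_, err = client.SetInstanceHealth(ctx, &autoscaling.SetInstanceHealthInput{
		InstanceId:   aws.String("i-0123456789abcdef0"),
		HealthStatus: aws.String("Unhealthy"),
	})
	if err != nil {
		log.Fatalf("set instance health: %v", err)
	}
	log.Println("instance marked Unhealthy; the ASG will replace it")
}
```

Managed node groups get this replacement behavior automatically; the point of this issue is that Karpenter-provisioned nodes have no equivalent.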

To elaborate on this:

EKS Auto Repair: This feature is designed to maintain the health of the cluster by automatically addressing node issues.

Node Group vs. Karpenter: While EKS node groups have this auto-repair capability through Auto Scaling Groups (ASGs), Karpenter, which is an alternative node provisioning solution, currently lacks this feature.

Critical Impact: The absence of this feature in Karpenter can lead to prolonged downtime or degraded performance if a node becomes unhealthy, as there's no automatic mechanism to replace the problematic node.

Desired Solution: Implementing a similar auto-repair or node replacement mechanism for Karpenter NodePools would be beneficial. This could involve detecting nodes in a NotReady state and terminating and replacing them after a specified time, similar to how it works with EKS node groups (a rough sketch of such a loop follows this list).

Importance: This feature is crucial for maintaining the reliability and performance of Kubernetes clusters, especially in production environments where downtime can have significant impacts.
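To make the desired behavior concrete, here is a rough sketch of the detect-and-terminate loop described above, using client-go. This is not Karpenter's implementation; the 30-minute threshold and the default kubeconfig location are assumptions:

```go
// Sketch: list nodes whose Ready condition has been false/unknown longer
// than a threshold and delete them, letting the provisioner replace the
// capacity. Not Karpenter's actual repair logic.
package main

import (
	"context"
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

const unhealthyThreshold = 30 * time.Minute // assumed repair delay

func main() {
	// Assumes a kubeconfig at the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatalf("build kubeconfig: %v", err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("create clientset: %v", err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatalf("list nodes: %v", err)
	}

	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type != corev1.NodeReady {
				continue
			}
			// NotReady (or Unknown) for longer than the threshold?
			if cond.Status != corev1.ConditionTrue &&
				time.Since(cond.LastTransitionTime.Time) > unhealthyThreshold {
				log.Printf("deleting unhealthy node %s (NotReady since %s)",
					node.Name, cond.LastTransitionTime)
				if err := clientset.CoreV1().Nodes().Delete(
					context.TODO(), node.Name, metav1.DeleteOptions{}); err != nil {
					log.Printf("delete node %s: %v", node.Name, err)
				}
			}
		}
	}
}
```

For Karpenter-managed nodes, deleting the Node object should trigger Karpenter's termination finalizer, so the underlying instance is drained and terminated and replacement capacity is provisioned for the displaced pods.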


  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@Atom-oh Atom-oh added feature New feature or request needs-triage Issues that need to be triaged labels Jan 14, 2025
@jigisha620 (Contributor) commented:

I think what you are looking for is Node Repair in Karpenter and this has been implemented as part of this PR - kubernetes-sigs/karpenter#1793. It should be included in our next release.

@jigisha620 jigisha620 removed the needs-triage Issues that need to be triaged label Jan 16, 2025