Autoscaler interfering in meltdown scenario solution #741
Labels
area/disaster-recovery: Disaster recovery related
area/high-availability: High availability related
area/robustness: Robustness, reliability, resilience related
effort/2w: Effort for issue is around 2 weeks
kind/bug: Bug
kind/design
lifecycle/rotten: Nobody worked on this for 12 months (final aging stage)
needs/planning: Needs (more) planning with other MCM maintainers
priority/2: Priority (lower number equals higher priority)
How to categorize this issue?
/area performance
/kind bug
/priority 2
What happened:
Autoscaler's `fixNodeGroupSize` logic interferes with the meltdown logic, in which we remove only `maxReplacement` machines per machinedeployment. As a result, the other `Unknown` machines are removed as well.
What you expected to happen:
Autoscaler, even when it takes the `DecreaseTargetSize` decision, should not be able to remove `Unknown` machines, because the node object is actually present for them.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
This is happening because of the way machineSet prioritizes machines for deletion based on their status:
machine-controller-manager/pkg/controller/controller_utils.go
Lines 769 to 776 in d7e3c5d
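
The interaction can be illustrated with a minimal sketch. It is not the code from `controller_utils.go`; the `Phase` and `Machine` types, the `pickForDeletion` helper, and both priority maps are hypothetical names introduced only for illustration. It assumes that a lower priority value means a machine is deleted earlier, and shows how the order of `Unknown` and `Pending` decides which machines are removed when the target size is decreased.

```go
package main

import "sort"

// Phase is a simplified, hypothetical stand-in for a machine's status phase.
type Phase string

const (
	Running Phase = "Running"
	Pending Phase = "Pending"
	Unknown Phase = "Unknown"
	Failed  Phase = "Failed"
)

// Machine is a hypothetical, reduced view of a machine object.
type Machine struct {
	Name  string
	Phase Phase
}

// Two illustrative orderings (assumptions, not the real MCM values).
// A lower value means the machine is picked for deletion earlier.
var unknownFirst = map[Phase]int{Failed: 0, Unknown: 1, Pending: 2, Running: 3}
var pendingFirst = map[Phase]int{Failed: 0, Pending: 1, Unknown: 2, Running: 3}

// pickForDeletion returns the names of the n machines that would be deleted
// first under the given priority map. The input slice is not modified.
func pickForDeletion(machines []Machine, priority map[Phase]int, n int) []string {
	sorted := append([]Machine(nil), machines...)
	// Stable sort keeps the original order among machines in the same phase.
	sort.SliceStable(sorted, func(i, j int) bool {
		return priority[sorted[i].Phase] < priority[sorted[j].Phase]
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	names := make([]string, 0, n)
	for _, m := range sorted[:n] {
		names = append(names, m.Name)
	}
	return names
}
```

With machines `a` (Running), `b` (Unknown), `c` (Pending), and `d` (Unknown), a decrease of one replica removes `b` under `unknownFirst` but `c` under `pendingFirst`. The second ordering keeps the `Unknown` machines, whose node objects still exist, and removes the replacement that has not joined yet.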
We need to look into any other implications of prioritizing `Pending` machines over `Unknown` machines as a solution.
Environment:
Kubernetes version (use `kubectl version`):
CA version: 1.23.1