
Autoscaler interfering in meltdown scenario solution #741

Open
himanshu-kun opened this issue Aug 16, 2022 · 4 comments
Assignees
himanshu-kun
Labels
  • area/disaster-recovery (Disaster recovery related)
  • area/high-availability (High availability related)
  • area/robustness (Robustness, reliability, resilience related)
  • effort/2w (Effort for issue is around 2 weeks)
  • kind/bug
  • kind/design
  • lifecycle/rotten (Nobody worked on this for 12 months, final aging stage)
  • needs/planning (Needs (more) planning with other MCM maintainers)
  • priority/2 (Priority, lower number equals higher priority)

Comments

@himanshu-kun
Contributor

himanshu-kun commented Aug 16, 2022

How to categorize this issue?

/area performance
/kind bug
/priority 2

What happened:
Autoscaler's fixNodeGroupSize logic interferes with the meltdown-protection logic, where we replace only maxReplacement machines per machinedeployment at a time; the autoscaler ends up removing the other Unknown machines as well.

What you expected to happen:
Even when the autoscaler takes the DecreaseTargetSize decision, it should not be able to remove Unknown machines, because the node object is actually present for them.
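
A minimal, self-contained sketch of the expected guard (illustrative names only, not the real MCM cloud-provider code): DecreaseTargetSize should refuse to shrink the target below the number of machines that still have a node object, which would cover the Unknown machines here.

package main

import "fmt"

// nodeGroup is a hypothetical, minimal stand-in for an autoscaler node group;
// it is not the real MCM cloud-provider implementation.
type nodeGroup struct {
	targetSize      int
	registeredNodes int // machines that still have a Node object (Running or Unknown)
}

// DecreaseTargetSize sketches the expected behaviour: never shrink the target
// below the number of machines that are still backed by a node object.
func (ng *nodeGroup) DecreaseTargetSize(delta int) error {
	if delta >= 0 {
		return fmt.Errorf("delta must be negative, got %d", delta)
	}
	if ng.targetSize+delta < ng.registeredNodes {
		return fmt.Errorf("cannot shrink below %d registered nodes", ng.registeredNodes)
	}
	ng.targetSize += delta
	return nil
}

func main() {
	// Both machines still have node objects; one of them is Unknown due to the zone outage.
	ng := &nodeGroup{targetSize: 2, registeredNodes: 2}
	if err := ng.DecreaseTargetSize(-1); err != nil {
		fmt.Println("refused:", err)
	}
}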

How to reproduce it (as minimally and precisely as possible):

  • Create a machinedeployment with 2 replicas (it is assumed the autoscaler is enabled for the cluster)
  • Block all traffic to/from the zone the machinedeployment is for
  • With the default maxReplacement, 1 node will stay in Pending state
  • After around 20 min, the Unknown machine is deleted when the autoscaler fixes the node group size by reducing the machinedeployment replicas to 1

Anything else we need to know?:
This is happening because of the way the machineSet controller prioritizes machines for deletion based on their phase:

m := map[v1alpha1.MachinePhase]int{
	v1alpha1.MachineTerminating:      0, // lower value means the machine is deleted first
	v1alpha1.MachineFailed:           1,
	v1alpha1.MachineCrashLoopBackOff: 2,
	v1alpha1.MachineUnknown:          3,
	v1alpha1.MachinePending:          4,
	v1alpha1.MachineAvailable:        5,
	v1alpha1.MachineRunning:          6,
}

*For a solution, we need to look into any other implications of prioritizing Pending machines over Unknown machines for deletion. A sketch of how the current ordering plays out is shown below.
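
A minimal, self-contained sketch (illustrative stand-in types, not the actual machineSet controller code) of how this phase-priority map translates into the deletion order observed above:

package main

import (
	"fmt"
	"sort"
)

// MachinePhase and the priority map below are illustrative stand-ins for the
// v1alpha1 types referenced in the snippet above, not the real imports.
type MachinePhase string

const (
	MachineUnknown MachinePhase = "Unknown"
	MachinePending MachinePhase = "Pending"
	MachineRunning MachinePhase = "Running"
)

// Lower value means the machine is picked for deletion first.
var phasePriority = map[MachinePhase]int{
	MachineUnknown: 3,
	MachinePending: 4,
	MachineRunning: 6,
}

type machine struct {
	name  string
	phase MachinePhase
}

// pickForDeletion orders machines by phase priority and returns the first candidate.
func pickForDeletion(machines []machine) machine {
	sort.SliceStable(machines, func(i, j int) bool {
		return phasePriority[machines[i].phase] < phasePriority[machines[j].phase]
	})
	return machines[0]
}

func main() {
	machines := []machine{
		{name: "pending-machine", phase: MachinePending}, // the one fixNodeGrpSize meant to remove
		{name: "unknown-machine", phase: MachineUnknown}, // node object exists, zone unreachable
	}
	// With the current ordering (Unknown: 3 < Pending: 4) the Unknown machine is chosen.
	fmt.Println("deleted first:", pickForDeletion(machines).name)
}

Swapping the Unknown and Pending values in the map would make the Pending machine the first candidate instead.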

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
    CA version 1.23.1
@gardener-robot gardener-robot added area/performance Performance (across all domains, such as control plane, networking, storage, etc.) related priority/2 Priority (lower number equals higher priority) labels Aug 16, 2022
@himanshu-kun
Contributor Author

cc @unmarshall

@himanshu-kun
Contributor Author

fixNodeGrpSize only understands Registered and Non-Registered nodes; it doesn't do anything even if a node joins and stays NotReady for a long time.
Here the intention of fixNodeGrpSize was to remove the Pending machine, but because of our preference in the machineSet controller it removes the Unknown machine.
fixNodeGrpSize currently only acts when the RemoveLongUnregistered logic is not able to remove the long-unregistered nodes because the node group is already at the minimum.
Also, if we are not at the minimum node group size, the RemoveLongUnregistered logic wouldn't delete the Unknown machine, because it uses the priority annotation to pinpoint the machine it wants to delete, and as per the machineSet preference the priority annotation is preferred over the machine phase.
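
A minimal sketch of that preference order (the annotation key and all names are illustrative, not the exact MCM identifiers): a machine carrying the deletion-priority annotation is ranked ahead of any machine selected purely by phase.

package main

import (
	"fmt"
	"sort"
)

// Illustrative annotation key; the real MCM priority annotation key may differ.
const priorityAnnotation = "machinepriority.machine.sapcloud.io"

// phasePriority mirrors the map from the issue description (lower value deletes first).
var phasePriority = map[string]int{"Unknown": 3, "Pending": 4, "Running": 6}

type machine struct {
	name        string
	phase       string
	annotations map[string]string
}

// deletionRank sketches the preference described above: a machine explicitly
// annotated for prioritized deletion is ranked ahead of any phase-based candidate.
func deletionRank(m machine) int {
	if m.annotations[priorityAnnotation] == "1" {
		return 0 // annotation wins over phase
	}
	return phasePriority[m.phase]
}

func main() {
	machines := []machine{
		{name: "unknown-machine", phase: "Unknown"},
		{name: "pending-machine", phase: "Pending",
			annotations: map[string]string{priorityAnnotation: "1"}}, // pinpointed by RemoveLongUnregistered
	}
	sort.SliceStable(machines, func(i, j int) bool {
		return deletionRank(machines[i]) < deletionRank(machines[j])
	})
	// pending-machine is deleted first, even though Unknown has the lower phase priority.
	fmt.Println("deleted first:", machines[0].name)
}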

@himanshu-kun
Contributor Author

Prioritizing Pending machine removal over Unknown machines would make more sense because:

@himanshu-kun
Contributor Author

Also, if we are not at the minimum node group size, the RemoveLongUnregistered logic wouldn't delete the Unknown machine, because it uses the priority annotation to pinpoint the machine it wants to delete, and as per the machineSet preference the priority annotation is preferred over the machine phase.

But the RemoveLongUnregistered/RemoveOldUnregistered logic will remove the Pending machine once the autoscaler's maxNodeProvisionTimeout runs out, treating it as long unregistered. This again kicks off a loop where the machinedeployment size is reduced, the meltdown logic again turns maxReplacement machines into Pending, and it finally stops when the node group min size is reached, as at that point the RemoveLongUnregistered logic stops.

The ideal solution is to make the autoscaler aware that the meltdown logic is in play because of an outage in the zone, so that it doesn't interfere.
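
One possible shape of that awareness (purely hypothetical; no such field or API exists in CA or MCM today) is a per-node-group signal that meltdown handling is active, which the size-fixing path checks before shrinking the group:

package main

import "fmt"

// nodeGroupInfo is a hypothetical summary of a node group as seen by the
// autoscaler's size-fixing path; none of these fields exist in CA today.
type nodeGroupInfo struct {
	name              string
	targetSize        int
	registeredNodes   int
	meltdownProtected bool // hypothetical signal that MCM's meltdown logic is handling this group
}

// fixSize sketches the desired behaviour: skip groups where the meltdown
// logic is in play instead of shrinking them to match registered nodes.
func fixSize(ng nodeGroupInfo) (newTarget int, acted bool) {
	if ng.meltdownProtected {
		return ng.targetSize, false // leave the group to MCM's maxReplacement handling
	}
	if ng.registeredNodes < ng.targetSize {
		return ng.registeredNodes, true
	}
	return ng.targetSize, false
}

func main() {
	ng := nodeGroupInfo{name: "zone-a", targetSize: 2, registeredNodes: 1, meltdownProtected: true}
	target, acted := fixSize(ng)
	fmt.Printf("group %s: target=%d acted=%v\n", ng.name, target, acted)
}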

@himanshu-kun himanshu-kun added area/disaster-recovery Disaster recovery related area/robustness Robustness, reliability, resilience related area/high-availability High availability related kind/design needs/planning Needs (more) planning with other MCM maintainers effort/2w Effort for issue is around 2 weeks and removed area/performance Performance (across all domains, such as control plane, networking, storage, etc.) related labels Feb 17, 2023
@himanshu-kun himanshu-kun self-assigned this Mar 31, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Dec 13, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Aug 21, 2024