Scale down shouldn't depend on VM index #3644
Conversation
/lgtm, testing scale against this PR, thanks @shanalily!
lgtm, @JackQuincy for review
Scale tests succeeded
This should work. My only concern is that it does a string sort, not an int sort. So if you have VMs -1 through -23, it will start deleting at -9 rather than -23, because "-9" sorts after "-23" as a string. That's more of a cleanliness thing, but it will confuse devs (we got a support case about something similar in the service recently). Ideally we'd switch to deleting the least utilized nodes instead of deleting off the end of the list, but that would take some work. Those are my thoughts.
@JackQuincy thanks for the feedback, agree that an int sort is preferable to reduce the number of confused humans. We have a long-term intention of doing just what you say, which is to define our own criteria for sorting nodes by "most appropriate for cordon/drain", and then enforce it for both scale and upgrade.
@JackQuincy That makes sense, I'll fix it to delete the highest index first (if I don't get to it today, then this weekend).
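A minimal Go sketch of the integer sort being discussed, assuming VM names end in a numeric index; the helper and names here are illustrative, not the acs-engine implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// indexOfVM extracts the trailing numeric index from a VM name such as
// "k8s-agentpool1-00000000-12". Returns -1 if no index is present.
// (Illustrative helper, not the acs-engine code.)
func indexOfVM(name string) int {
	parts := strings.Split(name, "-")
	idx, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return -1
	}
	return idx
}

func main() {
	vms := []string{
		"k8s-agentpool1-00000000-9",
		"k8s-agentpool1-00000000-23",
		"k8s-agentpool1-00000000-1",
	}

	// A plain string sort would place "-9" after "-23", so deleting off the
	// end of the list would start at index 9 instead of 23. Sorting by the
	// parsed integer index keeps the highest index last.
	sort.Slice(vms, func(i, j int) bool {
		return indexOfVM(vms[i]) < indexOfVM(vms[j])
	})

	fmt.Println(vms)
	// [k8s-agentpool1-00000000-1 k8s-agentpool1-00000000-9 k8s-agentpool1-00000000-23]
}
```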
Codecov Report
@@            Coverage Diff             @@
##           master    #3644      +/-   ##
==========================================
- Coverage   55.77%    55.5%     -0.28%
==========================================
  Files         107      107
  Lines       16238    16194        -44
==========================================
- Hits         9057     8988        -69
- Misses       6408     6432        +24
- Partials      773      774         +1
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jackfrancis, shanalily.
What this PR does / why we need it:
When upgrading a cluster, there can be gaps in the indices in the VM names (e.g. there are 3 VMs named k8s-agentpool1-00000000-0, k8s-agentpool1-00000000-1, and k8s-agentpool1-00000000-3, but none ending in 2). This is a problem when scaling down a cluster with the acs-engine scale command, because the index in the name is used for iterating over the VMs, so there may be an attempt to drain a node that doesn't exist and the scale operation will fail. The index isn't needed for iterating over the VMs, so they can be stored in a slice instead. See the sketch below.

Which issue this PR fixes: fixes #2362
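As a rough illustration of the approach described above, here is a hedged sketch of selecting VMs to remove by slicing the actual VM list rather than reconstructing names from indices 0..N-1; the type and function names are made up for illustration, and the real change works against the Azure SDK's VM types:

```go
package main

import "fmt"

// vm is an illustrative stand-in for the compute VM type returned by the
// Azure SDK.
type vm struct {
	Name string
}

// vmsToRemove picks the VMs to cordon, drain, and delete when scaling the
// pool down to targetCount. Because it slices the actual VM list rather
// than rebuilding names from indices 0..N-1, a gap in the index (e.g. a
// missing "-2") no longer causes an attempt to drain a nonexistent node.
func vmsToRemove(pool []vm, targetCount int) []vm {
	if targetCount >= len(pool) {
		return nil
	}
	return pool[targetCount:]
}

func main() {
	pool := []vm{
		{Name: "k8s-agentpool1-00000000-0"},
		{Name: "k8s-agentpool1-00000000-1"},
		{Name: "k8s-agentpool1-00000000-3"}, // index 2 is missing after an upgrade
	}
	fmt.Println(vmsToRemove(pool, 2)) // [{k8s-agentpool1-00000000-3}]
}
```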
Special notes for your reviewer:
I want to know that I'm not breaking anything, though I've been using this change for a while without problems. This seems like a simple fix so I feel like I'm missing something.
If applicable:
Release note: