This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Scale down shouldn't depend on VM index #3644

Merged
merged 6 commits on Aug 16, 2018

Conversation

shanalily
Member

What this PR does / why we need it:
When upgrading a cluster, there can be gaps in the indices in the VM names (e.g. there are 3 VMs named k8s-agentpool1-00000000-0, k8s-agentpool1-00000000-1, and k8s-agentpool1-00000000-3, but none ending in 2). This is a problem when scaling down a cluster with the acs-engine scale command, because the index in the name is used to iterate over the VMs, so the tool may try to drain a node at a missing index (an empty name) and the scale operation will fail. The index isn't needed for iterating over the VMs, so they can be stored in a slice instead.
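
A minimal sketch of the idea, not the PR's actual code: `listAgentVMs` is a hypothetical stand-in for the Azure call that lists the agent pool's VMs. Iterating over a slice of the VMs that actually exist means no name is ever synthesized from a loop counter, so a missing index like `...-2` is harmless.

```go
package main

import (
	"fmt"
	"strings"
)

// listAgentVMs stands in for a call that lists the VMs in the agent pool's
// resource group; the names mirror the gap described above.
func listAgentVMs() []string {
	return []string{
		"k8s-agentpool1-00000000-0",
		"k8s-agentpool1-00000000-1",
		"k8s-agentpool1-00000000-3", // index 2 is missing after an upgrade
	}
}

func main() {
	vms := listAgentVMs()

	// Iterate over the slice of existing VMs directly; gaps in the
	// index suffix never produce an empty or bogus node name.
	for _, name := range vms {
		if strings.HasPrefix(name, "k8s-agentpool1-") {
			fmt.Println("candidate for drain/delete:", name)
		}
	}
}
```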

Which issue this PR fixes: fixes #2362

Special notes for your reviewer:
I want to know that I'm not breaking anything, though I've been using this change for a while without problems. This seems like a simple fix so I feel like I'm missing something.

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (i.e. deploy with previous version, upgrade with this branch)

Release note:

@ghost ghost assigned shanalily Aug 9, 2018
@ghost ghost added the in progress label Aug 9, 2018
@acs-bot acs-bot added the size/S label Aug 9, 2018
@shanalily shanalily changed the title Vm index Scale down shouldn't depend on VM index Aug 9, 2018
@jackfrancis
Member

/lgtm, testing scale against this PR, thanks @shanalily!

Contributor

@CecileRobertMichon left a comment


lgtm, @JackQuincy for review

@jackfrancis
Member

Scale tests succeeded

@JackQuincy
Contributor

This should work. My only concern is that it does a string sort, not an int sort, so if you have VMs -1...-23 it will start deleting at -9 rather than -23. That's more of a cleanliness thing, but it will confuse devs (we got a support case about something similar in the service recently). Ideally we'd switch to deleting the least-utilized nodes instead of deleting off the end, but that would take some work. Those are my thoughts.
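
A minimal sketch of the numeric-suffix sort being suggested, assuming names of the form `k8s-agentpool1-00000000-N` (the `vmIndex` helper is hypothetical, not acs-engine code). Sorting by the parsed index puts `-23` after `-9`, where a plain string sort would not, so scale-down can delete from the highest index.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// vmIndex extracts the trailing index from a VM name such as
// "k8s-agentpool1-00000000-23"; it returns -1 if the suffix isn't numeric.
func vmIndex(name string) int {
	parts := strings.Split(name, "-")
	i, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return -1
	}
	return i
}

func main() {
	vms := []string{
		"k8s-agentpool1-00000000-9",
		"k8s-agentpool1-00000000-1",
		"k8s-agentpool1-00000000-23",
	}

	// Sort numerically by index; when scaling down, deletion can then start
	// from the end of the slice, i.e. the highest index (-23), not -9.
	sort.Slice(vms, func(a, b int) bool {
		return vmIndex(vms[a]) < vmIndex(vms[b])
	})

	fmt.Println(vms) // [...-1 ...-9 ...-23]
}
```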

@jackfrancis
Member

@JackQuincy thanks for the feedback, agree that int sort is preferable to reduce the number of confused humans.

We have a long-term intention of doing just what you say, which is to define our own criteria for sorting nodes by "most appropriate for cordon/drain", and then enforce it for both scale and upgrade.

@shanalily
Member Author

shanalily commented Aug 10, 2018

@JackQuincy That makes sense, I'll fix it to delete the highest index first (if I don't get to it today then this weekend).

@codecov

codecov bot commented Aug 13, 2018

Codecov Report

Merging #3644 into master will decrease coverage by 0.27%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master   #3644      +/-   ##
=========================================
- Coverage   55.77%   55.5%   -0.28%     
=========================================
  Files         107     107              
  Lines       16238   16194      -44     
=========================================
- Hits         9057    8988      -69     
- Misses       6408    6432      +24     
- Partials      773     774       +1

@acs-bot acs-bot added size/XS and removed size/S labels Aug 13, 2018
@jackfrancis
Member

/lgtm

@acs-bot

acs-bot commented Aug 16, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, shanalily

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Development

Successfully merging this pull request may close these issues.

Scaling does not work on upgraded clusters with multiple agent nodes
5 participants