This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Scale down shouldn't depend on VM index #3644

Merged
merged 6 commits on Aug 16, 2018

Conversation

shanalily
Member

What this PR does / why we need it:
When upgrading a cluster, there can be gaps in the indices in the VM names (e.g. there are 3 VMs named k8s-agentpool1-00000000-0, k8s-agentpool1-00000000-1, and k8s-agentpool1-00000000-3, but none ending in 2). This is a problem when scaling down a cluster with the acs-engine scale command, because the index in the name is used to iterate over the VMs, so the tool may try to drain a node at a missing index (an empty name) and the scale operation will fail. The index isn't needed for iterating over the VMs, so they can be stored in a slice instead.
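
A minimal sketch of the idea, not the PR's actual code: `listAgentVMs` is a hypothetical stand-in for the Azure call that lists the agent pool's VMs. Iterating over a slice of the VMs that actually exist means no name is ever synthesized from a loop counter, so a missing index like `...-2` is harmless.

```go
package main

import (
	"fmt"
	"strings"
)

// listAgentVMs stands in for a call that lists the VMs in the agent pool's
// resource group; the names mirror the gap described above.
func listAgentVMs() []string {
	return []string{
		"k8s-agentpool1-00000000-0",
		"k8s-agentpool1-00000000-1",
		"k8s-agentpool1-00000000-3", // index 2 is missing after an upgrade
	}
}

func main() {
	vms := listAgentVMs()

	// Iterate over the slice of existing VMs directly; gaps in the
	// index suffix never produce an empty or bogus node name.
	for _, name := range vms {
		if strings.HasPrefix(name, "k8s-agentpool1-") {
			fmt.Println("candidate for drain/delete:", name)
		}
	}
}
```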

Which issue this PR fixes: fixes #2362

Special notes for your reviewer:
I want to know that I'm not breaking anything, though I've been using this change for a while without problems. This seems like a simple fix so I feel like I'm missing something.

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (i.e. deploy with previous version, upgrade with this branch)

Release note:

@ghost ghost assigned shanalily Aug 9, 2018
@ghost ghost added the in progress label Aug 9, 2018
@acs-bot acs-bot added the size/S label Aug 9, 2018
@shanalily shanalily changed the title Vm index Scale down shouldn't depend on VM index Aug 9, 2018
@jackfrancis
Member

/lgtm, testing scale against this PR, thanks @shanalily!

Contributor

@CecileRobertMichon left a comment


lgtm, @JackQuincy for review

@jackfrancis
Member

Scale tests succeeded

@JackQuincy
Contributor

This should work. My only concern is that it does a string sort, not an int sort, so if you have VMs -1...-23 it will start deleting at -9 rather than -23. That's more of a cleanliness thing, but it will confuse devs (we got a support case about something similar in the service recently). Ideally we'd switch to deleting the least-utilized nodes instead of deleting off the end, but that would take some work. Those are my thoughts.
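
A minimal sketch of the numeric-suffix sort being suggested, assuming names of the form `k8s-agentpool1-00000000-N` (the `vmIndex` helper is hypothetical, not acs-engine code). Sorting by the parsed index puts `-23` after `-9`, where a plain string sort would not, so scale-down can delete from the highest index.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// vmIndex extracts the trailing index from a VM name such as
// "k8s-agentpool1-00000000-23"; it returns -1 if the suffix isn't numeric.
func vmIndex(name string) int {
	parts := strings.Split(name, "-")
	i, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return -1
	}
	return i
}

func main() {
	vms := []string{
		"k8s-agentpool1-00000000-9",
		"k8s-agentpool1-00000000-1",
		"k8s-agentpool1-00000000-23",
	}

	// Sort numerically by index; when scaling down, deletion can then start
	// from the end of the slice, i.e. the highest index (-23), not -9.
	sort.Slice(vms, func(a, b int) bool {
		return vmIndex(vms[a]) < vmIndex(vms[b])
	})

	fmt.Println(vms) // [...-1 ...-9 ...-23]
}
```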

@jackfrancis
Member

@JackQuincy thanks for the feedback, agree that int sort is preferable to reduce the number of confused humans.

We have a long-term intention of doing just what you say, which is to define our own criteria for sorting nodes by "most appropriate for cordon/drain", and then enforce it for both scale and upgrade.

@shanalily
Member Author

shanalily commented Aug 10, 2018

@JackQuincy That makes sense, I'll fix it to delete the highest index first (if I don't get to it today then this weekend).

@codecov

codecov bot commented Aug 13, 2018

Codecov Report

Merging #3644 into master will decrease coverage by 0.27%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master   #3644      +/-   ##
=========================================
- Coverage   55.77%   55.5%   -0.28%     
=========================================
  Files         107     107              
  Lines       16238   16194      -44     
=========================================
- Hits         9057    8988      -69     
- Misses       6408    6432      +24     
- Partials      773     774       +1

@acs-bot acs-bot added size/XS and removed size/S labels Aug 13, 2018
@jackfrancis
Member

/lgtm

@acs-bot

acs-bot commented Aug 16, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, shanalily

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Development

Successfully merging this pull request may close these issues.

Scaling does not work on upgraded clusters with multiple agent nodes
5 participants