Allow more control over ASG definition #142
Conversation
Min/max size and min instances in service can now be directly defined. Defaults are taken from the older `controllerCount` and `workerCount`.
Current coverage is 69.68% (diff: 100%)

```
@@             master      #142   diff @@
==========================================
  Files             5         4     -1
  Lines          1110      1095    -15
  Methods           0         0
  Messages          0         0
  Branches          0         0
==========================================
- Hits            776       763    -13
+ Misses          249       248     -1
+ Partials         85        84     -1
```
```diff
@@ -9,3 +9,6 @@
 coverage.txt
 profile.out
 /.glide
+
+# files generated during tests
+config/temp
```
Good catch!
But I've fixed the test helper in 194ff10#diff-490c12c05aa246994472cb51aa5c358bR22, so if you've rebased onto master, tests no longer leave garbage like `config/temp` or `nodepool/config/temp`.
Would you mind undoing this?
I don't mind removing it, but one thing: if I view `git diff` during a test run, it shows many uncommitted files without this entry. I think that could be a good reason to leave it in. I'm using some Atom extensions, which means even just saving will run the tests.
Rebase onto the latest master, run `rm -Rf config/temp nodepool/config/temp`, and these uncommitted files won't appear again 💮
If they do appear, it's a bug. I'd rather notice bugs than ignore them.
I'm not sure anything changed on the latest master that would help this, as master hasn't changed since this comment? Anyway, it still happens off the latest: my local save runs the tests, and if I happen to run `git diff` while that is running, it shows many files in `config/temp`. Not a big problem, but it is still there.
Thanks for the info!
That's odd 😭
Would you mind reverting this and then raising another issue specific to that?
```go
}
if config.WorkerASG.RollingUpdateMinInstancesInService == 0 {
	config.WorkerASG.RollingUpdateMinInstancesInService = config.WorkerASG.MaxSize - 1
}
```
If we go this way, I guess we also need to set defaults for WorkerASG in `nodepool/config/config.go`, or MinSize, MaxSize, and RollingUpdateMinInstancesInService will all be left as-is (0) for node pools, and hence a stack creation error.
Also, I guess we can do this more immutably (no need to mutate WorkerASG.RollingUpdateMinInstancesInService) and more DRY (no need to copy-paste this into `nodepool/config/config.go`) by wrapping the RollingUpdateMinInstancesInService field in a func of the same name.
I wrote a snippet to show my thoughts: https://play.golang.org/p/c_0Pu_UDMf
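The linked playground snippet isn't reproduced in this thread, so here is a minimal sketch of the idea under assumed names (a Go struct field and a method can't share a name, so the raw value gets a `Raw` suffix here; this is illustrative, not the actual kube-aws code):

```go
package config

// WorkerASG holds the raw values parsed from cluster.yaml.
// Zero values mean "not specified by the user".
type WorkerASG struct {
	MinSize                               int `yaml:"minSize"`
	MaxSize                               int `yaml:"maxSize"`
	RollingUpdateMinInstancesInServiceRaw int `yaml:"rollingUpdateMinInstancesInService"`
}

// RollingUpdateMinInstancesInService computes the effective value on read,
// so neither config/config.go nor nodepool/config/config.go needs to mutate it.
func (a WorkerASG) RollingUpdateMinInstancesInService() int {
	if a.RollingUpdateMinInstancesInServiceRaw == 0 {
		return a.MaxSize - 1
	}
	return a.RollingUpdateMinInstancesInServiceRaw
}
```

Templates could then keep referencing `{{.WorkerASG.RollingUpdateMinInstancesInService}}` as before (Go templates call niladic methods with the same syntax), and node pools would get the same computed default for free.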
Ah yes, you are right; I had thought the node pools config was inherited from the main one. I want to share it somehow, as you say, so I will look into it.
Only static, default values set in NewDefaultCluster() are inherited by the node pool config.
IMHO there's currently no point in the flow at which "computed" values like these would be inherited.
That is one of the reasons I believe it is better to implement "computed" values by wrapping them in functions instead of mutating values in place. ref https://github.com/coreos/kube-aws/pull/142/files#r91641540
```go
// setup worker ASG size, one is the default WorkerCount
if config.WorkerCount != 1 && (config.WorkerASG.MinSize != 0 || config.WorkerASG.MaxSize != 0) {
	return nil, fmt.Errorf("`workerASG.MinSize` and `workerASG.MaxSize` can only be specified without `workerCount`")
}
```
A validation like this should be done in `func (c WorkerSettings) Valid()`.
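For illustration, a rough sketch of moving that check into Valid(); the type shapes are assumed from the diff above and are not the actual kube-aws code:

```go
package config

import "fmt"

// Assumed shapes, trimmed to the fields used in this check.
type WorkerASG struct{ MinSize, MaxSize int }

type WorkerSettings struct {
	WorkerCount int
	WorkerASG   WorkerASG
}

// Valid rejects conflicting worker settings up front. WorkerCount defaults
// to 1, so any other value means the user set it explicitly.
func (c WorkerSettings) Valid() error {
	if c.WorkerCount != 1 && (c.WorkerASG.MinSize != 0 || c.WorkerASG.MaxSize != 0) {
		return fmt.Errorf("`workerASG.minSize` and `workerASG.maxSize` can only be specified without `workerCount`")
	}
	return nil
}
```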
Oh I see, I missed that. As I'm sure you're aware, it's a little hard to pick up the intended style with old and new code together in such a big file! I will fix it.
```go
}
if config.WorkerASG.MinSize == 0 {
	config.WorkerASG.MinSize = config.WorkerCount
}
```
Could we just alter `func (c WorkerSettings) MinWorkerCount() int` to achieve the same end result? e.g.

```go
func (c WorkerSettings) MinWorkerCount() int {
	if c.WorkerASG.MinSize == 0 {
		return c.WorkerCount
	}
	return c.WorkerASG.MinSize
	// Instead of:
	// return c.WorkerCount - 1
}
```
That way we don't even need to modify `stack-template.json`, we keep backward compatibility of what is exposed to (and usable from) stack-template.json, and there's no need to copy-paste code from config/config.go into nodepool/config/config.go like I suggested in my previous comment.
I would avoid keeping MinWorkerCount for two reasons:

- I think it's better if the layout of the names in the config (`cluster.yaml`) matches `stack-template.json`.
- Regarding backwards compatibility, MinWorkerCount was only introduced in the new kube-aws repo, so I think the impact is minimal; anyone with major modifications would have to do something similar to what I have done (switch over and change quite a lot). I don't think we should maintain too many options/functions that do the same thing.

What are your thoughts, assuming I do the function/DRY work above?
(Side note: this relates to me wanting to remove the old workerCount too, as it's trivial for the user to update to min/max compared to keeping so many different options available.)
I basically agree with both points in your comment. I also agree with removing workerCount.
However, I'd like these to be deprecated first and then dropped in the future, not in this pull request. Adding/improving something and dropping an existing feature/setting/option should be done in separate pull requests IMHO.
To be extra clear, I'm not satisfied with either MinWorkerCount or MaxWorkerCount as of today, but that doesn't mean I want to drop them immediately; hence I'd like to leave MinWorkerCount, MaxWorkerCount, and workerCount in place "for now".
```go
}
if config.WorkerASG.MaxSize == 0 {
	config.WorkerASG.MaxSize = util.IntMax(config.WorkerCount, config.WorkerASG.MinSize)
}
```
The same as https://github.com/coreos/kube-aws/pull/142/files#r91637776, but for `func (c WorkerSettings) MaxWorkerCount() int`.
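A sketch of that analogous change, reusing the assumed WorkerSettings shape from the earlier sketch and the util.IntMax helper that appears in this diff (illustrative only, not the actual kube-aws code):

```go
// MaxWorkerCount falls back to the larger of WorkerCount and MinSize
// when workerASG.maxSize is not specified, mirroring MinWorkerCount above.
func (c WorkerSettings) MaxWorkerCount() int {
	if c.WorkerASG.MaxSize == 0 {
		return util.IntMax(c.WorkerCount, c.WorkerASG.MinSize)
	}
	return c.WorkerASG.MaxSize
}
```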
```go
	config.WorkerASG.MinSize = config.WorkerCount
}
if config.WorkerASG.MaxSize == 0 {
	config.WorkerASG.MaxSize = util.IntMax(config.WorkerCount, config.WorkerASG.MinSize)
```
Maybe we'd like to add a validation for this to `func (c WorkerSettings) Valid() error` rather than silently fixing an invalid configuration (i.e. MinSize > WorkerCount).
Can do. What do you want the expected behaviour to be for the setup in `TestASGsMinConfigured`?
A validation error when MinSize > WorkerCount would be nice
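Taken literally, such a check could slot into the Valid() sketch above; how it should interact with the defaulted workerCount of 1 is left to the author (illustrative only):

```go
// Inside the Valid() sketch above, after the exclusivity check:
// fail fast instead of silently raising MaxSize to cover MinSize.
if c.WorkerASG.MinSize > c.WorkerCount {
	return fmt.Errorf("`workerASG.minSize` (%d) must not be greater than `workerCount` (%d)",
		c.WorkerASG.MinSize, c.WorkerCount)
}
```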
```diff
@@ -17,19 +17,19 @@
   "{{$subnet.AvailabilityZone}}"
   {{end}}
 ],
-"DesiredCapacity": "{{.WorkerCount}}",
+"DesiredCapacity": "{{.WorkerASG.MinSize}}",
```
Probably you didn't intend this? I guess we should leave this as {{.WorkerCount}}, or we become unable to set the desired capacity.
My assumption is that desired capacity is usually the min size for cluster rollout? Afterwards, AWS changes the desired capacity once we implement auto scaling. We could allow this to be customised as well? I can't remember the AWS behaviour if desired capacity is outside min/max, etc. Right now I just replaced like for like, so we could leave the extra config for later; i.e. it was the min count before and it is still the min count.
Yes. However, if that's your intention, it should be achieved in another way.
DesiredCapacity is not a required parameter. If you want your initial ASG size to stay the same as MinSize, you could just omit `"DesiredCapacity": "{{.WorkerASG.MinSize}}",` so the stack-template would look like:

```
{{if gt .WorkerCount 0}}"DesiredCapacity": "{{.WorkerCount}}",{{end}}
```

With this, if workerCount is omitted from cluster.yaml, DesiredCapacity is automatically set to MinSize by the ASG itself, and we're then unable to update DesiredCapacity via cfn stack updates. Thus the user becomes responsible for updating DesiredCapacity outside of kube-aws or CloudFormation.
The intention of this change is that the new WorkerASG.MinSize has the same value as the older WorkerCount. So if we want to omit it for that case, I would just omit it altogether, which would clarify that we currently do not support configuring DesiredCapacity. What do you think?
It depends whether you want DesiredCapacity to be set to the exact value it had prior to this change, which it is, since MinSize is always set, defaulting to WorkerCount. In turn, this also keeps stack-template.json clean, as it only uses the ASG properties rather than having multiple levels of defaulting/switching inside code and templates.
I'd like to give users the option to choose whether they modify DesiredCapacity inside or outside of CloudFormation, plus I'd like not to break the existing use-case that sets DesiredCapacity via workerCount.
We could add support for MinSize and MaxSize without removing that option or breaking the existing use-case, hence #142 (comment).
Quick thought, but would you prefer entirely removing DesiredCapacity from stack-template.json in this PR and later (in another issue/PR) improving the `kube-aws update` command to support updating DesiredCapacity outside of cfn, e.g. `kube-aws update --num-workers 10`?
Doing this allows users to choose whether or not to update the desired capacity via kube-aws, anyway.
Yes, I think so. Since it's currently set to the same size as min, and AWS will default to that anyway, it's not doing much.
OK, so let's remove DesiredCapacity then 👍
Btw, I've noticed that omitting DesiredCapacity here would make the situation easier for #148 too.
```diff
@@ -99,19 +99,19 @@
   "{{$subnet.AvailabilityZone}}"
   {{end}}
 ],
-"DesiredCapacity": "{{.ControllerCount}}",
+"DesiredCapacity": "{{.ControllerASG.MinSize}}",
```
```yaml
# workerASG:
#   minSize: 1
#   maxSize: 3
#   rollingUpdateMinInstancesInService: 2
```
Could we adapt these to the new `worker:` key I've recently added for spot fleets, so that we have more consistency/distinction between the two deployment strategies: auto scaling group and spot fleet?
```yaml
#worker:
#  # Auto Scaling Group definition for workers. If only `workerCount` is specified, min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
#  autoScalingGroup:
#    minSize: 1
#    maxSize: 3
#    rollingUpdateMinInstancesInService: 2
#  # Spot Fleet definitions for workers
#  # You may specify autoScalingGroup or spotFleet, but not both.
#  # Spot Fleet support is still experimental. It can be changed in backward-incompatible ways.
#  spotFleet:
#    *snip*
```
Yes we can! I saw that and was wondering if it was the way you wanted to go; I can update it. One thing is that we will have old settings outside `worker` and new settings inside it. My opinion is that we need to deprecate some things in order to ensure a consistent and easy-to-understand tool.
Do you also want the controller ASG in a `controller` YAML property? Do you mean only node pools, or both cluster.yaml files? We want to keep it consistent, right (as it's all read from the same place)?
Yes and yes! I'd like to have a new `controller` property, and both `worker` and `controller` to be used consistently in both the main cluster's and node pools' cluster.yaml.
I'm not entirely sure how I can get this in without moving worker.go, since otherwise there will be a duplicate Worker inside Cluster. Are you OK with me moving it? I saw there was a bunch of placeholder code in config.go anyway, so it's pretty messy as it is.
@c-knowles Thanks for the PR! Its functionality seems good 👍 however, I have some comments on backward compatibility, consistency, and DRY.
@mumoshu Great, thanks. I'll start on a few of those changes; I've put some follow-up comments above.
I've mistakenly commented via another GitHub account of mine several times 😅 Never mind that.
@c-knowles I've replied to every comment 👍
- Start model package
- Fix up node pool ASGs, as they only inherit static default values
- Move validation into Valid() methods
- Move ASG config into `worker` and `controller`
- Maintain backwards compatibility for now on MinWorkerCount/MaxWorkerCount
- Increase validation for setting count along with min/max
- Remove desired capacity; it currently does not do much
@mumoshu I believe I've done most of what you requested above. I've started to align worker/controller as well, since I was forced to combine them to remove duplicate definitions inside config/nodepools. I'd like to move more of the parsing and validation of the model into that package, but I kept it to a minimum for now.
```diff
@@ -65,7 +64,7 @@
 {{if .Experimental.WaitSignal.Enabled}}
 "CreationPolicy" : {
   "ResourceSignal" : {
-    "Count" : "{{.WorkerCount}}",
+    "Count" : "{{.MinWorkerCount}}",
```
In `kube-aws update`, this should match the desired capacity at that time rather than the min size, or I guess we will end up destroying the former auto scaling group too early, with the overall cluster capacity temporarily decreased until the rest of the worker nodes (DesiredCapacity - MinSize) become ready.
So, do you mind if I follow up with a PR addressing this after merging?
Sure, if that's appropriate. Since this is the CreationPolicy, I thought it didn't apply to stack updates at all?
Thanks for pointing that out! You're certainly correct. I mistakenly mixed things up with the past, when we tried to do node replacements via "create an ASG and then destroy the old one" instead of the current "gradually replace nodes in an ASG".
(it's still needed, just that there is some desire to commit it separately)
@c-knowles Thanks for your continuous efforts 🙇
Done. By the way, I think this resolves #158 as well.
Oh nice, that seems true. Thanks!
@c-knowles Merged. Thanks for the PR and all the follow-ups 👍
* Allow more control over ASG definition

  Min/max size and min instances in service can now be directly defined. Defaults are taken from the older `controllerCount` and `workerCount`.

* Start to unify worker/controller configuration

  - Start model package
  - Fix up node pool ASGs, as they only inherit static default values
  - Move validation into Valid() methods
  - Move ASG config into `worker` and `controller`
  - Maintain backwards compatibility for now on MinWorkerCount/MaxWorkerCount
  - Increase validation for setting count along with min/max
  - Remove desired capacity; it currently does not do much