Add instance-selector cmd to toolbox #9478

bwagner5 · 2020-07-02T14:34:48Z

This PR adds kops toolbox instance-selector, which creates kops instance groups based on resource criteria of AWS instance types. Built-in best practices generate heterogeneous spot autoscaling groups with the capacity-optimized allocation strategy. The instance-selector can also generate on-demand instance groups, which are still heterogeneous but use a lowest-price allocation strategy.

This command is implemented using the github.com/aws/amazon-ec2-instance-selector Go package.
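As a rough illustration of what "resource criteria" filtering means, here is a minimal Go sketch. It does not use the amazon-ec2-instance-selector API; the instanceType and criteria types, the filterInstanceTypes helper, and the small hard-coded catalogue are all hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// instanceType is a hypothetical, simplified description of an EC2 instance type.
type instanceType struct {
	Name      string
	VCPUs     int
	MemoryMiB int
}

// criteria is a hypothetical filter; a zero value means "no constraint".
type criteria struct {
	VCPUs        int
	MemoryMinMiB int
	MemoryMaxMiB int
}

// filterInstanceTypes returns the sorted names of all instance types matching c.
func filterInstanceTypes(all []instanceType, c criteria) []string {
	var names []string
	for _, it := range all {
		if c.VCPUs != 0 && it.VCPUs != c.VCPUs {
			continue
		}
		if c.MemoryMinMiB != 0 && it.MemoryMiB < c.MemoryMinMiB {
			continue
		}
		if c.MemoryMaxMiB != 0 && it.MemoryMiB > c.MemoryMaxMiB {
			continue
		}
		names = append(names, it.Name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Illustrative catalogue only; the real tool queries AWS for this data.
	catalogue := []instanceType{
		{"c5.xlarge", 4, 8192},
		{"c4.xlarge", 4, 7680},
		{"m5.xlarge", 4, 16384},
		{"c5.large", 2, 4096},
	}
	selected := filterInstanceTypes(catalogue, criteria{VCPUs: 4, MemoryMinMiB: 6000, MemoryMaxMiB: 9000})
	fmt.Println(selected) // prints [c4.xlarge c5.xlarge]
}
```

The resulting list is what would populate mixedInstancesPolicy.instances in the generated instance group.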

Testing:

The cluster has already been created on AWS.

$ make kops
$ .build/local/kops toolbox instance-selector --flexible --usage-class spot --instance-group-name spot-test-group
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-07-02T14:32:38Z"
  labels:
    kops.k8s.io/cluster: guac.kops.sh
  name: spot-test-group
spec:
  cloudLabels:
    kops.k8s.io/instance-selector: "1"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200528
  machineType: c4.xlarge
  maxSize: 15
  minSize: 2
  mixedInstancesPolicy:
    instances:
    - c4.xlarge
    - c5.xlarge
    - c5a.xlarge
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: spot-test-group
  role: Node
  subnets:
  - us-east-2a
  - us-east-2b
  - us-east-2c

$ .build/local/kops toolbox instance-selector --vcpus 4 --memory-min 6000 --memory-max 9000 --instance-group-name od-group
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-07-02T14:34:27Z"
  labels:
    kops.k8s.io/cluster: guac.kops.sh
  name: od-group
spec:
  cloudLabels:
    kops.k8s.io/instance-selector: "1"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200528
  machineType: c4.xlarge
  maxSize: 15
  minSize: 2
  mixedInstancesPolicy:
    instances:
    - c4.xlarge
    - c5.xlarge
    - c5a.xlarge
    - c5d.xlarge
  nodeLabels:
    kops.k8s.io/instancegroup: od-group
  role: Node
  subnets:
  - us-east-2a
  - us-east-2b
  - us-east-2c

k8s-ci-robot · 2020-07-02T14:34:56Z

Hi @bwagner5. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

hakman · 2020-07-02T14:47:47Z

/ok-to-test

hakman · 2020-07-02T14:50:42Z

@bwagner5 Any reason for not using aws-sdk-go v1.31.15?

bwagner5 · 2020-07-02T14:57:27Z

@hakman just an oversight, updated.

hakman · 2020-07-02T15:01:46Z

Thanks @bwagner5.

bwagner5 · 2020-07-02T22:24:01Z

/assign @geojaz

hakman · 2020-07-07T18:12:19Z

@bwagner5 The current aws-sdk-go version is 1.32.13: 5107e1d

bwagner5 · 2020-07-07T18:40:48Z

@hakman my bad, the commit message is just wrong. I'll update

hakman · 2020-07-07T18:55:17Z

I think there is still some weirdness in the vendor related commits. For example defaults.go: 832f9b3.

Maybe squashing the vendor and gomod commits into one would fix all this.

hakman · 2020-07-08T06:11:01Z

I don't think it worked. There are still some commits reverting changes from previous ones. It would probably be best to restage them.

bwagner5 · 2020-07-08T17:50:29Z

Apologies for all the noise :) I rebased to clean up the PR into 2 nicer commits. It seems there was a transient failure in the 1 year cert issue test too. All tests passed locally, and passed after I reran the build on gh-actions.

hakman · 2020-08-04T09:40:41Z

Maybe I am doing something wrong, but it's not working for me anymore:

% .build/local/kops toolbox instance-selector test --dry-run --memory-min "4 GiB" --memory-max "16 GiB"

Invalid input for --memory-max. A valid example is 16gb. Processing failed.

% .build/local/kops toolbox instance-selector test --dry-run --memory-min 4gb --memory-max 16gb

Invalid input for --memory-min. A valid example is 16gb. Processing failed.

% .build/local/kops toolbox instance-selector test --dry-run --memory-min 4gb --memory-max 16gb
panic: interface conversion: interface {} is *string, not *bytequantity.ByteQuantity

goroutine 1 [running]:
github.com/aws/amazon-ec2-instance-selector/v2/pkg/cli.(*CommandLineInterface).ByteQuantityMinMaxRangeFlagOnFlagSet.func1(0x0, 0x0, 0x4a815ae, 0x6)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/vendor/github.com/aws/amazon-ec2-instance-selector/v2/pkg/cli/flags.go:197 +0x585
github.com/aws/amazon-ec2-instance-selector/v2/pkg/cli.(*CommandLineInterface).ValidateFlags(0xc000178dc0, 0x0, 0x0)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/vendor/github.com/aws/amazon-ec2-instance-selector/v2/pkg/cli/cli.go:115 +0xfa
main.processAndValidateFlags(0xc000178dc0, 0x1, 0xc0005d76f0, 0x2)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/cmd/kops/toolbox_instance_selector.go:334 +0x74
main.RunToolboxInstanceSelector(0x4ff5f60, 0xc000054080, 0xc0005b7ae0, 0xc00010e960, 0x1, 0x6, 0x4f9d400, 0xc000010018, 0xc000178dc0, 0x0, ...)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/cmd/kops/toolbox_instance_selector.go:193 +0x85
main.NewCmdToolboxInstanceSelector.func2(0xc000306580, 0xc00010e960, 0x1, 0x6)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/cmd/kops/toolbox_instance_selector.go:125 +0x85
github.com/spf13/cobra.(*Command).execute(0xc000306580, 0xc00010e8a0, 0x6, 0x6, 0xc000306580, 0xc00010e8a0)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/vendor/github.com/spf13/cobra/command.go:842 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0x6bb4a60, 0x6bff7f0, 0x0, 0x0)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/vendor/github.com/spf13/cobra/command.go:943 +0x336
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/hakman/Documents/git/go/src/k8s.io/kops/vendor/github.com/spf13/cobra/command.go:883
main.Execute()
	/Users/hakman/Documents/git/go/src/k8s.io/kops/cmd/kops/root.go:96 +0x8f
main.main()
	/Users/hakman/Documents/git/go/src/k8s.io/kops/cmd/kops/main.go:25 +0x25

I am more of a fan of the "16gb" notation instead of "16 GiB".
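The panic in the trace above is Go's standard failure mode for a single-value type assertion on an interface holding a different concrete type. The sketch below reproduces the situation with a hypothetical byteQuantity stand-in (not the library's actual type) and shows how the comma-ok form turns the mismatch into an ordinary error. It is an illustration of the language behavior, not the fix that was applied in this PR.

```go
package main

import "fmt"

// byteQuantity is a hypothetical stand-in for the library's ByteQuantity type.
type byteQuantity struct {
	MiB uint64
}

// asByteQuantity uses the comma-ok assertion so that a mismatched type
// is reported as an error instead of causing a panic.
func asByteQuantity(v interface{}) (*byteQuantity, error) {
	bq, ok := v.(*byteQuantity)
	if !ok {
		return nil, fmt.Errorf("expected *byteQuantity, got %T", v)
	}
	return bq, nil
}

func main() {
	s := "16gb"
	// A *string stored where a *byteQuantity is expected, as in the trace.
	var flagValue interface{} = &s

	// Writing flagValue.(*byteQuantity) without the second return value
	// would panic here with an "interface conversion" message.
	if _, err := asByteQuantity(flagValue); err != nil {
		fmt.Println("error:", err)
	}
}
```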

bwagner5 · 2020-08-04T14:25:19Z

@hakman Sorry, that was my bad... pushed a little too quickly before I went on vacation. It's fixed now:

➜  kops git:(feat-instance-selector) .build/local/kops toolbox instance-selector --memory-min=4gb --memory-max=16gb --dry-run hi --state s3://tti-kops --name tti.k8s.local
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: ""
  name: hi
spec:
  cloudLabels:
    kops.k8s.io/instance-selector: "1"
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: c1.xlarge
  maxSize: 15
  minSize: 2
  mixedInstancesPolicy:
    instances:
    - c1.xlarge
    - c3.2xlarge
    - c3.xlarge
    - c4.2xlarge
    - c4.xlarge
    - c5.2xlarge
    - c5.large
    - c5.xlarge
    - c5a.2xlarge
    - c5a.large
    - c5a.xlarge
    - c5d.2xlarge
    - c5d.large
    - c5d.xlarge
    - c5n.large
    - c5n.xlarge
    - g2.2xlarge
    - g4dn.xlarge
    - i3.large
    - i3en.large
  nodeLabels:
    kops.k8s.io/instancegroup: hi
  role: Node
  subnets:
  - us-east-1a

I agree, I like the "4gb" syntax better as well. I've updated the CLI examples in the usage to reflect that. 4 GiB will still be parsable.
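For readers wondering how both "4gb" and "4 GiB" can be accepted, here is a minimal sketch of one way to do it. The parseMiB helper is hypothetical and is not the upstream bytequantity implementation; in particular, treating "gb" and "gib" as the same binary unit is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMiB is a hypothetical parser that accepts inputs such as "4gb",
// "4 GiB" or "512mb" and returns the quantity in MiB.
// Assumption: "gb"/"gib" both mean 1024 MiB and "mb"/"mib" both mean 1 MiB.
func parseMiB(input string) (uint64, error) {
	// Normalize: drop spaces and ignore case so "4 GiB" becomes "4gib".
	s := strings.ToLower(strings.ReplaceAll(input, " ", ""))
	units := []struct {
		suffix string
		mib    uint64
	}{
		{"gib", 1024}, {"gb", 1024}, {"mib", 1}, {"mb", 1},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid quantity %q: %w", input, err)
			}
			return n * u.mib, nil
		}
	}
	return 0, fmt.Errorf("invalid quantity %q: missing unit", input)
}

func main() {
	for _, in := range []string{"4gb", "4 GiB", "16"} {
		mib, err := parseMiB(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %d MiB\n", in, mib)
	}
}
```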

hakman · 2020-08-04T14:27:34Z

Thanks for the update @bwagner5. Will take another look tomorrow. Enjoy vacation 😄 !

hakman · 2020-08-11T20:06:38Z

Hey @bwagner5, I finally got some time to finish the review. It looks very good, except for a few nits/questions:

The defaults:

	clusterAutoscalerDefault := false
	nodeCountMinDefault := 2
	nodeCountMaxDefault := 15

I would change them to:

	clusterAutoscalerDefault := true
	nodeCountMinDefault := 1
	nodeCountMaxDefault := 10

gpu-memory-total should become gpu-memory, but keep the description as is to explain that it's the total.
There is a bug with the cluster-autoscaler labels, the cluster name is not added:

$ kops toolbox instance-selector ondemand-ig --dry-run --cluster-autoscaler
Using cluster from kubectl context: instance-selector.test.com

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: ""
  name: ondemand-ig
spec:
  cloudLabels:
=>  k8s.io/cluster-autoscaler/: "1"
    k8s.io/cluster-autoscaler/enabled: "1"
    kops.k8s.io/instance-selector: "1"

What do you think?

bwagner5 · 2020-08-11T20:45:17Z

Those defaults sound very reasonable, I've updated the PR.

The cluster-autoscaler labels should now be added properly with the cluster name (tested with --name and export KOPS_CLUSTER_NAME=tti.k8s.local):

.build/local/kops toolbox instance-selector --memory-min=4gb --memory-max=16gb --dry-run hi --state s3://tti-kops --name tti.k8s.local
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: tti.k8s.local
  name: hi
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: "1"
    k8s.io/cluster-autoscaler/tti.k8s.local: "1"
    kops.k8s.io/instance-selector: "1"
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-07-20
  machineType: c1.xlarge
  maxSize: 10
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - c1.xlarge
    - c3.2xlarge
    - c3.xlarge
    - c4.2xlarge
    - c4.xlarge
    - c5.2xlarge
    - c5.large
    - c5.xlarge
    - c5a.2xlarge
    - c5a.large
    - c5a.xlarge
    - c5d.2xlarge
    - c5d.large
    - c5d.xlarge
    - c5n.large
    - c5n.xlarge
    - g2.2xlarge
    - g4dn.xlarge
    - i3.large
    - i3en.large
  nodeLabels:
    kops.k8s.io/instancegroup: hi
  role: Node
  subnets:
  - us-east-1a
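
The label bug discussed above comes down to building the k8s.io/cluster-autoscaler/ key from an empty cluster name. A minimal sketch of a guard against that follows; the autoscalerLabels helper is hypothetical and is not the code in this PR. Only the label keys are taken from the output shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// autoscalerLabels is a hypothetical helper that builds the cloud labels
// shown in the output above. It refuses an empty cluster name, which would
// otherwise produce the malformed key "k8s.io/cluster-autoscaler/".
func autoscalerLabels(clusterName string) (map[string]string, error) {
	if clusterName == "" {
		return nil, errors.New("cluster name is required to build cluster-autoscaler labels")
	}
	return map[string]string{
		"k8s.io/cluster-autoscaler/enabled":        "1",
		"k8s.io/cluster-autoscaler/" + clusterName: "1",
		"kops.k8s.io/instance-selector":            "1",
	}, nil
}

func main() {
	labels, err := autoscalerLabels("tti.k8s.local")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for k, v := range labels {
		fmt.Printf("%s: %q\n", k, v)
	}
}
```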

hakman · 2020-08-11T21:08:45Z

Nice work. Thanks!
/lgtm
/approve

k8s-ci-robot · 2020-08-11T21:09:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bwagner5, hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [hakman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 2, 2020

k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 2, 2020

k8s-ci-robot requested review from hakman and mikesplain July 2, 2020 14:35

k8s-ci-robot added the area/documentation label Jul 2, 2020

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 2, 2020

k8s-ci-robot assigned geojaz Jul 2, 2020

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 7, 2020

bwagner5 force-pushed the feat-instance-selector branch from 296eab3 to 308c5c7 Compare July 7, 2020 17:34

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 7, 2020

bwagner5 force-pushed the feat-instance-selector branch from 308c5c7 to 8bc13bc Compare July 7, 2020 17:41

bwagner5 force-pushed the feat-instance-selector branch 3 times, most recently from cca320e to 832f9b3 Compare July 7, 2020 18:49

bwagner5 force-pushed the feat-instance-selector branch from 832f9b3 to 98ba1d4 Compare July 7, 2020 20:18

bwagner5 force-pushed the feat-instance-selector branch 2 times, most recently from 5cced34 to fa65aea Compare July 8, 2020 16:54

bwagner5 force-pushed the feat-instance-selector branch 4 times, most recently from 981dfc2 to 443ed67 Compare July 31, 2020 11:54

bwagner5 requested a review from hakman July 31, 2020 17:57

bwagner5 force-pushed the feat-instance-selector branch from 443ed67 to 7e2cb54 Compare August 4, 2020 14:18

bwagner5 and others added 10 commits August 10, 2020 16:16

go.mod deps for feat toolbox instance-selector

fe3671f

feat toolbox instance-selector implementation

9d9ca84

pr comments

8d81c22

move from zones input to subnets input

1bb593a

move instance-group-name to arg like create ig

2a33b98

update cli docs for instance-selector

b4bc9b5

cpuarch amd64 is now supported in upstream lib

89c90c8

use byte quantity flag instead of int MiBs for memory args

602564d

fix new cli api for byte quantities

e1136f6

update cli docs

2d6d7ec

bwagner5 force-pushed the feat-instance-selector branch from df964c1 to 2d6d7ec Compare August 10, 2020 22:13

change defaults

c4e2497

k8s-ci-robot assigned hakman Aug 11, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 11, 2020

k8s-ci-robot merged commit b7871e2 into kubernetes:master Aug 11, 2020

k8s-ci-robot added this to the v1.19 milestone Aug 11, 2020
