Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding override capability to set lifecycle validation for a phase #4153

Closed

Conversation

chrislovecnm
Copy link
Contributor

This PR adds the --phase-overrides to kops update cluster command. This
flag accepts a comma seperated list of phases and lifecycle. This
feature allows for a user to have fine grain control over how kops
validates specific phases.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 27, 2017
@chrislovecnm chrislovecnm force-pushed the phases-config-warnifchanges branch 2 times, most recently from 67864db to 02ffb5b Compare December 28, 2017 20:25
@chrislovecnm
Copy link
Contributor Author

chrislovecnm commented Dec 28, 2017

So I am having problems testing this. Running into the same problems as before where I am not able to bypass validation.

How I am setting up testing

  • $KOPS create -f $KOPS_FILE
  • $KOPS create secret --name $CLUSTER_NAME sshpublickey admin -i ~/.ssh/id_rsa.pub
  • $KOPS update cluster --phase assets $CLUSTER_NAME --yes
  • $KOPS update cluster --phase network $CLUSTER_NAME --yes
  • remove the internet gw from the vpc

Test Case

I want to be warned that there is not an IGW, but I do not want kops to create one.

Ignore

$KOPS update cluster --name $CLUSTER_NAME --phase security  --phase-overrides=network=Ignore --yes

This fails with a panic because kops does not look up the VPC

ExistsAndWarnIfChanges

This tries to find the IGW, and keeps trying to find it. IGW does not exist.

$ kops update cluster --name $CLUSTER_NAME --phase security  --phase-overrides=network=ExistsAndWarnIfChanges
W1228 13:32:13.984580   96127 urls.go:71] Using base url from KOPS_BASE_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/"
W1228 13:32:14.109475   96127 urls.go:126] Using nodeup location from NODEUP_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/linux/amd64/nodeup"
I1228 13:32:14.600872   96127 executor.go:91] Tasks: 0 done / 73 total; 31 can run
I1228 13:32:14.975564   96127 executor.go:91] Tasks: 31 done / 73 total; 24 can run
W1228 13:32:15.247856   96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.247900   96127 executor.go:91] Tasks: 54 done / 73 total; 16 can run
W1228 13:32:15.314336   96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.314375   96127 executor.go:91] Tasks: 69 done / 73 total; 3 can run
W1228 13:32:15.374445   96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.374471   96127 executor.go:91] Tasks: 71 done / 73 total; 1 can run
W1228 13:32:15.446662   96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.446700   96127 executor.go:124] No progress made, sleeping before retrying 1 failed task(s)

WarnIfInsufficientAccess

I think this is the lifecycle that I should be using, but dry-run says it is going to create an IGW. When you run with --yes it does create an IGW.

$ kops update cluster --name $CLUSTER_NAME --phase security  --phase-overrides network=WarnIfInsufficientAccess
W1228 13:33:58.545113   96275 urls.go:71] Using base url from KOPS_BASE_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/"
W1228 13:33:58.685635   96275 urls.go:126] Using nodeup location from NODEUP_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/linux/amd64/nodeup"
I1228 13:33:59.165235   96275 executor.go:91] Tasks: 0 done / 73 total; 31 can run
I1228 13:34:00.040046   96275 executor.go:91] Tasks: 31 done / 73 total; 24 can run
I1228 13:34:00.304322   96275 executor.go:91] Tasks: 55 done / 73 total; 16 can run
I1228 13:34:00.552913   96275 executor.go:91] Tasks: 71 done / 73 total; 2 can run
I1228 13:34:00.552962   96275 executor.go:91] Tasks: 73 done / 73 total; 0 can run
Will create resources:
  IAMInstanceProfileRole/masters.us-west-2.aws.k8spro.com
  	InstanceProfile     	name:masters.us-west-2.aws.k8spro.com id:masters.us-west-2.aws.k8spro.com
  	Role                	name:masters.us-west-2.aws.k8spro.com id:AROAIYE3OPMYK6ML4S6NK

  IAMInstanceProfileRole/nodes.us-west-2.aws.k8spro.com
  	InstanceProfile     	name:nodes.us-west-2.aws.k8spro.com id:nodes.us-west-2.aws.k8spro.com
  	Role                	name:nodes.us-west-2.aws.k8spro.com id:AROAJVUWZ44FXKLFASE44

  IAMRolePolicy/masters.us-west-2.aws.k8spro.com
  	Role                	name:masters.us-west-2.aws.k8spro.com id:AROAIYE3OPMYK6ML4S6NK

  IAMRolePolicy/nodes.us-west-2.aws.k8spro.com
  	Role                	name:nodes.us-west-2.aws.k8spro.com id:AROAJVUWZ44FXKLFASE44

  InternetGateway/us-west-2.aws.k8spro.com
  	VPC                 	name:us-west-2.aws.k8spro.com id:vpc-e0e32799
  	Shared              	false

  SecurityGroup/masters.us-west-2.aws.k8spro.com
  	Description         	Security group for masters
  	VPC                 	name:us-west-2.aws.k8spro.com id:vpc-e0e32799
  	RemoveExtraRules    	[port=22, port=443, port=2380, port=2381, port=4001, port=4002, port=4789, port=179]

  SecurityGroup/nodes.us-west-2.aws.k8spro.com
  	Description         	Security group for nodes
  	VPC                 	name:us-west-2.aws.k8spro.com id:vpc-e0e32799
  	RemoveExtraRules    	[port=22]

  SecurityGroupRule/all-master-to-master
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	SourceGroup         	name:masters.us-west-2.aws.k8spro.com

  SecurityGroupRule/all-master-to-node
  	SecurityGroup       	name:nodes.us-west-2.aws.k8spro.com
  	SourceGroup         	name:masters.us-west-2.aws.k8spro.com

  SecurityGroupRule/all-node-to-node
  	SecurityGroup       	name:nodes.us-west-2.aws.k8spro.com
  	SourceGroup         	name:nodes.us-west-2.aws.k8spro.com

  SecurityGroupRule/https-external-to-master-0.0.0.0/0
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	CIDR                	0.0.0.0/0
  	Protocol            	tcp
  	FromPort            	443
  	ToPort              	443

  SecurityGroupRule/master-egress
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	CIDR                	0.0.0.0/0
  	Egress              	true

  SecurityGroupRule/node-egress
  	SecurityGroup       	name:nodes.us-west-2.aws.k8spro.com
  	CIDR                	0.0.0.0/0
  	Egress              	true

  SecurityGroupRule/node-to-master-tcp-1-2379
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	Protocol            	tcp
  	FromPort            	1
  	ToPort              	2379
  	SourceGroup         	name:nodes.us-west-2.aws.k8spro.com

  SecurityGroupRule/node-to-master-tcp-2382-4000
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	Protocol            	tcp
  	FromPort            	2382
  	ToPort              	4000
  	SourceGroup         	name:nodes.us-west-2.aws.k8spro.com

  SecurityGroupRule/node-to-master-tcp-4003-65535
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	Protocol            	tcp
  	FromPort            	4003
  	ToPort              	65535
  	SourceGroup         	name:nodes.us-west-2.aws.k8spro.com

  SecurityGroupRule/node-to-master-udp-1-65535
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	Protocol            	udp
  	FromPort            	1
  	ToPort              	65535
  	SourceGroup         	name:nodes.us-west-2.aws.k8spro.com

  SecurityGroupRule/ssh-external-to-master-0.0.0.0/0
  	SecurityGroup       	name:masters.us-west-2.aws.k8spro.com
  	CIDR                	0.0.0.0/0
  	Protocol            	tcp
  	FromPort            	22
  	ToPort              	22

  SecurityGroupRule/ssh-external-to-node-0.0.0.0/0
  	SecurityGroup       	name:nodes.us-west-2.aws.k8spro.com
  	CIDR                	0.0.0.0/0
  	Protocol            	tcp
  	FromPort            	22
  	ToPort              	22

Will modify resources:
  Route/0.0.0.0/0
  	InternetGateway     	 <nil> -> name:us-west-2.aws.k8spro.com

Must specify --yes to apply changes

Questions

  1. Which lifecycle should I be using for the test case?
  2. Do we need another lifecycle?
  3. Should WarnIfInsufficientAccess not create the IGW?

@chrislovecnm
Copy link
Contributor Author

/assign @justinsb
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 28, 2017
@chrislovecnm chrislovecnm changed the title [WIP] Adding override capability to set lifecycle validation for a phase Adding override capability to set lifecycle validation for a phase Dec 28, 2017
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 28, 2017
@chrislovecnm chrislovecnm force-pushed the phases-config-warnifchanges branch from 02ffb5b to fe46e1e Compare December 28, 2017 20:40
@chrislovecnm
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 7, 2018

for _, override := range c.PhaseOverrides {

values := strings.Split(override, "=")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitN I think (or validate that len(values) == 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SplitN I do not think will help, validating len


phaseOverride, err := getPhase(phaseName)
if err != nil {
return fmt.Errorf("incorrect phase name in overides: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include the phaseName in the log message (or the override value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the getPhase func.

case string(fi.LifecycleExistsAndWarnIfChanges):
lifecycleOverrideMap[phaseOverride] = fi.LifecycleExistsAndWarnIfChanges
default:
return fmt.Errorf("overrides error, unknown lifecycle %q, available lifecycles: %s", lifecycle, strings.Join(fi.Lifecycles.List(), ","))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unknown lifecycle %q in overrides, available lifecycles: %s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, could we just have a parse function instead? In particular because we can call fi.Lifecycles.List() to avoid the switch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err message is populating from the new func.

@@ -336,6 +360,21 @@ func RunUpdateCluster(f *util.Factory, clusterName string, out io.Writer, c *Upd
return nil
}

func getPhase(phase string) (cloudup.Phase, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe parsePhase (and parseLifeycle)

@@ -123,6 +123,9 @@ type ApplyClusterCmd struct {

// Phase can be set to a Phase to run the specific subset of tasks, if we don't want to run everything
Phase Phase

// LifecycleOverrides contains a map of lifecycle values to fi.Lifecycle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment just restates the type signature. Remove, or replace with something that describes what it is for (e.g. how does this interact with phases)

@@ -198,6 +201,11 @@ func (c *ApplyClusterCmd) Run() error {
return fmt.Errorf("unknown phase %q", c.Phase)
}

stageAssetsLifecycle = c.overrideLifecycle(PhaseStageAssets, stageAssetsLifecycle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it would be simpler to pass in the map of phases to lifecycle values into applycluster.

Then as you parse them overrides, you can simply set the overrides.

Also, we know we're going to want more sophisticated logic in future - a lifecycle oracle class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a bit more context here.

Looks like it would be simpler to pass in the map of phases to lifecycle values into applycluster.

I am passing in:

LifecycleOverrides map[Phase]fi.Lifecycle

Which is a map of phases => lifecycle. Not sure what you are recommending.


// PhaseLifecycleMap map of phases to lifecycle name
var PhaseLifecycleMap = map[Phase]string{
PhaseStageAssets: "stageAssetsLifecycle",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the word "lifecycle" belongs in here - these are phases, not lifecycles

@@ -39,3 +41,12 @@ const (
type HasLifecycle interface {
GetLifecycle() *Lifecycle
}

// Phases are used for validation and cli help.
var Lifecycles = sets.NewString(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use []fi.Lifecycle, so we don't give up type safety? We might need another helper or two, but that's OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place we are using this is to output the lifecycle names in the CLI UX. Because of that, we are not needing type safety. Is there another place that you assumed was using this?

@@ -109,6 +110,7 @@ func NewCmdUpdateCluster(f *util.Factory, out io.Writer) *cobra.Command {
cmd.Flags().StringVar(&options.OutDir, "out", options.OutDir, "Path to write any local output")
cmd.Flags().BoolVar(&options.CreateKubecfg, "create-kube-config", options.CreateKubecfg, "Will control automatically creating the kube config file on your local filesystem")
cmd.Flags().StringVar(&options.Phase, "phase", options.Phase, "Subset of tasks to run: "+strings.Join(cloudup.Phases.List(), ", "))
cmd.Flags().StringSliceVar(&options.PhaseOverrides, "phase-overrides", options.PhaseOverrides, "comma separated list of phase overrides, example: assets=ExistsAndWarnIfChanges,network=ExistsAndValidates")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be cleaner to have one flag here. When we get down to the per-task overrides, what do you imagine the syntax to look like (e.g. cluster,SecurityGroupRule=ExistsAndWarnIfChanges)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would rather have this in the api, but another PR for that.

For one I strongly disagree with you. It is super confusing to me to not have a separate flag.

I understand where you are coming from, and I can see you point, but I would typo a single flags so easily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be cleaner to have one flag here. When we get down to the per-task overrides, what do you imagine the syntax to look like (e.g. cluster,SecurityGroupRule=ExistsAndWarnIfChanges)?

Need more context here. Are you recommending re-using the phases flag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking how we should express: "cluster phase, but do ExistsAndWarnIfChanges on the SecurityGroupRules". Or - for example - "iam, but skip security groups" or "only security groups". If we get that right, we don't need #4434 for every single special-case.

Copy link
Contributor Author

@chrislovecnm chrislovecnm Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here is an example of what this PR provides:

kops update cluster --phase cluster --phase-overrides "assets=Ignore,network=ExistsAndValidates,iam=WarnIfInsufficientAccess,security=ExistsAndWarnIfChanges" --yes

This runs cluster phase and

  1. assets phase are ignored
  2. network phase is validated and infrastructure will be created
  3. IAM phase should only be a warning (not sure if this is fully working, as I mentioned in the PR above)
  4. Security groups phase will be validated and updated warned if changes exist
  5. Cluster phase will be synced

So that is where we are at, not certain what is in scope with this PR that will give you:

If we get that right, we don't need #4434 for every single special-case.

I think we will need #4434 because phase does not exist unless I am not understanding how you want this done. The IAM phase does not exist, so we cannot adjust how the lifecycle behaves. Please help me out here to understand if there is a better way.

Not needing phases for every different task sounds to me like a change in how the phases and lifecycles work with the tasks, which is kinda a lot more than what I am trying to do in this PR. I like the idea, but is it simple to implement? How do we implement it?

Need some more actionable direction here, please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking:

"cluster phase, but do ExistsAndWarnIfChanges on the SecurityGroupRules" => --phase=cluster,SecurityGroupRule=ExistsAndWarnIfChanges
"iam, but skip security groups" => --phase=iam,SecurityGroup=ExistsAndWarnIfChanges,SecurityGroupRule=ExistsAndWarnIfChanges or --phase=iam,-SecurityGroup,-SecurityGroupRule
"only security groups" => "--phase=SecurityGroup,SecurityGroupRule"

If we have that, do we still need #4434?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we will still need #4434, cause we gotta have a phase. I can do the flag like that, but to me two flags are more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well re-reading what you wrote, how the heck do I code that ... yes if we named each task, we could provide super fine grain flexibility, but not certain how to code that. Any ideas? Let me chew on it a bit

This PR adds the --phase-overrides to kops update cluster command.  This
flag accepts a comma seperated list of phases and lifecycle.  This
feature allows for a user to have fine grain control over how kops
validates specific phases.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@chrislovecnm chrislovecnm force-pushed the phases-config-warnifchanges branch from fe46e1e to d75d535 Compare February 13, 2018 01:23
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrislovecnm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these OWNERS Files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2018
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2018
@k8s-ci-robot
Copy link
Contributor

@chrislovecnm: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrislovecnm
Copy link
Contributor Author

Closing since we have #4445 now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants