Adding override capability to set lifecycle validation for a phase #4153
Conversation
Force-pushed from 67864db to 02ffb5b
So I am having problems testing this. I am running into the same problems as before, where I am not able to bypass validation. Here is how I am setting up testing.

Test Case

I want to be warned that there is no IGW, but I do not want kops to create one.

Ignore

$ kops update cluster --name $CLUSTER_NAME --phase security --phase-overrides=network=Ignore --yes

This fails with a panic because kops does not look up the VPC.

ExistsAndWarnIfChanges

This tries to find the IGW, and keeps trying to find it, but the IGW does not exist.

$ kops update cluster --name $CLUSTER_NAME --phase security --phase-overrides=network=ExistsAndWarnIfChanges
W1228 13:32:13.984580 96127 urls.go:71] Using base url from KOPS_BASE_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/"
W1228 13:32:14.109475 96127 urls.go:126] Using nodeup location from NODEUP_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/linux/amd64/nodeup"
I1228 13:32:14.600872 96127 executor.go:91] Tasks: 0 done / 73 total; 31 can run
I1228 13:32:14.975564 96127 executor.go:91] Tasks: 31 done / 73 total; 24 can run
W1228 13:32:15.247856 96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.247900 96127 executor.go:91] Tasks: 54 done / 73 total; 16 can run
W1228 13:32:15.314336 96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.314375 96127 executor.go:91] Tasks: 69 done / 73 total; 3 can run
W1228 13:32:15.374445 96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.374471 96127 executor.go:91] Tasks: 71 done / 73 total; 1 can run
W1228 13:32:15.446662 96127 executor.go:109] error running task "InternetGateway/us-west-2.aws.k8spro.com" (9m59s remaining to succeed): Lifecycle set to ExistsAndWarnIfChanges, but object was not found
I1228 13:32:15.446700 96127 executor.go:124] No progress made, sleeping before retrying 1 failed task(s)

WarnIfInsufficientAccess

I think this is the lifecycle that I should be using, but the dry run says it is going to create an IGW. When you run with:

$ kops update cluster --name $CLUSTER_NAME --phase security --phase-overrides network=WarnIfInsufficientAccess
W1228 13:33:58.545113 96275 urls.go:71] Using base url from KOPS_BASE_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/"
W1228 13:33:58.685635 96275 urls.go:126] Using nodeup location from NODEUP_URL env var: "https://s3.amazonaws.com/clove-kops/kops/1.8.0/linux/amd64/nodeup"
I1228 13:33:59.165235 96275 executor.go:91] Tasks: 0 done / 73 total; 31 can run
I1228 13:34:00.040046 96275 executor.go:91] Tasks: 31 done / 73 total; 24 can run
I1228 13:34:00.304322 96275 executor.go:91] Tasks: 55 done / 73 total; 16 can run
I1228 13:34:00.552913 96275 executor.go:91] Tasks: 71 done / 73 total; 2 can run
I1228 13:34:00.552962 96275 executor.go:91] Tasks: 73 done / 73 total; 0 can run
Will create resources:
IAMInstanceProfileRole/masters.us-west-2.aws.k8spro.com
InstanceProfile name:masters.us-west-2.aws.k8spro.com id:masters.us-west-2.aws.k8spro.com
Role name:masters.us-west-2.aws.k8spro.com id:AROAIYE3OPMYK6ML4S6NK
IAMInstanceProfileRole/nodes.us-west-2.aws.k8spro.com
InstanceProfile name:nodes.us-west-2.aws.k8spro.com id:nodes.us-west-2.aws.k8spro.com
Role name:nodes.us-west-2.aws.k8spro.com id:AROAJVUWZ44FXKLFASE44
IAMRolePolicy/masters.us-west-2.aws.k8spro.com
Role name:masters.us-west-2.aws.k8spro.com id:AROAIYE3OPMYK6ML4S6NK
IAMRolePolicy/nodes.us-west-2.aws.k8spro.com
Role name:nodes.us-west-2.aws.k8spro.com id:AROAJVUWZ44FXKLFASE44
InternetGateway/us-west-2.aws.k8spro.com
VPC name:us-west-2.aws.k8spro.com id:vpc-e0e32799
Shared false
SecurityGroup/masters.us-west-2.aws.k8spro.com
Description Security group for masters
VPC name:us-west-2.aws.k8spro.com id:vpc-e0e32799
RemoveExtraRules [port=22, port=443, port=2380, port=2381, port=4001, port=4002, port=4789, port=179]
SecurityGroup/nodes.us-west-2.aws.k8spro.com
Description Security group for nodes
VPC name:us-west-2.aws.k8spro.com id:vpc-e0e32799
RemoveExtraRules [port=22]
SecurityGroupRule/all-master-to-master
SecurityGroup name:masters.us-west-2.aws.k8spro.com
SourceGroup name:masters.us-west-2.aws.k8spro.com
SecurityGroupRule/all-master-to-node
SecurityGroup name:nodes.us-west-2.aws.k8spro.com
SourceGroup name:masters.us-west-2.aws.k8spro.com
SecurityGroupRule/all-node-to-node
SecurityGroup name:nodes.us-west-2.aws.k8spro.com
SourceGroup name:nodes.us-west-2.aws.k8spro.com
SecurityGroupRule/https-external-to-master-0.0.0.0/0
SecurityGroup name:masters.us-west-2.aws.k8spro.com
CIDR 0.0.0.0/0
Protocol tcp
FromPort 443
ToPort 443
SecurityGroupRule/master-egress
SecurityGroup name:masters.us-west-2.aws.k8spro.com
CIDR 0.0.0.0/0
Egress true
SecurityGroupRule/node-egress
SecurityGroup name:nodes.us-west-2.aws.k8spro.com
CIDR 0.0.0.0/0
Egress true
SecurityGroupRule/node-to-master-tcp-1-2379
SecurityGroup name:masters.us-west-2.aws.k8spro.com
Protocol tcp
FromPort 1
ToPort 2379
SourceGroup name:nodes.us-west-2.aws.k8spro.com
SecurityGroupRule/node-to-master-tcp-2382-4000
SecurityGroup name:masters.us-west-2.aws.k8spro.com
Protocol tcp
FromPort 2382
ToPort 4000
SourceGroup name:nodes.us-west-2.aws.k8spro.com
SecurityGroupRule/node-to-master-tcp-4003-65535
SecurityGroup name:masters.us-west-2.aws.k8spro.com
Protocol tcp
FromPort 4003
ToPort 65535
SourceGroup name:nodes.us-west-2.aws.k8spro.com
SecurityGroupRule/node-to-master-udp-1-65535
SecurityGroup name:masters.us-west-2.aws.k8spro.com
Protocol udp
FromPort 1
ToPort 65535
SourceGroup name:nodes.us-west-2.aws.k8spro.com
SecurityGroupRule/ssh-external-to-master-0.0.0.0/0
SecurityGroup name:masters.us-west-2.aws.k8spro.com
CIDR 0.0.0.0/0
Protocol tcp
FromPort 22
ToPort 22
SecurityGroupRule/ssh-external-to-node-0.0.0.0/0
SecurityGroup name:nodes.us-west-2.aws.k8spro.com
CIDR 0.0.0.0/0
Protocol tcp
FromPort 22
ToPort 22
Will modify resources:
Route/0.0.0.0/0
InternetGateway <nil> -> name:us-west-2.aws.k8spro.com
Must specify --yes to apply changes

Questions

/assign @justinsb
Force-pushed from 02ffb5b to fe46e1e
/hold cancel
cmd/kops/update_cluster.go (Outdated)

    for _, override := range c.PhaseOverrides {
        values := strings.Split(override, "=")
SplitN I think (or validate that len(values) == 2)
I do not think SplitN will help; validating len instead.
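To make the len(values) == 2 point concrete, here is a minimal sketch of the validated-split approach under discussion (parseOverride and its error text are illustrative names, not the PR's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// parseOverride splits one "phase=lifecycle" pair. Checking len(values) == 2
// after a plain Split rejects both a missing '=' and a stray extra '='
// (e.g. "network=Ignore=foo"), which SplitN alone would fold into the value.
func parseOverride(override string) (phase, lifecycle string, err error) {
	values := strings.Split(override, "=")
	if len(values) != 2 {
		return "", "", fmt.Errorf("incorrect syntax %q in phase-overrides, expected phase=lifecycle", override)
	}
	return values[0], values[1], nil
}

func main() {
	phase, lifecycle, err := parseOverride("network=Ignore")
	fmt.Println(phase, lifecycle, err) // network Ignore <nil>
}
```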
cmd/kops/update_cluster.go (Outdated)

    phaseOverride, err := getPhase(phaseName)
    if err != nil {
        return fmt.Errorf("incorrect phase name in overrides: %v", err)
Include the phaseName in the log message (or the override value).
Done in the getPhase func.
cmd/kops/update_cluster.go (Outdated)

    case string(fi.LifecycleExistsAndWarnIfChanges):
        lifecycleOverrideMap[phaseOverride] = fi.LifecycleExistsAndWarnIfChanges
    default:
        return fmt.Errorf("overrides error, unknown lifecycle %q, available lifecycles: %s", lifecycle, strings.Join(fi.Lifecycles.List(), ","))
unknown lifecycle %q in overrides, available lifecycles: %s

Or, could we just have a parse function instead? In particular, we can call fi.Lifecycles.List() to avoid the switch.
The error message is populated from the new func.
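As a rough sketch of the parse-function idea, the switch can collapse into one map lookup whose error message derives the valid names from the same map, so the list never drifts out of sync with the cases (all identifiers below are stand-ins for fi.Lifecycle and fi.Lifecycles, not the actual kops code):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Lifecycle mirrors fi.Lifecycle for this sketch.
type Lifecycle string

const (
	LifecycleSync                     Lifecycle = "Sync"
	LifecycleIgnore                   Lifecycle = "Ignore"
	LifecycleWarnIfInsufficientAccess Lifecycle = "WarnIfInsufficientAccess"
	LifecycleExistsAndValidates       Lifecycle = "ExistsAndValidates"
	LifecycleExistsAndWarnIfChanges   Lifecycle = "ExistsAndWarnIfChanges"
)

// lifecycles is the single source of truth for valid lifecycle names.
var lifecycles = map[string]Lifecycle{
	string(LifecycleSync):                     LifecycleSync,
	string(LifecycleIgnore):                   LifecycleIgnore,
	string(LifecycleWarnIfInsufficientAccess): LifecycleWarnIfInsufficientAccess,
	string(LifecycleExistsAndValidates):       LifecycleExistsAndValidates,
	string(LifecycleExistsAndWarnIfChanges):   LifecycleExistsAndWarnIfChanges,
}

// parseLifecycle replaces the per-value switch with a map lookup; unknown
// values get an error listing every valid name, derived from the map itself.
func parseLifecycle(s string) (Lifecycle, error) {
	if lc, ok := lifecycles[s]; ok {
		return lc, nil
	}
	names := make([]string, 0, len(lifecycles))
	for name := range lifecycles {
		names = append(names, name)
	}
	sort.Strings(names)
	return "", fmt.Errorf("unknown lifecycle %q in overrides, available lifecycles: %s", s, strings.Join(names, ","))
}

func main() {
	lc, err := parseLifecycle("Ignore")
	fmt.Println(lc, err) // Ignore <nil>
}
```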
cmd/kops/update_cluster.go (Outdated)

    @@ -336,6 +360,21 @@ func RunUpdateCluster(f *util.Factory, clusterName string, out io.Writer, c *Upd
        return nil
    }

    func getPhase(phase string) (cloudup.Phase, error) {
Maybe parsePhase (and parseLifecycle).
upup/pkg/fi/cloudup/apply_cluster.go (Outdated)

    @@ -123,6 +123,9 @@ type ApplyClusterCmd struct {

    // Phase can be set to a Phase to run the specific subset of tasks, if we don't want to run everything
    Phase Phase

    // LifecycleOverrides contains a map of lifecycle values to fi.Lifecycle
This comment just restates the type signature. Remove, or replace with something that describes what it is for (e.g. how does this interact with phases)
    @@ -198,6 +201,11 @@ func (c *ApplyClusterCmd) Run() error {
        return fmt.Errorf("unknown phase %q", c.Phase)
    }

    stageAssetsLifecycle = c.overrideLifecycle(PhaseStageAssets, stageAssetsLifecycle)
Looks like it would be simpler to pass the map of phases to lifecycle values into applycluster. Then as you parse the overrides, you can simply set them.
Also, we know we're going to want more sophisticated logic in future: a lifecycle oracle class.
Need a bit more context here.
Looks like it would be simpler to pass in the map of phases to lifecycle values into applycluster.
I am passing in:
LifecycleOverrides map[Phase]fi.Lifecycle
Which is a map of phases => lifecycle. Not sure what you are recommending.
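For what it's worth, a minimal sketch of how that passed-in map gets consulted per phase: keep the computed default lifecycle unless the caller supplied an override for the phase (all types and names here are stand-ins for the kops ones, not the actual implementation):

```go
package main

import "fmt"

// Phase and Lifecycle mirror the kops types for this sketch.
type Phase string
type Lifecycle string

const (
	PhaseNetwork  Phase = "network"
	PhaseSecurity Phase = "security"

	LifecycleSync   Lifecycle = "Sync"
	LifecycleIgnore Lifecycle = "Ignore"
)

// overrideLifecycle returns the override for phase if one was supplied,
// otherwise the default lifecycle computed for the current run.
func overrideLifecycle(overrides map[Phase]Lifecycle, phase Phase, def Lifecycle) Lifecycle {
	if lc, ok := overrides[phase]; ok {
		return lc
	}
	return def
}

func main() {
	overrides := map[Phase]Lifecycle{PhaseNetwork: LifecycleIgnore}
	fmt.Println(overrideLifecycle(overrides, PhaseNetwork, LifecycleSync))  // Ignore
	fmt.Println(overrideLifecycle(overrides, PhaseSecurity, LifecycleSync)) // Sync
}
```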
upup/pkg/fi/cloudup/phase.go (Outdated)

    // PhaseLifecycleMap map of phases to lifecycle name
    var PhaseLifecycleMap = map[Phase]string{
        PhaseStageAssets: "stageAssetsLifecycle",
I don't think the word "lifecycle" belongs in here - these are phases, not lifecycles
    @@ -39,3 +41,12 @@

    type HasLifecycle interface {
        GetLifecycle() *Lifecycle
    }

    // Lifecycles are used for validation and cli help.
    var Lifecycles = sets.NewString(
Maybe use []fi.Lifecycle, so we don't give up type safety? We might need another helper or two, but that's OK.
The only place we are using this is to output the lifecycle names in the CLI UX. Because of that, we do not need type safety. Is there another place that you assumed was using this?
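For reference, the typed-slice alternative only needs one small helper to produce the strings for CLI help, so type safety costs very little (Lifecycle and the constants below are stand-ins for the fi package types, not the actual kops code):

```go
package main

import (
	"fmt"
	"strings"
)

// Lifecycle mirrors fi.Lifecycle for this sketch.
type Lifecycle string

const (
	LifecycleSync                     Lifecycle = "Sync"
	LifecycleIgnore                   Lifecycle = "Ignore"
	LifecycleWarnIfInsufficientAccess Lifecycle = "WarnIfInsufficientAccess"
	LifecycleExistsAndValidates       Lifecycle = "ExistsAndValidates"
	LifecycleExistsAndWarnIfChanges   Lifecycle = "ExistsAndWarnIfChanges"
)

// Lifecycles as a typed slice keeps compile-time safety, unlike sets.NewString.
var Lifecycles = []Lifecycle{
	LifecycleSync,
	LifecycleIgnore,
	LifecycleWarnIfInsufficientAccess,
	LifecycleExistsAndValidates,
	LifecycleExistsAndWarnIfChanges,
}

// lifecycleNames converts the typed slice to strings for help text.
func lifecycleNames() []string {
	names := make([]string, len(Lifecycles))
	for i, lc := range Lifecycles {
		names[i] = string(lc)
	}
	return names
}

func main() {
	fmt.Println(strings.Join(lifecycleNames(), ", "))
	// Sync, Ignore, WarnIfInsufficientAccess, ExistsAndValidates, ExistsAndWarnIfChanges
}
```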
    @@ -109,6 +110,7 @@ func NewCmdUpdateCluster(f *util.Factory, out io.Writer) *cobra.Command {
        cmd.Flags().StringVar(&options.OutDir, "out", options.OutDir, "Path to write any local output")
        cmd.Flags().BoolVar(&options.CreateKubecfg, "create-kube-config", options.CreateKubecfg, "Will control automatically creating the kube config file on your local filesystem")
        cmd.Flags().StringVar(&options.Phase, "phase", options.Phase, "Subset of tasks to run: "+strings.Join(cloudup.Phases.List(), ", "))
        cmd.Flags().StringSliceVar(&options.PhaseOverrides, "phase-overrides", options.PhaseOverrides, "comma separated list of phase overrides, example: assets=ExistsAndWarnIfChanges,network=ExistsAndValidates")
I think it might be cleaner to have one flag here. When we get down to the per-task overrides, what do you imagine the syntax to look like (e.g. cluster,SecurityGroupRule=ExistsAndWarnIfChanges)?
Personally I would rather have this in the API, but that is another PR.
For one, I strongly disagree with you. It is super confusing to me not to have a separate flag. I understand where you are coming from, and I can see your point, but I would typo a single flag so easily.
I think it might be cleaner to have one flag here. When we get down to the per-task overrides, what do you imagine the syntax to look like (e.g. cluster,SecurityGroupRule=ExistsAndWarnIfChanges)?
Need more context here. Are you recommending re-using the phases flag?
I'm asking how we should express: "cluster phase, but do ExistsAndWarnIfChanges on the SecurityGroupRules". Or - for example - "iam, but skip security groups" or "only security groups". If we get that right, we don't need #4434 for every single special-case.
So here is an example of what this PR provides:
kops update cluster --phase cluster --phase-overrides "assets=Ignore,network=ExistsAndValidates,iam=WarnIfInsufficientAccess,security=ExistsAndWarnIfChanges" --yes
This runs the cluster phase, and:
- the assets phase is ignored
- the network phase is validated, and infrastructure will be created
- the IAM phase should only warn (not sure if this is fully working, as I mentioned in the PR above)
- the security groups phase will be validated, with warnings if changes exist
- the cluster phase will be synced
So that is where we are at; I am not certain what is in scope for this PR that will give you:
If we get that right, we don't need #4434 for every single special-case.
I think we will still need #4434 because the phase does not exist, unless I am not understanding how you want this done. The IAM phase does not exist, so we cannot adjust how its lifecycle behaves. Please help me out here to understand if there is a better way.
Not needing phases for every different task sounds to me like a change in how the phases and lifecycles work with the tasks, which is a lot more than what I am trying to do in this PR. I like the idea, but is it simple to implement? How do we implement it?
Need some more actionable direction here, please.
I was thinking:
"cluster phase, but do ExistsAndWarnIfChanges on the SecurityGroupRules" => --phase=cluster,SecurityGroupRule=ExistsAndWarnIfChanges
"iam, but skip security groups" => --phase=iam,SecurityGroup=ExistsAndWarnIfChanges,SecurityGroupRule=ExistsAndWarnIfChanges
or --phase=iam,-SecurityGroup,-SecurityGroupRule
"only security groups" => "--phase=SecurityGroup,SecurityGroupRule"
If we have that, do we still need #4434?
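For discussion, here is a rough sketch of how a combined --phase grammar like the above could be tokenized: bare tokens select phases or task types, "name=Lifecycle" overrides a task type's lifecycle, and "-name" skips it (parsePhaseFlag and this grammar are hypothetical, not implemented in kops):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePhaseFlag tokenizes one comma-separated --phase value into the three
// kinds of directives the proposed grammar would allow. All names are
// illustrative; the real flag does not (yet) accept this syntax.
func parsePhaseFlag(flag string) (phases []string, overrides map[string]string, skips []string) {
	overrides = map[string]string{}
	for _, token := range strings.Split(flag, ",") {
		switch {
		case strings.Contains(token, "="):
			// "name=Lifecycle": per-task lifecycle override
			kv := strings.SplitN(token, "=", 2)
			overrides[kv[0]] = kv[1]
		case strings.HasPrefix(token, "-"):
			// "-name": skip this task type
			skips = append(skips, strings.TrimPrefix(token, "-"))
		default:
			// bare token: select a phase (or task type)
			phases = append(phases, token)
		}
	}
	return phases, overrides, skips
}

func main() {
	phases, overrides, skips := parsePhaseFlag("iam,SecurityGroupRule=ExistsAndWarnIfChanges,-SecurityGroup")
	fmt.Println(phases, overrides, skips)
	// [iam] map[SecurityGroupRule:ExistsAndWarnIfChanges] [SecurityGroup]
}
```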
Yes, I think we will still need #4434, because we still have to have a phase. I can do the flag like that, but to me two flags are more readable.
Well, re-reading what you wrote: how do I code that? Yes, if we named each task, we could provide super fine-grained flexibility, but I am not certain how to code that. Any ideas? Let me chew on it a bit.
This PR adds the --phase-overrides flag to the kops update cluster command. This flag accepts a comma-separated list of phases and lifecycles. This feature allows a user to have fine-grained control over how kops validates specific phases.
Force-pushed from fe46e1e to d75d535
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: chrislovecnm. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these OWNERS Files:
Approvers can indicate their approval by writing /approve in a comment.
@chrislovecnm: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Closing since we have #4445 now.