Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[occm] Allow changing cluster-name on existing deployments #2552

Merged
merged 1 commit into from
Apr 8, 2024

Conversation

dulek
Copy link
Contributor

@dulek dulek commented Feb 16, 2024

What this PR does / why we need it:
It's a common issue that clusters are deployed with the default --cluster-name=kubernetes and later on it's discovered that next deployments on the same cloud will have conflicts when trying to manage LBs of the same namespace and name.

This commit aims at allowing to change the cluster-name on a running environment and handling all the renames of the LB resources and their tags.

Which issue this PR fixes(if applicable):
fixes #2241

Special notes for reviewers:

Release note:

It's now allowed to change the `--cluster-name` on existing deployments and OCCM will handle all the LB renames.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 16, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 16, 2024
@dulek dulek marked this pull request as draft February 16, 2024 12:51
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 16, 2024
@dulek
Copy link
Contributor Author

dulek commented Feb 16, 2024

This is a WiP, definitely needs unit tests and some more manual testing from me.

@dulek dulek force-pushed the rename-lbs branch 2 times, most recently from 7f29797 to bf40e6b Compare February 16, 2024 12:55
@jichenjc
Copy link
Contributor

jichenjc commented Mar 4, 2024

It's a common issue that clusters are deployed with the default --cluster-name=kubernetes and later on it's discovered that next deployments on the same cloud will have conflicts when trying to manage LBs of the same namespace and name.

I remember it's been discussed multiple times before (with --cluster-name=kubernetes)
at that time seems the conclusion is related to cloud provider itself or I might remember wrong thing
but I think it's worthy to open an issue to have some discussion there to get agreement on the goal and fix proposal..

@dulek
Copy link
Contributor Author

dulek commented Mar 5, 2024

@jichenjc: I dug out #2241 and added it to this PR. Citing @zetaab from there:

the problem is that quite many cluster still uses default clusterName which is kubernetes. It is also difficult to migrate away from it.

This is what the PR proposes - to make it easier to migrate away from a wrong clusterName.

@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2024
It's a common issue that clusters are deployed with the default
`--cluster-name=kubernetes` and later on it's discovered that next
deployments on the same cloud will have conflicts when trying to manage
LBs of the same namespace and name.

This commit aims at allowing to change the cluster-name on a running
environment and handling all the renames of the LB resources and their
tags.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2024
dulek added a commit to shiftstack/cluster-cloud-controller-manager-operator that referenced this pull request Apr 3, 2024
By an terrible oversight, since the in-tree, OpenStack cloud provider
was run without passing `--cluster-name` argument, meaning that it just
used the default of "kubernetes" as the cluster name when constructing
the names of the LB resources. Obviously this led to conflicts when
there were multiple OCP clusters in a single OpenStack tenant and it's
still a challenge for example for QE running tests on a single cloud.

This issue couldn't be fixed easily, because we haven't had an upgrade
story solved. Now kubernetes/cloud-provider-openstack#2552 is an attempt
to make CCM rename the LBs when the cluster-name is reconfigured. This
should allow us to finally solve the name conflict issues.

This commit makes sure that CCMO deploys OpenStack CCM passing
`--cluster-name`.
@dulek
Copy link
Contributor Author

dulek commented Apr 3, 2024

Hey, @stephenfin, could you take a preliminary look?

@dulek
Copy link
Contributor Author

dulek commented Apr 3, 2024

And you @gryf.

dulek added a commit to shiftstack/cluster-cloud-controller-manager-operator that referenced this pull request Apr 4, 2024
By an terrible oversight, since the in-tree, OpenStack cloud provider
was run without passing `--cluster-name` argument, meaning that it just
used the default of "kubernetes" as the cluster name when constructing
the names of the LB resources. Obviously this led to conflicts when
there were multiple OCP clusters in a single OpenStack tenant and it's
still a challenge for example for QE running tests on a single cloud.

This issue couldn't be fixed easily, because we haven't had an upgrade
story solved. Now kubernetes/cloud-provider-openstack#2552 is an attempt
to make CCM rename the LBs when the cluster-name is reconfigured. This
should allow us to finally solve the name conflict issues.

This commit makes sure that CCMO deploys OpenStack CCM passing
`--cluster-name`.
@dulek dulek marked this pull request as ready for review April 5, 2024 12:23
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2024
@dulek
Copy link
Contributor Author

dulek commented Apr 5, 2024

Alright, I tested this through and through, it's safe.

@stephenfin
Copy link
Member

Nice work. The logic looks good as does the structure. No complaints from my end.

/approve

@gryf
Copy link
Contributor

gryf commented Apr 5, 2024

Looks good from my side as well. Good job, @dulek!

// lbHasOldClusterName checks if the OCCM LB prefix is present and if so, validates the cluster-name
// component value. Returns true if the cluster-name component of the loadbalancer's name doesn't match
// clusterName.
func lbHasOldClusterName(loadbalancer *loadbalancers.LoadBalancer, clusterName string) bool {
Copy link
Member

@zetaab zetaab Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so basically this will assume that all "kubernetes" loadbalancers are from old clustername? This is quite risky if someone have now two kubernetes clusters named "kubernetes" in same openstack project. However, that kind of setup is already problematic but I think someone is doing that

Like:
cluster1 named "kubernetes"
cluster2 named "kubernetes"

now cluster1 will rename all loadbalancers to "cluster1". It will also rename cluster2 loadbalancers incorrectly (and will break things)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach will still be safe when there are multiple clusters using default cluster name, assuming that there were no conflicting LBs, which would be not functional anyway. Note that renameLoadBalancer() is only called from ensureLoadBalancer() [1] meaning that it's not looking up all LBs the tenant sees in OpenStack, but rather reacts on Services that were supposed to have LBs, so renames should be constrained to a single K8s cluster.

[1] https://github.com/kubernetes/cloud-provider-openstack/pull/2552/files#diff-89af30f72ae5c8aefb464330d654b95dc41544d558d9c04c9e728a8ea9a94793R1617

@stephenfin
Copy link
Member

/lgtm

...is what I meant 😅

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 8, 2024
@jichenjc
Copy link
Contributor

jichenjc commented Apr 8, 2024

/approve

I think @zetaab you approved this :)
will read this later and input if I have more time

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc, stephenfin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 290e7c7 into kubernetes:master Apr 8, 2024
11 checks passed
@dulek
Copy link
Contributor Author

dulek commented Apr 8, 2024

/cherry-pick release-1.29

We'd like this to be backported to 1.29 as a bugfix. I'd like to hear opinions if that's feasible.

@k8s-infra-cherrypick-robot

@dulek: new pull request created: #2568

In response to this:

/cherry-pick release-1.29

We'd like this to be backported to 1.29 as a bugfix. I'd like to hear opinions if that's feasible.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[openstack-cloud-controller-manager] Shared tenancy OpenStack project service type LoadBalancer collision
7 participants