Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubeadm: Remove ClusterStatus from kubeadm-config #1380

Conversation

fabriziopandini
Copy link
Member

This KEP is proposing a new mode for tracking the list of the API endpoints in a cluster, thus allowing to remove the ClusterStatus entry in the kubeadm-config ConfigMap and solve the problems that arise when, for any reasons, such entry does not reflect anymore the real status of the cluster.

/sig cluster-lifecycle
/assign @neolit123 @rosti @ereslibre @ncdc @timothysc

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Nov 24, 2019
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 24, 2019
Copy link

@ereslibre ereslibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like removing shared state.

I guess one of the motivations for this KEP is to improve the multiple control plane join at once case. Is this correct? Removing shared state is nice, but I think it's harder to block on a known shared resource (as we could with a shared configmap -- with leader election).

Here, when a control plane node joins, it will have to:

  • List nodes, retrieve annotations
  • Perform the etcd join with the list of current endpoints
  • Let the kubelet register with TLS bootstrap
  • Annotate node

I see potential race conditions if several control planes are joining at the same time, and I believe it's harder to sync on several annotations on different resources (nodes), than to sync on a shared config map. What do you think about this? Should it be mentioned under risks?


The `LocalAPIEndpoint` is also used in the stacked `etcd` pod manifest for composing the `peer-urls` and the `client-urls`; the latter is used by kubeadm when accessing etcd in an existing cluster, e.g. when doing `join --control-plane`.

We are going to echo the `client-urls` value into a new annotation named `kubeadm.kubernetes.io/etcd.advertise-client-urls`. Once the annotation will be in place, it will be possible to easily retrieve the etcd client urls by querying the `etcd` pods.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we are composing this information with:

func GetClientURL(localEndpoint *kubeadmapi.APIEndpoint) string {
	return "https://" + net.JoinHostPort(localEndpoint.AdvertiseAddress, strconv.Itoa(constants.EtcdListenClientPort))
}

So, we are today setting etcd's advertise-client-urls forcing the port to the etcd listen client one, reusing the current local endpoint advertise address.

What's the goal of this new annotation as opposed to kubeadm.kubernetes.io/kube-apiserver.advertise-address? Couldn't we theoretically compose the advertise-client-urls based on kubeadm.kubernetes.io/kube-apiserver.advertise-address?

Copy link
Member Author

@fabriziopandini fabriziopandini Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to build the list of the etcd peers looking at the current etcd pods (instead of looking at the kube-apiserver pods, which can be somehow confusing or eventually also error-prone)

This is especially helpful in the current join --control-plane sequence because the kube-apiserver and the etcd static pods are created in two separated moments

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fabriziopandini !
I like the idea very much! However, we must also ensure, that we don't break anyone who's relying on ClusterStatus-es.

During upgrades:

- The new annotations `kubeadm.kubernetes.io/kube-apiserver.advertise-address` and `kubeadm.kubernetes.io/etcd.advertise-client-urls` will be generated during the upgrade of the static pod manifests.
- The `ClusterStatus` entry will be cleaned up during the upgrade of the `kubeadm-config` ConfigMap.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to guarantee that ClusterStatus-es are kept together with annotations for some time (they can be considered beta, so at least 9 months/3 cycles) not to break people.

Copy link
Member Author

@fabriziopandini fabriziopandini Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see [1]
The whole point of this KEP is that it is not possible to keep ClusterStatus up to date with the current status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discussed during kubeadm office hours that we would like to keep the ClusterStatus for a while.
@fabriziopandini please confirm. does this section needs a minor update?

@fabriziopandini
Copy link
Member Author

@ereslibre

Here, when a control plane node joins, it will have to:

  • List nodes, retrieve annotations
  • Perform the etcd join with the list of current endpoints
  • Let the kubelet register with TLS bootstrap
  • Annotate node

I think that there is some misunderstanding here.
This KEP is proposing to add annotations on the kube-apiserver static pod manifest and on the etcd static pod manifest. No annotation at node level is part of this proposal

The join process will be:

  • create control-plane static pod manifests (as of today, only with an additional annotation on the kubea-api server static pod manifest)
  • start kubelet (as of today)
  • get the list of current etcd pods/kubeadm.kubernetes.io/etcd.advertise-client-urls (replacement of get ClusterStatus)
  • create control-plane static pod manifests (as of today, only with an additional annotation)

Considering that I don't see potential race conditions, but happy to discuss this if this is not clear yet

Copy link

@ereslibre ereslibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarifications @fabriziopandini. This LGTM.

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. Sorry for my late response, pre-holidays was nuts.
I have some minor questions but I'm unblocking approval.

/approve


We are going to echo the `client-urls` value into a new annotation named `kubeadm.kubernetes.io/etcd.advertise-client-urls`. Once the annotation will be in place, it will be possible to easily retrieve the etcd client urls by querying the `etcd` pods.

### Risks and Mitigations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not checked, but if a static pod goes down is that always removed from the pod list? Also LBs would still need to update periodically.

Copy link
Member

@neolit123 neolit123 Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the static manifest is removed the mirror Pod should be removed too.
but if the manifest is in place the kubelet will try to create the Pod. it would then be possible do kubectl describe po on it to get the annotation.


### Test Plan

No additional test E2E test are required for this change because all the affected behaviors are already covered by existing E2E test.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do a destructive LB update test?

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 7, 2020
@timothysc
Copy link
Member

/hold
for other comments and lgtms

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 7, 2020
@ereslibre
Copy link

/lgtm

Feel free to unhold.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2020
Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments were non-blocking too.
/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2020
@fabriziopandini
Copy link
Member Author

@neolit123 @timothysc all the comments are addressed, please PTAL

@neolit123
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini, neolit123, rosti, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123
Copy link
Member

please "hold cancel" before Tuesday next week.

@ereslibre
Copy link

Tim's comments were non blocking and there's consensus on this KEP. Thank you everyone!

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2020
@k8s-ci-robot k8s-ci-robot merged commit 2367af9 into kubernetes:master Jan 24, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants