kubeadm: deprecate the `ClusterStatus` dependency #87656

ereslibre · 2020-01-29T16:52:57Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
While ClusterStatus will still be maintained and uploaded, the internal
kubeadm logic will no longer use it to determine the etcd endpoints.

The only exception is during the first upgrade cycle (kubeadm upgrade apply, kubeadm upgrade node), in which we will fall back to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.

Which issue(s) this PR fixes:
Implements kubernetes/enhancements#1380

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: The ClusterStatus struct present in the kubeadm-config ConfigMap is deprecated and will be removed in a future version. kubeadm will keep maintaining it until it is removed. The same information can be found in the `etcd` and `kube-apiserver` pod annotations, `kubeadm.kubernetes.io/etcd.advertise-client-urls` and `kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint` respectively.
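
As a rough illustration of what consuming these annotations could look like, here is a minimal sketch that works on plain annotation maps instead of real Pod objects. The two annotation keys come from the release note above; the `apiEndpoint` type, the helper names and the assumption that the API server annotation holds a `host:port` value are hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Annotation keys as named in the release note.
const (
	etcdAdvertiseClientURLsKey    = "kubeadm.kubernetes.io/etcd.advertise-client-urls"
	apiServerAdvertiseEndpointKey = "kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint"
)

// apiEndpoint is a hypothetical stand-in for the endpoint information
// that used to be read from ClusterStatus.
type apiEndpoint struct {
	AdvertiseAddress string
	BindPort         int
}

// etcdEndpointFromAnnotations returns the raw etcd client URL stored in the
// annotations of an etcd pod.
func etcdEndpointFromAnnotations(annotations map[string]string) (string, error) {
	url, ok := annotations[etcdAdvertiseClientURLsKey]
	if !ok {
		return "", fmt.Errorf("annotation %q not found", etcdAdvertiseClientURLsKey)
	}
	return url, nil
}

// apiEndpointFromAnnotations parses the API server annotation.
// Assumption: the value has the form "host:port".
func apiEndpointFromAnnotations(annotations map[string]string) (apiEndpoint, error) {
	raw, ok := annotations[apiServerAdvertiseEndpointKey]
	if !ok {
		return apiEndpoint{}, fmt.Errorf("annotation %q not found", apiServerAdvertiseEndpointKey)
	}
	host, portStr, err := net.SplitHostPort(raw)
	if err != nil {
		return apiEndpoint{}, fmt.Errorf("invalid endpoint %q: %w", raw, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return apiEndpoint{}, fmt.Errorf("invalid port in endpoint %q: %w", raw, err)
	}
	return apiEndpoint{AdvertiseAddress: host, BindPort: port}, nil
}

func main() {
	// Example annotation maps; the addresses are made up.
	etcdPod := map[string]string{etcdAdvertiseClientURLsKey: "https://10.0.0.1:2379"}
	apiPod := map[string]string{apiServerAdvertiseEndpointKey: "10.0.0.1:6443"}

	url, err := etcdEndpointFromAnnotations(etcdPod)
	fmt.Println(url, err)

	ep, err := apiEndpointFromAnnotations(apiPod)
	fmt.Println(ep.AdvertiseAddress, ep.BindPort, err)
}
```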

@kubernetes/sig-cluster-lifecycle @kubernetes/sig-cluster-lifecycle-pr-reviews

ereslibre · 2020-01-29T17:00:12Z

/assign @fabriziopandini @neolit123 @rosti

neolit123 · 2020-01-29T20:55:17Z

applying hold for review.
/hold

rosti

Thanks @ereslibre !
Overall, I like how this is going.

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go

cmd/kubeadm/app/util/config/cluster_test.go

cmd/kubeadm/app/util/config/cluster.go

cmd/kubeadm/app/util/etcd/etcd.go

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go

ereslibre · 2020-01-31T12:23:10Z

Thanks @rosti for your review! This is ready for another pass.

rosti

Thanks @ereslibre !
I feel that we need to focus our unit tests on the utility funcs. That way we can write a more thorough and easier-to-follow spec, since some problems may be hidden by our broad spec tests.

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go

cmd/kubeadm/app/util/config/cluster.go

cmd/kubeadm/app/apis/kubeadm/apiendpoint_test.go

cmd/kubeadm/app/constants/constants.go

cmd/kubeadm/app/util/etcd/etcd.go

cmd/kubeadm/app/util/config/cluster.go

neolit123 · 2020-02-10T14:43:23Z

cmd/kubeadm/app/util/config/cluster.go

-	e, ok := clusterStatus.APIEndpoints[nodeName]
-	if !ok {
-		return errors.New("failed to get APIEndpoint information for this node")
+func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) {


do we need the ...WithoutRetry here, given there are no other getRawAPIEndpointFromPodAnnotation variants?
i think i saw comments on this topic before.

EDIT: ok i see there is also getRawEtcdEndpointsFromPodAnnotation (plural) and getRawEtcdEndpointsFromPodAnnotationWithoutRetry.
IMO the default should be "without-retry".

i must admit the interface is becoming a bit confusing to me with so many func variants.

The functions were split this way to ease unit testing of each one:

getAPIEndpoint just calls getAPIEndpointWithBackoff with constants.StaticPodMirroringDefaultRetry: no explicit testing needed here.

getAPIEndpointWithBackoff: first tries to retrieve the endpoint with getAPIEndpointFromPodAnnotation; if it's not present, it falls back to the cluster status with getAPIEndpointFromClusterStatus. Tests ensure that this behavior happens.

getAPIEndpointFromPodAnnotation: tries to fetch the API endpoint from pod annotations with a backoff. It calls getAPIEndpointFromPodAnnotationWithoutRetry. Tests ensure that this behavior happens.

getAPIEndpointFromPodAnnotationWithoutRetry: tries to fetch the API endpoint from pod annotations without any kind of retry. Tests ensure that this behavior happens.

getAPIEndpointFromClusterStatus: already existed.

Please note that this will be much simpler once we remove the ClusterStatus. I tried to separate all concerns during this interim period, while we must fall back to the cluster status if pod annotations are missing.
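
To make the layering easier to follow, here is a minimal, self-contained sketch of the call chain described above. It is not the code in this PR: the `cluster` and `backoff` types are hypothetical in-memory stand-ins for the client and for the backoff configuration, the retry loop is a simplified fixed-delay loop instead of the real backoff helper, and endpoints are plain strings.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoff is a hypothetical, simplified stand-in for the real backoff settings.
type backoff struct {
	Steps    int
	Duration time.Duration
}

// cluster is a hypothetical in-memory stand-in for the API server client.
type cluster struct {
	podAnnotations map[string]string // node name -> endpoint found in the pod annotation
	clusterStatus  map[string]string // node name -> endpoint stored in ClusterStatus
}

// defaultRetry stands in for constants.StaticPodMirroringDefaultRetry.
var defaultRetry = backoff{Steps: 3, Duration: 10 * time.Millisecond}

// getAPIEndpoint has no logic of its own: it only picks the default backoff.
func getAPIEndpoint(c *cluster, node string) (string, error) {
	return getAPIEndpointWithBackoff(c, node, defaultRetry)
}

// getAPIEndpointWithBackoff prefers the pod annotation and falls back to
// ClusterStatus when the annotation cannot be read.
func getAPIEndpointWithBackoff(c *cluster, node string, b backoff) (string, error) {
	if ep, err := getAPIEndpointFromPodAnnotation(c, node, b); err == nil {
		return ep, nil
	}
	return getAPIEndpointFromClusterStatus(c, node)
}

// getAPIEndpointFromPodAnnotation retries the single-shot lookup.
func getAPIEndpointFromPodAnnotation(c *cluster, node string, b backoff) (string, error) {
	lastErr := errors.New("no attempts were made")
	for i := 0; i < b.Steps; i++ {
		ep, err := getAPIEndpointFromPodAnnotationWithoutRetry(c, node)
		if err == nil {
			return ep, nil
		}
		lastErr = err
		time.Sleep(b.Duration)
	}
	return "", lastErr
}

// getAPIEndpointFromPodAnnotationWithoutRetry performs exactly one lookup.
func getAPIEndpointFromPodAnnotationWithoutRetry(c *cluster, node string) (string, error) {
	ep, ok := c.podAnnotations[node]
	if !ok {
		return "", errors.New("API endpoint annotation not found")
	}
	return ep, nil
}

// getAPIEndpointFromClusterStatus is the pre-existing lookup.
func getAPIEndpointFromClusterStatus(c *cluster, node string) (string, error) {
	ep, ok := c.clusterStatus[node]
	if !ok {
		return "", errors.New("failed to get APIEndpoint information for this node")
	}
	return ep, nil
}

func main() {
	c := &cluster{
		podAnnotations: map[string]string{"cp-1": "10.0.0.1:6443"}, // already annotated
		clusterStatus:  map[string]string{"cp-2": "10.0.0.2:6443"}, // only in ClusterStatus
	}
	for _, node := range []string{"cp-1", "cp-2"} {
		ep, err := getAPIEndpoint(c, node)
		fmt.Println(node, ep, err)
	}
}
```

In this sketch only the `WithBackoff` and `WithoutRetry` layers carry logic, which is why those are the ones worth testing directly.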

I see where you are going, but can't we merge getAPIEndpointWithBackoff and getAPIEndpoint?

in the code base we already have the "WithRetry" naming, but not "WithoutRetry"
it seems a bit odd that we now have a default that always retries.

Requested here: #87656 (comment)

rosti

Thanks @ereslibre !
Looks good to me! Only minor naming nits at this point.
I'll nevertheless hold for a review by @fabriziopandini as he is the KEP author and may spot some detail that I've missed.
/lgtm
/hold

cmd/kubeadm/app/util/config/cluster.go

rosti · 2020-02-11T13:27:45Z

cmd/kubeadm/app/util/config/cluster.go

-	e, ok := clusterStatus.APIEndpoints[nodeName]
-	if !ok {
-		return errors.New("failed to get APIEndpoint information for this node")
+func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) {


I see where you are going, but can't we merge getAPIEndpointWithBackoff and getAPIEndpoint?

rosti · 2020-02-11T13:28:20Z

cmd/kubeadm/app/util/etcd/etcd.go

@@ -127,6 +122,95 @@ func NewFromCluster(client clientset.Interface, certificatesDir string) (*Client
 	return etcdClient, nil
 }

+// getEtcdEndpoints returns the list of etcd endpoints.
+func getEtcdEndpoints(client clientset.Interface) ([]string, error) {
+	return getEtcdEndpointsWithBackoff(client, constants.StaticPodMirroringDefaultRetry)


Again, can we merge getEtcdEndpoints and getEtcdEndpointsWithBackoff?

The reason for these functions is that getEtcdEndpoints and getAPIEndpoint don't need testing (they have no logic). We test their WithBackoff counterparts, where the unit tests can control the backoff. This gives us faster unit test execution and lets each test case use the backoff that its stubs require.
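
A minimal sketch of why the `WithBackoff` variant is convenient in tests, under stated assumptions: `endpointLister`, `flakyLister` and the `backoff` type are hypothetical helpers invented for this illustration, and the retry loop is a simplified fixed-delay loop. The point is only that the caller can pass a zero-delay backoff and count attempts.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoff is a hypothetical, simplified stand-in for the real backoff settings.
type backoff struct {
	Steps    int
	Duration time.Duration
}

// endpointLister is a hypothetical abstraction over "list the etcd pods and
// read their annotations".
type endpointLister func() ([]string, error)

// defaultRetry stands in for constants.StaticPodMirroringDefaultRetry.
var defaultRetry = backoff{Steps: 5, Duration: 100 * time.Millisecond}

// getEtcdEndpoints has no logic: it only supplies the default backoff.
func getEtcdEndpoints(list endpointLister) ([]string, error) {
	return getEtcdEndpointsWithBackoff(list, defaultRetry)
}

// getEtcdEndpointsWithBackoff retries list until it succeeds or the steps run out.
func getEtcdEndpointsWithBackoff(list endpointLister, b backoff) ([]string, error) {
	lastErr := errors.New("no attempts were made")
	for i := 0; i < b.Steps; i++ {
		endpoints, err := list()
		if err == nil {
			return endpoints, nil
		}
		lastErr = err
		if i < b.Steps-1 {
			time.Sleep(b.Duration)
		}
	}
	return nil, lastErr
}

// flakyLister returns a stub that fails the first `failures` calls, plus a
// pointer to its call counter so a test can assert on the number of attempts.
func flakyLister(failures int) (endpointLister, *int) {
	calls := 0
	return func() ([]string, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("transient error")
		}
		return []string{"https://10.0.0.1:2379"}, nil
	}, &calls
}

func main() {
	// A test can use a zero-delay backoff, so three attempts cost no wall time.
	list, calls := flakyLister(2)
	endpoints, err := getEtcdEndpointsWithBackoff(list, backoff{Steps: 3})
	fmt.Println(endpoints, err, *calls)
}
```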

fabriziopandini

@ereslibre this is turning out well.
The only nit from my side is about avoiding testing the ExponentialBackoff behavior.

cmd/kubeadm/app/util/config/cluster_test.go

cmd/kubeadm/app/util/etcd/etcd_test.go

neolit123 · 2020-02-13T16:41:55Z

/retitle kubeadm: deprecate the ClusterStatus dependency

ereslibre · 2020-02-19T15:01:51Z

This should be ready now @kubernetes/sig-cluster-lifecycle-pr-reviews.

rosti

Thanks @ereslibre !
/lgtm

ereslibre · 2020-02-19T21:02:14Z

/retest

neolit123 · 2020-02-20T08:40:54Z

/priority important-longterm

neolit123 · 2020-02-20T08:41:50Z

@ereslibre bazel needs an update:

pull-kubernetes-verify — Job failed.

When doing the very first upgrade from a cluster whose source of truth is the ClusterStatus struct, the new kubeadm logic will try to retrieve this information from annotations. This changeset adds a special case to both etcd and API server endpoint retrieval in which the logic does not retry.

The logic will retry if it finds any unknown error, but will not retry in the following cases:

- The etcd annotations do not contain etcd endpoints, but the overall list of etcd pods is greater than 0. This means that we listed at least one etcd pod, but they are missing the annotation.
- The API server annotation is not found on the API server pod for a given node name, but no other errors were found. This means that the API server pod is present, but is missing the annotation.

In both cases there is no point in retrying, so this speeds up the upgrade path when coming from a previously existing cluster.

ereslibre · 2020-02-20T11:20:05Z

bazel needs an update:

Ouch, updated and took the opportunity to rebase on top of latest master.

fabriziopandini

Thanks @ereslibre for addressing all the comments!
Let's keep an eye on the test grid now!
/hold cancel
/approve
/lgtm

k8s-ci-robot · 2020-02-22T21:37:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [ereslibre,fabriziopandini]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2020-02-23T01:08:41Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot · 2020-02-23T04:17:32Z

/retest

k8s-ci-robot requested review from detiber and yagonobre January 29, 2020 16:53

k8s-ci-robot added the area/kubeadm and approved labels Jan 29, 2020

k8s-ci-robot assigned fabriziopandini, neolit123 and rosti Jan 29, 2020

k8s-ci-robot added the do-not-merge/hold label Jan 29, 2020

rosti reviewed Jan 30, 2020

ereslibre commented Jan 31, 2020

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go

ereslibre requested a review from rosti January 31, 2020 12:07

k8s-ci-robot added the size/XL label and removed the size/L label Jan 31, 2020

rosti reviewed Jan 31, 2020

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go

cmd/kubeadm/app/util/config/cluster.go Outdated

cmd/kubeadm/app/util/config/cluster.go Outdated

ereslibre commented Jan 31, 2020

cmd/kubeadm/app/util/config/cluster.go Outdated

ereslibre changed the title ~~kubeadm: Remove ClusterStatus dependency~~ WIP: kubeadm: Remove ClusterStatus dependency Jan 31, 2020

k8s-ci-robot added the do-not-merge/work-in-progress label Jan 31, 2020

neolit123 reviewed Feb 1, 2020

cmd/kubeadm/app/apis/kubeadm/apiendpoint_test.go

neolit123 reviewed Feb 1, 2020

cmd/kubeadm/app/constants/constants.go Outdated

neolit123 reviewed Feb 1, 2020

cmd/kubeadm/app/util/etcd/etcd.go Outdated

neolit123 reviewed Feb 1, 2020

cmd/kubeadm/app/util/config/cluster.go

neolit123 reviewed Feb 10, 2020

cmd/kubeadm/app/util/config/cluster.go

neolit123 reviewed Feb 10, 2020

rosti approved these changes Feb 11, 2020

k8s-ci-robot added the lgtm label Feb 11, 2020

fabriziopandini reviewed Feb 12, 2020

cmd/kubeadm/app/util/config/cluster_test.go Outdated

cmd/kubeadm/app/util/etcd/etcd_test.go Outdated

k8s-ci-robot changed the title ~~kubeadm: Remove ClusterStatus dependency~~ kubeadm: deprecate the ClusterStatus dependency Feb 13, 2020

k8s-ci-robot removed the lgtm label Feb 19, 2020

rosti approved these changes Feb 19, 2020

k8s-ci-robot added the lgtm label Feb 19, 2020

k8s-ci-robot added the priority/important-longterm label and removed the needs-priority label Feb 20, 2020

ereslibre added 2 commits February 20, 2020 12:18

k8s-ci-robot removed the lgtm label Feb 20, 2020

fabriziopandini reviewed Feb 22, 2020

k8s-ci-robot added the lgtm label and removed the do-not-merge/hold label Feb 22, 2020

k8s-ci-robot merged commit 31b8c0d into kubernetes:master Feb 23, 2020

k8s-ci-robot added this to the v1.18 milestone Feb 23, 2020

ereslibre deleted the do-not-depend-on-cluster-status branch February 25, 2020 00:41

neolit123 mentioned this pull request Sep 16, 2020

remove ClusterStatus from v1beta3 kubernetes/kubeadm#2290

Closed

2 tasks
