-
Notifications
You must be signed in to change notification settings - Fork 39.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubeadm: deprecate the ClusterStatus
dependency
#87656
kubeadm: deprecate the ClusterStatus
dependency
#87656
Conversation
/assign @fabriziopandini @neolit123 @rosti |
applying hold for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ereslibre !
Overall, I like how this is going.
Thanks @rosti for your review!, this is ready for another pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ereslibre !
I feel that we need to focus our unit tests on the utility funcs. That way we can make more thorough and easy to follow spec as some problems may be hidden by our broad spec tests.
ClusterStatus
dependencyClusterStatus
dependency
e, ok := clusterStatus.APIEndpoints[nodeName] | ||
if !ok { | ||
return errors.New("failed to get APIEndpoint information for this node") | ||
func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the ...WithoutRetry
here, given there are no other getRawAPIEndpointFromPodAnnotation
variants?
i think i saw comments on this topic before.
EDIT: ok i see there is also getRawEtcdEndpointsFromPodAnnotation (plural) and getRawEtcdEndpointsFromPodAnnotationWithoutRetry.
IMO the default should be "without-retry".
i must admit the interface is becoming a bit confusing to me with some many func. variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way functions were splitted were to ease unit testing for each one:
-
getAPIEndpoint
just calls togetAPIEndpointWithBackoff
withconstants.StaticPodMirroringDefaultRetry
: no explicit testing needed here. -
getAPIEndpointWithBackoff
: first tries to retrieve the endpoint withgetAPIEndpointFromPodAnnotation
, if it's not present, try to use the cluster status withgetAPIEndpointFromClusterStatus
. Tests ensure that this behavior happens. -
getAPIEndpointFromPodAnnotation
: tries to fetch the api endpoint from pod annotations with a backoff. Tests ensure that this behavior happens. Calls togetAPIEndpointFromPodAnnotationWithoutRetry
. -
getAPIEndpointFromPodAnnotationWithoutRetry
: tries to fetch the api endpoint from pod annotations without any kind of retry. Tests ensure that this behavior happens. -
getAPIEndpointFromClusterStatus
: already existed.
Please, note that this will be very simplified when we remove the ClusterStatus. I tried to separate all concerns on this interim while we must fallback to the cluster status if pod annotations are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you are going, but can't we merge getAPIEndpointWithBackoff
and getAPIEndpoint
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the code base we already have the "WithRetry" naming, but not "WithoutRetry"
it seems a bit odd that we now have a default that always retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requested here: #87656 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ereslibre !
Looks good to me! Only minor naming nits at this point.
I'll nevertheless hold for a review by @fabriziopandini as he is the KEP author and may spot some detail that I've missed.
/lgtm
/hold
e, ok := clusterStatus.APIEndpoints[nodeName] | ||
if !ok { | ||
return errors.New("failed to get APIEndpoint information for this node") | ||
func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you are going, but can't we merge getAPIEndpointWithBackoff
and getAPIEndpoint
?
@@ -127,6 +122,95 @@ func NewFromCluster(client clientset.Interface, certificatesDir string) (*Client | |||
return etcdClient, nil | |||
} | |||
|
|||
// getEtcdEndpoints returns the list of etcd endpoints. | |||
func getEtcdEndpoints(client clientset.Interface) ([]string, error) { | |||
return getEtcdEndpointsWithBackoff(client, constants.StaticPodMirroringDefaultRetry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, can we merge getEtcdEndpoints
and getEtcdEndpointsWithBackoff
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for these functions is that getEtcdEndpoints
and getAPIEndpoint
don't need testing (they have no logic). We test their WithBackoff
counterparts, where we can control the backoff on the unit tests, so we have faster unit test execution and controlled with the backoff required depending on the test cases we are stubbing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ereslibre this is turning out well.
The only nit from my side is about avoiding to test the ExponentialBackoff behavior
/retitle kubeadm: deprecate the |
ClusterStatus
dependencyClusterStatus
dependency
This should be ready now @kubernetes/sig-cluster-lifecycle-pr-reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ereslibre !
/lgtm
/retest |
/priority important-longterm |
@ereslibre bazel needs a update:
|
While `ClusterStatus` will be maintained and uploaded, it won't be used by the internal `kubeadm` logic in order to determine the etcd endpoints anymore. The only exception is during the first upgrade cycle (`kubeadm upgrade apply`, `kubeadm upgrade node`), in which we will fallback to the ClusterStatus to let the upgrade path add the required annotations to the newly created static pods.
When doing the very first upgrade from a cluster that contains the source of truth in the ClusterStatus struct, the new kubeadm logic will try to retrieve this information from annotations. This changeset adds to both etcd and apiserver endpoint retrieval the special case in which they won't retry if we are in such cases. The logic will retry if we find any unknown error, but will not retry in the following cases: - etcd annotations do not contain etcd endpoints, but the overall list of etcd pods is greater than 0. This means that we listed at least one etcd pod, but they are missing the annotation. - API server annotation is not found on the api server pod for a given node name, but no errors aside from that one were found. This means that the API server pod is present, but is missing the annotation. In both cases there is no point in retrying, and so, this speeds up the upgrade path when coming from a previous existing cluster.
Ouch, updated and took the opportunity to rebase on top of latest master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ereslibre for addressing all the comments!
Let's keep an eye on the test grid now!
/hold cancel
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ereslibre, fabriziopandini The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
1 similar comment
/retest Review the full test history for this PR. Silence the bot with an |
What type of PR is this?
/kind feature
What this PR does / why we need it:
While
ClusterStatus
will be maintained and uploaded, it won't beused by the internal
kubeadm
logic in order to determine the etcdendpoints anymore.
The only exception is during the first upgrade cycle (
kubeadm upgrade apply
,kubeadm upgrade node
), in which we will fallback to theClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
Which issue(s) this PR fixes:
Implements kubernetes/enhancements#1380
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
@kubernetes/sig-cluster-lifecycle @kubernetes/sig-cluster-lifecycle-pr-reviews