Revert CoreDNS to 1.3.1 in kube-up #78691
/cc @MrHohn @chrisohaver |
/priority critical-urgent |
/hold Hi, See #78562 and #76579 (comment) for context. |
@mborsz: GitHub didn't allow me to request PR reviews from the following users: oxddr. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. |
CoreDNS 1.5.0 is being reverted due to configuration migration troubles. A non-trivial migration of the CoreDNS configuration is required between 1.3.1 and 1.5.0. This migration step was not approved in kubeadm in time for 1.15, so it is being reverted. IMO, we should be testing the version of CoreDNS that will be used by most 1.15 deployments, which I think will be 1.3.1, since that is what kubeadm will install. Not everyone uses kubeadm, I realize, but I think it is widely used as a reference. Even if we upgrade to CoreDNS 1.5.0 in the test framework to make tests pass, I think most people in the field will be using k8s 1.15 + CoreDNS 1.3.1, which potentially has this scaling issue. Is the COS upgrade to the beta version critical? Can that be reverted? Or would that cause more trouble (e.g. is it fixing other, more important problems)? If we leave the kube-up tests at CoreDNS 1.5.0, then I think CoreDNS 1.5.0 would be the "recommended" version of CoreDNS to use with K8s 1.15 (since that is what was tested). Perhaps that's the easiest route. Kubeadm would then need to release note that it doesn't upgrade to the version of CoreDNS recommended for 1.15 due to the manual configuration migrations required (although it won't install a fresh build with 1.5.0 either). |
Thanks @chrisohaver for the summary. When in doubt, I'm going to err on the side of user upgrade experience and suggest that we revert. Going to read a bit more for context before making a call. That said, in 1.11 kubeadm switched to coredns, while in 1.12 kube-up still used kube-dns due to scalability concerns, finally switching in 1.13... so we have prior art for two different ways of standing up clusters using two different DNS implementations. |
My opinion:
|
@spiffxp this seems like a very reasonable path forward. I also agree that upgrade UX is the most high-impact aspect in play here. |
Sounds good to me. Basically, roll back to CoreDNS 1.3.1, release k8s 1.15 with the failing 5k node test, and issue some kind of scaling exception notice advising use of CoreDNS 1.5.0 for large clusters in the ballpark of 5K nodes. Thanks @spiffxp! |
IIUC from the above discussion, it seems like coredns 1.5.0 has some correctness bugs but also scalability improvements. We seem to have been using 1.3.1 until 3 weeks ago (when it was bumped to 1.5.0 in #78030), and scalability tests were passing for the last release, where we were still using the old version (though IIRC we had to manually bump the resource requests). So if we're at least not regressing from the last release wrt coredns scalability, staying at the old version to avoid correctness bugs seems better IMHO. I might be missing something here. |
Given release timing, is there a deadline for this decision? |
It's not really a full image. shows number of LIST endpoints issued in load test which represents mostly number of coredns restarts. There are 3 notable builds:
So the fact that the test was passing before upgrading coredns to 1.5.0 was mostly because we removed the last DNS client talking to it. We see that in similar conditions (builds 328 vs 330), coredns 1.3.1 with the newer COS is using more memory than before the COS upgrade. |
I agree with @spiffxp's suggestion. We can also suggest increasing the memory limit for coredns in big clusters (assuming the increase isn't that big). This seems to be a safer option than asking for a coredns upgrade. Just two concerns:
|
#78851 adds a way to customize the coredns version in kube-up scripts. Can we get it merged in 1.15? |
Hi all, so what's the plan? |
I suggest merging this one and then #78851. |
Awesome, thank you @mborsz ! |
/retest |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rajansandeep, spiffxp |
/lgtm |
/hold cancel |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Reverts CoreDNS to 1.3.1 for k8s 1.15
Reverts changes from #78030 and #78305
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: