
FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found #7083

Closed
juliohm1978 opened this issue Dec 25, 2020 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@juliohm1978
Contributor

Environment:

  • Cloud provider or hardware configuration: barebone installation - VMs

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 4.15.0-128-generic x86_64
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Version of Ansible (ansible --version): 2.9.13

  • Version of Python (python --version): Python 2.7.17

Kubespray version (commit) (git rev-parse --short HEAD): v2.14.2 (75d648c)

Network plugin used: calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

inventory.txt

Command used to invoke ansible:

ansible-playbook cluster.yml -e upgrade_cluster_setup=true --limit=kube-master -b -i inventory/inventory.ini

Output of ansible run:

output.txt

Anything else we need to know:

I've been trying to upgrade the cluster from 1.18.x to 1.19.x and I keep getting this error message. The isolated kubeadm output follows:

{
  "attempts": 3,
  "changed": true,
  "cmd": ["timeout", "-k", "600s", "600s", "/usr/local/bin/kubeadm", "upgrade", "apply", "-y", "v1.19.5", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--allow-experimental-upgrades", "--etcd-upgrade=false", "--force"],
  "delta": "0:00:08.281467",
  "end": "2020-12-25 05:39:31.849692",
  "failed_when_result": true,
  "msg": "non-zero return code",
  "rc": 1,
  "start": "2020-12-25 05:39:23.568225",
  "stderr": "W1225 05:39:23.639569  112017 common.go:94] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!\nW1225 05:39:23.649150  112017 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [172.25.231.10]; the provided value is: [169.254.25.10]\nW1225 05:39:23.747340  112017 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]\n[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services \"kube-dns\" not found\nTo see the stack trace of this error execute with --v=5 or higher",
  "stderr_lines": ["W1225 05:39:23.639569  112017 common.go:94] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!", "W1225 05:39:23.649150  112017 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [172.25.231.10]; the provided value is: [169.254.25.10]", "W1225 05:39:23.747340  112017 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]", "[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services \"kube-dns\" not found", "To see the stack trace of this error execute with --v=5 or higher"],
  "stdout": "[upgrade/config] Making sure the configuration is correct:\n[preflight] Running pre-flight checks.\n[upgrade] Running cluster health checks\n[upgrade/version] You have chosen to change the cluster version to \"v1.19.5\"\n[upgrade/versions] Cluster version: v1.19.5\n[upgrade/versions] kubeadm version: v1.19.5\n[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster\n[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection\n[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'\n[upgrade/apply] Upgrading your Static Pod-hosted control plane to version \"v1.19.5\"...\nStatic pod: kube-apiserver-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 59aa88793d7cee5e566bb613b27db0ba\nStatic pod: kube-controller-manager-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 0c4d03fcda3773af381b016817948374\nStatic pod: kube-scheduler-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: fec5aa3f763b28dec522e4d6718b51fa\n[upgrade/staticpods] Writing new Static Pod manifests to \"/etc/kubernetes/tmp/kubeadm-upgraded-manifests492335740\"\n[upgrade/staticpods] Preparing for \"kube-apiserver\" upgrade\n[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade\n[upgrade/staticpods] Preparing for \"kube-controller-manager\" upgrade\n[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade\n[upgrade/staticpods] Preparing for \"kube-scheduler\" upgrade\n[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade\n[upload-config] Storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace\n[kubelet] Creating a ConfigMap \"kubelet-config-1.19\" in namespace kube-system with the configuration for the kubelets in the cluster\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes\n[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials\n[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token\n[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster\n[addons] Applied essential addon: kube-proxy",
  "stdout_lines": ["[upgrade/config] Making sure the configuration is correct:", "[preflight] Running pre-flight checks.", "[upgrade] Running cluster health checks", "[upgrade/version] You have chosen to change the cluster version to \"v1.19.5\"", "[upgrade/versions] Cluster version: v1.19.5", "[upgrade/versions] kubeadm version: v1.19.5", "[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster", "[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection", "[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'", "[upgrade/apply] Upgrading your Static Pod-hosted control plane to version \"v1.19.5\"...", "Static pod: kube-apiserver-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 59aa88793d7cee5e566bb613b27db0ba", "Static pod: kube-controller-manager-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 0c4d03fcda3773af381b016817948374", "Static pod: kube-scheduler-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: fec5aa3f763b28dec522e4d6718b51fa", "[upgrade/staticpods] Writing new Static Pod manifests to \"/etc/kubernetes/tmp/kubeadm-upgraded-manifests492335740\"", "[upgrade/staticpods] Preparing for \"kube-apiserver\" upgrade", "[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade", "[upgrade/staticpods] Preparing for \"kube-controller-manager\" upgrade", "[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade", "[upgrade/staticpods] Preparing for \"kube-scheduler\" upgrade", "[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade", "[upload-config] Storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace", "[kubelet] Creating a ConfigMap \"kubelet-config-1.19\" in namespace kube-system with the configuration for the kubelets in the cluster", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes", "[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials", "[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token", "[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster", "[addons] Applied essential addon: kube-proxy"]
}

I tried setting CoreDNS to 1.7.0, which is what kubeadm would support under 1.19.5, but still no luck.

Any ideas what could be causing this?

@juliohm1978 juliohm1978 added the kind/bug Categorizes issue or PR as related to a bug. label Dec 25, 2020
@juliohm1978
Contributor Author

juliohm1978 commented Dec 25, 2020

On a side note, Kubespray ran once from 1.18.9 to 1.19.5. Some components, like kube-apiserver, kube-scheduler and kube-controller-manager, were upgraded. But shortly after, kubeadm died with the error message:

FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found

Subsequent Kubespray runs insist on failing with the same error message.

@juliohm1978
Contributor Author

Noted from history...

The Ansible task that deletes the kube-dns Service was added back in May 2019.

https://github.com/kubernetes-sigs/kubespray/blame/bbab1013c5afd295f2c011fce982f742c2f7c3b7/roles/kubernetes-apps/ansible/tasks/cleanup_dns.yml#L13

However, as of Feb 2020, kubeadm checks for the existence of this exact Service name during upgrades.

https://github.com/kubernetes/kubernetes/blob/98bc258bf5516b6c60860e06845b899eab29825d/cmd/kubeadm/app/phases/addons/dns/dns.go#L363-L365

One of their issues was closed recently, blaming Kubespray for the error.

kubernetes/kubeadm#2358

It seems that changing the service name from kube-dns to coredns is getting in the way of the kubeadm upgrade process.
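A quick way to confirm this state on an affected cluster (plain kubectl; nothing here is Kubespray-specific):

# kubeadm's post-upgrade phase looks for a Service named kube-dns, while Kubespray
# creates it as coredns. On an affected cluster, only coredns exists and the
# lookup for kube-dns returns NotFound.
kubectl get svc -n kube-system coredns kube-dns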

@juliohm1978
Contributor Author

juliohm1978 commented Dec 25, 2020

I managed to work around the issue by obtaining a copy of the coredns svc from the cluster:

kubectl get svc -n kube-system coredns -oyaml

... and reapplying a modified copy of the same YAML to create a Service named kube-dns.
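For anyone looking for the exact steps, a minimal sketch of that workaround (which fields to strip is my assumption; the point is simply to re-create the same Service under the name kube-dns):

kubectl get svc -n kube-system coredns -o yaml > kube-dns-svc.yaml
# Edit kube-dns-svc.yaml by hand:
#   - set metadata.name to kube-dns
#   - drop metadata.resourceVersion, metadata.uid and metadata.creationTimestamp
#   - drop spec.clusterIP / spec.clusterIPs so the apiserver assigns a fresh IP
#     (if your coredns IP is not the .10 of the service range, see the follow-up
#     comments below about the immutable ClusterIP)
kubectl apply -f kube-dns-svc.yaml
# Both Services should now exist before re-running the upgrade.
kubectl get svc -n kube-system coredns kube-dns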

With that svc copy present, the upgrade worked out fine.

Merry X-mas!

@dlouks
Contributor

dlouks commented Jan 19, 2021

I've run into this problem as well. If you use nodelocaldns, then in addition to deleting the coredns Service and recreating it as kube-dns, you also need to update the arguments on the nodelocaldns DaemonSet:

args: [ "-localip", "{{ nodelocaldns_ip }}", "-conf", "/etc/coredns/Corefile", "-upstreamsvc", "coredns" ]
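For reference, a hedged sketch of how those arguments could be inspected and changed (the DaemonSet name and the final -upstreamsvc value depend on your setup; pointing it at the renamed Service is only an assumption based on this thread):

# Show the current container args of the nodelocaldns DaemonSet.
kubectl -n kube-system get daemonset nodelocaldns -o jsonpath='{.spec.template.spec.containers[0].args}'
# Edit the DaemonSet and point -upstreamsvc at the Service name the upgrade expects.
kubectl -n kube-system edit daemonset nodelocaldns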

@dlouks
Contributor

dlouks commented Jan 19, 2021

If the best path forward is to rename the Service back to kube-dns, it looks like the deletion and recreation of the Service (which currently happens on every run) will need to move earlier so that it happens before the upgrade runs.

@dlouks
Contributor

dlouks commented Jan 19, 2021

For anyone not experiencing issues on upgrade, I'd be curious what your CoreDNS service IP is, and whether kubeadm created a kube-dns service on .10 of your kube_service_addresses. This is what happened to me when my coredns service did not use the X.X.X.10 address.

If coredns is configured to use the X.X.X.10 address and you try @juliohm1978's workaround of creating a copy of the service called kube-dns with a new IP, the upgrade will fail because kubeadm attempts to set the kube-dns IP to X.X.X.10 and fails since that field is immutable.
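A hedged sketch for checking which case you are in (X.X.X.10 stands for the .10 address of your kube_service_addresses range):

# Print the ClusterIP actually assigned to the coredns Service.
kubectl -n kube-system get svc coredns -o jsonpath='{.spec.clusterIP}{"\n"}'
# If this is not the .10 address of your service CIDR, a plain copy named kube-dns
# will get a different IP, and kubeadm's attempt to move it to .10 fails because
# the ClusterIP field is immutable.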

@dlouks
Contributor

dlouks commented Jan 19, 2021

This isn't the most elegant solution, but it looks like there is already a PR that simply ignores upgrade errors when kube-dns doesn't exist or when it wants to change the IP: #6244

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 19, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@juliohm1978
Contributor Author

The issue persists, even on Kubespray 2.17.x. For anyone bumping into this, recreating the kube-dns Service works.

If you hit @dlouks's problem, where coredns ended up with a ClusterIP other than X.X.X.10, the workaround is a little more contrived, but still possible. A follow-up error might look like this:

WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!
W1024 18:30:42.663590   24273 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [172.25.230.10]; the provided value is: [169.254.25.10]
[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: Service "kube-dns" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.clusterIPs: Required value]

In this case, the expected IP would be 172.25.230.10. Substitute the .10 address from your own cluster service CIDR and keep going:

  1. Check whether any other Service in the cluster already has the X.X.X.10 ClusterIP:

kubectl get svc -A | grep X.X.X.10

  2. If it is already in use, kubectl delete that Service and recreate it so that X.X.X.10 is freed. The Service will be temporarily unavailable, so consider how critical that downtime is for you.

  3. Recreate kube-dns with clusterIP: X.X.X.10 in your manifest to force the correct IP onto it (see the sketch after this list).

  4. Re-run the Kubespray playbook.
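A hedged sketch of what step 3 might look like, using 172.25.230.10 from the error above as the example address; the labels, selector and ports are assumptions and should be copied from your existing coredns Service:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  clusterIP: 172.25.230.10   # the .10 address of your kube_service_addresses range
  selector:
    k8s-app: kube-dns        # assumption: copy the selector from the coredns Service
  ports:
    - name: dns
      port: 53
      protocol: UDP
      targetPort: 53
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53
EOF

With kube-dns present on the expected address, the kubeadm post-upgrade addon phase finds the Service it is looking for and the Kubespray run can continue.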

@dlouks
Contributor

dlouks commented Nov 16, 2021

@juliohm1978, I think the fix is in #6244. It looks like it needs a little rework now that the kubernetes/master role moved to control-plane.
