
FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found #7083

Closed
juliohm1978 opened this issue Dec 25, 2020 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@juliohm1978
Contributor

Environment:

  • Cloud provider or hardware configuration: barebone installation - VMs

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 4.15.0-128-generic x86_64
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Version of Ansible (ansible --version): 2.9.13

  • Version of Python (python --version): Python 2.7.17

Kubespray version (commit) (git rev-parse --short HEAD): v2.14.2 (75d648c)

Network plugin used: calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

inventory.txt

Command used to invoke ansible:

ansible-playbook cluster.yml -e upgrade_cluster_setup=true --limit=kube-master -b -i inventory/inventory.ini

Output of ansible run:

output.txt

Anything else we need to know:

I've been trying to upgrade the cluster from 1.18.x to 1.19.x and I keep getting this error message. The isolated kubeadm output follows:

{
  "attempts": 3,
  "changed": true,
  "cmd": ["timeout", "-k", "600s", "600s", "/usr/local/bin/kubeadm", "upgrade", "apply", "-y", "v1.19.5", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--allow-experimental-upgrades", "--etcd-upgrade=false", "--force"],
  "delta": "0:00:08.281467",
  "end": "2020-12-25 05:39:31.849692",
  "failed_when_result": true,
  "msg": "non-zero return code",
  "rc": 1,
  "start": "2020-12-25 05:39:23.568225",
  "stderr": "W1225 05:39:23.639569  112017 common.go:94] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!\nW1225 05:39:23.649150  112017 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [172.25.231.10]; the provided value is: [169.254.25.10]\nW1225 05:39:23.747340  112017 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]\n[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services \"kube-dns\" not found\nTo see the stack trace of this error execute with --v=5 or higher",
  "stderr_lines": ["W1225 05:39:23.639569  112017 common.go:94] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!", "W1225 05:39:23.649150  112017 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [172.25.231.10]; the provided value is: [169.254.25.10]", "W1225 05:39:23.747340  112017 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]", "[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services \"kube-dns\" not found", "To see the stack trace of this error execute with --v=5 or higher"],
  "stdout": "[upgrade/config] Making sure the configuration is correct:\n[preflight] Running pre-flight checks.\n[upgrade] Running cluster health checks\n[upgrade/version] You have chosen to change the cluster version to \"v1.19.5\"\n[upgrade/versions] Cluster version: v1.19.5\n[upgrade/versions] kubeadm version: v1.19.5\n[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster\n[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection\n[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'\n[upgrade/apply] Upgrading your Static Pod-hosted control plane to version \"v1.19.5\"...\nStatic pod: kube-apiserver-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 59aa88793d7cee5e566bb613b27db0ba\nStatic pod: kube-controller-manager-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 0c4d03fcda3773af381b016817948374\nStatic pod: kube-scheduler-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: fec5aa3f763b28dec522e4d6718b51fa\n[upgrade/staticpods] Writing new Static Pod manifests to \"/etc/kubernetes/tmp/kubeadm-upgraded-manifests492335740\"\n[upgrade/staticpods] Preparing for \"kube-apiserver\" upgrade\n[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade\n[upgrade/staticpods] Preparing for \"kube-controller-manager\" upgrade\n[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade\n[upgrade/staticpods] Preparing for \"kube-scheduler\" upgrade\n[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade\n[upload-config] Storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace\n[kubelet] Creating a ConfigMap \"kubelet-config-1.19\" in namespace kube-system with the configuration for the kubelets in the cluster\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes\n[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials\n[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token\n[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster\n[addons] Applied essential addon: kube-proxy",
  "stdout_lines": ["[upgrade/config] Making sure the configuration is correct:", "[preflight] Running pre-flight checks.", "[upgrade] Running cluster health checks", "[upgrade/version] You have chosen to change the cluster version to \"v1.19.5\"", "[upgrade/versions] Cluster version: v1.19.5", "[upgrade/versions] kubeadm version: v1.19.5", "[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster", "[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection", "[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'", "[upgrade/apply] Upgrading your Static Pod-hosted control plane to version \"v1.19.5\"...", "Static pod: kube-apiserver-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 59aa88793d7cee5e566bb613b27db0ba", "Static pod: kube-controller-manager-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: 0c4d03fcda3773af381b016817948374", "Static pod: kube-scheduler-k8s-master01-tst-20190607.dis.tjpr.jus.br hash: fec5aa3f763b28dec522e4d6718b51fa", "[upgrade/staticpods] Writing new Static Pod manifests to \"/etc/kubernetes/tmp/kubeadm-upgraded-manifests492335740\"", "[upgrade/staticpods] Preparing for \"kube-apiserver\" upgrade", "[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade", "[upgrade/staticpods] Preparing for \"kube-controller-manager\" upgrade", "[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade", "[upgrade/staticpods] Preparing for \"kube-scheduler\" upgrade", "[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade", "[upload-config] Storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace", "[kubelet] Creating a ConfigMap \"kubelet-config-1.19\" in namespace kube-system with the configuration for the kubelets in the cluster", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes", "[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials", "[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token", "[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster", "[addons] Applied essential addon: kube-proxy"]
}

I tried setting CoreDNS to 1.7.0, which is what kubeadm would support under 1.19.5, but still no luck.

Any ideas what could be causing this?

@juliohm1978 juliohm1978 added the kind/bug Categorizes issue or PR as related to a bug. label Dec 25, 2020
@juliohm1978
Contributor Author

juliohm1978 commented Dec 25, 2020

On a side note, Kubespray ran once from 1.18.9 to 1.19.5. Some components, like kube-apiserver, kube-scheduler and kube-controller-manager, were upgraded. But shortly after, kubeadm died with the error message:

FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found

Subsequent Kubespray runs insist on failing with the same error message.

@juliohm1978
Contributor Author

Noted from history...

The Ansible task that deletes the kube-dns Service was added back in May 2019.

https://github.com/kubernetes-sigs/kubespray/blame/bbab1013c5afd295f2c011fce982f742c2f7c3b7/roles/kubernetes-apps/ansible/tasks/cleanup_dns.yml#L13

However, as of Feb 2020, kubeadm checks for the existence of this exact Service name during upgrades.

https://github.com/kubernetes/kubernetes/blob/98bc258bf5516b6c60860e06845b899eab29825d/cmd/kubeadm/app/phases/addons/dns/dns.go#L363-L365

One of their issues was closed recently, blaming Kubespray for the error.

kubernetes/kubeadm#2358

It seems that changing the service name from kube-dns to coredns is getting in the way of the kubeadm upgrade process.
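A quick way to confirm this state on an affected cluster (plain kubectl; nothing here is Kubespray-specific):

# kubeadm's post-upgrade phase looks for a Service named kube-dns, while Kubespray
# creates it as coredns. On an affected cluster, only coredns exists and the
# lookup for kube-dns returns NotFound.
kubectl get svc -n kube-system coredns kube-dns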

@juliohm1978
Contributor Author

juliohm1978 commented Dec 25, 2020

I managed to work around the issue by obtaining a copy of the coredns svc from the cluster:

kubectl get svc -n kube-system coredns -oyaml

... and reapplying a modified copy of the same YAML to create a Service named kube-dns.
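For anyone looking for the exact steps, a minimal sketch of that workaround (which fields to strip is my assumption; the point is simply to re-create the same Service under the name kube-dns):

kubectl get svc -n kube-system coredns -o yaml > kube-dns-svc.yaml
# Edit kube-dns-svc.yaml by hand:
#   - set metadata.name to kube-dns
#   - drop metadata.resourceVersion, metadata.uid and metadata.creationTimestamp
#   - drop spec.clusterIP / spec.clusterIPs so the apiserver assigns a fresh IP
#     (if your coredns IP is not the .10 of the service range, see the follow-up
#     comments below about the immutable ClusterIP)
kubectl apply -f kube-dns-svc.yaml
# Both Services should now exist before re-running the upgrade.
kubectl get svc -n kube-system coredns kube-dns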

With that svc copy present, the upgrade worked out fine.

Merry X-mas!

@dlouks
Contributor

dlouks commented Jan 19, 2021

I've run into this problem as well. If you use nodelocaldns, then in addition to deleting the coredns Service and recreating it as kube-dns, you also need to update the arguments on the nodelocaldns DaemonSet:

args: [ "-localip", "{{ nodelocaldns_ip }}", "-conf", "/etc/coredns/Corefile", "-upstreamsvc", "coredns" ]
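For reference, a hedged sketch of how those arguments could be inspected and changed (the DaemonSet name and the final -upstreamsvc value depend on your setup; pointing it at the renamed Service is only an assumption based on this thread):

# Show the current container args of the nodelocaldns DaemonSet.
kubectl -n kube-system get daemonset nodelocaldns -o jsonpath='{.spec.template.spec.containers[0].args}'
# Edit the DaemonSet and point -upstreamsvc at the Service name the upgrade expects.
kubectl -n kube-system edit daemonset nodelocaldns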

@dlouks
Contributor

dlouks commented Jan 19, 2021

If the best path forward is to rename the Service back to kube-dns, it looks like the deletion and recreation of the Service (which currently happens on every run) will need to move earlier so that it happens before the upgrade runs.

@dlouks
Contributor

dlouks commented Jan 19, 2021

For anyone not experiencing issues on upgrade, I'd be curious what your CoreDNS service IP is, and whether kubeadm created a kube-dns service on .10 of your kube_service_addresses. This is what happened to me when my coredns service did not use the X.X.X.10 address.

If coredns is configured to use the X.X.X.10 address and you try @juliohm1978's workaround of creating a copy of the service called kube-dns with a new IP, the upgrade will fail because kubeadm attempts to set the kube-dns IP to X.X.X.10 and fails since that field is immutable.
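A hedged sketch for checking which case you are in (X.X.X.10 stands for the .10 address of your kube_service_addresses range):

# Print the ClusterIP actually assigned to the coredns Service.
kubectl -n kube-system get svc coredns -o jsonpath='{.spec.clusterIP}{"\n"}'
# If this is not the .10 address of your service CIDR, a plain copy named kube-dns
# will get a different IP, and kubeadm's attempt to move it to .10 fails because
# the ClusterIP field is immutable.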

@dlouks
Contributor

dlouks commented Jan 19, 2021

This isn't the most elegant solution, but it looks like there is already a PR that simply ignores upgrade errors when kube-dns doesn't exist or when it wants to change the IP: #6244

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 19, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@juliohm1978
Contributor Author

The issue persists, even on Kubespray 2.17.x. For anyone bumping into this, recreating the kube-dns Service works.

If you hit @dlouks's problem, where coredns ended up with a ClusterIP other than X.X.X.10, the workaround is a little more contrived, but still possible. A follow-up error might look like this:

WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!
W1024 18:30:42.663590   24273 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [172.25.230.10]; the provided value is: [169.254.25.10]
[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: Service "kube-dns" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.clusterIPs: Required value]

In this case, the expected IP would be 172.25.230.10. Substitute the .10 address from your own cluster service CIDR and keep going:

  1. Check whether any other Service in the cluster already has the X.X.X.10 ClusterIP:

kubectl get svc -A | grep X.X.X.10

  2. If it is already in use, kubectl delete that Service and recreate it so that X.X.X.10 is freed. The Service will be temporarily unavailable, so consider how critical that downtime is for you.

  3. Recreate kube-dns with clusterIP: X.X.X.10 in your manifest to force the correct IP onto it (see the sketch after this list).

  4. Re-run the Kubespray playbook.
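A hedged sketch of what step 3 might look like, using 172.25.230.10 from the error above as the example address; the labels, selector and ports are assumptions and should be copied from your existing coredns Service:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  clusterIP: 172.25.230.10   # the .10 address of your kube_service_addresses range
  selector:
    k8s-app: kube-dns        # assumption: copy the selector from the coredns Service
  ports:
    - name: dns
      port: 53
      protocol: UDP
      targetPort: 53
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53
EOF

With kube-dns present on the expected address, the kubeadm post-upgrade addon phase finds the Service it is looking for and the Kubespray run can continue.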

@dlouks
Contributor

dlouks commented Nov 16, 2021

@juliohm1978, I think the fix is in #6244. It looks like it needs a little rework now that the kubernetes/master role moved to control-plane.
