Kubelet Failed to Start After Node Restart (cri_dockerd_enabled: true) #8734
Comments
I managed to get my cluster running again manually by enabling and starting the cri-dockerd service in systemd: sudo systemctl enable cri-dockerd.service. I will try to take a look at the Ansible roles and see why it wasn't enabled by default.
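The manual workaround described above can be sketched as a small shell helper. This is only an illustration: the function name enable_and_start and the SYSTEMCTL override variable are hypothetical, and the unit name is the one quoted in the comment.

```shell
#!/bin/sh
# Sketch of the manual workaround: enable a systemd unit and start it.
# SYSTEMCTL is a hypothetical override (not part of Kubespray) so the
# commands can be previewed, e.g. SYSTEMCTL=echo, before running them.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

enable_and_start() {
    unit="$1"
    # "enable" makes the unit start on boot; "start" brings it up now.
    $SYSTEMCTL enable "$unit" && $SYSTEMCTL start "$unit"
}

# Example call (left commented out so sourcing this file changes nothing):
# enable_and_start cri-dockerd.service
```

Enabling alone only takes effect at the next boot, so the sketch also starts the unit explicitly.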
This is an issue with the reset play in general: when it resets services, it masks them. A fix was proposed for containerd a few days ago, but I'm guessing the issue is more widespread and should be addressed generically. The PR in question: #8726
The issue is, I think, specific to apt-based systems and to runtimes where the Ansible for the container runtime does some variant of
I added an unmask for cri-docker and docker to #8726, which will probably fix this. The container runtime install plays aren't all that cookie-cutter, so additional tweaks may be needed.
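For anyone checking a node by hand, the masking described above can be detected and undone roughly as follows. This is a sketch rather than the change made in #8726: the function name unmask_if_masked and the SYSTEMCTL override are hypothetical, and the unit names in the example calls are assumptions based on the services mentioned in this thread.

```shell
#!/bin/sh
# Sketch: unmask a systemd unit only if it is currently masked.
# SYSTEMCTL is a hypothetical override used to substitute a stub when testing.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

unmask_if_masked() {
    unit="$1"
    # "systemctl is-enabled" prints "masked" for a masked unit and exits
    # non-zero, so the exit status is ignored and only the text is compared.
    state="$($SYSTEMCTL is-enabled "$unit" 2>/dev/null || true)"
    if [ "$state" = "masked" ]; then
        echo "unmasking $unit"
        $SYSTEMCTL unmask "$unit"
    else
        echo "$unit is not masked (state: ${state:-unknown})"
    fi
}

# Example calls (unit names are assumptions; left commented out):
# unmask_if_masked docker.service
# unmask_if_masked cri-dockerd.service
```

Unmasking only removes the block on the unit; it still has to be enabled and started afterwards.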
JFYI, #8726 did not help. I made the changes below locally, and they seem to help; please take a look and let me know if they look fine.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Environment:
Bare metal, amd64
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 5.13.0-39-generic x86_64
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
Version of Ansible (ansible --version):
ansible [core 2.12.3]
config file = None
configured module search path = ['/home/alexander/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/alexander/.local/lib/python3.10/site-packages/ansible
ansible collection location = /home/alexander/.ansible/collections:/usr/share/ansible/collections
executable location = /home/alexander/.local/bin/ansible
python version = 3.10.4 (main, Mar 23 2022, 23:05:40) [GCC 11.2.0]
jinja version = 2.11.3
libyaml = True
Version of Python (python --version):
Python 3.10.4
Kubespray version (commit) (git rev-parse --short HEAD):
dc0dfad4
Network plugin used:
flannel
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
https://gist.github.com/sashokbg/cafb0c6b1a264d72febbc797b865afd8
Command used to invoke ansible:
ansible-playbook -i inventory/home_cloud_cluster/hosts.yaml --become --become-user=root cluster.yml --private-key ~/.ssh/id_rsa --become-user=root --user home-cloud-user
Output of ansible run:
Unfortunately I don't have it, but the playbook finished with no errors.
Should I reinstall in order to obtain more info?
Anything else do we need to know:
I have cri_dockerd_enabled: true in my inventory, and after restarting my control plane node the kubelet service is unable to restart.
Journalctl logs of kubelet
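Logs of this kind can be gathered with something like the following sketch. The function name show_unit_logs and the JOURNALCTL override are hypothetical; the journalctl flags used (-u, -b, --no-pager, -n) are standard ones.

```shell
#!/bin/sh
# Sketch: print the most recent journal entries for a unit from the current boot.
# JOURNALCTL is a hypothetical override so the command can be previewed first.
JOURNALCTL="${JOURNALCTL:-sudo journalctl}"

show_unit_logs() {
    unit="$1"
    lines="${2:-100}"   # number of trailing log lines, defaults to 100
    # -u selects the unit, -b limits output to the current boot,
    # --no-pager prints straight to stdout, -n limits the line count.
    $JOURNALCTL -u "$unit" -b --no-pager -n "$lines"
}

# Example call (left commented out):
# show_unit_logs kubelet 200
```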