Kube Service: preparing server: failed to get CA certs #235

bsodmike · 2023-02-16T08:25:16Z

bsodmike
Feb 16, 2023

Hi all,

I'm testing a very basic clone of this playbook, with a few basics changed. The error I'm seeing is this. It seems the Jinja templating is breaking at {.items[*].metadata.name} which is here https://github.com/techno-tim/k3s-ansible/blob/master/roles/k3s/master/tasks/main.yml#L34

TASK [k3s/master : Verify that all nodes actually joined (check k3s-init.service if this fails)] ***
FAILED - RETRYING: [10.0.3.79]: Verify that all nodes actually joined (check k3s-init.service if this fails) (20 retries left).
...
FAILED - RETRYING: [10.0.3.81]: Verify that all nodes actually joined (check k3s-init.service if this fails) (1 retries left).
FAILED - RETRYING: [10.0.3.79]: Verify that all nodes actually joined (check k3s-init.service if this fails) (1 retries left).
fatal: [10.0.3.81]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"], "delta": "0:00:00.104705", "end": "2023-02-16 13:44:24.915011", "msg": "non-zero return code", "rc": 1, "start": "2023-02-16 13:44:24.810306", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server localhost:8080 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}

fatal: [10.0.3.79]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"], "delta": "0:00:00.101943", "end": "2023-02-16 13:44:24.929091", "msg": "", "rc": 0, "start": "2023-02-16 13:44:24.827148", "stderr": "", "stderr_lines": [], "stdout": "k3s-1.debian11.homelab.com", "stdout_lines": ["k3s-1.debian11.homelab.com"]}

I can confirm that the kube-vip instance is running and the script fails due to the issue above.

Answered by bornav

Feb 17, 2023

In my case had the same fail point, steps that helped me:
make sure each host has a unique hostname,
make sure that hosts do not have any firewall rules blocking traffic(on all ports)

View full answer

bsodmike · 2023-02-16T09:11:44Z

bsodmike
Feb 16, 2023
Author

Dug a bit deeper and the issue is elsewhere, this is on one of the master nodes:

Feb 16 14:40:11 k3s-3.debian11.homelab.com python3[22103]: ansible-ansible.legacy.command Invoked with _raw_params=k3s kubectl get nodes -l "node-role.kubernetes.io/master=true" -o=jsonpath="{.items[*].metadata.name}" _uses_shell=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Feb 16 14:40:11 k3s-3.debian11.homelab.com k3s[22038]: time="2023-02-16T14:40:11+05:30" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://10.0.3.79:6443/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 16 14:40:11 k3s-3.debian11.homelab.com systemd[1]: k3s-init.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 14:40:11 k3s-3.debian11.homelab.com systemd[1]: k3s-init.service: Failed with result 'exit-code'.

0 replies

timothystewart6 · 2023-02-16T21:12:02Z

timothystewart6
Feb 16, 2023
Maintainer

Hi can you please fill out the issue template that was supplied when you created an issue? Thank you!

0 replies

bsodmike · 2023-02-17T02:58:10Z

bsodmike
Feb 17, 2023
Author

Expected Behavior

According to the YouTube video, at least, your master nodes joined the main node which runs kube-vip.

Current Behavior

This does not happen, instead the 2nd and 3rd master nodes are unable to connect to the main (primary) master node as CA certs are missing.

Feb 16 14:40:11 k3s-3.debian11.homelab.com python3[22103]: ansible-ansible.legacy.command Invoked with _raw_params=k3s kubectl get nodes -l "node-role.kubernetes.io/master=true" -o=jsonpath="{.items[*].metadata.name}" _uses_shell=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Feb 16 14:40:11 k3s-3.debian11.homelab.com k3s[22038]: time="2023-02-16T14:40:11+05:30" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://10.0.3.79:6443/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 16 14:40:11 k3s-3.debian11.homelab.com systemd[1]: k3s-init.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 14:40:11 k3s-3.debian11.homelab.com systemd[1]: k3s-init.service: Failed with result 'exit-code'.

Steps to Reproduce

Run the playbook by default, this error should take place.

Context (variables)

Operating system: Debian 11

Hardware: VM: 16GB RAM / 2vcpu / 40GB disk

Variables Used

all.yml

k3s_version: v1.24.10+k3s1
ansible_user: NA
systemd_dir: /etc/systemd/system

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "10.0.3.85"

k3s_token: "NA"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.7"

# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.7"
metal_lb_controller_tag_version: "v0.13.7"

# metallb ip range for load balancer
metal_lb_ip_range: "10.0.3.90-10.0.3.100"

Hosts

host.ini

[master]
10.0.3.79
10.0.3.80
10.0.3.81

[node]
10.0.3.82
10.0.3.83

# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
# [proxmox]
# 192.168.30.43

[k3s_cluster:children]
master
node

Possible Solution

I was planning on setting up self-signed certs and seeing if that would work, but I'm just confused as why this wasn't experienced when you made the Video :). Thanks Tim!

Observations

FYI, I also noticed another error and fixed this by running /usr/local/bin/k3s kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)" - without this, there were metallb errors in the logs.

I've checked the General Troubleshooting Guide

0 replies

bsodmike · 2023-02-17T05:54:29Z

bsodmike
Feb 17, 2023
Author

If they do not match, create one master / server node and add additional servers outside of this playbook

Removing the 2nd/3rd master and trying this now. This passed the initial failure point

TASK [k3s/master : Verify that all nodes actually joined (check k3s-init.service if this fails)] ***
FAILED - RETRYING: [10.0.3.79]: Verify that all nodes actually joined (check k3s-init.service if this fails) (20 retries left).
ok: [10.0.3.79]

However it is now failing at

TASK [k3s/node : Copy K3s service file] **************************************************
changed: [10.0.3.83]
changed: [10.0.3.82]

TASK [k3s/node : Enable and check K3s service] *******************************************

I find it strange that it is trying to fetch the CA cert (which doesn't exist anyway, as far as I'm aware), from the localhost address - ideas?

Feb 17 11:27:25 k3s-4.debian11.homelab.com k3s[29019]: time="2023-02-17T11:27:25+05:30" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 17 11:27:27 k3s-4.debian11.homelab.com systemd[1]: Configuration file /etc/systemd/system/k3s-node.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Feb 17 11:27:47 k3s-4.debian11.homelab.com k3s[29019]: time="2023-02-17T11:27:47+05:30" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 17 11:28:09 k3s-4.debian11.homelab.com k3s[29019]: time="2023-02-17T11:28:09+05:30" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

0 replies

bornav · 2023-02-17T12:09:16Z

bornav
Feb 17, 2023

In my case had the same fail point, steps that helped me:
make sure each host has a unique hostname,
make sure that hosts do not have any firewall rules blocking traffic(on all ports)

0 replies

bsodmike · 2023-02-17T13:36:24Z

bsodmike
Feb 17, 2023
Author

Thanks @bornav let me double check on the local firewall.

0 replies

bsodmike · 2023-02-18T03:41:09Z

bsodmike
Feb 18, 2023
Author

@bornav your tip on checking the local fw was spot on. I have another ansible script I run on all my VMs as a "metal prep" type playbook that adds basic security and configs; this enables UFW and the basic config is to lock all ports down other than SSH.

The test cluster boots up just fine now, thanks!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kube Service: preparing server: failed to get CA certs #235

{{title}}

Replies: 7 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Kube Service: preparing server: failed to get CA certs #235

bsodmike Feb 16, 2023

Replies: 7 comments

bsodmike Feb 16, 2023 Author

timothystewart6 Feb 16, 2023 Maintainer

bsodmike Feb 17, 2023 Author

Expected Behavior

Current Behavior

Steps to Reproduce

Context (variables)

Variables Used

Hosts

Possible Solution

Observations

bsodmike Feb 17, 2023 Author

bornav Feb 17, 2023

bsodmike Feb 17, 2023 Author

bsodmike Feb 18, 2023 Author

bsodmike
Feb 16, 2023

bsodmike
Feb 16, 2023
Author

timothystewart6
Feb 16, 2023
Maintainer

bsodmike
Feb 17, 2023
Author

bsodmike
Feb 17, 2023
Author

bornav
Feb 17, 2023

bsodmike
Feb 17, 2023
Author

bsodmike
Feb 18, 2023
Author