Upon installation, k3s-agent on nodes giving "failed to get CA certs" #11

Closed
geerlingguy opened this issue Mar 6, 2024 · 6 comments

@geerlingguy
Owner

Mar 06 16:34:33 deskpi2 k3s[1282]: time="2024-03-06T16:34:33-06:00" level=info msg="Starting k3s agent v1.28.7+k3s1 (051b14b2)"
Mar 06 16:34:33 deskpi2 k3s[1282]: time="2024-03-06T16:34:33-06:00" level=info msg="Adding server to load balancer k3s-agent-load-balancer: deskpi1.local:6443"
Mar 06 16:34:33 deskpi2 k3s[1282]: time="2024-03-06T16:34:33-06:00" level=info msg="Running load balancer k3s-agent-load-balancer 127.0.0.1:6444 -> [deskpi1.local:6443] [default: deskpi1.local:6443]"
Mar 06 16:34:33 deskpi2 k3s[1282]: time="2024-03-06T16:34:33-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:58660->127.0.0.1:6444: read: connection res>
Mar 06 16:34:35 deskpi2 k3s[1282]: time="2024-03-06T16:34:35-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
Mar 06 16:34:37 deskpi2 k3s[1282]: time="2024-03-06T16:34:37-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:58122->127.0.0.1:6444: read: connection res>
Mar 06 16:34:39 deskpi2 k3s[1282]: time="2024-03-06T16:34:39-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"

I'm guessing it's a DNS issue, because of course it's a DNS issue. I'm trying to use mDNS names like deskpi1.local and deskpi2.local, and from deskpi2 I can ping deskpi1:

pi@deskpi2:~ $ ping deskpi1.local
PING deskpi1.local (10.0.2.90) 56(84) bytes of data.
64 bytes from cam01.mmoffice.net (10.0.2.90): icmp_seq=1 ttl=64 time=0.260 ms
64 bytes from cam01.mmoffice.net (10.0.2.90): icmp_seq=2 ttl=64 time=0.397 ms
64 bytes from cam01.mmoffice.net (10.0.2.90): icmp_seq=3 ttl=64 time=0.347 ms
64 bytes from cam01.mmoffice.net (10.0.2.90): icmp_seq=4 ttl=64 time=0.353 ms

Gah... now I realize I had reserved that IP earlier for some camera testing, but since the camera's not active right now, the DHCP server is reassigning the IP to one of the Pis?
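
Before blaming DHCP outright, it's worth checking which address each resolver path actually returns, and whether two machines answer for the IP. A rough sketch (the `avahi-resolve-host-name` and `arping` tools are assumptions here — they come from the avahi-utils and iputils-arping packages and may not be installed):

```shell
# What the libc resolver (NSS) returns -- roughly what k3s sees:
getent hosts deskpi1.local || true

# What mDNS itself returns, bypassing NSS (avahi-utils package):
avahi-resolve-host-name deskpi1.local || true

# Duplicate-address probe: -D asks whether any *other* host answers
# for 10.0.2.90 (iputils-arping package; run from a third machine):
arping -D -c 3 -I eth0 10.0.2.90 || true
```

If `getent` and `avahi-resolve-host-name` disagree, the NSS config (nss-mdns ordering in /etc/nsswitch.conf) is the place to look.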

@geerlingguy
Owner Author

I got that fixed, but now I'm trying to figure out why the nodes are still trying to connect to localhost.

Here are the contents of /var/lib/rancher/k3s/agent/etc/k3s-agent-load-balancer.json:

{
  "ServerURL": "https://deskpi1.local:6443",
  "ServerAddresses": [
    "deskpi1.local:6443"
  ],
  "Listener": null
}

There is no /var/lib/rancher/k3s/agent/kubelet.kubeconfig as of yet.
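
For what it's worth, the 127.0.0.1:6444 in those errors is expected: per the "Running load balancer" log line above, the agent runs a small client-side load balancer on that port and proxies to the ServerURL, so the real question is whether the server endpoint answers at all. A quick check against the same `/cacerts` path the agent fetches (hostname taken from the JSON above):

```shell
SERVER=deskpi1.local   # ServerURL host from k3s-agent-load-balancer.json

# Hit the server directly, bypassing the agent's local proxy:
curl -sk --max-time 5 "https://${SERVER}:6443/cacerts" || echo "direct fetch failed"

# And through the local load balancer, the path the agent actually takes:
curl -sk --max-time 5 https://127.0.0.1:6444/cacerts || echo "proxied fetch failed"
```

If the direct fetch returns a PEM certificate but the proxied one fails, the problem is on the agent side; if both fail, it's the server or the name resolution in between.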

@geerlingguy
Owner Author

The first node is set up, and is just sitting there cranking out a ton of containers, at least:

pi@deskpi1:~ $ sudo kubectl get pods --all-namespaces
NAMESPACE     NAME                                                     READY   STATUS              RESTARTS        AGE
kube-system   helm-install-traefik-wz62q                               0/1     Completed           1               61m
kube-system   helm-install-traefik-crd-j86ss                           0/1     Completed           0               61m
kube-system   svclb-traefik-03607eea-j78w2                             2/2     Running             4 (5m35s ago)   59m
kube-system   local-path-provisioner-6c86858495-k47zq                  1/1     Running             2 (5m35s ago)   61m
kube-system   coredns-6799fbcd5-cssw4                                  1/1     Running             2 (5m35s ago)   61m
kube-system   traefik-f4564c4f4-mxvzm                                  1/1     Running             2 (5m35s ago)   59m
kube-system   metrics-server-67c658944b-tqwg7                          1/1     Running             2 (5m35s ago)   61m
default       nfs-subdir-external-provisioner-77975c6697-kbsvp         1/1     Running             0               2m54s
default       cluster-monitoring-kube-pr-operator-79bf9cc856-nkvxp     1/1     Running             0               2m7s
default       cluster-monitoring-prometheus-node-exporter-smbtj        1/1     Running             0               2m6s
default       cluster-monitoring-kube-state-metrics-5fcdfcfc9c-rkwpq   1/1     Running             0               2m7s
drupal        drupal-79946c978b-rh9gz                                  0/1     ContainerCreating   0               86s
default       prometheus-cluster-monitoring-kube-pr-prometheus-0       2/2     Running             0               111s
default       cluster-monitoring-grafana-74b99fd4c5-7dfnc              2/3     Running             0               2m7s
drupal        mariadb-748b666569-xzvjx                                 1/1     Running             0               91s
pi@deskpi1:~ $ sudo kubectl get nodes
NAME      STATUS   ROLES                  AGE   VERSION
deskpi1   Ready    control-plane,master   61m   v1.28.7+k3s1

But everything's running on that first node, oops :D
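
A quick way to confirm placement: `-o wide` adds a NODE column to the pod listing.

```shell
# NODE column shows where each pod was scheduled; empty output just
# means no cluster is reachable from this shell.
PODS="$(sudo kubectl get pods --all-namespaces -o wide 2>/dev/null || true)"
printf '%s\n' "$PODS"
```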

@geerlingguy
Owner Author

Drupal's even working there, too...

[Screenshot 2024-03-06 at 5 33 34 PM]

I'm wondering if I just need to go to static IPs instead of trying to use mDNS. I figured I'd get lucky and it would 'just work', but maybe something inside k3s/containerd's resolver config just won't work with mDNS out of the box.
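
One middle ground before abandoning the nice hostnames entirely: pin the names in /etc/hosts on every node, so resolution stops depending on avahi/nss-mdns (which anything resolving names inside a container never sees anyway). The addresses below are purely hypothetical placeholders:

```
# /etc/hosts on every node -- hypothetical reserved addresses
10.0.2.101  deskpi1 deskpi1.local
10.0.2.102  deskpi2 deskpi2.local
10.0.2.103  deskpi3 deskpi3.local
```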

@geerlingguy
Owner Author

Hooray! I configured static IPs (had to rework the static-networking playbook a bit), and now it's all working:
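
For reference, on a dhcpcd-based Raspberry Pi OS image the per-node static assignment is only a few lines (addresses hypothetical; newer Bookworm images use NetworkManager instead, where `nmcli` is the tool):

```
# /etc/dhcpcd.conf -- hypothetical static addressing for eth0
interface eth0
static ip_address=10.0.2.101/24
static routers=10.0.2.1
static domain_name_servers=10.0.2.1
```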

pi@deskpi1:~ $ sudo kubectl get nodes
NAME      STATUS   ROLES                  AGE   VERSION
deskpi1   Ready    control-plane,master   17h   v1.28.7+k3s1
deskpi3   Ready    <none>                 21s   v1.28.7+k3s1
deskpi6   Ready    <none>                 18s   v1.28.7+k3s1
deskpi2   Ready    <none>                 18s   v1.28.7+k3s1
deskpi4   Ready    <none>                 21s   v1.28.7+k3s1
deskpi5   Ready    <none>                 21s   v1.28.7+k3s1
pi@deskpi1:~ $ sudo kubectl get pods --all-namespaces
NAMESPACE     NAME                                                     READY   STATUS      RESTARTS      AGE
kube-system   helm-install-traefik-wz62q                               0/1     Completed   1             17h
kube-system   helm-install-traefik-crd-j86ss                           0/1     Completed   0             17h
drupal        mariadb-748b666569-xzvjx                                 1/1     Running     1 (16h ago)   16h
kube-system   local-path-provisioner-6c86858495-k47zq                  1/1     Running     3 (16h ago)   17h
default       cluster-monitoring-kube-pr-operator-79bf9cc856-nkvxp     1/1     Running     1 (16h ago)   16h
default       nfs-subdir-external-provisioner-77975c6697-kbsvp         1/1     Running     1 (16h ago)   16h
kube-system   svclb-traefik-03607eea-j78w2                             2/2     Running     6 (16h ago)   17h
default       cluster-monitoring-prometheus-node-exporter-smbtj        1/1     Running     1 (16h ago)   16h
kube-system   coredns-6799fbcd5-cssw4                                  1/1     Running     3 (16h ago)   17h
kube-system   metrics-server-67c658944b-tqwg7                          1/1     Running     3 (16h ago)   17h
default       cluster-monitoring-kube-state-metrics-5fcdfcfc9c-rkwpq   1/1     Running     1 (16h ago)   16h
kube-system   traefik-f4564c4f4-mxvzm                                  1/1     Running     3 (16h ago)   17h
default       prometheus-cluster-monitoring-kube-pr-prometheus-0       2/2     Running     2 (16h ago)   16h
default       cluster-monitoring-grafana-74b99fd4c5-7dfnc              3/3     Running     3 (16h ago)   16h
drupal        drupal-79946c978b-rh9gz                                  1/1     Running     1 (16h ago)   16h
default       cluster-monitoring-prometheus-node-exporter-l55z6        1/1     Running     0             3m18s
kube-system   svclb-traefik-03607eea-wc6vh                             2/2     Running     0             3m17s
default       cluster-monitoring-prometheus-node-exporter-99z2n        1/1     Running     0             3m17s
kube-system   svclb-traefik-03607eea-ntw45                             2/2     Running     0             3m16s
default       cluster-monitoring-prometheus-node-exporter-vvjmw        1/1     Running     0             3m17s
kube-system   svclb-traefik-03607eea-pltdc                             2/2     Running     0             3m16s
default       cluster-monitoring-prometheus-node-exporter-r8mvr        1/1     Running     0             3m15s
kube-system   svclb-traefik-03607eea-7t79z                             2/2     Running     0             3m14s
kube-system   svclb-traefik-03607eea-62pzx                             2/2     Running     0             3m14s
default       cluster-monitoring-prometheus-node-exporter-678pc        1/1     Running     0             3m15s

I'll push the updates once I validate it a little further.
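
One caveat while validating: pods scheduled before the agents joined (like the drupal deployment above) will stay parked on deskpi1 until they're recreated; a rollout restart lets the scheduler spread the replacements across the new nodes. (The deployment name here is inferred from the pod name in the listing.)

```shell
# Recreate the drupal pods so the scheduler can place them on any node;
# empty output just means no cluster is reachable from this shell.
OUT="$(sudo kubectl -n drupal rollout restart deployment drupal 2>/dev/null || true)"
printf '%s\n' "$OUT"
```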

@geerlingguy
Owner Author

[Screenshot 2024-03-07 at 10 38 21 AM]

Installed and operational!

@geerlingguy
Owner Author

lol, ran into this again today
