k0s controller node not coming up (metrics-server CrashLoop) #1047

Closed
Tokynet opened this issue Aug 10, 2021 · 2 comments
Labels
bug Something isn't working

Comments


Tokynet commented Aug 10, 2021

This was my first attempt to install k0s; the process was pretty simple, the outcome not so much.

I searched other issues and it seems I've hit the same issue as #451.
FWIW, I looked for v0.9.1 but couldn't find it on the releases page.

I'm going to clobber the servers and install CentOS Stream and try again later.

Version

> k0sctl-linux-x64 version
version: v0.9.0
commit: 6d364ff
> k0s version
v1.21.3+k0s.0

Platform
Which platform did you run k0s on?

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic

What happened?
After finishing the installation, the control plane does not come up.

How To Reproduce
Install with the k0sconfig file below.
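For reference, the apply step looks roughly like this (a sketch; the config file name is an assumption, the binary is the k0sctl build shown above):

# apply the k0sctl config shown further below (file name assumed)
./k0sctl-linux-x64 apply --config k0s-cluster.yaml
# fetch the admin kubeconfig to run kubectl against the cluster
./k0sctl-linux-x64 kubeconfig --config k0s-cluster.yaml > kubeconfig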

Expected behavior
The control plane comes up and becomes available as part of the cluster.

k get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8sworker1   Ready    <none>   28m   v1.21.3+k0s
k8sworker2   Ready    <none>   28m   v1.21.3+k0s

I expected to see a third entry for k8smaster (the controller's hostname).

Screenshots & Logs
From k0scontroller.service (status):

k0s[2692]: time="2021-08-09 20:18:43" level=info msg="E0809 20:18:43.879468    2774 resource_quota_controller.go:409] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request" component=kube-controller-manager
k0s[2692]: time="2021-08-09 20:18:44" level=info msg="W0809 20:18:44.337108    2774 garbagecollector.go:703] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]" component=kube-controller-manager
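These errors only say that the aggregated metrics API is unavailable because metrics-server itself is down; the APIService backing it can be inspected with something like:

# check the aggregated API served by metrics-server
kubectl get apiservice v1beta1.metrics.k8s.io
# the describe output shows why it is reported as unavailable
kubectl describe apiservice v1beta1.metrics.k8s.io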

From k get pods -A:

NAMESPACE     NAME                             READY   STATUS             RESTARTS   AGE
kube-system   coredns-5ccbdcc4c4-gk865         0/1     CrashLoopBackOff   9          24m
kube-system   coredns-5ccbdcc4c4-v4khj         0/1     CrashLoopBackOff   9          25m
kube-system   konnectivity-agent-dm9hw         0/1     CrashLoopBackOff   9          24m
kube-system   konnectivity-agent-j69jd         0/1     CrashLoopBackOff   9          24m
kube-system   kube-proxy-8l6ch                 1/1     Running            0          25m
kube-system   kube-proxy-dvxpp                 1/1     Running            0          25m
kube-system   kube-router-5mb2r                1/1     Running            0          25m
kube-system   kube-router-gcsgh                1/1     Running            0          25m
kube-system   metrics-server-59d8698d9-rq8h2   0/1     CrashLoopBackOff   7          11m

From k describe pods:

Name:                 metrics-server-59d8698d9-rq8h2
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 k8sworker2/192.168.0.3
Start Time:           Mon, 09 Aug 2021 20:09:23 -0400
Labels:               k8s-app=metrics-server
                      pod-template-hash=59d8698d9
Annotations:          kubernetes.io/psp: 00-k0s-privileged
Status:               Running
IP:                   10.244.1.4
IPs:
  IP:           10.244.1.4
Controlled By:  ReplicaSet/metrics-server-59d8698d9
Containers:
  metrics-server:
    Container ID:  containerd://eae6a19890fe71b1701760d5dd3f3f681a0f237ca27f1fa9533fd7954c5510e1
    Image:         gcr.io/k8s-staging-metrics-server/metrics-server:v0.3.7
    Image ID:      gcr.io/k8s-staging-metrics-server/metrics-server@sha256:6f3dd4a92838c82e1a253f8838db33432d79bc7d1c57c4cc5f31501487d271c5
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --kubelet-insecure-tls
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 09 Aug 2021 20:20:36 -0400
      Finished:     Mon, 09 Aug 2021 20:20:40 -0400
    Ready:          False
    Restart Count:  7
    Requests:
      cpu:        10m
      memory:     30M
    Readiness:    http-get https://:https/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dxxcq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-dxxcq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned kube-system/metrics-server-59d8698d9-rq8h2 to k8sworker2
  Normal   Pulling    12m                   kubelet            Pulling image "gcr.io/k8s-staging-metrics-server/metrics-server:v0.3.7"
  Normal   Pulled     12m                   kubelet            Successfully pulled image "gcr.io/k8s-staging-metrics-server/metrics-server:v0.3.7" in 1.915383035s
  Warning  Unhealthy  12m (x2 over 12m)     kubelet            Readiness probe failed: Get "https://10.244.1.4:4443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Normal   Created    11m (x4 over 12m)     kubelet            Created container metrics-server
  Normal   Started    11m (x4 over 12m)     kubelet            Started container metrics-server
  Normal   Pulled     11m (x3 over 12m)     kubelet            Container image "gcr.io/k8s-staging-metrics-server/metrics-server:v0.3.7" already present on machine
  Warning  Unhealthy  11m (x3 over 12m)     kubelet            Readiness probe failed: Get "https://10.244.1.4:4443/healthz": context deadline exceeded
  Warning  BackOff    2m36s (x46 over 12m)  kubelet            Back-off restarting failed container
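Since the readiness probe keeps timing out, a quick check (a sketch, using the pod IP and port from the describe output above) is to hit the endpoint directly from the worker node running the pod:

# run on k8sworker2, where the pod is scheduled; -k because metrics-server serves a self-signed cert
curl -k https://10.244.1.4:4443/healthz
# a hang/timeout here (rather than any HTTP response) points at pod networking rather than metrics-server itself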

The k0sctl config used:
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 192.168.0.1
      user: root
      port: 22
    role: controller
  - ssh:
      address: 192.168.0.2
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.0.3
      user: root
      port: 22
    role: worker
  k0s:
    version: 1.21.3+k0s.0
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        images:
          calico:
            cni:
              image: docker.io/calico/cni
              version: v3.18.1
            kubecontrollers:
              image: docker.io/calico/kube-controllers
              version: v3.18.1
            node:
              image: docker.io/calico/node
              version: v3.18.1
          coredns:
            image: docker.io/coredns/coredns
            version: 1.7.0
          default_pull_policy: IfNotPresent
          konnectivity:
            image: us.gcr.io/k8s-artifacts-prod/kas-network-proxy/proxy-agent
            version: v0.0.16
          kubeproxy:
            image: k8s.gcr.io/kube-proxy
            version: v1.21.1
          kuberouter:
            cni:
              image: docker.io/cloudnativelabs/kube-router
              version: v1.2.1
            cniInstaller:
              image: quay.io/k0sproject/cni-node
              version: 0.1.0
          metricsserver:
            image: gcr.io/k8s-staging-metrics-server/metrics-server
            version: v0.3.7
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kuberouter:
            autoMTU: true
          podCIDR: 10.244.0.0/16
          provider: kuberouter
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: false
Tokynet added the bug label Aug 10, 2021
jnummelin (Member) commented

> After finishing the installation, the control plane does not come up.

It is up and running, as you're able to hit the API etc.

> I expected to see a third entry for k8smaster (the controller's hostname).

Nope, the controller node is never registered as a Node object in the kube API. There's no kubelet or container runtime running on the controller node, so it's not really part of the cluster from a workload and networking point of view.
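If you do want the controller to also register as a node and run workloads, the usual options are setting the host's role to controller+worker in the k0sctl config, or installing k0s on that host with the worker enabled; a rough sketch of the latter, run on the controller host:

# sketch: run k0s as a controller with an embedded worker (kubelet + container runtime)
k0s install controller --enable-worker
systemctl start k0scontroller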

The output from describing the metrics-server pod hints that pod networking is not functioning properly:

Readiness probe failed: Get "https://10.244.1.4:4443/healthz": context deadline exceeded

So the kubelet cannot run the probes successfully and is therefore restarting the pod(s). Maybe check whether the other CrashLooping pods are showing similar symptoms.

As it seems to be networking related, I'd advise checking the kube-router logs for clues as to why the networking cannot be set up properly.
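For example (a sketch, using the pod names from the listing above):

# kube-router logs on each worker node
kubectl -n kube-system logs kube-router-5mb2r
kubectl -n kube-system logs kube-router-gcsgh
# compare with the other CrashLooping pods, e.g. coredns
kubectl -n kube-system logs --previous coredns-5ccbdcc4c4-gk865
kubectl -n kube-system describe pod coredns-5ccbdcc4c4-gk865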

jnummelin (Member) commented

Closing as inactive, re-open if this is still an issue.
