
Metrics server keeps crashing #451

Closed
trawler opened this issue Nov 24, 2020 · 7 comments
Assignees: trawler
Labels: bug (Something isn't working), component/metrics-server, need more info (Information insufficient), priority/P2
Milestone: 0.9.1


trawler commented Nov 24, 2020

Version
v0.7.0

Platform
Ubuntu/Debian

What happened?
metrics-server runs for a while, then dies, and I have to remove it and let it be restarted.

Screenshots & Logs

2020-11-24T17:53:16.825277598Z E1124 17:53:16.825059       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:16.861120713Z E1124 17:53:16.860900       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:17.638004313Z E1124 17:53:17.637791       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:node1: unable to fetch metrics from Kubelet node1 (192.168.1.38): request failed - "401 Unauthorized", response: "Unauthorized"
2020-11-24T17:53:17.836393190Z E1124 17:53:17.836165       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:17.871119858Z E1124 17:53:17.870759       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:18.845991472Z E1124 17:53:18.845620       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:18.887209123Z E1124 17:53:18.887017       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:19.863798632Z E1124 17:53:19.863614       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:19.893362556Z E1124 17:53:19.893051       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:20.877012059Z E1124 17:53:20.876595       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:20.900295358Z E1124 17:53:20.900076       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:21.883037436Z E1124 17:53:21.882851       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:21.914752090Z E1124 17:53:21.914567       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:22.895909073Z E1124 17:53:22.895670       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
2020-11-24T17:53:22.926172679Z E1124 17:53:22.925858       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
2020-11-24T17:53:23.154460040Z E1124 17:53:23.153562       1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
2020-11-24T17:53:23.154511607Z E1124 17:53:23.153992       1 errors.go:77] Unauthorized


@trawler trawler added bug Something isn't working component/metrics-server labels Nov 24, 2020
@trawler trawler added this to the 0.8.0 milestone Nov 24, 2020
@jnummelin jnummelin modified the milestones: 0.8.0, 0.9.0 Dec 2, 2020
@trawler trawler self-assigned this Dec 16, 2020
@trawler trawler added the need more info Information insufficient label Dec 16, 2020

trawler commented Dec 17, 2020

So far, I have been unable to reproduce this (the metrics server has been running for 24 hours now without errors). I will try to keep the test environment running for longer.


trawler commented Dec 18, 2020

After 40+ hours of runtime, I see these logs from the metrics server:

E1217 16:10:39.004809       1 errors.go:77] Post https://10.96.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:39.912741       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:39.913666       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:40.913710       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:40.914716       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:41.914912       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:41.915629       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:42.916547       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:42.916937       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:43.917752       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1217 16:10:43.918611       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused

which indicates that metrics-server is having trouble reaching the API server.
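The thread shows two distinct failure modes: 401 Unauthorized (the credential is rejected) and connection refused (the API server is unreachable). A quick way to tell them apart is to count each pattern in captured logs. A minimal sketch, using a few error lines quoted from this thread as sample input (the pod name in the comment is illustrative; substitute your own):

```shell
# A few error lines quoted from this thread, as sample input:
cat > metrics-server.log <<'EOF'
E1124 17:53:16.825059 1 reflector.go:126] Failed to list *v1.Node: Unauthorized
E1124 17:53:23.153992 1 errors.go:77] Unauthorized
E1217 16:10:39.004809 1 errors.go:77] Post https://10.96.0.1:443/apis/...: connect: connection refused
EOF
# Against a live cluster you would capture the logs instead, e.g.:
#   kubectl logs -n kube-system metrics-server-7d4bcb75dd-pq2zl > metrics-server.log

# Count each failure mode:
auth_errs=$(grep -c 'Unauthorized' metrics-server.log)
conn_errs=$(grep -c 'connection refused' metrics-server.log)
echo "unauthorized=${auth_errs} connection_refused=${conn_errs}"
# prints: unauthorized=2 connection_refused=1
```

A high Unauthorized count points at a stale or rejected serviceaccount token; connection-refused lines point at the API server (or konnectivity path) being down at that moment.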

@jnummelin
Member

That's slightly weird. :/

On my longest living k0s "cluster", I see this:

/ # kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5f6546844f-gj2gv   1/1     Running   0          39d
kube-system   calico-node-plsng                          1/1     Running   0          39d
kube-system   coredns-5c98d7d4d8-qnblr                   1/1     Running   0          39d
kube-system   konnectivity-agent-z88fn                   1/1     Running   0          39d
kube-system   kube-proxy-rp2dp                           1/1     Running   0          39d
kube-system   metrics-server-7d4bcb75dd-pq2zl            1/1     Running   0          39d
/ # kubectl logs -n kube-system metrics-server-7d4bcb75dd-pq2zl
I1111 15:53:22.984735       1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1111 15:53:23.437211       1 secure_serving.go:116] Serving securely on [::]:4443

So no issues in metrics-server whatsoever during 39 days.


trawler commented Dec 21, 2020

I suspect this might be related to projectcalico/calico#2322, but it's only a hunch...

@jnummelin jnummelin modified the milestones: 0.9.0, 0.9.1 Dec 23, 2020

anibal-aguila commented Jan 5, 2021

I see the same output as @trawler.
I can confirm the issue is present on FCOS (Fedora CoreOS) as well.
Version
v0.9.1

Platform

NAME=Fedora
VERSION="33.20201201.3.0 (CoreOS)"
ID=fedora
VERSION_ID=33
VERSION_CODENAME=""
PLATFORM_ID="platform:f33"
PRETTY_NAME="Fedora CoreOS 33.20201201.3.0"

k0s output

❯ kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5f6546844f-dh2v2   1/1     Running   0          10h
calico-node-77g5g                          1/1     Running   0          10h
calico-node-dxhjt                          1/1     Running   0          10h
calico-node-zcdq4                          1/1     Running   0          10h
coredns-5c98d7d4d8-64mtw                   1/1     Running   0          10h
konnectivity-agent-8wtg5                   1/1     Running   0          10h
konnectivity-agent-9fnfp                   1/1     Running   0          10h
konnectivity-agent-qg6dz                   1/1     Running   0          10h
kube-proxy-fqcjs                           1/1     Running   0          10h
kube-proxy-j8nrm                           1/1     Running   0          10h
kube-proxy-msfcw                           1/1     Running   0          10h
metrics-server-7d4bcb75dd-zbcmb            0/1     Running   0          10h

pod output

E0105 08:45:47.865044       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
E0105 08:45:48.874434       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Pod: Unauthorized
E0105 08:45:48.874449       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Node: Unauthorized
E0105 08:45:49.677704       1 webhook.go:196] Failed to make webhook authorizer request: Unauthorized
E0105 08:45:49.678618       1 errors.go:77] Unauthorized
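Since both reports end in 401s, one hedged diagnostic is to decode the claims of the serviceaccount JWT that metrics-server authenticates with and check which identity it carries. The helper below, and the secret name in the usage comment, are assumptions for illustration, not something confirmed by this thread:

```shell
# decode_jwt_payload: print the claims (the second dot-separated, base64url
# field) of a JWT, without verifying its signature.
decode_jwt_payload() {
    # base64url -> base64, then restore the stripped '=' padding
    payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    pad=$(( (4 - ${#payload} % 4) % 4 ))
    while [ "$pad" -gt 0 ]; do payload="${payload}="; pad=$((pad - 1)); done
    printf '%s' "$payload" | base64 -d
    echo
}

# Against a live cluster (the secret name is illustrative; list candidates
# with `kubectl get secrets -n kube-system`):
#   decode_jwt_payload "$(kubectl get secret -n kube-system metrics-server-token-abcde \
#       -o jsonpath='{.data.token}' | base64 -d)"
```

If the decoded `sub` claim is not the expected `system:serviceaccount:kube-system:metrics-server`, or the token differs from what the pod has mounted, the 401s are explained by a stale credential rather than an RBAC change.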

@sacesare

Alpine 3.13 (k0s 0.9.1): same issue; the metrics server goes down after some time with the authorization error mentioned above.

@jasmingacic
Contributor

I believe this has been fixed in one of the releases after 0.9.1.
Closing it for now.

5 participants