Panic: "invalid memory address or nil pointer dereference" in daemon_controller.go #698

Closed
sonyafenge opened this issue Sep 9, 2020 · 6 comments

@sonyafenge
Collaborator

What happened:
Ran perf-tests with 5K nodes and workload-controller-manager disabled; got the following panic:

E0909 14:14:34.420852       1 runtime.go:73] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 12740 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x378f120, 0x7207f30)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:69 +0x7b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 +0x82
panic(0x378f120, 0x7207f30)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*PodSpec).Workloads(0xc02589fd38, 0x8, 0x989680, 0xc0391b8a20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/api/core/v1/types.go:3286 +0x89
k8s.io/kubernetes/pkg/scheduler/nodeinfo.calculateResource(0xc02589fc00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x1)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/scheduler/nodeinfo/node_info.go:583 +0x6a
k8s.io/kubernetes/pkg/scheduler/nodeinfo.(*NodeInfo).AddPod(0xc0390e5520, 0xc02589fc00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/scheduler/nodeinfo/node_info.go:502 +0x40
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).simulate(0xc0070898c0, 0xc0391a1000, 0xc0083d1b00, 0xc016f6e000, 0x0, 0x0, 0xc038f895e0, 0xc02b0bd688, 0x4483b2, 0x19)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:1329 +0x203
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).nodeShouldRunDaemonPod(0xc0070898c0, 0xc0083d1b00, 0xc016f6e000, 0x0, 0x1, 0xc0390664a0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:1360 +0xbe
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).podsShouldBeOnNode(0xc0070898c0, 0xc0083d1b00, 0xc045eb5800, 0xc016f6e000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:874 +0x5d
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).manage(0xc0070898c0, 0xc016f6e000, 0xc03e032000, 0x1389, 0x1400, 0xc0064dabd0, 0x9, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:964 +0x12f
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).syncDaemonSet(0xc0070898c0, 0xc00e216b20, 0x19, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:1286 +0x8a3
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).processNextWorkItem(0xc0070898c0, 0xc0059a0b00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:297 +0xea
k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).runWorker(0xc0070898c0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:285 +0x2b
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc04acbc310)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc04acbc310, 0x3b9aca00, 0x0, 0x1, 0xc000deaa20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc04acbc310, 0x3b9aca00, 0xc000deaa20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:276 +0x1d4
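
Reading the trace: the nil dereference happens inside (*PodSpec).Workloads (types.go:3286) while the scheduler's calculateResource computes the pod's resource usage for the daemonset scheduling simulation. As a minimal sketch only, not the actual Arktos code (the CommonInfo/VirtualMachine shapes below are simplified stand-ins), this is the kind of nil guard such an accessor needs when a pod carries no VM workload:

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Arktos types involved; the real
// definitions live in vendor/k8s.io/api/core/v1/types.go.
type ResourceList map[string]string

type CommonInfo struct {
	Name      string
	Resources ResourceList
}

type Container struct {
	Name      string
	Resources ResourceList
}

type VirtualMachine struct {
	Name      string
	Resources ResourceList
}

type PodSpec struct {
	Containers     []Container
	VirtualMachine *VirtualMachine // nil for container-only pods
}

// Workloads returns a unified view over containers and the optional VM.
// Dereferencing spec.VirtualMachine without the nil check below is the kind
// of bug that produces the panic in the stack trace above.
func (spec *PodSpec) Workloads() []CommonInfo {
	if spec == nil {
		return nil
	}
	workloads := make([]CommonInfo, 0, len(spec.Containers)+1)
	for _, c := range spec.Containers {
		workloads = append(workloads, CommonInfo{Name: c.Name, Resources: c.Resources})
	}
	if spec.VirtualMachine != nil { // guard: field may be unset if defaulting did not run
		workloads = append(workloads, CommonInfo{
			Name:      spec.VirtualMachine.Name,
			Resources: spec.VirtualMachine.Resources,
		})
	}
	return workloads
}

func main() {
	spec := &PodSpec{Containers: []Container{{Name: "app"}}} // VirtualMachine left nil
	fmt.Println(len(spec.Workloads()))                       // prints 1 instead of panicking
}
```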

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Arktos version (use kubectl version):
commit 7983fde257f77bdb178197517ae0269711c68f27
Author: Yunwen Bai <yunwen.bai@futurewei.com>
Date:   Tue Sep 8 10:07:28 2020 -0700

    A simple end to end pod with vm type in E2E (#638)

    * add a simple e2e test for vm type
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@sonyafenge
Collaborator Author

Logs can be found under GCP project workload-controller-manager:
sonya-uswest21: /home/sonyali/logs/perf-test/gce-5000/arktos/0908-debug-1a1w1e
sonya-useast1: /home/sonyali/logs/perf-test/gce-5000/arktos/0908-etcd344-1a1w1e

@sonyafenge
Collaborator Author

9/10/2020 [perf-tests][arktos][node5000][workloadDisabled] openfiles-3a0w1e

Another repro of this panic in kube-controller-manager.log.
Logs can be found under GCP project workload-controller-manager:
sonya-uswest2: /home/sonyali/logs/perf-test/gce-5000/arktos/0910-openfiles-3a0w1e

@sonyafenge
Collaborator Author

9/11/2020 [perf-tests][arktos][node5000] load1st-3a0w1e

A 5K run started from load testing reproduced this issue. Logs can be found under GCP project workload-controller-manager on sonya-useast1: /home/sonyali/logs/perf-test/gce-5000/arktos/0911-load1st-3a0w1e

@sonyafenge
Collaborator Author

sonyafenge commented Sep 16, 2020

9/15/2020 [perf-tests][arktos][node5000] issue645-3a0w1e

5K runs yesterday to verify a different issue, all of them using community etcd 3.4.4 with workload-controller disabled:

  1. issue645-3a0w1e with PR: Aggregated watcher recover (#709) and PR: Initialize WorkloadInfo during defaulting (#706); a rough sketch of the defaulting idea follows this list

a. Started from load testing; cluster crashed with “killing connection/stream” and the panic in item b
b. panic in etcd.log: etcdserver/api/v3rpc/watch.go panic: runtime error: invalid memory address or nil pointer dereference (#737)
c. no repro of send on closed channel (#645) or of this panic (#698)

  2. logs can be found under GCP project workload-controller-manager on sonya-uswest2: /home/sonyali/logs/perf-test/gce-5000/arktos/0915-issue645-3a0w1e
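
Since PR #706 above addresses the crash by initializing WorkloadInfo during defaulting, here is a rough, hypothetical sketch of that idea; the SetDefaults_PodSpec name and the WorkloadInfo/PodSpec shapes are assumptions for illustration, not the actual Arktos code:

```go
// Hypothetical sketch of initializing a workload-info field during API
// defaulting, in the style of k8s.io SetDefaults_* functions.
package defaults

type WorkloadInfo struct {
	Type string // e.g. "Container" or "VM"
}

type PodSpec struct {
	WorkloadInfo *WorkloadInfo
}

// SetDefaults_PodSpec ensures WorkloadInfo is never nil by the time
// consumers such as the daemonset controller's scheduling simulation
// dereference it.
func SetDefaults_PodSpec(spec *PodSpec) {
	if spec.WorkloadInfo == nil {
		spec.WorkloadInfo = &WorkloadInfo{Type: "Container"}
	}
}
```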

@sonyafenge
Collaborator Author

9/15/2020 [perf-tests][arktos][node5000] baseline-1a0w1e

5K runs yesterday to verify a different issue, all of them using community etcd 3.4.4 with workload-controller disabled:

  1. baseline-1a0w1e with PR: Aggregated watcher recover (#709) and PR: Initialize WorkloadInfo during defaulting (#706)

a. Started from load testing; cluster crashed with “killing connection/stream”
b. no repro of send on closed channel (#645) or of this panic (#698)

  2. logs can be found under GCP project workload-controller-manager on sonya-useast1: /home/sonyali/logs/perf-test/gce-5000/arktos/0915-baseline-1a0w1e

@sonyafenge
Collaborator Author

Closing this issue because we are replacing workload-controller with the scale-out solution.
