kubeadm panic during phase based init #1382

Closed
tcurdt opened this issue Feb 2, 2019 · 38 comments
Labels: area/UX, kind/bug, priority/important-soon

@tcurdt

tcurdt commented Feb 2, 2019

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:05:53Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/arm"}

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/arm"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
  • Cloud provider or hardware configuration:
    RPi3B+

  • OS (e.g. from /etc/os-release):

PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
HYPRIOT_OS="HypriotOS/armhf"
HYPRIOT_OS_VERSION="v2.0.1"
HYPRIOT_DEVICE="Raspberry Pi"
HYPRIOT_IMAGE_VERSION="v1.9.0"
  • Kernel (e.g. uname -a):
Linux km01 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux
  • Others:

What happened?

I was trying to work around the race conditions mentioned in #413 and #1380 by running kubeadm init in phases. Instead, it crashed on the second call to init.

What you expected to happen?

I should see the join information.

How to reproduce it (as minimally and precisely as possible)?

Fresh install of hypriotos-rpi-v1.9.0 then:

sudo bash <<EOF
curl -sSL https://packagecloud.io/Hypriot/rpi/gpgkey | apt-key add -
curl -sSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update && apt-get install -y kubeadm
sudo kubeadm reset
sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
sudo kubeadm init --skip-phases=control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16
EOF
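
The three sed substitutions above can be mirrored in Go to see exactly what they match. This is only a sketch: patchProbe and the sample manifest fragment are hypothetical, and the starting probe values (15, 8, 15) are assumed for illustration rather than copied from a real kube-apiserver.yaml.

```go
package main

import (
	"fmt"
	"regexp"
)

// probeEdits mirrors the three sed expressions from the reproduction steps.
// The patterns match a fixed number of digits, exactly like the sed versions.
var probeEdits = []struct {
	re   *regexp.Regexp
	repl string
}{
	{regexp.MustCompile(`initialDelaySeconds: [0-9][0-9]`), "initialDelaySeconds: 240"},
	{regexp.MustCompile(`failureThreshold: [0-9]`), "failureThreshold: 18"},
	{regexp.MustCompile(`timeoutSeconds: [0-9][0-9]`), "timeoutSeconds: 20"},
}

// patchProbe (hypothetical helper) applies every substitution to the manifest text.
func patchProbe(manifest string) string {
	for _, e := range probeEdits {
		manifest = e.re.ReplaceAllString(manifest, e.repl)
	}
	return manifest
}

func main() {
	// Assumed sample fragment; the real manifest may carry different values.
	sample := "initialDelaySeconds: 15\nfailureThreshold: 8\ntimeoutSeconds: 15\n"
	fmt.Print(patchProbe(sample))
}
```

Because the patterns match a fixed number of digits, a two-digit failureThreshold such as 10 would come out as 180, so it may be worth inspecting the patched file before running the second init.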

Anything else we need to know?

This is the output with the panic information:

$ sudo kubeadm init --skip-phases=control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
	[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
	[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
	[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.04.0-ce. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [km01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.178.43]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [km01 localhost] and IPs [192.168.178.43 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [km01 localhost] and IPs [192.168.178.43 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xaab708]

goroutine 1 [running]:
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.validateKubeConfig(0xfb953a, 0xf, 0xfc3e7a, 0x17, 0x3034540, 0x68f, 0x7bc)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:236 +0x120
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.createKubeConfigFileIfNotExists(0xfb953a, 0xf, 0xfc3e7a, 0x17, 0x3034540, 0x0, 0xf8160)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:257 +0x90
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.createKubeConfigFiles(0xfb953a, 0xf, 0x3144b40, 0x3527c60, 0x1, 0x1, 0x0, 0x0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:120 +0xf4
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.CreateKubeConfigFile(0xfc3e7a, 0x17, 0xfb953a, 0xf, 0x3144b40, 0x99807501, 0xb9bfcc)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:93 +0xe8
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases.runKubeConfigFile.func1(0xf76bc8, 0x32f2ff0, 0x0, 0x0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/kubeconfig.go:155 +0x168
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1(0x336ee80, 0x0, 0x0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235 +0x160
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll(0x34ecbe0, 0x3527d68, 0x32f2ff0, 0x0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:416 +0x5c
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run(0x34ecbe0, 0x24, 0x35bbdb4)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:208 +0xc8
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1(0x3513400, 0x3227560, 0x0, 0x4)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:141 +0xfc
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0x3513400, 0x32274c0, 0x4, 0x4, 0x3513400, 0x32274c0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:760 +0x20c
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x3512140, 0x3513400, 0x35123c0, 0x300c8e0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:846 +0x210
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(0x3512140, 0x300c0d8, 0x117dec0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:794 +0x1c
k8s.io/kubernetes/cmd/kubeadm/app.Run(0x3034060, 0x0)
	/workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:48 +0x1b0
main.main()
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:29 +0x20

It seems like some nil checks are missing https://github.com/kubernetes/kubernetes/blob/v1.13.3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L236

this can only happen if the kubeconfig phase reads a corrupted kubeconfig file with a missing cluster context, possibly as a side effect of not calling reset.
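
A minimal sketch of the kind of nil check meant here, assuming a kubeconfig is modeled as maps of pointers keyed by name. The Config, Context and Cluster types and the currentCluster helper are simplified, hypothetical stand-ins, not the real client-go or kubeadm code. Looking up a missing key in such a map yields a nil pointer, and reading a field through it is what produces a nil pointer dereference.

```go
package main

import (
	"errors"
	"fmt"
)

// Context, Cluster and Config are simplified, hypothetical stand-ins for the
// kubeconfig types; they are not the real client-go definitions.
type Context struct {
	Cluster string
}

type Cluster struct {
	Server string
}

type Config struct {
	CurrentContext string
	Contexts       map[string]*Context
	Clusters       map[string]*Cluster
}

// currentCluster resolves the cluster referenced by the current context and
// returns an error instead of dereferencing a missing (nil) map entry.
func currentCluster(cfg *Config) (*Cluster, error) {
	if cfg == nil {
		return nil, errors.New("kubeconfig is nil")
	}
	ctx, ok := cfg.Contexts[cfg.CurrentContext]
	if !ok || ctx == nil {
		return nil, fmt.Errorf("current context %q not found", cfg.CurrentContext)
	}
	cluster, ok := cfg.Clusters[ctx.Cluster]
	if !ok || cluster == nil {
		return nil, fmt.Errorf("cluster %q not found for context %q", ctx.Cluster, cfg.CurrentContext)
	}
	return cluster, nil
}

func main() {
	// A config with no contexts, as an empty or truncated file might produce.
	if _, err := currentCluster(&Config{}); err != nil {
		fmt.Println("error:", err)
	}

	// A populated config resolves normally.
	good := &Config{
		CurrentContext: "kubernetes-admin@kubernetes",
		Contexts:       map[string]*Context{"kubernetes-admin@kubernetes": &Context{Cluster: "kubernetes"}},
		Clusters:       map[string]*Cluster{"kubernetes": &Cluster{Server: "https://192.168.178.43:6443"}},
	}
	if c, err := currentCluster(good); err == nil {
		fmt.Println("server:", c.Server)
	}
}
```

With checks like these, a file that loads but lacks the expected context would surface as a readable error rather than a panic.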

But the kubeconfig file looks OK to me

$ sudo cat /etc/kubernetes/admin.conf 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNU1ESXdNakF4TXpBeU5Wb1hEVEk1TURFek1EQXhNekF5TlZvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTUIrCmJ1ckREUGtBR1V5NmM3ZkY5VWI5UlMyRWVqdmx2aHRwRlBGVzVPRGtRL3ZsbFU5b05MR0txdXZjRVVFekJmTnkKcURzQzBsVktoTkFBMWl6TnplZVJEWlRDZ2ZFYitxa3Zib0xGd25hd1A0ZkRKKzVnUndxN0JEM0xYdWNsTFNmeApmczkwb05RaXdYL0hXSjBOUkJZRnN6Zk1iaXZaUSsrRDJjS0FOZm9qSGx2Rm9oU1BqZkVlWmp1NnBtTEhXNlMyCmY4NjJGcnhwSEdOWmhmR3JaTmd1YUFkK0tIM1BCc1IxTThpUFpiMnFjTEN0LzNmMHY2ejc4bUVoL294UC9oUjEKdWVGWmZJWCtpbmxzVXZDM2N3WXZ3VFd6ZnlOT0NSMUJCcUNHRmd4bmt0VVRJd0M3Szc3VHZUcGpnazd5NnAzSQpHMVd3SmVUUERYRXRleGhFTDQwQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFId2NhY0RuK0ZhaldKekJpVExlZmxLeVRSNTgKVm9yKzluZmtMRkNrTWZBcC94b2pwekVKbEJEM0didmh5V21tS2tJNDgvSHZabml1c1g1THNRdXhzamV4bnZTYwppMG9keFZOMTdSOFFiWVovZ0hsREdRanlnYXhvUWN6M1J5MFU3NmJ0U0toQ1VTTko2NEZqeGp1bU9MemVYbkRLCjlsRElPZHZ4VXRXZDVaajc1YmZFRmNyNHJKbEJTK0dZRi9Da2RrdzZtUlpXNCsrYkNPd3RBUGVUemd6bEZtQ1EKZmptM28wQUlNSitvMk9YUjFrRXFlTXo2VDM3b2FsYWNNU1hEeHh1cjBZUmw3NUJ2M2lBOGk0NE5Oei9tNzhOdQpPaW1ONnBVMDFyUWJEVjVBRzJmbndwaURBcGxNbkQ2R0FyZ3R5b3VUREs2ZmlWOXpZaVdkQlBLeFQ5az0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://192.168.178.43:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJYk4rZTR0WFh4and3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB4T1RBeU1ESXdNVE13TWpWYUZ3MHlNREF5TURJd01UTXlNRFJhTURReApGekFWQmdOVkJBb1REbk41YzNSbGJUcHRZWE4wWlhKek1Sa3dGd1lEVlFRREV4QnJkV0psY201bGRHVnpMV0ZrCmJXbHVNSUlCSWpBTkJna3Foa2lHOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQXBnMzBYR0h4U1IwMWdLMGEKc3FFb1hTanFJNXloZVErZ21YcWNJeDRMYUNIWVZJM2VZc29SbTVSTCtDYTNRblJ5aE4vSHVvMkJYUE1MdGlIZwpIR1BlL3VKRkRHOHJxa2xVbHZZSXZDMkE4QVpLVENEUzBFRmNoQ0RhOHhDMGVQUG9jbXdLWTdVRHFkWGIvY2RHCk8yZG9LaWJLeGtGM3dEWjVCUXR4VXgzTDB1bWZDVFFNOWlQYk00aHF3N3N0Rzc5SXE1dUZXU1VxMFNRb0tad0oKbDFzRXpCQ3kveGV2bWIvTG1jLzR5QTVRVGNPK09yejFTdUZReVRxN0NIb1g1T1ZadDRqbk9jQUZpdFhDbWFROAp3OW0wRERvanJLakVrWlZNL1M4aWY3T3hUZ1d5MDVkaGE4VWZ2TStHaFBmVTF1cEJqdFJGcXB2VUIySkp6UEFmCnl4cFJCUUlEQVFBQm95Y3dKVEFPQmdOVkhROEJBZjhFQkFNQ0JhQXdFd1lEVlIwbEJBd3dDZ1lJS3dZQkJRVUgKQXdJd0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFFcmRYbEhyb2w1a0NQR1UyYUNJOUE5a3NuL0k2a1l3RlR6RAo5NFprNVFSQVhUNjlqSy9namY3c3dNL1JxY1RpRnNFQnY0bXpzeGRjNnBydXdHbytab1o3V2VGTTAvNFJNcFZJCm1qVitKbWdlNk14WUkyMWhOZnMydjNNN2RnbVpMRjJsN25yRTNMTVpiMHZMdUJuN2ZKZWxXb0lGSDd3WWFnQnIKeFlWVzZjYzJtWkkzWHVxYTcraWpjNHpJdmpDSjR6cTFiRUdSUlNEQWNwbjhnQjFXWXRoUWd2cHV0cGZGTGlDTApIK0dya1ZCR3FEY3VVbFRJMkJlZXVMMUduRXJsQzYremhDZnY1VStGR2pwS2RwaVN6UkV4T1F4bEJJOEYzQnZVCnp0VTRVVkR3S24vYUFOUm01N3d6dHlTL0FyOGJQUlRrR0psbGpOZVE0bEd1cWtFQTJKaz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcGczMFhHSHhTUjAxZ0swYXNxRW9YU2pxSTV5aGVRK2dtWHFjSXg0TGFDSFlWSTNlCllzb1JtNVJMK0NhM1FuUnloTi9IdW8yQlhQTUx0aUhnSEdQZS91SkZERzhycWtsVWx2WUl2QzJBOEFaS1RDRFMKMEVGY2hDRGE4eEMwZVBQb2Ntd0tZN1VEcWRYYi9jZEdPMmRvS2liS3hrRjN3RFo1QlF0eFV4M0wwdW1mQ1RRTQo5aVBiTTRocXc3c3RHNzlJcTV1RldTVXEwU1FvS1p3Smwxc0V6QkN5L3hldm1iL0xtYy80eUE1UVRjTytPcnoxClN1RlF5VHE3Q0hvWDVPVlp0NGpuT2NBRml0WENtYVE4dzltMEREb2pyS2pFa1pWTS9TOGlmN094VGdXeTA1ZGgKYThVZnZNK0doUGZVMXVwQmp0UkZxcHZVQjJKSnpQQWZ5eHBSQlFJREFRQUJBb0lCQURsWFlrV3drS2lsekg3MQp4OTFkWjFuY01oWkFGVVovemY2UjUyNzlCZ1ZjZ3A2WUt1NUVSeFpKZkg1aHFERHJrMHd0Rm9SbUx3RFE4UDlnCjdVb0FkdFhmZnVhUFVTM0ppc3RpaEp1dXZ2S2p5VzVHZTJYczNDeklSN05kMW1SYUhhKzlmVXozQ2gvUXVOb0cKd1Vyc0ozMCt6aER1TkpNTWZIZndmcDZzRUdGeE9yYnN5WWE3S0l1RWxuQ0FHWXQwalpjcmw2OENKcVJnZEhEbwpwRFZCL2Zub0ZBZi82Ym9Ga1JTckJkeUM5clpqYlZRbmtwT0VpQ0JONCtMS3RIRjlhUXhELytJWXRVeWFrb2tLClNJNWVTZEhhbkl0U2hxaTVCQmtjV3c5cmdhZDJjYWE5TjRMR1Q1N29LSFJmSFV2T1phTDlGZC9xbjJlb285RlAKTXplcVdCVUNnWUVBeGk5Y3FIOEo1eHFmWUlpRXU2Ly9HTjZETXo0ZWlaSUZNRGdzWjhRR21Fc0hhd1ZZWHlRRwpQNjVES0p1ZUI3SWk1eHV2Z2tGR2JJVnBBMmRleEo2YzhtQmo4N2Zoa2s5SGxDb1VpcDBEdU9uVnJJTVR5Uk02CkR5QWNQaUw2MEY4cGFoU2htZ21USHdXYS81N1Vscllxc1N6RW4vVDBuVFFwQ09uYVJFTlVvTzhDZ1lFQTFuOE8Kdkk1OGx4MzBOSTg3UXV3eVNoMmxKUG04bnJUc0ZBbXRNNXB6Z1ovaUc5TUVGU0V0RzZLbnBuNlRrQjR0YzEzQgpiN01SVWZWY0RIQTRwS09TNk1DZHVoTmJLN3pmNjNOMFpMeWtMdzN2aExRYlhrRlBScEtEQm0rc3J2M0V1MEVnCnQwODNSKzdaMjV1aGhYa2I4WU9kaTZpQXk1VytMS2FPRzh0OWhVc0NnWUVBc2dDeUdZalk3U0NsUzMveXI5ejQKbzI2ZnFyTzltOVJ5SW9naG9pV1h3c3VJNHgvTzZzMGhhNnJxR1J3RWlXYi9JRkptaGZoNDkxbXdJMldCNGRtUQpuOFhob0hKbEFST0I5OXIveml3T3Z0UVBuYjJ4VktXWFBTU2JHVmd6ckZuOGlaSDBQN1VmMWZvajZEblJPWGh1CnllbXF4UHl2aEU3b0dHQnFNV3ZFSkRNQ2dZQVYxV01ib0dsZ1BJVlNJRTVJOXAvNzJWNnBEOTY2VFBKRzYrRTgKZ25sRmRZL2ZnekJFTWxkVUc4OXk3Q2w3SHdkRFdnVEpxUEdYWlNGVWhzdk5QblZDeWZDRU0xb3hibzFnZXlVYQo1L1RTY1ZtektWNHJ6dndSMC9JUVlxZXlQRlNkTnZqc2o5eXhyc2R3U2p3N3lPTW1SMTV2Qzl6b1hEcTZjczIrCldJMVRWd0tCZ1FDbWRpeG9nTXM0WkR
3dFl4c3BBZzRqYUdRMURJck52bWJEcEl4eFhYNXBqSmFSWXU2VnljZk0KQkZJZmFtTkJpODNadDBIWkdERkdOdUdCSEJFTys4L1k4NWZDNWgySlM0MTBjUGhoVkoxWSs5Q0NpOGgzREZ2Swo5SWRzNkR0MUlCRFlsejFuV2p4cVcyb01zaGxZSy9BSkpYbGxRVXR3ZEJhczc4bkRvdkplYWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=

neolit123 added this to the v1.14 milestone Feb 2, 2019
neolit123 added the kind/bug, priority/critical-urgent and area/UX labels Feb 2, 2019
neolit123 self-assigned this Feb 2, 2019

@neolit123
Member

I cannot reproduce this problem on an Ubuntu 17.10 x86_64 setup:

sudo kubeadm reset
sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
sudo kubeadm init --skip-phases=control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16

execution continues fine after writing this file:

[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
...
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3-beta.0", GitCommit:"c6d339953bd4fd8c021a6b5fb46d7952b30be9f9", GitTreeState:"clean", BuildDate:"2019-02-02T02:09:01Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"linux/amd64"}

init succeeds:

...
You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.0.102:6443 --token gzdxwi.7kvz73t7xjphd8fa --discovery-token-ca-cert-hash sha256:bb2a181b6cebd44482fb14a061acd7c2f3984425be0f5d5a52527ccda70aa0d3

some questions:

  • do you have enough disk space, 4 GB RAM and 2 Cores of CPU on this RPI?
  • are you facing the same issue if you remove /etc/kubernetes completely?
  • do you have another RPI setup where you can try the same?

please note that we don't have a test signal for ARM, so ARM support is experimental.

@tcurdt
Author

tcurdt commented Feb 2, 2019

Hmm. What's the content of the file for you?

As for the questions:

  • There is more than enough free disk space (10GB+).
  • The RPi3B+ is a quad core but only has 1GB of RAM.
  • I've run this again while monitoring the RAM usage. It never got remotely close to running out of memory.
  • I can try with another RPi3 (but I highly doubt that will change much).

@neolit123
Member

> What's the content of the file for you?

a properly populated kubeconfig file for the scheduler.

@tcurdt
Author

tcurdt commented Feb 2, 2019

I am going through the code but so far I just don't see how it could fail writing after https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L93

@tcurdt
Author

tcurdt commented Feb 2, 2019

Hmmmmmmmm. It worked on the other RPi

$ sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
$ sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
$ sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
$ sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
$ sudo kubeadm init --v=1 --skip-phases=control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16
I0202 04:26:49.363801    3742 feature_gate.go:206] feature gates: &{map[]}
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
I0202 04:26:49.364763    3742 checks.go:572] validating Kubernetes and kubeadm version
I0202 04:26:49.364880    3742 checks.go:171] validating if the firewall is enabled and active
I0202 04:26:49.409686    3742 checks.go:208] validating availability of port 6443
I0202 04:26:49.410032    3742 checks.go:208] validating availability of port 10251
I0202 04:26:49.410148    3742 checks.go:208] validating availability of port 10252
I0202 04:26:49.410256    3742 checks.go:283] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
  [WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
I0202 04:26:49.410444    3742 checks.go:283] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
  [WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
I0202 04:26:49.410539    3742 checks.go:283] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
  [WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
I0202 04:26:49.410630    3742 checks.go:283] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0202 04:26:49.410671    3742 checks.go:430] validating if the connectivity type is via proxy or direct
I0202 04:26:49.410761    3742 checks.go:466] validating http connectivity to first IP address in the CIDR
I0202 04:26:49.410829    3742 checks.go:466] validating http connectivity to first IP address in the CIDR
I0202 04:26:49.410883    3742 checks.go:104] validating the container runtime
I0202 04:26:49.862102    3742 checks.go:130] validating if the service is enabled and active
I0202 04:26:49.934970    3742 checks.go:332] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0202 04:26:49.935189    3742 checks.go:332] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0202 04:26:49.935285    3742 checks.go:644] validating whether swap is enabled or not
I0202 04:26:49.935387    3742 checks.go:373] validating the presence of executable ip
I0202 04:26:49.935512    3742 checks.go:373] validating the presence of executable iptables
I0202 04:26:49.935611    3742 checks.go:373] validating the presence of executable mount
I0202 04:26:49.935767    3742 checks.go:373] validating the presence of executable nsenter
I0202 04:26:49.935893    3742 checks.go:373] validating the presence of executable ebtables
I0202 04:26:49.936003    3742 checks.go:373] validating the presence of executable ethtool
I0202 04:26:49.936107    3742 checks.go:373] validating the presence of executable socat
I0202 04:26:49.936196    3742 checks.go:373] validating the presence of executable tc
I0202 04:26:49.936332    3742 checks.go:373] validating the presence of executable touch
I0202 04:26:49.936409    3742 checks.go:515] running all checks
  [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.04.0-ce. Latest validated version: 18.06
I0202 04:26:50.116640    3742 checks.go:403] checking whether the given node name is reachable using net.LookupHost
I0202 04:26:50.116718    3742 checks.go:613] validating kubelet version
I0202 04:26:50.434161    3742 checks.go:130] validating if the service is enabled and active
I0202 04:26:50.484312    3742 checks.go:208] validating availability of port 10250
I0202 04:26:50.484604    3742 checks.go:208] validating availability of port 2379
I0202 04:26:50.484717    3742 checks.go:208] validating availability of port 2380
I0202 04:26:50.484824    3742 checks.go:245] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0202 04:26:50.913249    3742 checks.go:839] pulling k8s.gcr.io/kube-apiserver:v1.13.3
I0202 04:27:29.700381    3742 checks.go:839] pulling k8s.gcr.io/kube-controller-manager:v1.13.3
I0202 04:27:53.618297    3742 checks.go:839] pulling k8s.gcr.io/kube-scheduler:v1.13.3
I0202 04:28:10.541033    3742 checks.go:839] pulling k8s.gcr.io/kube-proxy:v1.13.3
I0202 04:28:19.680453    3742 checks.go:839] pulling k8s.gcr.io/pause:3.1
I0202 04:28:22.317787    3742 checks.go:839] pulling k8s.gcr.io/etcd:3.2.24
I0202 04:29:12.484547    3742 checks.go:839] pulling k8s.gcr.io/coredns:1.2.6
I0202 04:29:23.426057    3742 kubelet.go:71] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0202 04:29:23.958235    3742 kubelet.go:89] Starting the kubelet
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0202 04:29:24.414430    3742 certs.go:113] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [km01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.178.41]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0202 04:29:39.845477    3742 certs.go:113] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
I0202 04:30:06.572764    3742 certs.go:113] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [km01 localhost] and IPs [192.168.178.41 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [km01 localhost] and IPs [192.168.178.41 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0202 04:30:40.419908    3742 certs.go:72] creating a new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0202 04:30:50.639994    3742 kubeconfig.go:92] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0202 04:30:54.436081    3742 kubeconfig.go:92] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0202 04:31:00.520396    3742 kubeconfig.go:92] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0202 04:31:04.669076    3742 kubeconfig.go:92] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0202 04:31:07.085092    3742 local.go:60] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0202 04:31:07.085194    3742 waitcontrolplane.go:89] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 93.513465 seconds
I0202 04:32:40.622238    3742 uploadconfig.go:114] [upload-config] Uploading the kubeadm ClusterConfiguration to a ConfigMap
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0202 04:32:40.758253    3742 uploadconfig.go:128] [upload-config] Uploading the kubelet component config to a ConfigMap
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
I0202 04:32:40.848825    3742 uploadconfig.go:133] [upload-config] Preserving the CRISocket information for the control-plane node
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "km01" as an annotation
[mark-control-plane] Marking the node km01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node km01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: f36l6v.10hbkeh7af1vtane
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
I0202 04:32:42.041777    3742 clusterinfo.go:46] [bootstraptoken] loading admin kubeconfig
I0202 04:32:42.045796    3742 clusterinfo.go:54] [bootstraptoken] copying the cluster from admin.conf to the bootstrap kubeconfig
I0202 04:32:42.047641    3742 clusterinfo.go:66] [bootstraptoken] creating/updating ConfigMap in kube-public namespace
I0202 04:32:42.058852    3742 clusterinfo.go:80] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.178.41:6443 --token f36l6v.10hbkeh7af1vtane --discovery-token-ca-cert-hash sha256:e38e0f37463915d5d30a846d4cdc0d2ea2b0abefed9c98ce692ab367555ec7ea

I'll try another fresh install on the other one.

@tcurdt
Author

tcurdt commented Feb 2, 2019

Still the same on the other one. Crazy!
The order in the log is slightly different.

$ sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
$ sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
$ sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
$ sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
$ sudo kubeadm init --v=1 --skip-phases=control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16
I0202 05:06:45.782055    3351 feature_gate.go:206] feature gates: &{map[]}
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
I0202 05:06:45.784710    3351 checks.go:572] validating Kubernetes and kubeadm version
I0202 05:06:45.785010    3351 checks.go:171] validating if the firewall is enabled and active
I0202 05:06:45.872786    3351 checks.go:208] validating availability of port 6443
I0202 05:06:45.873764    3351 checks.go:208] validating availability of port 10251
I0202 05:06:45.874075    3351 checks.go:208] validating availability of port 10252
I0202 05:06:45.874415    3351 checks.go:283] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
  [WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
I0202 05:06:45.874950    3351 checks.go:283] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
  [WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
I0202 05:06:45.875295    3351 checks.go:283] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
  [WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
I0202 05:06:45.875571    3351 checks.go:283] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0202 05:06:45.875709    3351 checks.go:430] validating if the connectivity type is via proxy or direct
I0202 05:06:45.875902    3351 checks.go:466] validating http connectivity to first IP address in the CIDR
I0202 05:06:45.876126    3351 checks.go:466] validating http connectivity to first IP address in the CIDR
I0202 05:06:45.876304    3351 checks.go:104] validating the container runtime
I0202 05:06:46.673342    3351 checks.go:130] validating if the service is enabled and active
I0202 05:06:46.811415    3351 checks.go:332] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0202 05:06:46.811814    3351 checks.go:332] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0202 05:06:46.811992    3351 checks.go:644] validating whether swap is enabled or not
I0202 05:06:46.812215    3351 checks.go:373] validating the presence of executable ip
I0202 05:06:46.812439    3351 checks.go:373] validating the presence of executable iptables
I0202 05:06:46.812607    3351 checks.go:373] validating the presence of executable mount
I0202 05:06:46.812907    3351 checks.go:373] validating the presence of executable nsenter
I0202 05:06:46.813138    3351 checks.go:373] validating the presence of executable ebtables
I0202 05:06:46.813358    3351 checks.go:373] validating the presence of executable ethtool
I0202 05:06:46.813630    3351 checks.go:373] validating the presence of executable socat
I0202 05:06:46.813826    3351 checks.go:373] validating the presence of executable tc
I0202 05:06:46.814069    3351 checks.go:373] validating the presence of executable touch
I0202 05:06:46.814219    3351 checks.go:515] running all checks
  [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.04.0-ce. Latest validated version: 18.06
I0202 05:06:47.048462    3351 checks.go:403] checking whether the given node name is reachable using net.LookupHost
I0202 05:06:47.048592    3351 checks.go:613] validating kubelet version
I0202 05:06:47.595796    3351 checks.go:130] validating if the service is enabled and active
I0202 05:06:47.651204    3351 checks.go:208] validating availability of port 10250
I0202 05:06:47.651472    3351 checks.go:208] validating availability of port 2379
I0202 05:06:47.651615    3351 checks.go:208] validating availability of port 2380
I0202 05:06:47.651727    3351 checks.go:245] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0202 05:06:48.016166    3351 checks.go:839] pulling k8s.gcr.io/kube-apiserver:v1.13.3
I0202 05:07:31.930706    3351 checks.go:839] pulling k8s.gcr.io/kube-controller-manager:v1.13.3
I0202 05:07:53.557135    3351 checks.go:839] pulling k8s.gcr.io/kube-scheduler:v1.13.3
I0202 05:08:02.157734    3351 checks.go:839] pulling k8s.gcr.io/kube-proxy:v1.13.3
I0202 05:08:12.517983    3351 checks.go:839] pulling k8s.gcr.io/pause:3.1
I0202 05:08:15.114442    3351 checks.go:839] pulling k8s.gcr.io/etcd:3.2.24
I0202 05:09:22.376566    3351 checks.go:839] pulling k8s.gcr.io/coredns:1.2.6
I0202 05:09:32.810381    3351 kubelet.go:71] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0202 05:09:33.227122    3351 kubelet.go:89] Starting the kubelet
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0202 05:09:33.884529    3351 certs.go:113] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [km01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.178.43]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0202 05:10:16.286602    3351 certs.go:113] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
I0202 05:10:51.727753    3351 certs.go:113] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [km01 localhost] and IPs [192.168.178.43 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [km01 localhost] and IPs [192.168.178.43 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
I0202 05:11:37.842439    3351 certs.go:72] creating a new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0202 05:11:53.841429    3351 kubeconfig.go:92] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0202 05:11:59.272573    3351 kubeconfig.go:92] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0202 05:12:04.958869    3351 kubeconfig.go:92] creating kubeconfig file for controller-manager.conf
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xaab708]

goroutine 1 [running]:
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.validateKubeConfig(0xfb953a, 0xf, 0xfc3e7a, 0x17, 0x24943f0, 0x68b, 0x7bc)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:236 +0x120
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.createKubeConfigFileIfNotExists(0xfb953a, 0xf, 0xfc3e7a, 0x17, 0x24943f0, 0x0, 0x26d0000)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:257 +0x90
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.createKubeConfigFiles(0xfb953a, 0xf, 0x27f38c0, 0x2a99c60, 0x1, 0x1, 0x0, 0x0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:120 +0xf4
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.CreateKubeConfigFile(0xfc3e7a, 0x17, 0xfb953a, 0xf, 0x27f38c0, 0x7211a101, 0xb9bfcc)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:93 +0xe8
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases.runKubeConfigFile.func1(0xf76bc8, 0x25f4820, 0x0, 0x0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/kubeconfig.go:155 +0x168
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1(0x26a6800, 0x0, 0x0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235 +0x160
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll(0x26b1270, 0x2a99d68, 0x25f4820, 0x0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:416 +0x5c
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run(0x26b1270, 0x24, 0x2933db4)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:208 +0xc8
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1(0x2689b80, 0x249c5d0, 0x0, 0x5)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:141 +0xfc
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0x2689b80, 0x249c5a0, 0x5, 0x6, 0x2689b80, 0x249c5a0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:760 +0x20c
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x2688140, 0x2689b80, 0x2688780, 0x281e3b0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:846 +0x210
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(0x2688140, 0x240c0d8, 0x117dec0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:794 +0x1c
k8s.io/kubernetes/cmd/kubeadm/app.Run(0x2494000, 0x0)
  /workspace/anago-v1.13.3-beta.0.37+721bfa751924da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:48 +0x1b0
main.main()
  _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:29 +0x20

@timothysc timothysc added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Feb 7, 2019
@RA489
Contributor

RA489 commented Feb 14, 2019

I tried this on CentOS, but execution continued past [kubeconfig] and init succeeded.

@tcurdt
Author

tcurdt commented Feb 14, 2019

@RA489 Thanks for trying, but this seems highly sensitive to the environment it runs in.
It failed on an RPi3B+ but succeeded on an RPi3B.

It even worked on the RPi3B+ when I manually ran the init as individual phases (see issue #1380).

@bart0sh

bart0sh commented Feb 14, 2019

@tcurdt @neolit123 I was not able to reproduce it in my environment.

After looking at the traces I can see the difference. It breaks here https://github.com/kubernetes/kubernetes/blob/v1.13.3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L235 which means that it was able to load the configuration file.

However, on my machine it always exits here https://github.com/kubernetes/kubernetes/blob/v1.13.3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L224 which means that the configuration file doesn't exist.

@tcurdt if you can still reproduce this issue, can you do the following:

  • run these commands:
sudo kubeadm reset
sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
  • run sudo ls /etc/kubernetes/ and show the output here
  • run sudo cat /etc/kubernetes/*.conf and show the output here

P.S. This is a very interesting issue. It would be great to find its cause and fix it.

@neolit123
Member

@bart0sh i suspect memory corruption on ARM - but this could be isolated to the RPI CPU.
we've seen similar problems caused by the go compiler for ARM in older versions - e.g. illegal instructions.

i still have a plan to send one minor patch related to this.

@bart0sh

bart0sh commented Feb 14, 2019

@neolit123 it could be so, but it doesn't explain the fact that it was able to load config file, does it?

@neolit123
Member

it loads a zero-byte config, which is a valid operation.
but from there, interacting with this config causes a panic.
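
To illustrate the mechanism described above, here is a minimal Go sketch. The `Config` and `Context` types are simplified, hypothetical stand-ins for the real kubeconfig types, and `clusterUnchecked` is an illustrative name, not a kubeadm function. It only shows that looking up a missing key in a map of pointers yields nil, and that reading a field through that nil pointer panics.

```go
package main

import "fmt"

// Context and Config are hypothetical, simplified stand-ins for the
// kubeconfig types; only the fields relevant to the panic are modeled.
type Context struct {
	Cluster string
}

type Config struct {
	CurrentContext string
	Contexts       map[string]*Context
}

// clusterUnchecked reads the cluster of the current context without
// checking that the context exists (assumed shape of the failing access).
func clusterUnchecked(c *Config) string {
	// A missing key yields a nil *Context; reading .Cluster then panics.
	return c.Contexts[c.CurrentContext].Cluster
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// A config decoded from an empty file has no contexts at all.
	empty := &Config{Contexts: map[string]*Context{}}
	fmt.Println("empty config panics:",
		panics(func() { _ = clusterUnchecked(empty) }))
}
```

Loading succeeds because an empty document is still a valid (empty) config; the failure only appears on the first unchecked access.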

@bart0sh

bart0sh commented Feb 14, 2019

@neolit123 where does the zero-byte config come from? I don't see it happening in my setup.

@bart0sh

bart0sh commented Feb 14, 2019

reproduced with empty config:

[kubeconfig] Writing "controller-manager.conf" kubeconfig file
>>> runKubeConfigFile 2 scheduler.conf
>>> kubeConfigFileName: scheduler.conf
>>> createKubeConfigFileIfNotExists: /etc/kubernetes/scheduler.conf
>>>> validateKubeConfig: /etc/kubernetes/scheduler.conf &{  {false map[]} map[] map[] map[]  map[]} 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x108702b]

goroutine 1 [running]:
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.validateKubeConfig(0x1705755, 0xf, 0x1705215, 0xe, 0xc0000387e0, 0x0, 0x41e)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:240 +0x2cb
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.createKubeConfigFileIfNotExists(0x1705755, 0xf, 0x1705215, 0xe, 0xc0000387e0, 0x0, 0xc0005399b0)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:261 +0x18f
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.createKubeConfigFiles(0x1705755, 0xf, 0xc0001aefc0, 0xc0005b3830, 0x1, 0x1, 0x0, 0x0)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:121 +0x1f1
k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig.CreateKubeConfigFile(0x1705215, 0xe, 0x1705755, 0xf, 0xc0001aefc0, 0x0, 0x0)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go:93 +0x130
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases.runKubeConfigFile.func1(0x16bef40, 0xc0003a6510, 0x0, 0x0)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/kubeconfig.go:157 +0x24f
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1(0xc000474f00, 0x0, 0x0)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235 +0x1e9
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll(0xc00057e510, 0xc0005b3a90, 0xc0003a6510, 0x0)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:416 +0x6e
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run(0xc00057e510, 0x1, 0x1)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:208 +0x107
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1(0xc000382000, 0xc00037db80, 0x0, 0x4)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:142 +0x1c3
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc000382000, 0xc00037db00, 0x4, 0x4, 0xc000382000, 0xc00037db00)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:760 +0x2cc
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000382500, 0xc000382000, 0xc0003aa000, 0xc0000c4540)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:846 +0x2fd
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(0xc000382500, 0xc00000c010, 0x18c9f40)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:794 +0x2b
k8s.io/kubernetes/cmd/kubeadm/app.Run(0xc000038240, 0x18b)
	/home/ed/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:48 +0x202
main.main()
	_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:29 +0x33

@bart0sh

bart0sh commented Feb 14, 2019

@neolit123 Do you have any idea why there is an empty config in @tcurdt's setup?

@neolit123
Member

@bart0sh

Do you have any idea why there is an empty config in @tcurdt's setup?

from my earlier comment:

i suspect memory corruption on ARM - but this could be isolated to the RPI CPU.

we need to reproduce and debug this on Raspberry PI. if you don't have one, just leave this issue for now.
AFAIK, ARM desktop machines are not affected by this bug or at least we haven't seen reports.

@bart0sh

bart0sh commented Feb 14, 2019

@neolit123 Would it make sense to fix it by checking if config.Contexts map has the expectedCtx/currentCtx before using it?

we can actually investigate it further with @tcurdt's help and find out where those empty or broken files come from.

@neolit123
Member

Would it make sense to fix it by checking if config.Contexts map has the expectedCtx/currentCtx before using it?

yes, my idea was to apply a similar change across kubeadm (AFAIK we assume a valid config like that in more than one place). but as you can understand this will not fix the problem, only the panic.

please feel free to send a patch for the above if you'd like.
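
A rough sketch of the kind of guard discussed above, written in Go with hypothetical stand-in types (`Config`, `Context`) and a hypothetical helper name (`clusterForCurrentContext`). It is not the actual kubeadm patch; it only shows how a missing context can be turned into an error instead of a nil pointer dereference.

```go
package main

import (
	"errors"
	"fmt"
)

// Context and Config are hypothetical, simplified stand-ins for the
// kubeconfig types used by kubeadm.
type Context struct {
	Cluster string
}

type Config struct {
	CurrentContext string
	Contexts       map[string]*Context
}

// clusterForCurrentContext returns the cluster name referenced by the
// current context, or an error when the config or the context is missing.
func clusterForCurrentContext(c *Config) (string, error) {
	if c == nil {
		return "", errors.New("kubeconfig is nil")
	}
	ctx, ok := c.Contexts[c.CurrentContext]
	if !ok || ctx == nil {
		return "", fmt.Errorf("kubeconfig has no context %q", c.CurrentContext)
	}
	return ctx.Cluster, nil
}

func main() {
	// A zero-value config models what an empty file would decode to.
	_, err := clusterForCurrentContext(&Config{})
	fmt.Println(err)
}
```

As noted in the reply, a check like this only replaces the panic with a readable error; it does not explain where the empty file comes from.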

@timothysc timothysc modified the milestones: v1.14, Next Feb 14, 2019
@tcurdt
Author

tcurdt commented Feb 14, 2019

@bart0sh Have a look here #1380 (comment)

...and also read the progress leading up to it.

Unfortunately I don't have the full output of sudo ls /etc/kubernetes/ and sudo cat /etc/kubernetes/*.conf. I would have to re-do that. But the interesting part from the initial issue was that /etc/kubernetes/scheduler.conf had a size of 0.

Of course there is no way to rule out an ARM memory corruption, but given that it is very reproducible (at least on my RPi) and that it works when all phases are run separately, I'd rather bet on some kind of race condition. I just wouldn't expect a memory corruption to be that reproducible. But I don't know the Go compiler/runtime well enough to make an educated guess.

The time I can spend on this is limited but I am happy to help to further dig into this.

@neolit123
Member

neolit123 commented Feb 14, 2019

@tcurdt can you please run kubeadm through something like delve or your debugger of choice:
https://github.com/go-delve/delve

also please see if your RPI distro has valgrind and run the binary through that.

@bart0sh

bart0sh commented Feb 14, 2019

@tcurdt Yes, the issue is that after kubeadm reset and kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16, zero-size conf files appear in /etc/kubernetes. In my setup I don't see them there. So the question is: when are they created? I believe kubeadm reset should remove all of them. Can you confirm that they're created by kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16?

@gbailey46

Works for me:

sudo kubeadm init phase control-plane all --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.86.47

sudo sed -i 's/failureThreshold: 8/failureThreshold: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 720/' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml

sudo kubeadm init --skip-phases=control-plane --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.86.47 --ignore-preflight-errors=all --dry-run

sudo cp -dpR /tmp/kubeadm-init-dryrun707341788/controller-manager.conf /etc/kubernetes/.
sudo cp -dpR /tmp/kubeadm-init-dryrun707341788/scheduler.conf /etc/kubernetes/.
sudo cp -dpR /tmp/kubeadm-init-dryrun707341788/ca.key /etc/kubernetes/pki/.
sudo cp -dpR /tmp/kubeadm-init-dryrun707341788/ca.crt /etc/kubernetes/pki/.


sudo kubeadm init --skip-phases=control-plane --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.86.47 --ignore-preflight-errors=all

@tcurdt
Author

tcurdt commented Feb 18, 2019

@gbailey46 can you please also add the exact environment you ran this on. Otherwise "works for me" is not exactly helpful.

@gbailey46

Rpi3B

pi@pi3:~ $ sudo cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@pi3:~ $ uname -a
Linux pi3 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux
pi@pi3:~ $

@neolit123
Member

@gbailey46
does it consistently work for you - e.g. multiple consecutive times?

@tcurdt
what differences do you have with the above setup?

@tcurdt
Author

tcurdt commented Feb 18, 2019

Well, it worked for me on a RPi3B, too. It did not work on a Rpi3B+.

@gbailey46

Yes it works repeatedly.
I realised that the controller-manager.conf and scheduler.conf files were 0 bytes and that the init would use an existing file (and CA) if one already existed, hence potentially skipping the file create/write that seems to segfault.
On a whim I tried --dry-run and it completed without segfault. So I used the relevant CA and .conf from the --dry-run.

@neolit123
Member

neolit123 commented Feb 18, 2019

so it technically exhibits the same SIGSEGV behavior.

i also see the CPUs for the two boards are the same;
https://www.datenreise.de/en/raspberry-pi-3b-and-3b-in-comparison/

can someone test on an ARM board that is not a Cortex-A53 (if there is such a RPI even)?

also we still need someone to help debug the root of the problem.

@tcurdt
Author

tcurdt commented Feb 18, 2019

I was out of action the past few days but it's on my todo to have another closer look.

@bart0sh

bart0sh commented Feb 18, 2019

@gbailey46

I realised that the controller-manager.conf and scheduler.conf file were 0

Can you tell when they're created? Was it after running kubeadm init phase control-plane all ?

@gbailey46

They are created when you execute:

sudo kubeadm init --skip-phases=control-plane --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.86.47 --ignore-preflight-errors=all

@bart0sh

bart0sh commented Feb 18, 2019

@gbailey46 thanks. That's very interesting. Looks like a race condition to me. I don't see where in the code that could happen. will look again.

@tcurdt
Author

tcurdt commented Mar 7, 2019

@neolit123 I was just trying to give it another shot, but apparently delve does not support ARM and gdb also seems to have problems. I did get valgrind installed though.

Anyone willing to pair on this via IRC/discord/whatever?

@neolit123
Member

neolit123 commented Mar 7, 2019

hi, we are prepping for the 1.14 release and i won't have time anytime soon.
but please do post updates.

@tcurdt
Author

tcurdt commented Mar 7, 2019

@neolit123 too bad. Someone else? Some other suggestion for a debugger?

In theory this should just work:

sudo kubeadm reset
sudo kubeadm init --pod-network-cidr 10.244.0.0/16

So shall I run this through valgrind

sudo kubeadm reset
sudo valgrind --leak-check=yes kubeadm init --pod-network-cidr 10.244.0.0/16

or is the primary objective to find when the file is created with size 0? So I run through all the phases and list the content?

@neolit123
Member

if the debuggers are not very useful i would start adding fmt.Print(...) calls in a lot of places, until i find where/why that config ends up being zero.

@bart0sh

bart0sh commented Mar 7, 2019

@tcurdt

Someone else?

I'd be happy to help you with this. I'm Ed@kubernetes.slack.com

@tcurdt
Author

tcurdt commented Mar 26, 2019

Big thanks to @bart0sh. Unfortunately we could no longer reproduce it. Neither with the current version (maybe I should have done a full re-install instead of just a reset) nor with the latest master.
I hate it when bugs just "disappear" but well - closing for now.
Thanks for all the help.
