Kube-dns crashed when weave plugin used #3239
Comments
Does anything work? Can you reach the outside world, or ping one pod from another by its pod IP address? I'll note you have a lot of these warnings, going on for ~30 minutes:
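(A hedged sketch of the check @bboreham suggests, not taken from the thread: the pod names test-pod-1/test-pod-2 are placeholders, and it assumes the test image ships ping.)
# kubectl get pods -o wide                       # note each test pod's IP and node
# kubectl exec test-pod-1 -- ping -c 3 <ip-of-test-pod-2>
# kubectl exec test-pod-1 -- ping -c 3 8.8.8.8   # outside-world check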
@bboreham Thanks for the fast answer. I created two test pods.
Let me know if you need additional information.
@cynepco3hahue You have specified that flag explicitly. Mind trying to bootstrap the cluster with kubeadm without passing it?
@brb I tried to run
But I have the same result.
@cynepco3hahue Thanks for the experiments. Is it possible to get SSH access to your machine? If not, I'm interested in the following:
@brb It is locally running
TCP dumps
I'm not sure why, but when I used
This suggests that ICMP traffic is enabled after the iface entered promiscuous mode (due to tcpdump). Could you verify whether HTTP traffic is enabled as well by creating an nginx Pod and trying to curl it? If it fails, then it is a good indicator that you might be suffering from leaking netns (#2842), which could explain why promisc mode enabled the traffic.
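(A minimal sketch of that check; the pod name and pod IP below are placeholders, not taken from the thread.)
# kubectl run nginx --image=nginx --restart=Never   # create a throwaway nginx pod
# kubectl get pod nginx -o wide                     # note its pod IP
# curl http://<nginx-pod-ip>:80                     # run from the host, with tcpdump NOT attached to the weave bridge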
I created a simple nginx pod:
[root@master ~]# curl http://10.32.0.2:80
Thanks. Is it the same when tcpdump is running?
With tcpdump everything works fine.
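(The exact tcpdump invocation isn't quoted above; a typical one, plus a variant that avoids the promiscuous-mode side effect, would look like this.)
# tcpdump -i weave -n icmp      # attaching tcpdump puts the bridge into promiscuous mode
# tcpdump -p -i weave -n icmp   # -p asks tcpdump NOT to enable promiscuous mode, useful for comparison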
Thanks. I was expecting curl to fail, but it didn't. Could you share the VM image or point me to a Vagrantfile that I could run and debug myself?
You can check it via GitHub: https://github.com/kubevirt/kubevirt/blob/master/Vagrantfile
Hello, I've hit a similar problem when trying to use a Kubernetes cluster with the Weave Net plugin.
In both cases I got a network between containers, and the containers could see each other and communicate via their Weave IPs.
ARP with tcpdump:
I've solved the issue by using another network plugin: Flannel.
Hi @dostoevskoy, thanks for the info. What is your distro, kernel version and Weave version?
Distribution: SLES 12.0
Any update?
@zhoulouzi What exactly is the issue you are running into? Do you see similar symptoms, with kube-dns crashing?
My hypothesis is that on old kernels (< v4.0) the promiscuous mode setting of the weave bridge gets reset. However, I haven't had a chance to validate it.
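(A quick way to check this on an affected node, as a sketch.)
# ip -d link show dev weave | grep promiscuity   # 0 means the bridge is not in promiscuous mode
# dmesg | grep -i "promiscuous mode"             # the kernel logs every enter/leave transition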
@brb I just encountered the same issue described above. My kernel is an older one. I'm running Kubernetes inside a DinD container, so it's easy for me to tear everything down and start from scratch. Here's the dmesg output following the deployment of weave:
At this point all pods relying on 10.96.0.1:443 to access the Kubernetes API will fail (such as the Kubernetes dashboard, CoreDNS or kube-dns). The actual device 'weave' is not set in promisc mode. If I then run:
I get the following in dmesg:
and the 10.96.0.1:443 endpoint becomes accessible from the pods (as are any IPs assigned to the host, via ping).
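(The exact command is not quoted above; the usual way to force this, stated as an assumption, is the following.)
# ip link set dev weave promisc on   # likely equivalent of the step described here
# dmesg | tail                       # expect "device weave entered promiscuous mode"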
@drake7707 Thanks for the info. |
@drake7707 Can you reliably reproduce the issue? Do you use https://github.com/kubernetes-sigs/kubeadm-dind-cluster? Which version of k8s and Weave Net?
@brb Yes, I encountered it each time I set it up. By now I'm using a heavily altered fork of kubeadm-dind-cluster with a lot of things tacked on, but I think it should occur with the original script as well. I noticed that the other hosts I provisioned had varying kernel versions. Those with
Kubernetes version: v1.11.0
@drake7707 Did you just enable the promiscuous mode for the
@brb Nope, just the master, and only the weave bridge (though the master is the one that had an older Linux kernel). Technically I didn't even need any worker nodes. When deploying just the master I tried both CoreDNS and kube-dns pods, and both still failed to connect to the Kubernetes API (10.96.0.1). As soon as I enabled promisc on the weave interface inside the DinD container, they could connect.
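(In a kubeadm-dind-cluster setup the weave bridge lives inside the node container; assuming the default master container name kube-master, the equivalent step would be something like this.)
# docker exec kube-master ip link set dev weave promisc on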
Fixed by #3442 |
Thanks @drake7707 for all the info. It was very helpful to diagnose and fix the problem. |
What you expected to happen?
Expect that kube-dns is up after deploying the Weave plugin.
What happened?
How to reproduce it?
Deploy k8s via kubeadm:
# kubeadm init --pod-network-cidr=10.244.0.0/16 --token abcdef.1234567890123456
# export KUBECONFIG=/etc/kubernetes/admin.conf
# kubever=$(kubectl version | base64 | tr -d '\n')
# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
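(To observe the failure after these steps, inspecting the kube-dns pod is enough; the pod name below is a placeholder.)
# kubectl -n kube-system get pods -o wide                 # kube-dns ends up not Ready / crash-looping
# kubectl -n kube-system logs <kube-dns-pod> -c kubedns   # kubedns is the usual container name in the kube-dns pod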
Anything else we need to know?
Versions:
Logs:
weave_0.log
weave-npc_0.log
Network: