
Kube-dns crashed when weave plugin used #3239

Closed
cynepco3hahue opened this issue Feb 5, 2018 · 27 comments
@cynepco3hahue

What did you expect to happen?

I expected kube-dns to be up after deploying the weave plugin.

What happened?

E0205 15:05:28.519183       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0205 15:05:28.519486       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0205 15:05:29.018365       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0205 15:05:29.518414       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
[... the same message repeats every 0.5s until 15:05:58 ...]
I0205 15:05:58.018384       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
F0205 15:05:58.518384       1 dns.go:167] Timeout waiting for initialization

How to reproduce it?

Deploy k8s via kubeadm
# kubeadm init --pod-network-cidr=10.244.0.0/16 --token abcdef.1234567890123456
# export KUBECONFIG=/etc/kubernetes/admin.conf
# kubever=$(kubectl version | base64 | tr -d '\n')
# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
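
For reference, one way to watch the failure after applying the manifest (standard kubectl commands; the kube-dns label and container name here are assumptions based on stock kubeadm deployments):

# Watch kube-dns enter CrashLoopBackOff:
kubectl -n kube-system get pods -w
# Tail the kubedns container to see the timeout above:
kubectl -n kube-system logs -l k8s-app=kube-dns -c kubedns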

Anything else we need to know?

Versions:

weave version
weaveworks/weave-kube:2.2.0
weaveworks/weave-npc:2.2.0

$ docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-71.git3e8e77d.el7.centos.1.x86_64
 Go version:      go1.8.3
 Git commit:      3e8e77d/1.12.6
 Built:           Tue Jan 30 09:17:00 2018
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-71.git3e8e77d.el7.centos.1.x86_64
 Go version:      go1.8.3
 Git commit:      3e8e77d/1.12.6
 Built:           Tue Jan 30 09:17:00 2018
 OS/Arch:         linux/amd64

$ uname -a
Linux master 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

Logs:

weave_0.log
weave-npc_0.log

Network:

$ ip route
default via 192.168.121.1 dev eth0  proto static  metric 100 
10.32.0.0/12 dev weave  proto kernel  scope link  src 10.32.0.1 
169.254.0.0/16 dev eth1  scope link  metric 1003 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
192.168.121.0/24 dev eth0  proto kernel  scope link  src 192.168.121.52  metric 100 
192.168.200.0/24 dev eth1  proto kernel  scope link  src 192.168.200.2

$ ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.168.121.52/24 brd 192.168.121.255 scope global dynamic eth0\       valid_lft 1442sec preferred_lft 1442sec
3: eth1    inet 192.168.200.2/24 brd 192.168.200.255 scope global eth1\       valid_lft forever preferred_lft forever
4: docker0    inet 172.17.0.1/16 scope global docker0\       valid_lft forever preferred_lft forever
6: weave    inet 10.32.0.1/12 brd 10.47.255.255 scope global weave\       valid_lft forever preferred_lft forever

$ sudo iptables-save
# Generated by iptables-save v1.4.21 on Mon Feb  5 15:31:55 2018
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [2:120]
:POSTROUTING ACCEPT [2:120]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-P2TFMN4YKJKFJKNH - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
:WEAVE - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -j WEAVE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-P2TFMN4YKJKFJKNH -s 192.168.121.52/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-P2TFMN4YKJKFJKNH -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-P2TFMN4YKJKFJKNH --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 192.168.121.52:6443
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-P2TFMN4YKJKFJKNH --mask 255.255.255.255 --rsource -j KUBE-SEP-P2TFMN4YKJKFJKNH
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-P2TFMN4YKJKFJKNH
-A WEAVE -s 10.32.0.0/12 -d 224.0.0.0/4 -j RETURN
-A WEAVE ! -s 10.32.0.0/12 -d 10.32.0.0/12 -j MASQUERADE
-A WEAVE -s 10.32.0.0/12 ! -d 10.32.0.0/12 -j MASQUERADE
COMMIT
# Completed on Mon Feb  5 15:31:55 2018
# Generated by iptables-save v1.4.21 on Mon Feb  5 15:31:55 2018
*filter
:INPUT ACCEPT [239:39497]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [227:39605]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
:WEAVE-NPC - [0:0]
:WEAVE-NPC-DEFAULT - [0:0]
:WEAVE-NPC-INGRESS - [0:0]
-A INPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -o weave -m comment --comment "NOTE: this must go before \'-j KUBE-FORWARD\'" -j WEAVE-NPC
-A FORWARD -o weave -m state --state NEW -j NFLOG --nflog-group 86
-A FORWARD -o weave -j DROP
-A FORWARD -i weave ! -o weave -j ACCEPT
-A FORWARD -o weave -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forward rules" -j KUBE-FORWARD
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC -m set ! --match-set weave-local-pods dst -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-?b%zl9GIe0AET1(QI^7NWe*fO dst -m comment --comment "DefaultAllow isolation for namespace: kube-system" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-0EHD/vdN#O4]V?o4Tx7kS;APH dst -m comment --comment "DefaultAllow isolation for namespace: kube-public" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-E.1.0W^NGSp]0_t5WwH/]gX@L dst -m comment --comment "DefaultAllow isolation for namespace: default" -j ACCEPT
COMMIT
@bboreham
Contributor

bboreham commented Feb 5, 2018

Does anything work? Can you reach the outside world, or ping one pod from another by its pod IP address?

I'll note you have a lot of these warnings, going on for ~30 minutes:

{"log":"WARN: 2018/02/05 14:59:17.548655 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: fe:57:00:5f:b9:aa, dst: 16:6b:33:12:99:e0} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}\n","stream":"stderr","time":"2018-02-05T14:59:17.549006348Z"}
{"log":"WARN: 2018/02/05 14:59:17.548748 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: fe:57:00:5f:b9:aa, dst: 16:6b:33:12:99:e0} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}\n","stream":"stderr","time":"2018-02-05T14:59:17.549039384Z"}
{"log":"WARN: 2018/02/05 14:59:18.550029 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: fe:57:00:5f:b9:aa, dst: 16:6b:33:12:99:e0} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}\n","stream":"stderr","time":"2018-02-05T14:59:18.550310826Z"}

@cynepco3hahue
Author

@bboreham Thanks for the fast reply.

I created two test pods

# kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
test-pod-1   1/1       Running   0          12m
test-pod-2   1/1       Running   0          12m
  • ping from pod to another pod IP - PASS
  • ping from node to pod IP - FAILED
# ping 10.32.0.3
PING 10.32.0.3 (10.32.0.3) 56(84) bytes of data.
From 10.32.0.1 icmp_seq=1 Destination Host Unreachable
From 10.32.0.1 icmp_seq=2 Destination Host Unreachable
From 10.32.0.1 icmp_seq=3 Destination Host Unreachable
From 10.32.0.1 icmp_seq=4 Destination Host Unreachable
  • ping from pod to internal node IP - FAILED
# ping 192.168.121.236
PING 192.168.121.236 (192.168.121.236) 56(84) bytes of data.
--- 192.168.121.236 ping statistics ---
16 packets transmitted, 0 received, 100% packet loss, time 14999ms

Let me know if you need additional information.

@brb
Contributor

brb commented Feb 6, 2018

@cynepco3hahue You have specified --pod-network-cidr=10.244.0.0/16, which is different from Weave's default, 10.32.0.0/12.

Mind trying to bootstrap the cluster with kubeadm without passing --pod-network-cidr?

@cynepco3hahue
Author

cynepco3hahue commented Feb 7, 2018

@brb I tried to run

  • kubeadm init --pod-network-cidr=10.32.0.0/12 --token abcdef.1234567890123456
  • kubeadm init --token abcdef.1234567890123456

But I got the same result in both cases.

@brb
Contributor

brb commented Feb 7, 2018

@cynepco3hahue Thanks for the experiments.

Is it possible to get SSH access to your machine? I'm martynas on the Weave community Slack.

If not, I'm interested in the following:

  1. Do you have the br_netfilter module compiled in? Is it enabled for iptables (cat /proc/sys/net/bridge/bridge-nf-call-iptables)?
  2. In your previous experiments, is fe:57:00:5f:b9:aa a MAC addr of eth0 belonging to kubedns?
  3. Is your node a VM? If yes, what kind?
  4. Output of weave report.
  5. Output of dmesg from the node.
  6. Output of ip link from the node.
  7. PCAPs of tcpdump -i weave from the node and tcpdump -i eth0 from the pod when pinging the pod from the node and vice versa (a rough capture sketch follows).
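
For item 7, something like the following should produce the captures (pod name, pod IP, and node IP are placeholders; tcpdump must be available inside the pod image):

# On the node, capture on the weave bridge while pinging the pod:
tcpdump -i weave -w ping_node_to_pod.pcap &
ping -c 5 <pod-ip>

# From inside the pod, capture on eth0 while pinging the node:
kubectl exec <pod-name> -- tcpdump -i eth0 -w /tmp/ping_pod_to_node.pcap &
kubectl exec <pod-name> -- ping -c 5 <node-ip>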

@cynepco3hahue
Author

@brb It is a locally running Vagrant machine, so I cannot provide access to it, but I will try to give as much info as I can 😄
  1. br_netfilter is enabled:

# cat /proc/sys/net/bridge/bridge-nf-call-iptables 
1
# cat /proc/sys/net/bridge/bridge-nf-call-ip6tables 
1
  2. I run a fresh Vagrant machine each time, so it is hard for me to say; I will provide new logs and ip addr info:
[root@master ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:4d:81:d8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.67/24 brd 192.168.121.255 scope global dynamic eth0
       valid_lft 3080sec preferred_lft 3080sec
    inet6 fe80::5054:ff:fe4d:81d8/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:f9:9e:21 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.2/24 brd 192.168.200.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fef9:9e21/64 scope link 
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:80:3a:c9:80 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN 
    link/ether 26:7a:d2:96:ae:87 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::247a:d2ff:fe96:ae87/64 scope link 
       valid_lft forever preferred_lft forever
6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP 
    link/ether ae:f6:9e:b9:f8:fd brd ff:ff:ff:ff:ff:ff
    inet 10.32.0.1/12 brd 10.47.255.255 scope global weave
       valid_lft forever preferred_lft forever
    inet6 fe80::acf6:9eff:feb9:f8fd/64 scope link 
       valid_lft forever preferred_lft forever
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN 
    link/ether 7a:aa:1b:f2:1f:dd brd ff:ff:ff:ff:ff:ff
9: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP 
    link/ether e6:f8:86:00:7d:22 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e4f8:86ff:fe00:7d22/64 scope link 
       valid_lft forever preferred_lft forever
10: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP 
    link/ether ce:83:63:c7:5e:5f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::cc83:63ff:fec7:5e5f/64 scope link 
       valid_lft forever preferred_lft forever
12: vethweple1854e6@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP 
    link/ether ce:0a:91:ea:a3:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::cc0a:91ff:feea:a3b0/64 scope link 
       valid_lft forever preferred_lft forever

weave_0.log
weave-npc_0.log

  3. Yes, it is a Vagrant VM
  4. weave_report.log
  5. dmesg.log
  6. Output of ip link:
# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:4d:81:d8 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:f9:9e:21 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT 
    link/ether 02:42:80:3a:c9:80 brd ff:ff:ff:ff:ff:ff
5: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/ether 26:7a:d2:96:ae:87 brd ff:ff:ff:ff:ff:ff
6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP mode DEFAULT 
    link/ether ae:f6:9e:b9:f8:fd brd ff:ff:ff:ff:ff:ff
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
    link/ether 7a:aa:1b:f2:1f:dd brd ff:ff:ff:ff:ff:ff
9: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP mode DEFAULT 
    link/ether e6:f8:86:00:7d:22 brd ff:ff:ff:ff:ff:ff
10: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether ce:83:63:c7:5e:5f brd ff:ff:ff:ff:ff:ff
12: vethweple1854e6@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP mode DEFAULT 
    link/ether ce:0a:91:ea:a3:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
  7. I will add the tcpdump captures later; I need to configure the environment first

@cynepco3hahue
Author

TCP dumps

  1. ping from node to pod
    ping_node_to_pod.tar.gz
  2. ping pod to node
    ping_pod_to_node.tar.gz

I am not sure why, but when I ran tcpdump, ping started working fine, and when I stopped tcpdump, ping failed again.

@brb
Contributor

brb commented Feb 8, 2018

I am not sure why, but when I ran tcpdump, ping started working fine, and when I stopped tcpdump, ping failed again.

This suggests that ICMP traffic only passes after the interface enters promiscuous mode (due to tcpdump).

Could you verify whether HTTP traffic behaves the same way by creating an nginx Pod and trying to curl it?

If it fails, that is a good indicator that you might be suffering from a leaking netns (#2842), which could explain why promiscuous mode enabled the traffic.
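
A quick way to see whether promiscuous mode is the variable here (just an interface-flag check, nothing weave-specific):

# With tcpdump attached, PROMISC should appear in the bridge flags:
ip link show weave
# e.g. "6: weave: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> ..."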

@cynepco3hahue
Author

I created a simple nginx deployment with the command:
kubectl run my-nginx --image=nginx --replicas=1 --port=80

# kubectl get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP          NODE
my-nginx-9d5677d94-j6kvs   1/1       Running   0          4m        10.32.0.2   master

[root@master ~]# curl http://10.32.0.2:80
curl: (7) Failed connect to 10.32.0.2:80; No route to host
[root@master ~]# curl http://10.32.0.2
curl: (7) Failed connect to 10.32.0.2:80; No route to host

@brb
Contributor

brb commented Feb 12, 2018

Thanks. Is it the same when tcpdump is running?

@cynepco3hahue
Author

With tcpdump running, everything works fine:

# kubectl get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP          NODE
my-nginx-9d5677d94-d76fx   1/1       Running   0          2m        10.32.0.3   master
[root@master ~]# tcpdump -i weave -w ping &
[1] 18608
[root@master ~]# tcpdump: listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes

[root@master ~]# curl http://10.32.0.3
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

@brb
Contributor

brb commented Feb 21, 2018

Thanks. I was expecting curl to fail, but it didn't.

Could you share the VM image or point me to a Vagrantfile that I could run and debug myself?

@cynepco3hahue
Author

@dostoevskoy

Hello,

I've hit a similar problem when trying to use a Kubernetes cluster with the Weave Net plugin.
I configured the weave network in two ways:

  • via a YAML file, applied to Kubernetes with the kubectl apply -f command
  • via installing weave as a service, per the Weave documentation.

In both cases I got a network between containers, and the containers could see each other and communicate via their weave IPs.
But I got an error from the kube-dns service: the kubedns container was not able to reach 10.96.0.1:443 (the standard "kubernetes" service for the apiserver endpoint). Also, no NodePort of any service was reachable from the host machine.
A NodePort (e.g. 30101) became available only when I ran a tcpdump -i weave command.
I tried to investigate and found that the weave plugin showed strange behavior and sometimes could not create a route record. Something was also wrong in the ARP table: the weave rows were in incomplete status. But after running tcpdump, these entries became complete and the port became available.
ARP without tcpdump:

Address                  HWtype  HWaddress           Flags Mask            Iface
10.32.0.3                        (incomplete)                              weave
10.32.0.2                        (incomplete)                              weave

ARP with tcpdump:

Address                  HWtype  HWaddress           Flags Mask            Iface
10.32.0.3                ether   c2:7d:41:6d:33:50   C                     weave
10.32.0.2                ether   2a:bf:4d:95:3a:e5   C                     weave
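
The same neighbour state can also be read with iproute2 (a generic alternative to arp, not specific to this setup); the "(incomplete)" rows above show up as INCOMPLETE or FAILED entries:

ip neigh show dev weave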

I've solved the issue by using another network plugin: Flannel.

@brb
Contributor

brb commented Feb 22, 2018

Hi @dostoevskoy,

Thanks for the info.

What is your distro, kernel version, and weave version?

@dostoevskoy

Distribution: SLES 12.0
Kernel version: 3.12.61-52.106-default
Weave version: 2.2.0

@zhoulouzi

Any update?

@murali-reddy
Contributor

murali-reddy commented Aug 16, 2018

@zhoulouzi What exactly is the issue you are running into? Do you see similar symptoms, with kube-dns crashing?

@brb
Contributor

brb commented Aug 17, 2018

My hypothesis is that on old kernels (< v4.0) the promiscuous mode setting of the weave bridge gets reset. However, I haven't had a chance to validate it.
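
One way to test that on an affected node would be to read the interface flags from sysfs (assuming IFF_PROMISC is bit 0x100 of the hex bitmask in /sys/class/net/<dev>/flags):

# Check whether the weave bridge still has IFF_PROMISC set:
flags=$(cat /sys/class/net/weave/flags)
if (( (flags & 0x100) == 0 )); then
    echo "weave bridge has lost promiscuous mode"
fi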

@drake7707

@brb I just encountered the same issue described above. My kernel is 3.13.0-33-generic #58-Ubuntu.

I'm running Kubernetes inside a DinD container, so it's easy for me to tear everything down and start from scratch. Here's the dmesg output following the deployment of weave:

Oct  2 03:44:18 master0 kernel: [939208.830660] br-480f57c2d95b: port 6(veth91e2cd9) entered forwarding state
Oct  2 03:44:26 master0 kernel: [939216.353176] device datapath entered promiscuous mode
Oct  2 03:44:26 master0 kernel: [939216.354305] device vethwedu entered promiscuous mode
Oct  2 03:44:26 master0 kernel: [939216.354601] device vethwedu left promiscuous mode
Oct  2 03:44:26 master0 kernel: [939216.354613] weave: port 1(vethwedu) entered disabled state
Oct  2 03:44:26 master0 kernel: [939216.376595] device vethwe-bridge entered promiscuous mode
Oct  2 03:44:26 master0 kernel: [939216.376843] IPv6: ADDRCONF(NETDEV_UP): vethwe-datapath: link is not ready
Oct  2 03:44:26 master0 kernel: [939216.377151] device vethwe-datapath entered promiscuous mode
Oct  2 03:44:26 master0 kernel: [939216.377277] IPv6: ADDRCONF(NETDEV_CHANGE): vethwe-datapath: link becomes ready
Oct  2 03:44:26 master0 kernel: [939216.409769] IPv6: ADDRCONF(NETDEV_UP): weave: link is not ready
Oct  2 03:44:27 master0 kernel: [939217.375560] weave: port 1(vethwe-bridge) entered forwarding state
Oct  2 03:44:27 master0 kernel: [939217.375590] weave: port 1(vethwe-bridge) entered forwarding state
Oct  2 03:44:27 master0 kernel: [939217.375648] IPv6: ADDRCONF(NETDEV_CHANGE): weave: link becomes ready
Oct  2 03:44:36 master0 kernel: [939226.716896] IPVS: Creating netns size=2048 id=622
Oct  2 03:44:36 master0 kernel: [939226.716903] ip_set: protocol 6
Oct  2 03:44:36 master0 kernel: [939226.719557] IPVS: Creating netns size=2048 id=623
Oct  2 03:44:36 master0 kernel: [939226.719561] ip_set: protocol 6
Oct  2 03:44:37 master0 kernel: [939227.122286] IPVS: Creating netns size=2048 id=624
Oct  2 03:44:37 master0 kernel: [939227.122291] ip_set: protocol 6
Oct  2 03:44:37 master0 kernel: [939227.218483] device vethwepld768548 entered promiscuous mode
Oct  2 03:44:37 master0 kernel: [939227.297445] IPv6: ADDRCONF(NETDEV_UP): vethwepld768548: link is not ready
Oct  2 03:44:37 master0 kernel: [939227.323747] IPv6: ADDRCONF(NETDEV_CHANGE): vethwepld768548: link becomes ready
Oct  2 03:44:37 master0 kernel: [939227.323772] weave: port 2(vethwepld768548) entered forwarding state
Oct  2 03:44:37 master0 kernel: [939227.323785] weave: port 2(vethwepld768548) entered forwarding state
Oct  2 03:44:37 master0 kernel: [939227.414281] device vethwepl61fd29e entered promiscuous mode
Oct  2 03:44:37 master0 kernel: [939227.505344] IPv6: ADDRCONF(NETDEV_UP): vethwepl61fd29e: link is not ready
Oct  2 03:44:37 master0 kernel: [939227.505354] weave: port 3(vethwepl61fd29e) entered forwarding state
Oct  2 03:44:37 master0 kernel: [939227.505361] weave: port 3(vethwepl61fd29e) entered forwarding state
Oct  2 03:44:37 master0 kernel: [939227.523509] IPv6: ADDRCONF(NETDEV_CHANGE): vethwepl61fd29e: link becomes ready
Oct  2 03:44:37 master0 kernel: [939227.868768] device vethwepldd0c839 entered promiscuous mode
Oct  2 03:44:37 master0 kernel: [939227.921777] IPv6: ADDRCONF(NETDEV_UP): vethwepldd0c839: link is not ready
Oct  2 03:44:37 master0 kernel: [939227.921787] weave: port 4(vethwepldd0c839) entered forwarding state
Oct  2 03:44:37 master0 kernel: [939227.921798] weave: port 4(vethwepldd0c839) entered forwarding state
Oct  2 03:44:37 master0 kernel: [939227.944009] IPv6: ADDRCONF(NETDEV_CHANGE): vethwepldd0c839: link becomes ready
Oct  2 03:44:38 master0 kernel: [939228.181736] IPv6: eth0: IPv6 duplicate address fe80::1ca1:3aff:fedf:d782 detected!
Oct  2 03:44:38 master0 kernel: [939228.209851] IPv6: eth0: IPv6 duplicate address fe80::a4b8:71ff:fe41:707a detected!
Oct  2 03:44:38 master0 kernel: [939228.394079] IPv6: eth0: IPv6 duplicate address fe80::981e:6ff:fee4:6d1 detected!
Oct  2 03:44:42 master0 kernel: [939232.396132] weave: port 1(vethwe-bridge) entered forwarding state
Oct  2 03:44:52 master0 kernel: [939242.385756] weave: port 2(vethwepld768548) entered forwarding state
Oct  2 03:44:52 master0 kernel: [939242.545986] weave: port 3(vethwepl61fd29e) entered forwarding state
Oct  2 03:44:52 master0 kernel: [939242.962095] weave: port 4(vethwepldd0c839) entered forwarding state
Oct  2 03:45:11 master0 kernel: [939261.254871] weave: port 2(vethwepld768548) entered disabled state
Oct  2 03:45:11 master0 kernel: [939261.255375] device vethwepld768548 left promiscuous mode
Oct  2 03:45:11 master0 kernel: [939261.255385] weave: port 2(vethwepld768548) entered disabled state

At this point, all pods relying on 10.96.0.1:443 to access the Kubernetes API will fail (such as the Kubernetes dashboard, CoreDNS, or kube-dns).

The actual device 'weave' is not set to promiscuous mode. If I then run:

ip link set dev weave promisc on

I get the following in dmesg

Oct 2 03:49:13 master0 kernel: [939503.961032] device weave entered promiscuous mode

and the 10.96.0.1:443 endpoint becomes accessible from the pods (as do any IPs assigned to the host, tested via ping).
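
Until a proper fix lands, a crude watchdog along these lines would keep the workaround applied (assumes the bridge is named weave; re-setting promisc is harmless when it is already on):

# Re-apply promiscuous mode whenever the kernel drops it:
while true; do
    ip link show weave 2>/dev/null | grep -q PROMISC \
        || ip link set dev weave promisc on
    sleep 30
done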

@brb
Contributor

brb commented Oct 8, 2018

@drake7707 Thanks for the info.

@brb
Contributor

brb commented Oct 10, 2018

@drake7707 Can you reliably reproduce the issue? Do you use https://github.com/kubernetes-sigs/kubeadm-dind-cluster? Which version of k8s and Weave Net?

@drake7707

@brb Yes, I encountered it each time I set it up. By now I'm using a heavily altered fork of kubeadm-dind-cluster with a lot of things tacked on, but I think it should occur with the original script as well.

I noticed that the other hosts I provisioned had varying kernel versions. Those with 4.4.0-34-generic #53-Ubuntu did not have this issue, so your earlier hypothesis seems correct. I originally thought only the Kubernetes master had this issue and not the worker nodes, but both workers were running 4.4.0 (in the original kubeadm-dind-cluster script, both master and workers are spawned on the same host; in my modified version I can put the master on one host and worker nodes on others).

Kubernetes version: v1.11.0
Weave version: 2.4.1

@brb
Contributor

brb commented Oct 16, 2018

@drake7707 Did you just enable promiscuous mode for the weave bridge on each node, or did you enable it for other interfaces as well to make it work?

@drake7707

@brb Nope, just the master, and only the weave bridge (though the master is the one with the older Linux kernel). Technically I didn't even need any worker nodes: deploying just the master, I tried both CoreDNS and kube-dns pods, and both still failed to connect to the Kubernetes API (10.96.0.1). As soon as I enabled promisc on the weave interface inside the DinD container, they could connect.

@bboreham
Contributor

bboreham commented Nov 1, 2018

Fixed by #3442

@bboreham bboreham closed this as completed Nov 1, 2018
@brb
Contributor

brb commented Nov 2, 2018

Thanks @drake7707 for all the info. It was very helpful in diagnosing and fixing the problem.
