Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave worker nodes in two different subnets - strange behavior #3297

Closed
paphillon opened this issue May 10, 2018 · 7 comments
Closed

Weave worker nodes in two different subnets - strange behavior #3297

paphillon opened this issue May 10, 2018 · 7 comments

Comments

@paphillon
Copy link

paphillon commented May 10, 2018

** Is this a REQUEST FOR HELP? **
Yes.

In my kubernetes setup, there are two sets of worker nodes, one in 10.51.0.0 subnet and other are in 10.50.0.0 subnet. There is no firewall between these nodes and ping, including weave net ports tcp/udp are checked out to be ok.

Service IP: 172.32.0.0/24
Cluster IP: 172.200.0.0/24

What happened?

Here is the strange behavior. When I ping a pod running on 10.51.0.x host with the pod IP say (172.200.0.25) from the 10.50.0.0 host, it results in the destination unreachable. I can ping without problems from the same subnet host 10.51.0.x. However, when I curl the service IP, 172.32.0.1 it's reachable.

To add to this, when I do all of the above from a container running on 10.50.0.x, the behavior is reversed. I cannot reach to service IP but can reach to pod IP.

What you expected to happen?

Expected that both pod/service IP to be reachable from the container as well hosts

Anything else we need to know?

On-prem installation, the only difference I see is the kernel version in one set of worker node (10.50.x.x)

Versions:

$ weave version
2.3.0, tried with 2.20 still the same.
$ docker version
Docker version 18.03.1-ce, build 9ee9f40
$ uname -a
10.50.0.x pool - 
Linux worker-x.my.k8s.com 3.10.0-327.4.4.el7.x86_64 #1 SMP Thu Dec 17 15:51:24 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
10.51.0.x pool - 
Linux worker-z.my.k8s.com 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
1.10.0

Logs:

$ kubectl logs -n kube-system weave


## Network:

$ ip route

default via 10.50.212.1 dev eno16777984
10.50.212.0/22 dev eno16777984 proto kernel scope link src 10.50.212.140
169.254.0.0/16 dev eno16777984 scope link metric 1002
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.200.0.0/24 dev weave proto kernel scope link src 172.200.0.40
From 2nd pool
default via 10.51.32.1 dev ens160
10.51.32.0/22 dev ens160 proto kernel scope link src 10.51.35.10
169.254.0.0/16 dev ens160 scope link metric 1002
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.200.0.0/24 dev weave proto kernel scope link src 172.200.0.160

$ ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eno16777984 inet 10.50.212.140/22 brd 10.50.215.255 scope global eno16777984\ valid_lft forever preferred_lft forever
3: docker0 inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\ valid_lft forever preferred_lft forever
7: weave inet 172.200.0.40/24 brd 172.200.0.255 scope global weave\ valid_lft forever preferred_lft forever

1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: ens160 inet 10.51.35.10/22 brd 10.51.35.255 scope global ens160\ valid_lft forever preferred_lft forever
3: docker0 inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\ valid_lft forever preferred_lft forever
6: weave inet 172.200.0.160/24 brd 172.200.0.255 scope global weave\ valid_lft forever preferred_lft forever

$ Weave connection status
-> 10.50.212.142:6783 established fastdp 7a:cc:70:08:75:f4(a.corp.com) mtu=1337
<- 10.50.212.141:50919 established fastdp ba:bb:dd:d8:ae:0e(b.corp.com) mtu=1337
<- 10.51.35.10:55966 established fastdp 7a:6e:bb:9b:50:de(c.corp.com) mtu=1337
-> 10.51.35.6:6783 established fastdp 4e:4c:4b:cb:d6:ff(e.corp.com) mtu=1337
<- 10.50.212.140:37708 established fastdp 82:b1:3f:68:58:96(f.corp.com) mtu=1337
-> 10.51.35.8:6783 established fastdp 36:26:8b:a2:44:bd(g.corp.com) mtu=1337
<- 10.51.35.166:59164 established fastdp 2a:f8:1d:6f:b6:ce(h.corp.com) mtu=1337
<- 10.51.35.11:45225 established fastdp a6:7d:e7:8b:e3:20(i.corp.com) mtu=1337
-> 10.51.35.5:6783 failed cannot connect to ourself, retry: never

$ sudo iptables-save
Did not really find any major difference.

@brb
Copy link
Contributor

brb commented May 14, 2018

@paphillon Can you please provide weave logs (kubectl logs -n kube-system weave)? Or is it empty?

@paphillon
Copy link
Author

@brb
Log from 10.51.x.x node --

Ignore the error if "br_netfilter" is built-in in the kernel
DEBU: 2018/05/10 22:44:24.901326 [kube-peers] Checking peer "82:b1:3f:68:58:96" against list &{[{46:0e:40:03:33:10 or1010051035005.mycorp.com} {4e:4c:4b:cb:d6:ff or1010051035006.mycorp.com} {36:26:8b:a2:44:bd or1010051035008.mycorp.com} {7a:6e:bb:9b:50:de or1010051035010.mycorp.com} {a6:7d:e7:8b:e3:20 or1010051035011.mycorp.com} {2a:f8:1d:6f:b6:ce or1010051035166.mycorp.com} {82:b1:3f:68:58:96 or1010050212140.mycorp.com} {ba:bb:dd:d8:ae:0e or1010050212141.mycorp.com} {7a:cc:70:08:75:f4 or1010050212142.mycorp.com}]}
INFO: 2018/05/10 22:44:25.195496 Command line options: map[docker-api: mtu:1337 conn-limit:30 db-prefix:/weavedb/weave-net nickname:or1010050212140.mycorp.com datapath:datapath expect-npc:true ipalloc-init:consensus=9 name:82:b1:3f:68:58:96 no-dns:true port:6783 host-root:/host http-addr:127.0.0.1:6784 ipalloc-range:172.200.0.0/24 metrics-addr:0.0.0.0:6782]
INFO: 2018/05/10 22:44:25.195586 weave  2.3.0
INFO: 2018/05/10 22:44:26.223100 Bridge type is bridged_fastdp
INFO: 2018/05/10 22:44:26.223122 Communication between peers is unencrypted.
WARN: 2018/05/10 22:44:26.223741 Address already in use creating vxlan vport 6784 - retrying
INFO: 2018/05/10 22:44:26.283840 Our name is 82:b1:3f:68:58:96(or1010050212140.mycorp.com)
INFO: 2018/05/10 22:44:26.283920 Launch detected - using supplied peer list: [10.50.212.140 10.50.212.141 10.50.212.142 10.51.35.5 10.51.35.6 10.51.35.8 10.51.35.10 10.51.35.11 10.51.35.166]
INFO: 2018/05/10 22:44:26.283958 Checking for pre-existing addresses on weave bridge
INFO: 2018/05/10 22:44:26.284183 weave bridge has address 172.200.0.40/24
INFO: 2018/05/10 22:44:26.483573 [allocator 82:b1:3f:68:58:96] Initialising with persisted data
INFO: 2018/05/10 22:44:26.483625 Sniffing traffic on datapath (via ODP)
INFO: 2018/05/10 22:44:26.484089 ->[10.51.35.6:6783] attempting connection
INFO: 2018/05/10 22:44:26.484196 ->[10.51.35.5:6783] attempting connection
INFO: 2018/05/10 22:44:26.484246 ->[10.51.35.10:6783] attempting connection
INFO: 2018/05/10 22:44:26.484295 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/10 22:44:26.484361 ->[10.50.212.140:6783] attempting connection
INFO: 2018/05/10 22:44:26.484447 ->[10.51.35.166:6783] attempting connection
INFO: 2018/05/10 22:44:26.484550 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/10 22:44:26.484600 ->[10.51.35.8:6783] attempting connection
INFO: 2018/05/10 22:44:26.484645 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/10 22:44:26.485021 ->[10.50.212.140:47764] connection accepted
INFO: 2018/05/10 22:44:26.485455 ->[10.50.212.140:47764|82:b1:3f:68:58:96(or1010050212140.mycorp.com)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/05/10 22:44:26.485705 ->[10.50.212.140:6783|82:b1:3f:68:58:96(or1010050212140.mycorp.com)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/05/10 22:44:26.485932 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(or1010050212142.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.486012 overlay_switch ->[7a:cc:70:08:75:f4(or1010050212142.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.486045 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(or1010050212142.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:44:26.586869 ->[10.51.35.11:60388] connection accepted
INFO: 2018/05/10 22:44:26.587249 ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(or1010051035006.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.587346 overlay_switch ->[4e:4c:4b:cb:d6:ff(or1010051035006.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.587401 ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(or1010051035006.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:44:26.588342 ->[10.51.35.5:6783|46:0e:40:03:33:10(or1010051035005.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.588433 overlay_switch ->[46:0e:40:03:33:10(or1010051035005.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.588566 ->[10.51.35.5:6783|46:0e:40:03:33:10(or1010051035005.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:44:26.588812 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.588908 overlay_switch ->[7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.588963 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:44:26.589760 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.589899 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(or1010050212142.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.590934 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.591023 overlay_switch ->[ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.591100 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:44:26.683462 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/05/10 22:44:26.683603 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/05/10 22:44:26.683856 overlay_switch ->[7a:cc:70:08:75:f4(or1010050212142.mycorp.com)] using sleeve
INFO: 2018/05/10 22:44:26.683920 sleeve ->[10.50.212.142:6783|7a:cc:70:08:75:f4(or1010050212142.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:26.684614 ->[10.51.35.11:60388|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.684650 overlay_switch ->[a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.684665 ->[10.51.35.11:60388|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:44:26.685448 ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.686237 overlay_switch ->[2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.687965 ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)]: connection added
INFO: 2018/05/10 22:44:26.782707 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.782909 overlay_switch ->[7a:cc:70:08:75:f4(or1010050212142.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.783799 overlay_switch ->[ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)] using sleeve
INFO: 2018/05/10 22:44:26.783831 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.783962 overlay_switch ->[ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.784156 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.784306 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.785648 ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(or1010051035006.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.786017 ->[10.51.35.5:6783|46:0e:40:03:33:10(or1010051035005.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.887949 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.888170 sleeve ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(or1010050212141.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:26.888204 ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.888284 overlay_switch ->[a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.888307 ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection shutting down due to error: Multiple connections to a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com) added to 82:b1:3f:68:58:96(or1010050212140.mycorp.com)
INFO: 2018/05/10 22:44:26.888317 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.888355 ->[10.51.35.8:6783|36:26:8b:a2:44:bd(or1010051035008.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:44:26.888382 overlay_switch ->[36:26:8b:a2:44:bd(or1010051035008.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.888392 ->[10.51.35.8:6783|36:26:8b:a2:44:bd(or1010051035008.mycorp.com)]: connection added
INFO: 2018/05/10 22:44:26.889606 sleeve ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:26.890694 sleeve ->[10.51.35.5:6783|46:0e:40:03:33:10(or1010051035005.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:26.891072 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.891130 overlay_switch ->[36:26:8b:a2:44:bd(or1010051035008.mycorp.com)] using sleeve
INFO: 2018/05/10 22:44:26.891151 ->[10.51.35.8:6783|36:26:8b:a2:44:bd(or1010051035008.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.891189 overlay_switch ->[36:26:8b:a2:44:bd(or1010051035008.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.891813 sleeve ->[10.51.35.8:6783|36:26:8b:a2:44:bd(or1010051035008.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:26.891936 sleeve ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(or1010051035006.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:26.892058 ->[10.51.35.11:60388|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.982757 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:26.982841 overlay_switch ->[2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)] using sleeve
INFO: 2018/05/10 22:44:26.982860 overlay_switch ->[2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)] using fastdp
INFO: 2018/05/10 22:44:26.983386 ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:44:26.983412 sleeve ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:27.285051 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:44:27.285710 sleeve ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:44:28.686751 [kube-peers] Added myself to peer list &{[{46:0e:40:03:33:10 or1010051035005.mycorp.com} {4e:4c:4b:cb:d6:ff or1010051035006.mycorp.com} {36:26:8b:a2:44:bd or1010051035008.mycorp.com} {7a:6e:bb:9b:50:de or1010051035010.mycorp.com} {a6:7d:e7:8b:e3:20 or1010051035011.mycorp.com} {2a:f8:1d:6f:b6:ce or1010051035166.mycorp.com} {82:b1:3f:68:58:96 or1010050212140.mycorp.com} {ba:bb:dd:d8:ae:0e or1010050212141.mycorp.com} {7a:cc:70:08:75:f4 or1010050212142.mycorp.com}]}
DEBU: 2018/05/10 22:44:28.689170 [kube-peers] Nodes that have disappeared: map[]
172.200.0.40
INFO: 2018/05/10 22:45:58.843513 Discovered remote MAC 06:95:78:c2:0e:70 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/10 22:46:16.467300 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection shutting down due to error: read tcp4 10.50.212.140:56059->10.51.35.10:6783: read: connection reset by peer
INFO: 2018/05/10 22:46:16.467373 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection deleted
INFO: 2018/05/10 22:46:16.467751 ->[10.51.35.10:6783] attempting connection
INFO: 2018/05/10 22:46:16.468595 ->[10.51.35.10:6783] error during connection attempt: dial tcp4 :0->10.51.35.10:6783: connect: connection refused
INFO: 2018/05/10 22:46:16.469814 Removed unreachable peer 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/10 22:46:19.047892 ->[10.51.35.10:6783] attempting connection
INFO: 2018/05/10 22:46:19.048650 ->[10.51.35.10:6783] error during connection attempt: dial tcp4 :0->10.51.35.10:6783: connect: connection refused
INFO: 2018/05/10 22:46:21.167393 ->[10.51.35.10:43175] connection accepted
INFO: 2018/05/10 22:46:21.169106 ->[10.51.35.10:43175|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/10 22:46:21.169182 overlay_switch ->[7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)] using fastdp
INFO: 2018/05/10 22:46:21.169202 ->[10.51.35.10:43175|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection added (new peer)
INFO: 2018/05/10 22:46:21.266675 ->[10.51.35.10:43175|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: connection fully established
INFO: 2018/05/10 22:46:21.270946 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/10 22:46:21.465122 sleeve ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/10 22:46:22.967885 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/10 22:58:25.742985 Discovered remote MAC 02:18:80:0f:c2:45 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/10 22:58:32.309617 Discovered remote MAC 46:0e:40:03:33:10 at 46:0e:40:03:33:10(or1010051035005.mycorp.com)
INFO: 2018/05/10 23:40:06.203769 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/10 23:51:31.024366 Discovered remote MAC a6:7d:e7:8b:e3:20 at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 01:38:49.413237 Discovered remote MAC 02:18:80:0f:c2:45 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 01:45:34.491479 Discovered remote MAC 1e:a1:37:f2:d9:86 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 01:45:34.495139 Discovered remote MAC be:e8:b4:d1:d1:14 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 01:45:37.572435 Discovered remote MAC 2e:ec:eb:37:e3:fa at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 01:46:18.618747 Discovered remote MAC 46:0e:40:03:33:10 at 46:0e:40:03:33:10(or1010051035005.mycorp.com)
INFO: 2018/05/11 06:39:47.101891 Discovered remote MAC 2a:f8:1d:6f:b6:ce at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 06:43:15.675622 Discovered remote MAC 5a:76:8d:9e:9d:1a at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 06:43:26.285930 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 06:58:29.739473 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 07:00:07.995905 Discovered remote MAC 5a:76:8d:9e:9d:1a at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 07:12:10.343295 Discovered remote MAC fa:80:3a:77:80:bf at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 07:14:31.724311 Discovered remote MAC a6:7d:e7:8b:e3:20 at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 07:23:02.844704 Discovered remote MAC ca:47:9d:ca:f9:90 at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 18:01:49.859375 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 18:02:05.182549 Discovered remote MAC 4e:4c:4b:cb:d6:ff at 4e:4c:4b:cb:d6:ff(or1010051035006.mycorp.com)
INFO: 2018/05/11 18:02:26.686043 ->[10.51.35.11:60388|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection shutting down due to error: read tcp4 10.50.212.140:6783->10.51.35.11:60388: i/o timeout
INFO: 2018/05/11 18:02:26.686133 ->[10.51.35.11:60388|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection deleted
INFO: 2018/05/11 18:02:26.686161 Removed unreachable peer a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 18:02:26.686362 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:04:27.451694 Discovered remote MAC 06:95:78:c2:0e:70 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 18:04:34.006660 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:04:38.406273 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:06:34.454662 Discovered remote MAC a2:e1:38:f7:b5:7b at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 18:06:34.894806 Discovered remote MAC 76:17:1e:4c:61:5a at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 18:06:35.075252 Discovered remote MAC ae:23:74:87:ba:ee at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 18:06:35.376587 Discovered remote MAC 36:cc:f7:75:40:6e at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 18:06:35.791252 Discovered remote MAC f2:3d:f4:45:36:a2 at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 18:06:37.238673 Discovered remote MAC 2a:f8:1d:6f:b6:ce at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 18:06:45.590634 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:06:46.659575 Discovered remote MAC 02:18:80:0f:c2:45 at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)
INFO: 2018/05/11 18:06:49.371881 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:08:56.662633 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:09:03.016158 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:11:10.294613 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:11:18.564936 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:13:25.846643 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:13:37.723278 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:15:44.982650 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:16:17.780304 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:18:24.982634 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:19:07.307242 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:21:14.582653 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:21:47.436415 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:23:54.710615 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:24:54.736500 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:27:01.974642 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:29:18.201672 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:31:25.526612 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:33:44.994722 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:35:52.278623 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:41:35.051505 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:43:42.294634 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:50:40.101991 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 18:52:47.318635 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 18:59:50.820879 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:01:58.102640 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 19:05:53.591146 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:08:00.854723 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 19:15:56.733393 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:18:03.990627 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 19:24:50.459023 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:26:57.750649 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 19:35:55.924016 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:38:03.222619 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 19:41:55.207700 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:44:02.390607 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection timed out
INFO: 2018/05/11 19:51:28.730313 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/11 19:52:31.831238 ->[10.51.35.11:6783] error during connection attempt: dial tcp4 :0->10.51.35.11:6783: connect: connection refused
INFO: 2018/05/11 19:52:56.042488 ->[10.51.35.11:51350] connection accepted
INFO: 2018/05/11 19:52:56.137390 ->[10.51.35.11:51350|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection ready; using protocol version 2
INFO: 2018/05/11 19:52:56.137484 overlay_switch ->[a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)] using fastdp
INFO: 2018/05/11 19:52:56.137509 ->[10.51.35.11:51350|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection added (new peer)
INFO: 2018/05/11 19:52:56.340226 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/11 19:52:56.340247 overlay_switch ->[a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)] using sleeve
INFO: 2018/05/11 19:52:56.340270 ->[10.51.35.11:51350|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: connection fully established
INFO: 2018/05/11 19:52:56.439695 sleeve ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)]: Effective MTU verified at 1438
INFO: 2018/05/11 19:52:56.539289 Discovered remote MAC 3e:12:3f:d1:19:59 at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 19:52:56.639636 overlay_switch ->[a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)] using fastdp
INFO: 2018/05/11 19:52:56.740769 Discovered remote MAC 42:8c:c2:eb:37:82 at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 19:52:57.137867 Discovered remote MAC a6:7d:e7:8b:e3:20 at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 19:58:07.765166 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(or1010051035010.mycorp.com)
INFO: 2018/05/11 20:21:31.966285 Discovered remote MAC 46:0e:40:03:33:10 at 46:0e:40:03:33:10(or1010051035005.mycorp.com)
INFO: 2018/05/11 20:54:36.468443 Discovered remote MAC 46:0e:40:03:33:10 at 46:0e:40:03:33:10(or1010051035005.mycorp.com)
INFO: 2018/05/11 21:00:06.120998 Discovered remote MAC 3e:2d:92:bf:1f:77 at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 21:00:15.966547 Discovered remote MAC 82:38:c6:10:4e:3a at a6:7d:e7:8b:e3:20(or1010051035011.mycorp.com)
INFO: 2018/05/11 21:05:54.967285 Discovered remote MAC 2a:f8:1d:6f:b6:ce at 2a:f8:1d:6f:b6:ce(or1010051035166.mycorp.com)

@bboreham
Copy link
Contributor

I find it hard to imagine what could be going wrong - all your nodes seem to be connected.
Your description is a bit abstract - could you give a transcript of exactly the commands you tried and the responses, so they can be matched up to the details from weave status and the logs.

@paphillon
Copy link
Author

Here is the test I did.
Pod running with an IP address 172.200.0.198 on the machine 10.51.0.0/22 subnet. Ping from any of the worker nodes in the same subnet. It works.

[root@host1010051035005 ~]$ ping 172.200.0.198
PING 172.200.0.198 (172.200.0.198) 56(84) bytes of data.
64 bytes from 172.200.0.198: icmp_seq=1 ttl=64 time=1.85 ms
64 bytes from 172.200.0.198: icmp_seq=2 ttl=64 time=0.344 ms

Access pods by service IP using curl - Successful.


Ping the same pod from host in 10.50.0.0/22 subnet, it fails.

[root@host1010050212140 ~]# ping 172.200.0.198
PING 172.200.0.198 (172.200.0.198) 56(84) bytes of data.
From 172.200.0.40 icmp_seq=1 Destination Host Unreachable
From 172.200.0.40 icmp_seq=2 Destination Host Unreachable
From 172.200.0.40 icmp_seq=3 Destination Host Unreachable
From 172.200.0.40 icmp_seq=4 Destination Host Unreachable

Access pods by service IP using curl - Successful

Now If I do the same from within a pod running on host1010050212140 (10.50.32.0/22) the ping is successful, however, the service ips are not reachable.

In addition, yesterday when I restarted kubelet on one of the 10.50.x.x nodes, not sure if this could be the cause of it, but I did check there is no other service running on port 6784.

May 23 05:09:04 host1010050212140 kernel: vxlan: Cannot bind port 58762, err=-98
May 23 05:09:04 host1010050212140 kernel: device vethwedu entered promiscuous mode
May 23 05:09:04 host1010050212140 kernel: device vethwedu left promiscuous mode
May 23 05:09:04 host1010050212140 kernel: weave: port 2(vethwedu) entered disabled state
May 23 05:09:04 host1010050212140 kernel: vxlan: Cannot bind port 6784, err=-98

@bboreham
Copy link
Contributor

Can you show the logs from the weave container on the host where that Cannot bind port 6784 came from?

Also run ip route and weave status connections at that point.

@paphillon
Copy link
Author

weave status connections

<- 10.50.212.141:33002   established fastdp ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com) mtu=1337
<- 10.51.35.166:59164    established fastdp 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com) mtu=1337
<- 10.50.212.142:59140   established fastdp 7a:cc:70:08:75:f4(host1010050212142.corp.x.com) mtu=1337
<- 10.51.35.10:55966     established fastdp 7a:6e:bb:9b:50:de(host1010051035010.corp.x.com) mtu=1337
-> 10.51.35.6:6783       established fastdp 4e:4c:4b:cb:d6:ff(host1010051035006.corp.x.com) mtu=1337
<- 10.50.212.140:41096   established fastdp 82:b1:3f:68:58:96(host1010050212140.corp.x.com) mtu=1337
-> 10.51.35.8:6783       established fastdp 36:26:8b:a2:44:bd(host1010051035008.corp.x.com) mtu=1337
<- 10.51.35.11:56050     established fastdp a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com) mtu=1337
-> 10.51.35.5:6783       failed      cannot connect to ourself, retry: never

ip route

default via 10.50.212.1 dev eno16777984
10.50.212.0/22 dev eno16777984  proto kernel  scope link  src 10.50.212.140
169.254.0.0/16 dev eno16777984  scope link  metric 1002
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
172.200.0.0/24 dev weave  proto kernel  scope link  src 172.200.0.40

Weave logs

modprobe: module br_netfilter not found in modules.dep
Ignore the error if "br_netfilter" is built-in in the kernel
DEBU: 2018/05/23 05:09:04.501830 [kube-peers] Checking peer "82:b1:3f:68:58:96" against list &{[{46:0e:40:03:33:10 host1010051035005.corp.x.com} {4e:4c:4b:cb:d6:ff host1010051035006.corp.x.com} {36:26:8b:a2:44:bd host1010051035008.corp.x.com} {7a:6e:bb:9b:50:de host1010051035010.corp.x.com} {a6:7d:e7:8b:e3:20 host1010051035011.corp.x.com} {2a:f8:1d:6f:b6:ce host1010051035166.corp.x.com} {82:b1:3f:68:58:96 host1010050212140.corp.x.com} {ba:bb:dd:d8:ae:0e host1010050212141.corp.x.com} {7a:cc:70:08:75:f4 host1010050212142.corp.x.com}]}
INFO: 2018/05/23 05:09:04.693635 Command line options: map[name:82:b1:3f:68:58:96 no-dns:true db-prefix:/weavedb/weave-net docker-api: host-root:/host metrics-addr:0.0.0.0:6782 mtu:1337 expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=9 ipalloc-range:172.200.0.0/24 conn-limit:30 datapath:datapath nickname:host1010050212140.corp.x.com port:6783]
INFO: 2018/05/23 05:09:04.693690 weave  2.3.0
INFO: 2018/05/23 05:09:04.977377 Bridge type is bridged_fastdp
INFO: 2018/05/23 05:09:04.977395 Communication between peers is unencrypted.
WARN: 2018/05/23 05:09:04.977743 Address already in use creating vxlan vport 6784 - retrying
INFO: 2018/05/23 05:09:05.026157 Our name is 82:b1:3f:68:58:96(host1010050212140.corp.x.com)
INFO: 2018/05/23 05:09:05.026213 Launch detected - using supplied peer list: [10.50.212.140 10.50.212.141 10.50.212.142 10.51.35.5 10.51.35.6 10.51.35.8 10.51.35.10 10.51.35.11 10.51.35.166]
INFO: 2018/05/23 05:09:05.026236 Checking for pre-existing addresses on weave bridge
INFO: 2018/05/23 05:09:05.026366 weave bridge has address 172.200.0.40/24
INFO: 2018/05/23 05:09:05.286725 [allocator 82:b1:3f:68:58:96] Initialising with persisted data
INFO: 2018/05/23 05:09:05.286795 Sniffing traffic on datapath (via ODP)
INFO: 2018/05/23 05:09:05.287290 ->[10.51.35.11:6783] attempting connection
INFO: 2018/05/23 05:09:05.287886 ->[10.51.35.8:6783] attempting connection
INFO: 2018/05/23 05:09:05.287981 ->[10.51.35.166:6783] attempting connection
INFO: 2018/05/23 05:09:05.288036 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/23 05:09:05.288099 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/23 05:09:05.288168 ->[10.51.35.5:6783] attempting connection
INFO: 2018/05/23 05:09:05.288209 ->[10.50.212.140:6783] attempting connection
INFO: 2018/05/23 05:09:05.288270 ->[10.51.35.6:6783] attempting connection
INFO: 2018/05/23 05:09:05.288323 ->[10.51.35.10:6783] attempting connection
INFO: 2018/05/23 05:09:05.288568 ->[10.50.212.140:50051] connection accepted
INFO: 2018/05/23 05:09:05.289067 ->[10.50.212.140:50051|82:b1:3f:68:58:96(host1010050212140.corp.x.com)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/05/23 05:09:05.289210 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.289320 overlay_switch ->[7a:cc:70:08:75:f4(host1010050212142.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.289348 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.382849 ->[10.50.212.140:6783|82:b1:3f:68:58:96(host1010050212140.corp.x.com)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/05/23 05:09:05.385062 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/05/23 05:09:05.385208 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/05/23 05:09:05.488435 ->[10.51.35.8:6783|36:26:8b:a2:44:bd(host1010051035008.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.488542 overlay_switch ->[36:26:8b:a2:44:bd(host1010051035008.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.488563 ->[10.51.35.8:6783|36:26:8b:a2:44:bd(host1010051035008.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.488900 ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(host1010051035006.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.488941 overlay_switch ->[4e:4c:4b:cb:d6:ff(host1010051035006.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.488956 ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(host1010051035006.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.489235 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.489346 ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.489791 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.489860 ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.490123 ->[10.51.35.5:6783|46:0e:40:03:33:10(host1010051035005.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:09:05.490466 overlay_switch ->[a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.490496 ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.490751 overlay_switch ->[2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.490803 overlay_switch ->[7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.490854 overlay_switch ->[ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.490880 ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.490890 overlay_switch ->[46:0e:40:03:33:10(host1010051035005.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.490945 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.491102 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.491404 ->[10.51.35.5:6783|46:0e:40:03:33:10(host1010051035005.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:09:05.491868 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.588254 ->[10.51.35.8:6783|36:26:8b:a2:44:bd(host1010051035008.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.589464 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.589565 overlay_switch ->[7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)] using sleeve
INFO: 2018/05/23 05:09:05.589598 ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.683152 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.683274 sleeve ->[10.51.35.10:6783|7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.684081 sleeve ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.689006 overlay_switch ->[a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)] using sleeve
INFO: 2018/05/23 05:09:05.689034 ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.689387 overlay_switch ->[ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)] using sleeve
INFO: 2018/05/23 05:09:05.689412 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.782883 overlay_switch ->[ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.783114 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.783361 ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(host1010051035006.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.783444 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.783673 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.883132 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.883194 overlay_switch ->[46:0e:40:03:33:10(host1010051035005.corp.x.com)] using sleeve
INFO: 2018/05/23 05:09:05.883728 ->[10.51.35.5:6783|46:0e:40:03:33:10(host1010051035005.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.884342 ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:09:05.884640 sleeve ->[10.51.35.6:6783|4e:4c:4b:cb:d6:ff(host1010051035006.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.885178 overlay_switch ->[7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.885735 sleeve ->[10.51.35.11:6783|a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.885837 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.886089 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:09:05.886310 sleeve ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.886435 overlay_switch ->[a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.886510 overlay_switch ->[46:0e:40:03:33:10(host1010051035005.corp.x.com)] using fastdp
INFO: 2018/05/23 05:09:05.886985 sleeve ->[10.51.35.5:6783|46:0e:40:03:33:10(host1010051035005.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.889595 sleeve ->[10.51.35.8:6783|36:26:8b:a2:44:bd(host1010051035008.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:05.889821 sleeve ->[10.51.35.166:6783|2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:09:07.288466 [kube-peers] Added myself to peer list &{[{46:0e:40:03:33:10 host1010051035005.corp.x.com} {4e:4c:4b:cb:d6:ff host1010051035006.corp.x.com} {36:26:8b:a2:44:bd host1010051035008.corp.x.com} {7a:6e:bb:9b:50:de host1010051035010.corp.x.com} {a6:7d:e7:8b:e3:20 host1010051035011.corp.x.com} {2a:f8:1d:6f:b6:ce host1010051035166.corp.x.com} {82:b1:3f:68:58:96 host1010050212140.corp.x.com} {ba:bb:dd:d8:ae:0e host1010050212141.corp.x.com} {7a:cc:70:08:75:f4 host1010050212142.corp.x.com}]}
DEBU: 2018/05/23 05:09:07.290827 [kube-peers] Nodes that have disappeared: map[]
172.200.0.40
INFO: 2018/05/23 05:39:08.253890 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)
INFO: 2018/05/23 05:43:00.838736 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection shutting down due to error: read tcp4 10.50.212.140:52877->10.50.212.141:6783: read: connection reset by peer
INFO: 2018/05/23 05:43:00.838799 ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection deleted
INFO: 2018/05/23 05:43:00.838966 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/23 05:43:00.839362 ->[10.50.212.141:6783] error during connection attempt: dial tcp4 :0->10.50.212.141:6783: connect: connection refused
INFO: 2018/05/23 05:43:00.842400 Removed unreachable peer ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)
INFO: 2018/05/23 05:43:02.090519 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/23 05:43:02.090897 ->[10.50.212.141:6783] error during connection attempt: dial tcp4 :0->10.50.212.141:6783: connect: connection refused
INFO: 2018/05/23 05:43:03.770486 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/23 05:43:03.770908 ->[10.50.212.141:6783] error during connection attempt: dial tcp4 :0->10.50.212.141:6783: connect: connection refused
INFO: 2018/05/23 05:43:08.231380 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/23 05:43:08.231668 ->[10.50.212.141:6783] error during connection attempt: dial tcp4 :0->10.50.212.141:6783: connect: connection refused
INFO: 2018/05/23 05:43:18.355590 ->[10.50.212.141:6783] attempting connection
INFO: 2018/05/23 05:43:18.355959 ->[10.50.212.141:6783] error during connection attempt: dial tcp4 :0->10.50.212.141:6783: connect: connection refused
INFO: 2018/05/23 05:43:21.214477 ->[10.50.212.141:54751] connection accepted
INFO: 2018/05/23 05:43:21.413264 ->[10.50.212.141:54751|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:43:21.413354 overlay_switch ->[ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)] using fastdp
INFO: 2018/05/23 05:43:21.413376 ->[10.50.212.141:54751|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:43:21.415173 ->[10.50.212.141:54751|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:43:21.617584 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:43:21.710120 sleeve ->[10.50.212.141:6783|ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:43:23.612096 Discovered remote MAC ba:bb:dd:d8:ae:0e at ba:bb:dd:d8:ae:0e(host1010050212141.corp.x.com)
INFO: 2018/05/23 05:47:17.987136 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection shutting down due to error: read tcp4 10.50.212.140:36445->10.50.212.142:6783: read: connection reset by peer
INFO: 2018/05/23 05:47:17.987192 ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection deleted
INFO: 2018/05/23 05:47:17.987350 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/23 05:47:17.987731 ->[10.50.212.142:6783] error during connection attempt: dial tcp4 :0->10.50.212.142:6783: connect: connection refused
INFO: 2018/05/23 05:47:17.989294 Removed unreachable peer 7a:cc:70:08:75:f4(host1010050212142.corp.x.com)
INFO: 2018/05/23 05:47:20.331757 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/23 05:47:20.332094 ->[10.50.212.142:6783] error during connection attempt: dial tcp4 :0->10.50.212.142:6783: connect: connection refused
INFO: 2018/05/23 05:47:24.030005 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/23 05:47:24.030336 ->[10.50.212.142:6783] error during connection attempt: dial tcp4 :0->10.50.212.142:6783: connect: connection refused
INFO: 2018/05/23 05:47:27.544040 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/23 05:47:27.544398 ->[10.50.212.142:6783] error during connection attempt: dial tcp4 :0->10.50.212.142:6783: connect: connection refused
INFO: 2018/05/23 05:47:36.858407 ->[10.50.212.142:6783] attempting connection
INFO: 2018/05/23 05:47:36.858782 ->[10.50.212.142:6783] error during connection attempt: dial tcp4 :0->10.50.212.142:6783: connect: connection refused
INFO: 2018/05/23 05:47:39.411058 ->[10.50.212.142:34541] connection accepted
INFO: 2018/05/23 05:47:39.412666 ->[10.50.212.142:34541|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection ready; using protocol version 2
INFO: 2018/05/23 05:47:39.412753 overlay_switch ->[7a:cc:70:08:75:f4(host1010050212142.corp.x.com)] using fastdp
INFO: 2018/05/23 05:47:39.412778 ->[10.50.212.142:34541|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection added (new peer)
INFO: 2018/05/23 05:47:39.509451 ->[10.50.212.142:34541|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: connection fully established
INFO: 2018/05/23 05:47:39.609645 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/05/23 05:47:39.610837 sleeve ->[10.50.212.142:6783|7a:cc:70:08:75:f4(host1010050212142.corp.x.com)]: Effective MTU verified at 1438
INFO: 2018/05/23 05:47:41.210667 Discovered remote MAC 7a:cc:70:08:75:f4 at 7a:cc:70:08:75:f4(host1010050212142.corp.x.com)
INFO: 2018/05/23 06:09:16.240233 Discovered remote MAC 2a:f8:1d:6f:b6:ce at 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)
INFO: 2018/05/23 06:26:43.675011 Discovered remote MAC a6:7d:e7:8b:e3:20 at a6:7d:e7:8b:e3:20(host1010051035011.corp.x.com)
INFO: 2018/05/23 06:43:46.743227 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)
INFO: 2018/05/23 06:51:36.880635 Discovered remote MAC 76:17:1e:4c:61:5a at 7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)
INFO: 2018/05/23 06:52:04.516709 Discovered remote MAC 1e:a1:37:f2:d9:86 at 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)
INFO: 2018/05/23 07:39:09.033744 Discovered remote MAC 2a:f8:1d:6f:b6:ce at 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)
INFO: 2018/05/23 14:12:00.242566 Discovered remote MAC be:e8:b4:d1:d1:14 at 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)
INFO: 2018/05/23 15:07:00.243003 Discovered remote MAC be:e8:b4:d1:d1:14 at 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)
INFO: 2018/05/23 16:08:18.408625 Discovered remote MAC 7a:6e:bb:9b:50:de at 7a:6e:bb:9b:50:de(host1010051035010.corp.x.com)
INFO: 2018/05/24 00:55:00.242122 Discovered remote MAC 1e:a1:37:f2:d9:86 at 2a:f8:1d:6f:b6:ce(host1010051035166.corp.x.com)

@bboreham
Copy link
Contributor

bboreham commented Nov 1, 2018

Fixed by #3442

@bboreham bboreham closed this as completed Nov 1, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants