This repository was archived by the owner on Jun 20, 2024. It is now read-only.

fix occasional failure of 870_weave_recovers_unreachable_ips_on_relaunch_3_test.sh in CI #3444

Closed
murali-reddy opened this issue Nov 3, 2018 · 4 comments

@murali-reddy
Contributor

murali-reddy commented Nov 3, 2018

After the merge of #3399, the test 870_weave_recovers_unreachable_ips_on_relaunch_3_test.sh no longer restarts the weave pod explicitly; instead it relies on the functionality in kube-utils, which performs weave rmpeer automatically. However, the test has been observed to fail occasionally. It appears that weave rmpeer does not reclaim the IPs (or at least not instantly), leading to the test failure.

log

DEBU: 2018/11/03 05:26:48.285669 registering for updates for node delete events
INFO: 2018/11/03 05:26:48.315171 Discovered remote MAC c2:db:37:0a:e3:3d at 0a:e0:a5:d1:8c:2d(test-11349-1-2)
INFO: 2018/11/03 05:26:48.698411 Discovered remote MAC 82:c9:fc:71:9b:41 at 0a:e0:a5:d1:8c:2d(test-11349-1-2)
INFO: 2018/11/03 05:26:49.455204 Discovered remote MAC f2:16:e3:fe:7b:79 at d6:93:32:ad:b8:6d(test-11349-1-1)
INFO: 2018/11/03 05:26:50.452105 Discovered remote MAC a6:05:62:7e:14:67 at d6:93:32:ad:b8:6d(test-11349-1-1)
INFO: 2018/11/03 05:26:54.887989 ->[10.128.0.13:6783|d6:93:32:ad:b8:6d(test-11349-1-1)]: connection shutting down due to error: read tcp4 10.128.0.10:48533->10.128.0.13:6783: read: connection reset by peer
INFO: 2018/11/03 05:26:54.889402 ->[10.128.0.13:6783|d6:93:32:ad:b8:6d(test-11349-1-1)]: connection deleted
INFO: 2018/11/03 05:26:54.895858 Removed unreachable peer d6:93:32:ad:b8:6d(test-11349-1-1)
DEBU: 2018/11/03 05:26:55.507406 [kube-peers] Nodes that have disappeared: map[d6:93:32:ad:b8:6d:{d6:93:32:ad:b8:6d test-11349-1-1}]
DEBU: 2018/11/03 05:26:55.509363 [kube-peers] Preparing to remove disappeared peer d6:93:32:ad:b8:6d
DEBU: 2018/11/03 05:26:55.509389 [kube-peers] Noting I plan to remove d6:93:32:ad:b8:6d
DEBU: 2018/11/03 05:26:55.534310 weave DELETE to http://127.0.0.1:6784/peer/d6:93:32:ad:b8:6d with map[]
INFO: 2018/11/03 05:26:55.549122 [kube-peers] rmpeer of d6:93:32:ad:b8:6d: 131072 IPs taken over from d6:93:32:ad:b8:6d
DEBU: 2018/11/03 05:26:55.569297 [kube-peers] Nodes that have disappeared: map[]
DEBU: 2018/11/03 05:26:55.574779 weave POST to http://127.0.0.1:6784/connect with map[replace:[true] peer:[10.128.0.10 10.128.0.8]]
INFO: 2018/11/03 05:26:55.575602 ->[10.128.0.10:6783] attempting connection
INFO: 2018/11/03 05:26:55.575989 ->[10.128.0.10:60701] connection accepted
INFO: 2018/11/03 05:26:55.576448 ->[10.128.0.10:60701|7e:e5:6d:a3:49:8f(test-11349-1-0)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/03 05:26:55.576769 ->[10.128.0.10:6783|7e:e5:6d:a3:49:8f(test-11349-1-0)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/03 05:26:55.737140 Discovered remote MAC c6:b1:c6:8b:6f:0d at 0a:e0:a5:d1:8c:2d(test-11349-1-2)
INFO: 2018/11/03 05:26:55.919062 Discovered remote MAC da:3f:74:7f:09:91 at 0a:e0:a5:d1:8c:2d(test-11349-1-2)

IPAM status

7e:e5:6d:a3:49:8f(test-11349-1-0) 524288 IPs (50.0% of total) (1 active)
0a:e0:a5:d1:8c:2d(test-11349-1-2) 393216 IPs (37.5% of total)
d6:93:32:ad:b8:6d(test-11349-1-1) 131072 IPs (12.5% of total) - unreachable!

@murali-reddy murali-reddy changed the title fix occasional failure 870_weave_recovers_unreachable_ips_on_relaunch_3_test.sh test fix occasional failure of 870_weave_recovers_unreachable_ips_on_relaunch_3_test.sh in CI Nov 3, 2018
@bboreham
Contributor

bboreham commented Nov 5, 2018

Looking at the test code, it checks on both remaining hosts, so it is possible that it checks before the update has been processed on the one that didn't do the rmpeer.

unreachable_ip_addresses_count could be run under wait_for_x, so it will retry a few times.
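
A minimal sketch of the retry idea, for illustration only. The helper names `retry_until` and `no_unreachable_peers` are hypothetical; this is not the test suite's `wait_for_x`, whose signature is not shown in this thread, and the check simply assumes that status output marks such peers with `unreachable!` as in the IPAM status above.

```shell
# Hypothetical retry helper (NOT the test suite's wait_for_x).
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds
        "$@" && return 0
        # Do not sleep after the final failed attempt
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Hypothetical check: run the given status command and succeed only
# when none of its output lines is marked "unreachable!".
no_unreachable_peers() {
    ! "$@" | grep -q 'unreachable!'
}

# Example (the weave invocation is copied from later in this thread):
#   retry_until 10 1 no_unreachable_peers ./weave --local status ipam
```

Polling this way gives the host that did not run the rmpeer time to process the update before the assertion is evaluated.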

@murali-reddy
Contributor Author

> unreachable_ip_addresses_count could be run under wait_for_x, so it will retry a few times.

Ok, let me try this and see if it works.

@murali-reddy
Contributor Author

It appears that weave rmpeer occasionally has no effect: the IP range is not reclaimed.

INFO: 2019/03/15 08:27:20.569765 ->[192.168.56.102:42394|62:e8:01:2a:9d:cc(192.168.56.102)]: connection shutting down due to error: read tcp4 192.168.56.100:6783->192.168.56.102:42394: read: connection reset by peer
INFO: 2019/03/15 08:27:20.570291 ->[192.168.56.102:42394|62:e8:01:2a:9d:cc(192.168.56.102)]: connection deleted
INFO: 2019/03/15 08:27:20.571137 Removed unreachable peer 62:e8:01:2a:9d:cc(192.168.56.102)
DEBU: 2019/03/15 08:27:22.386141 [kube-peers] Nodes that have disappeared: map[62:e8:01:2a:9d:cc:{62:e8:01:2a:9d:cc 192.168.56.102}]
DEBU: 2019/03/15 08:27:22.386393 [kube-peers] Preparing to remove disappeared peer 62:e8:01:2a:9d:cc
DEBU: 2019/03/15 08:27:22.386450 [kube-peers] Noting I plan to remove  62:e8:01:2a:9d:cc
DEBU: 2019/03/15 08:27:22.391062 weave DELETE to http://127.0.0.1:6784/peer/62:e8:01:2a:9d:cc with map[]
INFO: 2019/03/15 08:27:22.391793 [kube-peers] rmpeer of 62:e8:01:2a:9d:cc: 0 IPs taken over from 62:e8:01:2a:9d:cc

DEBU: 2019/03/15 08:27:22.416963 [kube-peers] Nodes that have disappeared: map[]
DEBU: 2019/03/15 08:27:22.420989 weave POST to http://127.0.0.1:6784/connect with map[replace:[true] peer:[192.168.56.100 192.168.56.101]]
INFO: 2019/03/15 08:27:22.421377 ->[192.168.56.100:6783] attempting connection
INFO: 2019/03/15 08:27:22.421695 ->[192.168.56.100:39700] connection accepted
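
To spot this case in a longer log, the takeover count can be pulled out of the kube-peers lines. This is a rough sketch: the function name `rmpeer_takeovers` is hypothetical, and the pattern assumes the exact message format seen in the logs in this thread.

```shell
# Hypothetical helper: read weave logs on stdin and print
# "<peer> <count>" for each kube-peers rmpeer result line.
rmpeer_takeovers() {
    sed -n 's/.*\[kube-peers] rmpeer of \([0-9a-f:]*\): \([0-9]*\) IPs taken over from .*/\1 \2/p'
}

# Example: list only the removals that took over nothing
#   rmpeer_takeovers < weave.log | awk '$2 == 0'
```

A count of 0 for a peer that still appears in the IPAM status is the symptom described in this comment.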

    "IPAM": {
        "Paxos": null,
        "Range": "10.32.0.0/12",
        "RangeNumIPs": 1048576,
        "ActiveIPs": 2,
        "DefaultSubnet": "10.32.0.0/12",
        "Entries": [
            {
                "Token": "10.32.0.0",
                "Size": 262144,
                "Peer": "7e:ee:ea:63:fb:d1",
                "Nickname": "192.168.56.100",
                "IsKnownPeer": true,
                "Version": 4
            },
            {
                "Token": "10.36.0.0",
                "Size": 1,
                "Peer": "62:e8:01:2a:9d:cc",
                "Nickname": "192.168.56.102",
                "IsKnownPeer": false,
                "Version": 8
            },
            {
                "Token": "10.36.0.1",
                "Size": 262143,
                "Peer": "7e:ee:ea:63:fb:d1",
                "Nickname": "192.168.56.100",
                "IsKnownPeer": true,
                "Version": 0
            },
            {
                "Token": "10.40.0.0",
                "Size": 393216,
                "Peer": "c6:25:1c:4d:05:67",
                "Nickname": "192.168.56.101",
                "IsKnownPeer": true,
                "Version": 4
            },
            {
                "Token": "10.46.0.0",
                "Size": 1,
                "Peer": "62:e8:01:2a:9d:cc",
                "Nickname": "192.168.56.102",
                "IsKnownPeer": false,
                "Version": 16
            },
            {
                "Token": "10.46.0.1",
                "Size": 131071,
                "Peer": "7e:ee:ea:63:fb:d1",
                "Nickname": "192.168.56.100",
                "IsKnownPeer": true,
                "Version": 0
            }
        ],
        "PendingClaims": null,
        "PendingAllocates": null
    }
/home/weave # ./weave --local status ipam
7e:ee:ea:63:fb:d1(192.168.56.100)       655358 IPs (62.5% of total) (2 active)
62:e8:01:2a:9d:cc(192.168.56.102)            2 IPs (00.0% of total) - unreachable!
c6:25:1c:4d:05:67(192.168.56.101)       393216 IPs (37.5% of total)
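
The per-peer totals in the status output can be cross-checked against the `Entries` list by summing `Size` per `Peer`. The sketch below uses peer and size pairs copied by hand from the JSON above; the function names are made up for illustration.

```shell
# Sum range sizes per peer; input lines are "<peer> <size>".
sum_sizes_by_peer() {
    awk '{ total[$1] += $2 } END { for (p in total) print p, total[p] }' | sort
}

# Peer/Size pairs copied from the "Entries" list in the IPAM dump
ipam_entries() {
    cat <<'EOF'
7e:ee:ea:63:fb:d1 262144
62:e8:01:2a:9d:cc 1
7e:ee:ea:63:fb:d1 262143
c6:25:1c:4d:05:67 393216
62:e8:01:2a:9d:cc 1
7e:ee:ea:63:fb:d1 131071
EOF
}

ipam_entries | sum_sizes_by_peer
```

The sums come to 655358, 2 and 393216, matching the status output and adding up to the `RangeNumIPs` value of 1048576. The 2 IPs still attributed to the removed peer are the two single-address entries at 10.36.0.0 and 10.46.0.0.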

@murali-reddy
Contributor Author

The root cause of this issue was found in #3628 and fixed by #3629.

Closing this issue.
