"all IP ranges owned by unreachable peers" - with no unreachable peers #1875

DanielDent · 2016-01-11T13:20:56Z

Weave is no longer assigning IP addresses. When I run weave status, I get "all IP ranges owned by unreachable peers - use 'rmpeer' if they are dead".

On every host:

weave status peers shows a list of all the other peers, all of which have active connections.
weave status targets shows a list of all the other targets, all of which have connections.
weave status connections shows an active connection to every other host.

Individual hosts in the weave mesh have been restarted frequently during various testing & configuration activities, but the mesh has been kept up and running. There aren't any currently inactive peers listed for me to rmpeer.


DanielDent · 2016-01-11T13:25:11Z

Looks like this may be a dupe of #894 ?

bboreham · 2016-01-11T13:37:38Z

Can you do weave status ipam and post the result please?

DanielDent · 2016-01-11T14:01:39Z

Same output on all 3 hosts:
(redacted MAC-like string)(hosta) 262142 IPs (50.0% of total) - unreachable!
(redacted MAC-like string)(hostb) 1 IPs (00.0% of total) - unreachable!
(redacted MAC-like string)(hostc) 262145 IPs (50.0% of total) - unreachable!

If the MAC-like strings are relevant to debugging, please let me know.

bboreham · 2016-01-11T14:06:10Z

The mac-like string is a randomly-chosen ID for each peer. When you restart a peer it comes back with a new ID - if you look at weave status peers the ones running on hosta, etc., will probably have different IDs.

That being the case, you can run weave rmpeer <MAC-like string> for each of the unreachable IDs, and you should see it come back to life.
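As a rough sketch of the cleanup described above, the following Python picks out the peer IDs that weave status ipam reports as unreachable and that no longer appear among the live peers, then prints the weave rmpeer command for each. The line format in the regular expression is an assumption based only on the output quoted in this thread, the peer IDs in the sample are made up, and the helper names (stale_peer_ids, rmpeer_commands) are hypothetical. Nothing is executed against weave; the commands are only printed for review.

```python
import re

# Assumed line shape, taken from the output quoted in this thread:
#   <peer-id>(<nickname>) <N> IPs (<P>% of total) - unreachable!
# The real output may differ between weave versions.
IPAM_LINE = re.compile(
    r"^(?P<peer_id>\S+?)\((?P<nickname>[^)]*)\)\s+"
    r"(?P<ips>\d+) IPs \((?P<percent>[\d.]+)% of total\)"
    r"(?P<unreachable> - unreachable!)?\s*$"
)


def stale_peer_ids(ipam_output, live_peer_ids):
    """Return IDs marked unreachable that are not among the live peers."""
    stale = []
    for line in ipam_output.splitlines():
        match = IPAM_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        peer_id = match.group("peer_id")
        if match.group("unreachable") and peer_id not in live_peer_ids:
            stale.append(peer_id)
    return stale


def rmpeer_commands(peer_ids):
    """Build one 'weave rmpeer' command line per stale peer ID."""
    return [f"weave rmpeer {peer_id}" for peer_id in peer_ids]


if __name__ == "__main__":
    # Made-up IDs standing in for the redacted MAC-like strings.
    sample = """\
de:ad:be:ef:00:01(hosta)   262142 IPs (50.0% of total) - unreachable!
de:ad:be:ef:00:02(hostb)        1 IPs (00.0% of total) - unreachable!
de:ad:be:ef:00:03(hostc)   262145 IPs (50.0% of total) - unreachable!
"""
    # IDs currently shown by 'weave status peers' (also made up).
    live = {"aa:aa:aa:aa:00:01", "aa:aa:aa:aa:00:02", "aa:aa:aa:aa:00:03"}
    for command in rmpeer_commands(stale_peer_ids(sample, live)):
        print(command)
```

Comparing against the live peer IDs is a guard against removing a peer that is still running; review the printed commands before running any of them.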

DanielDent · 2016-01-11T14:23:50Z

Thank you for the explanation & info on how to debug issues. That worked. :)

This does feel horribly broken. It seems to me that the following would help build a more robust system:
(A) a consensus protocol (e.g. raft)
(B) the ability for nodes to persist state across reboots
(C) some notion of garbage collection for nodes which remain uncontactable for too long
(D) nodes being aware of the garbage collection period (so they know that if they are out of contact with the quorum, they can no longer count on keeping the IP addresses they think are theirs)

bboreham · 2016-01-11T14:45:12Z

Note that #1866 will (once completed) address the proximate cause, by not changing the identity of the peer.

(A) this is a very deliberate choice. See, for instance, http://blog.weave.works/2015/11/03/docker-networking-1-9-technical-deep-dive/
(B) is discussed at #678 and #1859
(C) was implemented but removed; there is no safe period for "too long".
(D) ditto

We are very keen on a robust system, but perhaps select different factors as pre-eminent. Note that consensus protocols simply halt in the presence of (enough) uncontactable nodes.
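To illustrate the point about consensus protocols halting, here is a minimal sketch of the majority-quorum rule used by protocols such as Raft. It is not how Weave's IPAM works; the function names are hypothetical and the rule shown is the generic "more than half of the members" condition.

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def can_make_progress(cluster_size: int, reachable: int) -> bool:
    """A majority-quorum protocol halts unless a majority is reachable."""
    return reachable >= majority_quorum(cluster_size)


if __name__ == "__main__":
    # With three members, losing one is tolerated; losing two halts progress.
    for reachable in (3, 2, 1):
        print(reachable, can_make_progress(3, reachable))
```

In a three-host mesh like the one in this issue, such a protocol would stop making decisions as soon as two hosts were down at the same time.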

bboreham · 2016-01-11T17:34:25Z

I'm going to close this as a duplicate of #894

bboreham added the resolution/duplicate label Jan 11, 2016

bboreham closed this as completed Jan 11, 2016

bboreham added this to the n/a milestone Jan 12, 2016

hsteckylf mentioned this issue Mar 8, 2018: kube-dns and dashboard get into a crash loop coreos/coreos-kubernetes#878 (Open)
