This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

"all IP ranges owned by unreachable peers" - with no unreachable peers #1875

Closed
DanielDent opened this issue Jan 11, 2016 · 7 comments

Comments

@DanielDent

weave is no longer assigning IP addresses. When I type weave status, I get "all IP ranges owned by unreachable peers - use 'rmpeer' if they are dead".

On every host:

  • weave status peers shows a list of all the other peers, all of which are active connections.
  • weave status targets shows a list of all the other targets, all of which have connections
  • weave status connections shows an active connection to every other host

Individual hosts in the weave mesh have been restarted frequently during various testing & configuration activities, but the mesh has been kept up and running. There aren't any currently inactive peers listed for me to rmpeer.

@DanielDent
Author

Looks like this may be a dupe of #894?

@bboreham
Contributor

Can you do weave status ipam and post the result please?

@DanielDent
Author

Same output on all 3 hosts:
(redacted MAC-like string)(hosta) 262142 IPs (50.0% of total) - unreachable!
(redacted MAC-like string)(hostb) 1 IPs (00.0% of total) - unreachable!
(redacted MAC-like string)(hostc) 262145 IPs (50.0% of total) - unreachable!

If the MAC-like strings are relevant to debugging, please let me know.

@bboreham
Contributor

The mac-like string is a randomly-chosen ID for each peer. When you restart a peer it comes back with a new ID - if you look at weave status peers the ones running on hosta, etc., will probably have different IDs.

That being the case, you can run weave rmpeer <MAC-like string> for each stale ID, and you should see allocation come back to life.
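
A minimal sketch of that cleanup, assuming the `weave status ipam` line format quoted earlier in this thread. The peer IDs in `SAMPLE`, the helper names, and the `live_ids` parameter are hypothetical, and the shape of a line for a reachable peer is a guess. The script only prints `weave rmpeer` commands rather than running them, so each ID can be checked against `weave status peers` first.

```python
import re

# Line shape taken from the `weave status ipam` output quoted in this thread:
#   <peer-id>(<hostname>) <count> IPs (<pct>% of total) - unreachable!
IPAM_LINE = re.compile(
    r"^(?P<peer_id>\S+?)\((?P<host>[^)]*)\)\s+(?P<ips>\d+) IPs\b(?P<rest>.*)$"
)


def stale_peer_ids(ipam_output, live_ids=()):
    """Return (peer_id, host, ip_count) for entries flagged unreachable.

    live_ids is a hypothetical safety net: IDs currently listed by
    `weave status peers` are never reported as stale.
    """
    live = set(live_ids)
    stale = []
    for line in ipam_output.splitlines():
        match = IPAM_LINE.match(line.strip())
        if not match or "unreachable" not in match.group("rest"):
            continue
        if match.group("peer_id") in live:
            continue
        stale.append(
            (match.group("peer_id"), match.group("host"), int(match.group("ips")))
        )
    return stale


def rmpeer_commands(ipam_output, live_ids=()):
    """Build one `weave rmpeer <id>` command per stale entry (not executed)."""
    return [
        f"weave rmpeer {peer_id}"
        for peer_id, _, _ in stale_peer_ids(ipam_output, live_ids)
    ]


# Hypothetical IDs; the real ones were redacted in the report above.
SAMPLE = """\
aa:aa:aa:aa:aa:01(hosta) 262142 IPs (50.0% of total) - unreachable!
aa:aa:aa:aa:aa:02(hostb) 1 IPs (00.0% of total) - unreachable!
aa:aa:aa:aa:aa:03(hostc) 262145 IPs (50.0% of total)
"""

if __name__ == "__main__":
    for command in rmpeer_commands(SAMPLE):
        print(command)
```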

@DanielDent
Author

Thank you for the explanation & info on how to debug issues. That worked. :)

This does feel horribly broken. It seems to me that the following would help build a more robust system:
(A) a consensus protocol (e.g. Raft)
(B) the ability for nodes to persist state across reboots
(C) some notion of garbage collection for nodes that remain uncontactable for too long
(D) nodes being aware of the garbage collection period (so they know that if they are out of contact with the quorum, they can no longer count on keeping the IP addresses they think are theirs)

@bboreham
Contributor

Note that #1866 will (once completed) address the proximate cause, by not changing the identity of the peer.

(A) Not using a consensus protocol is a very deliberate choice. See, for instance, http://blog.weave.works/2015/11/03/docker-networking-1-9-technical-deep-dive/
(B) is discussed at #678 and #1859
(C) was implemented but removed; there is no safe period for "too long".
(D) ditto

We are very keen on a robust system, but perhaps select different factors as pre-eminent. Note that consensus protocols simply halt in the presence of (enough) uncontactable nodes.
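
To make the last point concrete, here is a minimal sketch assuming a majority-quorum protocol such as Raft. The function names are hypothetical and this is not how Weave allocates addresses; it only shows when a quorum-based system would stop making progress.

```python
def majority_quorum(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def can_make_progress(cluster_size, reachable):
    """A majority-quorum protocol halts once fewer than a quorum can talk."""
    return reachable >= majority_quorum(cluster_size)


if __name__ == "__main__":
    # With three hosts, losing contact with two of them halts the protocol.
    for reachable in (3, 2, 1):
        print(reachable, can_make_progress(3, reachable))
```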

@bboreham
Contributor

I'm going to close this as a duplicate of #894


2 participants