This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Rolling restart can leave ipam hung #894

Closed
bboreham opened this issue Jun 11, 2015 · 4 comments

Comments

@bboreham
Contributor

If you have peers A, B, C running, then kill A, start a new A', kill B, start a new B', and so on, you end up with a ring owned by the old A, B, C, and none of the running peers has any space. If they ask for space, they will only try to contact the now-gone peers.

(Triggering this symptom requires that the new peers have unique PeerNames, i.e. that the weave bridge has also been recreated.)
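
The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not weave's actual IPAM code: the `Ring` class, its `owners` mapping, and the `request_space` method are hypothetical names invented for illustration. It assumes that a peer with no space can only obtain some by asking a current owner, and that no new peer requests space while an old owner is still alive.

```python
class Ring:
    """Toy model of an address ring: maps each range to the peer that owns it."""

    def __init__(self, owners):
        # owners: dict of range name -> owning peer name (hypothetical layout)
        self.owners = dict(owners)

    def request_space(self, requester, live_peers):
        """Return the name of a live owner that could donate space, or None.

        A requester with no space of its own must ask a current owner.
        If every owner is gone, nobody can answer and the request hangs.
        """
        for owner in sorted(set(self.owners.values())):
            if owner in live_peers and owner != requester:
                return owner
        return None


# Initial cluster: A, B, C own the whole ring between them.
ring = Ring({"range-1": "A", "range-2": "B", "range-3": "C"})
live = {"A", "B", "C"}

# Rolling restart: each old peer is killed and replaced by a peer with a
# new, unique name. The ring still records the old names as owners.
for old, new in [("A", "A'"), ("B", "B'"), ("C", "C'")]:
    live.discard(old)
    live.add(new)

# Every running peer is now new, owns nothing, and can only ask dead peers.
donors = {peer: ring.request_space(peer, live) for peer in sorted(live)}
print(donors)
```

In this model every entry in `donors` is `None`: the ring is fully owned, but no owner is reachable.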

This can be worked around by calling weave rmpeer on the new peers or (somewhat less reliably) by calling weave reset when shutting down each peer, or by shutting down all the old peers before starting any new ones.
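
As a rough illustration of why the rmpeer workaround unblocks allocation, the sketch below models it as reassigning every range owned by a departed peer to the peer where the command is run. The function name `rmpeer` mirrors the command, but the dictionary-based ring and the reassignment logic are assumptions made for illustration, not weave's implementation.

```python
def rmpeer(owners, dead_peer, local_peer):
    """Return a new ownership map with dead_peer's ranges given to local_peer."""
    return {
        rng: (local_peer if owner == dead_peer else owner)
        for rng, owner in owners.items()
    }


# Ring as left behind by a rolling restart: all owners are gone.
owners = {"range-1": "A", "range-2": "B", "range-3": "C"}
live = {"A'", "B'", "C'"}

# Running the removal on A' for each departed peer hands their ranges to A'.
for dead in ("A", "B", "C"):
    owners = rmpeer(owners, dead, local_peer="A'")

# All ranges now have a live owner, so B' and C' have someone to ask.
stranded = [rng for rng, owner in owners.items() if owner not in live]
print(owners, stranded)
```

Once no range is stranded with a dead owner, the remaining new peers have a live peer to request space from.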

The weave logs are not very helpful; just a series of messages like:

weave 2015/06/11 12:20:12.412849 [gossip IPallocation]: unknown relay destination: 02:5e:a2:16:13:16
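
Until the symptom is reported more clearly, one way to spot it is to scan the logs for the message above and count how often each destination appears. A minimal sketch, assuming the log format matches the single line quoted here; the function name and the regular expression are illustrative and not part of weave.

```python
import re
from collections import Counter

# Matches the message quoted above and captures the destination identifier.
RELAY_RE = re.compile(
    r"unknown relay destination: ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)


def count_unknown_destinations(lines):
    """Count 'unknown relay destination' messages per destination."""
    counts = Counter()
    for line in lines:
        match = RELAY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The only sample used here is the line quoted in this issue.
sample = [
    "weave 2015/06/11 12:20:12.412849 [gossip IPallocation]: "
    "unknown relay destination: 02:5e:a2:16:13:16",
]
counts = count_unknown_destinations(sample)
print(counts)
```

A destination that keeps recurring and does not correspond to any running peer would be consistent with the ring still being owned by departed peers.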
squaremo added a commit to plugins-demo-2015/demo that referenced this issue Jun 12, 2015
If one calls weave reset when reprovisioning, it means the weave node
gets a different identity. This can mean that it will not be able to
allocate IPs, due to weaveworks/weave#894
@rade
Member

rade commented Jun 28, 2015

This will be resolved by #678. @bboreham did you have some other, separate solution in mind? Or some stop-gap, such as making it more detectable for users what is happening and what they should do about it?

@bboreham
Contributor Author

Mostly wanted to document the problem and current mitigations.
Post-#1010 there is a message to tell you there are no peers available to talk to, but I think it's -debug level.

@rade rade modified the milestones: 1.1.0, current Jul 8, 2015
@rade rade modified the milestones: current, 1.1.0 Jul 17, 2015
@rade rade modified the milestone: 1.1.0 Aug 10, 2015
@rade rade modified the milestones: 1.3.0, n/a Nov 9, 2015
@rade
Member

rade commented Nov 9, 2015

This got fixed in #1624, except for the case where the hosts are restarted without rmpeer, i.e. without invoking weave reset, which is just not something one should do. We should document the need to invoke weave reset on a rolling restart, or indeed any controlled node shutdown.

@rade rade changed the title from "Rolling restart can leave ipam hung" to "Rolling restart can leave ipam hung - not if you do it correctly; but need to document this" Nov 9, 2015
@rade rade added this to the 1.5.0 milestone Jan 21, 2016
@rade rade removed this from the n/a milestone Jan 21, 2016
@rade
Member

rade commented Jan 21, 2016

"We should document the need to invoke weave reset on a rolling restart, or indeed any controlled node shutdown."

That is no longer true, post #1866. -> fixed.

@rade rade closed this as completed Jan 21, 2016
@rade rade changed the title from "Rolling restart can leave ipam hung - not if you do it correctly; but need to document this" to "Rolling restart can leave ipam hung" Jan 21, 2016