Rolling restart can leave ipam hung #894

bboreham · 2015-06-11T14:31:01Z

If you have peers A, B, C running, then kill A, start a new A', kill B, start a new B', and so on, you end up with a ring owned by the old A, B, C, and none of the running peers owns any space. If they ask for space, they will only try to contact the now-gone peers.

(Triggering this symptom requires that the new peers have unique PeerNames, i.e. the weave bridge has also been recreated.)
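
To make the failure mode concrete, here is a minimal sketch in Python. It is a simplified model and not weave's actual data structures: the `ring` dict, the CIDR ranges, and the helper names are all hypothetical, chosen only to show why the new peers end up with nobody to ask.

```python
# Simplified model: the ring maps address ranges to the name of the owning peer.
# Ranges and helper names are illustrative assumptions, not weave's real code.

def rolling_restart(live, replacements):
    """Kill each old peer in turn and start its replacement under a new name."""
    live = set(live)
    for old, new in replacements:
        live.discard(old)  # old peer goes away without giving up its ranges
        live.add(new)      # new peer joins with a fresh name and no ranges
    return live


def peers_to_ask(ring, requester):
    """Owners that a peer with no space would contact to request some."""
    return {owner for owner in ring.values() if owner != requester}


# The ring is still owned entirely by the original peers.
ring = {"10.0.0.0/26": "A", "10.0.0.64/26": "B", "10.0.0.128/26": "C"}

live = rolling_restart({"A", "B", "C"}, [("A", "A'"), ("B", "B'"), ("C", "C'")])

# Owners that are still running: none, so every request goes to a dead peer.
reachable_owners = set(ring.values()) & live
candidates = peers_to_ask(ring, "A'")
```

In this model `reachable_owners` is empty and `candidates` contains only the old names, which matches the hang described above.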

This can be worked around by calling weave rmpeer on the new peers, or (somewhat less reliably) by calling weave reset when shutting down each peer, or by shutting down all the old peers before starting any new ones.
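
As a rough illustration of why the rmpeer workaround helps, the sketch below uses the same kind of toy model in Python. It assumes that removing a dead peer hands its ranges to the peer where the command was run; the `rmpeer` function here is a hypothetical stand-in for that behaviour, not the real implementation.

```python
# Toy model of the rmpeer workaround. Assumption: the ranges of the removed
# peer are taken over by the peer on which the command is run.

def rmpeer(ring, dead_peer, local_peer):
    """Return a new ring with dead_peer's ranges reassigned to local_peer."""
    return {
        rng: (local_peer if owner == dead_peer else owner)
        for rng, owner in ring.items()
    }


# State after the rolling restart: the ring is owned by peers that are gone.
ring = {"10.0.0.0/26": "A", "10.0.0.64/26": "B", "10.0.0.128/26": "C"}
live = {"A'", "B'", "C'"}

# On the new peer A', remove every owner that is no longer running.
for dead in sorted(set(ring.values()) - live):
    ring = rmpeer(ring, dead, "A'")

owners_after = set(ring.values())
```

After the loop every range is owned by a running peer (`A'` in this sketch), so B' and C' have a reachable owner to request space from.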

The weave logs are not very helpful; just a series of messages like:

weave 2015/06/11 12:20:12.412849 [gossip IPallocation]: unknown relay destination: 02:5e:a2:16:13:16


If one calls weave reset when reprovisioning, it means the weave node gets a different identity. This can mean that it will not be able to allocate IPs, due to weaveworks/weave#894

rade · 2015-06-28T22:46:15Z

This will be resolved by #678. @bboreham did you have some other, separate solution in mind? Or some stop-gap, such as making it more detectable for users what is happening and what they should do about it?

bboreham · 2015-06-29T06:44:11Z

Mostly wanted to document the problem and current mitigations.
Post-#1010 there is a message to tell you there are no peers available to talk to, but I think it's -debug level.

rade · 2015-11-09T12:18:22Z

This got fixed in #1624, except for the case where the hosts are restarted without rmpeer, i.e. without invoking weave reset, which is just not something one should do. We should document the need to invoke weave reset on a rolling restart, or indeed any controlled node shutdown.

rade · 2016-01-21T14:12:16Z

> We should document the need to invoke weave reset on a rolling restart, or indeed any controlled node shutdown.

That is no longer true, post #1866. -> fixed.

bboreham added bug [component/ipam] labels Jun 11, 2015

squaremo mentioned this issue Jun 11, 2015

Survive docker cycles by _not_ resetting weave plugins-demo-2015/demo#4

Merged

rade modified the milestones: 1.1.0, current Jul 8, 2015

rade modified the milestones: current, 1.1.0 Jul 17, 2015

rade modified the milestone: 1.1.0 Aug 10, 2015

rade added the [component/docs] label Nov 9, 2015

rade modified the milestones: 1.3.0, n/a Nov 9, 2015

rade changed the title ~~Rolling restart can leave ipam hung~~ Rolling restart can leave ipam hung - not if you do it correctly; but need to document this Nov 9, 2015

This was referenced Jan 10, 2016

retain peer identity on reboot #1865

Closed

retain peer identity across host reboots #901

Closed

DanielDent mentioned this issue Jan 11, 2016

"all IP ranges owned by unreachable peers" - with no unreachable peers #1875

Closed

bboreham mentioned this issue Jan 11, 2016

Derive peer name from product_uuid #1866

Merged

rade removed the [component/docs] label Jan 21, 2016

rade added this to the 1.5.0 milestone Jan 21, 2016

rade removed this from the n/a milestone Jan 21, 2016

rade closed this as completed Jan 21, 2016

rade changed the title ~~Rolling restart can leave ipam hung - not if you do it correctly; but need to document this~~ Rolling restart can leave ipam hung Jan 21, 2016
