
IPAM reassigned same IP address to multiple containers on different hosts #1949

Closed
anandkumarpatel opened this issue Feb 2, 2016 · 6 comments

@anandkumarpatel

Hi, I have run into this interesting issue a few times: I am seeing weave allocate the same IP address to two different containers on different hosts. I couldn't gather more details at the time of the incident, but if it happens again I can grab more info. In the meantime, I wanted to describe my use case to check whether I am using weave incorrectly.

I ran weave status ipam when the issue happened, and the percentages of IPs owned by each server were inconsistent across hosts: host_1 saw 30% allocated to host_2, but host_3 saw only 20% allocated to host_2.
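For reference, this is roughly how I compared the views (the hostnames are placeholders for my own machines; each host reports its own view via weave status ipam):

```sh
# Sketch: collect each host's view of IPAM ownership and compare the
# percentages; in a healthy cluster they should agree across hosts.
for host in host_1 host_2 host_3; do
  echo "=== $host ==="
  ssh "$host" weave status ipam
done
```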

NOTE: I am only using weave router (weaveDNS and weave proxy not in use)

  1. When I spin up a server, I collect the IPs of all existing peers and then launch the weave container, passing in all of those peers.
  2. When a server gets terminated, I call weave forget <terminatedPeer> and weave rmpeer <terminatedPeer> on all other servers.
  3. When a container is started, I simply call weave attach <containerId>, letting weave choose the IP (a rough sketch of this lifecycle follows the list).
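
For concreteness, here is that lifecycle as a shell sketch; the environment variables and the my-image container are placeholders for my own tooling:

```sh
# 1. A new server comes up: launch weave, passing the current peers
#    ($PEER_IPS is a placeholder for however the peer list is discovered).
weave launch $PEER_IPS

# 2. A server is terminated: on every remaining server, forget the dead
#    peer and reclaim its address space (this is the step in question).
weave forget $TERMINATED_PEER
weave rmpeer $TERMINATED_PEER

# 3. A container starts: attach it to the weave network and let weave
#    choose the IP.
CONTAINER_ID=$(docker run -d my-image)
weave attach $CONTAINER_ID
```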

One interesting thing to note is that I can get into cases where all of the peers that were originally passed to a server have since been cycled out. But I call forget on them, so that should be fine.

Is this incorrect usage of weave? From the documentation it seemed like hosts should be able to leave the network, as long as any new server that comes up knows about the current state of the system. Let me know if I am missing an assumption weave makes about the network.

@bboreham bboreham self-assigned this Feb 3, 2016
@bboreham
Contributor

bboreham commented Feb 3, 2016

Yes, calling weave rmpeer on more than one host is definitely wrong. We don't want to leave the address space owned by the departed peer dangling, so we reassign it somewhere, and specifically we give it to the peer on which you ran rmpeer. Running rmpeer on more than one host therefore hands the same address space to several peers, which sets up exactly the failure you have described.
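To illustrate the difference (a sketch with placeholder hostnames, not a prescription for any particular tooling):

```sh
# Wrong: every surviving host reclaims the departed peer's ranges, so the
# same address space ends up owned by several peers at once.
for host in host_1 host_2 host_3; do
  ssh "$host" weave rmpeer $TERMINATED_PEER
done

# Right: exactly one surviving host runs rmpeer and inherits the ranges.
ssh host_1 weave rmpeer $TERMINATED_PEER
```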

Once the peers detect the inconsistency, they (should) log the error and drop the connection that supplied the inconsistent data. Multiple peers can then each carry on with their own view of the world, but the network is effectively broken.

Some peers will manage to communicate their claim to others before those others run their own rmpeer (i.e. it's a race), so what I would actually expect is a few cliques of peers that are still talking to each other, while repeatedly dropping attempted connections from peers in other cliques.

I will look into reporting this error condition better; in #1946 the best clue was "Inconsistent state", which is a bit cryptic.

@anandkumarpatel
Author

Thanks for this information!
What is the best practice here, then? When a node goes out of service, should I:

  1. call weave rmpeer and weave forget on only one host
  2. do nothing; weave will handle it itself
  3. call only one of weave rmpeer or weave forget on one host
  4. call weave rmpeer on one host and weave forget on all hosts

@rade
Member

rade commented Feb 4, 2016

4

@rade
Member

rade commented Feb 8, 2016

Actually, the best thing to do is to call weave reset on the node that goes out of service, and weave forget on the other nodes.
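
As a sketch of that graceful removal (the peer name is a placeholder):

```sh
# On the node that is leaving: relinquish its address space cleanly.
weave reset

# On every remaining node: stop trying to reconnect to the departed peer.
weave forget $DEPARTED_PEER
```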

awh added a commit that referenced this issue Feb 8, 2016
Make error messages on ring merge clearer
@anandkumarpatel
Author

I see. Sadly, calling reset on a node going out of service is not an option, because the node might simply disappear without time to run the command (we are running on EC2 spot instances).

@rade
Member

rade commented Feb 8, 2016

I see. In that case, option 4 remains your best course of action.
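
That is, for a peer that disappears without warning, a sketch of option 4 (the peer name is a placeholder; detecting the termination is left to your own tooling):

```sh
# On every remaining host: stop trying to reconnect to the dead peer.
weave forget $DEAD_PEER

# On exactly ONE remaining host: reclaim the dead peer's address space.
weave rmpeer $DEAD_PEER
```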
