Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Indefinite reconnects occur if two peers are launched referencing each other #478

Closed
awh opened this issue Mar 24, 2015 · 13 comments · Fixed by #555
Closed

Indefinite reconnects occur if two peers are launched referencing each other #478

awh opened this issue Mar 24, 2015 · 13 comments · Fixed by #555
Assignees
Labels
Milestone

Comments

@awh
Copy link
Contributor

awh commented Mar 24, 2015

See @nordbergm's comment in issue #341 for original report. If two peers are launched simultaneously referencing each other by IP, they will establish connectivity (confirmed with weave status on each peer) however one of them will also be trying to reconnect to the other periodically, an attempt which fails as there is already a connection between the peers.

@awh awh added the bug label Mar 24, 2015
@awh
Copy link
Contributor Author

awh commented Mar 24, 2015

I suspect this is due to the to the fact that when we construct the ourConnectedTargets map in ConnectionMaker.checkStateAndAttemptConnections we are populating it with a conn.RemoteTCPAddr() that contains the ephermeral port of a connection that was established inbound to us, and therefore the check later on where we filter out targets fails as we're looking for the IP paired with the normalised standard port. Consequently it's only later on when we actually talk to the peer and get its name do we realise it's one we're already connected to...

@awh awh self-assigned this Mar 24, 2015
@fermayo
Copy link

fermayo commented Mar 24, 2015

+1
We run into this at Tutum as each node tries to connect to each other node. They end up in loops attempting connections to already existing peers.

@awh
Copy link
Contributor Author

awh commented Mar 30, 2015

Hi @fermayo - it turns out this is not a bug, rather a consequence of behaviour that is required to make weave work properly in the presence of network address translation. In essence, the weave router cannot assume that if it has an inbound connection from a particular IP from a particular peer, that connecting back to that IP leads back to the same peer. Furthermore, even when it does turn out to be the same peer we cannot cache that information and prevent further reconnects because in the general case we need to be able to support changing NAT mappings and mobile hosts.

One option would be to add a command line flag that instructs weave that NAT isn't being used between any peers in a weave network (--all-peer-ips-unique or somesuch), in which case it can safely assume that it doesn't ever need to connect back to an IP once it sees an inbound connection from it. Would this work for you?

@awh awh removed the bug label Mar 30, 2015
@fermayo
Copy link

fermayo commented Mar 30, 2015

Ok, I understand. So maybe we could hide the logs (in the default logging level) after the first connection attempt fails with Already have connection to ..., even if weave is trying to connect to it in the background? In most cases, it will never reach a different peer using that IP and makes troubleshooting any other problem very difficult.
I would avoid using the flag you suggest so we cover more connectivity scenarios using the default approach of reconnecting indefinitely.

@rade
Copy link
Member

rade commented Mar 30, 2015

I don't think ordinary NAT is an issue here. As of #451, weave generally won't attempt to connect to the ephemeral ports you'd typically see as the sources of NAT'ed connections. We only connect to the standard weave port of such addresses. The odds of finding different peers there over time are small. Same goes for non-NAT'ed connections.

Mobile peers, i.e. peers that keep changing IPs, possibly cycling through a common IP space, is a different matter. I'd be surprised if that was a common weave deployment scenario though.

@bboreham
Copy link
Contributor

A public cloud (e.g. EC2) where IP addresses are recycled by machines coming and going is another scenario where we can't rely on finding the same peer at a given IP address over time.

@awh awh removed their assignment Apr 8, 2015
@rade
Copy link
Member

rade commented Apr 9, 2015

The "finding a different peer at the same address over time" issue is a red herring. A weave peer attempts to connect to:

  1. Any IP:port specified when that peer was weave launched or weave connected
  2. Any IP:port of the remote end of an outbound connection another peer has
  3. Any IP:standard-port of the remote end of an inbound connection that another peer has

...in all cases only if it does not already have a connection to that IP:port.

The proposed change is to introduce a flag that controls whether the port is relevant in that test. This does not affect weave's ability to discover a different peer at the same IP address over time.

@rade
Copy link
Member

rade commented Apr 9, 2015

@fermayo Are you aware of any configuration in your current/planned deployments where a single weave peer would be required to make multiple outbound connections to same IP but different ports?

@rade
Copy link
Member

rade commented Apr 9, 2015

Incidentally, resolving #412 would probably cut down on log noise a fair bit.

@fermayo
Copy link

fermayo commented Apr 10, 2015

@rade no, we don't have any plans to do that

@rade rade added the feature label Apr 12, 2015
@rade rade self-assigned this Apr 12, 2015
@rade
Copy link
Member

rade commented Apr 12, 2015

I wonder whether a flag is really the right approach here. Really the property is associated with the individual addresses supplied on the command line. e.g. one wants to be able to say "connect to this address", or "connect to this address only if I don't already have a connection to the same IP".

How about we base that decision on whether a port has been supplied? i.e. supplying an address w/o a port means "connect to this address on the standard port, but only if there isn't already a connection to that IP".

@rade
Copy link
Member

rade commented Apr 12, 2015

Can we think of common scenarios where a user would want their weave to "connect to this address on this non-standard port, but only if there isn't already a connection to that IP"? Bear in mind that the standard port is configurable these days.

@rade
Copy link
Member

rade commented Apr 12, 2015

Bear in mind that the standard port is configurable these days.

...though not properly, and in a documented way, until #551 gets merged.

rade added a commit to rade/weave that referenced this issue Apr 14, 2015
If a peer was specified on the command line w/o a port, then do not
connect to it if we have inbound connections from that IP.

This eliminates unnecessary (and ultimately failing0 connection
attempts in the common scenario of several peers having been told
about each other on their command lines.

Closes weaveworks#478.
@rade rade added this to the 0.10.0 milestone Apr 14, 2015
@rade rade closed this as completed in #555 Apr 15, 2015
rade added a commit that referenced this issue Apr 15, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants