Trying to understand weave's connection behaviour #502
Comments
When I run weave I don't see the peers come in on the Docker bridge address. Maybe you are running Docker in some nonstandard configuration?
Consider an existing weave network of A and B. Now a new peer C comes along. Due to firewall restrictions, it can connect to A but not B. But B can connect to C. With the above strategy, B will learn about the inbound connection on A from C (through gossip from A) and establish a connection to C.
Not as far as I know. Weave is straight from master, with one host being my laptop host OS (Fedora), and the two others being VMs (one Fedora, one Ubuntu). All on docker 1.5, connecting over a Linux bridge.
Correction: I have docker 1.4.1 on the laptop host; the VMs have 1.5. And the laptop host is the only one that shows the docker bridge IPs on incoming connections. I haven't established whether these things are connected.
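To make the A/B/C scenario concrete, here is a small Python sketch of gossip-driven connection attempts. It is only a toy model: the function names, the `blocked` set, and the assumption that gossip reaches every peer over existing links are illustrative, not taken from weave's code.

```python
def reachable(start, connections):
    """Peers that `start` can hear about via gossip over existing links."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for link in connections:
            if node in link:
                (other,) = link - {node}
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen


def settle(peers, connections, blocked):
    """Let every peer dial each peer it has heard of until nothing changes.

    connections: set of frozenset({x, y}) for established links.
    blocked: set of (src, dst) pairs where a firewall stops src dialling dst.
    """
    connections = set(connections)
    changed = True
    while changed:
        changed = False
        for src in peers:
            for dst in reachable(src, connections) - {src}:
                link = frozenset((src, dst))
                if link not in connections and (src, dst) not in blocked:
                    connections.add(link)
                    changed = True
    return connections


# A and B are already connected; C has just dialled A successfully.
start = {frozenset("AB"), frozenset("AC")}
# C cannot dial B, but B can dial C.
result = settle("ABC", start, blocked={("C", "B")})
print(frozenset("BC") in result)
```

In this model the B-C link only appears because B hears about C through A and dials it; if both directions are blocked, no B-C link forms.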
Ok, I now understand how the source addresses of incoming connections on my host get rewritten to the IP address of the docker bridge. It's due to the interaction of docker's iptables rules with the libvirt iptables rules that do NATting for the VMs. I have my libvirt VMs configured to use a NATted bridge on 192.168.122.0/24, which involves these iptables rules:
Plus I have the following rules introduced by docker for the weave container:
When I do
The rules on the DOCKER chain then rewrite the destination address to 172.17.0.35. Then the POSTROUTING rules run, and now match the packet. From iptables-extensions(8): "Masquerading is equivalent to specifying a [SNAT] mapping to the IP address of the interface the packet is going out", which in this case is docker0. So the source address is rewritten to 172.17.42.1. And so the packet on the docker0 bridge becomes:
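As a rough model of the two rewrites just described, the following Python sketch applies a DNAT step and then a masquerade step to a packet. The VM address 192.168.122.10, the source port 40000, and the destination 192.168.122.1:6783 are assumptions made for illustration; the container and bridge addresses come from the text. The sketch masquerades unconditionally and ignores any source-port remapping, so it is not a faithful iptables model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


WEAVE_PORT = 6783             # weave's standard port (assumed published port)
CONTAINER_IP = "172.17.0.35"  # weave container address, from the text
DOCKER0_IP = "172.17.42.1"    # docker bridge address, from the text


def dnat(pkt):
    """DOCKER chain: send traffic for the published port to the container."""
    if pkt.dport == WEAVE_PORT:
        return replace(pkt, dst=CONTAINER_IP)
    return pkt


def masquerade(pkt, out_iface_ip):
    """POSTROUTING: rewrite the source to the outgoing interface's address."""
    return replace(pkt, src=out_iface_ip)


# Hypothetical packet from a VM on the libvirt NAT bridge to the host.
incoming = Packet("192.168.122.10", 40000, "192.168.122.1", WEAVE_PORT)
on_bridge = masquerade(dnat(incoming), DOCKER0_IP)
print(on_bridge)
```

The point is only that after both steps the weave container sees the docker bridge as the source, so the VM's real address is lost.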
@dpw what, if anything, is left to do here?
I second this sentiment - I'll have a stab at producing such an account.
So, the reason I had left this open was as a reminder to decide whether the crazy address rewriting that caused my confusion is a libvirt issue or a docker issue, and to report it, possibly with a fix. But that's a fairly low priority right now. So @awh can take it, otherwise I would have suggested iceboxing it.
#448 says:
Consider the case where weave is launched on three hosts, A, B, and C, with B and C told to connect to A. For instance, with A having IP address 192.168.122.1:
Once B and C have successfully connected to A, the topology section of the `weave status` output on any of the hosts will look something like:
In that case, `weave status` was run on B, and the peers appear in the order B, A, C.
We see that B and C are connected to 192.168.122.1, as expected. And A shows two incoming connections from 172.17.42.1, which is the IP address of the docker bridge. So the IP addresses of B and C are being disguised by docker's connection proxying. Fair enough.
Now, due to #451, B and C will try to connect to 172.17.42.1, but on weave's standard port number, rather than the port numbers reported by A. So they try to connect to themselves, and doing `docker logs weave` on B or C reveals the following messages, repeated every few minutes:
weave 2015/03/30 16:50:04.518189 ->[172.17.42.1:36367] connection shutting down due to error during handshake: Cannot connect to ourself
While this is not a terrible outcome, it does clutter the logs, and it is unclear to me what the rationale is.
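Here is how I picture the self-connection arising, as a Python sketch. It is a guess at the behaviour, not weave's implementation: `normalise`, `peer_reached`, and `attempt` are made-up names, the port 6783 is assumed to be weave's standard port, the gossiped port 51234 is arbitrary, and I am assuming every host has its own docker bridge at 172.17.42.1 that forwards to its local weave router.

```python
WEAVE_PORT = 6783              # assumed standard weave port
DOCKER_BRIDGE = "172.17.42.1"  # assumed to exist locally on every host


def normalise(addr, port=WEAVE_PORT):
    """Replace the port of a gossiped 'host:port' with the standard port."""
    host, _, _ = addr.rpartition(":")
    return f"{host}:{port}"


def peer_reached(dialler, host, public_hosts):
    """Which peer answers when `dialler` connects to `host`?"""
    if host == DOCKER_BRIDGE:
        # The bridge address is local, so the dialler reaches its own router.
        return dialler
    return public_hosts.get(host)


def attempt(dialler, gossiped_addr, public_hosts):
    """Dial the normalised form of an address learned through gossip."""
    target = normalise(gossiped_addr)
    host = target.rpartition(":")[0]
    reached = peer_reached(dialler, host, public_hosts)
    if reached is None:
        return f"{target}: unreachable"
    if reached == dialler:
        return f"{target}: Cannot connect to ourself"
    return f"{target}: connected to {reached}"


public_hosts = {"192.168.122.1": "A"}
# A gossips that it has an inbound connection from the docker bridge address.
print(attempt("B", "172.17.42.1:51234", public_hosts))
print(attempt("B", "192.168.122.1:6783", public_hosts))
```

Under these assumptions, normalising the port cannot help: the host part of the gossiped address is already wrong, so the dial lands back on the dialler.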
It seems to me that ideally, A would find out the true IP addresses of B and C, and they would connect to each other. But I suppose this would mean running the router container with `--net=host`? So we rule that possibility out.
But if we know that the weave router cannot trust the remote addresses reported for incoming connections, then what is the point of other peers trying to connect to those addresses, even with the normalized port number? I must be missing something.
At the very least, it would be good to have a full account of weave's connection behaviour and its rationale somewhere, because I am struggling to derive it from the code and github issues.