new peers cannot join when connection limit has been reached #426
Comments
As an interim improvement, any objections to raising the connection limit to 100? That's 10k connections in a fully connected network, which I should think modern network kit has no trouble handling. Beyond that... some algorithm that decides whether to "bump" existing connections for new ones based on "connectivity":
and a few more like that. The objective is to maximise routability between peers and to minimise hop count. That should do for starters; we can get more clever later, taking into account latencies, bandwidth, congestion, etc.
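As a quick check of the 10k figure in the comment above, here is a minimal sketch of the arithmetic for a fully connected network. The function name is hypothetical and nothing here comes from the weave code; it only counts connections. With a per-peer limit of 100, the largest full mesh has 101 peers, and counting every connection once per endpoint gives roughly 10k.

```python
def fully_connected_counts(peers: int) -> tuple[int, int]:
    """Return (connections per peer, distinct links) for a full mesh of `peers` nodes."""
    per_peer = peers - 1                # every peer connects to every other peer
    links = peers * (peers - 1) // 2    # each link is shared by two peers, so count it once
    return per_peer, links


for n in (11, 30, 101):
    per_peer, links = fully_connected_counts(n)
    # 2 * links is the total when each connection is counted at both endpoints
    print(f"{n} peers: {per_peer} per peer, {links} links, {2 * links} endpoints")
```

Whether "10k connections" is meant per endpoint or per link is not stated in the comment; counted per link the figure is about half that.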
Sounds like a reasonable solution. If I can get weave working well for my test scenario we could easily have a network of thousands.
+1 to allow more than 11 peers to join the network ;-)
Note that there is a command-line parameter |
I can get 29 weaves running on my laptop, and once all connections are established, the avg cpu load is ~10%. 30 runs into difficulty because the concurrent connection establishment does create a very high load, resulting in some timeouts. But that wouldn't happen when running each weave in a separate machine (with more CPU resources overall than my single laptop). Stop gap solution for #426.
Recent changes have made weave much better behaved when there are large numbers of fully-connected peers. I can run 50 fully-connected weaves on my laptop, and could no doubt go much further if each weave was on a separate machine, as is typically the case. Therefore in most deployments of >30 peers it should be possible to avoid the problems described in this issue by supplying an increased |
Just hit this issue: new nodes could not join, and it took some time to debug while we wondered whether our machines had network issues. This limit really should be well documented, especially in site/troubleshooting.md.
Please make it possible to set this |
Since I did not perform a deep dive into the code, but rather just assumed that the docs would mention how to modify such a crucial value, I honestly believed based on your edit in the first post in this thread ("We've since increased the default limit to 30.") that the only way to modify the level would be to recompile and push my own image. What happens in an auto-scaling scenario, and my once-correct connection limit value is no longer valid? Can I update my weave network in a rolling fashion with new |
Sorry this was missed from the docs. We should really set the default a bit higher - it's set very cautiously because we were interested to hear from users what sorts of sizes and topologies they had, but 20 million "pulls" later that's not happening. Yes, rolling update is fine - there will be a brief break in communications on each node as the routing and policy rules are recreated, but this should be invisible because higher-level protocols (e.g. TCP) will retransmit.
@bboreham About this issue ...
We had a cluster scale up to 33 nodes yesterday. 2 nodes could not connect to the rest of the cluster. We fixed it by setting the
Should we worry about this? 😅
No, don't worry, that comment was nearly three years ago, and the bug was fixed. Default limit will be raised to 100 by #3234.
@bboreham Perfect! Thanks. 🙌
We've since increased the default limit to 200 (previously 30, then 100).

If you have 11 peers all fully-connected, then a 12th peer is unable to make a connection. This is because there is a limit of 10 connections per peer and all of them are at that limit.
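The failure mode described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `try_join` and `build_mesh` are hypothetical helpers, peers are plain strings, and a connection is refused whenever either side is already at the limit. It is not how weave itself tracks connections.

```python
def try_join(connections: dict[str, set[str]], new_peer: str, limit: int) -> list[str]:
    """Try to connect new_peer to every existing peer; return the peers that accepted."""
    accepted = []
    connections.setdefault(new_peer, set())
    for peer in list(connections):
        if peer == new_peer:
            continue
        # Refuse the connection if either end has no free slot left
        if len(connections[peer]) >= limit or len(connections[new_peer]) >= limit:
            continue
        connections[peer].add(new_peer)
        connections[new_peer].add(peer)
        accepted.append(peer)
    return accepted


def build_mesh(n: int, limit: int) -> dict[str, set[str]]:
    """Join n peers one at a time, each connecting to all peers already present."""
    connections: dict[str, set[str]] = {}
    for i in range(n):
        try_join(connections, f"peer{i}", limit)
    return connections


mesh = build_mesh(11, limit=10)          # 11 peers, each holding 10 connections
print(try_join(mesh, "peer11", limit=10))  # no existing peer has a free slot

bigger = build_mesh(11, limit=30)
print(len(try_join(bigger, "peer11", limit=30)))  # with a higher limit all 11 accept
```

In this model the 12th peer is shut out completely at a limit of 10, because no existing peer can give up a slot, which is why the thread discusses either raising the limit or "bumping" existing connections.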
Evidence:
weave status from one of 11 connected peers:
Log from new peer:
weave status from new peer:
Logs from existing peer: