
investigate restarting of networks and traffic incurred #1396

Open
acud opened this issue May 15, 2019 · 8 comments

acud (Member) commented May 15, 2019

When we restart a swarm node, the kademlia table's connections change: nodes that we already know about and that had established connections to us are not prioritized over other nodes.

This, in turn, incurs a big traffic load, as a node suddenly has to push, for example, half of its chunks (bin 0) to all of the new peers in bin 0.

This is easily reproducible by running a persistent cluster, shutting it down, then starting the nodes again.

We should prioritize known peers when reconnecting.
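A minimal sketch of what such prioritization could look like (the `peer` type and `previouslyConnected` flag are hypothetical stand-ins, not the actual kademlia structures): peers that had an established connection before the restart are dialed before the rest.

```go
package kademlia

import "sort"

// peer is a simplified stand-in for a kademlia table entry; the
// previouslyConnected flag would come from state persisted before the restart.
type peer struct {
	addr                string
	previouslyConnected bool
}

// prioritizeKnown orders connection candidates so that peers we had an
// established connection to before the restart are dialed first.
func prioritizeKnown(candidates []peer) []peer {
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].previouslyConnected && !candidates[j].previouslyConnected
	})
	return candidates
}
```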

acud added the kademlia label May 15, 2019
nonsense (Contributor) commented:

When we start a swarm node, we should know what our previous kademlia table was (who our peers were), and peers that we were connected to should have priority over peers we merely know about.
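One possible way to make the previous table survive a restart (a sketch only; the file format and function names are assumptions, not how swarm actually persists state) is to write the connected peer addresses to disk on shutdown and read them back on startup:

```go
package kademlia

import (
	"encoding/json"
	"os"
)

// savedState holds the peer addresses we were connected to at shutdown.
// The JSON format here is an assumption for illustration only.
type savedState struct {
	ConnectedPeers []string `json:"connectedPeers"`
}

// saveConnections persists the current connections before shutdown.
func saveConnections(path string, peers []string) error {
	data, err := json.Marshal(savedState{ConnectedPeers: peers})
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// loadConnections restores the previous connections on startup; these are
// the peers that should get connection priority.
func loadConnections(path string) ([]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var s savedState
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return s.ConnectedPeers, nil
}
```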

nonsense (Contributor) commented May 15, 2019

When we look into this, we should also consider how to reduce the number of redundant peers. Sometimes we have a lot of peers in bins 0, 1, etc.; we should have only K peers (where K is 2, 3, or another small value), rather than 10 or 20 peers in the lower bins.

Peers exchange information regarding their view of the network (the hive protocol); if they also said how many peers they are connected to in a given bin, we could maybe start dropping redundant connections/peers.
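A sketch of how such a check could look, assuming a hypothetical extension of the hive messages that carries the remote peer's own per-bin connection count (none of these names exist in the codebase):

```go
package kademlia

// K is the target number of peers per bin (2, 3, or another small value).
const K = 3

// redundant decides whether a connection in a given bin could be dropped:
// both we and the remote peer report more than K peers in that bin, so
// neither side loses needed connectivity. remoteCountInBin would come from
// the hypothetical per-bin count added to the hive messages discussed above.
func redundant(ourCountInBin, remoteCountInBin int) bool {
	return ourCountInBin > K && remoteCountInBin > K
}
```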

kortatu (Contributor) commented Sep 30, 2019

> When we look into this, we should also consider how to reduce the number of redundant peers. Sometimes we have a lot of peers in bins 0, 1, etc.; we should have only K peers (where K is 2, 3, or another small value), rather than 10 or 20 peers in the lower bins.

How does this relate to #1436?
In PR #1833, we have implemented a higher number of peers for lower bins (0, 1, etc.) than for higher bins.
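For illustration, a per-bin connection budget that shrinks with bin depth could look like the sketch below; the constants and the halving rule are illustrative, not the values or formula used in PR #1833.

```go
package kademlia

// capacityForBin is a sketch of a per-bin connection budget that gives the
// lower (shallower) bins more peers than the higher (deeper) bins.
func capacityForBin(bin int) int {
	const (
		maxCapacity = 16 // bin 0
		minCapacity = 2  // deep bins
	)
	capacity := maxCapacity >> uint(bin) // halve the budget per bin
	if capacity < minCapacity {
		return minCapacity
	}
	return capacity
}
```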

nonsense (Contributor) commented:
@kortatu good question, sorry about the confusion.

Currently, by design, we should have only K peers in every bucket. This is what is mentioned in this issue.

However, as explained in #1436, this is not a great solution because:

> Make sure we have a lot of peers in bin 0, as they are responsible for retrieve requests for half of the chunks on any given download, and we should run fetch requests to all of them. For example, if we are supposed to fetch a 4GB file in a timely fashion, then 1/2 * 4GB == 2GB of chunks will be retrieved from bin 0 peers, so we need at least 10 peers in bin 0 in order to deliver an adequate user experience.

I believe the correct solution is to use more peers in lower bins and load balance across them (according to XOR distance, round-robin, or whatever); a sketch of this follows below.

I will check #1833, but based on what you say, you've implemented a better strategy than the one we currently have and have discussed here.
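To make the "load balance across them" idea concrete, here is a minimal round-robin sketch for spreading retrieve requests over the peers of one bin (hypothetical type and names, not the actual retrieval code):

```go
package kademlia

import "sync/atomic"

// binBalancer spreads retrieve requests round-robin over the peers of a
// single bin, so that a big download does not go through one peer only.
type binBalancer struct {
	next  uint64
	peers []string // assumed non-empty; connected peers in the bin
}

// nextPeer returns the peer that should serve the next retrieve request.
func (b *binBalancer) nextPeer() string {
	i := atomic.AddUint64(&b.next, 1)
	return b.peers[i%uint64(len(b.peers))]
}
```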

nonsense (Contributor) commented:
@kortatu then again, ideally we should have functionality to drop redundant peers and some way of determining which peer connection is redundant.

kortatu (Contributor) commented Sep 30, 2019

I think one of the things we are not considering in suggest peer is the address space covered by each peer.
For example, say your node address is 00000000; you will try to fill bin 0 with addresses 1xxxxxxx. But we are not considering which ones, only that they go in bin 0.
If we, for example, take 10000000 and 10000001, that is totally fine for our suggest-peer algorithm, but they cover less address space, and we are also forced to do some load balancing between them, because almost all addresses in 1x will be at the same distance from those peers.
But if we instead take the address 11000000, the load balancing happens naturally (half the addresses in bin 0 will be closer to peer 1, 10000000, and half to peer 2, 11000000), and we will also reach the destination faster (as we can skip one hop by going directly to 11xxxxxx when we need to).

This problem is more severe in bin 0, but it can happen in every bin. That's why I think that, even though it is a good thing to have load balancing when several peers are the same distance away from the destination, a better selection of peers will improve performance and distribution.
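A sketch of how peer selection could take address-space coverage into account (assumed helper names, not the actual suggestPeer code): among the candidates for a bin, prefer the one whose longest shared bit prefix with any already-connected peer in that bin is shortest.

```go
package kademlia

import "math/bits"

// proximityOrder returns how many leading bits two equal-length addresses share.
func proximityOrder(a, b []byte) int {
	for i := range a {
		if d := a[i] ^ b[i]; d != 0 {
			return i*8 + bits.LeadingZeros8(d)
		}
	}
	return len(a) * 8
}

// bestCoverageCandidate picks, among the suggested candidates for a bin, the
// address whose longest common prefix with any peer already in the bin is
// shortest, i.e. the candidate that best splits the remaining address space.
// In the example above, with 10000000 already connected, it prefers 11000000
// (shared prefix of 1 bit) over 10000001 (shared prefix of 7 bits).
func bestCoverageCandidate(existing, candidates [][]byte) []byte {
	var best []byte
	bestScore := int(^uint(0) >> 1) // max int
	for _, c := range candidates {
		worst := 0
		for _, e := range existing {
			if po := proximityOrder(c, e); po > worst {
				worst = po
			}
		}
		if worst < bestScore {
			bestScore = worst
			best = c
		}
	}
	return best
}
```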

nonsense (Contributor) commented:
@kortatu you are right about that. There is no guarantee that the address space will be split evenly between the peers in a given bin; it can be skewed a lot.

kortatu (Contributor) commented Sep 30, 2019

And in the case where we have the three peers 10000000, 10000001, and 11000000, one of the first two will be redundant, as you said.

Another consequence of the address-space view is that, by looking at the stats of connections per peer, we can decide to look for a new peer near the most overloaded ones. So if, in the previous example, I have many more connections to 11000000, we should consider looking for a peer near that one, ideally 11100000, as it will cover half the space previously covered by 11000000 (so if we find an overloaded peer at a proximity order of N, we should add a peer with a proximity order of N+1, N+2, ... relative to it).
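As a rough sketch of that idea (hypothetical names; proximityOrder is the same bit-prefix helper as in the earlier sketch, repeated here so the snippet stands on its own): given an overloaded peer at proximity order N from us, look for candidates that are still in the same bin relative to us but closer to the overloaded peer, so they take over part of its address space.

```go
package kademlia

import "math/bits"

// proximityOrder returns how many leading bits two equal-length addresses share.
func proximityOrder(a, b []byte) int {
	for i := range a {
		if d := a[i] ^ b[i]; d != 0 {
			return i*8 + bits.LeadingZeros8(d)
		}
	}
	return len(a) * 8
}

// splitCandidates returns the known-but-unconnected candidates that would
// relieve an overloaded peer: they sit in the same bin as the overloaded peer
// relative to us (proximity order N to self), but are closer to the overloaded
// peer itself (proximity order > N to it). With self 00000000 and overloaded
// 11000000, the candidate 11100000 qualifies, as in the example above.
func splitCandidates(self, overloaded []byte, candidates [][]byte) [][]byte {
	n := proximityOrder(self, overloaded)
	var relief [][]byte
	for _, c := range candidates {
		if proximityOrder(self, c) == n && proximityOrder(overloaded, c) > n {
			relief = append(relief, c)
		}
	}
	return relief
}
```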
