reduce number of connections #4695

Closed

istae opened this issue May 27, 2024 · 8 comments

istae (Member) commented May 27, 2024

Currently on mainnet, the average number of connections is around 200 per node. This is mainly due to the high storage radius and the high number of nodes relative to the radius, so the max connections per bin should be adjusted.

We can achieve this by reducing the oversaturation peers count from 20 to 16 in kademlia.

bee-runner bot added the issue label May 27, 2024
janos (Member) commented May 29, 2024

May I ask what the reason is to reduce the number of connections? The number of connections influences the network topology, with the consequence of changing the number of hops required for a chunk to reach the desired node. It would be good to measure the benefit with respect to download and upload performance (speed and resource consumption) before settling on the saturation peers count. I would even say that a dynamic saturation peers count, based on network conditions, is a good thing to have, but that would be a somewhat larger feature to add.

istae (Member, Author) commented May 29, 2024

That reminds me of an old branch from 3 years ago :) #2530
The simple reason is to reduce or limit the connection count, hopefully without hurting performance.

I believe we can achieve better performance by having a higher peer count for shallower bins and fewer peers for deeper bins. This may seem counterintuitive, but on average half of the chunk requests will go to bin 0, because relative to your address, half of the network falls in your bin 0.

So instead of having a constant 16 suffix addresses (4 bits) to balance the bins, imagine that we have 64 balanced addresses for bin 0, then 32 for bin 1, 16 for bin 2, and 8 for the rest of the bins. Request hops could be drastically reduced, and the total connection count would still sum to less than 200.
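A minimal numerical sketch of such a schedule; the storage radius value and the floor of 8 peers per bin are illustrative assumptions, not values taken from mainnet:

```go
package main

import "fmt"

// quota returns a hypothetical per-bin connection budget following the
// proposal above: 64 balanced peers in bin 0, halving with depth, with a
// floor of 8 peers per bin.
func quota(bin int) int {
	q := 64 >> bin // 64, 32, 16, 8, ...
	if q < 8 {
		q = 8
	}
	return q
}

func main() {
	const storageRadius = 10 // illustrative radius, not a mainnet figure
	total := 0
	for bin := 0; bin < storageRadius; bin++ {
		fmt.Printf("bin %d: %d peers\n", bin, quota(bin))
		total += quota(bin)
	}
	fmt.Println("total outside the neighbourhood:", total) // 168 for radius 10
}
```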

janos (Member) commented May 29, 2024

That is a very interesting approach. I am not sure why #2530 was abandoned, as it has potential. I believe the research team should validate it before we make changes to the topology. There is also a possibility that reducing the number of peers in bins results in the same or even higher syncing activity, as the number of hops is increased. But any assumptions should be validated by measuring changes in syncing and retrieval time and system resource consumption.

ldeffenb (Collaborator) commented:

May I ask what is the reason to reduce the number of connections?

I believe the origin of this request is due to some home routers becoming completely saturated and leaving the local network unusable when multiple bee nodes are run.

janos (Member) commented May 29, 2024

I believe the origin of this request is due to some home routers becoming completely saturated and leaving the local network unusable when multiple bee nodes are run.

A configurable maximal number of connected peers would be good to have for such situations, so users can fine-tune it based on resources. Kademlia already has SaturationPeers in its Options struct; it is just not configurable via CLI flags or configuration files.

I still believe that reducing the number of peers is not the best solution; the issue should rather be addressed in the syncing protocols.
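A rough sketch of what exposing that option could look like; the flag name and all the plumbing are hypothetical, only the SaturationPeers field of kademlia.Options is taken from the comment above (the Options type below is a stand-in, not bee's actual struct):

```go
package main

import (
	"flag"
	"fmt"
)

// Options is a stand-in for kademlia.Options; in bee the real struct already
// has a SaturationPeers field. Everything else here (flag name, default) is
// hypothetical.
type Options struct {
	SaturationPeers int
}

func main() {
	saturation := flag.Int("kademlia-saturation-peers", 20, "target peers per kademlia bin (hypothetical flag)")
	flag.Parse()

	opts := Options{SaturationPeers: *saturation}
	fmt.Printf("kademlia would be constructed with SaturationPeers=%d\n", opts.SaturationPeers)
}
```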

zelig (Member) commented May 30, 2024

This should definitely be a SWIP first.

Actually, the peer count in the PO bins ought to be chosen to reflect 1) the connectivity restrictions of a node and 2) the throughput requirements.

  1. should be calibrated or bounded by a configurable constant non-negative integer (yes, in fact setting it to 0 should create a non-connected local client).
  2. should be informed by the distribution of chunk push and retrieve requests.

(2) works the following way.
Let's consider a random sample of N swarm chunks. When you take the chunks' PO with respect to a particular address, the POs follow a reverse exponential scale. In particular, PO bin i will hold 2^{-(i+1)}*N chunks. Therefore, when a node sends requests for these N chunks, the number of requests that should be routed to peers in PO bin i is 2^{-(i+1)}*N.
The chunks falling into a bin are uniformly distributed, so as long as the peers in the bin are balanced, each gets the same amount of requests.
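A quick numerical check of that distribution; N and the depth are arbitrary illustration values:

```go
package main

import "fmt"

func main() {
	const n = 1 << 20 // sample of ~1M random chunks
	const depth = 10  // look at PO bins 0..depth-1; the rest falls within the neighbourhood

	within := float64(n)
	for bin := 0; bin < depth; bin++ {
		inBin := float64(n) / float64(uint(1)<<(bin+1)) // 2^{-(i+1)} * N
		within -= inBin
		fmt.Printf("PO bin %d: ~%.0f chunks\n", bin, inBin)
	}
	fmt.Printf("deeper than bin %d (own neighbourhood): ~%.0f chunks\n", depth-1, within)
}
```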

Our ultimate goal is to handle requests most efficiently, i.e., to potentially max out the throughput of each peer in the event of a lot of requests.
With the naive assumption that the throughput of each peer connection is constant (or at least follows a distribution independent of the peer), the best strategy to max out throughput is to have a uniform distribution of requests over peers.

You need to connect to each node within the neighbourhood designated by the storage depth D, i.e., circa S*2^{-D} peers (where S is the network size).
If the maximum number of peers is M, you get R = M - S*2^{-D} connections to allocate to the first D PO bins.
So if you operate a swarm node serving local API calls and do not forward traffic, then to guarantee a uniform distribution of requests you need 2^{-(i+1)}*R peers in PO bin i.
The same applies if the node is serving requests to light clients over a single connection.

Interestingly, if you are a node operator and not using the API, i.e., only do forwarding for full nodes with a saturated kademlia table, then the distribution of requests is constant across bins, so you need to put R/(D-1) peers in each of the bins 1, ..., D-1.
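A small numerical sketch of the two allocations described above; S, D and M are made-up illustration values:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		S = 20000.0 // network size (illustrative)
		D = 10      // storage depth (illustrative)
		M = 150.0   // configured maximum number of connections (illustrative)
	)

	neighbourhood := S * math.Pow(2, -D) // peers you must hold within depth D
	R := M - neighbourhood               // budget left for PO bins 0..D-1

	fmt.Printf("neighbourhood peers: ~%.0f, remaining budget R: ~%.0f\n", neighbourhood, R)

	// API / light-client traffic: allocate proportionally to 2^{-(i+1)}.
	for bin := 0; bin < D; bin++ {
		fmt.Printf("bin %2d: %.1f peers (exponential decay)\n", bin, R*math.Pow(2, -float64(bin+1)))
	}

	// Pure forwarding traffic: requests are spread evenly, R/(D-1) peers in bins 1..D-1.
	fmt.Printf("constant allocation: %.1f peers in each of bins 1..%d\n", R/float64(D-1), D-1)
}
```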

So,

  • using the exponential decay formula, peers in shallower bins will idle when the node is carrying forwarding traffic.
  • using the constant formula, peers in deeper bins will idle when the node is carrying API or light client traffic.
  • opportunistic caching probably tilts the whole thing towards exponential decay.

now go figure

zelig (Member) commented May 30, 2024

The ultimate solution, though, should probably be driven directly by throughput: if some peers max out, then we open a connection to a new peer that is a PO-deepest sister of the maxed-out node.
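A hypothetical sketch of that idea; none of these names or thresholds exist in bee, and picking the candidate relative to our own PO table is a simplification of "PO-deepest sister":

```go
package main

import "fmt"

// peer is a stand-in for a connected or known-but-unconnected peer.
type peer struct {
	addr string
	po   int     // proximity order relative to our own overlay address
	load float64 // observed utilisation of the connection, 0..1
}

// deepestCandidate picks the unconnected candidate with the highest PO,
// i.e. the deepest sister we could open an extra connection to.
func deepestCandidate(candidates []peer) (peer, bool) {
	best, found := peer{po: -1}, false
	for _, c := range candidates {
		if c.po > best.po {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	connected := []peer{{"a", 0, 0.98}, {"b", 1, 0.40}}
	known := []peer{{"c", 0, 0}, {"d", 2, 0}}

	for _, p := range connected {
		if p.load < 0.95 {
			continue // peer still has headroom
		}
		if cand, ok := deepestCandidate(known); ok {
			fmt.Printf("peer %s is maxed out, dialling %s (PO %d)\n", p.addr, cand.addr, cand.po)
		}
	}
}
```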

istae (Member, Author) commented Aug 21, 2024

Actions taken so far:

  1. The oversaturation count per bin has been reduced from 20 to 18: fix(kademlia): tweak saturation count #4760
  2. The prune bin function was using the wrong counting method, so 1-2 peers were frequently being pruned unnecessarily: fix(prune): prune func does not count peers correctly #4759
  3. Stopped pullsyncing with peers below a certain bin limit: syncing with peers below a certain threshold #4762
