[ipfs/go-bitswap] Reduce Broadcasting #81
Comments
I agree with the general idea 👍 We don't need to broadcast so many CIDs. Two considerations:
So, my point here is that including more than one CID doesn't actually help us in the end. See:
Basically, sure, adding additional CIDs may get us those blocks. But it won't get us any closer to finding the CIDs that are already in our broadcast set.
Yes, at that point we've already moved on. I'm proposing announcing the last block requested to find the "most specific" peers. "General" peers (e.g., ones that have only the root) may still be able to help "share the load", so to speak, but I'm not sure it's worth broadcasting to find those peers.
Ok makes sense 👍
I think we may want to broadcast want-haves for several CIDs at once. First we broadcast a request for cid1, and get back Peer A:
Peer B, C & D connect to us.
We broadcast a want-have for cid6 and get a HAVE from Peer B. Instead of broadcasting one want-have at a time, it would be more efficient to broadcast all of them together.
It's a performance trade-off, but I don't think this scenario is all that common. The basic assumption of sessions is that most blocks being requested through the session are correlated in some way. If different peers have individual blocks, the session is pretty useless. But I guess the best solution is to adapt:
NOTE: we'd only increase the number of CIDs if the broadcast was actually successful, that is, if it found a new peer, not previously in the session, that had the block.
@Stebalien how about using feedback from previous attempts to estimate, per peer, the probability that a "broadcast" request succeeds, and using that to decide which nodes receive one? For each peer we compute number_of_supplies / number_of_requests, so a peer that successfully answers 3% of requests scores 0.03. We then feed this ratio into an exponential moving average with an alpha of 0.005. If we send no requests to a peer, its value is simply frozen. This keeps the value from decaying through a stream of zeros and also saves us the calculation. To make sure new connections get some chance of receiving requests, we seed the moving average with a value of 1. We then use the result as the probability of sending a given CID broadcast to a node we know (perhaps one we are not even connected to). We might want to redial nodes that were useful in the past to check whether they have useful data for us now. Nodes that have not supplied any useful data recently are slowly phased out, and we focus on the nodes that do give us useful responses.
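A rough Go sketch of the scoring scheme in the comment above, to make the arithmetic concrete. The names `peerScore`, `observe`, and `shouldBroadcast` are hypothetical and are not part of go-bitswap. The seed value of 1 and the alpha of 0.005 come from the comment. Everything else, including how feedback is batched into rounds, is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// alpha is the smoothing factor suggested in the comment.
const alpha = 0.005

// peerScore is a hypothetical per-peer record; it is not a go-bitswap type.
type peerScore struct {
	ema float64 // smoothed supplies/requests ratio, used as a probability
}

// newPeerScore seeds the average with 1 so new peers get asked at first.
func newPeerScore() *peerScore {
	return &peerScore{ema: 1.0}
}

// observe folds one round of feedback into the moving average.
// With zero requests the score stays frozen, as the comment proposes.
func (p *peerScore) observe(supplies, requests int) {
	if requests == 0 {
		return
	}
	ratio := float64(supplies) / float64(requests)
	p.ema = alpha*ratio + (1-alpha)*p.ema
}

// shouldBroadcast reports whether to include this peer, given a uniform
// draw in [0, 1). Passing the draw in keeps the decision easy to test.
func (p *peerScore) shouldBroadcast(draw float64) bool {
	return draw < p.ema
}

func main() {
	s := newPeerScore()
	// Assumed scenario: the peer is asked 10 times per round and never supplies.
	for round := 0; round < 1000; round++ {
		s.observe(0, 10)
	}
	fmt.Printf("score after 1000 unhelpful rounds: %.4f\n", s.ema)
	fmt.Println("broadcast to this peer now:", s.shouldBroadcast(rand.Float64()))
}
```

With an alpha this small the score decays slowly, so a peer has to be unhelpful for many rounds before it is rarely asked. Whether that is the right time scale depends on how often feedback is recorded.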
Currently, when we can't find some content, we add it to the "broadcast" list and ask every connected peer. Unfortunately, this means that bitswap nodes under heavy load will:
Furthermore, these massive broadcast lists end up creating a lot of unnecessary traffic.
Insight: Sessions tend to be highly correlated (by design) so asking for more than one CID in a session is unlikely to give us any extra information. At the very least, broadcasting a request for a new CID B won't get us any closer to finding some previously broadcast CID A.
Given this insight, we can trim down the broadcast list to one want per session, which is much more manageable. It also doesn't really matter which want, as long as it's relevant.
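As an illustration of trimming the broadcast list to one want per session, here is a minimal Go sketch. The `session` type and `broadcastSet` function are hypothetical stand-ins, not go-bitswap APIs. Picking the most recently requested CID is just one possible choice, since any relevant want would do.

```go
package main

import (
	"fmt"
	"sort"
)

// session is a hypothetical stand-in for a bitswap session: an ID plus the
// CIDs it is still waiting for, oldest request first.
type session struct {
	id      string
	pending []string
}

// broadcastSet returns at most one want per session (here, the most recently
// requested CID), deduplicated across sessions and sorted for stable output.
func broadcastSet(sessions []session) []string {
	seen := make(map[string]struct{})
	var wants []string
	for _, s := range sessions {
		if len(s.pending) == 0 {
			continue // nothing outstanding, so nothing to broadcast
		}
		c := s.pending[len(s.pending)-1]
		if _, dup := seen[c]; dup {
			continue
		}
		seen[c] = struct{}{}
		wants = append(wants, c)
	}
	sort.Strings(wants)
	return wants
}

func main() {
	sessions := []session{
		{id: "a", pending: []string{"cid1", "cid2", "cid3"}},
		{id: "b", pending: nil},
		{id: "c", pending: []string{"cid9", "cid4"}},
	}
	// The broadcast list grows with the number of sessions, not with the
	// number of outstanding wants.
	fmt.Println(broadcastSet(sessions))
}
```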
Proposal: