Timeout Management #244
Comments
(discussed a bit in person) To account for busy peers, we should consider only timing out peers if they haven't sent us anything (want have, block, don't have) within some period.

(not discussed in person) Actually, we could even do this in all cases for all peers with outstanding wants. This might be significantly simpler than tracking per-want timeouts. Thoughts? I don't want to redirect this when you've already started, only if you think this would be simpler.

I'm even wondering if it makes sense to remove old peers from the session the same way (when idle) (we may have discussed this?). My thinking is that, if an old peer isn't responding to requests, it might not have anything we're looking for. If we run out of peers to ask, we'll just broadcast (and we'll broadcast occasionally anyways).
I think we still need to track per-want timeouts for the case where we send a want-block to a peer running an old version of Bitswap, because old Bitswap simply doesn't respond to want-block if it doesn't have the block (this is the case covered by #248)
If the peers have a part of the data that we haven't got to requesting yet, and they are dropped from the session, then we have to wait for a timeout + broadcast in order to add them back into the session.
This would unfairly disadvantage old peers who have some of the blocks. On the other hand, old peers that have some of the blocks are more likely to slow us down if we have other peers with most/all of the blocks so there is a trade-off here.
Note: we'll have to wait for a timeout either way. If the peer turns out to be bad and we keep randomly selecting them, we'll have to wait for a timeout in that case as well before we ask other peers in the session.
Ah yeah that's true. I suggest we consider each kind of timeout separately:
One mitigating factor is that Bitswap will favour peers with lower latency, which are likely to be new peers (as the simulated DONT_HAVE for old peers occurs after latency + padding).
I have two concerns with that:
I'm not sure if I was clear above: I'm suggesting that we implement all 3 of these timeouts. Note that the Session timeout (3) is already there; it was in the old Bitswap and was ported over to the new Bitswap.
1. DONT_HAVE timeout (peers running old Bitswap)
You're right, it's tricky to measure accurately because the peer may be busy sending blocks to other peers. In practice though it may not be such a problem:
So the DONT_HAVE timeout becomes more like a hint: wait for a peer for x amount of time, and if it doesn't respond, ask someone else at the risk of getting some duplicates.
2. Peer timeout
👍
My concern is that:
You're right in that it might not lead to duplicate blocks but it might lead to a lot of extra work (i.e., sending lots of wants to peers that we immediately cancel, and forcing peers to deal with these). Also remember: "freezing". IIRC, we turned that off in the new system but canceling wants to old peers will slow us to a crawl (maybe not something we really care about?). However, this is something that we should be able to test.
Ah, sorry, I misread this. I thought this only applied to new peers. Applying this to all peers resolves the rest of my concerns.
Sync agreement: go with the simplest solution for now. We won't know the block latency anyways when we first start so this isn't wasted work. Later, we can experiment with tracking per-block latencies.
Closing in favour of ipfs/boxo#125
When a Bitswap Session fetches a block it sends want-haves to all the peers in the session, and a single "optimistic" want-block to one of the peers. If the peer that receives the want-block responds with DONT_HAVE, the Session checks the other responses and sends want-block to any peer that responded with HAVE.
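The fallback step described above can be sketched as a small selection function. This is a simplified model rather than the Session's actual code: `PeerID`, `Response`, and `nextWantBlockPeer` are hypothetical names, and picking the first peer in session order that answered HAVE is an assumption of the sketch.

```go
package main

// PeerID is a hypothetical stand-in for a real peer identifier.
type PeerID string

// Response is what a peer has told us so far about one block.
type Response int

const (
	NoResponse Response = iota // the zero value: nothing received yet
	Have
	DontHave
)

// nextWantBlockPeer decides who should receive the next want-block after
// the optimistic want-block was sent to `optimistic`.
// It returns false when the optimistic peer has not answered DONT_HAVE,
// or when no other peer has answered HAVE yet.
func nextWantBlockPeer(order []PeerID, responses map[PeerID]Response, optimistic PeerID) (PeerID, bool) {
	if responses[optimistic] != DontHave {
		return "", false
	}
	for _, p := range order {
		if p != optimistic && responses[p] == Have {
			return p, true
		}
	}
	return "", false
}
```

Note that the function returns false for as long as the optimistic peer stays silent, which is the hang that this issue sets out to fix with a timeout.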
If a peer that receives a want-block doesn't respond, the Session will hang. A peer may not respond because:
Proposed solution: