Bitswap wantlist consistency #3398
Note: It's worth thinking about connection-closing mechanics here; fixing this correctly will make connection closing much simpler.
Another idea: use something other than connections for triggering this condition. This will never stop giving if we don't manage to abstract connections away properly.
@lgierth So would my node generate an instance ID on daemon startup and send it to everyone in identify? Or would each node just pick a random number on connection, chosen by the local node (meaning you don't have to send anything)?
I think a UUID is quite a good way to solve it; I don't know whether it would be at the node level or the bitswap level.
Yep the former. A node would generate the instance id on startup and keep it as long as it runs, i.e. per-process-and-peerid, not per-connection. If a node reconnects to you, it'll do a new identify handshake and thus send its new instance id, which triggers the condition.
Yeah, a UUID would work as a generator. I can imagine a similar data-sync situation with other (future) protocols; that's why I was thinking of doing it generically in identify.
We can have some cooldown-ish interval for that, similar to flood protection.
No need to make it a UUID. Just read cryptographic randomness -- 32 bits ought to do it for honest nodes. Malicious nodes can choose the value arbitrarily, so we must protect against that. Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.)
Yeah as part of the identify handshake |
No, it should be in bitswap itself -- the point is that bitswap may not see a connection reset, because a (disconnect, then connect) may happen without one side noticing, or a peer may have two conns open. This is basically a (unidirectional) bitswap "session id".
But don't we always do an identify handshake right when opening a connection to a peer? Our local identify listener would notice the changed remote instance id and notify all other local parts that are interested. I'm pretty sure we'll run into similar data sync situations with other (future) protocols.
I'm not sure we can solve this if we keep coupling bitswap to the concept of a "connection". Ideally a stream would be independent of the underlying connection; there are mechanisms that allow spreading one stream over multiple connections (e.g. multipath TCP), or letting a stream roam between connections (e.g. homenet), and we'll just create more edge cases for ourselves.
I'm of the opinion it should be a node-global thing and not specific to bitswap. The whole idea is that this ID allows bitswap to tell that a hard disconnect/connect (node reset) has happened, so claiming that "bitswap may not see a connection reset" is missing the point.
This is mostly correct, but unfortunately the 'new connection' notification gets sent out before the identify handshake finishes, since it's done as an 'after the fact' type of thing. This makes it a little difficult; bitswap will have to use some sort of 'wait for identify to finish' mechanism.
yeah this: "bitswap would e.g. register a handler/notifier with the identify protocol that triggers when the instance id changes" :) I figure moving swarm protocols away from the new-connection notification is generally a good direction. |
So, a bit of an update here. The issue that prompted this was me trying to sync close to four terabytes of data from nihal to my home server. The process keeps getting stuck, with lots of entries in my wantlist, and with WAYYYY more entries in nihal's image of my wantlist. Something is going on where we're not getting a solid accounting of correct wantlists, or cancels aren't getting through, or something. I'm hoping to get a new version of go-ipfs on nihal that has a small bugfix that should help a little with this problem.
This issue has been resolved. The 'full' flag wasn't being sent on outgoing bitswap wantlist messages, meaning we never actually reset wantlists.
I believe we have a protocol bug in bitswap related to keeping wantlists in sync.
My current proposal for fixing this is to record the exact connection used for the last successful wantlist update and, if a subsequent send uses a different connection, send a full update.