
Bitswap wantlist consistency #3398

Closed
whyrusleeping opened this issue Nov 19, 2016 · 15 comments
Labels
kind/bug A bug in existing code (including security flaws) need/community-input Needs input from the wider community status/in-progress In progress topic/bitswap Topic bitswap

Comments

@whyrusleeping
Member

I believe we have a protocol bug in bitswap related to keeping wantlists in sync:

  • Peer A and Peer B are connected and are sending and receiving blocks from each other.
  • Peer B restarts its daemon quickly, resetting its knowledge of Peer A's wantlist.
  • Peer A receives the new connection from B before realizing the initial connection has dropped.
  • Peer A was sending a wantlist update to Peer B during the restart; the send fails, but Peer A simply opens a new stream to the peer, since there was no 'full disconnect'.
  • Peer A assumes that Peer B still has full knowledge of its wantlist and does not send a full update, resulting in a hang: Peer B will not send A any blocks that are on the missing wantlist.

My current proposal for fixing this issue is to record the exact connection used for the last successful wantlist update; if a subsequent send uses a different connection, send a full update.
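A minimal sketch of that proposal in Go; the Connection, Wantlist, and MessageQueue names here are invented for illustration, not the actual go-ipfs types:

```go
package sketch

import "sync"

// Connection and Wantlist are illustrative stand-ins for the real types.
type Connection interface{ ID() string }

type Wantlist struct{ Keys []string }

// MessageQueue tracks which connection carried the last successful update.
type MessageQueue struct {
	mu       sync.Mutex
	lastConn string   // ID of the connection used for the last successful send
	wantlist Wantlist // our complete current wantlist
}

// Send transmits a wantlist update. If the connection differs from the one
// used for the last successful send, the peer may have restarted and lost
// our state, so the full wantlist is sent instead of just the diff.
func (mq *MessageQueue) Send(conn Connection, diff Wantlist) error {
	mq.mu.Lock()
	defer mq.mu.Unlock()

	toSend, full := diff, false
	if conn.ID() != mq.lastConn {
		toSend, full = mq.wantlist, true
	}
	if err := mq.transmit(conn, toSend, full); err != nil {
		return err
	}
	mq.lastConn = conn.ID() // record only after a successful send
	return nil
}

func (mq *MessageQueue) transmit(conn Connection, wl Wantlist, full bool) error {
	// Placeholder: open a stream on conn and write the wantlist message.
	return nil
}
```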

@whyrusleeping whyrusleeping added topic/bitswap Topic bitswap kind/bug A bug in existing code (including security flaws) need/community-input Needs input from the wider community labels Nov 19, 2016
@whyrusleeping
Member Author

Note: it's worth thinking about connection closing mechanics here; fixing this correctly will make connection closing much simpler.

@ghost

ghost commented Nov 19, 2016

My current proposal for fixing this issue is to record the exact connection used for the last successful wantlist update; if a subsequent send uses a different connection, send a full update.

Another idea: use something other than connections for triggering this condition. This class of bug will keep coming back if we don't manage to abstract connections away properly.

  • We could send an "instance id" with identify.
  • If the instance id changes, that node has restarted, and other nodes should assume that shared state got lost on the other end.
  • The instance id would just be an opaque byte sequence.
  • That way we don't couple it to connect/disconnect, which is hard to detect and do right; connect/disconnect is also a concept that we won't necessarily have in the same way with packet transports.
  • The assumption that shared state got lost would probably be on a per-protocol basis: bitswap could say that e.g. provider records MUST survive a restart, while wantlists MAY survive a restart.
  • Bitswap would e.g. register a handler/notifier with the identify protocol that triggers when the instance id of the remote peer changes.

(cc whyrusleeping, Kubuxu, jbenet)
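A sketch of that handler/notifier idea, assuming a hypothetical hook invoked when an identify handshake completes; the type names are invented for illustration and are not a real libp2p API:

```go
package sketch

import "bytes"

type PeerID string

// InstanceIDTracker remembers the last instance id seen from each peer and
// fires a callback when a peer presents a different one, i.e. it restarted.
type InstanceIDTracker struct {
	seen      map[PeerID][]byte
	onRestart func(PeerID)
}

func NewInstanceIDTracker(onRestart func(PeerID)) *InstanceIDTracker {
	return &InstanceIDTracker{
		seen:      make(map[PeerID][]byte),
		onRestart: onRestart,
	}
}

// HandleIdentify would be invoked whenever an identify handshake completes.
func (t *InstanceIDTracker) HandleIdentify(p PeerID, instanceID []byte) {
	prev, ok := t.seen[p]
	t.seen[p] = append([]byte(nil), instanceID...)
	if ok && !bytes.Equal(prev, instanceID) {
		// The remote restarted: assume shared state (e.g. its copy of our
		// wantlist) was lost, and let the callback resend a full update.
		t.onRestart(p)
	}
}
```

Bitswap would register a callback that queues a full wantlist resend for the restarted peer.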

@whyrusleeping
Member Author

@lgierth So would my node generate an instance ID on daemon startup and send it to everyone in identify? Or would each node just pick a random number per connection, chosen by the local node (meaning you don't have to send anything)?

@Kubuxu
Member

Kubuxu commented Nov 19, 2016

I think a UUID is quite a good way to solve it; I don't know if it would be at the node level or the bitswap level.

@ghost

ghost commented Nov 19, 2016

So would my node generate an instance ID on daemon startup and send it to everyone in identify? Or would each node just pick a random number per connection, chosen by the local node (meaning you don't have to send anything)?

Yep, the former. A node would generate the instance id on startup and keep it as long as it runs, i.e. per-process-and-peer-id, not per-connection. If a node reconnects to you, it'll do a new identify handshake and thus send its new instance id, which triggers the condition.

I think a UUID is quite a good way to solve it; I don't know if it would be at the node level or the bitswap level.

Yeah, a UUID would work as a generator. I can imagine a similar data sync situation with other (future) protocols; that's why I was thinking of doing it generically in identify.

@ghost

ghost commented Nov 19, 2016

It can still be abused though. I can just keep rerolling to force you to spend a ton of bandwidth.

We can have some cooldown-ish interval for that, similar to flood protection.
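A minimal cooldown sketch, assuming we only honor one instance-id-triggered full resend per peer per interval; the Cooldown type and its parameters are invented for illustration:

```go
package sketch

import (
	"sync"
	"time"
)

type PeerID string

// Cooldown rate-limits how often a peer can trigger an expensive reaction,
// so a malicious peer rerolling its instance id can't drain our bandwidth.
type Cooldown struct {
	mu       sync.Mutex
	last     map[PeerID]time.Time
	interval time.Duration
}

func NewCooldown(interval time.Duration) *Cooldown {
	return &Cooldown{last: make(map[PeerID]time.Time), interval: interval}
}

// Allow reports whether enough time has passed since this peer last
// triggered the reaction, and records the attempt if so.
func (c *Cooldown) Allow(p PeerID) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.last[p]; ok && time.Since(t) < c.interval {
		return false
	}
	c.last[p] = time.Now()
	return true
}
```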

@jbenet
Member

jbenet commented Nov 19, 2016

No need to make it a UUID. Just read cryptographic randomness; 32 bits ought to do it for honest nodes. Malicious nodes can choose the value arbitrarily, so we must protect against that. Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.)
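A sketch of generating such an id with Go's standard library: 32 bits read once from the OS CSPRNG at startup.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// newInstanceID reads 4 bytes (32 bits) of cryptographic randomness. Honest
// nodes get a fresh value per process; malicious nodes can pick theirs
// arbitrarily, which is why reactions to id changes must be rate-limited.
func newInstanceID() (uint32, error) {
	var buf [4]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(buf[:]), nil
}

func main() {
	id, err := newInstanceID()
	if err != nil {
		panic(err)
	}
	fmt.Printf("instance id: %08x\n", id)
}
```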

@ghost

ghost commented Nov 19, 2016

Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.)

Yeah as part of the identify handshake

@jbenet
Member

jbenet commented Nov 19, 2016

No, it should be in bitswap itself; the point is that bitswap may not see a connection reset, because a (disconnect, then connect) may happen without one side noticing, or a peer may have two connections open.

This is basically a (unidirectional) bitswap "session id".

@ghost

ghost commented Nov 19, 2016

But don't we always do an identify handshake right when opening a connection to a peer? Our local identify listener would notice the changed remote instance id and notify all other local parts that are interested. I'm pretty sure we'll run into similar data sync situations with other (future) protocols.

Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.) [...] the point is that bitswap may not see a connection reset, because a (disconnect, then connect) may happen without one side noticing, or a peer may have two connections open.

I'm not sure we can solve this if we keep coupling bitswap to the concept of a "connection". Ideally a stream would be independent of the underlying connection; there are mechanisms that spread one stream over multiple connections (e.g. multipath TCP) or let a stream roam between connections (e.g. homenet), and we'll just create more edge cases for ourselves.

@whyrusleeping
Member Author

I'm of the opinion it should be a node-global thing and not specific to bitswap. The whole idea is that this ID allows bitswap to tell that a hard disconnect/connect (node reset) has happened, so claiming that "bitswap may not see a connection reset" is missing the point.

@whyrusleeping
Member Author

@lgierth

But don't we always do an identify handshake right when opening a connection to a peer?

This is mostly correct, but unfortunately the 'New connection' notification gets sent out before the identify handshake finishes, since it's done as an 'after the fact' type of thing. This makes it a little difficult; bitswap will have to use some sort of 'wait for identify to finish' thing.
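One possible shape for that 'wait for identify to finish' mechanism, sketched with invented names; this is purely illustrative, not the actual libp2p interface:

```go
package sketch

import "sync"

type PeerID string

// IdentifyGate lets bitswap block on a 'new connection' notification until
// the identify handshake for that peer has finished.
type IdentifyGate struct {
	mu   sync.Mutex
	done map[PeerID]chan struct{}
}

func NewIdentifyGate() *IdentifyGate {
	return &IdentifyGate{done: make(map[PeerID]chan struct{})}
}

// ch returns the peer's channel, creating it if needed. Callers hold g.mu.
func (g *IdentifyGate) ch(p PeerID) chan struct{} {
	c, ok := g.done[p]
	if !ok {
		c = make(chan struct{})
		g.done[p] = c
	}
	return c
}

// IdentifyCompleted is called by the identify handler when a handshake ends.
func (g *IdentifyGate) IdentifyCompleted(p PeerID) {
	g.mu.Lock()
	defer g.mu.Unlock()
	c := g.ch(p)
	select {
	case <-c: // already closed; closing again would panic
	default:
		close(c) // closing broadcasts to every waiter
	}
}

// Wait blocks until identify has completed for p.
func (g *IdentifyGate) Wait(p PeerID) {
	g.mu.Lock()
	c := g.ch(p)
	g.mu.Unlock()
	<-c
}
```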

@ghost

ghost commented Nov 20, 2016

bitswap will have to use some sort of 'wait for identify to finish' thing.

Yeah, this: "bitswap would e.g. register a handler/notifier with the identify protocol that triggers when the instance id of the remote peer changes" :)

I figure moving swarm protocols away from the new-connection notification is generally a good direction.

@whyrusleeping
Member Author

So, a bit of an update here.

The problem that prompted this issue was me trying to sync close to four terabytes of data from nihal to my home server. The process keeps getting stuck, with lots of entries in my wantlist, and with far more entries in nihal's image of my wantlist. Something is going on here where we're not getting a solid accounting of correct wantlists, or cancels aren't getting through, or something. I'm hoping to get a new version of go-ipfs on nihal that has a small bugfix that should help a little with this problem.

@whyrusleeping
Member Author

This issue has been resolved. The 'full' flag wasn't being set on outgoing bitswap wantlist messages, meaning we never actually reset wantlists.
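For context, a sketch of the shape of the fix; the message type here is an invented mirror of the bitswap wire format's wantlist 'full' flag, not the actual go-ipfs code:

```go
package sketch

// Entry is one wantlist line: a block key we want (or a cancel for it).
type Entry struct {
	Key    string
	Cancel bool
}

// WantlistMessage mirrors the wire format's wantlist, including the 'full'
// flag that tells the receiver to replace, not merge, its view of our list.
type WantlistMessage struct {
	Full    bool
	Entries []Entry
}

// newFullWantlist builds a complete-wantlist message. The bug was that Full
// stayed false on outgoing full updates, so receivers merged the entries
// into their stale view instead of resetting it.
func newFullWantlist(keys []string) WantlistMessage {
	msg := WantlistMessage{Full: true}
	for _, k := range keys {
		msg.Entries = append(msg.Entries, Entry{Key: k})
	}
	return msg
}
```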
