
Bitswap wantlist consistency #3398

Closed
whyrusleeping opened this issue Nov 19, 2016 · 15 comments
Labels
kind/bug A bug in existing code (including security flaws) need/community-input Needs input from the wider community status/in-progress In progress topic/bitswap Topic bitswap

Comments

@whyrusleeping
Member

I believe we have a protocol bug in bitswap related to keeping wantlists in sync:

  • Peer A and Peer B are connected and are sending and receiving blocks from each other.
  • Peer B restarts its daemon quickly, resetting its knowledge of Peer A's wantlist.
  • Peer A receives the new connection from B before realizing the initial connection has dropped.
  • Peer A was sending a wantlist update to Peer B during the restart; the send fails, but Peer A simply opens a new stream to the peer, since there was no 'full disconnect'.
  • Peer A assumes that Peer B still has full knowledge of its wantlist and does not send a full update, resulting in a hang: Peer B will not send A any blocks that are on the missing wantlist.

My current proposal for fixing this issue is to record the exact connection used for the last successful wantlist update; if a subsequent send uses a different connection, send a full update.
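A minimal sketch of that proposal in Go; the Connection, Wantlist, and MessageQueue names here are invented for illustration, not the actual go-ipfs types:

```go
package sketch

import "sync"

// Connection and Wantlist are illustrative stand-ins for the real types.
type Connection interface{ ID() string }

type Wantlist struct{ Keys []string }

// MessageQueue tracks which connection carried the last successful update.
type MessageQueue struct {
	mu       sync.Mutex
	lastConn string   // ID of the connection used for the last successful send
	wantlist Wantlist // our complete current wantlist
}

// Send transmits a wantlist update. If the connection differs from the one
// used for the last successful send, the peer may have restarted and lost
// our state, so the full wantlist is sent instead of just the diff.
func (mq *MessageQueue) Send(conn Connection, diff Wantlist) error {
	mq.mu.Lock()
	defer mq.mu.Unlock()

	toSend, full := diff, false
	if conn.ID() != mq.lastConn {
		toSend, full = mq.wantlist, true
	}
	if err := mq.transmit(conn, toSend, full); err != nil {
		return err
	}
	mq.lastConn = conn.ID() // record only after a successful send
	return nil
}

func (mq *MessageQueue) transmit(conn Connection, wl Wantlist, full bool) error {
	// Placeholder: open a stream on conn and write the wantlist message.
	return nil
}
```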

@whyrusleeping whyrusleeping added topic/bitswap Topic bitswap kind/bug A bug in existing code (including security flaws) need/community-input Needs input from the wider community labels Nov 19, 2016
@whyrusleeping
Member Author

Note: it's worth thinking about connection closing mechanics here; fixing this correctly will make connection closing much simpler.

@ghost

ghost commented Nov 19, 2016

My current proposal for fixing this issue is to record the exact connection used for the last successful wantlist update; if a subsequent send uses a different connection, send a full update.

Another idea: use something other than connections for triggering this condition. This class of bug will keep coming back if we don't manage to abstract connections away properly.

  • We could send an "instance id" with identify.
  • If the instance id changes, that node has restarted, and other nodes should assume that shared state got lost on the other end.
  • The instance id would just be an opaque byte sequence.
  • That way we don't couple it to connect/disconnect, which is hard to detect and do right; connect/disconnect is also a concept that we won't necessarily have in the same way with packet transports.
  • The assumption that shared state got lost would probably be on a per-protocol basis: bitswap could say that e.g. provider records MUST survive a restart, while wantlists MAY survive a restart.
  • Bitswap would e.g. register a handler/notifier with the identify protocol that triggers when the instance id of the remote peer changes.

(cc whyrusleeping, Kubuxu, jbenet)
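A sketch of that handler/notifier idea, assuming a hypothetical hook invoked when an identify handshake completes; the type names are invented for illustration and are not a real libp2p API:

```go
package sketch

import "bytes"

type PeerID string

// InstanceIDTracker remembers the last instance id seen from each peer and
// fires a callback when a peer presents a different one, i.e. it restarted.
type InstanceIDTracker struct {
	seen      map[PeerID][]byte
	onRestart func(PeerID)
}

func NewInstanceIDTracker(onRestart func(PeerID)) *InstanceIDTracker {
	return &InstanceIDTracker{
		seen:      make(map[PeerID][]byte),
		onRestart: onRestart,
	}
}

// HandleIdentify would be invoked whenever an identify handshake completes.
func (t *InstanceIDTracker) HandleIdentify(p PeerID, instanceID []byte) {
	prev, ok := t.seen[p]
	t.seen[p] = append([]byte(nil), instanceID...)
	if ok && !bytes.Equal(prev, instanceID) {
		// The remote restarted: assume shared state (e.g. its copy of our
		// wantlist) was lost, and let the callback resend a full update.
		t.onRestart(p)
	}
}
```

Bitswap would register a callback that queues a full wantlist resend for the restarted peer.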

@whyrusleeping
Member Author

@lgierth So would my node generate an instance ID on daemon startup and send it to everyone in identify? Or would each node just pick a random number per connection, chosen by the local node (meaning you don't have to send anything)?

@Kubuxu
Member

Kubuxu commented Nov 19, 2016

I think a UUID is quite a good way to solve it; I don't know if it would be at the node level or the bitswap level.

@ghost

ghost commented Nov 19, 2016

So would my node generate an instance ID on daemon startup and send it to everyone in identify? Or would each node just pick a random number per connection, chosen by the local node (meaning you don't have to send anything)?

Yep, the former. A node would generate the instance id on startup and keep it as long as it runs, i.e. per-process-and-peer-id, not per-connection. If a node reconnects to you, it'll do a new identify handshake and thus send its new instance id, which triggers the condition.

I think a UUID is quite a good way to solve it; I don't know if it would be at the node level or the bitswap level.

Yeah, a UUID would work as a generator. I can imagine a similar data sync situation with other (future) protocols; that's why I was thinking of doing it generically in identify.

@ghost

ghost commented Nov 19, 2016

It can still be abused though. I can just keep rerolling to force you to spend a ton of bandwidth.

We can have some cooldown-ish interval for that, similar to flood protection.
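A minimal cooldown sketch, assuming we only honor one instance-id-triggered full resend per peer per interval; the Cooldown type and its parameters are invented for illustration:

```go
package sketch

import (
	"sync"
	"time"
)

type PeerID string

// Cooldown rate-limits how often a peer can trigger an expensive reaction,
// so a malicious peer rerolling its instance id can't drain our bandwidth.
type Cooldown struct {
	mu       sync.Mutex
	last     map[PeerID]time.Time
	interval time.Duration
}

func NewCooldown(interval time.Duration) *Cooldown {
	return &Cooldown{last: make(map[PeerID]time.Time), interval: interval}
}

// Allow reports whether enough time has passed since this peer last
// triggered the reaction, and records the attempt if so.
func (c *Cooldown) Allow(p PeerID) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.last[p]; ok && time.Since(t) < c.interval {
		return false
	}
	c.last[p] = time.Now()
	return true
}
```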

@jbenet
Member

jbenet commented Nov 19, 2016

No need to make it a UUID. Just read cryptographic randomness; 32 bits ought to do it for honest nodes. Malicious nodes can choose the value arbitrarily, so we must protect against that. Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.)
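A sketch of generating such an id with Go's standard library: 32 bits read once from the OS CSPRNG at startup.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// newInstanceID reads 4 bytes (32 bits) of cryptographic randomness. Honest
// nodes get a fresh value per process; malicious nodes can pick theirs
// arbitrarily, which is why reactions to id changes must be rate-limited.
func newInstanceID() (uint32, error) {
	var buf [4]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(buf[:]), nil
}

func main() {
	id, err := newInstanceID()
	if err != nil {
		panic(err)
	}
	fmt.Printf("instance id: %08x\n", id)
}
```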

@ghost

ghost commented Nov 19, 2016

Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.)

Yeah as part of the identify handshake

@jbenet
Member

jbenet commented Nov 19, 2016

No, it should be in bitswap itself; the point is that bitswap may not see a connection reset, because a (disconnect, then connect) may happen without one side noticing, or a peer may have two connections open.

This is basically a (unidirectional) bitswap "session id".

@ghost

ghost commented Nov 19, 2016

But don't we always do an identify handshake right when opening a connection to a peer? Our local identify listener would notice the changed remote instance id and notify all other local parts that are interested. I'm pretty sure we'll run into similar data sync situations with other (future) protocols.

Send it on connection setup. (It should not be in every bitswap message, as that would waste O(N) messages.) [...] the point is that bitswap may not see a connection reset, because a (disconnect, then connect) may happen without one side noticing, or a peer may have two connections open.

I'm not sure we can solve this if we keep coupling bitswap to the concept of a "connection". Ideally a stream would be independent of the underlying connection; there are mechanisms that spread one stream over multiple connections (e.g. multipath TCP) or let a stream roam between connections (e.g. homenet), and we'll just create more edge cases for ourselves.

@whyrusleeping
Member Author

I'm of the opinion it should be a node-global thing and not specific to bitswap. The whole idea is that this ID allows bitswap to tell that a hard disconnect/connect (node reset) has happened, so claiming that "bitswap may not see a connection reset" is missing the point.

@whyrusleeping
Member Author

@lgierth

But don't we always do an identify handshake right when opening a connection to a peer?

This is mostly correct, but unfortunately the 'New connection' notification gets sent out before the identify handshake finishes, since it's done as an 'after the fact' type of thing. This makes it a little difficult; bitswap will have to use some sort of 'wait for identify to finish' thing.
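One possible shape for that 'wait for identify to finish' mechanism, sketched with invented names; this is purely illustrative, not the actual libp2p interface:

```go
package sketch

import "sync"

type PeerID string

// IdentifyGate lets bitswap block on a 'new connection' notification until
// the identify handshake for that peer has finished.
type IdentifyGate struct {
	mu   sync.Mutex
	done map[PeerID]chan struct{}
}

func NewIdentifyGate() *IdentifyGate {
	return &IdentifyGate{done: make(map[PeerID]chan struct{})}
}

// ch returns the peer's channel, creating it if needed. Callers hold g.mu.
func (g *IdentifyGate) ch(p PeerID) chan struct{} {
	c, ok := g.done[p]
	if !ok {
		c = make(chan struct{})
		g.done[p] = c
	}
	return c
}

// IdentifyCompleted is called by the identify handler when a handshake ends.
func (g *IdentifyGate) IdentifyCompleted(p PeerID) {
	g.mu.Lock()
	defer g.mu.Unlock()
	c := g.ch(p)
	select {
	case <-c: // already closed; closing again would panic
	default:
		close(c) // closing broadcasts to every waiter
	}
}

// Wait blocks until identify has completed for p.
func (g *IdentifyGate) Wait(p PeerID) {
	g.mu.Lock()
	c := g.ch(p)
	g.mu.Unlock()
	<-c
}
```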

@ghost

ghost commented Nov 20, 2016

bitswap will have to use some sort of 'wait for identify to finish' thing.

Yeah, this: "bitswap would e.g. register a handler/notifier with the identify protocol that triggers when the instance id of the remote peer changes" :)

I figure moving swarm protocols away from the new-connection notification is generally a good direction.

@whyrusleeping
Member Author

So, a bit of an update here.

The problem that prompted this issue was me trying to sync close to four terabytes of data from nihal to my home server. The process keeps getting stuck, with lots of entries in my wantlist, and with far more entries in nihal's image of my wantlist. Something is going on here where we're not getting a solid accounting of correct wantlists, or cancels aren't getting through, or something. I'm hoping to get a new version of go-ipfs on nihal that has a small bugfix that should help a little with this problem.

@whyrusleeping
Member Author

This issue has been resolved. The 'full' flag wasn't being set on outgoing bitswap wantlist messages, meaning we never actually reset wantlists.
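For context, a sketch of the shape of the fix; the message type here is an invented mirror of the bitswap wire format's wantlist 'full' flag, not the actual go-ipfs code:

```go
package sketch

// Entry is one wantlist line: a block key we want (or a cancel for it).
type Entry struct {
	Key    string
	Cancel bool
}

// WantlistMessage mirrors the wire format's wantlist, including the 'full'
// flag that tells the receiver to replace, not merge, its view of our list.
type WantlistMessage struct {
	Full    bool
	Entries []Entry
}

// newFullWantlist builds a complete-wantlist message. The bug was that Full
// stayed false on outgoing full updates, so receivers merged the entries
// into their stale view instead of resetting it.
func newFullWantlist(keys []string) WantlistMessage {
	msg := WantlistMessage{Full: true}
	for _, k := range keys {
		msg.Entries = append(msg.Entries, Entry{Key: k})
	}
	return msg
}
```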
