
fix: race condition when requesting the same block twice #214

Merged 12 commits into master on May 27, 2020

Conversation

achingbrain
Member

When we call blockstore.putMany, some implementations will batch up all the puts and write them at once. This means that blockstore.has might not return true for a little while - if another request for a given block comes in before blockstore.has returns true, it'll get added to the want list. If the block then finishes its batch and a remote peer finally supplies the wanted block, the notifications that complete the second block request will never get sent and the process will hang indefinitely.

The change made here is to separate the sending of notifications out from putting things into the blockstore. If the blockstore has a block, but the block is still in the wantlist, send notifications that we now have the block.
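
In outline, the idea is roughly the following (a simplified sketch of the approach described above, not the actual diff; the wantlist check and the handleBlock wrapper are illustrative, only _sendHaveBlockNotifications comes from the PR):

// Simplified sketch: storing a block and notifying waiters are two separate steps
async function handleBlock (bitswap, block) {
  // hand the block to the blockstore - some implementations may batch this
  // internally, so "put returned" does not always mean "has() is true yet"
  await bitswap.blockstore.put(block)

  // regardless of how the block got here (local put or a remote peer),
  // if something is still waiting on this CID, resolve it now
  if (bitswap.wantlist.contains(block.cid)) {
    bitswap._sendHaveBlockNotifications(block)
  }
}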

@achingbrain
Member Author

Fixes ipfs/js-ipfs#2814

src/index.js Outdated
self.network.provide(block.cid).catch((err) => {
  self._log.error('Failed to provide: %s', err.message)
})
self._sendHaveBlockNotifications(block)
Contributor

It's my understanding that the async function asyncFn passed to blockstore.putMany(asyncFn) supplies the blocks to be put - but doesn't make any guarantees about when those blocks will be committed to the blockstore.

So I think it may make more sense to send notifications after blockstore.putMany() has completed.

Member Author

@achingbrain achingbrain Mar 4, 2020

I thought the same, but you'd have to iterate over the list of blocks twice, and the blocks may not be available from the source any more (e.g. it may not be an array).

// a generator can only be consumed once
const iter = function * () {
  yield 'cid'
}

function main (source) {
  // the first pass exhausts the iterator
  for (const cid of source) {
    console.info(cid)
  }

  // the second pass yields nothing - the iterator is already done
  for (const cid of source) {
    console.info(cid)
  }
}

main(iter())
// prints 'cid' once, not twice

Either that, or you store the blocks in a list for processing after putMany, which may exhaust available memory, or you store just the CIDs and retrieve the blocks after the putMany, though they may not be available and the list may be unreasonably long, etc.

Contributor

The solution I am using in the new message types PR is to store them in an array:
https://github.com/ipfs/js-ipfs-bitswap/pull/211/files/f2b6be5ffbae25bdd911496a356b02b4490714ce#diff-1fdf421c05c1140f6d71444ea2b27638R285

I don't think it will use much more memory, because it's just a list of pointers, right?

Member Author

It's going to be a factor of the size of the added blocks - if you add a 10GB .iso file to IPFS, sticking them all in an array before sending notifications will cause the process to consume 10GB of memory.

If instead each block is processed one at a time (or in small batches) from reading the file to writing it out to disk it'll consume a lot less.

This isn't the only problem site, obviously - if our final blockstore batch size is unbounded it'll crash there too.

Member Author

Also, I’m not sure bitswap is the right place to solve this problem. The blockstore created it by making the behaviour of putMany non-obvious (to me at least); it should make the caller’s life a bit easier by limiting batch sizes, caching things that have been batched up, etc., though that is way out of scope for this PR.

Contributor

IIUC the blocks are a parameter to bitswap.putMany(blocks), so they're not going to get cleaned up until the function returns. If we create an array of pointers to the blocks, that will use a negligible amount of memory compared to the blocks themselves.

I agree that the blockstore is probably a better place to tackle memory issues.

Just to clarify, when we call yield on line 292 I don't believe that means that the block has been committed to the blockstore. That's why I'm suggesting we only send out notifications after blockstore.putMany() completes.

Member Author

bitswap.putMany(blocks) takes an iterable as an argument, which might be an array - in which case yes, they are all in memory at once and storing references to them in an array does not hurt - but it might also be a readable stream or a generator, in which case they are pulled one block at a time. It's for this case that we do not want to stick them all in an array.

Contributor

@dirkmc dirkmc Mar 5, 2020

Ahh, I see - you're right, blocks is an async iterable.

For correctness I believe we need to send notifications after the block has been committed, i.e. at the end of blockstore.putMany().

In order to be able to send notifications we could check the type of blocks and then

  • if it's an array: store a list of refs to each block that was put
  • if it's an async iterable: store a list of CIDs then call datastore.getBlocks() for notifications. Probably most datastore implementations will have some caching that will make this perform reasonably well
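
A rough sketch of how that branching could look (purely illustrative - the helper function, the Array check and the notify callback are assumptions, and it assumes a blockstore.putMany that resolves once everything has been committed):

// Hypothetical helper, not code from this PR
async function putManyAndNotify (blockstore, blocks, notify) {
  if (Array.isArray(blocks)) {
    // all blocks are already in memory, so keeping references is cheap
    await blockstore.putMany(blocks)
    blocks.forEach(notify)
    return
  }

  // async iterable: remember only the CIDs while streaming blocks in, then
  // read the blocks back for notification once putMany has completed
  const cids = []

  await blockstore.putMany((async function * () {
    for await (const block of blocks) {
      cids.push(block.cid)
      yield block
    }
  })())

  for (const cid of cids) {
    notify(await blockstore.get(cid))
  }
}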

Member Author

Looking back on this, blockstore.putMany() conflates batching with streaming, which I think is a mistake.

The intention (as I see it) of putMany is to allow streaming blocks into the blockstore, but it can actually end up batching those blocks, worst case storing them until the stream ends - and in our case forcing the caller to store something about the whole stream (e.g. the list of CIDs to notify of).

I think we should be explicit in our interfaces and remove the batching semantic from blockstore.putMany() and instead expose a blockstore.putBatch() or similar method for when that behaviour is desirable.
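
For illustration, the split might look something like this (hypothetical method shapes, not an actual interface-datastore or ipfs-repo API):

// Hypothetical blockstore with streaming and batching as separate,
// explicit operations
class Blockstore {
  constructor (datastore, cidToKey) {
    this._store = datastore   // an interface-datastore instance
    this._cidToKey = cidToKey // maps a CID to a datastore Key
  }

  async put (block) {
    await this._store.put(this._cidToKey(block.cid), block.data)
  }

  // putMany only streams: each block is written as soon as it is pulled
  // from the source, nothing is held back
  async * putMany (source) {
    for await (const block of source) {
      await this.put(block)
      yield block
    }
  }

  // putBatch is an explicit opt-in to batching for callers that want it
  async putBatch (blocks) {
    const batch = this._store.batch()

    for (const block of blocks) {
      batch.put(this._cidToKey(block.cid), block.data)
    }

    await batch.commit()
  }
}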

The batching is done at the repo level; I'm more than happy to pull that out.

Member Author

It's been pulled out - ipfs-repo@3.x.x's .putMany() now streams to the datastore instead of using batches.

src/index.js Outdated
@dirkmc
Copy link
Contributor

dirkmc commented Mar 5, 2020

@achingbrain what do you think about this approach? #216

I still need to fix the test but want to get your thoughts first

When we call `blockstore.putMany`, some implementations will batch
up all the `put`s and write them at once.  This means that
`blockstore.has` might not return `true` for a little while - if
another request for a given block comes in before `blockstore.has`
returns `true` it'll get added to the want list.  If the block then
finishes its batch and finally a remote peer supplies the wanted
block, the notifications that complete the second block request
will never get sent and the process will hang indefinitely.

The change made here is to separate the sending of notifications
out from putting things into the blockstore.  If the blockstore has
a block, but the block is still in the wantlist, send notifications
that we now have the block.
@achingbrain achingbrain force-pushed the fix/race-condition-when-requesting-the-same-block branch from 57185ea to 1fc09ed on May 19, 2020 09:18
@achingbrain
Copy link
Member Author

achingbrain commented May 20, 2020

Further improvements in this branch:

  • Upgrade to use the streaming API from interface-datastore
    • Does not assume that only arrays of CIDs are being passed any more, only uses the AsyncIterable interface contract to access data
  • Actually dial remote nodes with bitswap 1.2.0
  • Stop using store.has as it'll be removed in the next interface-datastore release
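
For example, the .has check can be replaced with a get that tolerates a not-found error (a sketch only; it assumes missing keys surface with an ERR_NOT_FOUND error code):

// sketch: probe for a block without calling store.has
async function hasBlock (store, cid) {
  try {
    await store.get(cid)
    return true
  } catch (err) {
    // assumption: the datastore reports missing keys as ERR_NOT_FOUND
    if (err.code === 'ERR_NOT_FOUND') {
      return false
    }

    throw err
  }
}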

Member

@hugomrdias hugomrdias left a comment

LGTM, but I would keep .has and would try to find another way to surface the issues that come with relying on the .has output without a transaction.

- Key notifications by CID multihashes so one block can service multiple wants
  if they are for the same data but requested with different CIDs
- Pass in an AbortSignal to notifications and that tears down only the listeners
  set up for that invocation and rejects the current promise only when aborted
- notifications.unwant will now reject all outstanding promises for that CID

A follow up commit will handle removing things from the want list.
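
A condensed sketch of those notification semantics (illustrative only - the class shape and key helper are assumptions, not the actual notifications module; it also assumes cid.multihash is a Buffer, as in the cids package):

const { EventEmitter } = require('events')

// key events by the multihash so the same data requested under different
// CID versions/encodings resolves against the same block
const toKey = (cid) => `block:${cid.multihash.toString('base64')}`

class Notifications extends EventEmitter {
  hasBlock (block) {
    this.emit(toKey(block.cid), block)
  }

  wantBlock (cid, { signal } = {}) {
    return new Promise((resolve, reject) => {
      const key = toKey(cid)
      const onBlock = (block) => resolve(block)
      const onUnwant = () => {
        this.removeListener(key, onBlock)
        reject(new Error(`Block for ${cid} unwanted`))
      }

      this.once(key, onBlock)
      this.once(`unwant:${key}`, onUnwant)

      if (signal) {
        // the signal tears down only this invocation's listeners and
        // rejects only this promise
        signal.addEventListener('abort', () => {
          this.removeListener(key, onBlock)
          this.removeListener(`unwant:${key}`, onUnwant)
          reject(new Error(`Want for ${cid} aborted`))
        })
      }
    })
  }

  unwantBlock (cid) {
    // rejects all outstanding wantBlock promises for this CID
    this.emit(`unwant:${toKey(cid)}`, cid)
  }
}
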
Member

@hugomrdias hugomrdias left a comment

LGTM great work!!

@achingbrain
Member Author

@dirkmc can this be merged & released?

networkA.start()
networkB.start()

// FIXME: have to already be connected as sendMessage only accepts a peer id, not a PeerInfo
Member

@vasco-santos vasco-santos May 26, 2020

This gets fixed in #217

The peer address just needs to be added to the AddressBook before the operation

Member Author

Could you create an issue to track this please?

Member

I already updated #217 to add the peer multiaddrs to the AddressBook before the sendMessage and removed the dial operation from the test.

We are removing PeerInfo from libp2p@0.28. The multiaddrs for a peer should be in the AddressBook before any attempt to dial using a PeerId. In this case, the nodes needed to "discover" each other in order to send a message - the "discovery" here is adding the multiaddrs to the AddressBook and then using sendMessage with the PeerId.

So, the goal here is to use the new libp2p and not to accept PeerInfo in sendMessage. I can create an issue to track this, but I am considering it part of updating libp2p in bitswap.

it('dials to peer using Bitswap 1.2.0', async () => {
  networkA = new Network(p2pA, bitswapMockA)

  // networkB only supports bitswap 1.2.0
  networkB = new Network(p2pB, bitswapMockB)
  networkB.protocols = ['/ipfs/bitswap/1.2.0']

  networkA.start()
  networkB.start()

  const deferred = pDefer()

  bitswapMockB._receiveMessage = () => {
    deferred.resolve()
  }

  // the sender needs the peer's multiaddrs in its AddressBook before dialling
  p2pA.peerStore.addressBook.set(p2pB.peerId, p2pB.multiaddrs)
  await networkA.sendMessage(p2pB.peerId, new Message(true))

  // wait until networkB has received the message over the 1.2.0 protocol
  return deferred.promise
})

@vasco-santos vasco-santos mentioned this pull request May 26, 2020
@achingbrain achingbrain linked an issue May 26, 2020 that may be closed by this pull request
Contributor

@dirkmc dirkmc left a comment

Some good cleanup in here as well, thanks! 👍

src/index.js Outdated
@achingbrain achingbrain merged commit 78ce032 into master May 27, 2020
@achingbrain achingbrain deleted the fix/race-condition-when-requesting-the-same-block branch May 27, 2020 15:08
Successfully merging this pull request may close these issues.

Improve Wantlist handling like go did