fix(bitswap): wantlist overflow handling #629
I'm not sure I get what the idea is here, whether it's necessary, or whether we can make this much cheaper.
In either case, it seems like we could add some extra data to the in-memory structs here rather than going to the blockstore to see if we have the data (and being at the mercy of whatever caching, bloom filters, etc. are used there).
The idea here is that there is a DONT_HAVE message queued for the peer, but it has not been sent yet and is blocking new messages from being queued for the peer. So, cancel the unsent DONT_HAVE and try to enqueue something possibly more important. Either a delayed HAVE message will replace the pending DONT_HAVE, or the peer can ask again later. This should keep messages moving, even if there is some backup sending DONT_HAVE messages to peers.
This also handles the case where a DONT_HAVE message has been sent but was not removed from the queue. Once a message is sent, the want is removed from the message queue and peer ledger only when blocks have been sent or when block presence has been sent. If a DONT_HAVE was sent, the want remains on the queue and peer ledger as a placeholder should a block arrive later, and this is what stops new wants from being accepted. This is what the 5th bullet in #527 is referring to.
So, in short, it handles both cases.
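To make the first case concrete, here is a minimal sketch of the idea in Go. All names (queueEntry, handleOverflow, the limit parameter) are illustrative for this comment, not the actual engine code from this PR:

```go
// Illustrative sketch only; the real types and overflow logic in
// bitswap/server/internal/decision/engine.go differ.
package main

import "fmt"

type entryType int

const (
	dontHave entryType = iota
	have
	block
)

type queueEntry struct {
	cid  string
	kind entryType
}

// handleOverflow cancels unsent DONT_HAVE entries to make room for
// incoming wants, so a backlog of DONT_HAVEs does not block possibly
// more important messages. The peer can re-ask later, or a delayed
// HAVE will replace the canceled DONT_HAVE.
func handleOverflow(queued, incoming []queueEntry, limit int) []queueEntry {
	toDrop := len(queued) + len(incoming) - limit
	kept := make([]queueEntry, 0, limit)
	for _, e := range queued {
		if toDrop > 0 && e.kind == dontHave {
			toDrop-- // cancel this pending DONT_HAVE
			continue
		}
		kept = append(kept, e)
	}
	for _, e := range incoming {
		if len(kept) >= limit {
			break // still full: remaining wants are rejected
		}
		kept = append(kept, e)
	}
	return kept
}

func main() {
	q := []queueEntry{{"cid1", dontHave}, {"cid2", have}}
	q = handleOverflow(q, []queueEntry{{"cid3", block}}, 2)
	fmt.Println(q) // [{cid2 1} {cid3 2}]
}
```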
Yes, the wants for which a block is found can be recorded in the peer ledger so that these can be ignored in overflow handling. However, that would need to be done in every call to engine.MessageReceived, and seems less preferable than doing something more expensive only during the exceptional case. The task queue does already have this info, but using it would require locking the taskqueue and the peer tracker for each overflow want CID to look at. Alternatively, this would require a new taskqueue API to get a list of wants with HaveBlock set to true for a given peer. This last option might be less expensive than looking at the blockstore, but I was not comfortable with that amount of new plumbing for handling this bitswap exceptional case. WDYT?
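For illustration, that hypothetical taskqueue API might look roughly like the sketch below. The taskQueue and task types and the WantsWithBlock method are assumptions for this comment; the real go-peertaskqueue internals differ:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical types, illustrative only.
type task struct {
	cid       string
	haveBlock bool
}

type taskQueue struct {
	mu    sync.Mutex
	tasks map[string][]task // keyed by peer ID
}

// WantsWithBlock returns the queued want CIDs for a peer whose tasks
// already have HaveBlock set, so overflow handling could skip them
// without consulting the blockstore.
func (tq *taskQueue) WantsWithBlock(peer string) []string {
	tq.mu.Lock()
	defer tq.mu.Unlock()
	var found []string
	for _, t := range tq.tasks[peer] {
		if t.haveBlock {
			found = append(found, t.cid)
		}
	}
	return found
}

func main() {
	tq := &taskQueue{tasks: map[string][]task{
		"peerA": {{"cid1", true}, {"cid2", false}},
	}}
	fmt.Println(tq.WantsWithBlock("peerA")) // [cid1]
}
```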
Related to the above: is this about blocks that aren't present, or about subscriptions?
Note: the reason I'm pushing on the difference is that, from my perspective, subscriptions are much more expensive by virtue of occupying memory for an indefinite amount of time rather than for a transient "while I'm sending out a response". Not sure if that's enough to justify different lists, but it's how I'm thinking in my review here (lmk if you disagree or think I'm missing the point).
From the engine perspective, I do not think there is any need to distinguish between subscription and request-response, since I think that only determines how long a peer stays in the task queue/ledger.
Overall, it probably does make more sense to only do this overflow handling for subscriptions. I was thinking/hoping this would handle itself, since subscriptions are the ones primarily affected in the first place and thus the ones needing overflow handling. I think some real-world use is necessary to determine this. I will add logging that can be used to determine when overflow handling is happening.
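That logging could be as simple as the following sketch. The subsystem name, message, and fields are assumptions, not the PR's actual logging:

```go
package main

import (
	logging "github.com/ipfs/go-log/v2"
)

// Assumed logger name; the engine's real subsystem name may differ.
var log = logging.Logger("engine")

// logOverflow is a hypothetical helper emitting a debug line whenever
// overflow handling runs, so real-world use can show how often it is
// triggered and by which peers.
func logOverflow(peer string, wantlistSize, overflowSize int) {
	log.Debugw("handling wantlist overflow",
		"peer", peer,
		"wantlistSize", wantlistSize,
		"overflow", overflowSize)
}

func main() {
	logging.SetLogLevel("engine", "debug")
	logOverflow("12D3KooW...", 1024, 24)
}
```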