fix(bitswap): wantlist overflow handling #629
Conversation
Codecov Report

Attention: Patch coverage is

```
@@           Coverage Diff            @@
##             main     #629    +/-  ##
=======================================
+ Coverage   59.82%   59.92%   +0.10%
=======================================
  Files         238      238
  Lines       30004    30049     +45
=======================================
+ Hits        17949    18008     +59
+ Misses      10437    10425     -12
+ Partials     1618     1616      -2
```
This option will need a minor change
wantlist overflow handling now cancels existing entries to make room for newer requests. This fix prevents the wantlist from filling up with CIDs that the server does not have. Fixes #527
Force-pushed from 713faee to 9c35f18
Handle incoming wants that could not be added to the peer ledger without exceeding the peer want limit. These are handled by trying to make room for them by canceling existing wants for which there is no block. If this does not make sufficient room, then any lower priority wants that have blocks are canceled.
Thanks for pushing on this 🙏
```go
// Do not take more wants than can be handled.
if len(wants) > int(e.maxQueuedWantlistEntriesPerPeer) {
	// Keep the highest priority wants.
	slices.SortFunc(wants, func(a, b bsmsg.Entry) int {
		return cmp.Compare(b.Entry.Priority, a.Entry.Priority)
	})
	wants = wants[:int(e.maxQueuedWantlistEntriesPerPeer)]
```
This might be fine; however, do we want to do this here, given that there could be duplicates once we actually look at the wantlist?
- Pro: We shed ram usage earlier
- Con: There might be a better set of wants that we can honor if we wait
I think it makes sense to truncate the list, but sorting can be avoided.
Truncation makes sense:
The incoming wants are already unique, so even if there are no existing wants, or every incoming want duplicates an existing one, there is still no way for more than the limit to be used. If all the wants not added to the message queue are included in the overflow, they will end up getting dropped anyway, because there will not be enough existing wants available to replace. Dropping them early does two things:
- Prevents adding an excessive number of wants to the peer ledger.
- Prevents a large number of wants from being sorted and incurring block-size lookups in `handleOverflow`.
Truncation without sorting:
This will potentially lose higher priority wants, but avoids sorting an incoming wantlist of unknown size. In the usual case sorting is not needed anyway, so not sorting will avoid a performance hit.
> There might be a better set of wants that we can honor if we wait
True, but then it is necessary to examine all the wants (a possibly excessive amount) to see if they can be added directly to the message queue, or need to be handled as overflow.
```go
for _, entry := range existingWants {
	queuedWantKs.Add(entry.Cid)
}
queuedBlockSizes, err := e.bsm.getBlockSizes(ctx, queuedWantKs.Keys())
```
I'm not sure I get what the idea is here and if this is necessary / if we can make this much cheaper
- Is this meant as "is there a DONT_HAVE response queued up that we should replace".
- While I get this it might also be overkill, and it might be fine to respect the user priority in responding with DONT_HAVEs, HAVEs, and blocks in the same way.
- Is this meant as "I previously sent a DONT_HAVE and now this is sitting on my list as a subscription".
- As discussed this definitely seems like something we should want to knock off our list if out of space
In either case it seems like we could add some extra data to the in-memory structs here rather than going to the blockstore to see if we have the data (and being at the mercy of whatever caching, bloom filters, etc. are used there)
The idea here is that there is a DONT_HAVE message queued for the peer, but it has not been sent yet and is blocking new messages from being queued for the peer. So, cancel the unsent DONT_HAVE and try to enqueue something possibly more important. Either a delayed HAVE message will replace the pending DONT_HAVE, or the peer can ask again later. This should keep messages moving, even if there is some backup sending DONT_HAVE messages to peers.
This also handles the case where a DONT_HAVE message has been sent, but is not removed from the queue. Once a message is sent, the want is removed from the message queue and peer ledger only when blocks have been sent or when block presence has been sent. If a DONT_HAVE was sent the want remains on the queue and peer ledger as a place-holder should a block arrive later, and this is stopping new wants from being accepted. This is what the 5th bullet in #527 is referring to by:
> This is because the bitswap server never cleanup entries after sending DONT_HAVE
So, in short, it handles both cases.
> it seems like we could add some extra data to the in-memory structs here rather than going to the blockstore.

Yes, the wants for which a block is found can be recorded in the peer ledger so that these can be ignored in overflow handling. However, that would need to be done in every call to `engine.MessageReceived`, and seems less preferable than doing something more expensive only during the exceptional case.

The task queue does already have this info, but using it would require locking the taskqueue and the peer tracker for each overflow want CID. Alternatively, it would require a new taskqueue API to get the list of wants with `HaveBlock` set to true for a given peer. This last option might be less expensive than looking at the blockstore, but I was not comfortable with that amount of new plumbing for handling this bitswap exceptional case. WDYT?
```go
	return wants
}

// Remove entries for blocks that are not present to make room for overflow.
```
Related to the above: is this about blocks that aren't present, or about subscriptions?
Note: the reason I'm pushing on the difference is that from my perspective subscriptions are much more expensive by virtue of occupying memory for an indefinite amount of time rather than a transient "while I'm sending out a response". Not sure if that's enough to justify different lists, but it's how I'm thinking in my review here (but lmk if you disagree or think I'm missing the point).
From the engine's perspective, I do not think there needs to be any distinction between subscription and request-response, since I think that only determines how long a peer stays in the task queue/ledger.
Overall, it probably does make more sense to only do this overflow handling for subscriptions. I was thinking/hoping this would handle itself by subscriptions being the ones primarily affected in the first place and needing to do overflow handling. I think some real-world use is necessary to determine this. I will add logging that can be used to determine when overflow handling is happening.
```go
// Not enough dont-haves removed. Replace existing entries that have
// lower priority with overflow entries.
```
In practice I expect this never to happen given that IIUC the boxo client (which is the most widely used one) just decreases the priority over time
```go
mq.priority--
```
Given that this is the case it'd be good to:
- Make this short-circuit fairly cheaply: since the overflow and existing lists have already been sorted, we can detect when the lowest priority in the existing wants is higher than the highest in the overflow list (it doesn't have to be an explicit check, but anything that makes this pretty cheap rather than linear).
- Have it still work (and be tested) when people choose different priorities
- Not being particularly expensive even in pathological cases
I also think it probably will be unlikely to happen given that the client decreases priority over time. The thinking here was looking more toward a future where priority may be set by path distance, where items closer to a DAG root have a higher priority. In that case it seems more likely that as new wants are requested that some do have a high priority because they are root or closer to root items.
- Revised logic to break out as soon as lowest usable priority is hit on incoming wantlist (now ordered from most to least important).
- Still works.
- In the normal case, most of the work was already done in sorting the lists, so this compares list items until the priority in the ascending-sorted list is higher than that in the descending-sorted list. The pathological case is where all new incoming wants have a higher priority. This can be detected, but there is not a better way to handle it, since canceling the individual overflow entries needs to be done anyway. Clearing the peer's wantlist does the same thing, just for all CIDs, so that is not better.
```go
// Sort wl and overflow from least to most important.
slices.SortFunc(existingWants, func(a, b wl.Entry) int {
	return cmp.Compare(a.Priority, b.Priority)
})
slices.SortFunc(overflow, func(a, b bsmsg.Entry) int {
	return cmp.Compare(a.Entry.Priority, b.Entry.Priority)
})
```
Perhaps premature optimization (depends on how large people set their limits to and how frequently they're hit). Seems like we could spend a lot of time sorting here.
For the overflow list this is probably fine (shouldn't be that big anyway and it's related to message size), but if nodes spend significant time near the limit they'll be doing:
- Copy the wantlist map into a list
- Sort the list
For basically every message that comes in.
Can the PeerRequestQueue help us out here since it's already storing a prioritized queue of what needs to be done? It might not due to how the locking/concurrency works but could save a bunch of pain.
The PeerRequestQueue cannot really help out, since that maintains a heap so getting an ordered list would require iterating the heap. I do not see a much less expensive alternative, other than just clearing the peer's want list at some point, like if overflow is happening too frequently.
Maybe if overflow happens 5 times in a row for a particular peer, then clear that peer's message queue? WDYT?
switching early to ipfs/boxo#629 to see if ci passes
Smoke tested in ipfs/kubo#10462 and lgtm.
Release / test plan: I'll apply cosmetics below and tag boxo release on Monday and bubble up to kubo 0.30.0-rc1 (ipfs/kubo#10436), then deploy rc1 to collab cluster and see if it impacts performance of its nodes.
👍
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Handle incoming wants that cannot be added to the peer ledger without exceeding the peer want limit. These are handled by trying to make room for them by canceling existing wants for which there is no block. If this does not make sufficient room, then any lower priority wants that have blocks are canceled. This fix prevents the wantlist from filling up with CIDs that the server does not have.

Priority is also considered when truncating a wantlist that exceeds the size limit. Considering priority in wantlist truncation and overflow handling ensures that higher priority tasks are handled when the wantlist size needs to be limited. This will be more important in the future if/when priority is determined by a block's DAG path (higher priority when closer to the root).
Fixes #527