Skip to content
This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

fix: reduce receive contention #536

Merged
merged 1 commit into from
Oct 26, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions internal/decision/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -673,12 +673,18 @@ func (e *Engine) ReceiveFrom(from peer.ID, blks []blocks.Block) {
// Check each peer to see if it wants one of the blocks we received
var work bool
missingWants := make(map[peer.ID][]cid.Cid)
e.lock.RLock()
for _, b := range blks {
k := b.Cid()

for _, p := range e.peerLedger.Peers(k) {
e.lock.RLock()
peers := e.peerLedger.Peers(k)
e.lock.RUnlock()
Comment on lines +679 to +681
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you assuming that the amount of time to grab the ledger maps for all the peers is long enough that it's worth locking for each peer rather than grabbing all the ledgers at once and chucking them into an array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm just assuming that repeatedly taking and releasing the lock isn't going to be too expensive.


for _, p := range peers {
e.lock.RLock()
ledger, ok := e.ledgerMap[p]
e.lock.RUnlock()
Comment on lines +684 to +686
Copy link
Contributor

@aschmahmann aschmahmann Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check question. Does it matter if the operations in the unlocked portion end up happening on an invalid ledger (i.e. one that is either no longer in the map)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of. But... that's already an issue pretty much everywhere in the code so I'm ignoring it for now. Really, we need to be reference counting, but that's a larger change.

I say kind of because all this code is a bit racy (which is why we re-broadcast wants occasional).


if !ok {
// This can happen if the peer has disconnected while we're processing this list.
log.Debugw("failed to find peer in ledger", "peer", p)
Expand Down Expand Up @@ -718,7 +724,6 @@ func (e *Engine) ReceiveFrom(from peer.ID, blks []blocks.Block) {
e.updateMetrics()
}
}
e.lock.RUnlock()

// If we found missing wants (e.g., because the peer disconnected, we have some races here)
// remove them from the list. Unfortunately, we still have to re-check because the user
Expand Down