
Use peermanager scores for blocksync peers and don't error out on block mismatch #162

Merged · 21 commits into main · Oct 31, 2023

Conversation

@philipsu522 (Contributor) commented Oct 6, 2023

Describe your changes and provide context

Import peermanager scores into the blocksync reactor and use them to determine which peers to pick for blocksync. We also no longer error out on a block height mismatch from the peer that sent the block.
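
For context, a minimal runnable sketch of the selection idea. NodeID and the plain integer scores are stand-ins for types.NodeID and whatever the peer manager's Score method returns (their exact signatures are not shown on this page):

package main

import (
    "fmt"
    "sort"
)

// NodeID stands in for types.NodeID.
type NodeID string

// sortPeersByScore returns peer IDs ordered best-first, so blocksync
// requests go to the highest-scored peers before the rest.
func sortPeersByScore(scores map[NodeID]int) []NodeID {
    sorted := make([]NodeID, 0, len(scores))
    for id := range scores {
        sorted = append(sorted, id)
    }
    sort.Slice(sorted, func(i, j int) bool {
        return scores[sorted[i]] > scores[sorted[j]]
    })
    return sorted
}

func main() {
    fmt.Println(sortPeersByScore(map[NodeID]int{"p1": 10, "p2": 42, "p3": 7}))
    // Output: [p2 p1 p3]
}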

Testing performed to validate your change

  • Added unit tests
  • Deployed to autobake, stopped a validator and restarted it; verified catch-up still works
  • Tested on a standalone RPC node in pacific-1

@philipsu522 philipsu522 requested a review from yzang2019 October 11, 2023 20:23
@@ -4,7 +4,9 @@ import (
"context"
"errors"
"fmt"
"github.com/tendermint/tendermint/internal/p2p"
Contributor:

Nit: fix lint issue
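
If the lint complaint is about import grouping (an assumption; the exact lint error is not shown), the usual fix is to separate the third-party import from the standard-library group with a blank line:

import (
    "context"
    "errors"
    "fmt"

    "github.com/tendermint/tendermint/internal/p2p"
)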

for _, peer := range pool.peers {
// Generate a sorted list
sortedPeers := pool.getSortedPeers(pool.peers)
fmt.Printf("PSUDEBUG - block sync with sorted peers: %v\n", sortedPeers)
Contributor:

Shall we remove this debug log?

fmt.Printf("PSUDEBUG - block sync with sorted peers: %v\n", sortedPeers)
for _, nodeId := range sortedPeers {
peer := pool.peers[nodeId]
pool.peerManager.Score(peer.id)
Contributor:

why do we get the score out here without using it?

// Generate a sorted list
sortedPeers := pool.getSortedPeers(pool.peers)
fmt.Printf("PSUDEBUG - block sync with sorted peers: %v\n", sortedPeers)
for _, nodeId := range sortedPeers {
Contributor:

A few further optimizations we can do (a rough sketch of both follows below):

  1. We should probably avoid using the low-score peers, for example the ones with score 0
  2. We should do some random shuffling so that we don't always target the same few top peers
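
A rough sketch combining both suggestions (hypothetical helper, not code from this PR; the same stand-in types as above):

package main

import (
    "fmt"
    "math/rand"
    "sort"
)

type NodeID string

// pickPeers drops zero-score peers (suggestion 1) and shuffles before a
// stable sort (suggestion 2), so equally scored peers are tried in a
// random order instead of always hitting the same few top peers.
func pickPeers(scores map[NodeID]int) []NodeID {
    candidates := make([]NodeID, 0, len(scores))
    for id, score := range scores {
        if score == 0 {
            continue // suggestion 1: skip the lowest-score peers
        }
        candidates = append(candidates, id)
    }
    // Suggestion 2: shuffle first; the stable sort then keeps the
    // shuffled order among peers whose scores are equal.
    rand.Shuffle(len(candidates), func(i, j int) {
        candidates[i], candidates[j] = candidates[j], candidates[i]
    })
    sort.SliceStable(candidates, func(i, j int) bool {
        return scores[candidates[i]] > scores[candidates[j]]
    })
    return candidates
}

func main() {
    fmt.Println(pickPeers(map[NodeID]int{"a": 0, "b": 50, "c": 50, "d": 30}))
}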

@@ -47,7 +50,7 @@ const (
maxDiffBetweenCurrentAndReceivedBlockHeight = 100
)

var peerTimeout = 15 * time.Second // not const so we can override with tests
var peerTimeout = 3 * time.Second // not const so we can override with tests
Contributor (Author):

Waiting 15s for a timeout is too long, given that we expect blocks much more frequently.

@philipsu522 enabled auto-merge (squash) October 23, 2023 16:48
@philipsu522 changed the title from "Use peermanager scores for blocksync peers" to "Use peermanager scores for blocksync peers and don't error out on block mismatch" on Oct 23, 2023
@@ -315,7 +320,9 @@ func (pool *BlockPool) AddBlock(peerID types.NodeID, block *types.Block, extComm
}
} else {
err := errors.New("requester is different or block already exists")
pool.sendError(err, peerID)
// Original behavior is to error out when there is a mismatch, which shuts down the entire reactor.
@yzang2019 (Contributor) commented Oct 23, 2023:

Overall LGTM, one question: do we understand why the original behavior errors out and shuts down the entire reactor? Will there be any side effects if we change this behavior?

Contributor (Author):

Let's revert this now, and when the standalone RPC falls behind again (currently it is healthy, and was upgraded to the new version), we can re-apply this patch and see if it helps.
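
For reference, a minimal sketch of the non-erroring variant being discussed (hypothetical standalone helper; the stdlib log package stands in for the pool's logger):

package main

import (
    "errors"
    "log"
)

// handleBlockMismatch sketches the changed behavior: instead of calling
// pool.sendError(err, peerID), which shuts down the whole reactor, the
// mismatched block is logged and dropped so that a single peer cannot
// stall block sync for everyone.
func handleBlockMismatch(peerID string) {
    err := errors.New("requester is different or block already exists")
    log.Printf("ignoring mismatched block from peer %s: %v", peerID, err)
}

func main() {
    handleBlockMismatch("peer-1")
}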

@philipsu522 disabled auto-merge October 31, 2023 21:46
@philipsu522 merged commit dd79101 into main Oct 31, 2023
20 of 22 checks passed