This repository has been archived by the owner on Dec 4, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 535
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dbrajovic
requested review from
lazartravica,
Vuksan and
zivkovicmilos
as code owners
November 9, 2021 12:11
This was
linked to
issues
Nov 9, 2021
This was referenced Nov 9, 2021
lazartravica
suggested changes
Nov 9, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please add a test checking the difficulty broadcasted.
zivkovicmilos
approved these changes
Nov 10, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for figuring all of this out, it was definitely tricky 🙏
Looks great to me, please make sure to solve any failing tests 💯
brkomir
approved these changes
Nov 10, 2021
lazartravica
approved these changes
Nov 15, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes to the tests. LGTM
8 tasks
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This fix provides the correct
bestTd
values from peers when a node is choosing the best candidate.Changes include
Checklist
Additional comments
Explanation:
Cause
Validator nodes always send the
blockNumber
as Difficulty (instead of TotalDifficulty) when building a new block. If this info is relayed to the non validator node before it resolves its best peer, the non validator will end up looping inside ofRunAcceptState()
(for quite some time), listening for new blocks, but not being able to write them.As a consequence, reproducing the issue is a bit random, as the non validator will only sometimes be able to re-sync with the chain.
In-depth
When a node's
Syncer
first starts it subscribes to the blockchain's stream of events (in the background) and handles incoming peer connections (disconnections). While the event stream is something mostly continuous, the peer conns/deconns are rather sporadic in nature. Nonetheless, both flows access the data structures crucial for theSyncer
to work as intended.In our particular case scenario (step 4.), when a node first connects to the network it starts to receive these initial handshakes with other peers, each containing the difficulty of the chain (totalDifficulty) that peer is seeing. Example (with custom prints):
In the example above, the 4 validator nodes are up-and-running, while the last 2 non validators got stuck previously. The node being restarted is a non validator which also got stuck on
diff: 3742
. Let's see what it does next:What happened?
Right after connecting to its peers, there was a window of opportunity for a race to happen. If before reaching the call to bestPeer() a new block was broadcasted from the validator set, the end result of that broadcast will update the difficulty the node has previously registered for that peer (during the initial handshake). The origin of the "lower" value comes from the fact that the validator, instead of sending total difficulty of the chain, actually sent the last block's difficulty (which in IBFT is, at the time of writing this, equal to the block's number).
At the time of discovery, the validator nodes were the only ones broadcasting blocks to the network, which is why their difficulty is growing in the next call to
bestPeer()
(as opposed to the stuck non validators).Regardless, the node rejects these peers as candidates (believing they have yet to sync up with the chain) and repeats the process over again.
Fortunately, all it takes to solve this bug is store the chain's difficulty (total difficulty) in the
Difficulty
field ofStatus
instead of the last block's difficulty.