This repository has been archived by the owner on Dec 4, 2024. It is now read-only.
Block writing stops in non-sealing nodes due to nil BestPeer
Description
It looks like we are still observing the same issue originally described in #167.
After investigating, this seems to be related to inconsistencies in how we determine the best peer to connect to within BestPeer(). At times, curDiff stays greater than bestTd indefinitely, so the function keeps returning nil.
We've added some extra logging in BestPeer to help investigate.
Although change 5170883 was put in to broadcast the header's difficulty instead of the block number, it appears that we actually initialize the header's Difficulty to the same value as the header's Number, so the broadcast value is still the block number: https://github.com/0xPolygon/polygon-sdk/blob/bba205e6ed4aad9522450558ed3dbad67664e723/consensus/ibft/ibft.go#L349-L353
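A minimal sketch, assuming a simplified header type, of why broadcasting the header's Difficulty does not help while it is initialised to the block Number: the advertised value is just the height, whereas a chain's accumulated difficulty is the sum over all of its blocks. The header type, buildHeader and chainTD are hypothetical and only mirror the behaviour described above.

```go
package main

import "fmt"

// header is a hypothetical, trimmed-down block header (the SDK's real header
// type has many more fields).
type header struct {
	Number     uint64
	Difficulty uint64
}

// buildHeader mirrors the behaviour described in the issue: a new header's
// Difficulty is initialised to its Number.
func buildHeader(number uint64) header {
	return header{Number: number, Difficulty: number}
}

// chainTD builds headers 1..head and sums their per-block difficulties
// (genesis is ignored here for simplicity).
func chainTD(head uint64) uint64 {
	var td uint64
	for n := uint64(1); n <= head; n++ {
		td += buildHeader(n).Difficulty
	}
	return td
}

func main() {
	const head = 10
	h := buildHeader(head)
	// Broadcasting h.Difficulty advertises the block number, while the
	// chain's accumulated difficulty is already much larger.
	fmt.Println(h.Difficulty, chainTD(head)) // 10 55
}
```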
After attempting to correct line 353 of consensus/ibft/ibft.go, we still see the same issue in our logs: bestTd is always 464670654277. Furthermore, the difficulty printed in the "enqueue block" debug statement has the same value as the block number.
In this scenario, peer5 is always selected as our best peer; however, its difficulty is always less than our current difficulty of 482664423817, so BestPeer always returns nil.
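As a rough consistency check, assume (hypothetically) that every block's Difficulty equals its Number and that the genesis block contributes a difficulty of 1; the total difficulty at head n is then 1 + n(n+1)/2. Both figures above fit this form exactly: 464670654277 for n = 964023 and 482664423817 for n = 982511. If that assumption holds, peer5's advertised value would correspond to a head about 18,488 blocks behind ours. The tdAtHead and headForTD helpers below are illustrative only.

```go
package main

import "fmt"

// tdAtHead returns the total difficulty of a chain whose head is block n,
// ASSUMING each block's Difficulty equals its Number and the genesis block
// has difficulty 1: TD(n) = 1 + (1 + 2 + ... + n) = 1 + n(n+1)/2.
func tdAtHead(n uint64) uint64 {
	return 1 + n*(n+1)/2
}

// headForTD inverts tdAtHead by binary search over [0, 2^31], a range in
// which n*(n+1) cannot overflow uint64. It returns the smallest n with
// tdAtHead(n) >= td, and whether that n produces td exactly.
func headForTD(td uint64) (uint64, bool) {
	lo, hi := uint64(0), uint64(1)<<31
	for lo < hi {
		mid := lo + (hi-lo)/2
		if tdAtHead(mid) < td {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo, tdAtHead(lo) == td
}

func main() {
	for _, td := range []uint64{464670654277, 482664423817} {
		n, ok := headForTD(td)
		fmt.Println(td, n, ok)
	}
	// Output:
	// 464670654277 964023 true
	// 482664423817 982511 true
}
```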
I would assume that if the Difficulty values for peer1 to peer4 were sent correctly, we would connect to a better peer whose difficulty is greater than our current difficulty.
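A minimal sketch of the comparison that fails, under the same Difficulty == Number assumption and with hypothetical helper names: a peer that is one block ahead of us only passes the check if it advertises its accumulated (total) difficulty rather than the newest header's own Difficulty.

```go
package main

import "fmt"

// totalDifficulty is a hypothetical helper returning the sum of per-block
// difficulties for blocks 1..head, assuming Difficulty == Number as in the
// issue (genesis ignored for simplicity).
func totalDifficulty(head uint64) uint64 {
	return head * (head + 1) / 2
}

// peerIsAhead is the comparison BestPeer needs to pass: the peer's advertised
// value must be strictly greater than our local total difficulty.
func peerIsAhead(advertised, localTD uint64) bool {
	return advertised > localTD
}

func main() {
	const localHead, peerHead = 10, 11    // the peer is one block ahead of us
	localTD := totalDifficulty(localHead) // 55

	// Advertising the newest header's Difficulty (== its Number) loses.
	fmt.Println(peerIsAhead(peerHead, localTD)) // false: 11 <= 55

	// Advertising the peer's total difficulty wins.
	fmt.Println(peerIsAhead(totalDifficulty(peerHead), localTD)) // true: 66 > 55
}
```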
Your environment
OS and version: Ubuntu 20
Version of the Polygon SDK: bba205e
Branch that causes this issue: develop
Steps to reproduce
Start some non-sealing nodes connected to a cluster of 5 validators.
Observe that we can never connect to a peer whose difficulty is greater than the current block difficulty.
Restarting sometimes fixes the issue.
Expected behaviour
Gossiped blocks should continue to be written to state and processed normally after bulk sync.
Actual behaviour
Although we still enqueue blocks, we are no longer writing them to state when we can't properly determine the best difficulty of a connected peer.