Hey @DarianShawn, luckily the past months have been really stable in terms of sync, so I do not have any production logs showing what happens when we fall back onto the bulk sync. That said, after reviewing the changes I am pretty positive that we should be good going forward.
[Syncer performance bottleneck]
Description
I have found a significant bottleneck in the sync process on my local node, which I believe might be impacting all nodes and severely degrading network block propagation whenever the validator block generation time is abnormal.
The bottleneck originates at the very beginning of the bulk sync method https://github.com/dogechain-lab/dogechain/blob/v1.1.4/protocol/syncer.go#L545, where the common block ancestor is fetched.
Below are 2 hours of data, charted to highlight how long the above method takes to fetch a block; the mean duration is above 10 seconds. Note: the gaps in the data are periods when the node is not bulk syncing but popping the latest block from the peer (i.e. when the network is more stable).
I believe the issue occurs after the node exits the watch sync (aka the pop latest block loop). The watch sync remains the performant happy path, but these days it is commonly interrupted by the 6 second pop block timeout and by networking issues, because block production times often exceed that timeout. This means that when your node does not hear about a new block from the peer for 6 seconds, it falls back on the bulk sync with the really slow method.
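To make the fallback concrete, here is a minimal sketch of a watch loop that gives up once the peer stays quiet for longer than the timeout. It is not the dogechain implementation: `watchSync`, the `announced` channel and `popBlockTimeout` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// popBlockTimeout mirrors the 6 second timeout discussed in this issue.
// The name is an assumption made for this sketch.
const popBlockTimeout = 6 * time.Second

// watchSync consumes block numbers announced by a peer (a hypothetical
// channel standing in for the pop latest block loop). It returns the last
// block number seen and whether the loop ended because no block arrived
// within timeout, in which case the caller would fall back to bulk sync.
func watchSync(announced <-chan uint64, timeout time.Duration) (last uint64, timedOut bool) {
	for {
		select {
		case n, ok := <-announced:
			if !ok {
				// The peer connection ended without a timeout.
				return last, false
			}
			last = n
		case <-time.After(timeout):
			// A fresh timer is started on every iteration, so the timeout
			// applies to the gap between two consecutive blocks.
			return last, true
		}
	}
}

func main() {
	announced := make(chan uint64)
	go func() {
		announced <- 100
		announced <- 101
		// The peer goes quiet here, e.g. because block production stalls.
	}()

	// A short timeout keeps the demo fast; the real loop would use popBlockTimeout.
	last, timedOut := watchSync(announced, 50*time.Millisecond)
	fmt.Printf("last block %d, fall back to bulk sync: %v\n", last, timedOut)
}
```

With a timeout shorter than the typical block production time, almost every iteration ends in the timeout branch, which is why the node keeps landing in the slow bulk sync path.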
Regarding the proposed solution: all this common ancestor logic just doesn't make sense to me. All we want at that point is the highest common block of the local node and the peer, and we should be able to assume it is the local node's blockchain header, since syncer#BestPeer has already made that check for us and given us a peer with a higher block than our local node (see https://github.com/dogechain-lab/dogechain/blob/v1.1.4/protocol/syncer.go#L344-L345).
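A rough sketch of what I have in mind, under the assumption stated above that the local head is already part of the best peer's chain (fork handling is deliberately left out). `Header`, `PeerStatus`, `bestPeer` and `bulkSyncStart` are hypothetical stand-ins and not the actual types or functions in syncer.go.

```go
package main

import "fmt"

// Header is a hypothetical, stripped-down block header.
type Header struct {
	Number uint64
	Hash   string
}

// PeerStatus is a hypothetical view of what we know about a peer.
type PeerStatus struct {
	ID     string
	Number uint64 // highest block number the peer reported
}

// bestPeer is a stand-in for the check attributed to syncer#BestPeer:
// it returns the peer with the highest block number that is strictly
// above our local height, or false if no such peer exists.
func bestPeer(peers []PeerStatus, localNumber uint64) (PeerStatus, bool) {
	var best PeerStatus
	found := false
	for _, p := range peers {
		if p.Number <= localNumber {
			continue
		}
		if !found || p.Number > best.Number {
			best, found = p, true
		}
	}
	return best, found
}

// bulkSyncStart returns the block to start bulk sync from. Instead of
// searching for a common ancestor, it uses the local head directly,
// assuming the chosen peer's chain extends our own.
func bulkSyncStart(local Header, peers []PeerStatus) (Header, PeerStatus, bool) {
	peer, ok := bestPeer(peers, local.Number)
	if !ok {
		return Header{}, PeerStatus{}, false
	}
	return local, peer, true
}

func main() {
	local := Header{Number: 10, Hash: "0xabc"}
	peers := []PeerStatus{{ID: "a", Number: 9}, {ID: "b", Number: 15}}
	if start, peer, ok := bulkSyncStart(local, peers); ok {
		fmt.Printf("sync blocks %d..%d from peer %s\n", start.Number+1, peer.Number, peer.ID)
	}
}
```

The point is that picking the start block becomes a local lookup with no network round trip, whereas the current ancestor fetch is where the 10+ second delays show up.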