Syncer performance bottleneck #246

ircrp · 2022-11-11T16:51:47Z

[Syncer performance bottleneck]

Description

I have found a significant bottleneck on my local node in the sync process which I believe might be impacting all nodes and severely degrading network block propagation when the validator block generation time is abnormal.

The bottleneck originates at the very beginning of the bulk sync method https://github.com/dogechain-lab/dogechain/blob/v1.1.4/protocol/syncer.go#L545, where a common block ancestor is fetched.

The chart below covers 2 hours of data and shows how long the above method takes to fetch a block; the mean duration is above 10 seconds. Note: the gaps in the data are periods when the node is not bulk syncing but popping the latest block from the peer (i.e. when the network is more stable).

I believe the issue occurs after the node exits the watch sync (aka the pop latest block loop), which remains the performant happy path. These days that loop is commonly interrupted by the 6 second pop block timeout and by networking issues, as block production times often exceed the timeout. This means that when your node does not hear about a new block from the peer for 6 seconds, it falls back on the bulk sync with the really slow method.

Regarding the proposed solution: the common ancestor logic just doesn't make sense to me. All we want at that point is the highest common block of the local node and the peer, and we should be able to assume it is the local node's blockchain header, since syncer#BestPeer has already made that check for us and given us a peer with a higher block than our local node (see https://github.com/dogechain-lab/dogechain/blob/v1.1.4/protocol/syncer.go#L344-L345).

DarianShawn · 2022-11-15T10:38:06Z

Great job, man.
We've been hunting for PoS-related issues for the past few weeks, and now it's time to fix these performance issues.

@0xcb9ff9 @abrahamcruise321 Can you take the time to discuss a solution here?

DarianShawn · 2022-11-29T03:31:12Z

@ircrp New PR #265 focuses on the new syncer protocol, with backwards compatibility.
Any suggestions are welcome.

DarianShawn · 2022-12-06T16:00:05Z

The new version v1.2.1 is released. Give it another shot, if you like. @ircrp

ircrp · 2023-05-15T16:01:02Z

Hey @DarianShawn, luckily the past months have been really stable in terms of sync, so I do not have any production logs showing what happens when we fall back onto the bulk sync. After reviewing the changes, though, I am pretty positive that we should be good going forward.

Thanks!

DarianShawn self-assigned this Nov 15, 2022

DarianShawn added the bug (Something isn't working) and feature-wanted (We want this feature please) labels Nov 15, 2022

DarianShawn added this to the Release 1.2.1 milestone Nov 15, 2022
