Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syncer performance bottleneck #246

Open
ircrp opened this issue Nov 11, 2022 · 4 comments
Open

Syncer performance bottleneck #246

ircrp opened this issue Nov 11, 2022 · 4 comments
Assignees
Labels
bug Something isn't working feature-wanted We want this feature please
Milestone

Comments

@ircrp
Copy link

ircrp commented Nov 11, 2022

[Syncer performance bottleneck]

Description

I have found a significant bottleneck on my local node in the sync process which I believe might be impacting all nodes and severely degrading network block propagation when the validator block generation time is abnormal.

The bottleneck originates at the very beginning of bulk sync method https://github.com/dogechain-lab/dogechain/blob/v1.1.4/protocol/syncer.go#L545, where a common block ancestor is being fetched.

The below is 2 hours of data charted to highlight the durations the above method takes to fetch a block with the mean duration being above 10 seconds. Note: the gaps in data are when the node is not bulk syncing but popping latest block from peer (i.e. when network is more stable)

image

The issue I believe happens after after the node exits the watch sync (aka pop latest block loop) which remains the performant happy path, but these days it commonly gets interrupted by the 6 second pop block timeout and networking issues as block production times often exceed it. This means when your node does not hear about new block from the peer for 6 seconds, then we fall back on the bulk sync with the really slow method.

Regarding the proposed solution, all this common ancestor logic just doesn't make sense to me. All we want at that point is highest common block of local node and the peer, and we should be able to assume it is the local node blockchain header since syncer#BestPeer already made a check for us and gave us a peer with higher block than our local node (see https://github.com/dogechain-lab/dogechain/blob/v1.1.4/protocol/syncer.go#L344-L345)

@DarianShawn DarianShawn self-assigned this Nov 15, 2022
@DarianShawn DarianShawn added bug Something isn't working feature-wanted We want this feature please labels Nov 15, 2022
@DarianShawn
Copy link
Collaborator

Great job, man.
We've been hunting for POS related issues for the past few weeks, and now it's time to fix those performance issues.

@0xcb9ff9 @abrahamcruise321 Can you take the time to discuss a solution here?

@DarianShawn DarianShawn added this to the Release 1.2.1 milestone Nov 15, 2022
@DarianShawn
Copy link
Collaborator

@ircrp New PR #265 focusing on the new syncer protocol for backwards compatibility.
Any suggestions are welcome.

@DarianShawn
Copy link
Collaborator

The new version v1.2.1 is released. Give another shot, if you like. @ircrp

@ircrp
Copy link
Author

ircrp commented May 15, 2023

Hey @DarianShawn, luckily past months have been really stable in terms of sync, therefore I do not have any production logs to look at in terms of what happens when we fall back onto the bulk sync but I am pretty positive after reviewing the changes that we should be good going forward.

Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature-wanted We want this feature please
Projects
None yet
Development

No branches or pull requests

2 participants