Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sync): Pause new downloads when Zebra reaches the lookahead limit #5561

Merged
merged 10 commits into from
Nov 9, 2022

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Nov 7, 2022

Motivation

We're seeing a lot of AboveLookaheadLimit errors in full syncs. Instead of dropping those blocks, we should wait for them to verify.

Also we might need more time for a full sync.

Designs

Add a watch channel to the downloader, which gets set when the downloader is a long way away from the state tip. Reset the channel when:

  • a downloaded block is much closer to the tip, or
  • there are no more blocks pending download or verify.

Solution

Syncer fixes:

  • Make the syncer pause new downloads when a downloaded block is 3000 blocks ahead of the state tip
    • If the block is 50,000 blocks ahead, still drop it anyway
  • Make sure the default lookahead limit includes 2 full checkpoints (1000 blocks)
    • Add a new test config for this
  • Allow 2000 extra blocks in the pre-download queue, or the state commit channel (was 4000 extra blocks)

Related fixes:

Related cleanups:

  • Fix up a syncer test that didn't actually test what it said it did
  • Add a getblocktemplate-rpcs alternative to the missing test config file message

Review

Anyone can review this PR, it's a high priority because it fixes errors on main.

I'm doing a full sync locally to check that it works.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
  • How do you know it works? Does it have tests?

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code P-High 🔥 I-slow Problems with performance or responsiveness I-integration-fail Continuous integration fails, including build and test failures labels Nov 7, 2022
@teor2345 teor2345 self-assigned this Nov 7, 2022
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Nov 7, 2022
@codecov
Copy link

codecov bot commented Nov 7, 2022

Codecov Report

Merging #5561 (4e687d7) into main (75f83fc) will decrease coverage by 0.13%.
The diff coverage is 48.10%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5561      +/-   ##
==========================================
- Coverage   78.84%   78.71%   -0.14%     
==========================================
  Files         305      305              
  Lines       38126    38178      +52     
==========================================
- Hits        30061    30050      -11     
- Misses       8065     8128      +63     

@teor2345 teor2345 removed the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Nov 7, 2022
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Nov 7, 2022
@teor2345 teor2345 changed the title fix(sync): Pause new downloads after Zebra has downloaded a lot of blocks fix(sync): Pause new downloads when Zebra reaches the lookahead limit Nov 7, 2022
@teor2345
Copy link
Contributor Author

teor2345 commented Nov 7, 2022

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Nov 7, 2022

update

✅ Branch has been successfully updated

@mpguerra
Copy link
Contributor

mpguerra commented Nov 7, 2022

Do we know why this has started happening now?

@teor2345
Copy link
Contributor Author

teor2345 commented Nov 7, 2022

Do we know why this has started happening now?

Probably left over from the changes in #4937. We tried tweaking the lookahead limit a few times, but that doesn't seem to help.

@teor2345
Copy link
Contributor Author

teor2345 commented Nov 7, 2022

Syncing worked, but it was pausing for a block timeout every few checkpoints.
So I tweaked the syncer to start again when there were only a few blocks left, rather than waiting for zero blocks.

Apart from that, the logging and sync speed look really good.

@teor2345 teor2345 marked this pull request as ready for review November 8, 2022 19:21
@teor2345 teor2345 requested review from a team as code owners November 8, 2022 19:21
@teor2345 teor2345 requested review from dconnolly and oxarbitrage and removed request for a team November 8, 2022 19:21
@teor2345
Copy link
Contributor Author

teor2345 commented Nov 8, 2022

There are still some timeouts in the logs, but I'll deal with them in another PR.

Copy link
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful, thank you.

zebrad/src/components/sync/downloads.rs Show resolved Hide resolved
zebrad/src/components/sync/downloads.rs Show resolved Hide resolved
mergify bot added a commit that referenced this pull request Nov 8, 2022
@teor2345
Copy link
Contributor Author

teor2345 commented Nov 9, 2022

@Mergifyio refresh

@mergify
Copy link
Contributor

mergify bot commented Nov 9, 2022

refresh

✅ Pull request refreshed

@teor2345
Copy link
Contributor Author

teor2345 commented Nov 9, 2022

@Mergifyio refresh

@mergify
Copy link
Contributor

mergify bot commented Nov 9, 2022

refresh

✅ Pull request refreshed

mergify bot added a commit that referenced this pull request Nov 9, 2022
@mergify mergify bot merged commit c4fad29 into main Nov 9, 2022
@mergify mergify bot deleted the fix-full-sync-fail branch November 9, 2022 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG I-integration-fail Continuous integration fails, including build and test failures I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ci: logs longer than GitHub's limit do not allow to see the actual error in the console
4 participants