Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sync): Temporarily set full verification concurrency to 30 blocks #4726

Merged
merged 16 commits into from
Jul 6, 2022

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Jun 30, 2022

Motivation

We're seeing a lot of sync failures in CI and on developer machines, because some large orchard blocks take a long time to verify.

Depends-On: #4752
Close #4715
Close #4729
Close #4650

Solution

Until we implement orchard batching, temporarily limit full verification to 30 blocks in parallel.

Syncer changes:

  • Limit the number of blocks submitted to download and verify
    • By the syncer
    • By the inbound service
  • Keep unused extra hashes and submit them to the sync downloader later
  • Split the checkpoint and full verify concurrency configs
    • Adjust lookahead limits when transitioning to full verification
    • Use a lower concurrency limit during full verification
    • Decrease full verification concurrency to 5 blocks

Related changes:

  • Return the maximum checkpoint height from the chain verifier
  • Return the verified block height from the sync downloader (actually not needed, I can remove these changes if we want)
  • Remove a verbose log

Review

This bug causes a lot of CI workflows to run slow or fail, so it's a high priority.

I'm still testing it locally, it also needs a full sync test.

Reviewer Checklist

  • Code implements Specs and Designs
  • Existing CI sync tests pass

@teor2345 teor2345 added C-bug Category: This is a bug P-High 🔥 I-hang A Zebra component stops responding to requests I-heavy Problems with excessive memory, disk, or CPU usage I-slow Problems with performance or responsiveness labels Jun 30, 2022
@teor2345 teor2345 self-assigned this Jun 30, 2022
@teor2345 teor2345 changed the title fix(sync): Temporarily full verification concurrency to 5 blocks fix(sync): Temporarily set full verification concurrency to 5 blocks Jun 30, 2022
@teor2345

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Jun 30, 2022

Codecov Report

Merging #4726 (88e2517) into main (42ef884) will decrease coverage by 0.11%.
The diff coverage is 58.55%.

@@            Coverage Diff             @@
##             main    #4726      +/-   ##
==========================================
- Coverage   78.87%   78.76%   -0.12%     
==========================================
  Files         306      306              
  Lines       37557    37671     +114     
==========================================
+ Hits        29624    29670      +46     
- Misses       7933     8001      +68     

@teor2345 teor2345 force-pushed the full-verify-limit branch from 8ec1e7a to 1c6958d Compare July 1, 2022 01:57
@teor2345 teor2345 marked this pull request as ready for review July 1, 2022 01:59
@teor2345 teor2345 requested review from a team as code owners July 1, 2022 01:59
@teor2345 teor2345 requested review from oxarbitrage and removed request for a team July 1, 2022 01:59
@teor2345 teor2345 removed the request for review from a team July 4, 2022 00:15
@teor2345 teor2345 requested a review from a team as a code owner July 4, 2022 06:32
@teor2345 teor2345 requested review from conradoplg and removed request for a team and conradoplg July 4, 2022 06:32
@conradoplg
Copy link
Collaborator

FYI I tried this out and my node still gets stuck at height 1719629

@teor2345 teor2345 force-pushed the full-verify-limit branch from 9bdd987 to 758dd70 Compare July 5, 2022 01:08
oxarbitrage
oxarbitrage previously approved these changes Jul 5, 2022
Copy link
Contributor

@oxarbitrage oxarbitrage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me however i am not sure if it brings too much practical improvements. Teor reported some improvements in the sync using this PR but further (unrelated?) problems a bit further. Conrado reported no benefits.

With that said, feel free to merge.

@teor2345 teor2345 added the do-not-merge Tells Mergify not to merge this PR label Jul 5, 2022
@teor2345

This comment was marked as outdated.

@teor2345 teor2345 marked this pull request as draft July 5, 2022 21:15
@teor2345 teor2345 removed the do-not-merge Tells Mergify not to merge this PR label Jul 5, 2022
@teor2345 teor2345 force-pushed the full-verify-limit branch from 758dd70 to 88e2517 Compare July 5, 2022 23:16
@teor2345 teor2345 changed the base branch from main to fix-redundant-halo2 July 5, 2022 23:16
@teor2345 teor2345 changed the title fix(sync): Temporarily set full verification concurrency to 5 blocks fix(sync): Temporarily set full verification concurrency to 30 blocks Jul 5, 2022
@teor2345
Copy link
Contributor Author

teor2345 commented Jul 5, 2022

I'm running a full sync test here, to check if this solves the remaining syncer stalls:
https://github.com/ZcashFoundation/zebra/actions/runs/2619418857

It should be faster than PR #4752.

@teor2345
Copy link
Contributor Author

teor2345 commented Jul 5, 2022

(I removed the batch verifier changes from this PR, I'll submit them in another PR.)

@teor2345 teor2345 marked this pull request as ready for review July 6, 2022 00:07
@teor2345
Copy link
Contributor Author

teor2345 commented Jul 6, 2022

Using a partly-synced local Zebra instance, I can get Zebra to sync to the tip with:

I'm running full syncs to confirm, but we might want to merge them anyway, so we can move on with other work.

Base automatically changed from fix-redundant-halo2 to main July 6, 2022 14:11
mergify bot added a commit that referenced this pull request Jul 6, 2022
@gustavovalverde gustavovalverde merged commit 87f4308 into main Jul 6, 2022
@gustavovalverde gustavovalverde deleted the full-verify-limit branch July 6, 2022 14:13
@gustavovalverde
Copy link
Member

Admin merged, as the full sync got stuck and we need the fix anyways.

@oxarbitrage oxarbitrage mentioned this pull request Jul 28, 2022
31 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug I-hang A Zebra component stops responding to requests I-heavy Problems with excessive memory, disk, or CPU usage I-slow Problems with performance or responsiveness
Projects
None yet
4 participants