Stop batch verification lagging and failing entire batches #4729

Closed
Tracked by #4747
teor2345 opened this issue Jun 30, 2022 · 0 comments · Fixed by #4726 or #4750
Labels: C-bug (Category: This is a bug), I-slow (Problems with performance or responsiveness), S-needs-investigation (Status: Needs further investigation)

teor2345 commented Jun 30, 2022

Motivation

The batch verifier can lag and fail entire blocks, even if the proofs and signatures are valid. We should re-design it so it can't lag.

If we can't do that, the dropped verifications should fail immediately, rather than waiting for the block verification timeout:

2022-06-30T17:51:02.943983Z ERROR{net="Main"}:sync:try_to_sync:extend_tips:zebra_consensus::primitives::redpallas: batch verification receiver lagged and lost verification results
2022-06-30T17:51:57.423166Z INFO{net="Main"}:zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.819% current_height=Height(1718380) network_upgrade=Nu5 remaining_sync_blocks=3119 time_since_last_state_block=1m
...
2022-06-30T17:54:57.425405Z INFO{net="Main"}:zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.819% current_height=Height(1718380) network_upgrade=Nu5 remaining_sync_blocks=3122 time_since_last_state_block=4m
2022-06-30T17:55:57.426619Z INFO{net="Main"}:zebrad::components::sync::progress: estimated progress to chain tip sync_percent=99.819% current_height=Height(1718381) network_upgrade=Nu5 remaining_sync_blocks=3121 time_since_last_state_block=0s
2022-06-30T17:56:47.835664Z WARN{net="Main"}:sync:try_to_sync:zebrad::components::sync: error downloading and verifying block e=Invalid { error: Block(Transaction(InternalDowncastError("downcast to known transaction error type failed, original error: Elapsed(())"))), height: Height(1718445), hash: block::Hash("0000000000f0f61e6f42984784ad367711c0b3e704e840797606314426dc2a90") }

https://github.com/ZcashFoundation/zebra/runs/7135923149?check_suite_focus=true#step:6:644

Designs

We can either:

  • replace the batch verifier's broadcast channel with a watch channel, and create a new watch channel for each batch
  • create a new broadcast channel for each batch, with a channel size of 1, so it only ever holds one result
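
To make the per-batch idea concrete, here is a rough std-only model of a channel that holds exactly one result for exactly one batch. It is a sketch, not Zebra's code: `BatchOutcome`, `batch_channel`, `BatchSender`, `BatchReceiver`, and `run_batches` are hypothetical names, and a real change would presumably use tokio's `watch` or `broadcast` channels rather than a `Mutex` and `Condvar`. The point it illustrates is that with one slot per batch there is no queue to fall behind on, and a sender dropped without a result fails its waiters straight away instead of leaving them to hit the block verification timeout.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical outcome of one batch. `Dropped` means the sender went away
/// without publishing a result, so waiters should fail immediately.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BatchOutcome {
    Valid,
    Invalid,
    Dropped,
}

/// Shared slot holding at most one outcome, for exactly one batch.
struct Slot {
    outcome: Mutex<Option<BatchOutcome>>,
    ready: Condvar,
}

/// Sending half: consumed when the batch outcome is published.
pub struct BatchSender(Arc<Slot>);

/// Receiving half: clone one per item that is waiting on the batch.
#[derive(Clone)]
pub struct BatchReceiver(Arc<Slot>);

/// Create a fresh single-value channel for one batch.
pub fn batch_channel() -> (BatchSender, BatchReceiver) {
    let slot = Arc::new(Slot {
        outcome: Mutex::new(None),
        ready: Condvar::new(),
    });
    (BatchSender(slot.clone()), BatchReceiver(slot))
}

impl BatchSender {
    /// Publish the outcome and wake every waiting receiver.
    pub fn send(self, outcome: BatchOutcome) {
        *self.0.outcome.lock().unwrap() = Some(outcome);
        self.0.ready.notify_all();
    }
}

impl Drop for BatchSender {
    /// If nothing was published, record `Dropped` so receivers never hang.
    fn drop(&mut self) {
        let mut guard = self.0.outcome.lock().unwrap();
        if guard.is_none() {
            *guard = Some(BatchOutcome::Dropped);
        }
        drop(guard);
        self.0.ready.notify_all();
    }
}

impl BatchReceiver {
    /// Block until the outcome is published, then return it.
    /// A receiver that arrives late still sees the stored value.
    pub fn recv(&self) -> BatchOutcome {
        let mut guard = self.0.outcome.lock().unwrap();
        loop {
            if let Some(outcome) = *guard {
                return outcome;
            }
            guard = self.0.ready.wait(guard).unwrap();
        }
    }
}

/// Run several batches, each with its own channel, and collect what
/// `waiters` receivers per batch observe. `None` models a dropped batch.
pub fn run_batches(results: &[Option<BatchOutcome>], waiters: usize) -> Vec<Vec<BatchOutcome>> {
    results
        .iter()
        .map(|result| {
            let (tx, rx) = batch_channel();
            let handles: Vec<_> = (0..waiters)
                .map(|_| {
                    let rx = rx.clone();
                    thread::spawn(move || rx.recv())
                })
                .collect();
            match result {
                Some(outcome) => tx.send(*outcome),
                None => drop(tx),
            }
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
        .collect()
}
```

Because every batch gets a new slot, a slow receiver can only ever miss nothing: the one value it needs stays in place until the last receiver is dropped.
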
teor2345 added the C-bug, S-needs-triage, S-needs-investigation, I-slow, and P-Optional ✨ labels on Jun 30, 2022
teor2345 changed the title from "If the batch verifier lags, fail those verifications immediately" to "Stop batch verification lagging and failing entire batches" on Jul 4, 2022
teor2345 self-assigned this on Jul 6, 2022
ftm1000 removed the S-needs-triage label on Jul 11, 2022