teor2345 opened this issue
Jun 28, 2022
· 5 comments
· Fixed by #4726
Labels
A-network (Area: Network protocol updates or fixes)
C-bug (Category: This is a bug)
C-security (Category: Security issues)
I-hang (A Zebra component stops responding to requests)
I-heavy (Problems with excessive memory, disk, or CPU usage)
I-slow (Problems with performance or responsiveness)
Motivation
When Zebra's full validation slows down:
- any extra verifications are slower because RAM or CPU is fully used
- the syncer keeps downloading and queueing large numbers of blocks for verification, taking more RAM
- when blocks time out, any uncommitted blocks get cancelled, and they have to be downloaded and verified again
We've partially fixed this issue by decreasing the syncer lookahead in PRs #4662, #4670, and #4679.
We also improved halo2 verification speed in PR #4699.
Designs
When we impose backpressure on the syncer, we need to verify the lowest blocks first.
This means that the backpressure has to be implemented in the syncer task.
(After that, the individual block download and verify tasks can run out of order.)
Here is one possible fix:
- when doing checkpoint verification, let the syncer look ahead 400 to 1200 blocks
- when doing full verification, let the syncer look ahead between 1 and max_concurrent_block_requests blocks
400 is the minimum for checkpointing, because the maximum checkpoint gap is 400 blocks.
The max_concurrent_block_requests default might need to be tuned for good full verification performance.
The halo2 verification performance improvements in PR #4699, and the updated checkpoints in PR #4708 really helped.
CPU and RAM usage are mostly normal, but they occasionally double. There was one block that took a minute to verify, but most of the time verification happens smoothly.
Here is where the syncer limit is implemented:
zebra/zebrad/src/components/sync.rs
Lines 425 to 436 in 54efbe9
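As a rough illustration of the lookahead limits proposed under Designs, here is a minimal sketch. It is not the code at the location referenced above, and it is not Zebra's API: `VerifyMode`, `lookahead_limit`, `heights_to_request`, and `configured_lookahead` are hypothetical names chosen for this example. Only the 400 and 1200 bounds and the `max_concurrent_block_requests` setting come from the issue text.

```rust
use std::ops::RangeInclusive;

/// Hypothetical verification modes (illustrative, not Zebra's types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VerifyMode {
    Checkpoint,
    Full,
}

/// The issue gives 400 as the minimum for checkpointing,
/// because the maximum checkpoint gap is 400 blocks.
const MIN_CHECKPOINT_LOOKAHEAD: usize = 400;

/// Upper end of the checkpoint lookahead range suggested in the issue.
const MAX_CHECKPOINT_LOOKAHEAD: usize = 1200;

/// Returns how many blocks past the verified tip the syncer may request.
/// `configured_lookahead` is an assumed user setting, clamped per mode.
fn lookahead_limit(
    mode: VerifyMode,
    configured_lookahead: usize,
    max_concurrent_block_requests: usize,
) -> usize {
    match mode {
        VerifyMode::Checkpoint => {
            configured_lookahead.clamp(MIN_CHECKPOINT_LOOKAHEAD, MAX_CHECKPOINT_LOOKAHEAD)
        }
        // Always allow at least one block, so the sync cannot stall completely.
        VerifyMode::Full => configured_lookahead.clamp(1, max_concurrent_block_requests.max(1)),
    }
}

/// Backpressure applied in the syncer: only heights inside the window
/// `(highest_requested, verified_tip + limit]` are returned, in ascending
/// order, so the lowest blocks are requested first. The range is empty
/// when the window is already full.
fn heights_to_request(
    verified_tip: u64,
    highest_requested: u64,
    limit: usize,
) -> RangeInclusive<u64> {
    let window_end = verified_tip.saturating_add(limit as u64);
    highest_requested.saturating_add(1)..=window_end
}
```

Because the window is anchored to the verified tip, the syncer stops queueing new downloads whenever verification falls behind, which is the backpressure behaviour the issue asks for. The individual download and verify tasks for heights inside the window can still complete out of order.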