
Revert note commitment and history trees when forking non-finalized chains #4794

Closed
6 tasks done
teor2345 opened this issue Jul 19, 2022 · 14 comments · Fixed by #6122
Assignees
Labels
A-consensus Area: Consensus rule updates A-state Area: State / database changes C-bug Category: This is a bug C-security Category: Security issues I-slow Problems with performance or responsiveness

Comments

@teor2345
Contributor

teor2345 commented Jul 19, 2022

Motivation

When Zebra forks a non-finalized chain, it rebuilds the note commitment trees from the finalized tip.
But this can execute 100 blocks worth of note commitment tree updates, which uses a lot of CPU.

This change is a high priority for miners. They are unlikely to use Zebra unless it is fixed, because it prevents them from mining for 1-2 blocks after each chain fork.

We're seeing rebuild times of around 75 seconds, which is way too long:

2022-07-20T20:34:08.026902Z INFO {zebrad="b52b10c" net="Main"}:{peer=Out("3.72.134.66:8233")}:msg_as_req{msg="inv"}:inbound:download_and_verify{hash=00000000017131670f50301f97d80699baaef991ffed0c4ec9b6a0b0c189e587}:state: zebra_state::service::non_finalized_state::chain: finished rebuilding note commitment trees after a non-finalized chain fork rebuild_time=75.654621465s fork_height=Height(1744525) fork_tip=block::Hash("0000000000f4dd1105f94c06f28189ebdd60c0895d8931486847cce3278590e5")

Consensus Rules

The consensus rules require the trees to match the chain tip, including all the parent blocks back to genesis.

The revert needs to happen correctly, so the trees match the new fork tip block. (Forking is an efficiency optimization; the consensus rules don't specify how forks are implemented.)

If we stop storing the tip trees twice (once by height and once in their own fields), we will only have to update them in one place, which makes keeping them consistent much easier.

Designs

Instead of rebuilding the trees from the root, we should remove the trees and anchors that are above the fork.

Refactors:

  • replace tip sprout, sapling, orchard, history tree fields with methods that do a lookup using the tip height
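The refactor above can be sketched roughly as follows. This is a minimal illustration, not Zebra's actual code: `Height`, `NoteCommitmentTree`, and `Chain` here are hypothetical stand-ins, and only one pool (Sapling) is shown.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for Zebra's real types, for illustration only.
type Height = u32;

#[derive(Clone, Debug, PartialEq)]
struct NoteCommitmentTree(Vec<u8>);

struct Chain {
    // Trees keyed by the height of the block that produced them; the
    // separate duplicated "tip tree" field is removed entirely.
    sapling_trees_by_height: BTreeMap<Height, NoteCommitmentTree>,
}

impl Chain {
    // Replaces the old tip-tree field with a lookup at the tip height,
    // so each tree is only ever stored and updated in one place.
    fn sapling_tree_at_tip(&self) -> &NoteCommitmentTree {
        self.sapling_trees_by_height
            .values()
            .next_back()
            .expect("a non-finalized chain always has at least one tree")
    }
}
```

With this shape, keeping the tip tree consistent with the per-height map is automatic, because there is no second copy to update.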

Change:

  • during a fork, revert sprout, sapling, orchard, history trees by removing anchors and trees above the fork height in revert_chain_with() for sprout, sapling, orchard shielded data, and history trees
  • remove the code that rebuilds the history trees in rebuild_trees_parallel()
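The revert step might look something like this sketch, assuming trees and anchors are stored in height-keyed ordered maps. The types and field names are hypothetical stand-ins, not Zebra's real `Chain` API, and only one pool is shown.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for Zebra's real types, for illustration only.
type Height = u32;
type Anchor = u64;

struct Chain {
    sapling_trees_by_height: BTreeMap<Height, Vec<u8>>,
    sapling_anchors_by_height: BTreeMap<Height, Anchor>,
    sapling_anchors: BTreeSet<Anchor>,
}

impl Chain {
    // On a fork, drop every tree and anchor strictly above the fork
    // height instead of rebuilding the trees from the finalized tip.
    fn revert_above(&mut self, fork_height: Height) {
        // split_off(&(fork_height + 1)) keeps entries <= fork_height
        // and returns the removed entries above it.
        self.sapling_trees_by_height.split_off(&(fork_height + 1));
        let removed = self
            .sapling_anchors_by_height
            .split_off(&(fork_height + 1));
        for anchor in removed.values() {
            self.sapling_anchors.remove(anchor);
        }
    }
}
```

Removing map entries above the fork height is O(removed entries), so the cost scales with the fork depth rather than with replaying every block's note commitments.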

Cleanup:

  • remove Chain::with_trees() and use derive(Clone) instead
  • remove unused tree arguments from Chain::fork()
  • cleanup any methods in NonFinalizedState that call these methods

Related Work

This might be needed by mining pools.

We added a log in PR #4795, which told us this can take around 75 seconds.

@teor2345 teor2345 added C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage S-needs-investigation Status: Needs further investigation P-Low ❄️ C-security Category: Security issues I-slow Problems with performance or responsiveness A-state Area: State / database changes A-cryptography Area: Cryptography related labels Jul 19, 2022
@conradoplg
Collaborator

Note that it's non-trivial to remove nodes from a Frontier (at least I couldn't figure it out quickly when I tried). But since we store the trees for each height now, we can just start from the fork point and add the new nodes.
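The workaround described above can be sketched as follows: since an append-only frontier can't have leaves removed, a fork clones the snapshot stored at the fork height and re-appends only the new fork's note commitments. The `Frontier` here is a toy stand-in, not the real incremental tree type.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins, for illustration only.
type Height = u32;

#[derive(Clone, Default, Debug, PartialEq)]
struct Frontier {
    leaves: Vec<u8>, // stand-in for the real incremental tree state
}

impl Frontier {
    fn append(&mut self, leaf: u8) {
        self.leaves.push(leaf);
    }
}

fn tree_for_fork(
    trees_by_height: &BTreeMap<Height, Frontier>,
    fork_height: Height,
    new_fork_leaves: &[u8],
) -> Frontier {
    // Start from the per-height snapshot, not the finalized root:
    // no leaf removal from the frontier is ever needed.
    let mut tree = trees_by_height[&fork_height].clone();
    for &leaf in new_fork_leaves {
        tree.append(leaf);
    }
    tree
}
```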

@teor2345
Contributor Author

We're seeing rebuild times of around 75 seconds, which is way too long:

2022-07-20T20:34:08.026902Z INFO {zebrad="b52b10c" net="Main"}:{peer=Out("3.72.134.66:8233")}:msg_as_req{msg="inv"}:inbound:download_and_verify{hash=00000000017131670f50301f97d80699baaef991ffed0c4ec9b6a0b0c189e587}:state: zebra_state::service::non_finalized_state::chain: finished rebuilding note commitment trees after a non-finalized chain fork rebuild_time=75.654621465s fork_height=Height(1744525) fork_tip=block::Hash("0000000000f4dd1105f94c06f28189ebdd60c0895d8931486847cce3278590e5")

So this is a high priority.

@upbqdn
Member

upbqdn commented Jul 22, 2022

Note that it's non-trivial to remove nodes from a Frontier (at least I couldn't figure it out quickly when I tried).

I think Frontier doesn't support that operation at all.

@teor2345
Contributor Author

Zebra's performance is good enough for now.

@mpguerra mpguerra moved this to 🆕 New in Zebra Sep 22, 2022
@mpguerra mpguerra added this to Zebra Sep 22, 2022
@teor2345
Contributor Author

We're not making changes of this size after tagging the release candidate.

@teor2345 teor2345 closed this as not planned Oct 11, 2022
Repository owner moved this from 🆕 New to ✅ Done in Zebra Oct 11, 2022
@teor2345
Contributor Author

This is mitigated by #5257, which runs these slow updates in a separate thread.

@teor2345 teor2345 reopened this Oct 12, 2022
@mpguerra mpguerra moved this from ✅ Done to 🔖 Triaged in Zebra Nov 9, 2022
@mpguerra mpguerra moved this from 🔖 Triaged to 🆕 New in Zebra Nov 9, 2022
@teor2345
Contributor Author

@mpguerra we've been told by a mining pool that fork speed is a significant issue for them.

I've updated this ticket based on the current code. We've already done most of the work here, due to other state tickets.

@mpguerra
Contributor

Thanks for the updates! This seems like a significant change so, in terms of scheduling, we'd probably want to evaluate how this would affect the stable release candidate series and the audit.

@teor2345
Contributor Author

Thanks for the updates! This seems like a significant change so, in terms of scheduling, we'd probably want to evaluate how this would affect the stable release candidate series and the audit.

I think this change is a high priority for miners. They are unlikely to use Zebra unless it is fixed, because it prevents them from mining for 1-2 blocks after each chain fork.

Here is my evaluation of the complexity of this change:

This change is all contained within the chain.rs module. It should change less than 100 lines of code. As part of this change, we can delete some complex tree update code that's no longer needed.

This will fix a current performance bug in the stable release candidate series. We almost fixed this bug before the audit, because the performance is much worse than zcashd's, but we ran out of time.

We already have unit and full sync tests for this code, so it is a low risk change.

@teor2345 teor2345 added A-consensus Area: Consensus rule updates and removed A-cryptography Area: Cryptography related S-needs-investigation Status: Needs further investigation labels Dec 14, 2022
@teor2345
Contributor Author

I've added a note about the consensus rules to the ticket description. If we do the change the way I described it in the ticket, then keeping to the consensus rules actually becomes a lot simpler.

So I can't see anything complicated about this ticket. Maybe Deirdre or Conrado could double-check?

@mpguerra
Contributor

So I can't see anything complicated about this ticket. Maybe Deirdre or Conrado could double-check?

@conradoplg do you mind double checking this?

@conradoplg
Collaborator

I agree it's not complicated: the old code was basically a workaround, and we will be able to replace it with much neater code. This part of the code is well tested, so it's unlikely that a bug will be introduced. The hardest part will be understanding the old code, but I can help if whoever implements this is not familiar with it.

@teor2345
Contributor Author

I can also help with implementation or reviews.

@teor2345 teor2345 self-assigned this Jan 16, 2023
@mpguerra
Contributor

Moving this to Sprint 3; we should prioritise testing for this sprint instead.

@teor2345 teor2345 removed their assignment Jan 17, 2023
@teor2345 teor2345 self-assigned this Jan 30, 2023
@teor2345 teor2345 added P-Medium ⚡ and removed P-Low ❄️ S-needs-triage Status: A bug report needs triage labels Feb 7, 2023
@mergify mergify bot closed this as completed in #6122 Feb 13, 2023
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Zebra Feb 13, 2023
mpguerra added a commit that referenced this issue May 19, 2023
mergify bot pushed a commit that referenced this issue May 23, 2023
* ZIPs were updated to remove ambiguity, this was tracked in #1267.

* #2105 was fixed by #3039 and #2379 was closed by #3069

* #2230 was a duplicate of #2231 which was closed by #2511

* #3235 was obsoleted by #2156 which was fixed by #3505

* #1850 was fixed by #2944, #1851 was fixed by #2961 and #2902 was fixed by #2969

* We migrated to Rust 2021 edition in Jan 2022 with #3332

* #1631 was closed as not needed

* #338 was fixed by #3040 and #1162 was fixed by #3067

* #2079 was fixed by #2445

* #4794 was fixed by #6122

* #1678 stopped being an issue

* #3151 was fixed by #3934

* #3204 was closed as not needed

* #1213 was fixed by #4586

* #1774 was closed as not needed

* #4633 was closed as not needed

* Clarify behaviour of difficulty spacing

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour when retrying block downloads

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify when we might want to fix

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify what we might want to change in future

Co-authored-by: teor <teor@riseup.net>

* Clarify benefits of how we do block verification

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt errors

---------

Co-authored-by: teor <teor@riseup.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-consensus Area: Consensus rule updates A-state Area: State / database changes C-bug Category: This is a bug C-security Category: Security issues I-slow Problems with performance or responsiveness
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants