Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test if checkpoint verifier and state service are correctly reset on block commit errors #2654

Closed
Tracked by #745 ...
conradoplg opened this issue Aug 20, 2021 · 16 comments
Assignees
Labels
C-enhancement Category: This is an improvement C-security Category: Security issues I-hang A Zebra component stops responding to requests I-panic Zebra panics with an internal error message

Comments

@conradoplg
Copy link
Collaborator

conradoplg commented Aug 20, 2021

Motivation

We want to test that errors in the state are handled correctly.

Design

We want to do these tests here:

  • when committing checkpointed blocks, fail a finalized state block commit
    • commit_finalized
  • when committing fully verified blocks, fail a non-finalized state block commit
    • commit_block
    • commit_new_chain
  • drop finalized blocks in the queue
  • drop non-finalized blocks in the queue

If these tests take longer than a day to write, we're going to close this ticket as "too hard for now".

Old Description

#2633 adds ZIP-221/244 block commitment validation in the checkpoint verifier. The check is actually done in the finalized state, and when that happens, the checkpoint verifier is reset to the tip of the finalized state in order to keep running correctly.

However, this hasn't been tested, since it is tricky to test.

Implement a test to verify that it's working correctly.

(Note that is only required if/when we have a checkpoint post-Nu5. Before that all block contents are committed to by the block hash, so they must be correct. Nu5-onward that no longer applies due to ZIP-244.)

Previous attempt

I attempted to write this test in #2633 but gave up because it was too difficult and we had to keep working on other stuff.
See discussion. However, most of the problems were caused by trying to test it along with the "fake activation heights" feature in that PR, which would require generating fake blocks with correct difficulties, etc. in order to pass the checkpoint verifier. Another alternative would be to use real blocks similar to the continuous_blockchain - but the problem would then be how to ensure that one block fails to be comitted in order to test if that's handled correctly (the real blocks have small heights, and thus they don't use the block commitment in the header, so there is nothing to corrupt)

@conradoplg conradoplg added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage labels Aug 20, 2021
@teor2345 teor2345 added this to the 2021 Sprint 21 milestone Aug 23, 2021
@teor2345
Copy link
Contributor

(the real blocks have small heights, and thus they don't use the block commitment in the header, so there is nothing to corrupt)

I think we can test using the fake network upgrade heights, and the fake transaction version conversion methods.

@teor2345 teor2345 added C-security Category: Security issues I-hang A Zebra component stops responding to requests I-panic Zebra panics with an internal error message labels Oct 13, 2021
@mpguerra
Copy link
Contributor

mpguerra commented Nov 3, 2021

@teor2345
Copy link
Contributor

teor2345 commented Nov 4, 2021

We don't have any post-NU5 checkpoints yet. This ticket is optional until we do.

@mpguerra mpguerra removed this from the 2021 Sprint 23 milestone Nov 4, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Nov 5, 2021
@ftm1000
Copy link

ftm1000 commented Jan 26, 2022

redistributing issues that can be separately worked from Epic #2311

@mpguerra mpguerra mentioned this issue Jan 27, 2022
40 tasks
@teor2345
Copy link
Contributor

teor2345 commented Jun 2, 2022

This is an obscure case, we can test it if it causes problems later.

@teor2345 teor2345 closed this as not planned Won't fix, can't repro, duplicate, stale Jun 2, 2022
@teor2345 teor2345 changed the title Test if checkpoint verifier is correctly reset on error Test if checkpoint verifier and state service are correctly reset on block commit errors Sep 14, 2022
@teor2345
Copy link
Contributor

We might need to do this for #4937

@teor2345 teor2345 reopened this Sep 14, 2022
@mpguerra mpguerra moved this to 🆕 New in Zebra Sep 22, 2022
@mpguerra mpguerra added this to Zebra Sep 22, 2022
@teor2345
Copy link
Contributor

Just to expand on my last comment a bit more: #4937 changes how error handling works in the state, and makes it more complicated, so we probably need to test that it actually works.

@mpguerra
Copy link
Contributor

let's timebox this to one day's work and move on if it's too hard...

@teor2345
Copy link
Contributor

teor2345 commented Sep 27, 2022

We think we want to do these tests here:

  • when committing checkpointed blocks, fail a finalized state block commit
  • when committing fully verified blocks, fail a non-finalized state block commit
  • when committing fully verified blocks, fail a finalized state block commit edit: this can't happen, it's currently a panic

@teor2345 teor2345 assigned arya2 and teor2345 and unassigned arya2 Sep 29, 2022
@teor2345
Copy link
Contributor

teor2345 commented Oct 3, 2022

I can do this, but I might want to work on it with either @arya2 or @upbqdn (whoever is available).

@upbqdn
Copy link
Member

upbqdn commented Oct 4, 2022

I can help.

@teor2345
Copy link
Contributor

teor2345 commented Oct 7, 2022

Just a reminder that this task has a time limit of 1 day, so if it's not done by the end of the sprint, we should just close it.

We're looking for a really quick test here, we just need to send one error, and check that the state doesn't crash or hang.

@teor2345
Copy link
Contributor

@mpguerra we decided not to do this in this sprint, because it's not a release blocker, and we've done enough manual testing.

@teor2345 teor2345 closed this as not planned Won't fix, can't repro, duplicate, stale Oct 11, 2022
Repository owner moved this from 🆕 New to ✅ Done in Zebra Oct 11, 2022
@mpguerra mpguerra reopened this Oct 14, 2022
@mpguerra
Copy link
Contributor

re-opening for #5377

@mpguerra
Copy link
Contributor

mpguerra commented Oct 20, 2022

Closing for now. See test-block-commit-service for draft solution in case we want to pick this up again in future.

@upbqdn
Copy link
Member

upbqdn commented Oct 20, 2022

Closing for now. See test-block-commit-service for draft solution in case we want to pick this up again in future.

The corresponding PR is #5377.

@mpguerra mpguerra closed this as not planned Won't fix, can't repro, duplicate, stale May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: This is an improvement C-security Category: Security issues I-hang A Zebra component stops responding to requests I-panic Zebra panics with an internal error message
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants