Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(unrecoverable-error): implement halting of full node execution #809

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

glevco
Copy link
Contributor

@glevco glevco commented Oct 12, 2023

Motivation

Currently, when an exception happens during a consensus update, it marks the tx as voided with a custom marker and continues to operate. This operation may be faulty, though, as the database is likely in an undesired state. For example, if such exception happens when a block is received, no following block will be accepted and the full node will not be able to sync anymore. If the full node is manually stopped and restarted, it starts up but continues to be unable to sync.

This PR's goal is to, instead, completely halt and exit the full node in those cases, forcing manual intervention. When some exception happens during a consensus update, the full node process exits with a non-zero exit code. This also guarantees that the database is marked as corrupted, so the full node cannot be restarted normally, only by a full verification or new database.

This PR is only more restrictive than what's currently implemented, and will be necessary for the Feature Activation for Transactions.

Acceptance Criteria

  • Create new ExecutionManager with crash_and_exit() method.
  • Create new related methods in EventManager and TransactionStorage for dealing with a full node crash.
  • Change handling of exceptions during consensus so it halts and exit the full node, in addition to voiding the tx.

Checklist

  • If you are requesting a merge into master, confirm this code is production-ready and can be included in future releases as soon as it gets merged

@glevco glevco self-assigned this Oct 12, 2023
@glevco glevco force-pushed the feat/unrecoverable-error branch 2 times, most recently from bd220e7 to d759c4e Compare October 12, 2023 18:12
@glevco glevco marked this pull request as ready for review October 12, 2023 18:21
@codecov
Copy link

codecov bot commented Oct 12, 2023

Codecov Report

Attention: Patch coverage is 86.76471% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 85.08%. Comparing base (da4a89c) to head (452b16b).

Files Patch % Lines
hathor/event/event_manager.py 66.66% 3 Missing ⚠️
hathor/manager.py 57.14% 2 Missing and 1 partial ⚠️
hathor/execution_manager.py 92.30% 2 Missing ⚠️
hathor/transaction/storage/transaction_storage.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #809      +/-   ##
==========================================
- Coverage   85.17%   85.08%   -0.09%     
==========================================
  Files         292      293       +1     
  Lines       22667    22719      +52     
  Branches     3415     3418       +3     
==========================================
+ Hits        19306    19331      +25     
- Misses       2687     2706      +19     
- Partials      674      682       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@glevco glevco force-pushed the feat/unrecoverable-error branch 2 times, most recently from 833e11f to 16b471f Compare October 19, 2023 20:23
hathor/consensus/consensus.py Show resolved Hide resolved
hathor/consensus/consensus.py Show resolved Hide resolved
hathor/consensus/consensus.py Outdated Show resolved Hide resolved
hathor/consensus/consensus.py Outdated Show resolved Hide resolved
@glevco glevco force-pushed the feat/unrecoverable-error branch from 16b471f to c3577d9 Compare October 24, 2023 18:12
@glevco
Copy link
Contributor Author

glevco commented Oct 24, 2023

@msbrogli the code changed a lot, so I just rebased it. Could you please re-review all files?

@glevco glevco requested a review from msbrogli October 24, 2023 18:21
@glevco glevco force-pushed the feat/unrecoverable-error branch 2 times, most recently from 735ae0d to a57f1af Compare October 25, 2023 17:26
@glevco glevco force-pushed the feat/unrecoverable-error branch 2 times, most recently from ccc49e6 to 8b342a9 Compare November 7, 2023 03:25
@glevco glevco force-pushed the feat/unrecoverable-error branch from 8b342a9 to 4278afa Compare November 16, 2023 20:37
@glevco glevco force-pushed the feat/unrecoverable-error branch from 4278afa to 8967098 Compare November 27, 2023 20:24
Copy link
Member

@jansegre jansegre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could start using this now when a storage write fails. Do you think it'd be worth to include this in this PR?

@glevco glevco force-pushed the feat/unrecoverable-error branch from 8967098 to 627494a Compare December 5, 2023 19:50
@glevco glevco force-pushed the feat/unrecoverable-error branch from 627494a to 41cb0fb Compare January 2, 2024 16:50
hathor/event/event_manager.py Outdated Show resolved Hide resolved
hathor/execution_manager.py Show resolved Hide resolved
hathor/transaction/storage/transaction_storage.py Outdated Show resolved Hide resolved
hathor/execution_manager.py Outdated Show resolved Hide resolved
@glevco glevco force-pushed the feat/unrecoverable-error branch from 41cb0fb to 1aab024 Compare January 15, 2024 19:50
@glevco
Copy link
Contributor Author

glevco commented Jan 16, 2024

We could start using this now when a storage write fails. Do you think it'd be worth to include this in this PR?

@jansegre I think we can do it in a separate PR

@glevco glevco force-pushed the feat/unrecoverable-error branch from c4bb1f4 to 2938c08 Compare January 16, 2024 02:16
jansegre
jansegre previously approved these changes Jan 16, 2024
msbrogli
msbrogli previously approved these changes Mar 4, 2024
@glevco glevco merged commit b1e8d35 into master Mar 5, 2024
12 checks passed
@glevco glevco deleted the feat/unrecoverable-error branch March 5, 2024 15:35
@jansegre jansegre mentioned this pull request Apr 5, 2024
2 tasks
@jansegre jansegre mentioned this pull request May 8, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants