feat(unrecoverable-error): implement halting of full node execution #809
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #809 +/- ##
==========================================
- Coverage 85.17% 85.08% -0.09%
==========================================
Files 292 293 +1
Lines 22667 22719 +52
Branches 3415 3418 +3
==========================================
+ Hits 19306 19331 +25
- Misses 2687 2706 +19
- Partials 674 682 +8
@msbrogli the code changed a lot, so I just rebased it. Could you please re-review all files?
We could start using this now when a storage write fails. Do you think it'd be worth including that in this PR?
@jansegre I think we can do it in a separate PR
Motivation
Currently, when an exception happens during a consensus update, the full node marks the tx as voided with a custom marker and continues to operate. This operation may be faulty, though, as the database is likely in an undesired state. For example, if such an exception happens when a block is received, no following block will be accepted and the full node will not be able to sync anymore. If the full node is manually stopped and restarted, it starts up but remains unable to sync.
This PR's goal is to instead completely halt and exit the full node in those cases, forcing manual intervention. When an exception happens during a consensus update, the full node process exits with a non-zero exit code. This also guarantees that the database is marked as corrupted, so the full node cannot be restarted normally, only after a full verification or with a new database.
This PR is only more restrictive than what's currently implemented, and will be necessary for the Feature Activation for Transactions.
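The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names ExecutionManager and crash_and_exit() come from the description. The constructor argument, the callback registration, the InMemoryStorage class, its mark_as_corrupted() method, and run_consensus_update() are all hypothetical names introduced for the example.

```python
import sys
from typing import Callable, List


class ExecutionManager:
    """Hypothetical sketch of a component that halts the full node."""

    def __init__(self, exit_fn: Callable[[int], None] = sys.exit) -> None:
        # The exit function is injectable (an assumption of this sketch)
        # so the behavior can be exercised without killing the process.
        self._exit_fn = exit_fn
        self._on_crash_callbacks: List[Callable[[], None]] = []

    def register_on_crash_callback(self, callback: Callable[[], None]) -> None:
        """Register a callback to run right before the process exits."""
        self._on_crash_callbacks.append(callback)

    def crash_and_exit(self, *, reason: str) -> None:
        """Run crash callbacks, log the reason, and exit with a non-zero code."""
        for callback in self._on_crash_callbacks:
            try:
                callback()
            except Exception as exc:
                # A failing callback must not prevent the exit itself.
                print(f"crash callback failed: {exc!r}", file=sys.stderr)
        print(f"critical failure, halting full node: {reason}", file=sys.stderr)
        self._exit_fn(1)


class InMemoryStorage:
    """Hypothetical stand-in for a storage that can be flagged as corrupted."""

    def __init__(self) -> None:
        self.corrupted = False

    def mark_as_corrupted(self) -> None:
        self.corrupted = True


def run_consensus_update(update: Callable[[], None], manager: ExecutionManager) -> None:
    """Apply a consensus update; on any exception, halt instead of continuing."""
    try:
        update()
    except Exception as exc:
        manager.crash_and_exit(reason=f"consensus update failed: {exc!r}")
```

In this sketch the storage registers mark_as_corrupted() as a crash callback, so the corruption flag is set before the process exits and a normal restart can be refused afterwards.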
Acceptance Criteria
- ExecutionManager with crash_and_exit() method.
- EventManager and TransactionStorage for dealing with a full node crash.
Checklist
- master, confirm this code is production-ready and can be included in future releases as soon as it gets merged