[Merged by Bors] - Revert fork choice if disk write fails #2068
I've been thinking more about whether to reset the head tracker as well, and I'm leaning towards not resetting it. If we leave it intact then we won't forget about any blocks written to disk, and can prune them later. Compare this with resetting it, in which case blocks on disk get forgotten about and potentially bloat the database indefinitely.
This seems like an easy and effective solution, nice idea!
Apart from the head tracker, I don't think this creates any new assumptions. It will still present as the same error to downstream components (p2p, api, etc.), which is good. That being said, reverting fork choice will have some effects on downstream components, e.g.:
- Peers may be banned on the p2p network because we may interpret some blocks as "parent unknown" and therefore not part of a valid chain. I think this is a little contrived though.
- Attestations from p2p and the api might be rejected. This could result in missed attestations by a validator.
Although these outcomes aren't desirable, they come about as the result of an IO error that should only happen when there are underlying hardware issues. These changes allow us to go from stalling to staying online, which is a huge leap. Going from staying online to operating perfectly would be a lot of work with marginal gains, IMO.
I'm happy to merge this. I made one comment, it's not a blocker though.
crit!(
    self.log,
    "No stored fork choice found to restore from";
    "warning" => "The database is likely corrupt now"
Perhaps we should mention `--purge-db` here?
Done!
bors r+
## Issue Addressed

Closes #2028

Replaces #2059

## Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

## Additional Info

* This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067)
* This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).
Pull request successfully merged into unstable. Build succeeded:
Issue Addressed
Closes #2028
Replaces #2059
Proposed Changes
If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).
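The revert-on-failure idea described above can be illustrated with a small toy model. This is only a sketch and not the code from this PR: `ForkChoice`, `Store`, `Chain`, `ImportError`, `fail_writes` and the `u64` block roots are all hypothetical stand-ins, and returning a distinct error when nothing has been persisted yet is an assumption made purely so the case is visible in the example.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a block root (illustration only).
type BlockRoot = u64;

/// Toy fork choice: just the set of block roots it knows about.
#[derive(Clone, Debug, Default, PartialEq)]
struct ForkChoice {
    known_blocks: HashSet<BlockRoot>,
}

/// Toy store: blocks on "disk" plus the last persisted fork choice.
#[derive(Default)]
struct Store {
    blocks: HashSet<BlockRoot>,
    persisted_fork_choice: Option<ForkChoice>,
    /// Illustration-only switch that makes block writes fail.
    fail_writes: bool,
}

#[derive(Debug, PartialEq)]
enum ImportError {
    /// The block could not be written to the database.
    DbWrite,
    /// Assumption: surfaced when there is no stored fork choice to restore.
    NoPersistedForkChoice,
}

impl Store {
    fn write_block(&mut self, root: BlockRoot) -> Result<(), ImportError> {
        if self.fail_writes {
            return Err(ImportError::DbWrite);
        }
        self.blocks.insert(root);
        Ok(())
    }

    fn persist_fork_choice(&mut self, fork_choice: &ForkChoice) {
        self.persisted_fork_choice = Some(fork_choice.clone());
    }
}

struct Chain {
    fork_choice: ForkChoice,
    store: Store,
}

impl Chain {
    fn import_block(&mut self, root: BlockRoot) -> Result<(), ImportError> {
        // In-memory fork choice learns about the block before it reaches disk.
        self.fork_choice.known_blocks.insert(root);

        if let Err(e) = self.store.write_block(root) {
            // The write failed: roll fork choice back to the last version on
            // disk so that it is never ahead of the stored blocks.
            match self.store.persisted_fork_choice.clone() {
                Some(stored) => self.fork_choice = stored,
                None => return Err(ImportError::NoPersistedForkChoice),
            }
            // Downstream callers still see the original write error.
            return Err(e);
        }
        Ok(())
    }
}
```

In this model a failed import leaves `fork_choice` equal to whatever was last passed to `persist_fork_choice`, so a later successful persist cannot record a fork choice that references a block missing from `blocks`.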
Additional Info