Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Revert fork choice if disk write fails #2068

Closed
wants to merge 3 commits into from

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Closes #2028
Replaces #2059

Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

Additional Info

  • This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see Handle ungraceful shutdown on first startup #2067)
  • This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk and to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of Remove head tracker in favour of fork choice #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress database A0 labels Dec 8, 2020
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Dec 9, 2020
@michaelsproul
Copy link
Member Author

I've been thinking more about whether to reset the head tracker as well, and I think I'm leaning towards not resetting it. If we leave it intact then we won't forget about any blocks written to disk, and can prune them later. Compare this to if we do reset: where blocks get forgotten about on disk and potentially bloat the database indefinitely.

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an easy and effective solution, nice idea!

Apart from the head tracker, I don't think this creates any new assumptions. It will still present as the same error to downstream components (p2p, api ,etc), which is good. That being said, reverting fork choice will have some effects on downstream components, e.g:

  • Peers to be banned on the p2p network because we may interpret some blocks as "parent unknown" and therefore not a valid chain. I think this is a little contrived though.
  • Attestations from p2p and the api might be rejected. This could result in missed attestations by a validator.

Although these outcomes aren't desirable, they come about as the result of an IO error that should only happen when there's underlying hardware issues. These changes allow us to go from stalling to staying online, which is a huge leap. Going from staying online to operating perfectly would be a lot of work with marginal gains, IMO.

I'm happy to merge this. I made one comment, it's not a blocker though.

crit!(
self.log,
"No stored fork choice found to restore from";
"warning" => "The database is likely corrupt now"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should mention --purge-db here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Dec 9, 2020
@michaelsproul
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Dec 9, 2020
## Issue Addressed

Closes #2028
Replaces #2059

## Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

## Additional Info

* This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067)
* This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).
@bors bors bot changed the title Revert fork choice if disk write fails [Merged by Bors] - Revert fork choice if disk write fails Dec 9, 2020
@bors bors bot closed this Dec 9, 2020
@michaelsproul michaelsproul deleted the fork-choice-revert branch December 9, 2020 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A0 database ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants