[Merged by Bors] - Revert fork choice if disk write fails #2068

michaelsproul · 2020-12-08T07:24:50Z

Issue Addressed

Closes #2028
Replaces #2059

Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

Additional Info

This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see Handle ungraceful shutdown on first startup #2067)
This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk and to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of Remove head tracker in favour of fork choice #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).

michaelsproul · 2020-12-09T00:27:54Z

I've been thinking more about whether to reset the head tracker as well, and I think I'm leaning towards not resetting it. If we leave it intact then we won't forget about any blocks written to disk, and can prune them later. Compare this to if we do reset: where blocks get forgotten about on disk and potentially bloat the database indefinitely.

paulhauner

This seems like an easy and effective solution, nice idea!

Apart from the head tracker, I don't think this creates any new assumptions. It will still present as the same error to downstream components (p2p, api ,etc), which is good. That being said, reverting fork choice will have some effects on downstream components, e.g:

Peers to be banned on the p2p network because we may interpret some blocks as "parent unknown" and therefore not a valid chain. I think this is a little contrived though.
Attestations from p2p and the api might be rejected. This could result in missed attestations by a validator.

Although these outcomes aren't desirable, they come about as the result of an IO error that should only happen when there's underlying hardware issues. These changes allow us to go from stalling to staying online, which is a huge leap. Going from staying online to operating perfectly would be a lot of work with marginal gains, IMO.

I'm happy to merge this. I made one comment, it's not a blocker though.

paulhauner · 2020-12-09T04:02:51Z

beacon_node/beacon_chain/src/beacon_chain.rs

+                    crit!(
+                        self.log,
+                        "No stored fork choice found to restore from";
+                        "warning" => "The database is likely corrupt now"


Perhaps we should mention --purge-db here?

michaelsproul · 2020-12-09T05:10:16Z

bors r+

## Issue Addressed Closes #2028 Replaces #2059 ## Proposed Changes If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028). ## Additional Info * This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067) * This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).

bors · 2020-12-09T06:06:30Z

Pull request successfully merged into unstable.

Build succeeded:

Revert fork choice if disk write fails

b71f112

michaelsproul added work-in-progress PR is a work-in-progress database A0 labels Dec 8, 2020

Satisfy Clippy

63facbd

michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Dec 9, 2020

michaelsproul requested a review from paulhauner December 9, 2020 00:28

paulhauner approved these changes Dec 9, 2020

View reviewed changes

Suggest --purge-db

fe764f7

michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Dec 9, 2020

bors bot changed the title ~~Revert fork choice if disk write fails~~ [Merged by Bors] - Revert fork choice if disk write fails Dec 9, 2020

bors bot closed this Dec 9, 2020

michaelsproul deleted the fork-choice-revert branch December 9, 2020 06:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Merged by Bors] - Revert fork choice if disk write fails #2068

[Merged by Bors] - Revert fork choice if disk write fails #2068

michaelsproul commented Dec 8, 2020

michaelsproul commented Dec 9, 2020

paulhauner left a comment

paulhauner Dec 9, 2020

michaelsproul Dec 9, 2020

michaelsproul commented Dec 9, 2020

bors bot commented Dec 9, 2020

[Merged by Bors] - Revert fork choice if disk write fails #2068

[Merged by Bors] - Revert fork choice if disk write fails #2068

Conversation

michaelsproul commented Dec 8, 2020

Issue Addressed

Proposed Changes

Additional Info

michaelsproul commented Dec 9, 2020

paulhauner left a comment

Choose a reason for hiding this comment

paulhauner Dec 9, 2020

Choose a reason for hiding this comment

michaelsproul Dec 9, 2020

Choose a reason for hiding this comment

michaelsproul commented Dec 9, 2020

bors bot commented Dec 9, 2020