Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use persistent data structures in fork choice #2059

Closed
wants to merge 1 commit into from

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Closes #2028

Proposed Changes

This PR fixes issues with fork choice becoming out of sync with the database after failed writes (the root cause of #2028). It does this using quite a drastic approach: rather than mutating fork choice directly, we take a clone of it, mutate the clone, and then atomically update the original once the database write has succeeded. In order to make the clone cheap(er), we use persistent data structures from the im crate.

In order to make this work, I've added SSZ implementations for Arc and Vector.

Additional Info

I'm still assessing the performance and memory impact of this change. We should definitely test it for a substantial period of time on our Pyrmont nodes to ensure the performance penalty is tolerable.

@michaelsproul
Copy link
Member Author

Bugger, looks like cargo audit is failing for good reason 😭

bodil/im-rs#153

I'll investigate further tomorrow

@michaelsproul
Copy link
Member Author

Alternative solutions to consider, in roughly improving order:

  1. Commit to disk before doing any of the fork choice checks or modifications (messy?)
  2. Clone fork choice before each mutation without making it persistent (slow?)
  3. Rollback the in-memory fork-choice to the version from disk if the write fails (maybe OK?)

Or, run on a forked + fixed version of im...

@michaelsproul
Copy link
Member Author

It seems to be about 2x slower in the average case too, which is pretty bad. The graph below shows a Pyrmont node syncing with this branch (13:00-16:45), and keeping up with the head. At 18:30 I switched to v1.0.3 for comparison, and you can see average fork choice time dropped from around 40ms to 20ms. At 22:30 I switched back to this branch, and you can see the runtime doubled again. The peaks are also a bit higher.

fork-choice

I'm going to shelve this approach for now and investigate alternatives.

bors bot pushed a commit that referenced this pull request Dec 9, 2020
## Issue Addressed

Closes #2028
Replaces #2059

## Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

## Additional Info

* This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067)
* This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).
@michaelsproul michaelsproul deleted the persistent-fork-choice branch February 15, 2021 21:58
@michaelsproul michaelsproul added the consensus An issue/PR that touches consensus code, such as state_processing or block verification. label Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A0 consensus An issue/PR that touches consensus code, such as state_processing or block verification. work-in-progress PR is a work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant