Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockchain: Convert to direct single-step reorgs. #1500

Merged
merged 1 commit into from
Nov 12, 2018

Conversation

davecgh
Copy link
Member

@davecgh davecgh commented Oct 16, 2018

This requires PR #1471.

This modifies the chain reorganization logic to directly perform the reorg one block at a time with rollback in the case of failure, as opposed to the existing memory-based two-step approach, so that it is more optimized for the typical case, better handles large reorgs, gives the ability to implement better caching strategies, and helps provide a path to decouple the chain processing and connection code from the download logic. It also removes the cached stxos from the view since the aforementioned changes make them no longer necessary.

A side effect of these changes is that it is no longer possible to know if a reorg will succeed before actually performing it, so the NTReorganization notification is now sent after a successful reorg. The notification really should have been sent after the reorg before anyway.

Prior to these changes, chain reorganization used a two-step approach such that the first step involved checking all of the blocks along the reorg path in memory and then actually performing the reorg in a second step if those checks succeeded. While that approach does have some benefits in terms of avoiding any intermediate mutation to the current best chain for failed reorgs, and thus not requiring a rollback in that case, it also has some disadvantages such as not scaling well with large reorgs, being more difficult to make use of different caching strategies, and hindering the ability to decouple the connection code from the download logic.

In a certain sense, the approach this replaces assumed that a reorg would fail and took measures to detect that condition prior to performing the reorg, while the new approach assumes the reorg will succeed and rolls back the changes in the very rare case it doesn't. This is an acceptable and safe assumption because the proof-of-work requirements make it exceedingly expensive to create blocks that are valid enough to trigger a reorg yet ultimately end up failing to connect, thus miners are heavily disincentivized from creating such invalid blocks and attackers are also unable to easily create such blocks either. Even in the case of attack, the only result would be nodes performing slightly more database updates than the existing approach.

The following results show the difference between performing the large reorg full block tests before and after these changes:

before: 4.3GB memory usage, 2m18.629s to complete
after:  2.8GB memory usage, 2m04.056s to complete

As can be seen, the new approach takes much less memory and is also a bit faster as well.

This is work towards #1145.

@davecgh davecgh added this to the 1.4.0 milestone Oct 16, 2018
@davecgh davecgh mentioned this pull request Oct 16, 2018
33 tasks
@davecgh davecgh force-pushed the blockchain_single_step_reorg branch from 3cd19f3 to 4371c09 Compare October 16, 2018 19:46
Copy link
Member

@dajohi dajohi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testnet miner tOK

@davecgh davecgh force-pushed the blockchain_single_step_reorg branch from 4371c09 to 49c997e Compare October 16, 2018 20:56
blockchain/chain.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_single_step_reorg branch from 49c997e to aeb7351 Compare October 16, 2018 21:24
@davecgh davecgh force-pushed the blockchain_single_step_reorg branch 4 times, most recently from de7528d to 0609124 Compare October 16, 2018 22:38
This modifies the chain reorganization logic to directly perform the
reorg one block at a time with rollback in the case of failure, as
opposed to the existing memory-based two-step approach, so that it is
more optimized for the typical case, better handles large reorgs, gives
the ability to implement better caching strategies, and helps provide a
path to decouple the chain processing and connection code from the
download logic.  It also removes the cached stxos from the view since
the aforementioned changes make them no longer necessary.

A side effect of these changes is that it is no longer possible to know
if a reorg will succeed before actually performing it, so the
NTReorganization notification is now sent after a successful reorg.  The
notification really should have been sent after the reorg before anyway.

Prior to these changes, chain reorganization used a two-step approach
such that the first step involved checking all of the blocks along the
reorg path in memory and then actually performing the reorg in a second
step if those checks succeeded.  While that approach does have some
benefits in terms of avoiding any intermediate mutation to the current
best chain for failed reorgs, and thus not requiring a rollback in that
case, it also has some disadvantages such as not scaling well with large
reorgs, being more difficult to make use of different caching
strategies, and hindering the ability to decouple the connection code
from the download logic.

In a certain sense, the approach this replaces assumed that a reorg
would fail and took measures to detect that condition prior to
performing the reorg, while the new approach assumes the reorg will
succeed and rolls back the changes in the very rare case it doesn't.
This is an acceptable and safe assumption because the proof-of-work
requirements make it exceedingly expensive to create blocks that are
valid enough to trigger a reorg yet ultimately end up failing to
connect, thus miners are heavily disincentivized from creating such
invalid blocks and attackers are also unable to easily create such
blocks either.  Even in the case of attack, the only result would be
nodes performing slightly more database updates than the existing
approach.

The following results show the difference between performing the large
reorg full block tests before and after these changes:

before: 4.3GB memory usage, 2m18.629s to complete
after:  2.8GB memory usage, 2m04.056s to complete

As can be seen, the new approach takes much less memory and is also a
bit faster as well.
@davecgh davecgh force-pushed the blockchain_single_step_reorg branch from 568e9a5 to 9c4569a Compare November 12, 2018 22:00
@davecgh davecgh merged commit 9c4569a into decred:master Nov 12, 2018
@davecgh davecgh deleted the blockchain_single_step_reorg branch November 12, 2018 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants