Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trie/triedb/pathdb: improve dirty node flushing trigger #28426

Merged
merged 3 commits into from
Oct 31, 2023

Conversation

rjl493456442
Copy link
Member

@rjl493456442 rjl493456442 commented Oct 27, 2023

This pull request fixes an edge case in state history management.

In the context of the path model, Geth maintains a list of state histories to facilitate the rollback of the persistent state when needed. To prevent uncontrolled expansion of state history, Geth also provides a mechanism to prune the oldest state histories.

Because Geth manages the state layer in a tree structure, whenever a new layer is piled on top, it will trigger the merging of bottom diff layer with disk layer. This merging operation also involves the construction of the corresponding state history of the bottom diff layer and the truncation of oldest state histories.

However, a potential issue can happen if an unclean shutdown occurs after persisting the state history but before flushing the cached states to disk. In this scenario, we've defined a recovery mechanism to truncate any excess state history above the disk layer during the next restart, ensuring that the state history always aligns with the disk layer.

Now, let's discuss the specific edge case we're addressing. When we flush a state history, this process implicitly triggers the removal of the oldest history objects (tail truncation). The concern is what happens if the new tail history object is even newer than the persistent state. In such a situation, the recovery mechanism fails after an unclean shutdown because all state histories are now newer than the persistent state, it's impossible to align the state history and disk layer anymore.

The fix for this edge case is to introduce a guarantee that the state history always cover the persistent state. To achieve this, we've added a new condition that enforces the flushing of the cached state if this guarantee is not met. Specifically, if the oldest history after tail truncation is higher than persistent state, forcibly flush the cached states before the tail truncation.

@rjl493456442 rjl493456442 marked this pull request as ready for review October 27, 2023 12:03
@rjl493456442 rjl493456442 added this to the 1.13.5 milestone Oct 30, 2023
return nil, err
}
// To remove outdated history objects from the end, we set the 'tail' parameter
// to 'oldest-1' due to the offset between the freezer index and the history ID.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting implicit thing here is that nodebuffer.flush() will "surely" push the disk layer beyond oldest-1. This is kind of true, most of the time, since only the 128 diff layers re main after the flush and limit in theory is more like 90K.

Thus two questions/requests:

  • Would be nice to maybe mention this fact in the comments.
  • What happens if limit is configured to be 64? Perhaps we should forbid the limit being below the diff layer count (apart from 0 meaning infinite)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no oldest = bottom.stateID() - limit + 1, so flushing everything will move the disk layer to bottom.stateID().

In that case oldest - 1 will be bottom.stateID() - limit + 1 - 1 == bottom.stateID() - limit. So anything above 0 limit should be ok.

if err != nil {
return nil, err
}
log.Debug("Prune state history", "number", pruned)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Pruned

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also perhaps "number", use "stateid". We usually use number for block numbers and it's going to be confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number here prefers to the number of history get pruned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Debug("Pruned state history", "items", pruned, "tailid", oldest) i will fix the log with this.

Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@karalabe karalabe merged commit ea2e66a into ethereum:master Oct 31, 2023
2 checks passed
devopsbo3 pushed a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
* trie/triedb/pathdb: improve dirty node flushing trigger

* trie/triedb/pathdb: add tests

* trie/triedb/pathdb: address comment
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
Dergarcon pushed a commit to specialmechanisms/mev-geth-0x2mev that referenced this pull request Jan 31, 2024
* trie/triedb/pathdb: improve dirty node flushing trigger

* trie/triedb/pathdb: add tests

* trie/triedb/pathdb: address comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants