trie/triedb/pathdb: improve dirty node flushing trigger #28426
return nil, err
}
// To remove outdated history objects from the end, we set the 'tail' parameter
// to 'oldest-1' due to the offset between the freezer index and the history ID.
An interesting implicit thing here is that nodebuffer.flush() will "surely" push the disk layer beyond oldest-1. This is kind of true, most of the time, since only the 128 diff layers remain after the flush and limit in theory is more like 90K.
Thus two questions/requests:
- Would be nice to maybe mention this fact in the comments.
- What happens if limit is configured to be 64? Perhaps we should forbid the limit being below the diff layer count (apart from 0 meaning infinite)?
Ah, no: oldest = bottom.stateID() - limit + 1, so flushing everything will move the disk layer to bottom.stateID().
In that case oldest - 1 will be bottom.stateID() - limit + 1 - 1 == bottom.stateID() - limit. So any limit above 0 should be ok.
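The arithmetic in this comment can be checked with a small sketch. The helper name tailAfterFullFlush and its guard for small state IDs are assumptions made for illustration; they are not taken from the pathdb code.

```go
package main

import "fmt"

// tailAfterFullFlush is a hypothetical helper that mirrors the arithmetic in
// the comment above. bottomID stands for bottom.stateID() and limit for the
// configured history limit (0 meaning unlimited).
// It returns the new tail (oldest-1) and whether any truncation applies.
func tailAfterFullFlush(bottomID, limit uint64) (tail uint64, truncate bool) {
	// Assumption: nothing is pruned when the limit is disabled or when
	// fewer than limit histories can exist yet.
	if limit == 0 || bottomID <= limit {
		return 0, false
	}
	oldest := bottomID - limit + 1
	return oldest - 1, true // equals bottomID - limit
}

func main() {
	// After a full flush the disk layer sits at bottomID, so the tail must
	// not exceed bottomID for the histories to still cover the disk layer.
	for _, limit := range []uint64{1, 64, 128, 90000} {
		bottomID := uint64(100000)
		tail, truncate := tailAfterFullFlush(bottomID, limit)
		fmt.Println(limit, tail, truncate, tail <= bottomID)
	}
}
```

Under these assumptions the tail is always bottomID - limit, which stays below the flushed disk layer for every positive limit, including a limit of 64.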
trie/triedb/pathdb/disklayer.go
Outdated
if err != nil {
return nil, err
}
log.Debug("Prune state history", "number", pruned)
*Pruned
Also, perhaps instead of "number", use "stateid". We usually use number for block numbers and it's going to be confusing.
The number here refers to the number of history objects that get pruned.
log.Debug("Pruned state history", "items", pruned, "tailid", oldest)
I will fix the log with this.
LGTM
* trie/triedb/pathdb: improve dirty node flushing trigger * trie/triedb/pathdb: add tests * trie/triedb/pathdb: address comment
…reum#28426)" This reverts commit e47ad2f.
This pull request fixes an edge case in state history management.
In the context of the path model, Geth maintains a list of state histories to facilitate the rollback of the persistent state when needed. To prevent uncontrolled expansion of state history, Geth also provides a mechanism to prune the oldest state histories.
Because Geth manages the state layers in a tree structure, whenever a new layer is piled on top, it triggers the merging of the bottom diff layer with the disk layer. This merging operation also involves constructing the state history of the bottom diff layer and truncating the oldest state histories.
However, an issue can arise if an unclean shutdown occurs after persisting the state history but before flushing the cached states to disk. For this scenario, we've defined a recovery mechanism that truncates any excess state history above the disk layer during the next restart, ensuring that the state history always aligns with the disk layer.
Now, let's discuss the specific edge case we're addressing. When we flush a state history, this process implicitly triggers the removal of the oldest history objects (tail truncation). The concern is what happens if the new tail history object is even newer than the persistent state. In such a situation, the recovery mechanism fails after an unclean shutdown: because all state histories are now newer than the persistent state, it is no longer possible to align the state history with the disk layer.
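The recovery step and its failure mode can be illustrated with a simplified model. This is a sketch under stated assumptions, not the pathdb implementation: historyRange, alignToDisk and errUnrecoverable are hypothetical names, and the history store is reduced to a pair of IDs where the retained histories are tail+1 through head.

```go
package main

import (
	"errors"
	"fmt"
)

// historyRange is a hypothetical model of the stored state histories:
// the retained history IDs are tail+1 .. head.
type historyRange struct {
	tail, head uint64
}

// errUnrecoverable signals the edge case described above: every retained
// history is newer than the persistent state.
var errUnrecoverable = errors.New("state history is entirely newer than the disk layer")

// alignToDisk models the restart-time recovery: histories above the disk
// layer are truncated from the head so that head matches diskID.
// Simplification: a head already at or below diskID is left untouched.
func alignToDisk(h historyRange, diskID uint64) (historyRange, error) {
	if h.tail > diskID {
		// Tail truncation already moved past the persistent state, so
		// no amount of head truncation can line the two up again.
		return h, errUnrecoverable
	}
	if h.head > diskID {
		h.head = diskID
	}
	return h, nil
}

func main() {
	// Unclean shutdown with histories 11..105 and disk layer at 100:
	// the excess histories 101..105 are dropped.
	aligned, err := alignToDisk(historyRange{tail: 10, head: 105}, 100)
	fmt.Println(aligned.tail, aligned.head, err)

	// Edge case: the tail (101) is already newer than the disk layer (100).
	_, err = alignToDisk(historyRange{tail: 101, head: 105}, 100)
	fmt.Println(err)
}
```

In this model the second call cannot succeed, which corresponds to the situation the pull request sets out to prevent.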
The fix for this edge case is to introduce a guarantee that the state history always covers the persistent state. To achieve this, we've added a new condition that enforces the flushing of the cached states if this guarantee is not met. Specifically, if the oldest history after tail truncation would be higher than the persistent state, the cached states are forcibly flushed before the tail truncation.
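A minimal sketch of the condition described above follows. The names truncationPlan and planTruncation are hypothetical, and the comparison follows the wording of this description rather than the merged code, which may decide when to force the flush differently.

```go
package main

import "fmt"

// truncationPlan is a hypothetical summary of what a commit should do.
type truncationPlan struct {
	Truncate   bool   // whether tail truncation is needed at all
	Oldest     uint64 // ID of the oldest history kept after truncation
	ForceFlush bool   // whether cached states must be flushed first
}

// planTruncation decides, per the description above, whether the cached
// states must be flushed before the tail of the state history is truncated.
// bottomID is the state ID of the bottom diff layer being merged,
// persistentID the state ID currently on disk, and limit the history limit
// (0 meaning unlimited). All parameter names are assumptions.
func planTruncation(bottomID, persistentID, limit uint64) truncationPlan {
	if limit == 0 || bottomID <= limit {
		return truncationPlan{} // nothing to prune
	}
	oldest := bottomID - limit + 1
	return truncationPlan{
		Truncate: true,
		Oldest:   oldest,
		// The guarantee: history must still cover the persistent state.
		ForceFlush: oldest > persistentID,
	}
}

func main() {
	// Persistent state lags far behind: flushing is forced.
	fmt.Printf("%+v\n", planTruncation(1000, 900, 64))
	// Persistent state is recent enough: a regular flush policy suffices.
	fmt.Printf("%+v\n", planTruncation(1000, 950, 64))
}
```

Forcing the flush first moves the persistent state up to the bottom layer, so the truncated history can never start above it.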