core/state/snapshot: ensure Cap retains a min number of layers #22331
This is a relatively tiny change with non-obvious refactors.
Fixes an issue where the snapshot-based state pruner fails due to not having enough diff layers.
The snapshots have a `Cap` functionality, which takes the snapshot tree (disk layer + diff layer tree on top) and limits it to a requested number of layers, merging anything above the limit downwards. The goal of `Cap` is to ensure that we always have the state of the 128 most recent blocks available in memory, while anything above that can be pushed to disk. It turns out the definition of "retain 128 layers" is not that obvious.

Before diving into the problem, an important detail is that although snapshots have 2 types of layers (`diskLayer` backed by the database and `diffLayer` backed by memory), the code actually differentiates the bottom-most `diffLayer` from the ones on top. In theory, every time we have more than 128 diff layers, we could push the bottom ones to disk. This, however, would thrash the database, since each new block would push some old diff to disk. In practice, whenever the layer cap is exceeded, we accumulate the writes into the bottom-most diff layer. Only when this bottom diff layer (the accumulator) exceeds a threshold of 4MB is it pushed to disk.
The catch-22 is that we need to merge data into the bottom diff layer (the accumulator) first, and only then will we know if it overflows. This means that when we request 128 layers from `Cap`, it does not know in advance whether the accumulator will survive, and as such is unable to retain exactly 128 layers. The previous behavior of `cap` (lower case) was to retain at most 128, but if the accumulator overflowed, it retained only 127.

This is an issue from Geth's perspective, because Geth assumes that there are always 128 recent snapshots available. This invariant is not necessarily broken after a flatten:
`cap({stale disk layer, accumulator layer, diff layers}) == {recent disk layer, diff layers}`; however, seeing just a `{disk layer, diff layers}` set without context, we can't tell whether the disk layer is stale or not (it is the parent of the first diff; we just don't know if the first diff is an accumulator covering a range of blocks or a single block).

Long story short, if Geth needs a guarantee that there are always 128 recent states tracked by the snapshotter, we can't count the accumulator layer toward those 128, because it may or may not survive a `Cap` operation. The naive solution would be to update Geth to maintain 129 layers, but that seems like fixing the symptom instead of the problem.

This PR solves the actual problem by changing `Cap` so that instead of including the accumulator layer (which may or may not survive the cap) in the layer count, it excludes it. The result is that instead of retaining 128 or 127 diff layers, it will now retain 129 by default and occasionally 128.

An interesting aspect is the corner-case handling when requesting 0 or 1 layers. Previously,
`0` flattened everything into the disk layer and `1` flattened everything into the accumulator, potentially ending up with `0` either way if the accumulator became too large. The PR also changes this so that `0` retains the same meaning (otherwise there's no way to flatten everything, and the effect is mostly the same: we get the same root availability), but `1` now means 1 diff layer plus an optional accumulator. It becomes impossible to issue a call that merges everything into the accumulator. I think that's acceptable; these special cases were mostly used by the tests, and we just need to ensure all the strange corner cases are covered.

With these ever-so-slight semantic modifications, the tests needed some extensions and heavier changes to make sure they still behave the way they were intended to.