core, eth, trie: filter out boundary nodes and remove dangling nodes in stacktrie #28327
Conversation
If you first make a standalone PR containing only the
force-pushed from 42b088d to 5f68032
trie/stacktrie.go
Outdated
NoLeftBoundary  bool // Flag whether the nodes on the left boundary are skipped for committing
NoRightBoundary bool // Flag whether the nodes on the right boundary are skipped for committing
Can't we have
leftBoundary []byte // left boundary (may be nil). Must be hex-encoded path to left boundary, if set.
rightBoundary []byte // right boundary (may be nil). Must be hex-encoded path to right boundary, if set.
And also have a SetLeftBoundary(path []byte) which we can invoke when we insert the first item?
In this approach, the left boundary should always be the first entry; namely, the first entry along with its path should be filtered out as a boundary.
We can't use Origin as the boundary while also keeping the first entry, because there is no guarantee that the first entry will be at the same position as the one in the full trie.
The same logic applies to the right boundary: the last entry should also be excluded.
I will try to respond with a more logical explanation.
No, that's fine, I understand the problem and what you are saying. But I don't see why we would need a NoLeftBoundary boolean: we could just let a nil border signify NoLeftBoundary.
So in my suggestion: we would explicitly set the left boundary to the first element we feed it (or, if initiated with a proof, we'd set it to the origin). And vice versa with the right boundary.
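To make the suggestion concrete, here is a minimal sketch of that shape (names are illustrative, not the actual go-ethereum API); a nil boundary simply means nothing is skipped on that side:
import "bytes"

type boundedStackTrie struct {
	left  []byte // hex path of the left boundary; nil means no left boundary
	right []byte // hex path of the right boundary; nil means no right boundary
}

// SetLeftBoundary records the path of the first inserted element (or the
// range origin, when the trie is seeded from a proof).
func (t *boundedStackTrie) SetLeftBoundary(path []byte) {
	t.left = append([]byte{}, path...)
}

// onBoundary reports whether a node being committed lies on either boundary,
// i.e. whether its path is a prefix of a boundary key path.
func (t *boundedStackTrie) onBoundary(path []byte) bool {
	return (t.left != nil && bytes.HasPrefix(t.left, path)) ||
		(t.right != nil && bytes.HasPrefix(t.right, path))
}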
sounds good to me.
Heh, just as I changed my mind and made a PR (#28361) with lastPath in the stacktrie, thus necessitating having the booleans in the options :)
force-pushed from 3915540 to cd40e91
trie/stacktrie.go
Outdated
}
t.last = append([]byte{}, k...)
It would be neat to not have to copy this (since it's an extra copy for every inserted item). I think it's not needed: the k is freshly allocated in this scope (no external access), and even though an stNode may have a reference to it, I don't think it will ever be modified.
Alternatively, we could have t.last be initialized as e.g. a 32-byte slice, and then we could do t.last = append(t.last[:0], k...) or something to that effect.
I am not 100% confident about it, because we do hexToCompactInPlace for extension and leaf nodes, which mutates the key slice directly. And for a leaf node, it will directly keep the passed key slice when we construct it:
case emptyNode: /* Empty */
st.typ = leafNode
st.key = key
st.val = value
But I can definitely go with the second approach, getting rid of the slice allocation.
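For illustration, a minimal sketch of that second approach (type and field names assumed): the buffer is allocated once and reused, so tracking the last key costs no per-insert allocation while still owning its own backing array:
type lastTracker struct {
	last []byte
}

func newLastTracker() *lastTracker {
	// allocate once; 65 covers a 32-byte key as hex nibbles plus terminator
	return &lastTracker{last: make([]byte, 0, 65)}
}

// record copies k into the reused buffer. Copying matters: k may later be
// mutated (e.g. by in-place key compaction), but t.last owns its own array.
func (t *lastTracker) record(k []byte) {
	t.last = append(t.last[:0], k...)
}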
Personally I don't like the hexToCompactInPlace optimization, because it might introduce some nasty issues which are pretty hard to debug.
I agree!
force-pushed from 005d8d0 to 9454e72
On the whole, this looks pretty good to me.
eth/protocols/snap/sync.go
Outdated
options = options.WithCleaner(func(path []byte) {
	s.cleanPath(subtask.genBatch, owner, path)
})
options = options.SkipBoundary(true, true, func(path []byte, hash common.Hash, blob []byte) {
The cleaner should only be used in path mode, but the SkipBoundary might as well be used regardless, no?
Theoretically SkipBoundary can be used in hash mode too. But I won't do it now, to avoid any unexpected missing-node issues.
I don't think so, if I understand correctly. If we just receive a part of the storage slots, we will switch to chunk mode anyway. Because we have no idea how many slots are left, we just blindly create 16 chunks to fetch concurrently.
if
We get a new range starting at the last key, ending in
Ah interesting, I missed that mechanism, thanks for pointing it out.
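For context, a rough sketch (pure Go, not geth's actual code) of the blind 16-way split mentioned above: the keyspace remaining after the last received key is divided into equal chunks that can be fetched concurrently.
import "math/big"

// chunkStarts splits the 256-bit keyspace remaining after `last` into 16
// equally sized chunks and returns the starting key of each chunk.
func chunkStarts(last *big.Int) []*big.Int {
	max := new(big.Int).Lsh(big.NewInt(1), 256) // 2^256, end of the keyspace
	space := new(big.Int).Sub(max, last)        // keys still uncovered
	step := new(big.Int).Div(space, big.NewInt(16))
	starts := make([]*big.Int, 16)
	next := new(big.Int).Add(last, big.NewInt(1)) // resume right after last
	for i := range starts {
		starts[i] = new(big.Int).Set(next)
		next.Add(next, step)
	}
	return starts
}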
LGTM
if owner != (common.Hash{}) && rawdb.ExistsStorageTrieNode(s.db, owner, path) {
	rawdb.DeleteStorageTrieNode(batch, owner, path)
	deletionGauge.Inc(1)
}
Out of curiosity, would a blind delete be more expensive than the current check-and-delete?
I think check-and-delete is more expensive. However, the overhead is acceptable, especially once we have the pebble fix (starting to use the bloom filter).
I don't have a strong opinion, but with the current approach we can expose more information via metrics (e.g. how many dangling nodes we actually detect).
It's a good question, and definitely not a straightforward one to answer. A blind delete would put a bunch of tombstones in level 0, so it's definitely not a given that it would be faster; and if it is, it might make other parts slower due to the tombstone processing during e.g. compaction.
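For contrast, a hedged sketch of the two strategies being discussed: the check-and-delete form mirrors this PR's diff above, while the blind-delete form is illustrative.
// check-and-delete: pays one read per path, but only writes a tombstone
// (and bumps the metric) when a stale node is actually present
if rawdb.ExistsStorageTrieNode(s.db, owner, path) {
	rawdb.DeleteStorageTrieNode(batch, owner, path)
	deletionGauge.Inc(1)
}

// blind delete: skips the read, but unconditionally adds a tombstone to
// level 0, which can slow down later reads and compactions
rawdb.DeleteStorageTrieNode(batch, owner, path)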
Co-authored-by: Martin Holst Swende <martin@swende.se>
force-pushed from ae07a53 to 573676a
SGTM
…in stacktrie (ethereum#28327)
* core, eth, trie: filter out boundary nodes in stacktrie
* eth/protocol/snap: add comments
* Update trie/stacktrie.go
Co-authored-by: Martin Holst Swende <martin@swende.se>
* eth, trie: remove onBoundary callback
* eth/protocols/snap: keep complete boundary nodes
* eth/protocols/snap: skip healing if the storage trie is already complete
* eth, trie: add more metrics
* eth, trie: address comment
---------
Co-authored-by: Martin Holst Swende <martin@swende.se>
…g nodes in stacktrie (ethereum#28327)" This reverts commit 6fe5774.
This pull request implements two optional features for stacktrie:
* filter out the boundary nodes during commit;
* remove dangling nodes that fall within the path space claimed by committed nodes.
These two features can be used to enhance snap sync in path mode; a usage sketch follows.
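A hedged usage sketch, assembled from the snippet shown earlier in this conversation (WithCleaner and SkipBoundary come from this PR's diff; the constructor name and surrounding shape are assumptions):
options := trie.NewStackTrieOptions() // assumed constructor
options = options.WithCleaner(func(path []byte) {
	// called for the path space claimed by each committed extension node,
	// so dangling nodes from earlier sync cycles can be removed
})
options = options.SkipBoundary(true, true, func(path []byte, hash common.Hash, blob []byte) {
	// left/right boundary nodes are withheld from committing and reported here
})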
Why filter out boundary nodes?
In snap sync, a large trie is split into several chunks (usually 16) which are synced concurrently. For each chunk, there is a corresponding stack trie that accepts the states within that range and builds the Merkle trie nodes on top.
However, because the states ahead of and behind the range are missing, the generated Merkle trie nodes on the boundary are incomplete: they do not match the nodes in the full trie.
In the example above, the trie is divided into three ranges. The partial tries of these ranges will generate incorrect boundary nodes. For example, node A was originally at a deeper position but has now moved to a higher level; the root node is also incorrect due to missing children.
How to detect boundary nodes?
All nodes along the path of the 'first-inserted-state' node key are considered left-boundary nodes.
All nodes along the path of the 'last-inserted-state' node key are considered right-boundary nodes.
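A minimal sketch of this rule (the helper name is illustrative): a committed node is a boundary node exactly when its path is a prefix of the hex path of the first or last inserted key.
import "bytes"

// isBoundary reports whether the node at nodePath lies on the path from the
// root to the first or last inserted state, i.e. on a range boundary.
func isBoundary(nodePath, firstKey, lastKey []byte) bool {
	return bytes.HasPrefix(firstKey, nodePath) || bytes.HasPrefix(lastKey, nodePath)
}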
How can we guarantee that nodes except the left and right boundaries are all correct?
The states that fall within the range are complete. The first and last states uniquely determine the position of the subtrie which contains all the states in the middle. In other words, the subtrie of the internal states (excluding the first and last) is fully consistent with the corresponding subtrie in the full trie, and the position of this subtrie is uniquely determined by the boundary states.
Therefore, we can conclude that all nodes except the left and right boundaries are correct.
Why remove dangling nodes that fall within the range covered by an extension node?
The snap syncer might sync a storage trie multiple times due to sync-cycle termination and resumption, and the sync target can change because of pivot movement. This can produce a scenario that disrupts state healing.
In cycle 1, the storage trie appears as follows. The snap syncer successfully retrieved the entire storage trie and stored the nodes in the database.
However, cycle 1 was terminated due to a pivot movement at that time, with the storage trie of another account ahead still incomplete. Consequently, this account range will need to be reprocessed in the next cycle and, if the storage trie has changed, resynced.
In cycle 2, the shape of the storage trie changes. Specifically, the short node at [0, 1, 2, 3, 5] is modified and a few node branches are removed, causing nodes M-3 and M-4 to become dangling.
When the snap sync is completed, the state healer begins filling in the missing nodes. If, at that time, the storage reverts to the state it was in during cycle 1, the state healer will stop at M-3. This is because the state healer operates under the assumption that if a node exists, the entire sub-trie should also be present. However, in this case, M-3 is merely a dangling node, and the corresponding sub-trie is incomplete. For instance, the node at path [0, 1, 2, 3, 5] is N-4, which does not match M-3.
Thus, we can conclude that whenever we commit nodes into the database, we must ensure that the entire path space is uniquely occupied, removing all dangling nodes within that path space.
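A hedged sketch of that rule (names are illustrative, not the exact geth code): when an extension node at `path` with key `key` is committed, it alone must own every position inside the path space it covers, so anything stored at an intermediate position is a leftover from a previous cycle and gets deleted.
// cleanDangling removes stale nodes inside the path space covered by a
// freshly committed extension node. exists/del abstract the database access.
func cleanDangling(path, key []byte, exists func([]byte) bool, del func([]byte)) {
	for i := 1; i < len(key); i++ {
		// every strictly shorter position under the extension is covered
		// solely by the extension itself; a node found there is dangling
		sub := append(append([]byte{}, path...), key[:i]...)
		if exists(sub) {
			del(sub)
		}
	}
}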
Overhead analysis
In order to detect dangling nodes, many database reads must be performed. Statistics from a live sync show roughly 950K database reads in total (with nothing detected, which is expected, as dangling nodes are very rare).
The overhead is not trivial, but it is also not significant. We have to live with it to avoid a corruption which, although very unlikely, can occur.