
core, eth, trie: filter out boundary nodes and remove dangling nodes in stacktrie #28327

Merged: 8 commits merged into ethereum:master on Oct 23, 2023

Conversation

@rjl493456442 (Member) commented Oct 13, 2023

This pull request implements two optional features for the stacktrie:

  • Filtering out boundary nodes to avoid committing "incomplete" ones
  • Detecting dangling nodes that fall within the path covered by an extension node

These two features can be used to enhance snap sync in path mode.
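For orientation, here is a minimal, self-contained sketch of how a caller might enable both features. The method names (WithCleaner, SkipBoundary) mirror snippets quoted later in this review, but the types here are invented stand-ins, not the real trie package API:

package main

import "fmt"

// Invented stand-in for the stacktrie options introduced by this PR.
type Options struct {
	cleaner    func(path []byte)              // called for paths covered by a committed extension node
	skipLeft   bool                           // drop nodes on the left boundary
	skipRight  bool                           // drop nodes on the right boundary
	onBoundary func(path []byte, blob []byte) // called for each skipped boundary node
}

func (o Options) WithCleaner(fn func(path []byte)) Options {
	o.cleaner = fn
	return o
}

func (o Options) SkipBoundary(left, right bool, fn func(path []byte, blob []byte)) Options {
	o.skipLeft, o.skipRight, o.onBoundary = left, right, fn
	return o
}

func main() {
	var options Options
	options = options.WithCleaner(func(path []byte) {
		fmt.Printf("remove dangling node at path %x\n", path)
	})
	options = options.SkipBoundary(true, true, func(path, blob []byte) {
		fmt.Printf("skip boundary node at path %x (%d bytes)\n", path, len(blob))
	})
	_ = options // the stacktrie would consult these options during commit
}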


Why filter out boundary nodes?

In snap sync, a large trie is split into several chunks (usually 16) that are synced concurrently. Each chunk has a corresponding stacktrie that accepts the states within its range and builds the merkle trie nodes on top.

However, because the states ahead of and behind the range are missing, the merkle trie nodes generated on the boundary are incomplete: they do not match the nodes in the full trie.

[Screenshot: a trie divided into three ranges, with incorrect boundary nodes]

In the example above, the trie is divided into three ranges. The partial trie built for each range generates incorrect boundary nodes, e.g. node A was originally at a deeper position but has now moved to a higher level. The root node is also incorrect due to missing children.

How to detect boundary nodes?

All nodes along the path of the first inserted state key are considered left-boundary nodes.
All nodes along the path of the last inserted state key are considered right-boundary nodes.
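In code, the test described above reduces to a prefix check against the first and last inserted keys. A minimal sketch (function and variable names are illustrative, not the PR's actual code):

package main

import (
	"bytes"
	"fmt"
)

// isBoundaryNode reports whether a trie node at the given hex path lies on
// a boundary: a node is on the left (right) boundary iff its path is a
// prefix of the first (last) inserted state key.
func isBoundaryNode(path, first, last []byte) bool {
	return bytes.HasPrefix(first, path) || bytes.HasPrefix(last, path)
}

func main() {
	first := []byte{0x0, 0x1, 0x2} // hex path of the first inserted state
	last := []byte{0xe, 0xf, 0xf}  // hex path of the last inserted state
	fmt.Println(isBoundaryNode([]byte{0x0, 0x1}, first, last)) // true: on the left boundary
	fmt.Println(isBoundaryNode([]byte{0x7}, first, last))      // false: internal, safe to commit
}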

How can we guarantee that all nodes except the left and right boundaries are correct?

Because the states that fall within the range are complete, the first and last states uniquely determine the position of the subtrie that contains all the states in the middle. In other words, the subtrie of the internal states (excluding the first and last) is completely consistent with the corresponding subtrie in the full trie, and its position is uniquely determined by the boundary states.

Therefore, we can conclude that all nodes except those on the left and right boundaries are correct.


Why remove the dangling nodes that fall within the range covered by an extension node?

The snap syncer might sync a storage trie multiple times, since sync cycles can be terminated and resumed, and the sync target can change because of pivot movement. The following scenario can then occur and disrupt state healing.

In cycle 1, the storage trie appears as follows. The snap syncer successfully retrieved the entire storage trie and stored the nodes in the database.

However, cycle 1 was terminated by a pivot movement at that point, leaving the storage trie of another account ahead of it incomplete. Consequently, this account range needs to be reprocessed in the next cycle, and if the storage trie has changed, it will be resynced.

[Screenshot: the storage trie as synced in cycle 1]

In cycle 2, the shape of the storage trie changes. Specifically, the short node at [0, 1, 2, 3, 5] is modified, and a few node branches are removed, causing nodes M-3 and M-4 to become dangling.

[Screenshot: the storage trie in cycle 2, with nodes M-3 and M-4 left dangling]

When the snap sync is completed, the state healer begins filling in the missing nodes. If, at that time, the storage reverts to the state it was in during cycle 1, the state healer will stop at M-3. This is because the state healer operates under the assumption that if a node exists, the entire sub-trie below it should also be present. However, in this case, M-3 is merely a dangling node, and the corresponding sub-trie is incomplete. For instance, the node that should be at path [0, 1, 2, 3, 5] is N-4, which does not match M-3.

Thus, we can conclude that whenever we commit nodes into the database, we must ensure that the entire path space is uniquely occupied, and remove all dangling nodes within that path space.
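One way to realize this rule is sketched below (an illustrative shape, not the exact PR code): when an extension node is committed, probe every path strictly between it and its child, and delete whatever occupies those paths.

package main

import "fmt"

// cleanCoveredPaths visits every path strictly between an extension node at
// `path` (with hex key `key`) and its child, handing each one to `clean`,
// which is expected to delete any dangling node stored there.
func cleanCoveredPaths(path, key []byte, clean func(path []byte)) {
	for i := 1; i < len(key); i++ {
		p := append(append([]byte{}, path...), key[:i]...)
		clean(p)
	}
}

func main() {
	// An extension node at path [0 1] with key [2 3 5] covers the paths
	// [0 1 2] and [0 1 2 3]; leftovers from an earlier sync cycle (like
	// M-3 and M-4 above) would be found and removed at such paths.
	cleanCoveredPaths([]byte{0, 1}, []byte{2, 3, 5}, func(p []byte) {
		fmt.Printf("probe path %v\n", p)
	})
}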

Overhead analysis

Detecting the dangling nodes requires a lot of database reads. Statistics from a live sync show that 950K database reads are performed in total (and nothing was detected, which is expected, as this situation is very rare).

The overhead is not trivial, but also not significant. We have to live with it to avoid a corruption that, although very unlikely, can happen.

@holiman (Contributor) commented Oct 13, 2023

If you first make a standalone PR containing only the writeFn -> options restructure, we could probably get that merged pretty quickly.

@rjl493456442 force-pushed the stacktrie-pathcleaner branch from 42b088d to 5f68032 on October 16, 2023
Comment on lines 37 to 38
NoLeftBoundary bool // Flag whether the nodes on the left boundary are skipped for committing
NoRightBoundary bool // Flag whether the nodes on the right boundary are skipped for committing
Contributor:

Can't we have

leftBoundary []byte // left boundary (may be nil). Must be hex-encoded path to the left boundary, if set.
rightBoundary []byte // right boundary (may be nil). Must be hex-encoded path to the right boundary, if set.

And also have a SetLeftBoundary(path []byte) which we can invoke when we insert the first item?
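Sketched out, that suggestion might look like this (a hypothetical shape per this comment, not the code that was eventually merged):

package trie

// StackTrieOptions here is hypothetical: boundaries stored as paths, with
// nil meaning "no boundary on that side".
type StackTrieOptions struct {
	leftBoundary  []byte // hex-encoded path to the left boundary, may be nil
	rightBoundary []byte // hex-encoded path to the right boundary, may be nil
}

// SetLeftBoundary records the path of the first inserted item as the left boundary.
func (o *StackTrieOptions) SetLeftBoundary(path []byte) {
	o.leftBoundary = append([]byte{}, path...)
}

// SetRightBoundary records the path of the last inserted item as the right boundary.
func (o *StackTrieOptions) SetRightBoundary(path []byte) {
	o.rightBoundary = append([]byte{}, path...)
}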

Member Author:

In this approach, the left boundary should always be the first entry; namely, the first entry along with the nodes on its path should be filtered out as the boundary.

We can't use Origin as the boundary in order to keep the first entry, because there is no guarantee that the first entry will be at the same position as in the full trie.

The same logic applies to the right boundary: the last entry should also be excluded.

Member Author:

I will try to respond with some more logical wording.

Contributor:

I will try to respond with some more logical wording.

No, that's fine, I understand the problem and what you are saying. But I don't see why we would need a NoLeftBoundary boolean: we could just let a nil boundary signify NoLeftBoundary.

So in my suggestion: we would explicitly set the left boundary to the first element we feed it (or, if initialized with a proof, we'd set it to the origin). And vice versa with the right boundary.

Member Author:

sounds good to me.

Contributor:

sounds good to me.

Heh, just as I changed my mind and made a PR (#28361) with lastPath in the stacktrie, thus necessitating the booleans in the options :)

@rjl493456442 force-pushed the stacktrie-pathcleaner branch from 3915540 to cd40e91 on October 17, 2023
@rjl493456442 changed the title from "core, eth, trie: clean up dangling nodes on the node path" to "core, eth, trie: filter out boundary nodes and remove dangling nodes in stacktrie" on Oct 17, 2023
@rjl493456442 (Member Author) commented Oct 17, 2023

[Screenshot: snap sync statistics]

In a snap sync, 267K boundary nodes are filtered out. Also, in order to detect dangling nodes, 956K lookups are performed (no dangling nodes detected).

}
t.last = append([]byte{}, k...)
Contributor:

It would be neat to not have to copy this (since it's an extra copy for every inserted item). I think it's not needed: k is freshly allocated in this scope (no external access), and even though an stNode may have a reference to it, I don't think it will ever be modified.

Contributor:

Alternatively, we could have t.last initialized as e.g. a 32-byte slice, and then do t.last = append(t.last[:0], k...) or something to that effect.

Member Author:

I am not 100% confident about it, because we do hexToCompactInPlace for extension and leaf nodes, which mutates the key slice directly.

Also, for a leaf node, the passed key slice is kept directly when we construct it:

	case emptyNode: /* Empty */
		st.typ = leafNode
		st.key = key
		st.val = value

But I can definitely go with the second approach, getting rid of the slice allocation.
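For illustration, the two variants under discussion side by side (a standalone sketch, not the PR's code):

package main

import "fmt"

func main() {
	k := []byte{0x0, 0x1, 0x2} // freshly allocated key for one insert

	// Variant 1 (current code): a fresh copy on every insert.
	lastCopy := append([]byte{}, k...)

	// Variant 2 (suggested): one reusable buffer, overwritten in place,
	// so the hot path stops allocating once the buffer is large enough.
	last := make([]byte, 0, 32)
	last = append(last[:0], k...)

	fmt.Println(lastCopy, last)
}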

Member Author:

Personally, I don't like the hexToCompactInPlace optimization, because it might introduce some nasty issues that are pretty hard to debug.

Contributor:

Personally, I don't like the hexToCompactInPlace optimization, because it might introduce some nasty issues that are pretty hard to debug.

I agree!

@rjl493456442 force-pushed the stacktrie-pathcleaner branch from 005d8d0 to 9454e72 on October 17, 2023
@rjl493456442 marked this pull request as ready for review on October 17, 2023
@holiman (Contributor) left a comment:

On the whole, looks pretty good to me.

options = options.WithCleaner(func(path []byte) {
s.cleanPath(subtask.genBatch, owner, path)
})
options = options.SkipBoundary(true, true, func(path []byte, hash common.Hash, blob []byte) {
Contributor:

The cleaner should only be used in path mode, but SkipBoundary might as well be used regardless, no?

Member Author:

Theoretically SkipBoundary can also be used in hash mode, but I won't do it now, to avoid any unexpected missing-node issues.

@rjl493456442 (Member Author):

@holiman

In this case we won't start the whole "16 parallel task resolver" (no chunks) just to fetch it, but we will issue one more request.

I don't think so, if I understand correctly. If we receive only a part of the storage slots, we will switch to chunk mode anyway. Because we have no idea how many slots are left, we just blindly create 16 chunks to fetch concurrently.

@holiman (Contributor) commented Oct 18, 2023

Because we have no idea how many slots are left, we just blindly create 16 chunks to fetch concurrently.

if estimate, err := estimateRemainingSlots(len(keys), lastKey); err == nil {
	if n := estimate / (2 * (maxRequestSize / 64)); n+1 < chunks {
		chunks = n + 1
	}
}

If the estimate is small enough, then n is 0 and chunks becomes 1. So:

					r := newHashRange(lastKey, chunks)

We get a new range starting at the last key and ending at 0xff..f. We immediately commit the current data into the stacktrie, and once we get the next packet, it will complete the range and the hash will match up with the storage root.
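Restated as a standalone function (the name clampChunks and this isolated shape are illustrative; the constants in the example are hypothetical), the clamping works out as follows:

package main

import "fmt"

// clampChunks reproduces the logic quoted above: when few slots remain,
// the chunk count collapses, down to a single range in the extreme case.
func clampChunks(estimate, maxRequestSize, chunks uint64) uint64 {
	if n := estimate / (2 * (maxRequestSize / 64)); n+1 < chunks {
		return n + 1
	}
	return chunks
}

func main() {
	// With, say, a 512 KiB request size (hypothetical value) and only ~100
	// slots estimated to remain, n is 0 and the 16 chunks collapse to 1.
	fmt.Println(clampChunks(100, 512*1024, 16)) // prints 1
}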

@rjl493456442 (Member Author):

Ah interesting, I missed that mechanism, thanks for pointing it out.

@holiman (Contributor) left a comment:

LGTM

@holiman added this to the 1.13.5 milestone on Oct 19, 2023
if owner != (common.Hash{}) && rawdb.ExistsStorageTrieNode(s.db, owner, path) {
rawdb.DeleteStorageTrieNode(batch, owner, path)
deletionGauge.Inc(1)
}
Member:

Out of curiosity, would a blind delete be more expensive than the current check-and-delete?

Member Author:

I think check-and-delete is more expensive. However, the overhead is acceptable, especially once we have pebble fixed (starting to use the bloom filter).

I don't have a strong opinion, but with the current approach we can expose more information via metrics (e.g. how many dangling nodes we actually detect).

Contributor:

It's a good question, and there's definitely no straightforward answer. A blind delete would put a bunch of tombstones in level 0, so it's not a given that it would be faster; and if it is, it might make other parts slower due to tombstone processing during e.g. compaction.
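To make the trade-off concrete, here is a sketch of the two variants (the rawdb helper calls are taken from the diff quoted above; the exact parameter types are assumptions):

package sketch

import (
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/rawdb"
	"github.com/ethereum/go-ethereum/ethdb"
)

// checkAndDelete mirrors the merged variant: one read per probe, deleting
// only occupied paths, which also lets a metric count real dangling nodes.
func checkAndDelete(db ethdb.KeyValueReader, batch ethdb.KeyValueWriter, owner common.Hash, path []byte) {
	if owner != (common.Hash{}) && rawdb.ExistsStorageTrieNode(db, owner, path) {
		rawdb.DeleteStorageTrieNode(batch, owner, path)
	}
}

// blindDelete is the alternative raised here: no read per probe, but every
// probe writes a tombstone that compaction has to process later.
func blindDelete(batch ethdb.KeyValueWriter, owner common.Hash, path []byte) {
	rawdb.DeleteStorageTrieNode(batch, owner, path)
}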

@rjl493456442 force-pushed the stacktrie-pathcleaner branch from ae07a53 to 573676a on October 20, 2023
@rjl493456442 (Member Author):

@holiman @karalabe deployed it on 5 to do a final snap sync.

@karalabe (Member) left a comment:

SGTM

@karalabe merged commit ab04aeb into ethereum:master on Oct 23, 2023
1 check passed
devopsbo3 pushed a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
core, eth, trie: filter out boundary nodes and remove dangling nodes in stacktrie (ethereum#28327)

* core, eth, trie: filter out boundary nodes in stacktrie

* eth/protocol/snap: add comments

* Update trie/stacktrie.go

Co-authored-by: Martin Holst Swende <martin@swende.se>

* eth, trie: remove onBoundary callback

* eth/protocols/snap: keep complete boundary nodes

* eth/protocols/snap: skip healing if the storage trie is already complete

* eth, trie: add more metrics

* eth, trie: address comment

---------

Co-authored-by: Martin Holst Swende <martin@swende.se>
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
sduchesneau pushed a commit to streamingfast/go-ethereum that referenced this pull request Dec 19, 2023
Dergarcon pushed a commit to specialmechanisms/mev-geth-0x2mev that referenced this pull request Jan 31, 2024