Root encoding is on the hot path for block verification in both consensus clients (when syncing) and execution clients and, oddly, accounts for a significant part of resource usage even though it is not that much work.
While the trie code is capable of producing a transaction root and similar feats, it turns out to be quite inefficient, even for small workloads.
This PR brings in a helper for the specific use case of building tries of lists of values whose key is the RLP-encoded index of the item.
As it happens, such keys follow a particular structure where items end up "almost" sorted, with the exception of the item at index 0, which gets encoded as `[0x80]`, i.e. the empty byte string, thus moving it to a new location.
Armed with this knowledge and the understanding that inserting ordered items into a trie can easily be done with a simple recursion, this PR brings a ~100x improvement in CPU usage (360 ms vs 33 s) and a ~50x reduction in memory usage (70 MB vs >3 GB!) for the simple test of encoding 1000000 keys.
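To make the "almost sorted" key structure concrete, here is a minimal Python sketch (not the code in this PR). The names `rlp_encode_index` and `index_order` are made up for illustration. It encodes indices as RLP scalars and shows that sorting by encoded key yields 1..127, then 0, then 128 and up.

```python
def rlp_encode_index(index: int) -> bytes:
    """RLP-encode a non-negative integer index as a scalar (hypothetical helper)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    # Scalars are big-endian byte strings without leading zeros; 0 is the empty string.
    payload = index.to_bytes((index.bit_length() + 7) // 8, "big")
    if len(payload) == 1 and payload[0] < 0x80:
        return payload  # a single byte below 0x80 encodes as itself
    if len(payload) > 55:
        raise ValueError("index too large for the short-string form used here")
    return bytes([0x80 + len(payload)]) + payload  # length prefix + payload


def index_order(count: int) -> list[int]:
    """Indices 0..count-1 in the order of their RLP-encoded keys, without sorting."""
    order = list(range(1, min(count, 128)))  # 0x01..0x7f sort first
    if count > 0:
        order.append(0)                      # 0x80 lands after index 127
    order.extend(range(128, count))          # 0x81.., 0x82.. keep numeric order
    return order


if __name__ == "__main__":
    # Cross-check the shortcut against an explicit sort by encoded key.
    assert index_order(1000) == sorted(range(1000), key=rlp_encode_index)
    print(index_order(130)[125:130])
```

The shortcut works because longer payloads get a larger prefix byte and, within one payload length, big-endian byte order matches numeric order.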
In part, the memory usage reduction is due to a trick where the hash of the item is computed as the item is being added instead of storing it in the value.
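The recursion over ordered keys and the hash-on-add trick can be illustrated with a simplified sketch. This is not the PR's implementation and not an Ethereum-compatible trie: it uses SHA-256 as a stand-in for Keccak-256, skips RLP node encoding and extension nodes, and assumes that no key is a prefix of another. `OrderedRootBuilder` is a hypothetical name.

```python
import hashlib

EMPTY = bytes(32)  # placeholder hash for an absent child (an assumption of this sketch)


def _h(*parts: bytes) -> bytes:
    # SHA-256 stands in for Keccak-256; real nodes would be RLP-encoded first.
    return hashlib.sha256(b"".join(parts)).digest()


def _nibbles(key: bytes) -> bytes:
    # Split every key byte into its high and low 4-bit halves.
    return bytes(n for b in key for n in (b >> 4, b & 0x0F))


class OrderedRootBuilder:
    """Hypothetical helper: accepts (key, value) pairs in strictly increasing key order."""

    def __init__(self) -> None:
        self._items: list[tuple[bytes, bytes]] = []  # (key nibbles, value hash)

    def add(self, key: bytes, value: bytes) -> None:
        nibs = _nibbles(key)
        if self._items:
            prev = self._items[-1][0]
            if nibs <= prev or nibs.startswith(prev):
                raise ValueError("keys must be increasing and prefix-free")
        # Hash right away so the (possibly large) value is never retained.
        self._items.append((nibs, _h(b"value", value)))

    def root(self) -> bytes:
        if not self._items:
            return EMPTY
        return self._subtree(0, len(self._items), 0)

    def _subtree(self, lo: int, hi: int, depth: int) -> bytes:
        # Items lo..hi-1 share their first `depth` nibbles.
        if hi - lo == 1:
            nibs, value_hash = self._items[lo]
            return _h(b"leaf", nibs[depth:], value_hash)
        children = [EMPTY] * 16
        i = lo
        while i < hi:
            # Because keys are sorted, each child's items form one contiguous run.
            nib = self._items[i][0][depth]
            j = i + 1
            while j < hi and self._items[j][0][depth] == nib:
                j += 1
            children[nib] = self._subtree(i, j, depth + 1)
            i = j
        return _h(b"branch", *children)
```

Since the input is ordered, each level only scans forward for the end of the current run, so no general-purpose trie with per-node lookups and rewrites is needed.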
There are further reductions possible, such as maintaining a hasher per level instead of storing hash values, as well as using a direct-to-hash RLP encoder.