Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trie: enable batch updates #24556

Closed
wants to merge 2 commits into from
Closed

trie: enable batch updates #24556

wants to merge 2 commits into from

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Mar 17, 2022

A long time ago, I did an experiment (here) with using the trie in a mode where it didn't always copy-on-change. That way of using the trie is unsafe from a concurrency perspective, but in many situations, that's fine.
In the end, I didn't push that feature, I found that it didn't make a lot of difference since most of the time spent during the intermediate root wasn't related to gc, rather disk IO.

However, when we verify proofs (that are large enough that the full trie contents do not fit into a snap response), then the proof verifier uses a trie, shoves in the left and right side bounds, and fills it with e.g. ~10K values, to obtain the trie root. In this case, the batched mode could make a pretty significant change.

So I cherry-picked and tested:

name                      old time/op    new time/op    delta
VerifyProof-6               19.8µs ±34%    19.3µs ±25%     ~     (p=1.000 n=5+5)
VerifyRangeProof10-6         190µs ±88%     121µs ±18%  -36.59%  (p=0.032 n=5+5)
VerifyRangeProof100-6       1.02ms ±20%    0.36ms ±15%  -64.79%  (p=0.008 n=5+5)
VerifyRangeProof1000-6      8.45ms ±16%    4.58ms ±19%  -45.73%  (p=0.008 n=5+5)
VerifyRangeProof5000-6      33.3ms ±17%    19.8ms ±33%  -40.56%  (p=0.008 n=5+5)
VerifyRangeNoProof10-6       410µs ±20%     443µs ±22%     ~     (p=0.690 n=5+5)
VerifyRangeNoProof500-6     1.73ms ±32%    1.64ms ±17%     ~     (p=1.000 n=5+5)
VerifyRangeNoProof1000-6    3.22ms ± 7%    3.33ms ±11%     ~     (p=0.690 n=5+5)

name                      old alloc/op   new alloc/op   delta
VerifyProof-6               5.09kB ± 5%    5.11kB ± 4%     ~     (p=0.690 n=5+5)
VerifyRangeProof10-6        41.0kB ±13%    21.6kB ± 1%  -47.31%  (p=0.008 n=5+5)
VerifyRangeProof100-6        269kB ± 1%      78kB ± 2%  -71.07%  (p=0.008 n=5+5)
VerifyRangeProof1000-6      1.99MB ± 3%    0.87MB ± 1%  -56.15%  (p=0.008 n=5+5)
VerifyRangeProof5000-6      9.52MB ± 0%    4.41MB ± 1%  -53.73%  (p=0.008 n=5+5)
VerifyRangeNoProof10-6      34.6kB ± 3%    34.4kB ± 2%     ~     (p=0.690 n=5+5)
VerifyRangeNoProof500-6      114kB ± 2%     114kB ± 1%     ~     (p=0.841 n=5+5)
VerifyRangeNoProof1000-6     212kB ± 1%     211kB ± 1%     ~     (p=0.841 n=5+5)

name                      old allocs/op  new allocs/op  delta
VerifyProof-6                  104 ± 4%       105 ± 2%     ~     (p=0.905 n=5+5)
VerifyRangeProof10-6           428 ± 8%       366 ± 2%  -14.53%  (p=0.008 n=5+5)
VerifyRangeProof100-6        1.96k ± 1%     1.34k ± 1%  -31.62%  (p=0.008 n=5+5)
VerifyRangeProof1000-6       16.0k ± 2%     12.4k ± 1%  -22.43%  (p=0.008 n=5+5)
VerifyRangeProof5000-6       78.3k ± 0%     61.8k ± 1%  -20.98%  (p=0.008 n=5+5)
VerifyRangeNoProof10-6         937 ± 2%       938 ± 1%     ~     (p=0.548 n=5+5)
VerifyRangeNoProof500-6      2.90k ± 1%     2.92k ± 1%     ~     (p=0.421 n=5+5)
VerifyRangeNoProof1000-6     5.31k ± 1%     5.30k ± 1%     ~     (p=0.690 n=5+5)

Note: this PR contains some changes to statedb, which comes from previous experiment. Those changes would need to be checked if they're (still) ok.

@rjl493456442
Copy link
Member

I think it's worthwhile for experiment. Not only the Insert, but also Get operation can apply the same optimization.

For Get, we can load the node from the disk and link it with its parent without copy. I think it's fine.
For Update, we do need to ensure that the entire trie path(from root to leave) are updated in an 'atomic' operation.

@rjl493456442
Copy link
Member

rjl493456442 commented Mar 18, 2022

And also the Hash operation, we also do an extra copy for linking the generated hash to the node.

Semantically, all the operations in trie right now will always create a copy, apply the mutations(e.g. add the hash flag, replace the hashNode with a resolved node, etc) and eventually replace to root node, this procedure is treated as "atomic". The only concurrency issue will happen at "root" node.

We can explore to change the semantics. For example, when the root is resolved from the disk by Get, we can directly link it with its parent without changing all the ancestor nodes. The concurrency happens at this particular node instead of root node.

@holiman
Copy link
Contributor Author

holiman commented Mar 18, 2022

Just one thing I'd like to point out... So right now, the range prover seems like it shaves off ~50% on a 10K slice of leafs. However, we could do it even better.

Currently, we

  1. take the first-leaf-path, and last-leaf-path, construct the proofs, and dump the nodes into the trie.
  2. unset internals,
  3. Fill with remaining leafs to recreate the original trie with hashes on the sides.
  4. verify root, done

This method will always require a moderately large trie (~15-20k nodes) balooning up during the verification, regardless of how much we improve the trie. The alternative way to do this is to

  1. take the first leaf-path, construct the proof, and initialize a stacktrie along that path (avoiding the right-hand side data)
  2. fill remaining leafs
  3. take the last leaf-path, finish up the stacktrie with the hashes/data on the right-hand side.
  4. verify root, done

This method will have around 50-100 nodes at any time, and the pool makes it close to alloc-free. So it's intrinsically much better.
I started implementing that almost a year ago, and have an old stale branch with that attempt. I didn't follow through on it -- got the 'init'-part done, but didn't complete the finalization-part.

@holiman holiman mentioned this pull request Mar 18, 2022
@rjl493456442
Copy link
Member

This method will always require a moderately large trie (~15-20k nodes) balooning up during the verification

You mean we will always have that many nodes in memory during the verification? If so, yes. But most of them are created in runtime, not resolved from the disk

@holiman
Copy link
Contributor Author

holiman commented Sep 23, 2022

I have now updated this PR, and modified it so that it does not touch the core state transition functions, but is only used when verifying range proof. For a somewhat large range, it shaves off 50% of allocations, and 30-50% of the time.

Specifically this case as mentioned in the original description:

However, when we verify proofs (that are large enough that the full trie contents do not fit into a snap response), then the proof verifier uses a trie, shoves in the left and right side bounds, and fills it with e.g. ~10K values, to obtain the trie root. In this case, the batched mode could make a pretty significant change.

name                      old time/op    new time/op    delta
VerifyRangeProof10-6        80.7µs ±64%    46.3µs ±11%  -42.59%  (p=0.000 n=10+8)
VerifyRangeProof100-6        585µs ±32%     298µs ±40%  -49.06%  (p=0.000 n=10+10)
VerifyRangeProof1000-6      4.27ms ±29%    2.91ms ±75%  -31.82%  (p=0.019 n=10+10)
VerifyRangeProof5000-6      15.2ms ±42%    14.0ms ±39%     ~     (p=0.631 n=10+10)
VerifyRangeNoProof10-6       334µs ±49%     223µs ± 8%     ~     (p=0.089 n=10+10)
VerifyRangeNoProof500-6     1.41ms ±19%    1.49ms ± 7%     ~     (p=0.139 n=9+8)
VerifyRangeNoProof1000-6    2.73ms ± 3%    2.10ms ±49%     ~     (p=0.156 n=9+10)

name                      old alloc/op   new alloc/op   delta
VerifyRangeProof10-6        41.5kB ± 1%    22.2kB ± 1%  -46.62%  (p=0.000 n=9+9)
VerifyRangeProof100-6        282kB ± 7%      78kB ± 2%  -72.18%  (p=0.000 n=10+9)
VerifyRangeProof1000-6      1.97MB ± 1%    0.86MB ± 2%  -56.13%  (p=0.000 n=10+10)
VerifyRangeProof5000-6      9.47MB ± 0%    4.41MB ± 1%  -53.44%  (p=0.000 n=9+10)
VerifyRangeNoProof10-6      34.2kB ± 3%    34.4kB ± 4%     ~     (p=0.796 n=10+10)
VerifyRangeNoProof500-6      114kB ± 2%     114kB ± 2%     ~     (p=0.315 n=10+10)
VerifyRangeNoProof1000-6     212kB ± 1%     212kB ± 2%     ~     (p=0.190 n=10+10)

name                      old allocs/op  new allocs/op  delta
VerifyRangeProof10-6           377 ± 1%       315 ± 1%  -16.36%  (p=0.000 n=9+9)
VerifyRangeProof100-6        1.94k ± 4%     1.29k ± 1%  -33.61%  (p=0.000 n=10+10)
VerifyRangeProof1000-6       15.8k ± 1%     12.3k ± 1%  -22.40%  (p=0.000 n=10+10)
VerifyRangeProof5000-6       77.6k ± 0%     61.7k ± 1%  -20.48%  (p=0.000 n=9+10)
VerifyRangeNoProof10-6         933 ± 2%       937 ± 2%     ~     (p=0.698 n=10+10)
VerifyRangeNoProof500-6      2.90k ± 1%     2.90k ± 1%     ~     (p=0.591 n=10+10)
VerifyRangeNoProof1000-6     5.29k ± 1%     5.28k ± 1%     ~     (p=0.101 n=10+10)

@holiman
Copy link
Contributor Author

holiman commented Sep 29, 2022

This probably adds more complexity than is warranted.

@holiman holiman closed this Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants