[WIP] smt: implement parallel mutation computations #336

Closed

Conversation

@Qyriad (Contributor) commented on Oct 16, 2024

Describe your changes

This is a draft providing a cursory implementation of recomputing SMT nodes in parallel: the tree is split into subtree tasks of depth 8, each of which recursively processes its own nodes.
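For orientation, here is a minimal, self-contained sketch of the shape of that approach. This is not the PR's actual code: `Digest`, `merge`, and the `(depth, index)` map are stand-ins for the crate's real types, and a full run would process bands at depths 56, 48, …, 0, writing each band's results back before starting the next.

```rust
use std::collections::HashMap;

// Stand-in for the real 4-element digest produced by the tree's hash function.
type Digest = [u64; 4];

// Placeholder two-to-one merge; the real tree would hash with something
// like Rpo256::merge.
fn merge(left: &Digest, right: &Digest) -> Digest {
    [
        left[0] ^ right[0],
        left[1] ^ right[1],
        left[2] ^ right[2],
        left[3] ^ right[3],
    ]
}

/// Recompute one depth-8 subtree bottom-up and return its new root digest.
/// `nodes` maps (depth, index) -> digest; missing entries fall back to a
/// default, standing in for the empty-subtree digests of a real SMT.
fn recompute_subtree(nodes: &HashMap<(u8, u64), Digest>, depth: u8, index: u64) -> Digest {
    fn inner(nodes: &HashMap<(u8, u64), Digest>, depth: u8, index: u64, leaf_depth: u8) -> Digest {
        if depth == leaf_depth {
            return *nodes.get(&(depth, index)).unwrap_or(&[0; 4]);
        }
        let left = inner(nodes, depth + 1, 2 * index, leaf_depth);
        let right = inner(nodes, depth + 1, 2 * index + 1, leaf_depth);
        merge(&left, &right)
    }
    inner(nodes, depth, index, depth + 8)
}

/// Process one 8-level band: spawn a task per subtree root at `band_depth`
/// and join them, mirroring the "joined N tasks for depth D" log lines below.
fn recompute_band(
    nodes: &HashMap<(u8, u64), Digest>,
    band_depth: u8,
    subtree_roots: &[u64],
) -> Vec<(u64, Digest)> {
    std::thread::scope(|s| {
        let handles: Vec<_> = subtree_roots
            .iter()
            .map(|&idx| s.spawn(move || (idx, recompute_subtree(nodes, band_depth, idx))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Scoped threads keep the sketch dependency-free; a real implementation would more likely use a thread pool so that 500 subtree tasks don't each get their own OS thread.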

Some benchmark results on my Ryzen 7950X:

| Pairs | Parallel construction time (seconds) |
| ----- | ------------------------------------ |
| 500   | 6.206                                |
| 1000  | 25.065                               |
| 2000  | 11.60                                |
| 3000  | 369.465                              |

The total construction time grows much faster than linearly in the number of key-value pairs inserted. With some added eprintln!()s, we can also see that each individual task's time increases as we move up the tree:

```
$ cargo run --all-features --release -- --size 1000
Running a parallel construction benchmark:
joined 500 tasks for depth 56 in 4.606 milliseconds
joined 500 tasks for depth 48 in 18.616 milliseconds
joined 500 tasks for depth 40 in 41.756 milliseconds
joined 500 tasks for depth 32 in 63.146 milliseconds
joined 500 tasks for depth 24 in 109.348 milliseconds
joined 380 tasks for depth 16 in 154.977 milliseconds
joined 30 tasks for depth 8 in 1572.014 milliseconds
joined 1 tasks for depth 0 in 4205.459 milliseconds

assertion checks complete
Constructed an SMT in parallel with 1000 key-value pairs in 25.219 seconds
```

A profile indicates that we're spending almost half our time just determining whether a node needs to be recomputed, in is_index_dirty(). I'm not sure how to mitigate this while still computing fixed subtrees. We could walk up from each modified leaf and mark every node on its path as dirty, up front. However, a similar up-front computation to identify the node indices of interest was a considerable bottleneck in the previous approach to this parallelization. There may also be some heuristic we could apply to node indices to quickly estimate whether they are ancestors of a modified index, absorbing a few duplicate calculations when the estimate fails.
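To make the "walk up from each modified leaf" idea concrete, here is a hedged sketch of what that up-front marking pass could look like; the 64-level depth constant and the `(depth, index)` keying are assumptions, not the crate's actual API:

```rust
use std::collections::HashSet;

// Assumed tree depth (miden-crypto's Smt is a 64-level tree).
const TREE_DEPTH: u8 = 64;

/// Walk from each modified leaf up to the root, collecting every node index
/// on the way. A later is_index_dirty() check then becomes a set lookup.
fn mark_dirty_ancestors(modified_leaves: &[u64]) -> HashSet<(u8, u64)> {
    let mut dirty = HashSet::new();
    for &leaf in modified_leaves {
        let mut index = leaf;
        for depth in (0..=TREE_DEPTH).rev() {
            // If this node is already marked, so are all of its ancestors,
            // and we can stop walking early.
            if !dirty.insert((depth, index)) {
                break;
            }
            index /= 2;
        }
    }
    dirty
}
```

This costs at most O(k · 64) set insertions for k modified leaves (less with the early exit), which is the same kind of up-front pass that bottlenecked the earlier attempt, so whether the cheaper per-node check pays for itself is the open question.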

@bobbinth (Contributor) commented:
How do these results compare to the single-threaded construction times? Doing some back-of-the-envelope estimation, constructing a tree from 1000 key-value pairs would require about 64K hashes in the worst case. We should be able to do that many hashes in under 1 second on a single core. On 32 cores, this should be an order of magnitude faster - so 25 seconds seems way too slow. But maybe I'm missing something?
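Spelling out that worst case (assuming none of the 1000 root-to-leaf paths share ancestors; overlap near the root only lowers the count):

$$1000 \text{ leaves} \times 64 \text{ levels per path} = 64{,}000 \text{ hashes}$$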

@Qyriad (Contributor, Author) commented on Nov 15, 2024

Closing in favor of #341.

@Qyriad closed this on Nov 15, 2024.