[WIP] smt: implement parallel mutation computations #336

Closed

Conversation

@Qyriad (Contributor) commented on Oct 16, 2024

Describe your changes

This is a draft providing a cursory implementation of recomputing SMT nodes in parallel: the tree is split into subtree tasks of depth 8, each of which recursively processes its own nodes.
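For orientation, here is a minimal, self-contained sketch of the shape of that approach. This is not the PR's actual code: `Digest`, `merge`, and the `(depth, index)` map are stand-ins for the crate's real types, and a full run would process bands at depths 56, 48, …, 0, writing each band's results back before starting the next.

```rust
use std::collections::HashMap;

// Stand-in for the real 4-element digest produced by the tree's hash function.
type Digest = [u64; 4];

// Placeholder two-to-one merge; the real tree would hash with something
// like Rpo256::merge.
fn merge(left: &Digest, right: &Digest) -> Digest {
    [
        left[0] ^ right[0],
        left[1] ^ right[1],
        left[2] ^ right[2],
        left[3] ^ right[3],
    ]
}

/// Recompute one depth-8 subtree bottom-up and return its new root digest.
/// `nodes` maps (depth, index) -> digest; missing entries fall back to a
/// default, standing in for the empty-subtree digests of a real SMT.
fn recompute_subtree(nodes: &HashMap<(u8, u64), Digest>, depth: u8, index: u64) -> Digest {
    fn inner(nodes: &HashMap<(u8, u64), Digest>, depth: u8, index: u64, leaf_depth: u8) -> Digest {
        if depth == leaf_depth {
            return *nodes.get(&(depth, index)).unwrap_or(&[0; 4]);
        }
        let left = inner(nodes, depth + 1, 2 * index, leaf_depth);
        let right = inner(nodes, depth + 1, 2 * index + 1, leaf_depth);
        merge(&left, &right)
    }
    inner(nodes, depth, index, depth + 8)
}

/// Process one 8-level band: spawn a task per subtree root at `band_depth`
/// and join them, mirroring the "joined N tasks for depth D" log lines below.
fn recompute_band(
    nodes: &HashMap<(u8, u64), Digest>,
    band_depth: u8,
    subtree_roots: &[u64],
) -> Vec<(u64, Digest)> {
    std::thread::scope(|s| {
        let handles: Vec<_> = subtree_roots
            .iter()
            .map(|&idx| s.spawn(move || (idx, recompute_subtree(nodes, band_depth, idx))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Scoped threads keep the sketch dependency-free; a real implementation would more likely use a thread pool so that 500 subtree tasks don't each get their own OS thread.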

Some benchmark results on my Ryzen 7950X:

| Pairs | Parallel construction time (seconds) |
| ----- | ------------------------------------ |
| 500   | 6.206                                |
| 1000  | 25.065                               |
| 2000  | 11.60                                |
| 3000  | 369.465                              |

The total construction time grows much faster than linearly in the number of key-value pairs inserted. With some added eprintln!()s, we can also see that each individual task's time increases as we move up the tree:

```
$ cargo run --all-features --release -- --size 1000
Running a parallel construction benchmark:
joined 500 tasks for depth 56 in 4.606 milliseconds
joined 500 tasks for depth 48 in 18.616 milliseconds
joined 500 tasks for depth 40 in 41.756 milliseconds
joined 500 tasks for depth 32 in 63.146 milliseconds
joined 500 tasks for depth 24 in 109.348 milliseconds
joined 380 tasks for depth 16 in 154.977 milliseconds
joined 30 tasks for depth 8 in 1572.014 milliseconds
joined 1 tasks for depth 0 in 4205.459 milliseconds

assertion checks complete
Constructed an SMT in parallel with 1000 key-value pairs in 25.219 seconds
```

A profile indicates that we're spending almost half our time just determining whether a node needs to be recomputed, in is_index_dirty(). I'm not sure how to mitigate this while still computing fixed subtrees. We could walk up from each modified leaf and mark every node on its path as dirty, up front. However, a similar up-front computation to identify the node indices of interest was a considerable bottleneck in the previous approach to this parallelization. There may also be some heuristic we could apply to node indices to quickly estimate whether they are ancestors of a modified index, absorbing a few duplicate calculations when the estimate fails.
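To make the "walk up from each modified leaf" idea concrete, here is a hedged sketch of what that up-front marking pass could look like; the 64-level depth constant and the `(depth, index)` keying are assumptions, not the crate's actual API:

```rust
use std::collections::HashSet;

// Assumed tree depth (miden-crypto's Smt is a 64-level tree).
const TREE_DEPTH: u8 = 64;

/// Walk from each modified leaf up to the root, collecting every node index
/// on the way. A later is_index_dirty() check then becomes a set lookup.
fn mark_dirty_ancestors(modified_leaves: &[u64]) -> HashSet<(u8, u64)> {
    let mut dirty = HashSet::new();
    for &leaf in modified_leaves {
        let mut index = leaf;
        for depth in (0..=TREE_DEPTH).rev() {
            // If this node is already marked, so are all of its ancestors,
            // and we can stop walking early.
            if !dirty.insert((depth, index)) {
                break;
            }
            index /= 2;
        }
    }
    dirty
}
```

This costs at most O(k · 64) set insertions for k modified leaves (less with the early exit), which is the same kind of up-front pass that bottlenecked the earlier attempt, so whether the cheaper per-node check pays for itself is the open question.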

@bobbinth (Contributor) commented:
How do these results compare to the single-threaded construction times? Doing some back-of-the-envelope estimation, constructing a tree from 1000 key-value pairs would require about 64K hashes in the worst case. We should be able to do that many hashes in under 1 second on a single core. On 32 cores, this should be an order of magnitude faster - so 25 seconds seems way too slow. But maybe I'm missing something?
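Spelling out that worst case (assuming none of the 1000 root-to-leaf paths share ancestors; overlap near the root only lowers the count):

$$1000 \text{ leaves} \times 64 \text{ levels per path} = 64{,}000 \text{ hashes}$$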

@Qyriad (Contributor, Author) commented on Nov 15, 2024

Closing in favor of #341.

@Qyriad closed this on Nov 15, 2024.