Implements concurrent `Smt::compute_mutations` #365

krushimir · 2025-01-15T22:22:46Z

This PR introduces a concurrent implementation of Smt::compute_mutations, leveraging an approach similar to the existing parallel construction logic.

Benchmark results were collected on a 64-core (128-thread) AMD EPYC 7662 processor, with Rayon’s thread pool explicitly limited to the specified thread counts.

For context, construction benchmarks are also included for performance comparison.

1. Construction Benchmark

10k key-value pairs

Threads	Parallel Time (s)	Sequential Time (s)	Speedup
16	0.5	5.7	11.11x
32	0.4	5.7	15.22x
64	0.3	5.7	17.35x
128	0.4	5.7	16.90x

Performance peaked at 64 threads.
With 128 threads, the speedup dropped slightly (16.90x vs. 17.35x at 64 threads).

2. Batched Insertion Benchmark

10k key-value pairs

Threads	Parallel Time (ms)	Sequential Time (ms)	Speedup	Avg Insert Time (μs)
16	517.0	6308.7	12.20x	52
32	395.8	6334.5	16.00x	40
64	333.0	6321.6	18.98x	33
128	383.7	6300.7	16.42x	38

64 threads offered the best performance, reducing average insertion time to 33 μs.
Scaling beyond 64 threads led to slight performance degradation.
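The derived columns in these tables follow from the raw timings: speedup is sequential time divided by parallel time, and the average per-operation time is the parallel batch time divided by the 10k operations. A small sketch that reproduces the insertion rows (the helper name `derived` is only illustrative):

```rust
/// Derives the table's computed columns from raw timings:
/// speedup = sequential / parallel, and average time per operation in microseconds.
fn derived(parallel_ms: f64, sequential_ms: f64, batch_size: u32) -> (f64, f64) {
    let speedup = sequential_ms / parallel_ms;
    let avg_us = parallel_ms * 1000.0 / f64::from(batch_size);
    (speedup, avg_us)
}

fn main() {
    // (threads, parallel ms, sequential ms), copied from the batched insertion table.
    let rows = [
        (16, 517.0, 6308.7),
        (32, 395.8, 6334.5),
        (64, 333.0, 6321.6),
        (128, 383.7, 6300.7),
    ];
    for (threads, parallel_ms, sequential_ms) in rows {
        let (speedup, avg_us) = derived(parallel_ms, sequential_ms, 10_000);
        println!("{threads:>3} threads: {speedup:.2}x, {avg_us:.0} us per insert");
    }
}
```

Running it prints 12.20x/52, 16.00x/40, 18.98x/33 and 16.42x/38, matching the table; the same formula also reproduces the update table.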

3. Batched Update Benchmark

10k key-value pairs

Threads	Parallel Time (ms)	Sequential Time (ms)	Speedup	Avg Update Time (μs)
16	482.7	6369.8	13.20x	48
32	357.7	6351.5	17.76x	36
64	304.7	6378.5	20.93x	30
128	273.5	6418.8	23.47x	27

Unlike construction and insertion, batched updates kept scaling up to 128 threads.
128 threads achieved the fastest update speed, reducing the average time to 27 μs.

PhilippGackstatter

Looks great to me! I think the logic itself looks good. My comments are mostly about naming, docs and deduplication. I might have to take another look anyway, since I first had to understand how the Smt is implemented in sequential code 😅, so I'll just comment for now.

In general, I think adding comments to the parts of the code that are hard to follow would make it more readable.

Regarding the approach, please correct me if I've misunderstood anything, but here is how I understand it.

Assuming a tree of depth 64 with subtrees of depth 8, and mutations to just two leaves (for the sake of example) at indices 0 and 65536, compute_mutations would roughly do the following, at a high level and with some simplifying assumptions about how rayon assigns threads:

Compute subtrees that were modified. This happens in sorted_pairs_to_mutated_leaves. This would yield two subtrees, covering the column ranges 0..256 and 65536..65792.
Then in build_subtree_mutations, the subtrees are updated in parallel.
- 1st iteration:
  - Thread 0: Compute updates for leaves with indices 0..256 at depth 64, then for the nodes at depth 63 within this subtree, and so on, until this yields a new root at depth 56, column 0.
  - Thread 1: Compute updates for leaves with indices 65536..65792 at depth 64, then for the nodes at depth 63 within this subtree, and so on, until this yields a new root at depth 56, column 256 (= 65536 >> 8).
- 2nd iteration:
  - Thread 0: Compute updates for leaves with indices 0..256 at depth 56 (only root 0 has changed). Eventually this results in a new root at depth 48, column 0.
  - Thread 1: Compute updates for leaves with indices 256..512 at depth 56 (only root 256 has changed). Eventually this results in a new root at depth 48, column 1.
- 3rd iteration:
  - Thread 0: Compute updates for leaves with indices 0..256 at depth 48 (roots 0 and 1 have changed). Eventually this results in a new root at depth 40, column 0.
- More iterations like the 3rd until the root at depth 0 has been reached.
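The iteration scheme above can be traced with a small sketch. It is a hedged illustration, not code from the PR: it only tracks which node columns change at each subtree-root level (no hashing), assumes a depth-64 tree split into depth-8 subtrees as in the example, and `trace_iterations` / `group_into_subtrees` are hypothetical names. Each group in the map is one independent unit of work that could be handed to a separate thread.

```rust
use std::collections::BTreeMap;

const TREE_DEPTH: u8 = 64;
const SUBTREE_DEPTH: u8 = 8;

/// Groups changed node columns by the subtree they belong to. Each key is the
/// column of that subtree's root, SUBTREE_DEPTH levels higher.
fn group_into_subtrees(columns: &[u64]) -> BTreeMap<u64, Vec<u64>> {
    let mut groups: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
    for &col in columns {
        groups.entry(col >> SUBTREE_DEPTH).or_default().push(col);
    }
    groups
}

/// Returns, per iteration, the depth of the new subtree roots and their columns.
fn trace_iterations(leaf_indices: &[u64]) -> Vec<(u8, Vec<u64>)> {
    let mut columns = leaf_indices.to_vec();
    columns.sort_unstable();
    columns.dedup();
    let mut depth = TREE_DEPTH;
    let mut trace = Vec::new();
    while depth > 0 {
        let groups = group_into_subtrees(&columns);
        // A real implementation would hash each group (subtree) in parallel here;
        // this sketch only records which subtree roots change.
        columns = groups.keys().copied().collect();
        depth -= SUBTREE_DEPTH;
        trace.push((depth, columns.clone()));
    }
    trace
}

fn main() {
    for (depth, cols) in trace_iterations(&[0, 65536]) {
        println!("new roots at depth {depth}: columns {cols:?}");
    }
}
```

For leaf indices 0 and 65536 it yields columns [0, 256] at depth 56, then [0, 1] at depth 48 (both feeding a single subtree), then [0] at every level from depth 40 down to the root at depth 0.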

Is this accurate? Would it make sense to add something like this as a doc comment to compute_mutations_subtree (with corrections if it's inaccurate)?
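Within a single subtree, the per-level step described in the walkthrough (depth 64, then 63, and so on up to the subtree root) can be sketched as below. This is a simplified, hypothetical illustration rather than the PR's implementation: `rebuild_subtree`, `merge` and the `existing` closure are made-up names, and `DefaultHasher` merely stands in for the tree's real hash function. Untouched siblings are looked up from the current tree, which the `existing` closure models.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Toy digest type; a stand-in for the tree's real digest.
type Digest = u64;

/// Toy two-to-one hash; a stand-in for the tree's real merge function.
fn merge(left: Digest, right: Digest) -> Digest {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Recomputes one subtree bottom-up and returns (root column, root digest).
/// `changed` maps column -> new digest at `bottom_depth`; all entries must lie in
/// the same subtree. `existing(depth, column)` returns an unmodified node's digest.
fn rebuild_subtree(
    mut changed: BTreeMap<u64, Digest>,
    bottom_depth: u8,
    subtree_depth: u8,
    existing: impl Fn(u8, u64) -> Digest,
) -> (u64, Digest) {
    let mut depth = bottom_depth;
    for _ in 0..subtree_depth {
        let mut parents = BTreeMap::new();
        // Ascending order visits a left (even) child before its right sibling.
        for (&col, &digest) in &changed {
            let parent = col >> 1;
            if parents.contains_key(&parent) {
                continue; // pair was already merged via the left sibling
            }
            let sibling = col ^ 1;
            let sibling_digest = changed
                .get(&sibling)
                .copied()
                .unwrap_or_else(|| existing(depth, sibling));
            let (left, right) = if col & 1 == 0 {
                (digest, sibling_digest)
            } else {
                (sibling_digest, digest)
            };
            parents.insert(parent, merge(left, right));
        }
        changed = parents;
        depth -= 1;
    }
    assert_eq!(changed.len(), 1, "inputs spanned more than one subtree");
    changed.into_iter().next().unwrap()
}

fn main() {
    let mut changed = BTreeMap::new();
    changed.insert(65536, 1);
    // Every untouched node is treated as digest 0 in this toy example.
    let (column, root) = rebuild_subtree(changed, 64, 8, |_, _| 0);
    println!("subtree root at depth 56, column {column}: {root:#x}");
}
```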


krushimir · 2025-01-17T21:37:36Z

Benchmark results on a tree with 10M entries:

batch insertions (10k inserts):
without smt_hashmaps: 383.3 ms (~38 μs per insert)
with smt_hashmaps: 281.9 ms (~28 μs per insert)
~26% faster
concurrent vs. sequential: 17.7x faster
concurrent with smt_hashmaps vs. sequential: 24.1x faster

batch updates (10k updates):
without smt_hashmaps: 287.9 ms (~29 μs per update)
with smt_hashmaps: 265.5 ms (~27 μs per update)
~8% faster
concurrent vs. sequential: 23.6x faster
concurrent with smt_hashmaps vs. sequential: 25.6x faster
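The "~26% faster" and "~8% faster" figures correspond to the relative reduction in batch time, (before - after) / before. A quick check with the timings above (the helper name `percent_faster` is only illustrative):

```rust
/// Relative reduction in batch time, in percent.
fn percent_faster(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

fn main() {
    // Batch times (ms) for 10k operations on a 10M-entry tree, from the comment above.
    let inserts = percent_faster(383.3, 281.9);
    let updates = percent_faster(287.9, 265.5);
    println!("inserts: {inserts:.1}% faster"); // prints 26.5
    println!("updates: {updates:.1}% faster"); // prints 7.8
    // The two "vs. sequential" speedups should differ by the same factor as the
    // concurrent timings themselves (both ratios are about 1.36).
    println!("{:.2} vs {:.2}", 24.1 / 17.7, 383.3 / 281.9);
}
```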

Co-authored-by: Philipp Gackstatter <PhilippGackstatter@users.noreply.github.com>


PhilippGackstatter · 2025-01-22T09:46:41Z

Hey @krushimir, quick question: Is this still Work-In-Progress or can it be marked as ready for review?

krushimir · 2025-01-22T11:11:22Z

Hi @PhilippGackstatter, I'll push some more changes today and then I'll mark it ready.


PhilippGackstatter

Looks good to me!


polydez · 2025-01-28T13:26:40Z

src/merkle/smt/mod.rs

+
+        // Collect and sort key-value pairs by their corresponding leaf index
+        let mut sorted_kv_pairs: Vec<_> = kv_pairs.into_iter().collect();
+        sorted_kv_pairs.sort_unstable_by_key(|(key, _)| Self::key_to_leaf_index(key).value());


Should we use parallel sorting here, i.e., par_sort_unstable_by_key?

krushimir · 2025-01-29

Thanks for the tip. I benchmarked this on an EPYC 7662 (128 threads) and an M1 Pro (10 threads), and for a batch of 10K elements the benefit was unclear. Nevertheless, I pushed the change.

Also, I've applied the same improvement to the parallel tree construction; this is currently tracked here.
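The diff in polydez's comment collects the key-value pairs and sorts them by leaf index. A useful consequence is that pairs falling into the same depth-8 subtree (leaf index >> 8) end up contiguous, so subtrees can be processed as independent chunks. The sketch below assumes a hypothetical key type of four u64 limbs and a hypothetical `key_to_leaf_index` that just reads the last limb; it is not the library's actual key-to-index mapping.

```rust
/// Hypothetical key type: four u64 limbs.
type Key = [u64; 4];

/// Hypothetical stand-in for `Self::key_to_leaf_index(key).value()`;
/// this sketch simply treats the last limb as the leaf index.
fn key_to_leaf_index(key: &Key) -> u64 {
    key[3]
}

/// Collects the pairs and sorts them by leaf index.
fn sort_by_leaf_index(kv_pairs: impl IntoIterator<Item = (Key, u64)>) -> Vec<(Key, u64)> {
    let mut sorted: Vec<_> = kv_pairs.into_iter().collect();
    // With the rayon crate (`use rayon::prelude::*`), this call can be swapped for
    // `sorted.par_sort_unstable_by_key(...)` with the same closure.
    sorted.sort_unstable_by_key(|(key, _)| key_to_leaf_index(key));
    sorted
}

/// After sorting, pairs in the same depth-8 subtree are adjacent, so deduplicating
/// consecutive subtree columns yields each touched subtree exactly once, in order.
fn subtree_columns(sorted: &[(Key, u64)]) -> Vec<u64> {
    let mut cols: Vec<u64> = sorted
        .iter()
        .map(|(key, _)| key_to_leaf_index(key) >> 8)
        .collect();
    cols.dedup();
    cols
}

fn main() {
    let pairs = vec![([0, 0, 0, 65536], 1), ([0, 0, 0, 0], 2), ([0, 0, 0, 300], 3)];
    let sorted = sort_by_leaf_index(pairs);
    println!("touched subtrees: {:?}", subtree_columns(&sorted));
}
```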

bobbinth

Looks good! Thank you! I left a couple of comments inline. The main one is about code organization, i.e., potentially moving the parallel mutation functions to the Smt struct.


bobbinth · 2025-01-30T05:20:32Z

src/merkle/smt/mod.rs

+    fn compute_mutations_concurrent(
+        &self,
+        kv_pairs: impl IntoIterator<Item = (Self::Key, Self::Value)>,
+    ) -> MutationSet<DEPTH, Self::Key, Self::Value>


One thing I'm wondering (and maybe this is not a good idea): should we move all the concurrent mutation logic into the Smt struct? The thinking is that it applies primarily to the Smt use case, and while we do support it for SimpleSmts with depths that are multiples of 8, arguably such support is more confusing than helpful.

Moving the logic to the Smt struct would also make the code a bit easier to understand. Specifically, compute_mutations_concurrent(), sorted_pairs_to_mutated_subtree_leaves(), build_subtree_mutations(), and fetch_sibling_pair() would move there. The compute_mutations() method in this module would look like:

fn compute_mutations(
    &self,
    kv_pairs: impl IntoIterator<Item = (Self::Key, Self::Value)>,
) -> MutationSet<DEPTH, Self::Key, Self::Value>
where
    Self: Sized + Sync,
    Self::Key: Send + Sync,
    Self::Value: Send + Sync,
{
    self.compute_mutations_sequential(kv_pairs)
}

And then in the Smt struct we'd have the conditional compilation based on the feature flag. We could maybe encapsulate all methods related to concurrency in a separate impl block - e.g.,:

#[cfg(feature = "concurrent")]
impl Smt {
    fn compute_mutations_concurrent(
        &self,
        kv_pairs: impl IntoIterator<Item = (Self::Key, Self::Value)>,
    ) -> MutationSet<DEPTH, Self::Key, Self::Value>
    where
        Self: Sized + Sync,
        Self::Key: Send + Sync,
        Self::Value: Send + Sync,
    {
        ...
    }

    fn sorted_pairs_to_mutated_subtree_leaves(
        ...
    }
}

krushimir · 2025-01-30

I agree with your suggestion. Moving the concurrent mutation logic to the Smt struct makes sense if we're okay with the 8-depth limitation for now.
Would you like me to do the same for the parallel construction methods, or leave that for another PR?

bobbinth · 2025-01-30

If it's simple enough, we could move them in this PR as well.
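A minimal, self-contained sketch of the proposed organization, under heavy simplifying assumptions: `SparseTree` is a hypothetical stand-in for the real trait, a "mutation set" is reduced to a BTreeMap of final values per leaf index, and the concurrent body is only a placeholder. It shows the trait's default compute_mutations staying sequential, while Smt alone selects the concurrent path behind the `concurrent` feature flag.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified stand-in for the sparse Merkle tree trait.
trait SparseTree {
    fn compute_mutations_sequential(
        &self,
        kv_pairs: impl IntoIterator<Item = (u64, u64)>,
    ) -> BTreeMap<u64, u64> {
        // Later pairs overwrite earlier ones, as in a sequential pass.
        let mut mutations = BTreeMap::new();
        for (leaf, value) in kv_pairs {
            mutations.insert(leaf, value);
        }
        mutations
    }

    /// Default: every tree type computes mutations sequentially.
    fn compute_mutations(
        &self,
        kv_pairs: impl IntoIterator<Item = (u64, u64)>,
    ) -> BTreeMap<u64, u64> {
        self.compute_mutations_sequential(kv_pairs)
    }
}

struct Smt;

impl SparseTree for Smt {
    /// Only Smt opts into the concurrent path, and only with the feature enabled.
    fn compute_mutations(
        &self,
        kv_pairs: impl IntoIterator<Item = (u64, u64)>,
    ) -> BTreeMap<u64, u64> {
        #[cfg(feature = "concurrent")]
        let mutations = self.compute_mutations_concurrent(kv_pairs);
        #[cfg(not(feature = "concurrent"))]
        let mutations = self.compute_mutations_sequential(kv_pairs);
        mutations
    }
}

/// All concurrency-specific helpers live in one feature-gated impl block.
#[cfg(feature = "concurrent")]
impl Smt {
    fn compute_mutations_concurrent(
        &self,
        kv_pairs: impl IntoIterator<Item = (u64, u64)>,
    ) -> BTreeMap<u64, u64> {
        // Placeholder: the subtree-parallel algorithm would live here.
        self.compute_mutations_sequential(kv_pairs)
    }
}

fn main() {
    let mutations = Smt.compute_mutations(vec![(3, 30), (1, 10), (3, 31)]);
    println!("{mutations:?}"); // prints {1: 10, 3: 31}
}
```

Keeping the helpers in a single cfg-gated impl block means builds without the feature never compile them, and other tree types keep the plain sequential default.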

sonarqubecloud · 2025-01-30T12:17:02Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

feat: adds concurrent Smt::compute_mutations

c3bbe1c

bobbinth requested a review from PhilippGackstatter January 16, 2025 17:35

PhilippGackstatter reviewed Jan 17, 2025

chore: cleanup bench

c447c6f

chore: adds comment

f42d597

Co-authored-by: Philipp Gackstatter <PhilippGackstatter@users.noreply.github.com>

bobbinth reviewed Jan 22, 2025

src/merkle/smt/full/mod.rs (outdated)

chore: addressing comments

a76506f

krushimir marked this pull request as ready for review January 23, 2025 07:12

krushimir changed the title from "[WIP] implements concurrent Smt::compute_mutations" to "Implements concurrent Smt::compute_mutations" Jan 23, 2025

Mirko-von-Leipzig reviewed Jan 23, 2025

src/main.rs

src/merkle/smt/simple/mod.rs

src/merkle/smt/mod.rs

PhilippGackstatter reviewed Jan 23, 2025

src/merkle/smt/mod.rs (outdated)

src/merkle/smt/mod.rs (outdated)

PhilippGackstatter approved these changes Jan 23, 2025

src/main.rs (outdated)

src/merkle/smt/tests.rs

krushimir and others added 2 commits January 23, 2025 17:32

chore: update docs

ec35f28

Co-authored-by: Philipp Gackstatter <PhilippGackstatter@users.noreply.github.com>

chore: linting and addressing comments

e89daa9

krushimir force-pushed the krushimir/subtree_mutations branch from 9242cff to e89daa9 January 23, 2025 17:03

krushimir added 2 commits January 23, 2025 18:17

Merge branch 'next' into krushimir/subtree_mutations

c1bcd6d

docs: SimpleSmt::compute_mutations note

17b03a8

krushimir force-pushed the krushimir/subtree_mutations branch from 11ad605 to 17b03a8 January 27, 2025 20:26

bobbinth mentioned this pull request Jan 28, 2025

Improve performance of Smt::sorted_pairs_to_leaves #348

Open

polydez reviewed Jan 28, 2025

chore: addressing comments

1eb7769

bobbinth reviewed Jan 30, 2025

chore: change the benchmark params default values

4f6f431


krushimir commented Jan 15, 2025

PhilippGackstatter left a comment

krushimir commented Jan 17, 2025

PhilippGackstatter commented Jan 22, 2025

krushimir commented Jan 22, 2025

PhilippGackstatter left a comment

polydez Jan 28, 2025

krushimir Jan 29, 2025

bobbinth left a comment

bobbinth Jan 30, 2025

krushimir Jan 30, 2025

bobbinth Jan 30, 2025

sonarqubecloud bot commented Jan 30, 2025
