Reduce zero-padding and cache zero-hashes in k-ary MerkleTree #2407
Conversation
Signed-off-by: ljedrz <ljedrz@gmail.com>
The regressions turned out to be simpler to iron out than expected:
The empty tree case is still a bit worse than before, but it's a tiny difference measured in µs, and it is not the typical scenario. Also, that one seems likely to fluctuate: I just re-ran it (without any changes) and saw a -6% difference.
```diff
@@ -170,7 +170,7 @@ fn check_merkle_tree_depth_3_arity_3_padded<LH: LeafHash<Hash = PH::Hash>, PH: P

     // Construct the Merkle tree for the given leaves.
     let merkle_tree = KaryMerkleTree::<LH, PH, 3, ARITY>::new(leaf_hasher, path_hasher, &leaves)?;
-    assert_eq!(40, merkle_tree.tree.len());
+    assert_eq!(25, merkle_tree.tree.len());
```
Why did this change?
With this PR, the nodes that would only contain empty (padding) hashes no longer need to be allocated at the leaf level, so the tree ends up smaller; a calculation of the difference between the current number of leaves (padded for the entire leaf level) and the one this PR proposes (only padded up to the arity) can be seen here. In this particular test case (arity = 3, depth = 3, leaves = 10), 17 of the tree members at the leaf level are currently just padding; this PR takes that number down to 2, a reduction of 15, which accounts for the corresponding reduction in tree size.
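To make the arithmetic concrete, here is a small, self-contained sketch of the size calculation; the function names `old_tree_size` and `new_tree_size` are illustrative and not taken from the snarkVM codebase:

```rust
// Size of a k-ary tree where every level is fully padded (the old behavior):
// the sum of arity^i for i in 0..=depth.
fn old_tree_size(arity: usize, depth: usize) -> usize {
    (0..=depth).map(|i| arity.pow(i as u32)).sum()
}

// Size after the proposed change: the upper levels stay full, but the leaf
// level is only padded up to the next multiple of the arity.
fn new_tree_size(arity: usize, depth: usize, num_leaves: usize) -> usize {
    let upper: usize = (0..depth).map(|i| arity.pow(i as u32)).sum();
    let padded_leaves = (num_leaves + arity - 1) / arity * arity;
    upper + padded_leaves
}

fn main() {
    // The test case above: arity = 3, depth = 3, leaves = 10.
    assert_eq!(old_tree_size(3, 3), 40); // 1 + 3 + 9 + 27
    assert_eq!(new_tree_size(3, 3, 10), 25); // 1 + 3 + 9 + 12
    println!("ok");
}
```

This reproduces the `40` and `25` figures asserted in the test diff above: 27 fully padded leaves shrink to 12 (10 real leaves plus 2 padding), a reduction of 15 nodes.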
Does this change apply to the merkle-tree too (the non-k-ary one)? If so, we should optimize and profile that one as well.
I've already done that; the wins aren't as great, but I'm happy to file a PR soon if this direction is desirable.
@ljedrz is this a breaking PR change? Do the Merkle tree's expected outputs and roots change in any way?
I don't think so: hashes for the nodes containing empty hashes are calculated as if the empty hashes were there; they are simply not physically present in the tree. The number-of-leaves field is not affected either. Are there any test vectors I could double-check against?
For those interested, the equivalence tests can be found in this repo (currently private, but the AleoHQ/snarkVM team has access). A sample equivalence test:

```rust
use snarkvm_old::ledger::puzzle::Puzzle as OldPuzzle;
use snarkvm_old::ledger::store::helpers::memory::ConsensusMemory as OldMemory;
use snarkvm_old::prelude::{
    Address as OldAddress, MainnetV0 as OldNetwork, Network as OldNetworkTrait, Uniform,
};
use snarkvm_old::synthesizer::vm::VM as OldVM;

use snarkvm_new::ledger::puzzle::Puzzle as NewPuzzle;
use snarkvm_new::ledger::store::helpers::memory::ConsensusMemory as NewMemory;
use snarkvm_new::prelude::{
    Address as NewAddress, MainnetV0 as NewNetwork, Network as NewNetworkTrait,
};
use snarkvm_new::synthesizer::vm::VM as NewVM;

use anyhow::Result;
use std::str::FromStr;

const ITERATIONS: u128 = 100;

fn check_equivalency(
    old_puzzle: &OldPuzzle<OldNetwork>,
    new_puzzle: &NewPuzzle<NewNetwork>,
) -> Result<()> {
    // Initialize an RNG.
    let rng = &mut rand::thread_rng();
    // Check that the puzzles produce equivalent solutions.
    for _ in 0..ITERATIONS {
        // Generate a random epoch hash.
        let old_epoch_hash = <OldNetwork as OldNetworkTrait>::BlockHash::rand(rng);
        let new_epoch_hash =
            <NewNetwork as NewNetworkTrait>::BlockHash::from_str(&format!("{old_epoch_hash}"))?;
        // Generate a random address and counter.
        let old_address = OldAddress::<OldNetwork>::rand(rng);
        let new_address = NewAddress::<NewNetwork>::from_str(&format!("{old_address}"))?;
        let old_counter = u64::rand(rng);
        let new_counter = u64::from_str(&format!("{old_counter}"))?;
        // Produce a solution with each puzzle and compare the proof targets.
        let old_solution = old_puzzle.prove(old_epoch_hash, old_address, old_counter, None)?;
        let new_solution = new_puzzle.prove(new_epoch_hash, new_address, new_counter, None)?;
        let old_proof_target = old_puzzle.get_proof_target(&old_solution)?;
        let new_proof_target = new_puzzle.get_proof_target(&new_solution)?;
        assert_eq!(old_proof_target, new_proof_target);
    }
    Ok(())
}

#[test]
fn test_puzzles_are_equivalent() -> Result<()> {
    // Construct the old puzzle.
    let old_puzzle = OldVM::<OldNetwork, OldMemory<OldNetwork>>::new_puzzle()?;
    // Construct the new puzzle.
    let new_puzzle = NewVM::<NewNetwork, NewMemory<NewNetwork>>::new_puzzle()?;
    // Check that the old and new puzzles produce equivalent solutions.
    check_equivalency(&old_puzzle, &new_puzzle)?;
    Ok(())
}
```
Signed-off-by: ljedrz <ljedrz@gmail.com>
I'm going to check if any of these review comments apply to https://github.com/AleoHQ/snarkVM/pull/2415 as well. Edit: it seems that way, but let's finish this PR first to make sure that everything is in good shape.
As far as I can tell, this PR does not impact the completeness/soundness of the Merkle tree opening proof, because it produces Merkle tree nodes that are identical to before.
Note that due to this particular implementation, we can no longer prove membership of empty leaves outside the range of the originally supplied leaves, which seems like a desirable bonus.
Signed-off-by: ljedrz <ljedrz@gmail.com>
I'll rerun the benchmarks for this and the other PR shortly.
The updated benchmark results (skipping the
Which is a further improvement across the board.
ACK on the new commits
When creating k-ary Merkle trees, we currently pad all the "missing" leaves, which can result in a large number of empty hashes that all end up being allocated, and then repeatedly included in the calculation of further hashes. In order to improve performance for all such scenarios, we can omit some of the padding, and cache the hashes that derive from the zero hash, which is what this PR proposes.
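As a rough illustration of the caching idea described above (this is not the actual snarkVM implementation; the toy hash function and the names here are stand-ins), the hash of an all-empty subtree is the same at any given level, so it can be computed once per level and then reused wherever padding occurs:

```rust
const ARITY: usize = 3;

// Stand-in for the real path hasher: a simple FNV-style mixing fold.
fn toy_hash(children: &[u64]) -> u64 {
    children.iter().fold(0xcbf29ce484222325, |acc, c| {
        (acc ^ c).wrapping_mul(0x100000001b3)
    })
}

// Precompute the hash of a fully empty subtree at each level once,
// instead of re-hashing identical padding over and over.
fn zero_hash_cache(depth: usize, empty_leaf: u64) -> Vec<u64> {
    let mut cache = vec![empty_leaf];
    for level in 0..depth {
        let below = cache[level];
        cache.push(toy_hash(&[below; ARITY]));
    }
    cache
}

fn main() {
    let cache = zero_hash_cache(3, 0);
    // The hash of a fully empty depth-2 subtree is looked up, not recomputed.
    assert_eq!(cache[2], toy_hash(&[cache[1]; ARITY]));
    println!("ok");
}
```

With such a cache, any node whose children are all padding collapses to a single lookup, which is where the benchmark wins for partially filled trees come from.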
While there are still ways in which performance could be improved further, these changes are already non-trivial, and this seems like a good moment to open them up for discussion. Below are some benchmarks that I'll refer to when outlining the potential for future improvements.
These are all wins, with significantly lower heap use both at peak and once the tree is created.
The creation of an empty tree and of a full tree sees regressions, as those scenarios don't benefit from the caching of empty hashes. I have an idea for improving them, which I'll post as an additional commit if the diff isn't too large.