[DO NOT MERGE] benchmarks for hashing single subtrees recursively #334
Not a review yet, but looking briefly through the code, I think I had something much simpler in mind. What I think we need as a basic building block of the algorithm is constructing a logical SMT (i.e., not our specific
/// Builds a set of nodes for a Merkle tree of depth 8 from the specified set of leaves. The nodes are
/// appended to the `inner_nodes` map. The leaves are assumed to be located at the specified depth.
pub fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
    inner_nodes: &mut BTreeMap<NodeIndex, InnerNode>,
)
Once we have this working (and assuming it is efficient), we can use it to build various levels of the actual SMTs. For example, for our
|
Actually, this may not be very parallelizable since
pub fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
) -> BTreeMap<NodeIndex, InnerNode>
And then we can merge |
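For readers following along, here is a minimal sketch of what a function with the second signature above could look like. It is not the code from this PR. `Digest`, `NodeIndex`, `InnerNode`, `merge`, and `EMPTY_LEAF` are simplified stand-ins defined only for this illustration, the hash is a non-cryptographic placeholder, and the sketch assumes `leaf_depth` is the bottom of the tree so that missing siblings are empty subtrees.

```rust
use std::collections::BTreeMap;

/// Stand-in for a real hash output (assumption for this sketch).
pub type Digest = u64;

/// Assumed value of an empty leaf (hypothetical).
pub const EMPTY_LEAF: Digest = 0;

/// Number of levels built per call, as in the doc comment above.
pub const SUBTREE_DEPTH: u8 = 8;

/// Position of a node: depth from the root and index within that depth.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct NodeIndex {
    pub depth: u8,
    pub value: u64,
}

/// An inner node stores the hashes of its two children.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InnerNode {
    pub left: Digest,
    pub right: Digest,
}

/// Placeholder 2-to-1 merge function. NOT cryptographic; a real tree would hash here.
pub fn merge(left: Digest, right: Digest) -> Digest {
    left.rotate_left(5) ^ right.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ 0xA5A5
}

/// Builds the inner nodes for the 8 levels above `leaf_depth`, bottom-up.
pub fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
) -> BTreeMap<NodeIndex, InnerNode> {
    assert!(leaf_depth >= SUBTREE_DEPTH, "leaves must sit at least 8 levels deep");

    // empty[h] is the hash of an empty subtree of height h (0 = empty leaf).
    let mut empty = vec![EMPTY_LEAF];
    for h in 0..usize::from(SUBTREE_DEPTH) {
        let below = empty[h];
        empty.push(merge(below, below));
    }

    let mut inner_nodes = BTreeMap::new();
    // Collecting into a BTreeMap sorts by index; for duplicate indices the last value wins.
    let mut level: BTreeMap<u64, Digest> = leaves.into_iter().collect();

    for height in 0..SUBTREE_DEPTH {
        let depth = leaf_depth - height;
        let mut parents = BTreeMap::new();
        for (&index, &hash) in &level {
            let parent = index >> 1;
            if parents.contains_key(&parent) {
                continue; // this pair was already handled via the left sibling
            }
            // A missing sibling is an empty subtree of the current height.
            let sibling = level
                .get(&(index ^ 1))
                .copied()
                .unwrap_or(empty[usize::from(height)]);
            let (left, right) = if index & 1 == 0 { (hash, sibling) } else { (sibling, hash) };
            inner_nodes.insert(
                NodeIndex { depth: depth - 1, value: parent },
                InnerNode { left, right },
            );
            parents.insert(parent, merge(left, right));
        }
        level = parents;
    }
    inner_nodes
}
```

Because each call returns its own map rather than writing into a shared `&mut` map, independent calls could in principle run on separate threads and have their results merged afterwards, which appears to be the motivation for the second signature.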
56087c7 to 9638969
Alright, I've pushed a simpler implementation much closer to what you suggested, along with micro-benchmarks for it. This implementation takes the leaves pre-sorted, since presumably we'll only want to sort once, at the beginning. The benchmarks don't include the sort time, though I can easily change that. The benchmarks look like this:
There always seem to be several outliers, no matter how quiet I make my system. Here's the output without the outlier diagnostic noise, for easier reading:
It also turns out that I did the math for roughly-evenly distributed leaves incorrectly, for the benchmarks this PR had originally. I at first made this mistake in this new benchmark too, and was astonished to see the performance jump from microsecond figures to millisecond figures going from supposedly evenly distributed data to random data. I was accidentally generating far too many leaves with the same index, which were then getting de-duplicated. After fixing that, the even benchmarks are now in the same order of magnitude as the random ones. |
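The exact bug is not shown in this thread, so the following is only a hypothetical illustration of how "evenly distributed" leaf indices can collide, and of a cheap sanity check that would catch it before benchmarking. The function names are invented for this sketch.

```rust
use std::collections::BTreeSet;

/// Generates `count` leaf indices spread evenly over the 2^depth positions at `depth`.
/// Hypothetical helper; not taken from the PR's benchmark code.
pub fn evenly_spaced_indices(count: u64, depth: u8) -> Vec<u64> {
    assert!(depth <= 64 && count > 0);
    // Use u128 so that 2^64 positions (depth 64) does not overflow.
    let positions: u128 = 1u128 << depth;
    (0..count)
        .map(|i| (u128::from(i) * positions / u128::from(count)) as u64)
        .collect()
}

/// Counts how many distinct indices survive de-duplication.
pub fn distinct_count(indices: &[u64]) -> usize {
    indices.iter().collect::<BTreeSet<_>>().len()
}
```

If `distinct_count` comes back smaller than the number of leaves requested, the benchmark is hashing fewer leaves than intended, which would make the "even" case look misleadingly fast. Asking for more leaves than there are positions at the chosen depth is one way to trigger this.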
Thank you! A couple of follow-up questions:
How much time does it take to build a tree for a single leaf? The reason I'm asking is that the vast majority of the time we'd be building trees that have just one leaf in them. For example, assuming the leaves are randomly distributed, if we have 100M leaves, the subtrees up until depth 24 are very likely to be just single-leaf trees.
How does the timing for building a tree from 256 leaves compare to the timing for building a fully balanced MerkleTree with 256 leaves? I'm curious because the fully-balanced case should give us the lower bound on performance, as most of the time there should be spent hashing. |
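The single-leaf claim can be checked with a short calculation. The sketch below assumes leaves are placed uniformly at random, and the usage note additionally assumes a depth-64 tree and reads "depth 24" as subtrees covering the bottom 24 levels (so their roots sit at depth 40). Neither assumption is stated in the thread. With n leaves and subtree roots at depth d, the chance that a given leaf shares its subtree with at least one other leaf is 1 - (1 - 2^-d)^(n-1).

```rust
/// Probability that a given leaf shares its subtree with at least one other leaf,
/// when `num_leaves` leaves are placed uniformly at random and subtree roots
/// sit at `root_depth` (so there are 2^root_depth possible subtrees).
pub fn shared_subtree_probability(num_leaves: u64, root_depth: u32) -> f64 {
    let slots = 2f64.powi(root_depth as i32);
    let others = num_leaves.saturating_sub(1) as f64;
    // 1 - (1 - 1/slots)^others, computed with ln_1p / exp_m1 to keep
    // precision when the probability is tiny.
    -(others * (-1.0 / slots).ln_1p()).exp_m1()
}
```

Under those assumptions, `shared_subtree_probability(100_000_000, 40)` comes out on the order of 1e-4, so nearly every subtree of that height would contain a single leaf. For depth-8 subtrees at the bottom of such a tree (roots at depth 56) the probability is smaller still.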
Good questions! I'll find out! |
And here are the results:
|
41 microseconds for a single-leaf case is pretty good! A bit surprising though that hashing a fully-balanced 256-leaf tree is about 4x more efficient than building a subtree with 256 leaves (I was thinking it'd be closer to 2x). I think this is fine for now and we can definitely optimize this more in the future (let's create an issue for this). The next step would be to use this method as a building block for building a full tree. |
This code is a benchmark for hashing a single depth-8 subtree of a sparse Merkle tree by recursively establishing child hashes. It is adapted from work-in-progress code that uses this method to compute subtrees in parallel.
Raw results, with the number on the left indicating the number of new key-value pairs:
The time it takes to hash a subtree increases linearly with the number of key-value pairs being added to the tree as a whole.