
Compute expected ancestry proof size #6

Closed · wants to merge 5 commits

Conversation

@serban300 commented on Aug 16, 2024:

Related to: paritytech/polkadot-sdk#4523

This is a follow-up to paritytech/polkadot-sdk#5188

The main purpose of this PR is to add logic for computing the expected ancestry proof size.

This PR also aligns some functions in the ancestry_proof.rs file with their counterparts in mmr.rs, which should make it easier to upstream the ancestry proof logic in the future.
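
For illustration, here is a minimal sketch of how such a check could be used on the verifier side to reject padded proofs before doing any hashing work; the wrapper function and error string are hypothetical, not the PR's actual API:

```rust
// Hypothetical verifier-side guard built on the PR's
// `expected_ancestry_proof_size(prev_mmr_size, mmr_size) -> usize`.
fn check_ancestry_proof_size(
    prev_mmr_size: u64,
    mmr_size: u64,
    proof_items_len: usize,
) -> Result<(), &'static str> {
    let expected = expected_ancestry_proof_size(prev_mmr_size, mmr_size);
    // Only optimal-size proofs are accepted; anything larger (or smaller)
    // is rejected up front.
    if proof_items_len != expected {
        return Err("non-optimal ancestry proof");
    }
    Ok(())
}
```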

@serban300 (Author) commented:

Converted to draft. Need to check some test failures

@serban300 serban300 changed the title Accept only optimal ancestry proofs Compute expected ancestry proof size Aug 17, 2024
@serban300 serban300 marked this pull request as ready for review August 17, 2024 12:23
@serban300 (Author) commented:

> Converted to draft. Need to check some test failures

Fixed

@acatangiu (Collaborator) commented:

cc @Lederstrumpf

@acatangiu (Collaborator) left a review comment:

high-level looks good, would like an audit too for peace of mind

@serban300 serban300 requested a review from ordian August 20, 2024 12:32
@ordian (Member) left a review comment:

Since I'm new to the code, I left a couple of questions, mainly about algorithmic complexity.
I still need some time to understand why this change preserves validity.


```rust
let mut sibs_processed_from_back = Vec::new();
{
    if !queue.insert_sorted_by(value, cmp_nodes) {
```
@ordian (Member) commented:

Previously, the queue was sorted by pos first, and now it's (height, pos).
My understanding is that the nodes will be sorted by pos, so this algorithm might be running in $\mathcal{O}(n^2)$, is that correct?

@serban300 (Author) commented:

In the worst-case scenario we want to prove all the nodes. Let's say the number of nodes is n. The complexity of inserting into an ordered queue is O(log(n)), so I think the total complexity would be O(n * log(n)).

But if we want to prove just one node, we need to start from it and go up until we reach the root. Let's say the number of leaves is l. The height of the MMR is log(l), and the number of nodes in the proof is at most the height of the MMR (we need at most one node per level), so the queue length is at most log(l). So the complexity should be O(log(l) * log(l)).

I hope I'm not missing anything.

@ordian (Member) commented:

> The complexity of inserting into an ordered queue is O(log(n)).

This is not true. Insertion at a random position in a VecDeque is $\mathcal{O}(n)$, because if you insert in the middle, you need to shift every element to the right. If you want to guarantee logarithmic complexity, I'd suggest using a BTreeSet instead of a VecDeque.
But if the number of nodes in the proof is always bounded by a small number, that's probably fine as is. Just want to make sure we are not introducing a DoS vector in the worst case.
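
To make the trade-off concrete, here is a minimal self-contained sketch; the `insert_sorted` helper is hypothetical, standing in for the `insert_sorted_by` used in the PR:

```rust
use std::collections::{BTreeSet, VecDeque};

// Hypothetical stand-in for `insert_sorted_by`: the binary search for the
// insertion index is O(log n), but `VecDeque::insert` then shifts every
// element after that index, making the whole call O(n).
fn insert_sorted(queue: &mut VecDeque<u64>, value: u64) -> bool {
    match queue.binary_search(&value) {
        Ok(_) => false, // value already queued
        Err(idx) => {
            queue.insert(idx, value);
            true
        }
    }
}

fn main() {
    let mut queue = VecDeque::new();
    for v in [5, 1, 3, 3] {
        insert_sorted(&mut queue, v);
    }
    assert_eq!(queue.iter().copied().collect::<Vec<_>>(), vec![1, 3, 5]);

    // A BTreeSet keeps entries sorted and deduplicated with O(log n)
    // insertion, avoiding the element-shifting cost entirely.
    let set: BTreeSet<u64> = [5, 1, 3, 3].into_iter().collect();
    assert_eq!(set.into_iter().collect::<Vec<_>>(), vec![1, 3, 5]);
}
```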

@serban300 (Author) commented:

Good point, you're right. It's O(n) because it's a vec-based structure; I missed that.

For our use case (ancestry proofs), yes, the number of nodes in the proof should be small, but we should probably make this scalable in general. I will change it to a BTreeSet or a LinkedList.

@ordian (Member) commented on Aug 27, 2024:

LinkedList would still have $\mathcal{O}(n)$ for finding the insertion position, so the right choice is BTreeSet or BinaryHeap 👍🏽

@serban300 (Author) commented on Aug 27, 2024:

> LinkedList would still have $\mathcal{O}(n)$ for finding the insertion position

If it's sorted I think it should have O(log(n)). But yes, BTreeSet would have O(log(n)); it's only that for small numbers it would be worse than a sorted contiguous structure, and usually the number of nodes should be small.
BinaryHeap is also a good option.

@serban300 (Author) commented:

Or sorry, in a linked list it's not that easy to take advantage of the fact that it's sorted. You're right.

@serban300 (Author) commented:

Yes, BTreeSet/BTreeMap seems to be the best option. Done.

```rust
@@ -321,3 +294,41 @@ fn calculate_root<
    let peaks_hashes = calculate_peaks_hashes::<_, M, _>(nodes, mmr_size, proof_iter)?;
    bagging_peaks_hashes::<_, M>(peaks_hashes)
}

pub fn expected_ancestry_proof_size(prev_mmr_size: u64, mmr_size: u64) -> usize {
```
@ordian (Member) commented:

Given the title of the PR, this seems like the main change in the PR. However, the rest seems like a totally unrelated algorithm rewrite/refactoring. Any reason to mix the two in one PR? Or is it not a refactoring, and this function actually depends on it?

Also, what is the complexity of this function? (It seems non-trivial to guesstimate given the nested loops.)

@serban300 (Author) commented on Aug 26, 2024:

> Given the title of the PR, this seems like the main change in the PR.

Yes, exactly.

> However, the rest seems like a totally unrelated algorithm rewrite/refactoring. Any reason to mix the two in one PR? Or is it not a refactoring, and this function actually depends on it?

Yes, it's just a refactoring; no particular reason, I just did it for readability. I tried to follow the previous implementation to understand how to compute the expected proof size, and it was very hard, so I thought it might be helpful to refactor it as well. Also, if we're doing a security audit for this PR, it would be good to cover both changes. And we'll need to upstream ancestry proofs at some point, and this change helps there, because it makes the code more similar to the upstream.

> Also, what is the complexity of this function?

Worst-case scenario: for each peak we need to walk from a node inside it up to the peak height, so it should be something like O(num_peaks * log(num_leaves)).
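
For intuition on the num_peaks factor: each peak of an MMR corresponds to one set bit in the leaf count (one perfect mountain per 1-bit), so num_peaks is itself bounded by roughly log2 of the number of leaves. A tiny illustrative check follows; the helper name is ours, not the crate's:

```rust
// Each 1-bit in the leaf count contributes one perfect mountain (peak),
// so the peak count is the population count of the number of leaves.
fn num_peaks(num_leaves: u64) -> u32 {
    num_leaves.count_ones()
}

fn main() {
    assert_eq!(num_peaks(0b1011), 3); // 11 leaves -> mountains of 8 + 2 + 1
    assert_eq!(num_peaks(16), 1); // a single perfect mountain
}
```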

@ordian (Member) left a review comment:

Still don't fully grasp the details (probably best to ask @Lederstrumpf), but let's kick off the audit.

Comment on lines +291 to +325:

```rust
let mut expected_proof_size: usize = 0;
let mut prev_peaks = get_peaks(prev_mmr_size);
let peaks = get_peaks(mmr_size);

for (peak_idx, peak) in peaks.iter().enumerate() {
    let mut local_prev_peaks: Vec<u64> = take_while_vec(&mut prev_peaks, |pos| *pos <= *peak);

    // Pop lowest local prev peak under the current peak
    match local_prev_peaks.pop() {
        Some(mut node) => {
            let mut height = pos_height_in_tree(node);
            while node < *peak {
                if pos_height_in_tree(node + 1) > height {
                    // Node is right sibling
                    node += 1;
                    height += 1;
                } else {
                    // Node is left sibling
                    expected_proof_size += 1;
                    node += parent_offset(height);
                    height += 1;
                }
            }
        }
        None => {
            if peak_idx <= peaks.len() {
                // Account for rhs bagging peaks
                expected_proof_size += 1;
            }
            break;
        }
    }
}

expected_proof_size
```
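
For readers new to the crate, here are minimal sketches of the two position helpers the loop above relies on; these are simplified re-derivations, and the crate's actual implementations may differ in detail:

```rust
/// Height of the node at 0-based MMR position `pos`.
/// Standard trick: in 1-based numbering, a node whose binary
/// representation is all ones is a leftmost node of its height, so we
/// repeatedly jump left until we hit one, then read the height off the
/// bit length.
fn pos_height_in_tree(mut pos: u64) -> u32 {
    pos += 1; // switch to 1-based positions
    let all_ones = |n: u64| n != 0 && n.count_zeros() == n.leading_zeros();
    while !all_ones(pos) {
        let highest_bit = 1u64 << (63 - pos.leading_zeros());
        pos -= highest_bit - 1; // jump to the leftmost node at this height
    }
    64 - pos.leading_zeros() - 1
}

/// Offset from a left sibling at height `height` to its parent: 2^(height + 1).
fn parent_offset(height: u32) -> u64 {
    2 << height
}

fn main() {
    // In a 4-leaf mountain, positions 0 and 1 are leaves, 2 is their
    // parent, and 6 is the peak.
    assert_eq!(pos_height_in_tree(0), 0);
    assert_eq!(pos_height_in_tree(2), 1);
    assert_eq!(pos_height_in_tree(6), 2);
    assert_eq!(parent_offset(0), 2); // leaf 0's parent sits at position 2
}
```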

A reviewer commented:

This works by checking the frequency of left-sibling-hood when iterating up from the last prev_peak.
In some sense, it's a myopic climber ascending a dark mountain and feeling out its shape from their relative location :P
The mountain peaks here are the co-path items for proving said last prev_peak.
We can also determine the exact shape of the mountain from just knowing its width (i.e. the number of leaves beneath it), though, and skip the climbing step.

I've opened up #7 to show the changes, but feel free to cherry-pick the new algorithm over here instead.

@serban300 (Author) commented:

Thank you! Sounds good! I'll take a look.

@serban300 (Author) commented:

Closing in favor of #7

@serban300 serban300 closed this Aug 30, 2024