Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixmmr part2 #3666

Merged
merged 50 commits into from
Nov 26, 2021
Merged

Fixmmr part2 #3666

merged 50 commits into from
Nov 26, 2021

Conversation

tromp
Copy link
Contributor

@tromp tromp commented Nov 20, 2021


name: fixmmr_part2
about: Pull Request checklist
title: 'Fixmmr part2
labels: ''
assignees: ''


The MMR (Merkle Mountain Range) was originally designed with 0-based leaf and node indices.
This is reflected in the consensus model since hashes are computed with a 0-based node index prepended to the leaf data or the two child hashes.
0-based indices also simplify conversion of a leaf to a node index, which becomes
2 * leaf_index - count_ones(leaf_index).
The other C++ Grin code base, grinplusplus, uses 0-bases indices exclusively.
Unfortunately, 1-based indices have crept into the Grin code base resulting in a mix of both.
Use of 1-based indices has the downside that we often need to test that an index argument isn't 0.
This PR tries to convert as many uses of 1-based indices as possible to 0-based indices.
A few instances remain in storage related code where bitmaps and leaf sets are read from and written to disk.
The changes are not consensus breaking.
Each changed method is in its own commit, except for some pairs of related methods.
Tested with a regular sync as well as a sync from scratch.

tromp added 30 commits November 13, 2021 18:31
…tree_postorder_height; add round_up_to_leaf_pos method
@tromp tromp requested a review from yeastplume November 25, 2021 07:51
Copy link
Member

@yeastplume yeastplume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, spent a good while looking over changes here as well as coming to terms with them. It's a very large PR but obviously needs to be giving how sweeping a simple change to insertion_to_pmmr_index is :D - A few general thoughts:

  • It's unfortunate that we can't universally replace everything with 0-indexed pmmrs (for obvious reasons,) but at least this pushes a lot of the +/- 1 conversions out towards the extremities where the code is simpler and less likely to cause confusion.
  • I really wish we had more tests and data test sets, particularly ones that exercise changes against older versions of chain data. Most of our PMMR-related tests work on trivially small, easy to understand constructed sample data instead of exercising them against live data volumes, which means much more manual testing is required for changes like these than there should be.
  • Also really wish the API module had a series of doctests, similar to how the wallet APIs.
  • A lot of manual testing thus required before this can go live. However, despite our automated testing deficiencies, there will be much manual testing over the next few months, (particularly with ongoing PIBD work,) so there is plenty of time to shake out any issues this might introduce.

A few comments throughout, but nothing major I could find, with the caveat that there's a lot here to digest so that's no guarantee there aren't any issues..

Thanks for doing all of this.. A lot of fiddly work but it should hopefully make future work less confusing for myself and anyone else who picks up any work on this.

let header_pmmr = self.header_pmmr.read();
let txhashset = self.txhashset.read();
txhashset::utxo_view(&header_pmmr, &txhashset, |utxo, _| {
utxo.get_unspent_output_at(pos)
utxo.get_unspent_output_at(pos0)
})
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There is a lot to review here, so some of this comments will be more about me keeping track of what I've looked at than feedback)
Looked at all uses of get_unspent_output_at in chain.rs, seems to be covered.

@@ -1272,7 +1272,7 @@ impl Chain {
let txhashset = self.txhashset.read();
let last_index = match max_pmmr_index {
Some(i) => i,
None => txhashset.highest_output_insertion_index(),
None => txhashset.output_mmr_size(),
};
let outputs = txhashset.outputs_by_pmmr_index(start_index, max_count, max_pmmr_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change the name of this to outputs_by_pmmr_pos and change the names of the arguments to be start_pos, max_pos. Same with rangeproof_by_pmmr_index since we settled on terminology, it's even more confusing now when they're mixed up. We should be scrubbing the term 'pmmr index' wherever it appears anywhere (unless it's in an external API)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pmmr in outputs_by_pmmr_pos is redundant. I personally like leaf index vs node index. And then we can use position as a one word alternative to node index.

@@ -1301,13 +1301,14 @@ impl Chain {
None => self.head_header()?.height,
};
// Return headers at the given heights
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

block_height_range_to_pmmr_positions for this fn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with either block_height_range_to_node_indices or block_height_range_to_positions.

.output_mmr_size + 1
};
let end_mmr_size = self.get_header_by_height(end_block_height)?.output_mmr_size;
Ok((start_mmr_size, end_mmr_size))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked logic change seems okay

@@ -92,6 +92,7 @@ impl BitmapAccumulator {
let mut idx_iter = idx.into_iter().filter(|&x| x < size).peekable();
while let Some(x) = idx_iter.peek() {
if *x < chunk_idx * 1024 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tangental, but we should probably use BitmapChunk::LEN_BITS here instead of 1024 for clarity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will change

@@ -28,7 +28,7 @@ where
B: Backend<T>,
{
/// The last position in the PMMR
last_pos: u64,
size: u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's potential for confusion here now.. e.g. the last pos in a zero-indexed mmr with 4 leaves is 6 but its size is 7 (they were the same in a 1-indexed pmmr). Is there an implicit conversion somewhere when this is called from a higher up (1-based) API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said above, we now equate size with next available position.
We can't be troubled by implicit conversion since using a size as position requires calling one of the affected methods.

// I don't see how 1 + 0-based index corresponds to a completed size
// E.g. output_mmr_count==3 gives size 5
block.header.output_mmr_size = 1 + pmmr::insertion_to_pmmr_index(prev.output_mmr_count());
block.header.kernel_mmr_size = 1 + pmmr::insertion_to_pmmr_index(prev.kernel_mmr_count());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearly wrong; (output_mmr_count==3 size should be 7 with last position being 6). probably want to check why this is passing at some stage, possibly the test data is constructed so that outputs being appended only result in odd numbers of leaves? Also feeling like size != position could cause more ambiguity reading the code in places, but that might just be me.

Copy link
Contributor Author

@tromp tromp Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code works correctly if and only if output_mmr_count is even. An even leaf index corresponds to a left-child, so the size of the mmr after its addition is just its node index + 1.

Ok; here's the WEIRDEST thing: fixing the code amount to keeping the original code:-)
Although non-sensical for 1-based positions, that code was actually correct for 0-base positions!

size is position, but it's the position of the next node to be added, rather than the last node added.
this has the advantage that it also works for an empty MMR, where the old rule had an exception.

let elmt_pos = self.last_pos + 1;
let mut current_hash = elmt.hash_with_index(elmt_pos - 1);
pub fn push(&mut self, leaf: &T) -> Result<u64, String> {
let leaf_pos = self.size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, think I'm seeing it now, thinking about sizes in terms of size n containing indices 0..n-1, so when you're adding something to the mmr you're simply using index 'size' (as you would with any array) instead of 'last_pos' + 1 in the 1-indexed world. Maybe there was some thought earlier that 1 indexing meant last_position was always equal to size and therefore easier to reason about? (Just thinking aloud, doesn't matter).

/// is defined by size.
/// NOTE this function and the Path struct is UNUSED and
/// mostly redundant as family_branch has similar functionality
pub fn path(pos0: u64, size: u64) -> impl Iterator<Item = u64> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could probably just remove them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok; will do

pub fn includes(&self, pos: u64) -> bool {
self.bitmap.contains(pos as u32)
pub fn includes(&self, pos0: u64) -> bool {
self.bitmap.contains(1 + pos0 as u32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maddening not being able to change the bitmap indexing

Copy link
Contributor Author

@tromp tromp Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts exactly:-!
We can change the bitmap indexing, but then we need to put in stubs for the bitmap reading and writing functions to do the conversion, at some runtime cost...

Copy link
Member

@yeastplume yeastplume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and also meant to add that we should probably also manually test this against a set of chain data created by the last release version of grin.

@yeastplume yeastplume changed the title [WIP] Fixmmr part2 Fixmmr part2 Nov 26, 2021
@yeastplume yeastplume merged commit 53414ae into mimblewimble:master Nov 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants