Fixmmr part2 #3666

tromp · 2021-11-20T22:41:15Z

The MMR (Merkle Mountain Range) was originally designed with 0-based leaf and node indices.
This is reflected in the consensus model since hashes are computed with a 0-based node index prepended to the leaf data or the two child hashes.
0-based indices also simplify conversion of a leaf index to a node index, which becomes
2 * leaf_index - count_ones(leaf_index).
The other Grin code base, grinplusplus (written in C++), uses 0-based indices exclusively.
Unfortunately, 1-based indices have crept into the Grin code base, resulting in a mix of both.
Use of 1-based indices has the downside that we often need to test that an index argument isn't 0.
This PR tries to convert as many uses of 1-based indices as possible to 0-based indices.
A few instances remain in storage related code where bitmaps and leaf sets are read from and written to disk.
The changes are not consensus breaking.
Each changed method is in its own commit, except for some pairs of related methods.
Tested with a regular sync as well as a sync from scratch.
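
As a minimal sketch of the leaf-to-node conversion quoted in the description (the function name `leaf_to_node_index` is illustrative and is not taken from the Grin code base), the formula can be written directly in Rust:

```rust
/// Returns the 0-based node index (position) of the leaf with the given
/// 0-based leaf index, using 2 * leaf_index - count_ones(leaf_index).
/// The name is hypothetical, chosen for this example only.
fn leaf_to_node_index(leaf_index: u64) -> u64 {
    // count_ones() returns u32, so widen it before subtracting.
    2 * leaf_index - u64::from(leaf_index.count_ones())
}
```

For the first five leaves this gives positions 0, 1, 3, 4 and 7; the skipped positions 2, 5 and 6 are the parent nodes.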

yeastplume

Okay, spent a good while looking over the changes here as well as coming to terms with them. It's a very large PR, but it obviously needs to be, given how sweeping a simple change to insertion_to_pmmr_index is :D A few general thoughts:

It's unfortunate that we can't universally replace everything with 0-indexed pmmrs (for obvious reasons,) but at least this pushes a lot of the +/- 1 conversions out towards the extremities where the code is simpler and less likely to cause confusion.
I really wish we had more tests and data test sets, particularly ones that exercise changes against older versions of chain data. Most of our PMMR-related tests work on trivially small, easy to understand constructed sample data instead of exercising them against live data volumes, which means much more manual testing is required for changes like these than there should be.
Also really wish the API module had a series of doctests, similar to what the wallet APIs have.
A lot of manual testing is thus required before this can go live. However, despite our automated testing deficiencies, there will be much manual testing over the next few months (particularly with ongoing PIBD work), so there is plenty of time to shake out any issues this might introduce.

A few comments throughout, but nothing major I could find, with the caveat that there's a lot here to digest, so that's no guarantee there aren't any issues.

Thanks for doing all of this. A lot of fiddly work, but it should hopefully make future work less confusing for myself and anyone else who picks up any work on this.

yeastplume · 2021-11-25T10:56:47Z

chain/src/chain.rs

 		let header_pmmr = self.header_pmmr.read();
 		let txhashset = self.txhashset.read();
 		txhashset::utxo_view(&header_pmmr, &txhashset, |utxo, _| {
-			utxo.get_unspent_output_at(pos)
+			utxo.get_unspent_output_at(pos0)
 		})
 	}



(There is a lot to review here, so some of these comments will be more about me keeping track of what I've looked at than feedback.)
Looked at all uses of get_unspent_output_at in chain.rs; they all seem to be covered.

yeastplume · 2021-11-25T11:05:44Z

chain/src/chain.rs

@@ -1272,7 +1272,7 @@ impl Chain {
 		let txhashset = self.txhashset.read();
 		let last_index = match max_pmmr_index {
 			Some(i) => i,
-			None => txhashset.highest_output_insertion_index(),
+			None => txhashset.output_mmr_size(),
 		};
 		let outputs = txhashset.outputs_by_pmmr_index(start_index, max_count, max_pmmr_index);


Can we change the name of this to outputs_by_pmmr_pos and change the argument names to start_pos and max_pos? Same with rangeproof_by_pmmr_index: since we settled on terminology, it's even more confusing now when the terms are mixed up. We should be scrubbing the term 'pmmr index' wherever it appears (unless it's in an external API).

I think the pmmr in outputs_by_pmmr_pos is redundant. I personally like leaf index vs node index. And then we can use position as a one word alternative to node index.

yeastplume · 2021-11-25T11:08:19Z

chain/src/chain.rs

@@ -1301,13 +1301,14 @@ impl Chain {
 			None => self.head_header()?.height,
 		};
 		// Return headers at the given heights


Suggested name for this fn: block_height_range_to_pmmr_positions

I'd go with either block_height_range_to_node_indices or block_height_range_to_positions.

yeastplume · 2021-11-25T11:10:20Z

chain/src/chain.rs

+				.output_mmr_size + 1
+		};
+		let end_mmr_size = self.get_header_by_height(end_block_height)?.output_mmr_size;
+		Ok((start_mmr_size, end_mmr_size))


Checked the logic change; seems okay.

yeastplume · 2021-11-25T11:32:07Z

chain/src/txhashset/bitmap_accumulator.rs

@@ -92,6 +92,7 @@ impl BitmapAccumulator {
 		let mut idx_iter = idx.into_iter().filter(|&x| x < size).peekable();
 		while let Some(x) = idx_iter.peek() {
 			if *x < chunk_idx * 1024 {


Tangential, but we should probably use BitmapChunk::LEN_BITS here instead of 1024 for clarity.

ok, will change

yeastplume · 2021-11-25T13:03:33Z

core/src/core/pmmr/readonly_pmmr.rs

@@ -28,7 +28,7 @@ where
 	B: Backend<T>,
 {
 	/// The last position in the PMMR
-	last_pos: u64,
+	size: u64,


I'm wondering if there's potential for confusion here now, e.g. the last pos in a zero-indexed mmr with 4 leaves is 6 but its size is 7 (they were the same in a 1-indexed pmmr). Is there an implicit conversion somewhere when this is called from a higher-up (1-based) API?

Like I said above, we now equate size with next available position.
We can't be troubled by implicit conversion since using a size as position requires calling one of the affected methods.

yeastplume · 2021-11-25T13:13:19Z

chain/tests/process_block_cut_through.rs

+		// I don't see how 1 + 0-based index corresponds to a completed size
+		// E.g. output_mmr_count==3 gives size 5
+		block.header.output_mmr_size = 1 + pmmr::insertion_to_pmmr_index(prev.output_mmr_count());
+		block.header.kernel_mmr_size = 1 + pmmr::insertion_to_pmmr_index(prev.kernel_mmr_count());


This is clearly wrong (with output_mmr_count==3, the size should be 7, with the last position being 6). We probably want to check why this is passing at some stage; possibly the test data is constructed so that the outputs being appended only result in odd numbers of leaves? Also feeling like size != position could cause more ambiguity when reading the code in places, but that might just be me.

The current code works correctly if and only if output_mmr_count is even. An even leaf index corresponds to a left-child, so the size of the mmr after its addition is just its node index + 1.

Ok; here's the WEIRDEST thing: fixing the code amounts to keeping the original code :-)
Although nonsensical for 1-based positions, that code was actually correct for 0-based positions!

Size is a position, but it's the position of the next node to be added, rather than of the last node added.
This has the advantage that it also works for an empty MMR, where the old rule had an exception.
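
The parity argument above can be checked with a short sketch. It assumes only the formula from the PR description; `leaf_to_pos0`, `mmr_size` and `size_via_last_leaf` are hypothetical stand-ins written for this example, not the Grin functions themselves.

```rust
/// Stand-in for the 0-based leaf-to-position conversion from the PR
/// description: 2 * leaf_index - count_ones(leaf_index).
fn leaf_to_pos0(leaf_index: u64) -> u64 {
    2 * leaf_index - u64::from(leaf_index.count_ones())
}

/// Size of an MMR holding `leaf_count` leaves. Because size is the position
/// of the next node to be added, it equals the 0-based position that leaf
/// number `leaf_count` would occupy. This also yields 0 for an empty MMR.
fn mmr_size(leaf_count: u64) -> u64 {
    leaf_to_pos0(leaf_count)
}

/// The expression questioned in the test: one more than the position of the
/// leaf being appended. It ignores any parent nodes created by the append,
/// so it matches the real size only when `prev_leaf_count` is even.
fn size_via_last_leaf(prev_leaf_count: u64) -> u64 {
    1 + leaf_to_pos0(prev_leaf_count)
}
```

With 3 existing leaves, `size_via_last_leaf(3)` yields 5 while `mmr_size(4)` yields 7, which is the discrepancy noted in the test comment.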

yeastplume · 2021-11-25T13:27:53Z

core/src/core/pmmr/pmmr.rs

-		let elmt_pos = self.last_pos + 1;
-		let mut current_hash = elmt.hash_with_index(elmt_pos - 1);
+	pub fn push(&mut self, leaf: &T) -> Result<u64, String> {
+		let leaf_pos = self.size;


Okay, think I'm seeing it now, thinking about sizes in terms of size n containing indices 0..n-1, so when you're adding something to the mmr you're simply using index 'size' (as you would with any array) instead of 'last_pos' + 1 in the 1-indexed world. Maybe there was some thought earlier that 1 indexing meant last_position was always equal to size and therefore easier to reason about? (Just thinking aloud, doesn't matter).
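
To illustrate the "index `size`, as with any array" reading, here is a toy sketch that tracks only the shape of an MMR (no hashes, no backend). `ToyMmr` and its fields are hypothetical and much simpler than the real PMMR; the sketch assumes that appending the leaf with 0-based leaf index n adds one parent per trailing 1-bit of n.

```rust
/// Toy model of an MMR that records only node counts.
struct ToyMmr {
    /// Number of nodes so far; also the 0-based position of the next node.
    size: u64,
    /// Number of leaves so far; also the 0-based index of the next leaf.
    leaf_count: u64,
}

impl ToyMmr {
    fn new() -> Self {
        ToyMmr { size: 0, leaf_count: 0 }
    }

    /// Appends a leaf and returns its 0-based position.
    fn push(&mut self) -> u64 {
        // The new leaf lands at index `size`, with no "+ 1" adjustment.
        let leaf_pos = self.size;
        // Each trailing 1-bit of the leaf index completes a subtree,
        // which adds one parent node after the leaf.
        let parents = u64::from(self.leaf_count.trailing_ones());
        self.size += 1 + parents;
        self.leaf_count += 1;
        leaf_pos
    }
}
```

Pushing four leaves returns positions 0, 1, 3 and 4 and leaves `size` at 7, matching the 4-leaf example discussed in the review.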

yeastplume · 2021-11-25T13:32:37Z

core/src/core/pmmr/pmmr.rs

+/// is defined by size.
+/// NOTE this function and the Path struct is UNUSED and
+/// mostly redundant as family_branch has similar functionality
+pub fn path(pos0: u64, size: u64) -> impl Iterator<Item = u64> {


Could probably just remove them

ok; will do

yeastplume · 2021-11-25T13:43:16Z

store/src/leaf_set.rs

-	pub fn includes(&self, pos: u64) -> bool {
-		self.bitmap.contains(pos as u32)
+	pub fn includes(&self, pos0: u64) -> bool {
+		self.bitmap.contains(1 + pos0 as u32)


Maddening not being able to change the bitmap indexing

My thoughts exactly:-!
We can change the bitmap indexing, but then we need to put in stubs for the bitmap reading and writing functions to do the conversion, at some runtime cost...
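
A rough sketch of the boundary conversion being discussed: the stored set keeps 1-based positions while callers pass 0-based ones. `ToyLeafSet` is hypothetical and uses a `BTreeSet<u32>` as a stand-in for the real bitmap type; only the `1 + pos0 as u32` shift is taken from the diff.

```rust
use std::collections::BTreeSet;

/// Toy leaf set whose stored values stay 1-based, as on disk.
struct ToyLeafSet {
    /// Stand-in for the bitmap; holds 1-based positions.
    bitmap: BTreeSet<u32>,
}

impl ToyLeafSet {
    fn new() -> Self {
        ToyLeafSet { bitmap: BTreeSet::new() }
    }

    /// Records a leaf given its 0-based position.
    fn add(&mut self, pos0: u64) {
        // `as` binds tighter than `+`, so this is 1 + (pos0 as u32).
        self.bitmap.insert(1 + pos0 as u32);
    }

    /// Checks membership given a 0-based position.
    fn includes(&self, pos0: u64) -> bool {
        self.bitmap.contains(&(1 + pos0 as u32))
    }
}
```

Keeping the shift inside `add` and `includes` means the stored values never contain 0 and the serialized form is unchanged, at the cost of one addition per call.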

yeastplume

Oh, and also meant to add that we should probably also manually test this against a set of chain data created by the last release version of grin.

tromp added 30 commits November 13, 2021 18:31

use 0-based positions in methods pmmr_leaf_to_insertion_index and bin…

ce81ac8

…tree_postorder_height; add round_up_to_leaf_pos method

use 0-based positions in method insertion_to_pmmr_index

d5b4774

use 0-based positions in method is_leaf

f129836

use 0-based positions in method family()

a0a4a23

use 0-based positions in method is_left_sibling

9ad6230

use 0-based positions in method family_branch

8ff56d0

use 0-based positions in methods bintree_{left,right}most

b6228cf

use 0-based positions in method bintree_pos_iter

9bf04a9

use 0-based positions in method bintree_range

d25dcee

use 0-based positions in method bintree_leaf_pos_iter

347d60b

rename last_pos in MMR related structs to size

c27a318

use 0-based positions in method prune

d59706c

use 0-based positions in method push and apply_output return value

d9a6c72

use 0-based position argument of method merkle_proof

7bb0a32

use 0-based outputs in method pmmr::peaks

2ca45b3

fix peaks() code comments

32cb55f

refix peaks() code comments

65ac0e1

use 0-based positions in method get_peak_from_file

d2d6c5b

use 0-based positions in methods get_data_from_file

7906770

use 0-based positions in methods get_from_file

d8f9711

use 0-based positions in methods get_data

5f2addd

use 0-based positions in methods get_hash

85db5b2

use 0-based positions in method peak_path

8350c6c

use 0-based positions in method bag_the_rhs

18a554f

use 0-based positions in method Backend::remove

8b88c13

use 0-based positions in method leaf_pos_iter

0be50ea

use 0-based positions in method self.LeafSet::includes

0d8c858

use 0-based positions in methods self.LeafSet::{add,remove}

9f5a6d1

use 0-based positions in methods is_pruned,is_pruned_root,is_compacted

6d22f78

use 0-based positions in methods PruneList::append

c58a901

tromp added 19 commits November 19, 2021 11:15

use 0-based positions in methods append_pruned_subtree

30512a8

use 0-based positions in method calculate_next_leaf_shift

b2ee363

use 0-based positions in method append_single

b863fce

use 0-based positions in method calculate_next_shift

1ab3156

use 0-based positions in method segment_pos_range

515b56e

use 0-based positions in method reconstruct_root

62bc732

use 0-based positions in method validate_with

c6f5a43

use 0-based positions in method validate

7615674

rename size (formerly last_pos) to mmr_size

88d6ad3

use 0-based positions in Segment's hash_pos and leaf_pos

4b30e72

minimize use of saturating_sub(1) and rename some pos/idx to size

7461dbb

use 0-based positions in methods get_output_pos

1421814

use 0-based positions in method get_unspent_output_at

f2a223e

use 0-based positions in method get_header_hash

3842bb8

use 0-based positions in methods MerkleProof::verify{,_consume}

1c947dd

use 0-based positions in method cleanup_subtree

01cc88e

don't allow 0 in prunelist bitmap

23db87b

use 0-based positions in methods get_{,leaf_}shift

57a5554

rename some 1-based pos to pos1; identify TODO

d348d6f

tromp requested a review from yeastplume November 25, 2021 07:51

yeastplume approved these changes Nov 25, 2021

yeastplume reviewed Nov 25, 2021

Address yeastplume's PR review comments

c38a913

yeastplume changed the title ~~[WIP] Fixmmr part2~~ Fixmmr part2 Nov 26, 2021

yeastplume merged commit 53414ae into mimblewimble:master Nov 26, 2021

yeastplume mentioned this pull request Nov 29, 2021

[PIBD] PMMR Desegmenter Structure (Pt. 1) #3667

Merged
