WIP: Add all necessary sled trees to zebra-state as defined in RFC0005 #1172

yaahc · 2020-10-16T21:36:26Z

Motivation

At least for the near future, zebrad will not have the full test infrastructure
that we need to properly test new code paths that only get called on blocks
created after the sapling network upgrade, such as the script::Verifier. As a
result we are likely going to rely heavily on manual testing to verify new
features work as intended. We will then improve the speed with which we can
manually test zebra by backing up a copy of the zebra-state sled database
synced up to the sapling activation height, where we can easily reset by wiping
out the current sled database and replacing it with the backed up copy.

This plan however requires makes changes to the sled database format costly.
Each time we add a new tree to the sled database the backed up database must be
deleted and re-created from genesis. To alleviate this issue we need to get
sled as close to the final format as we can as quickly as possible.

Solution

This change adds the expected sled trees as defined in the RFC. At the time of
writing the precise set of trees needed is a known unknown. Once we've finished
implementing incremental merkle trees however we should be able to determine
the proper format of the rest of the trees needed and include them in this PR,
hopefully bringing us to our final stable sled database format.

As part of this PR I've introduced a set of helper traits for consolidating the
definitions for how our types are stored and retrieved from sled. This should
help improve auditability by isolating related logic to one part of the file,
and ensuring that we use a single definition for the conversion across
sled_state.rs.

Related Issues

Tracking Issue

Test Plan

The introduction of IntoSled and FromSled makes it possible to unit test
our conversion formats in a way that wasn't possible before. Now it should be
relatively trivial to add a set of proptests that test each type that has both
an IntoSled and a FromSled implementation to verify that the round trip
conversion always returns the correct value.

Move serialization helpers into a submodule of sled_state.rs
Add prop tests for each type that implements both IntoSled and FromSled

TODOs

Add a version number to the path (e.g., ~/.cache/zebra/v1/mainnet/), detect when an older version is present, and delete it

Reviews

rebase on main and force-push, so GitHub's Commits and Files Changed only show changes from this PR

Bugs

If the block is a genesis block, skip any transaction updates. (Due to a bug in zcashd, genesis block transactions are ignored during validation.)
- Note: this fix requires a state version increment, because states containing genesis transactions allow some transactions that zcashd rejects.

Incomplete

For each TransparentInput::PrevOut { outpoint, .. } in the transaction's inputs(), remove outpoint from utxo_by_output
Insert (transaction_hash, BE32(block_height) || BE32(tx_index)) to tx_by_hash
For each JoinSplit description in the transaction, insert (nullifiers[0],()) and (nullifiers[1],()) into sprout_nullifiers
For each Spend description in the transaction, insert (nullifier,()) into sapling_nullifiers

When we can compute anchors:

Add sprout_anchors and sapling_anchors in FinalizedState

teor2345 · 2020-10-19T01:01:47Z

Once we've finished
implementing incremental merkle trees however we should be able to determine
the proper format of the rest of the trees needed and include them in this PR,
hopefully bringing us to our final stable sled database format.

I'm not sure how long our database format will stay the same. It's probably going to be stable for at least 6 months. But future network upgrades could change that. And if we fix a data bug, we might need to rebuild the database from Genesis.

What's the plan for future database format upgrades / bug fixes?

Should we also add some kind of database file or tree path versioning as part of this PR?

Or should we schedule database versioning in a future alpha/beta release, before the first stable?
(Then we'd have more time to write an RFC for database versioning and updates.)

hdevalence · 2020-10-22T17:09:31Z

The plan for future database format changes is to blow away the database and start over. Because the database is a cache of chain state, we don't lose any information by doing so, and it's much simpler and more reliable than trying to perform a migration. Adding a version number to the path (e.g., ~/.cache/zebra/v1/mainnet/) seems like a good idea, since it allows zebra-state to detect when an older version is present and delete it.

That said, it would be nice to minimize the frequency of changes to the database format, which I think is the goal of this PR.

teor2345 · 2020-10-26T11:47:57Z

Adding a version number to the path (e.g., ~/.cache/zebra/v1/mainnet/) seems like a good idea, since it allows zebra-state to detect when an older version is present and delete it.

I have added this task to the TODO list for this PR.

teor2345 · 2020-10-26T11:50:41Z

Can Zebra compute sprout and sapling anchors yet?

If not, can we exclude them from the scope of this PR and the first alpha release?
We can open a tracking issue for follow-up.

hdevalence · 2020-10-26T19:51:35Z

I don't think it's important for this PR that we check whether an older version is present and delete it, or implement any version handling at all. We just need to have a version separator in the path, so that it's possible to implement that in later changes.

I think it would be sufficient to define a

const SLED_FORMAT_VERSION: u32 = 0;

and change config.rs so that instead of

let path = self.cache_dir.join(net_dir).join("state");

we do

let path = self.cache_dir
    .join("state")
    .join(format!("v{}", SLED_FORMAT_VERSION))
    .join(net_dir);

(I changed the order of the path entries in a way that made more sense to me, so that someone who opens the cache directory sees state/v0/mainnet or state/v3/testnet etc. I think that this makes more sense than putting state last.)

Then, in a later change, we can implement logic that handles different versions, but that work doesn't need to block this change.

teor2345 · 2020-10-26T21:13:42Z

Can Zebra compute sprout and sapling anchors yet?

If not, can we exclude them from the scope of this PR and the first alpha release?
We can open a tracking issue for follow-up.

These tracking items are currently in the state tracking issue.

teor2345 · 2020-10-26T21:19:04Z

Then, in a later change, we can implement logic that handles different versions, but that work doesn't need to block this change.

I opened #1213 for the "delete previous state versions" implementation.

teor2345 · 2020-10-27T10:26:19Z

@yaahc can you please rebase on main and force-push, so GitHub's Commits and Files Changed only show changes from this PR?

teor2345 · 2020-10-27T10:27:50Z

I've added the following TODOs to this PR, because they impact the state format (or state validity):

In commit_finalized_direct, if the block is a genesis block, skip any transaction updates. (Due to a bug in zcashd, genesis block transactions are ignored during validation.)
- Note: this fix requires a state version increment, because states containing genesis transactions allow some transactions that zcashd rejects.

When we can compute anchors:
- Add sprout_anchor and sapling_anchor to QueuedBlock
- Add sprout_anchors and sapling_anchors to FinalizedState

teor2345 · 2020-10-27T10:47:02Z

Actually there are a few more changes, see the TODO list in the PR summary for details.

hdevalence · 2020-10-27T20:01:20Z

Picking this up before finishing the request/response implementation because I'm concerned about merge conflicts.

Also removes a redundant test.

hdevalence · 2020-10-27T21:31:13Z

I rebased this onto main, and pushed the original contents as sled-trees-1, which can be compared against sled-trees-2 (pushed onto this branch) here: sled-trees-1...sled-trees-2

hdevalence · 2020-10-27T21:35:17Z

@teor2345 I noticed that your TODO list included inserting sprout and sapling nullifiers, and indexing each transaction. I think those items were already included in @yaahc's commits, so I checked them off -- did you think that there was some missing functionality on those three items?

teor2345 · 2020-10-27T22:49:53Z

@teor2345 I noticed that your TODO list included inserting sprout and sapling nullifiers, and indexing each transaction. I think those items were already included in @yaahc's commits, so I checked them off -- did you think that there was some missing functionality on those three items?

I was working off main, because there were multiple in-progress PRs.

If they're done in this PR, that's all I wanted to double-check.

hdevalence · 2020-10-28T00:09:43Z

@teor2345 Great, I just wanted to check to make sure that there wasn't an issue identified with the existing code. I think that all of the TODO items are addressed, except for the ones that are to be done later (when we can compute anchors).

teor2345 · 2020-10-28T00:17:41Z

@teor2345 Great, I just wanted to check to make sure that there wasn't an issue identified with the existing code. I think that all of the TODO items are addressed, except for the ones that are to be done later (when we can compute anchors).

So to be clear, the current database format is v1...
Will we merge this PR, and switch to v2 when we can compute anchors?
Or will we wait until we can compute anchors, finish this PR, and stay on v1?

yaahc · 2020-10-28T20:58:39Z

Will we merge this PR, and switch to v2 when we can compute anchors?
Or will we wait until we can compute anchors, finish this PR, and stay on v1?

I think we should do the former

teor2345 · 2020-10-29T10:15:39Z

Will we merge this PR, and switch to v2 when we can compute anchors?
Or will we wait until we can compute anchors, finish this PR, and stay on v1?

I think we should do the former

@hdevalence what do you think about merging this PR as-is?

teor2345 · 2020-10-29T10:22:01Z

zebra-client/src/lib.rs

-
-#[cfg(test)]
-mod tests {
-    #[test]
-    fn it_works() {
-        assert_eq!(2 + 2, 4);
-    }
-}


This change could go in a separate PR, if you'd like - along with the identical change in zebra-rpc/src/lib.rs

teor2345 · 2020-10-29T10:25:17Z

zebra-state/src/sled_state.rs

-            // sprout_nullifiers: db.open_tree(b"sprout_nullifiers").unwrap(),
-            // sapling_nullifiers: db.open_tree(b"sapling_nullifiers").unwrap(),
+            sprout_nullifiers: db.open_tree(b"sprout_nullifiers").unwrap(),
+            sapling_nullifiers: db.open_tree(b"sapling_nullifiers").unwrap(),
            debug_stop_at_height: config.debug_stop_at_height.map(block::Height),


Suggested change

debug_stop_at_height: config.debug_stop_at_height.map(block::Height),

// TODO: sprout and sapling anchors

debug_stop_at_height: config.debug_stop_at_height.map(block::Height),

teor2345 · 2020-10-29T10:25:28Z

zebra-state/src/sled_state.rs

-        // total_flushed += self.sprout_nullifiers.flush()?;
-        // total_flushed += self.sapling_nullifiers.flush()?;
+        total_flushed += self.sprout_nullifiers.flush()?;
+        total_flushed += self.sapling_nullifiers.flush()?;



Suggested change

// TODO: sprout and sapling anchors

teor2345 · 2020-10-29T10:27:11Z

zebra-state/src/sled_state.rs

+                    // TODO: sprout and sapling anchors (per block)

+                    // Consensus-critical bug in zcashd: transactions in the
+                    // genesis block are ignored.
+                    if block.header.previous_block_hash == block::Hash([0; 32]) {
+                        return Ok(hash);
+                    }


~~I think the sprout anchor from the genesis block is also ignored by zcashd, but let's confirm that.~~

Edit: there are no sprout anchors until block 100, due to the coinbase rules.

@dconnolly do we only need to store one sprout anchor per block?
Or is every sprout treestate, after every transaction, a valid sprout anchor for future transactions?

Hypothetically, if there were Sapling anchors at genesis, zcashd would ignore them.

I don't know what Zebra should do with both the sprout and sapling edge cases, see PR #1232 for a potential RFC update.

teor2345

I have some minor suggestions about comments and PR structure, which shouldn't block merging.

We also need to work out if the genesis sprout anchor is valid in zcashd, but we shouldn't block the PR on finding that out. (I suspect that it isn't, as zcashd doesn't do a lot of the normal block processing on genesis blocks.)

Edit: there are no sprout anchors until block 100, due to the coinbase rules.

yaahc marked this pull request as draft October 16, 2020 21:36

yaahc self-assigned this Oct 22, 2020

yaahc force-pushed the sled-trees branch from ee3284d to 6d83b69 Compare October 23, 2020 21:55

yaahc changed the base branch from main to state-sync-reads October 23, 2020 21:55

hdevalence force-pushed the state-sync-reads branch from 1e15f7a to d2dd26c Compare October 24, 2020 00:36

Base automatically changed from state-sync-reads to download-set October 25, 2020 00:09

hdevalence force-pushed the download-set branch from 7b8f8f7 to a689832 Compare October 26, 2020 18:55

Base automatically changed from download-set to main October 26, 2020 19:05

yaahc marked this pull request as ready for review October 26, 2020 20:46

teor2345 mentioned this pull request Oct 26, 2020

Detect when an older state version is present, and delete it #1213

Closed

4 tasks

yaahc requested a review from a team October 26, 2020 23:58

teor2345 mentioned this pull request Oct 27, 2020

Tracking: State Update RFC Implementation #1049

Closed

12 tasks

yaahc added 4 commits October 27, 2020 14:07

state: create SledSerialize, SledDeserialize traits

be2a873

state: insert into nullifier trees

Verified

This commit was signed with the committer’s verified signature. The key has expired.

MylesBorins Myles Borins

GPG key ID: 933B01F40B5CA946
Expired

Verified
Learn about vigilant mode

e46291b

clippy: remove generated tests in empty crates

8ceeb29

state: add SLED_FORMAT_VERSION prefix to db path

207562d

Also removes a redundant test.

hdevalence force-pushed the sled-trees branch from 5f03e23 to 9f5314e Compare October 27, 2020 21:28

state: clean up SledState query impls

85d2645

hdevalence force-pushed the sled-trees branch from 9f5314e to 85d2645 Compare October 27, 2020 21:29

hdevalence added 2 commits October 27, 2020 14:55

state: remove spent UTXOs in commit_finalized_direct

d9d7289

state: skip transactions in a genesis block

d547ea1

teor2345 reviewed Oct 29, 2020

View reviewed changes

teor2345 approved these changes Oct 29, 2020

View reviewed changes

teor2345 mentioned this pull request Oct 29, 2020

State RFC: Clarify genesis sprout anchors #1232

Merged

hdevalence merged commit 68b9a70 into main Oct 29, 2020

hdevalence deleted the sled-trees branch October 29, 2020 16:58

yaahc mentioned this pull request Nov 5, 2020

Tracking: Finalized State Implementation #1249

Closed

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Add all necessary sled trees to zebra-state as defined in RFC0005 #1172

WIP: Add all necessary sled trees to zebra-state as defined in RFC0005 #1172

yaahc commented Oct 16, 2020 •

edited by hdevalence

Loading

teor2345 commented Oct 19, 2020

hdevalence commented Oct 22, 2020

teor2345 commented Oct 26, 2020

teor2345 commented Oct 26, 2020

hdevalence commented Oct 26, 2020 •

edited

Loading

teor2345 commented Oct 26, 2020

teor2345 commented Oct 26, 2020 •

edited

Loading

teor2345 commented Oct 27, 2020

teor2345 commented Oct 27, 2020

teor2345 commented Oct 27, 2020

hdevalence commented Oct 27, 2020

hdevalence commented Oct 27, 2020

hdevalence commented Oct 27, 2020

teor2345 commented Oct 27, 2020

hdevalence commented Oct 28, 2020

teor2345 commented Oct 28, 2020 •

edited

Loading

yaahc commented Oct 28, 2020

teor2345 commented Oct 29, 2020

teor2345 Oct 29, 2020 •

edited

Loading

teor2345 Oct 29, 2020

teor2345 Oct 29, 2020

teor2345 Oct 29, 2020 •

edited

Loading

teor2345 Oct 29, 2020

teor2345 Oct 29, 2020

teor2345 left a comment •

edited

Loading

	debug_stop_at_height: config.debug_stop_at_height.map(block::Height),
	// TODO: sprout and sapling anchors
	debug_stop_at_height: config.debug_stop_at_height.map(block::Height),

WIP: Add all necessary sled trees to zebra-state as defined in RFC0005 #1172

WIP: Add all necessary sled trees to zebra-state as defined in RFC0005 #1172

Conversation

yaahc commented Oct 16, 2020 • edited by hdevalence Loading

Motivation

Solution

Related Issues

Test Plan

TODOs

Reviews

Bugs

Incomplete

teor2345 commented Oct 19, 2020

hdevalence commented Oct 22, 2020

teor2345 commented Oct 26, 2020

teor2345 commented Oct 26, 2020

hdevalence commented Oct 26, 2020 • edited Loading

teor2345 commented Oct 26, 2020

teor2345 commented Oct 26, 2020 • edited Loading

teor2345 commented Oct 27, 2020

teor2345 commented Oct 27, 2020

teor2345 commented Oct 27, 2020

hdevalence commented Oct 27, 2020

hdevalence commented Oct 27, 2020

hdevalence commented Oct 27, 2020

teor2345 commented Oct 27, 2020

hdevalence commented Oct 28, 2020

teor2345 commented Oct 28, 2020 • edited Loading

yaahc commented Oct 28, 2020

teor2345 commented Oct 29, 2020

teor2345 Oct 29, 2020 • edited Loading

Choose a reason for hiding this comment

teor2345 Oct 29, 2020

Choose a reason for hiding this comment

teor2345 Oct 29, 2020

Choose a reason for hiding this comment

teor2345 Oct 29, 2020 • edited Loading

Choose a reason for hiding this comment

teor2345 Oct 29, 2020

Choose a reason for hiding this comment

teor2345 Oct 29, 2020

Choose a reason for hiding this comment

teor2345 left a comment • edited Loading

Choose a reason for hiding this comment

yaahc commented Oct 16, 2020 •

edited by hdevalence

Loading

hdevalence commented Oct 26, 2020 •

edited

Loading

teor2345 commented Oct 26, 2020 •

edited

Loading

teor2345 commented Oct 28, 2020 •

edited

Loading

teor2345 Oct 29, 2020 •

edited

Loading

teor2345 Oct 29, 2020 •

edited

Loading

teor2345 left a comment •

edited

Loading