Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add all necessary sled trees to zebra-state as defined in RFC0005 #1172

Merged
merged 7 commits into from
Oct 29, 2020

Conversation

yaahc
Copy link
Contributor

@yaahc yaahc commented Oct 16, 2020

Motivation

At least for the near future, zebrad will not have the full test infrastructure
that we need to properly test new code paths that only get called on blocks
created after the sapling network upgrade, such as the script::Verifier. As a
result we are likely going to rely heavily on manual testing to verify new
features work as intended. We will then improve the speed with which we can
manually test zebra by backing up a copy of the zebra-state sled database
synced up to the sapling activation height, where we can easily reset by wiping
out the current sled database and replacing it with the backed up copy.

This plan however requires makes changes to the sled database format costly.
Each time we add a new tree to the sled database the backed up database must be
deleted and re-created from genesis. To alleviate this issue we need to get
sled as close to the final format as we can as quickly as possible.

Solution

This change adds the expected sled trees as defined in the RFC. At the time of
writing the precise set of trees needed is a known unknown. Once we've finished
implementing incremental merkle trees however we should be able to determine
the proper format of the rest of the trees needed and include them in this PR,
hopefully bringing us to our final stable sled database format.

As part of this PR I've introduced a set of helper traits for consolidating the
definitions for how our types are stored and retrieved from sled. This should
help improve auditability by isolating related logic to one part of the file,
and ensuring that we use a single definition for the conversion across
sled_state.rs.

Related Issues

Test Plan

The introduction of IntoSled and FromSled makes it possible to unit test
our conversion formats in a way that wasn't possible before. Now it should be
relatively trivial to add a set of proptests that test each type that has both
an IntoSled and a FromSled implementation to verify that the round trip
conversion always returns the correct value.

  • Move serialization helpers into a submodule of sled_state.rs
  • Add prop tests for each type that implements both IntoSled and FromSled

TODOs

  • Add a version number to the path (e.g., ~/.cache/zebra/v1/mainnet/), detect when an older version is present, and delete it

Reviews

  • rebase on main and force-push, so GitHub's Commits and Files Changed only show changes from this PR

Bugs

  • If the block is a genesis block, skip any transaction updates. (Due to a bug in zcashd, genesis block transactions are ignored during validation.)
    • Note: this fix requires a state version increment, because states containing genesis transactions allow some transactions that zcashd rejects.

Incomplete

  • For each TransparentInput::PrevOut { outpoint, .. } in the transaction's inputs(), remove outpoint from utxo_by_output
  • Insert (transaction_hash, BE32(block_height) || BE32(tx_index)) to tx_by_hash
  • For each JoinSplit description in the transaction, insert (nullifiers[0],()) and (nullifiers[1],()) into sprout_nullifiers
  • For each Spend description in the transaction, insert (nullifier,()) into sapling_nullifiers

When we can compute anchors:

  • Add sprout_anchors and sapling_anchors in FinalizedState

@yaahc yaahc marked this pull request as draft October 16, 2020 21:36
@teor2345
Copy link
Contributor

Once we've finished
implementing incremental merkle trees however we should be able to determine
the proper format of the rest of the trees needed and include them in this PR,
hopefully bringing us to our final stable sled database format.

I'm not sure how long our database format will stay the same. It's probably going to be stable for at least 6 months. But future network upgrades could change that. And if we fix a data bug, we might need to rebuild the database from Genesis.

What's the plan for future database format upgrades / bug fixes?

Should we also add some kind of database file or tree path versioning as part of this PR?

Or should we schedule database versioning in a future alpha/beta release, before the first stable?
(Then we'd have more time to write an RFC for database versioning and updates.)

@hdevalence
Copy link
Contributor

The plan for future database format changes is to blow away the database and start over. Because the database is a cache of chain state, we don't lose any information by doing so, and it's much simpler and more reliable than trying to perform a migration. Adding a version number to the path (e.g., ~/.cache/zebra/v1/mainnet/) seems like a good idea, since it allows zebra-state to detect when an older version is present and delete it.

That said, it would be nice to minimize the frequency of changes to the database format, which I think is the goal of this PR.

@yaahc yaahc self-assigned this Oct 22, 2020
@yaahc yaahc changed the base branch from main to state-sync-reads October 23, 2020 21:55
Base automatically changed from state-sync-reads to download-set October 25, 2020 00:09
@teor2345
Copy link
Contributor

Adding a version number to the path (e.g., ~/.cache/zebra/v1/mainnet/) seems like a good idea, since it allows zebra-state to detect when an older version is present and delete it.

I have added this task to the TODO list for this PR.

@teor2345
Copy link
Contributor

Can Zebra compute sprout and sapling anchors yet?

If not, can we exclude them from the scope of this PR and the first alpha release?
We can open a tracking issue for follow-up.

Base automatically changed from download-set to main October 26, 2020 19:05
@hdevalence
Copy link
Contributor

hdevalence commented Oct 26, 2020

I don't think it's important for this PR that we check whether an older version is present and delete it, or implement any version handling at all. We just need to have a version separator in the path, so that it's possible to implement that in later changes.

I think it would be sufficient to define a

const SLED_FORMAT_VERSION: u32 = 0;

and change config.rs so that instead of

let path = self.cache_dir.join(net_dir).join("state");

we do

let path = self.cache_dir
    .join("state")
    .join(format!("v{}", SLED_FORMAT_VERSION))
    .join(net_dir);

(I changed the order of the path entries in a way that made more sense to me, so that someone who opens the cache directory sees state/v0/mainnet or state/v3/testnet etc. I think that this makes more sense than putting state last.)

Then, in a later change, we can implement logic that handles different versions, but that work doesn't need to block this change.

@yaahc yaahc marked this pull request as ready for review October 26, 2020 20:46
@teor2345
Copy link
Contributor

Can Zebra compute sprout and sapling anchors yet?

If not, can we exclude them from the scope of this PR and the first alpha release?
We can open a tracking issue for follow-up.

These tracking items are currently in the state tracking issue.

@teor2345
Copy link
Contributor

teor2345 commented Oct 26, 2020

Then, in a later change, we can implement logic that handles different versions, but that work doesn't need to block this change.

I opened #1213 for the "delete previous state versions" implementation.

@yaahc yaahc requested a review from a team October 26, 2020 23:58
@teor2345
Copy link
Contributor

@yaahc can you please rebase on main and force-push, so GitHub's Commits and Files Changed only show changes from this PR?

@teor2345
Copy link
Contributor

I've added the following TODOs to this PR, because they impact the state format (or state validity):

  • In commit_finalized_direct, if the block is a genesis block, skip any transaction updates. (Due to a bug in zcashd, genesis block transactions are ignored during validation.)
    • Note: this fix requires a state version increment, because states containing genesis transactions allow some transactions that zcashd rejects.

When we can compute anchors:
- Add sprout_anchor and sapling_anchor to QueuedBlock
- Add sprout_anchors and sapling_anchors to FinalizedState

@teor2345
Copy link
Contributor

Actually there are a few more changes, see the TODO list in the PR summary for details.

@hdevalence
Copy link
Contributor

Picking this up before finishing the request/response implementation because I'm concerned about merge conflicts.

@hdevalence
Copy link
Contributor

I rebased this onto main, and pushed the original contents as sled-trees-1, which can be compared against sled-trees-2 (pushed onto this branch) here: sled-trees-1...sled-trees-2

@hdevalence
Copy link
Contributor

@teor2345 I noticed that your TODO list included inserting sprout and sapling nullifiers, and indexing each transaction. I think those items were already included in @yaahc's commits, so I checked them off -- did you think that there was some missing functionality on those three items?

@teor2345
Copy link
Contributor

@teor2345 I noticed that your TODO list included inserting sprout and sapling nullifiers, and indexing each transaction. I think those items were already included in @yaahc's commits, so I checked them off -- did you think that there was some missing functionality on those three items?

I was working off main, because there were multiple in-progress PRs.

If they're done in this PR, that's all I wanted to double-check.

@hdevalence
Copy link
Contributor

@teor2345 Great, I just wanted to check to make sure that there wasn't an issue identified with the existing code. I think that all of the TODO items are addressed, except for the ones that are to be done later (when we can compute anchors).

@teor2345
Copy link
Contributor

teor2345 commented Oct 28, 2020

@teor2345 Great, I just wanted to check to make sure that there wasn't an issue identified with the existing code. I think that all of the TODO items are addressed, except for the ones that are to be done later (when we can compute anchors).

So to be clear, the current database format is v1...
Will we merge this PR, and switch to v2 when we can compute anchors?
Or will we wait until we can compute anchors, finish this PR, and stay on v1?

@yaahc
Copy link
Contributor Author

yaahc commented Oct 28, 2020

Will we merge this PR, and switch to v2 when we can compute anchors?
Or will we wait until we can compute anchors, finish this PR, and stay on v1?

I think we should do the former

@teor2345
Copy link
Contributor

Will we merge this PR, and switch to v2 when we can compute anchors?
Or will we wait until we can compute anchors, finish this PR, and stay on v1?

I think we should do the former

@hdevalence what do you think about merging this PR as-is?

Comment on lines -4 to -11

#[cfg(test)]
mod tests {
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
}
Copy link
Contributor

@teor2345 teor2345 Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change could go in a separate PR, if you'd like - along with the identical change in zebra-rpc/src/lib.rs

// sprout_nullifiers: db.open_tree(b"sprout_nullifiers").unwrap(),
// sapling_nullifiers: db.open_tree(b"sapling_nullifiers").unwrap(),
sprout_nullifiers: db.open_tree(b"sprout_nullifiers").unwrap(),
sapling_nullifiers: db.open_tree(b"sapling_nullifiers").unwrap(),
debug_stop_at_height: config.debug_stop_at_height.map(block::Height),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debug_stop_at_height: config.debug_stop_at_height.map(block::Height),
// TODO: sprout and sapling anchors
debug_stop_at_height: config.debug_stop_at_height.map(block::Height),

// total_flushed += self.sprout_nullifiers.flush()?;
// total_flushed += self.sapling_nullifiers.flush()?;
total_flushed += self.sprout_nullifiers.flush()?;
total_flushed += self.sapling_nullifiers.flush()?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: sprout and sapling anchors

Comment on lines +244 to +250
// TODO: sprout and sapling anchors (per block)

// Consensus-critical bug in zcashd: transactions in the
// genesis block are ignored.
if block.header.previous_block_hash == block::Hash([0; 32]) {
return Ok(hash);
}
Copy link
Contributor

@teor2345 teor2345 Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the sprout anchor from the genesis block is also ignored by zcashd, but let's confirm that.

Edit: there are no sprout anchors until block 100, due to the coinbase rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dconnolly do we only need to store one sprout anchor per block?
Or is every sprout treestate, after every transaction, a valid sprout anchor for future transactions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hypothetically, if there were Sapling anchors at genesis, zcashd would ignore them.

I don't know what Zebra should do with both the sprout and sapling edge cases, see PR #1232 for a potential RFC update.

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor suggestions about comments and PR structure, which shouldn't block merging.

We also need to work out if the genesis sprout anchor is valid in zcashd, but we shouldn't block the PR on finding that out. (I suspect that it isn't, as zcashd doesn't do a lot of the normal block processing on genesis blocks.)

Edit: there are no sprout anchors until block 100, due to the coinbase rules.

@hdevalence hdevalence merged commit 68b9a70 into main Oct 29, 2020
@hdevalence hdevalence deleted the sled-trees branch October 29, 2020 16:58
@yaahc yaahc mentioned this pull request Nov 5, 2020
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants