Chain State Design #682

teor2345 · 2020-07-18T11:44:48Z

Update: See #763 for the latest design.

This description is outdated:

Our chain state data structures need to support verification, and chain reorganisation.

Here's one possible design, which uses immutable, (partially) shared data structures:

Zebra holds the chain states for the last ~100 blocks in memory. (The zcashd reorg limit is 100 blocks behind the current best tip.) That way, chain reorgs are quick and reliable. Take the chain state of the fork root block, update the chain state using the alternate child block, and change the best tip.

We delay writing the chain state to disk until we know it's permanent. That way, reorgs won't touch the disk. (And we never write orphaned or redundant chain states to disk.)

Here's a summary of the overall design:

Memory contains a tree of recent verified blocks, and each block's state.

Disk has a linear chain of older blocks by height, and the state for the latest linear block.

Disk also has a pool of recent blocks, indexed by hash (not height, because recent blocks can have duplicate heights)

Relaunches should be quick, because we only need to replay the last 100 blocks on top of the persistent state. Plus whatever new blocks we get from the network.

There are a few different kinds of disk updates:

add a block to the set of blocks on disk (by hash)

append a block hash to the chain on disk (by height)

add or remove a hash from the set of recent blocks (add by hash, remove by height)

update the current chain state atomically

most state changes will modify or delete entries, but some of the state is append-only

Open Questions:

which module is responsible for the chain state?
- the state data structures and basic operations can go in zebra-state
- zebra-consensus makes the decision about the best tip, and when to write state to disk

yaahc · 2020-07-18T18:26:25Z

Take the chain state of the fork root block, update the chain state using the alternate child block, and change the best tip.

Memory contains a tree of recent verified blocks, and each block's state.

What do we mean by "state" in these lines?

yaahc · 2020-07-18T19:11:44Z

I'm not sure I'm understanding the design for the in memory chain state so let me try describing my understanding of how I figured it would work and we can figure out how close it is to your design.

Assuming the state service is storing the last 100 heights worth of blocks in memory and not commiting them to the disk here's how I'd expect it to be represented and modified.

state service stores a vec of "chains"
each chain is an im ordered map where the key = the height and the value is the block (or maybe the blocks hash)
each block add will find the chain where the parent block lives, how that search is done doesn't matter because the number and length of the chains is likely small
- if the ancestor is the tip of its chain the block is just appended to that chain
- if the ancestor is not the tip of its chain then the chain up to and including that ancestor is cloned and inserted to the vec of chains as a new chain and the new block is appended to that new chain.
after each update we sort the vec of chains by height
after the sort we take the first block of the longest chain, if it is now over 100 blocks long, and commit that to the state layer
we then pop that block from the head of all other chains and discard any chains that don't start with that block.

teor2345 · 2020-07-19T03:14:27Z

What do we mean by "state" in these lines?

The info we need to verify the next block and all its transactions.

And any info we need to do any other tasks, such as wallet support. (In later releases.)

yaahc · 2020-07-19T03:23:13Z

okay but you say the "the chain state for each block" , so does that mean just "the entire chain that leads up to a given block"

teor2345 · 2020-07-19T04:03:49Z

You described the kind of design I was thinking of. But there are a few tweaks that make the algorithm and data structures a bit simpler.

each chain is an im ordered map where the key = the height and the value is the block (or maybe the blocks hash)

I don't think we need to explicitly track heights or lists of blocks.

We can key blocks by hash, because the parent hash field points to a unique block. This map is what I called the "block pool".

(If we subtract one from the height to find the parent height, there can be multiple blocks at that height. But we just want the unique parent block. And we don't need to check the child block's height at this stage, because it has already been checked as part of verification.)

The map's state value contains whatever information we need to verify a child block. For example, the unspent transaction outputs and note commitment tree.

We might also keep a copy of the data we use to calculate the time and difficulty medians, but we could also iterate back through the parent blocks for that info. (Unlike the transaction state, the depth of the medians is limited.)

each block add will find the chain where the parent block lives, how that search is done doesn't matter because the number and length of the chains is likely small

if the ancestor is the tip of its chain the block is just appended to that chain

if the ancestor is not the tip of its chain then the chain up to and including that ancestor is cloned and inserted to the vec of chains as a new chain and the new block is appended to that new chain.

A block add is the same operation, regardless of the chain structure:

Find the parent block's state
Clone that state
Update the cloned state based on the content of the new block
Add the new block to the block pool

Since forks can happen in the middle of a chain, we need to keep state for every block that could potentially fork. (The zcashd reorg limit is 100 blocks, blocks can't fork after that limit.)

after each update we sort the vec of chains by height

after the sort we take the first block of the longest chain, if it is now over 100 blocks long, and commit that to the state layer

we then pop that block from the head of all other chains and discard any chains that don't start with that block.

The main chain consensus rule is "the chain with the greatest proof of work, breaking ties with the block this node received first". It's usually the longest chain, but that's not guaranteed.

After every update, we modify the set of chain tips, which is a standard mutable set data structure. We also keep a best tip. We can track tips using Arc references to their chain states.

Here's how updates work:

If the new block is a child of an existing tip, replace that old tip with this new tip
Otherwise, add a new tip to the state
If any tip has more proof of work than the current tip, update the best tip.

Requiring a strictly greater proof of work breaks ties using the first block that the node verified. Since the tie-breaker is non-deterministic, we can use "verified time" rather than "downloaded time", and still converge with other zcashd and zebrad instances. (The ordering will almost always be the same, except if the earlier block is very slow to verify.)

The zcashd pruning rule is "288 blocks behind the best tip". But I think zebrad can prune much earlier, because we can just go fetch the blocks from disk as needed. (And we don't need to fetch the state from disk.)

Here's how pruning old block states works:

Find all in-memory block states that are at least 100 blocks behind the new tip. Let's call them "mature states".
Find the mature states in the main chain, and add their block hashes to the on-disk height-based chain index.
Find the highest mature state in the main chain, and write it to disk as the on-disk state.
Remove all the mature states from the in-memory chain tips and block pool. This drops old side-chain and main chain states.

After a chain reorg, there could be zero, one, or many mature blocks. Some may be child blocks of other mature blocks. So the algorithm has to be generic.

We can use this network upgrade to implement different consensus rules and chain context handling for genesis blocks. Part of the chain state design in ZcashFoundation#682.

We can use this network upgrade to implement different consensus rules and chain context handling for genesis blocks. Part of the chain state design in #682.

teor2345 · 2020-08-07T10:12:12Z

We're doing this through the RFC process now.

teor2345 added Poll::Ready C-design Category: Software design work A-consensus Area: Consensus rule updates S-needs-design Status: Needs a design decision labels Jul 18, 2020

teor2345 added this to the Sync and validate zcash mainnet milestone Jul 18, 2020

teor2345 self-assigned this Jul 18, 2020

This was referenced Jul 20, 2020

Implement Consensus Chain Handling Services #516

Closed

rfc: Parallel Verification #763

Merged

dconnolly pushed a commit that referenced this issue Jul 27, 2020

feature: Add a "Genesis" network upgrade

993532b

We can use this network upgrade to implement different consensus rules and chain context handling for genesis blocks. Part of the chain state design in #682.

teor2345 closed this as completed Aug 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chain State Design #682

Chain State Design #682

teor2345 commented Jul 18, 2020 •

edited

Loading

yaahc commented Jul 18, 2020

yaahc commented Jul 18, 2020

teor2345 commented Jul 19, 2020

yaahc commented Jul 19, 2020

teor2345 commented Jul 19, 2020 •

edited

Loading

teor2345 commented Aug 7, 2020

Chain State Design #682

Chain State Design #682

Comments

teor2345 commented Jul 18, 2020 • edited Loading

yaahc commented Jul 18, 2020

yaahc commented Jul 18, 2020

teor2345 commented Jul 19, 2020

yaahc commented Jul 19, 2020

teor2345 commented Jul 19, 2020 • edited Loading

teor2345 commented Aug 7, 2020

teor2345 commented Jul 18, 2020 •

edited

Loading

teor2345 commented Jul 19, 2020 •

edited

Loading