Revise the summary and motivation, add high level flow. #527

Merged
merged 2 commits into from
Jan 26, 2024
103 changes: 76 additions & 27 deletions neps/nep-0509.md
@@ -1,6 +1,6 @@
---
NEP: 509
Title: Stateless validation stage 0
Title: Stateless validation Stage 0
Authors: Robin Cheng, Anton Puhach, Alex Logunov, Yoon Hong
Status: Draft
DiscussionsTo: https://docs.google.com/document/d/1C-w4FNeXl8ZMd_Z_YxOf30XA1JM6eMDp5Nf3N-zzNWU/edit?usp=sharing, https://docs.google.com/document/d/1TzMENFGYjwc2g5A3Yf4zilvBwuYJufsUQJwRjXGb9Xc/edit?usp=sharing
@@ -12,58 +12,107 @@ LastUpdated: 2023-09-19

## Summary

The NEP proposes an solution to achieve phase 2 of sharding (where none of the validators needs to track all shards), with stateless validation while not relying of the traditional approach of fraud proof and state rollback.
The NEP proposes a solution to achieve phase 2 of sharding (where none of the validators needs to track all shards) with stateless validation, instead of the traditionally proposed approach of fraud proofs and state rollback.

The fundamental idea is that validators do not need to have state locally to validate chunks.
* Under stateless validation, responsibility of a chunk producer extends to package transactions and receipts and annotate them with state witnesses (we will call them with the new role name, chunk proposers).
* State witness of a chunk is defined to be the state alongside proof of inclusion that is needed to execute a chunk. State witness allows anyone to execute a chunk without having the state of a shard locally.
* Then, validators will not store state locally and be randomly assigned to a shard at a given block height. Once a validator receives a chunk, along with its state witnesses, it verifies the state transition of the chunk, signs an approval, and sends it to the next block producer, similar to how it works today.
* Under stateless validation, the responsibility of a chunk producer extends to packaging transactions and receipts and annotating them with state witnesses. This extended role will be called "chunk proposers".
* The state witness of a chunk is defined to be a subset of the trie state, alongside its proof of inclusion in the trie, that is needed to execute a chunk. A state witness allows anyone to execute the chunk without having the state of its shard locally.
* Then, at each block height, validators will be randomly assigned to a shard, to validate the state witness for that shard. Once a validator receives both a chunk and its state witness, it verifies the state transition of the chunk, signs a chunk endorsement and sends it to the block producer. This is similar to, but separate from, block approvals and consensus.
* The block producer waits for sufficient chunk endorsements before including a chunk into the block it produces, or omits the chunk if not enough endorsements arrive in time.
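
To illustrate what "proof of inclusion" means for a state witness, here is a minimal sketch. It deliberately uses a plain binary Merkle tree over a few key-value entries instead of the Merkle Patricia Trie that the protocol actually uses, and the helper names (`merkle_levels`, `prove`, `verify`) are hypothetical. The point is only that a verifier holding a trusted root can check a piece of state without storing the rest of it.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Build all levels of a binary Merkle tree, from leaf hashes up to the root."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur.append(cur[-1])  # pad odd levels by duplicating the last hash
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(root, leaf, proof):
    """Recompute the root from a leaf and its proof; no other state is needed."""
    acc = h(leaf)
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root


# Toy "state": three entries. A validator only needs the root plus the proof.
state = [b"alice=10", b"bob=5", b"carol=7"]
levels = merkle_levels(state)
root = levels[-1][0]
proof = prove(levels, 2)
print(verify(root, b"carol=7", proof))
```

A real state witness would carry the trie nodes touched while executing the chunk, so the validator can both check them against the previous state root and recompute the new one.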

## Motivation

As phase 1 of sharding requires block producers to track all shards due to underlying security concerns, the team explored potential ways to achieve phase 2 of sharding, where none of the validators has to track all shards.

Initial design of phase 2 relied on the security assumption that as long as there is one honest validator or fisherman tracking a shard, the shard is secure; by doing so, it naturally relied on protocol's ability to challenge (when an honest validator or fisherman detects a malicious behavior), rollback (when validators agree that the submitted challenge is valid), slashing (to punish the malicious validator), and rewarding (for chllenger). While it sounds straigtforward and simple on paper, the detailed implication and implementation of these features turned out to be extremely complicated.
The early design of phase 2 relied on the security assumption that as long as there is one honest validator or fisherman tracking a shard, the shard is secure. It therefore relied on the protocol's ability to handle challenges (when an honest validator or fisherman detects malicious behavior and submits a proof of it), state rollbacks (when validators agree that the submitted challenge is valid), and slashing (to punish the malicious validator). While this sounds straightforward on paper, the complex interactions between these abilities and the rest of the protocol led to concrete designs that were extremely complicated and involved several specific problems we still don't know how to solve.

As a result, the team sought alternative approach and concluded that stateless validation is the most realistic and promising one; stateless validation approach does not assume the existence of fishermen and assumes that a shard is secure if every single chunk in that shard is validated by a randomly sampled subset of all validators.
As a result, the team sought alternative approaches and concluded that stateless validation is the most realistic and promising one. The stateless validation approach does not assume the existence of fishermen, does not rely on challenges, and never rolls back state. Instead, it relies on the assumption that a shard is secure if every single chunk in that shard is validated by a randomly sampled subset of all validators, so that only valid chunks are included in the first place.

## Specification

### Assumptions
* In memory trie is enabled - [REF](https://docs.google.com/document/d/1_X2z6CZbIsL68PiFvyrasjRdvKA_uucyIaDURziiH2U/edit?usp=sharing)
* State sync is enabled
* Merkle Patricia Trie is underlying data structure
* State sync is enabled (so that nodes can track different shards across epochs)
* Merkle Patricia Trie continues to be the state trie implementation
* TBD

### High level requirements
* Block processing time should not take more than what it takes today.
* Additional load on network and node should not affect other functionalities.
* Security of protocol must not degrade.
* Validator assignment for both chunk validation and block validation should not create any security vulnerability.
* Validators should be rewarded with the same amount as they are now.
* No validator needs to track all shards.
* Only chunk producers need to maintain state locally.
* State witness size should be small enough to delivered over network.
* Majority of undercharging issues should be mitigated.
* Validator roles are updated accordingly to reflect the necessary changes with stateless validation.
* [TBD] Resharding should work as expected after stateless validation in place.
* Security of protocol must not degrade.
* Validator assignment for both chunk validation and block validation should not create any security vulnerabilities.
Member


I think the old point "Moreover, majority of undercharging issues should be mitigated." deserves a point here. Because it is not tracking all shards what allows in-memory trie to be used

Contributor


Any reason we are removing the following statement?

> Only chunk producers need to maintain state locally.

Contributor Author


These are mostly technicalities :)

I removed the undercharging point because it's not a "high level requirement". It's more of a good-to-have side effect. It's not a requirement for stateless validation to solve "a majority of undercharging issues" (which isn't even well defined).

For "only chunk producers need to maintain state locally", I also don't think this is a requirement. Even if everybody is required to track some state (and it's not all the state because there is already a point saying no validators needs to track all shards), stateless validation is still successful. If we're talking about the desire that chunk validators who don't also act as chunk producers do not need to track any shard at all, I think the motivation there is that they don't need to have beefy machines to track any state, but that is also covered by the point "Any additional load on network and compute should not negatively affect existing functionalities of any node in the blockchain", and that the cost of these should be acceptable. Also, "only chunk producers need to maintain state locally" seems to be a particular solution rather than a requirement, as we're also changing the definition of chunk producers.

* Block processing time should not take significantly more than what it takes today.
* Any additional load on network and compute should not negatively affect existing functionalities of any node in the blockchain.
* The cost of additional network and compute should be acceptable.
* Validator rewards should not be reduced.
* Resharding should still be possible after stateless validation is in place.
* TBD

### Out of scope
* Data compression of both chunk data and state witness
* Separation of consensus and execution: we can run consensus independently from execution and let chunk producers to provide state witnesses to validators once every X blocks. This helps reduce both the amortized size of state witness and pressure on the p2p network. Validators send attestations to the state witness they receive and attestations are aggregated onchain as a finality gadget for state.
* More shards
* ZK integration
* Underlying data structure change (e.g. verkle tree)
* Validator reward change
* Data size optimizations such as compression, for both chunk data and state witnesses, except basic optimizations that are practically necessary.
* Separation of consensus and execution, where consensus runs independently from execution, and validators asynchronously perform state transitions after the transactions are proposed on the consensus layer, for the purpose of amortizing the computation and network transfer time.
* More shards - this is covered in the resharding project.
* ZK integration.
* Underlying data structure change (e.g. verkle tree).
* Change to validator rewards.
* TBD

## High level flow

TBD. [Explain the proposal as if you were teaching it to another developer. This generally means describing the syntax and semantics, naming new concepts, and providing clear examples. The specification needs to include sufficient detail to allow interoperable implementations getting built by following only the provided specification. In cases where it is infeasible to specify all implementation details upfront, broadly describe what they are.]
We propose a change to the following parts of the chunk and block production flow:

* When a chunk producer produces a chunk, in addition to collecting transactions and receipts for the chunk, it will also produce a `ChunkStateWitness`.
  * The `ChunkStateWitness` contains whatever data is necessary to prove that this chunk's header should indeed be what is being produced:
    * As it is today, all fields of the `ShardChunkHeaderInnerV2`, except `tx_root`, are uniquely determined by the blockchain's history based on where the chunk is located (i.e. its parent block and shard ID).
    * The `tx_root` is based on the list of transactions proposed, which is at the discretion of the chunk producer. However, these transactions must be valid (e.g. the sender accounts have enough balance and the correct nonce).
  * This `ChunkStateWitness` proves to anyone, including those who track only block data and no shards, that this chunk header is correct, meaning that the uniquely determined fields are exactly what should be expected, and the discretionary `tx_root` field corresponds to a valid set of transactions.
  * The `ChunkStateWitness` is not part of the chunk itself; it is distributed separately and is considered transient data.
* The chunk producer then distributes the `ChunkStateWitness` to a subset of *Chunk Validators* assigned for this shard. This is in addition to, and independent of, the existing chunk distribution logic (implemented by `ShardsManager`) today.
  * Chunk Validator is a new role described in the "Validator Role Change" section.
  * The subset of chunk validators assigned to a shard is determined by a random shuffle, once per block. See the "Chunk Validator Shuffling" section.
* A chunk validator, upon receiving a `ChunkStateWitness`, validates the state witness and determines if the chunk header is indeed correctly produced. If so, it sends a `ChunkEndorsement` to the current block producer.
  * A `ChunkEndorsement` contains the chunk hash along with a signature proving the endorsement by the chunk validator. It implicitly carries a weight equal to the amount of the chunk validator's stake that is assigned to this shard for this block. (See Chunk Validator Shuffling).
* As in today's logic, the block producer for this block waits until either all chunks are ready, or a timeout occurs, and then proposes a block containing whatever chunks are ready. Now, the notion of readiness is expanded to also require more than 2/3 of chunk endorsements by weight.
  * This means that if a chunk does not receive enough chunk endorsements by the timeout, it will not be included in the block. In other words, the block only contains chunks for which there is already a consensus of validity. **This is the key reason why we will no longer need challenges**.
  * The denominator of the 2/3 fraction is the total stake assigned to validate this shard, *not* the total stake of all validators. See Chunk Validator Shuffling.
* The block producer, when producing the block, additionally includes the chunk endorsements (at least 2/3 needed for each chunk) in the block's body. The validity of the block is expanded to also require valid 2/3 chunk endorsements for each chunk included in the block.
  * This necessitates a new block format.
  * If a block fails validation because it does not have the required chunk endorsements, it is considered a block validation failure for the purpose of Doomslug consensus, just like any other block validation failure. In other words, nodes will not apply the block on top of their blockchain, and (block) validators will not endorse the block.
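
The readiness rule described above can be sketched as follows. This is an illustrative sketch only, not the reference implementation: the `ChunkEndorsement` shape is simplified (signature verification is omitted), `chunk_is_ready` and `assigned_stake` are hypothetical names, and the check assumes the strict "more than 2/3" reading of the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEndorsement:
    # Hypothetical, simplified shape: the real message also carries a signature.
    chunk_hash: str
    validator: str


def chunk_is_ready(chunk_hash, endorsements, assigned_stake):
    """Return True if more than 2/3 of the stake assigned to this shard endorsed the chunk.

    `assigned_stake` maps each chunk validator assigned to this shard (for this
    block) to the stake it has assigned here. Its sum is the denominator,
    not the total stake of all validators.
    """
    total = sum(assigned_stake.values())
    counted = set()
    endorsed = 0
    for e in endorsements:
        if e.chunk_hash != chunk_hash:
            continue  # endorsement for a different chunk
        if e.validator not in assigned_stake or e.validator in counted:
            continue  # not assigned to this shard, or a duplicate
        counted.add(e.validator)
        endorsed += assigned_stake[e.validator]
    # Integer comparison, equivalent to endorsed / total > 2 / 3.
    return 3 * endorsed > 2 * total


assigned = {"v1": 40, "v2": 30, "v3": 30}
endorsements = [ChunkEndorsement("abc", "v1"), ChunkEndorsement("abc", "v2")]
print(chunk_is_ready("abc", endorsements, assigned))
```

A block producer following this sketch would include a chunk only if `chunk_is_ready` holds by the timeout, and a block validator would repeat the same check on the endorsements carried in the block body.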

We also propose a change to the validator roles and responsibilities. This is the list of roles after the proposal, with same and new behavior clearly labelled:

* Block producers:
  * (Same as today) Produce blocks, (new) including waiting for chunk endorsements
  * (Same as today) Maintain chunk parts (i.e. participate in data availability based on Reed-Solomon erasure encoding)
  * (Same as today) Do not require tracking any shard
  * (Same as today) Should have a higher barrier of entry for security reasons (e.g. to make block double signing harder)
* Chunk producers:
  * (Same as today) Produce chunks, (new) including producing chunk state witnesses
  * (New) Distribute state witnesses to chunk validators
  * (Same as today) Must track the shard they produce chunks for
  * (Same as today) Rotate shards across epoch boundaries, (new) but at a lower rate (e.g. 1 week)
* Block validators:
  * (Same as today) Validate blocks, (new) including verifying chunk endorsements
  * (Same as today) Vote for blocks with endorsement or skip messages
  * (New) No longer require tracking any shard
  * (Same as today) Must collectively have a majority of all the validator stake, for security reasons.
* (New) Chunk validators:
  * Validate state witnesses, and send chunk endorsements to block producers
  * Do not require tracking any shard
  * Must collectively have a majority of all the validator stake, to ensure the security of chunk validation.

See the Validator Role Change section for more details.

## Chunk Validator Shuffling
Chunk validators will be randomly assigned to validate shards, for each block (or as we may decide later, for multiple blocks in a row, if required for performance reasons). A chunk validator may be assigned multiple shards at once, if it has sufficient stake.

Each chunk validator's stake is divided into "mandates". There are full and partial mandates. The amount of stake for a full mandate is a fixed parameter determined by the stake distribution of all validators, and any remaining amount smaller than a full mandate is a partial mandate. A chunk validator therefore has zero or more full mandates plus up to one partial mandate. The list of full mandates and the list of partial mandates are then separately shuffled and partitioned equally (as in, no more than one mandate in difference between any two shards) across the shards. Any mandate assigned to a shard means that the chunk validator who owns the mandate is assigned to validate that shard. Because a chunk validator may have multiple mandates, it may be assigned multiple shards to validate.
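
As a rough sketch of the mandate scheme described above (not the exact algorithm, which is left to the Reference Implementation section): `stake_per_mandate` is passed in as a hypothetical parameter rather than derived from the stake distribution, a seeded `random.Random` stands in for whatever randomness source the protocol uses, and round-robin dealing is one assumed way to keep the per-shard counts within one mandate of each other.

```python
import random


def split_into_mandates(stakes, stake_per_mandate):
    """Split each validator's stake into full mandates plus at most one partial mandate."""
    full, partial = [], []
    for validator, stake in sorted(stakes.items()):
        count, remainder = divmod(stake, stake_per_mandate)
        full.extend([validator] * count)
        if remainder > 0:
            partial.append((validator, remainder))
    return full, partial


def assign_mandates(stakes, stake_per_mandate, num_shards, seed):
    """Shuffle full and partial mandates separately, then deal them across shards."""
    rng = random.Random(seed)  # stand-in for the protocol's randomness
    full, partial = split_into_mandates(stakes, stake_per_mandate)
    rng.shuffle(full)
    rng.shuffle(partial)
    shards = [{"full": [], "partial": []} for _ in range(num_shards)]
    # Round-robin dealing keeps shard sizes within one mandate of each other.
    for i, validator in enumerate(full):
        shards[i % num_shards]["full"].append(validator)
    for i, (validator, weight) in enumerate(partial):
        shards[i % num_shards]["partial"].append((validator, weight))
    return shards


stakes = {"v1": 250, "v2": 100, "v3": 30}
shards = assign_mandates(stakes, stake_per_mandate=100, num_shards=2, seed=7)
for shard_id, shard in enumerate(shards):
    print(shard_id, shard)
```

In this toy input, `v1` owns two full mandates and one partial mandate of 50, so it may end up validating both shards, which matches the statement that a validator with multiple mandates may be assigned multiple shards.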

We have done research to show that the security of this algorithm is sufficient with a reasonable number of chunk validators and a reasonable number of shards, assuming a reasonable bound for the total stake of malicious nodes. TODO: Include or link to that research here.

## Reference Implementation

TODO: This is essentially going to be describing the exact structure of `ChunkStateWitness`, `ChunkEndorsement`, and describing the exact algorithm to be used for the chunk validator shuffling.

[This technical section is required for Protocol proposals but optional for other categories. A draft implementation should demonstrate a minimal implementation that assists in understanding or implementing this proposal. Explain the design in sufficient detail that:

- Its interaction with other features is clear.
@@ -74,7 +123,7 @@ TBD. [Explain the proposal as if you were teaching it to another developer. This

The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]

## Validatior role change
## Validator Role Change
Currently, there are two different types of validators and their responsibilities are as follows:
| | Top ~50% validators | Remaining validators (Chunk only producers) |
|-----|:-----:|:----:|