Doubts about the monolithic BeaconState #582

Closed

protolambda opened this issue Feb 7, 2019 · 10 comments

Comments

@protolambda
Collaborator

protolambda commented Feb 7, 2019

Doubts about the monolithic BeaconState

So, I'm having doubts about the current BeaconState approach in the spec.

I am not so familiar with the status quo outside of the spec. But there are too few resources and too many questions popping up while implementing my own beacon-chain simulation/client not to start a good discussion on this.

What's the reasoning behind designing Eth 2.0 based on a single object that carries all the different state information, besides ease in computing a state_root (i.e. serialize everything and hash it)?

Wouldn't it make more sense to store things based on usage category, change interval, and special needs?

Because of the current spec design, many clients have implemented the state as a single object that is read/written for every block or slot, as far as I know at least; please point me to clients that take different approaches (and the respective commit/branch).
All I'm finding is one-go serialization with protobufs etc. Is this really cheap enough to afford the monolithic simplicity?

Some of these state variables don't even change between slots or blocks, so why repeatedly store them? Would a reference to the last change of some category of sub-state not be enough?

And I would expect things like validator balances to be much more similar to the Eth 1.0 design, with a separate merkle-tree. What's the rationale here?

Excuse me for my ignorance/lack of google fu if this is already worked out somewhere.
Having doubts about something is the start of knowledge, and documenting the rationale of the current choices (and possibly improving them) is something I think is very important, both for folks who want to contribute and as a reference during future discussions about state design.

The problem

  • Big state size by design
    • Long serialize time
    • Long deserialize time
    • The lifecycle is easy, but operations take longer. Retrieving, hashing (state root!), and storing are slower.
  • No encapsulation by design
    • A simple API to query a balance would already require you to deserialize the full state.
    • It is harder to verify changes when they are not isolated
    • Adding new state while being backwards compatible may be harder without encapsulation guidelines/rules in the design.
    • Encapsulation based on write interval may be beneficial when sending state over a network connection. Some of it simply doesn't need to be transferred all the time.
  • Duplication of data that is encouraged by design
    • The easiest way to implement the spec is to simply serialize the whole state and put it in some key-value store. Many clients already do this
    • Data that has not changed is stored again. Every block, or every slot even.

About serialization/deserialization/hashing: I do not know how well it works or scales; all I see is that it can be better. See the trade-offs section. At the least, documenting the affordability of choices like these would be nice.

Outline of the state variables

This is the current BeaconState, with my comments on each part, to share my sense of how often each part of the state changes, along with alternatives. If you have suggestions/corrections, please share.

    # Misc
    'slot': 'uint64'

Derived from block storage.

    'genesis_time': 'uint64',

Stored only once, loaded on initialization

    'fork': Fork,  # For versioning hard forks

Small data, can be repeated, but could also just be a reference to fork-data, stored elsewhere.

    # Validator registry
    'validator_registry': [Validator],

Can be VERY large in the worst case, as in 4M entries, each with pubkey data, and the only thing that changes every now and then is the epochs. Can't we separate the activation/exit/withdrawal/penalized epochs + status flags from the identification data, similar to the separation of account data in Eth1 (i.e. the state trie)? And then just store a list of participating validators per epoch. One could even keep a separate lookup table of pubkey <-> numerical validator ID, if you really want to reduce storage in a long-term worst-case scenario.
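For illustration, a minimal sketch of such a lookup table (register_validator, index_by_pubkey and pubkeys are hypothetical names, not from the spec or any client):

    # Hypothetical sketch: map pubkeys to small numerical validator IDs so that
    # per-epoch data can reference validators by index instead of by full pubkey.
    from typing import Dict, List

    pubkeys: List[bytes] = []                  # index -> pubkey
    index_by_pubkey: Dict[bytes, int] = {}     # pubkey -> index

    def register_validator(pubkey: bytes) -> int:
        if pubkey in index_by_pubkey:
            return index_by_pubkey[pubkey]
        index = len(pubkeys)
        pubkeys.append(pubkey)
        index_by_pubkey[pubkey] = index
        return index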

    'validator_balances': ['uint64'],

Like the above, also very large and continuously changing; I would expect balances to live in a state trie.


    'validator_registry_update_epoch': 'uint64',

Something small, no problem

    # Randomness and committees
    'latest_randao_mixes': ['bytes32'],
    'previous_epoch_start_shard': 'uint64',
    'current_epoch_start_shard': 'uint64',
    'previous_calculation_epoch': 'uint64',
    'current_calculation_epoch': 'uint64',
    'previous_epoch_seed': 'bytes32',
    'current_epoch_seed': 'bytes32',

All this RANDAO data only changes each epoch; just store it in a per-epoch state, keyed by the first block in the epoch that introduces the changes?
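As a rough sketch of what that per-epoch bundle might look like (EpochRandaoState and the keying are hypothetical; the field names only loosely mirror the spec's):

    # Hypothetical sketch: bundle the per-epoch randomness/committee data and key it
    # by the root of the first block in the epoch that introduced the change.
    from dataclasses import dataclass
    from typing import Dict

    @dataclass(frozen=True)
    class EpochRandaoState:
        randao_mix: bytes
        previous_epoch_start_shard: int
        current_epoch_start_shard: int
        previous_epoch_seed: bytes
        current_epoch_seed: bytes

    # one record per epoch, written once at the epoch transition
    epoch_randao_by_block_root: Dict[bytes, EpochRandaoState] = {}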

    # Finality
    'previous_justified_epoch': 'uint64',
    'justified_epoch': 'uint64',
    'justification_bitfield': 'uint64',
    'finalized_epoch': 'uint64',

Also purely per-epoch written data.

    # Recent state
    'latest_crosslinks': [Crosslink],

Changes every epoch, can be included with finality data above.

    'latest_block_roots': ['bytes32'],

Changes every slot; there are LATEST_BLOCK_ROOTS_LENGTH of them. 32 bytes * 8,192 = 1/4 MB. Doubts here.

    'latest_index_roots': ['bytes32'],

Similar to block roots, current spec only changes it per epoch, but this may become per-block if we start supporting introduction of new validators during an epoch.

    'latest_penalized_balances': ['uint64'],  # Balances penalized at every withdrawal period

Something that changes every block. Just 8,192 numbers, but we could also just save 3 per block: the previous, current, and last epoch numbers, and then trail the remainder every epoch transition, similar to what happens now.

    'latest_attestations': [PendingAttestation],

Updated every block (if it includes attestations). Only used to filter through and derive current_epoch_attestations and previous_epoch_attestations; these may have been included in a later epoch, however. The naming refers to the epoch that the attestation itself attests in, "pending" meaning that it still needs to be checked for slashing. This could be stored per epoch. An alternative would be a DB that indexes on attestation slot and inclusion-epoch root, so one can just query the attestations necessary for the epoch transition.

    'batched_block_roots': ['bytes32'],

Changes every LATEST_BLOCK_ROOTS_LENGTH slots. Can easily be in its own storage, and ignored. It's not used anywhere in the spec, and only serves purposes like syncing/verification to start a client.

    # Ethereum 1.0 chain data
    'latest_eth1_data': Eth1Data,

Changes even less often than per epoch (every ETH1_DATA_VOTING_PERIOD (16) epochs), but is continuously read.

    'eth1_data_votes': [Eth1DataVote],

Only needs to be stored for unfinalized blocks (?) given the nature of the

Trade-offs

The big question before we change the state: Is the current monolithic simplicity warranted?

  • Do we finalize and throw away data quickly enough not to care?
  • Is the state still small enough for hashing it fully, every slot?
  • What's the current freedom in constructing the state_root? Do we just cache hashes for unchanged data portions?

Alternative

I do not think an alternative has to be very complex, but there's a lot of room for optimization.

Easy

A good first step would be to simply split it up into a BeaconEpochState and a BeaconSlotState,
with the slot state referencing the epoch state.

The BeaconEpochState can be split up into more parts, but each could change every epoch, so outside of encapsulation and hashing there is not much need to consider this in the spec. If epoch-state hashing is defined as hashing the roots of all subcategories together, however, bundling and storing related data would be easier.

Note that this could make the block/slot transition much faster, as you would only work on a subset of memory and deal with less serialization/hashing. And it's not far from the current state of things.
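A minimal sketch of that split (the field selection below is illustrative only; BeaconEpochState/BeaconSlotState are the hypothetical names used above):

    # Hypothetical sketch of the "easy" split: the slot state embeds only a
    # reference (root) to the epoch state instead of all of its fields.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class BeaconEpochState:            # rewritten once per epoch transition
        fork_version: int
        previous_justified_epoch: int
        justified_epoch: int
        finalized_epoch: int
        current_epoch_seed: bytes

    @dataclass
    class BeaconSlotState:             # rewritten every slot/block
        slot: int
        epoch_state_root: bytes        # reference to the BeaconEpochState
        latest_block_roots: List[bytes]
        validator_balances: List[int]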

Medium

Split off the validator registry, and validator balances.

  • Create a literal validator registry with the static properties of validators, i.e. their pubkey, withdrawal_credentials and whatnot. Call this the ValidatorRegistry. This data structure could be optimized for insertions/deletions; changes don't have to be considered (afaik).
  • Create something similar to the "state trie" in Eth1 (the one that carries nonces and balances), call it the ValidatorTrie, which carries the dynamic validator data: entry/exit/etc. epochs, status flags, and balance.

The ValidatorRegistry could be referenced in the epoch-state, or in the slot-state if we're adding/removing validators more often.
The ValidatorTrie could be referenced in the slot-state; it has all the data that changes often and is immediately affected when a validator is kicked, not the validator registry.
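A sketch of how the two structures and their references could be laid out (RegistryEntry, TrieEntry and the *_root field names are hypothetical):

    # Hypothetical sketch: static registry data referenced from the epoch-state,
    # dynamic per-validator data ("trie") referenced from the slot-state.
    from dataclasses import dataclass

    @dataclass
    class RegistryEntry:               # static: written at deposit, rarely touched
        pubkey: bytes
        withdrawal_credentials: bytes

    @dataclass
    class TrieEntry:                   # dynamic: changes with rewards, exits, etc.
        balance: int
        activation_epoch: int
        exit_epoch: int
        status_flags: int

    @dataclass
    class EpochStateRefs:
        validator_registry_root: bytes     # commits to the list of RegistryEntry

    @dataclass
    class SlotStateRefs:
        epoch_state_root: bytes
        validator_trie_root: bytes         # commits to the list of TrieEntry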

Hard

Attestations

An attestation DB (a sketch follows the list) that:

  • Abstracts away aggregation, to insert any collection of attestations, and lets you query total weight for a given target. (fork-choice)
  • Can be queried for attestations for a given slot, included at a given slot. (for slashing)
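A rough sketch of that interface (AttestationDB and its method names are hypothetical; attestations are plain dicts here for brevity):

    # Hypothetical sketch of the attestation DB interface described above.
    from collections import defaultdict
    from typing import Dict, Iterable, List, Tuple

    class AttestationDB:
        def __init__(self) -> None:
            # (attestation slot, inclusion slot) -> attestations
            self._by_slots: Dict[Tuple[int, int], List[dict]] = defaultdict(list)
            # fork-choice target root -> accumulated attesting weight
            self._weight_by_target: Dict[bytes, int] = defaultdict(int)

        def insert(self, attestations: Iterable[dict]) -> None:
            # aggregation is abstracted away: accept any collection of attestations
            for att in attestations:
                self._by_slots[(att["slot"], att["inclusion_slot"])].append(att)
                self._weight_by_target[att["target_root"]] += att["weight"]

        def total_weight(self, target_root: bytes) -> int:
            # total attesting weight for a given fork-choice target
            return self._weight_by_target[target_root]

        def query(self, attestation_slot: int, inclusion_slot: int) -> List[dict]:
            # attestations for a given slot, included at a given slot
            return list(self._by_slots[(attestation_slot, inclusion_slot)])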

Block roots

Separate out the batched block roots, and do this in a tree structure to account for forks, in case finalization before LATEST_BLOCK_ROOTS_LENGTH slots is not possible.
The pointer to the top of the batched-roots list would be included in the state.

Pro: small state, easy to update and hash.

Con: batched block roots not included in state anymore. Do we really need to force them to be that available to anyone that wants to hash the state?

@mratsim
Contributor

mratsim commented Feb 7, 2019

This is a concern that the Nimbus team shares, and we actually discussed it today with @djrtwo, using the term "God object".

@protolambda
Collaborator Author

@mratsim Good to hear I'm not the only one. Can I find this discussion anywhere?

@mratsim
Contributor

mratsim commented Feb 7, 2019

It was a project sync call unfortunately.

@benjaminion
Contributor

What's the current freedom in constructing the state_root? Do we just cache hashes for unchanged data portions?

Yes. This is the point of the SSZ tree_hash algorithm.

AIUI, specifying the state in one place just ring-fences the consensus-critical parts in the spec. Implementers are free to optimise this in any way they see fit.

[Historical note: there used to be two states, crystallized_state and active_state, for all the reasons you mention. See #122 for the thinking behind unifying the state. It is tree_hash that makes this practical.]

@djrtwo
Contributor

djrtwo commented Feb 7, 2019

The short answer here is that SSZ hash tree does separate and encapsulate different objects by default. It allows a blank slate for clients to store/cache/update/hash in any number of ways. State objects can easily be stored locally as diffs with caches of hashes that prevent rehashing the majority of the state when calculating the state_root.

I see the argument for pulling out portions of the state into separate independent objects that all need to be committed to in a beacon block for the sake of cleanliness, but again, the SSZ tree hash natively handles this via commitment to one root.
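(For illustration only, not the spec's tree_hash and not any client's code: one way to cache per-field roots so that only changed fields are rehashed when recomputing a state root. FieldRootCache and its methods are made-up names.)

    # Illustrative only: cache per-field subtree roots and recompute only the
    # fields whose serialized bytes changed since the last call.
    from hashlib import sha256
    from typing import Dict, Tuple

    def _merkle_pair(a: bytes, b: bytes) -> bytes:
        return sha256(a + b).digest()

    class FieldRootCache:
        def __init__(self) -> None:
            # field name -> (serialized bytes last seen, cached root)
            self._cache: Dict[str, Tuple[bytes, bytes]] = {}

        def field_root(self, name: str, serialized: bytes) -> bytes:
            cached = self._cache.get(name)
            if cached is not None and cached[0] == serialized:
                return cached[1]                   # unchanged: reuse cached root
            root = sha256(serialized).digest()     # stand-in for a real subtree hash
            self._cache[name] = (serialized, root)
            return root

        def state_root(self, fields: Dict[str, bytes]) -> bytes:
            # combine the per-field roots pairwise into one root (padding simplified)
            roots = [self.field_root(n, v) for n, v in sorted(fields.items())]
            while len(roots) > 1:
                if len(roots) % 2:
                    roots.append(b"\x00" * 32)
                roots = [_merkle_pair(roots[i], roots[i + 1]) for i in range(0, len(roots), 2)]
            return roots[0] if roots else b"\x00" * 32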

All this RANDAO data only changes each epoch, just store this in an per-epoch state? Keyed by the first block in the epoch, that introduces the changes?

The goal of the state transition function is that it satisfies state_transition(prev_state, block) -> post_state, where prev_state is committed to in the parent block. The more we pull things out into their own separate commitments, prev_state just becomes a tuple (epoch_state, active_state, validators, validator_balances, validator_epochs, etc.), which is arguably what the BeaconState, using the tree hash, already is.

Changes every LATEST_BLOCK_ROOTS_LENGTH slots. Can easily be in its own storage, and ignored. It's not used anywhere in the spec, and only serves purposes like syncing/verification to start a client.

Including info such as this directly committed to in the state allows for a host of techniques in serving light clients. The more we move things out into a separate DB, the less able we are to serve required info based upon the state_root to facilitate light syncing and validation.

Is the state still small enough for hashing it fully, every slot?

SSZ tree hash was designed for this purpose. Most of the state does not change. Hashes need only be recomputed for the portion of the state that changes (small).

What's the current freedom in constructing the state_root? Do we just cache hashes for unchanged data portions?

The only requirement is that a client computes the same state root value as other clients. They can (and should) make any number of design decisions locally to separate concerns, encapsulate, optimize, etc.

Easy

A good first step would be to simply split it up in a BeaconEpochState and a BeaconSlotState.
With the slot state referencing the epoch state.

We used to have this as ActiveState and CrystallizedState. Both roots were committed to at each block. I remember that when we unified them, Prysmatic Labs removed thousands of lines from their codebase. That said, a client can divide these things out as they see fit as long as they can still compute valid SSZ state roots.

Note that this could make the block/slot-transition much faster, as you would only work on a subset of memory, and deal with less serialization/hashing. And it's not far from current state of things.

With SSZ tree hashing plus caching, there is not a measurable reduction in serialization/hashing to be gained here.

Also, regardless of how you slice it, a client is going to have to store the bulk of the current state (validator registry) in memory to validate attestations and block signatures on a per slot basis.

Medium

Same arguments as above.

Hard

Attestations

An attestation DB outside of the state consensus root sounds like a fine path for a client implementation, but this now mixes/aggregates attestations that have been included on chain (thus part of the state transition function and contributing to fork choice) with those that are not yet on chain (contributing to fork choice).

Block roots

These are committed to in the state_root to better serve light clients. Even if you pull them out, the root would still need to be committed to in the block for servability.

@protolambda
Collaborator Author

CrystallizedState: yes, I know what it is and heard about the changes you went through; I was not involved at all, however. The complexity reduction is an overstatement in my opinion, as it introduces a lot of new questions that are harder to get into when you're not so familiar with SSZ, or when your chosen platform does not work as well with the requirements that come with this.

Attestations: agreed on the separation of attestation types; a DB could still facilitate it, but it's probably better handled in the client implementation if the spec is fully "SSZ everything" anyway.

Block roots: OK, the argument for light clients is understandable.

@protolambda
Collaborator Author

@djrtwo
I'm amazed that SSZ + tree hashing tackles so many problems, many more than I thought it was good for, but now the open questions are:

  • Where can I find info on networking? Does the monolithic approach force us to work out a separate structure/bundling for efficient communication of non-duplicate data, or do we effectively just send diffs of SSZ output?
  • Where can I find notes on design patterns with SSZ, is implementation of caching and diffs really that easy?

@vbuterin
Contributor

vbuterin commented Feb 8, 2019

Where can I find notes on design patterns with SSZ, is implementation of caching and diffs really that easy?

Here's my implementation of an SSZ list object that can compute updated Merkle roots in real time: https://github.com/ethereum/research/blob/master/ssz_hashable_list/hashable_list.py

That's pretty much the only really nontrivial part that needs to be optimized. That implementation is missing one optimization for making k updates in O(k + k * log(n / k)) time instead of O(k * log(n)) time, but that's relatively minor and may give you a 1.5x improvement at most.
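For a self-contained picture of the idea (this is not the hashable_list implementation linked above; CachedMerkleTree and h are made-up names): updating one leaf only rehashes the O(log n) nodes on its path to the root.

    # Illustrative only: a fixed-depth binary Merkle tree that caches every node
    # hash; updating a leaf rehashes just its path to the root (O(log n) hashes).
    from hashlib import sha256
    from typing import List

    def h(a: bytes, b: bytes) -> bytes:
        return sha256(a + b).digest()

    class CachedMerkleTree:
        def __init__(self, leaves: List[bytes], depth: int):
            # assumes len(leaves) <= 2**depth; missing leaves are zero-padded
            self.depth = depth
            size = 1 << depth
            self.nodes = [b"\x00" * 32] * (2 * size)
            self.nodes[size:size + len(leaves)] = leaves
            for i in range(size - 1, 0, -1):
                self.nodes[i] = h(self.nodes[2 * i], self.nodes[2 * i + 1])

        @property
        def root(self) -> bytes:
            return self.nodes[1]

        def update(self, index: int, leaf: bytes) -> None:
            i = (1 << self.depth) + index
            self.nodes[i] = leaf
            i //= 2
            while i >= 1:
                self.nodes[i] = h(self.nodes[2 * i], self.nodes[2 * i + 1])
                i //= 2

    # The batching optimization mentioned above would amount to collecting all
    # updated leaves first and rehashing each dirty ancestor only once.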

@vbuterin
Contributor

vbuterin commented Feb 8, 2019

Where can I find info on networking? Does the monolithic approach force us to work out a separate different structure/bundling for efficient communication of non-duplicate data, or do we effectively just send diffs of SSZ output?

When do you need to send state diffs?

@protolambda
Collaborator Author

I was thinking more about what it takes to communicate state with other clients. But attestations/blocks would be sufficient for it to run. The monolithic approach doesn't make encapsulation easy, but with the proposed optimal usage of SSZ (caching encodings and hashes, storing the state as diffs, tracking changes), it works well enough.

After my work on LMD-GHOST execution I'll start looking into these SSZ patterns; maybe they can be generalized enough to get a good storage/processing flow going. With this, together with the previous work on LMD-GHOST and attestation batching, on structuring (schematic), and on prototyping the lifecycle (hackathon client), I may just have enough info and intuition to get a new beacon-chain client going :)
