
Historical batches #2649

Merged (2 commits) on Jan 3, 2023

Conversation

arnetheduck
Contributor

This PR, a continuation of #2428, simplifies and replaces `historical_roots` with `historical_block_roots`.

By keeping an accumulator of historical block roots in the state, it
becomes possible to validate the entire block history that led up to
that particular state without executing the transitions, and without
checking them one by one in backwards order using a parent chain.
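For reference, this is roughly the existing accumulator logic in the spec (Altair's `process_historical_roots_update`); note that it commits to block roots and state roots together, which is what makes block-only verification awkward today:

```python
def process_historical_roots_update(state: BeaconState) -> None:
    # Once per era (SLOTS_PER_HISTORICAL_ROOT slots), append the root of a
    # HistoricalBatch covering both block_roots and state_roots.
    next_epoch = Epoch(get_current_epoch(state) + 1)
    if next_epoch % (SLOTS_PER_HISTORICAL_ROOT // SLOTS_PER_EPOCH) == 0:
        historical_batch = HistoricalBatch(block_roots=state.block_roots, state_roots=state.state_roots)
        state.historical_roots.append(hash_tree_root(historical_batch))
```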

This is interesting for archival purposes as well as when implementing
sync protocols that can verify chunks of blocks quickly, meaning they
can be downloaded in any order.

It's also useful as it provides a canonical hash by which such chunks of
blocks can be named, with a direct reference in the state.

In this PR, `historical_roots` is frozen at its current value and `historical_batches` are computed from the merge epoch onwards.
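A minimal sketch of what that implies for the containers: the field names `historical_batches` and `block_batch_root` are taken from the text above, while the container name `HistoricalBatchSummary` is a placeholder of mine (the actual diff may structure this differently):

```python
class HistoricalBatchSummary(Container):
    # hash_tree_root of one era's Vector[Root, SLOTS_PER_HISTORICAL_ROOT] of block roots
    block_batch_root: Root

# New BeaconState field, appended to once per era from the merge epoch onwards,
# while the existing historical_roots list stops growing:
#     historical_batches: List[HistoricalBatchSummary, HISTORICAL_ROOTS_LIMIT]
```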

After this PR, `block_batch_root` in the state can be used to verify an era of blocks against the state with a simple root check.
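Concretely, given a full era of 8192 block roots (for example, read from an era file), the check could look like the sketch below; `verify_era_block_roots` is a hypothetical helper using the spec's SSZ types, not code from this PR:

```python
def verify_era_block_roots(block_batch_root: Root,
                           era_block_roots: Sequence[Root]) -> bool:
    # The state commits to hash_tree_root of the era's block roots, so a plain
    # SSZ root comparison suffices - no transition replay, no parent walking.
    batch = Vector[Root, SLOTS_PER_HISTORICAL_ROOT](*era_block_roots)
    return hash_tree_root(batch) == block_batch_root
```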

The `historical_roots` values, on the other hand, can be used to verify that a constant distributed with clients is valid for a particular state, and therefore extend the block validation all the way back to genesis without backfilling `block_batch_root` and without introducing any new security assumptions in the client.
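As a sketch of that check: `FROZEN_HISTORICAL_ROOTS_ROOT` is a hypothetical constant a client could ship (this PR does not define it); since `historical_roots` no longer changes after the freeze, its SSZ root is the same for every post-freeze state:

```python
def pre_merge_history_is_canonical(state: BeaconState) -> bool:
    # Compare the frozen accumulator in the state against the distributed constant.
    return hash_tree_root(state.historical_roots) == FROZEN_HISTORICAL_ROOTS_ROOT
```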

As far as naming goes, it's convenient to talk about an "era" being 8192
slots ~= 1.14 days. The 8192 number comes from the
SLOTS_PER_HISTORICAL_ROOT constant.
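The ~1.14 days figure follows from the mainnet slot time:

```python
SLOTS_PER_HISTORICAL_ROOT = 8192
SECONDS_PER_SLOT = 12
era_seconds = SLOTS_PER_HISTORICAL_ROOT * SECONDS_PER_SLOT  # 98304 seconds
era_days = era_seconds / 86400                              # ~1.14 days
```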

With multiple easily verifiable blocks in a file, it becomes trivial to offload block history to out-of-protocol transfer methods (bittorrent / ftp / whatever) - including execution payloads, paving the way for a future in which clients purge block history in p2p.

This PR can be applied along with the merge, which simplifies payload distribution from the get-go. Both execution and consensus clients benefit because, from the merge onwards, they both need to be able to supply ranges of blocks in the sync protocol from what is effectively "cold storage".

Another possibility is to include it in a future cleanup PR - this complicates the "cold storage" mode above by not covering execution payloads from the start.

Commit: avoids changing "header" fields in state
```diff
@@ -213,7 +227,7 @@ class BeaconState(Container):
     latest_block_header: BeaconBlockHeader
     block_roots: Vector[Root, SLOTS_PER_HISTORICAL_ROOT]
     state_roots: Vector[Root, SLOTS_PER_HISTORICAL_ROOT]
-    historical_roots: List[Root, HISTORICAL_ROOTS_LIMIT]
+    historical_roots: List[Root, HISTORICAL_ROOTS_LIMIT]  # Frozen in Merge, replaced by historical_batches
```
arnetheduck (Contributor, Author)


Suggested change:

```diff
-    historical_roots: List[Root, HISTORICAL_ROOTS_LIMIT]  # Frozen in Merge, replaced by historical_batches
+    historical_roots: List[Root, HISTORICAL_ROOTS_LIMIT]  # Frozen in Capella, replaced by historical_batches
```

@ralexstokes
Member

while this is a strict decrease in UX (more data to wrangle and process), we could leave the consensus protocol unchanged and simply supply Merkle branches along with the block roots in the "era" format and still retain the verification property
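
For illustration, a sketch of how that could work with the existing `is_valid_merkle_branch` helper from the phase0 spec: alongside an era's block roots, the distributor would include a Merkle branch proving the corresponding `historical_roots` entry against a trusted state root. The depth and generalized index would have to be derived from the `BeaconState` SSZ layout; this is not code from the PR or the comment:

```python
def verify_era_with_branch(state_root: Root,
                           historical_root: Root,   # the era's entry in historical_roots
                           branch: Sequence[Bytes32],
                           depth: uint64,
                           index: uint64) -> bool:
    # Standard SSZ Merkle proof check against a known state root.
    return is_valid_merkle_branch(historical_root, branch, depth, index, state_root)
```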

@hwwhww hwwhww added the Capella label Dec 15, 2022
@arnetheduck
Contributor Author

arnetheduck commented Dec 15, 2022

> we could leave the consensus protocol unchanged

Yes, though at that point I think we seriously need to consider what the `historical_roots` field is doing here at all - i.e. the status quo is the worst of both worlds: we have an accumulator that grows forever and that can't really be used for anything useful without jumping through hoops. The question "does this block belong?" is a fundamental one, and this PR brings the cost of answering it from O(N) to O(1), basically. Mixing the state in there is not fundamental, because the state is a derivative of the block except when nothing happened - i.e. the only raison d'être of `historical_roots` in its current shape is to show that something did not happen at the tail of a block history (i.e. to prove that the empty state transition was done correctly) - everything else is already baked into the block root as far as "accumulation" goes.

the "era" format

also, the "era" format doesn't actually need this PR - a design decision in "era" was to include a state for ever "epoch" which lines up with a historical root - in the era file, the individual block roots of each block in that era are natively available from the state - in era files, this makes sense because each era file can then serve as a "starting point" to compute an arbitrary state in the next era (again bringing the cost of computing an arbitrary beacon chain state from O(n) to O(1)), but it comes at a cost: we have to store a state every day.

This PR unlocks distinct use cases compared to what era files solve (in particular, a single state is enough to verify all history, instead of one per era).

Labels: Capella, general:RFC Request for Comments