
Historical block/state roots #2428

Closed
wants to merge 11 commits

Conversation

arnetheduck
Contributor

@arnetheduck arnetheduck commented May 20, 2021

This PR simplifies and replaces historical_roots with
historical_block_roots.

By keeping an accumulator of historical block roots in the state, it
becomes possible to validate the entire block history that led up to a
particular state without executing the transitions.

This is interesting for archival purposes as well as when implementing
sync protocols that can then proceed to verify chunks of blocks quickly
- it's also useful as it provides a canonical hash by which such chunks
of blocks can be named, with a direct reference in the state.

In order not to grow the state size further, the historical_roots
field is removed - the blocks already contain state roots, so the value
of repeating the state root in the historical accumulator is limited to
empty slots, and even then, the historical block root accumulator
differentiates states that differ by empty slots.
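
To make the mechanism concrete, here is a minimal sketch of what the accumulator update could look like - the field and function names are illustrative, not final spec text:

class BeaconState(Container):
    ...
    # replaces historical_roots: an accumulator of block-root batches only
    historical_block_roots: List[Root, HISTORICAL_ROOTS_LIMIT]

def process_historical_block_roots_update(state: BeaconState) -> None:
    # Append the root of the block_roots buffer every time it wraps around,
    # i.e. once per SLOTS_PER_HISTORICAL_ROOT slots ("era")
    next_epoch = Epoch(get_current_epoch(state) + 1)
    if next_epoch % (SLOTS_PER_HISTORICAL_ROOT // SLOTS_PER_EPOCH) == 0:
        state.historical_block_roots.append(hash_tree_root(state.block_roots))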

This is a sketch of the full PR - there are a few key decisions to make:

  • should the historical block roots be backfilled with phase0 data on
    the transition? This simplifies future logic but could be omitted for
    simplicity. It's easy to precalculate what the values should be, so it's
    not a computational issue.
  • should historical roots remain?

As far as naming goes, it's convenient to talk about an "era" being 8192
slots ~= 1.14 days. The 8192 number comes from the SLOTS_PER_HISTORICAL_ROOT constant.

Edit: in my haste, I messed up the slot and epoch math ;) Original invalid text:

~~As far as naming goes, it's convenient to talk about an "era" being 8192
epoch = 256k slots ~= 36.4 days.~~

@dankrad
Contributor

dankrad commented May 20, 2021

I am not sure that I understand the purpose of this PR.

By keeping an accumulator of historical block roots in the state

Just to be clear, the historical block roots are already accessible right? The beacon state contains the last block root so you can access it via a Merkle proof.

, it
becomes possible to validate the entire block history that led up to a
particular state without executing the transitions.

I don't see how exactly that is possible. If you want to know that the history is valid, you still need to verify the transitions as well, even if the block headers are committed.

@arnetheduck
Contributor Author

arnetheduck commented May 20, 2021

I am not sure that I understand the purpose of this PR.

The purpose is, given a state, give clients the ability to efficiently verify that a block is part of the history that led up to that state, for the entire chain.

Just to be clear, the historical block roots are already accessible right? The beacon state contains the last block root so you can access it via a Merkle proof.

The BeaconState contains block_roots, which holds the hashes of a whole bunch of blocks (8K of them) - this is actually one of the biggest contributors to the state size outside of the validator set, except for randao. In addition, it contains historical_roots, which, every time block_roots wraps around, appends the root of a merkle tree built from block_roots and state_roots combined. This unfortunately is not very useful: the state roots are not available when you receive a batch of blocks to verify - in theory, you can recover most of them from the blocks, but not those where the block is missing. This PR makes the historical roots more useful by changing them to be the merkle root of the blocks alone - this way, one can simply line up the blocks, compute their root and check it against the state (later, one also has to perform a signature verification, since the signature is not part of the root stored in the state).
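
For illustration, the kind of check this enables could look roughly like the sketch below (function and parameter names are assumed, not part of the PR): rebuild the era's block_roots buffer from the blocks received and compare its root against the accumulator entry in the state.

def verify_era_of_blocks(blocks: Sequence[SignedBeaconBlock],
                         era_start_slot: Slot,
                         last_root_of_previous_era: Root,
                         expected_block_batch_root: Root) -> bool:
    # Rebuild the block_roots buffer for one era: every slot holds the root
    # of the most recent block at or before it, so empty slots repeat a root.
    by_slot = {b.message.slot: hash_tree_root(b.message) for b in blocks}
    roots, latest = [], last_root_of_previous_era
    for slot in range(era_start_slot, era_start_slot + SLOTS_PER_HISTORICAL_ROOT):
        latest = by_slot.get(slot, latest)
        roots.append(latest)
    batch = Vector[Root, SLOTS_PER_HISTORICAL_ROOT](*roots)
    # Signatures still need to be checked separately - they are not part of
    # the block root stored in the state.
    return hash_tree_root(batch) == expected_block_batch_root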

If you want to know that the history is valid, you still need to verify the transitions as well,

The assumption is that we know that a particular state is valid (for example because the user said so), and we now want to quickly verify each block. This is possible to do for the last 8192 blocks - this PR allows it to be done for all blocks without applying the state transition.

Notably, applying the state transition is slow sometimes - when there are deposits, the signature of each deposit must be verified which significantly slows down block application. In addition one must also compute the block root, so this PR represents a strict improvement over applying the blocks one by one.

@dankrad
Contributor

dankrad commented May 20, 2021

The purpose is, given a state, give clients the ability to efficiently verify that a block is part of the history that led up to that state, for the entire chain.

So you want a data structure that contains the latest block roots as a list. This could be easily done outside consensus, right?
Though I can see the justification of doing it in consensus if there is no downside to changing it to block roots instead of state roots.

The BeaconState contains block_roots which contains the hashes of a whole bunch of blocks (8K of them)

It also contains latest_block_header, which immediately does what you want.

The assumption is that we know that a particular state is valid (for example because the user said so), and we now want to quickly verify each block. This is possible to do for the last 8192 blocks - this PR allows it to be done for all blocks without applying the state transition.

Right, this seems like a good justification. So basically it could allow us faster weak subjectivity syncs?

As long as there is no good use case that needs the states, instead of block roots, to be easily accessible, then it sounds sensible to do this.

@ajsutton
Contributor

I'd say that if we make this change (and it does seem sensible) we should backfill the data so we have complete history rather than missing the start of the chain.

I suspect that would make it a two hard fork process. First add the new field without history then we can calculate offline the data to backfill and add it in a second fork. The old field would likely be removed in the second fork so that there's no loss of functionality during the transition (not sure anything is using the current roots though). The key thing is to avoid requiring clients to trawl back through old blocks to create the history as the fork activates and ideally keep that complexity out of client code entirely.

@arnetheduck
Contributor Author

arnetheduck commented May 21, 2021

It also contains latest_block_header, which does immediately what you want.

I'm unsure how to apply the latest block header in this scenario. It can indeed be used to compute the latest block root, and with it, the genesis state and all the blocks, I can compute the full merkle tree - it does not, however, give direct information about "intermediate" blocks. With this proposal, we can verify groups of 8k blocks at a time without any further processing, and without adding more information to the state than is there already (we're removing some information from the historical roots merkle tree that is duplicated and dirties it).

In more detail, the problem is the following:

Given a recent state, I want to verify whether some blocks that I have were part of the history that produced it (for example because I'm receiving them from an untrusted source). To do this currently, I have to take the genesis state and compute the state roots one by one by applying the blocks - my recent state, although it contains the field historical_roots, does not help me validate the blocks without performing the transitions. The way clients work around this problem is that they store snapshots of the state along the way so that they can "replay" a shorter history.

The other possibility that exists today is to walk the chain of block roots backwards - this is also tricky because it still requires an unbroken chain of blocks from the state to the block I want to verify - it is possible, but the cost of doing this operation keeps growing.
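
For contrast, the backwards walk available today amounts to something like this sketch (names are illustrative) - a single missing block breaks the chain:

def verify_by_walking_back(blocks_by_root: Dict[Root, SignedBeaconBlock],
                           head_block_root: Root,
                           target_block_root: Root) -> bool:
    # Follow parent_root links from a trusted head root back towards the
    # target - requires an unbroken chain of blocks in between.
    root = head_block_root
    while root in blocks_by_root:
        if root == target_block_root:
            return True
        root = blocks_by_root[root].message.parent_root
    return root == target_block_root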

So basically it could allow us faster weak subjectivity syncs?

Yes. Essentially for free, compared to the status quo - actually, it makes the state transition function a tiny bit cheaper. The downside is the additional fork complexity along with the fact that we're changing the beacon state definition - the latter is already happening, so the infrastructure is already there.

The key thing is to avoid requiring clients to trawl back through old blocks to create the history as the fork activates and ideally keep that complexity out of client code entirely.

I think this is perhaps less of a problem than it seems - for the last era, we have all the necessary roots in the state already, so it's a matter of preparing the rest of the list ahead of time. I agree it's slightly "tight" in the sense that there's only 1 day of data in the state, but nonetheless, the correctness of the block root list can be verified on the day of the fork, ahead of the fork.

I'm curious how a two-fork solution would work - do you mean a quick succession of forks with a followup "filler" 2 weeks later or waiting for the next big one?

@ajsutton
Contributor

I'm curious how a two-fork solution would work - do you mean a quick succession of forks with a followup "filler" 2 weeks later or waiting for the next big one?

Either way really. Usually it's not an immediate fork because you want a bit of time to prepare things and get client releases out. So probably actually at least a month apart. Depends how pressing it is to have the backfill done - if the existing historical_roots is still there and functioning then it's probably not a rush to complete the migration. Also depends on what the schedule of forks would be otherwise I guess.

The other possibility that exists today is to walk the chain of block roots backwards - this is also tricky because it still requires an unbroken chain of blocks from the state to the block I want to verify - it is possible, but the cost of doing this operation keeps growing.

It's worth noting that the most expensive part of backfill is verifying the signatures, which would still be required because they aren't included in the block root. Compared to that, the cost of walking backwards to verify block roots is pretty much irrelevant. But the point about requiring an unbroken chain of blocks is definitely a good one - it would be nice to verify each batch as it arrives even if it's out of order.

Even nicer would be if you could just request a historical root hash from something like IPFS and get the block content back - sadly that would require IPFS to use SSZ hashing, which isn't the case. But the simpler the mapping from historical root to actual block content is, the more likely verifications like that will be feasible in a system that distributes the actual blocks.

@arnetheduck
Contributor Author

if the existing historical_roots is still there and functioning

They're quite useless, for this purpose. I'm not sure they're useful for anything at all, outside of mixing in a summary of past states into the current state hash.

It's worth noting that the most expensive part of backfill is to verifying the signatures which would still be required because they aren't included in the block root.

Indeed, but we have a few tricks up our sleeve here - in particular, signature batch verification will help quite a bit as well, and the low-cost hash check serves as a sanity check that we're not getting complete junk.

Even nicer would be if you could just request a historical root hash from something like IPFS and get the block content back

There's a low-fi solution here actually: just name the file with the historical block root :) If this gets through, the next step is to establish a standard for that - we're well on our way to implementing something like this: https://github.com/status-im/nimbus-eth2/blob/unstable/docs/e2store.md#era-files

These era files would serve as archive storage of past-wss-state-and-blocks - with the new roots proposed in this PR, it's trivial to distribute these files over any medium and they contain everything needed to get started, serve blocks, etc.

* enables proving the state for empty slots
* allows verifying the backfilled fields against the previous
`historical_roots` field
* allows trivially computing `historical_roots`, for any existing use
cases
@protolambda
Collaborator

protolambda commented May 24, 2021

The separate block root accumulator would be great for sync, you have my support there. However, there are some things I'd like to address for state roots.

Removing the old merged block-state-roots accumulator would make access to state roots considerably worse:

  • The state root of gap slots is not accumulated in history (only in the buffer of the last batch); only the state roots of slots that contain a block remain accessible
  • State roots would need to be accessed through block roots: thus relying on the header format to never change (unlikely to change, but an ugly dependency), and making the merkle proof longer. If we introduce shards with L2 that access historical beacon states, this needs to be clean.

The historical accumulator is not that large:

  • current mainnet slot (after ~6 months of beacon chain): 1254492
  • SLOTS_PER_HISTORICAL_ROOT = 8192
  • Batches so far: 1254492 / 8192 ≈ 153, each 32 bytes, kept forever - about 5 KB

So I suggest we just keep the accumulation of state-roots around, next to the new block roots accumulator.

So how will the backfill work? We either access 1254492+ database entries, and hash them all (long slow fork upgrade!), or... we just embed ~153 (probably 160 - 170 around altair release, but ok) hashes into the client release, and then ideally verify them.

And we can do better! We can make the merkle proof of past states compatible with the new state format, so it's seamless for historical merkle proof access.

How that would work:

# Define a summary of the phase0 HistoricalBatch type
class HistoricalBatchSummary(Container):
    block_batch_root: Root
    state_batch_root: Root

class BeaconState(Container):
    ...
    # redefine historical_roots to embed the summary variant, instead of root
    historical_roots: List[HistoricalBatchSummary, HISTORICAL_ROOTS_LIMIT]

# redefine the roots update to add the summary, instead of the root of the batch
def process_historical_roots_update(state: BeaconState) -> None:
    # Set historical root accumulator
    next_epoch = Epoch(get_current_epoch(state) + 1)
    if next_epoch % (SLOTS_PER_HISTORICAL_ROOT // SLOTS_PER_EPOCH) == 0:
        historical_batch = HistoricalBatchSummary(
            block_batch_root=hash_tree_root(state.block_roots),
            state_batch_root=hash_tree_root(state.state_roots))
        state.historical_roots.append(historical_batch)

# And define the fork:
def upgrade_to_altair(pre: phase0.BeaconState) -> BeaconState:
    embedded_data = open('embedded_historical_batches.ssz', 'rb').read()
    historical_batches = List[HistoricalBatchSummary, HISTORICAL_ROOTS_LIMIT].decode_bytes(embedded_data)

    assert pre.historical_roots.hash_tree_root() == historical_batches.hash_tree_root()

    epoch = phase0.get_current_epoch(pre)
    post = BeaconState(
        ...
        historical_roots=historical_batches,
        ...
    )
    ...

Why this way?

  • Nearly instant fork upgrade, no long DB access
  • Easy to verify embedded data
  • Merkle proof into state accumulator compatible, no future special cases
  • State roots stay visible, important for L2 usage (a theme with light-client update, but even more with data-only sharding)
  • No new functions, neat 1 line change to process_historical_roots_update

The tricky part

The one tricky thing here is embedding the right accumulator contents in the client to get ready for the hardfork.

What we can do is update clients to keep track of the (block_root, state_root) pairs - which is as simple as appending them to a file - well before the altair release.
Then closer to the release, we embed the complete list. Merge those, and it should cover the full accumulator.
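
A minimal sketch of that bookkeeping, assuming a hook that clients run around each epoch transition (the hook and file name are purely illustrative):

def record_batch_roots(state: BeaconState, path: str = "historical_batches.bin") -> None:
    # Whenever the block_roots/state_roots buffers are about to wrap, append
    # the (block_batch_root, state_batch_root) pair - 64 bytes per era.
    next_epoch = Epoch(get_current_epoch(state) + 1)
    if next_epoch % (SLOTS_PER_HISTORICAL_ROOT // SLOTS_PER_EPOCH) == 0:
        with open(path, "ab") as f:
            f.write(hash_tree_root(state.block_roots))
            f.write(hash_tree_root(state.state_roots))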

@arnetheduck
Contributor Author

it's seamless for historical merkle proof access.

neat :) agree!

No new functions, neat 1 line change to process_historical_roots_update

I've taken the liberty to update the names - it's easier to grep for fields and code when names are unique based on the definition - for example, we need to keep both phase0 and altair versions of the historical root update function around.

@nisdas
Contributor

nisdas commented May 25, 2021

This would be a very nice addition to the state. The current historical accumulator has limited benefits because it is the root of a container of both the block and state roots, so you can't really verify either on its own. Making it not a single root but simply the container makes it more useful.

Although I am a bit worried about the fork complexity that back-filling this would bring. Unlike the other changes tabled for altair, which can be done and tested as a deterministic transition, backfilling will require clients to maintain this historical accumulator outside of consensus until the time of the fork and then write it into the state.

@ajsutton
Contributor

@protolambda I'm not sure I understand the upgrade process you're suggesting.

What we can do, is update clients to keep track of the block_root, state_root pairs, which is as simple as appending them to a file, way before the altair release.

This assumes that the clients are online and processing blocks already which isn't necessarily true. They may be starting from a state just prior to the fork epoch (reference tests do this a lot).

Then closer to the release, we embed the complete list. Merge those, and it should cover the full accumulator.

Is this assuming that we can calculate the entire backfill and specify it as a constant in the spec to use during fork activation? And that we could do that in enough time that it can then be embedded in clients and released before the update actually happens? If so, why do we need clients to record the roots ahead of time as well?

@arnetheduck arnetheduck changed the title from "Historical block roots" to "Historical block/state roots" May 25, 2021
@arnetheduck
Contributor Author

complexity back-filling

We're essentially trading the one-off complexity at the fork for the complexity of not having a full history in all future releases.

I really like @protolambda's neat trick of nesting because it enables us to compute the old values from the new ones which simplifies storing and verifying them up to the point of transition.

Is this assuming that we can calculate the entire backfill and specify it as a constant in the spec to use during fork activation?

The assumption would be that clients ship a "partial" constant list so as not to have to process the entire history; from the point where the pre-calculated backfill ends, the compute-and-store-while-processing logic takes over. In most cases this also requires the ability to replay a bit of historical state in the client, since the constant will necessarily lag whatever state users have.

@ajsutton
Contributor

The assumption would be that clients ship a "partial" constant list so as not to have to process the entire history; from the point where the pre-calculated backfill ends, the compute-and-store-while-processing logic takes over. In most cases this also requires the ability to replay a bit of historical state in the client, since the constant will necessarily lag whatever state users have.

hmm, I think that's going to be highly problematic for Teku. By default we discard states prior to the finalized checkpoint, so we can't regenerate states prior to that point (unless we start from genesis again). We could load the block roots, but even then that's a very expensive operation to shove into an epoch transition, or a decent chunk of complexity to pre-compute and ensure it's ready for the epoch transition.

And I agree with Nishant that this feels like a lot of complexity to be adding into the fork transition as it's not going to be easy to test well with lots of variations of what can go wrong because of different client architectures.

I'd definitely still lean towards the two fork approach - first fork starts collecting history from the fork epoch onwards and a second fork adds the historic backfill which can then be fully calculated offline.

@arnetheduck
Contributor Author

hmm, I think that's going to be highly problematic for Teku.

Another option that was considered was a service or in-protocol RPC that would serve the root data - ie in theory, serving the historical block and state roots could be done via an http URL, since they can be fully verified against the state - this might fit some architectures better, but it's a bit fragile - a centralized service is not desirable since it creates a single point of failure, and in-protocol RPC would need a discovery mechanism to find those clients capable of serving it. Would that make a transition easier?

@ajsutton
Contributor

Another option that was considered was a service or in-protocol RPC that would serve the root data - ie in theory, serving the historical block and state roots could be done via an http URL, since they can be fully verified against the state - this might fit some architectures better, but it's a bit fragile - a centralized service is not desirable since it creates a single point of failure, and in-protocol RPC would need a discovery mechanism to find those clients capable of serving it. Would that make a transition easier?

Yes, we could make that work, but it's an awful lot of extra complexity, and the node gets knocked offline if for some reason the history can't be retrieved in time.

@nisdas
Contributor

nisdas commented May 26, 2021

Another option that was considered was a service or in-protocol RPC that would serve the root data - ie in theory, serving the historical block and state roots could be done via an http URL, since they can be fully verified against the state - this might fit some architectures better, but it's a bit fragile - a centralized service is not desirable since it creates a single point of failure, and in-protocol RPC would need a discovery mechanism to find those clients capable of serving it. Would that make a transition easier?

That could work, but for a one-time fork upgrade it seems like a lot of complexity unless there is some other use case for serving in-protocol RPC requests for historical roots. For Prysm, it would require quite a bit of work to have this ready alongside the current changes for Altair, as it would be a mostly one-off change and we would basically have to be tracking all historical block/state roots up until the fork epoch. The two-fork approach would be easier to execute at the cost of adding more work onto a future fork.

@protolambda
Collaborator

I am starting to be more concerned about the backfill, and the merkle-proof (just 1 root of the state-roots batch) alternative in p2p does not seem so bad.

@arnetheduck
Contributor Author

and the merkle-proof (just 1 root of the state-roots batch) alternative in p2p does not seem so bad.

who will supply it and where will they get it from?

two-fork

one could actually consider a solution where we don't go through with the backfill at all - rather, we leave the state_roots field and add the split field as a new field - what would have gone into the backfill can be kept as a constant somewhere (since it can trivially be verified against historical states at that point).

@paulhauner
Contributor

Weighing in on this has been hard; there are a few factors involved. Fundamentally, I think this is a good idea and that we should have done it originally (but we didn't, so I don't want to dwell on that). However, I don't think this is a good idea for Altair.

Here's my reasoning:

  • When we consider back-filling, this is actually rather complex. We need to get out-of-band information and put it in the state, maybe across two forks. This is rather different to anything we've collectively considered before.
  • This is a rather nascent idea. Whilst @arnetheduck might have been thinking about this for a while, it's only been in our collective brains for about two weeks.
  • We haven't yet fully spec'ed out how this works. Looking at it, it seems simple enough to just throw this into Altair and then back fill later. That being said, I generally dislike partially adding components with the intention of figuring out the rest later.
  • Bandwidth seems very limited among researchers and implementers at the moment. Unless there's someone willing to really champion this and make a prototype back fill implementation, I don't see this getting the forethought it deserves.
  • I understand the primary purpose of this is to allow easier back-filling when we get an out-of-band WS state. It seems pretty reasonable to just expect the out-of-band WS state to contain (beacon_state, historical_state_roots), where historical_state_roots is the tree_hash_root(state.state_roots) component of each HistoricalBatch; that single Root is all we're missing from doing these proofs right now. That would give us the full power of the changes in this PR, as if it were active from genesis.
  • Doing back fill by downloading blocks backward isn't really that bad. It's not as elegant as this solution, but I don't think this is one of the features that is going to make our users ecstatic or save our developers an outstanding amount of time.

So, the combination of all these things leads me to think that we shouldn't do this in Altair. If this feature is really that useful, I would expect clients to go with the approach of bundling historical_state_roots with their WS state initially. Then, once we deem that thing to be invaluable, we fully spec out this solution and then implement it in a future fork.

Once again, great idea and I wish we'd done it originally. I'm just not sure that now is the right time.

@arnetheduck
Contributor Author

If this feature is really that useful, I would expect clients to go with the approach of bundling historical_state_roots with their WS state initially.

While possible, I would generally think that this approach of out-of-band data is viable for a limited amount of time, but not over any longer period - ie it's hard to coordinate a community around solutions of this kind, especially if it's a client-specific feature.

For example, shipping the out-of-band state roots as a constant is not sustainable except for a limited time period, for example up to a fork - clients are already looking to discard states and blocks from before the WS period, so unless a trace of them is stored in the state, they will simply be gone. Also, because you can only ship constant historical state root data at the time of a client release, keeping up or getting access to the in-between data is difficult for anyone running an older release.

Doing back fill by downloading blocks backward isn't really that bad.

It's not that bad now, but it's something that will continue to become a more difficult and lengthy process over time, slightly degrading the user experience - especially because, in general, we want to get rid of full historical data on most nodes. In particular, we want the day-to-day operation of a client not to be burdened by the full block history - it's simply not needed. The key to safely getting rid of it is to leave a minimal and actually useful trace in the part that we are committed to maintaining over time - the state.

If we consider block storage right now, we're already at around 10 GB of raw block data - with this PR, it's feasible to stop burdening all clients with this data, because we know that we have a minimal and feasibly verifiable representation in the state - this is a significant win for the user experience.

How to do the fork

In its simplest form, and I think this is the way to go, this is a one-fork feature where we do not backfill anything - we freeze historical_roots at whatever length it has at the fork, then simply switch to writing the new list. Clients can then choose whether to ship a constant, if they deem it useful, but by doing this, we stop discarding a useful part of the information we're already computing.
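
In pseudocode, the no-backfill variant amounts to copying the old field over untouched and starting the new list empty - a sketch only, using the field names from the nesting proposal above rather than final spec text:

def upgrade_historical_accumulator(pre: BeaconState) -> BeaconState:
    post = BeaconState(
        ...
        # frozen at its pre-fork length, never appended to again, but still
        # verifiable against pre-fork states
        historical_roots=pre.historical_roots,
        # starts empty and only covers eras from the fork onwards
        historical_batches=List[HistoricalBatchSummary, HISTORICAL_ROOTS_LIMIT](),
        ...
    )
    return post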

When this PR should be introduced

This PR is more or less a follow-up of the WS sync PR and the way it implicitly proposes that historical data past the WS point should no longer be backfilled - as long as we remain committed to storing blocks from genesis in all nodes, the usefulness of this PR is more limited, but after that, we start losing information in a way that is difficult to recover and coordinate. I would suggest that the two decisions go hand in hand.

@paulhauner
Contributor

paulhauner commented Jun 7, 2021

I would generally think that this approach of out-of-band data is viable for a limited amount of time, but not over any longer period

I agree. My preference isn't to never include this; it's to delay it one more fork. If delaying until the next fork is 6 months, then we're looking at 162 additional entries in state.historical_roots.

you can only ship constant historical state root data at the time of a client release

I would have thought it can be included at any point a WS state is accepted (i.e., arbitrarily at runtime). For example, the weak subjectivity state "bundle" could look like this:

class Bundle(Container):
    state: BeaconState
    historical_state_roots: List[Root, HISTORICAL_ROOTS_LIMIT]

Including all the historical_state_roots since genesis until 6 months in the future (aka "next fork") would add ~10KB to the size of this ~100MB bundle.
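
Concretely, a client accepting such a bundle could check each era roughly as follows (a sketch; hash is the spec's SHA-256 helper, the other names are illustrative) - the block-batch root is recomputed from downloaded blocks, the state-batch root comes from the bundle, and together they must reproduce the corresponding state.historical_roots entry:

def verify_bundle_entry(block_batch_root: Root,
                        bundled_state_batch_root: Root,
                        historical_root: Root) -> bool:
    # hash_tree_root(HistoricalBatch) is the hash of its two field roots, so
    # the pair must hash to the entry already committed to in the state.
    return hash(block_batch_root + bundled_state_batch_root) == historical_root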

with this PR, it's feasible to stop burdening all clients with this data because we know that we have a minimal and feasibly verifiable representation in the state - this is a significant win for the user experience.

This PR makes these proofs easier in one scenario: when a user has 8,192 consecutive block roots and they want to prove they're ancestors of the state without any more information.

I can see this scenario happening during back-filling, because the user is trying to obtain all the blocks and they're working within the bounds of existing P2P messages.

If you're trying to prove that some range of blocks which don't align exactly with a HistoricalBatch are ancestors then you're going to need a Merkle proof. This PR will make that proof 32-bytes smaller, but I expect that to be a marginal improvement.


I mentioned this on the call, but figured I'd put it here too. For Lighthouse, this PR means we need to revise the schema for our freezer database. Presently, state.historical_roots for finalized and frozen states is laid out in a single vector. This PR would require a slight change to this format and require the database to be fork-aware in this aspect.

These changes are totally doable, there's nothing inherently difficult about them. I'm just trying to highlight that this PR has second-order effects in client implementations that extend beyond its very light impact to the spec.

Including this in Altair isn't going to ruin us and I wouldn't consider it a completely unreasonable thing to do. However, my preference currently resides in not including it.

EDIT: If there's appetite to allow this PR to delay Altair then I would change my stance. However, my understanding is that the EF has no appetite for delays.

@arnetheduck
Contributor Author

I would have thought it can be included at any point a WS state is accepted (i.e., arbitrarily at runtime). For example, the weak subjectivity state "bundle" could look like this:

Sure - I was mainly referring to what a client can do on its own, uncoordinated, by including state roots in the binary and not relying on anything else.

In-WS-state bundles would need coordination for them to be useful, and that coordination in turn would need everyone to keep the roots around regardless - ie exactly this PR - limiting WS bundles to special providers is less fun.

If you're trying to prove that some range of blocks which don't align exactly with a HistoricalBatch are ancestors

It's probably easier to download the other 8191 blocks in the range than to build out the infrastructure for merkle proofs for arbitrary ranges :)

@paulhauner
Contributor

I'm in complete support of this PR being applied at some point. The pressing question is "should we include it in Altair?".

I think I've expressed my thoughts on this topic so I'll lay low until we hear some more opinions. After we gain consensus, I'll be happy to get involved in discussion about how to make it happen either in or after Altair.

@djrtwo
Contributor

djrtwo commented Jun 7, 2021

Based on the discussion on the last call, the state of this conversation, and where we stand wrt Altair timelines, I think we should spend time well-spec'ing this for a future fork and lock down Altair at this point.

There is no opposition to a solution in this domain, and general agreement that it should have been done this way to start. For the time being, I suggest we bundle historical_state_roots as Paul suggested along with the ws state. This is trustless wrt the state and merkle proofs, even if a kludge.

I also acknowledge we do not have a good process for putting a feature in a holding pattern for a future fork because features are tangled together in fork-specs rather than independent like in EIPs. I suggest we put this in an issue for now, but that we also look toward how upgrades like this can/should be spec'd once we get past the merge.

@arnetheduck
Contributor Author

I suggest we put this in an issue for now,

Generally, I think a PR expresses the idea more clearly - especially since this is a bugfix more than a feature, really - ie the general sentiment is that this is something that we should have done from the start.

If people are on board with the no-backfill idea, I'll clean up the PR for the next fork after altair and we can take it from there.

@djrtwo
Contributor

djrtwo commented Jun 17, 2021

Will leave PR open then 👍

@djrtwo djrtwo added DO NOT MERGE and removed Altair aka HF1 labels Jun 17, 2021
arnetheduck added a commit to status-im/eth2.0-specs that referenced this pull request Oct 5, 2021
This PR, a continuation of
ethereum#2428, simplifies and
replaces `historical_roots` with
`historical_block_roots`.

By keeping an accumulator of historical block roots in the state, it
becomes possible to validate the entire block history that led up to
that particular state without executing the transitions, and without
checking them one by one in backwards order using a parent chain.

This is interesting for archival purposes as well as when implementing
sync protocols that can verify chunks of blocks quickly, meaning they
can be downloaded in any order.

It's also useful as it provides a canonical hash by which such chunks of
blocks can be named, with a direct reference in the state.

In this PR, `historical_roots` is frozen at its current value and
`historical_batches` are computed from the merge epoch onwards.

After this PR, `block_batch_root` in the state can be used to verify an
era of blocks against the state with a simple root check.

The `historical_roots` values on the other hand can be used to verify
that a constant distributed with clients is valid for a particular
state, and therefore extends the block validation all the way back to
genesis without backfilling `block_batch_root` and without introducing
any new security assumptions in the client.

As far as naming goes, it's convenient to talk about an "era" being 8192
slots ~= 1.14 days. The 8192 number comes from the
SLOTS_PER_HISTORICAL_ROOT constant.

With multiple easily verifiable blocks in a file, it becomes trivial to
offload block history to out-of-protocol transfer methods (bittorrent /
ftp / whatever) - including execution payloads, paving the way for a
future in which clients purge block history in p2p.

This PR can be applied along with the merge which simplifies payload
distribution from the get-go. Both execution and consensus clients
benefit because from the merge onwards, they both need to be able to
supply ranges of blocks in the sync protocol from what effectively is
"cold storage".

Another possibility is to include it in a future cleanup PR - this
complicates the "cold storage" mode above by not covering execution
payloads from the start.
@arnetheduck arnetheduck mentioned this pull request Oct 5, 2021
@arnetheduck
Contributor Author

Based on discussion, #2649 includes the following modifications:

  • no backfilling in PR that introduces feature - coordinating the backfill prior to merge is difficult, provides little value and can be solved either with a future PR or a separate change in a future hard fork
  • the old field is kept as a result of the above

@arnetheduck arnetheduck closed this Oct 5, 2021