Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove alignment limitations from checkpoint sync #3210

Closed
michaelsproul opened this issue May 24, 2022 · 4 comments
Closed

Remove alignment limitations from checkpoint sync #3210

michaelsproul opened this issue May 24, 2022 · 4 comments
Assignees
Labels
database enhancement New feature or request tree-states Upcoming state and database overhaul

Comments

@michaelsproul
Copy link
Member

Description

Currently Lighthouse's checkpoint sync requires that the checkpoint state is not from a skipped slot, see: https://lighthouse-book.sigmaprime.io/checkpoint-sync.html#alignment-requirements

This limitation is somewhat artificial, and was taken in order to simplify the initial implementation. I've forgotten exactly why but I'll post more info on this issue when I dive back into it again.

@michaelsproul
Copy link
Member Author

Another desirable property: checkpoint sync from a state alone, without having to load the block.

@michaelsproul
Copy link
Member Author

I think the simplest way to solve this would be to advance the provided state to the nearest epoch boundary, although I might wait for the discussion here to be resolved until implementing that change.

@michaelsproul
Copy link
Member Author

Regarding the use of unaligned states, I went looking for some places where we look up states for block roots, which might fail if the only state stored is the advanced version of the state (e.g. block was at slot 30 and we've stored its state skipped through to 32).

  1. The main place where this is relevant is in the on-finalization database migration here:

// The store migration task requires the *state at the slot of the finalized epoch*,
// rather than the state of the latest finalized block. These two values will only
// differ when the first slot of the finalized epoch is a skip slot.
//
// Use the `StateRootsIterator` directly rather than `BeaconChain::state_root_at_slot`
// to ensure we use the same state that we just set as the head.
let new_finalized_slot = new_view
.finalized_checkpoint
.epoch
.start_slot(T::EthSpec::slots_per_epoch());
let new_finalized_state_root = process_results(
StateRootsIterator::new(&self.store, &new_snapshot.beacon_state),
|mut iter| {
iter.find_map(|(state_root, slot)| {
if slot == new_finalized_slot {
Some(state_root)
} else {
None
}
})
},
)?
.ok_or(Error::MissingFinalizedStateRoot(new_finalized_slot))?;

That code takes care to use the advanced form of the finalized block's state, which is exactly what we want.

  1. This usage in the committee cache lookup could be problematic if the fork choice node stores the unadvanced state root:

let state_root = head_block.state_root;
let state = self
.store
.get_inconsistent_state_for_attestation_verification_only(
&state_root,
Some(head_block.slot),
)?
.ok_or(Error::MissingBeaconState(head_block.state_root))?;
(state, state_root)

  1. In fork choice itself where we load the balances from the justified state could also be an issue. When we checkpoint sync we artificially set the justified and finalized blocks equal to each other, so this could result in a lookup of the finalized block's state:

let state = self
.store
.get_state(&justified_block.state_root(), Some(justified_block.slot()))
.map_err(Error::FailedToReadState)?
.ok_or_else(|| Error::MissingState(justified_block.state_root()))?;

  1. We load the state for the parent block's state root in block verification:

// Load the parent blocks state from the database, returning an error if it is not found.
// It is an error because if we know the parent block we should also know the parent state.
let parent_state_root = parent_block.state_root();
let parent_state = chain
.get_state(&parent_state_root, Some(parent_block.slot()))?
.ok_or_else(|| {
BeaconChainError::DBInconsistent(format!("Missing state {:?}", parent_state_root))
})?;

This would cause blocks that are descended from the finalized block (with skips) to error incorrectly. We should definitely fix this.

  1. We load the state for the head block's state root on start-up:

let beacon_block = store
.get_full_block(&beacon_block_root)?
.ok_or(Error::MissingBeaconBlock(beacon_block_root))?;
let beacon_state_root = beacon_block.state_root();
let beacon_state = store
.get_state(&beacon_state_root, Some(beacon_block.slot()))?
.ok_or(Error::MissingBeaconState(beacon_state_root))?;

This could error if the finalized block is the head (e.g. restarting immediately after checkpoint sync).


There are probably other places too.

Some ideas for fixes:

A. Handle each case independently using bespoke logic. E.g. for fork choice we could make sure the state_root stored in the nodes is the advanced state's state root (when initialising from a checkpoint). This might work, but might also break other things.
B. Store the unaligned state in the database, and keep track of this. @paulhauner suggested a separate column, but I think the BeaconState column would be fine (using store_full_state). We'd just need to keep track of the state root for this unaligned finalized state in the store, and have a special case for loading the unaligned state when get_state requests a state root that matches.
C. Alternatively we could do a variant of B where we don't actually store the unaligned state, but we do store its state root and redirect requests for this state root to the aligned/advanced state. I think this is strictly worse, aside from the disk usage, as it breaks a pretty fundamental property of get_state.

So far I'm in favour of B, with no new column, and a DB schema migration to add the unaligned_finalized_state_root to the DB. I think it could reasonably live in the AnchorInfo, as it is only relevant when the anchor is non-null (i.e. for non-archive nodes).

@michaelsproul
Copy link
Member Author

Adding the tree-states tag, because sorting out the alignment issue is required for #4481, which is required for a tree-states database migration.

@michaelsproul michaelsproul added database tree-states Upcoming state and database overhaul labels Jul 7, 2023
bors bot pushed a commit that referenced this issue Aug 21, 2023
…uning (#4610)

## Issue Addressed

Closes #3210
Closes #3211

## Proposed Changes

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

## Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <seananderson33@gmail.com>
bors bot pushed a commit that referenced this issue Aug 21, 2023
…uning (#4610)

## Issue Addressed

Closes #3210
Closes #3211

## Proposed Changes

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

## Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <seananderson33@gmail.com>
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
…uning (sigp#4610)

Closes sigp#3210
Closes sigp#3211

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <seananderson33@gmail.com>
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
…uning (sigp#4610)

Closes sigp#3210
Closes sigp#3211

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <seananderson33@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database enhancement New feature or request tree-states Upcoming state and database overhaul
Projects
None yet
Development

No branches or pull requests

1 participant