Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve next epoch shuffling computation #6268

Closed
twoeths opened this issue Jan 9, 2024 · 5 comments
Closed

Improve next epoch shuffling computation #6268

twoeths opened this issue Jan 9, 2024 · 5 comments
Assignees
Labels
meta-feature-request Issues to track feature requests. prio-high Resolve issues as soon as possible. scope-performance Performance issue and ideas to improve performance.
Milestone

Comments

@twoeths
Copy link
Contributor

twoeths commented Jan 9, 2024

Problem description

For now next epoch shuffling computation in afterProcessEpoch takes the most time of epoch transition in normal condition. In some gc intensive condition, processRewardsAndPenalties takes more time (I used to see it took up to 1.8s) see #6229

this is a holesky profile:

Screenshot 2024-01-09 at 11 05 15

Solution description

as discussed with @dapplion we can go with these 2 approaches:

  • Right after beforeProcessEpoch, we have nextEpochShufflingActiveValidatorIndices and a finalized randaoMixes, we should be able to offload next epoch shuffling computation to a worker thread, and epoch transition becomes async
  • unshuffleList is the main part of shuffling computation, we should be able to implement it in assembly script importing as-sha256 from there

Additional context

No response

@twoeths twoeths added the meta-feature-request Issues to track feature requests. label Jan 9, 2024
@dapplion
Copy link
Contributor

dapplion commented Jan 9, 2024

epoch transition becomes async

We can have a flag to skip the unshuffle computation and delegate it to the caller of the epoch transition, kind of like we do with execution payload verification

@philknows philknows added prio-high Resolve issues as soon as possible. scope-performance Performance issue and ideas to improve performance. labels Jan 9, 2024
@philknows philknows added this to the v1.16.0 milestone Jan 23, 2024
@twoeths
Copy link
Contributor Author

twoeths commented Apr 3, 2024

one proposal for this work with some criterias:

  • keep state-transition async, no need to change it, just add some simple apis
  • ShufflingCache is only managed by beacon-node

given the epoch transition mostly get triggered by prepareNextSlot while prepareNextSlot only need BeaconStateAllForks, we don't need to compute shuffling synchronously there

  • new PartialCachedBeaconState = CachedBeaconState without nextShuffling
    • may need to refactor EpochCache and BeaconStateCache or not
    • this should NOT be in any state caches, it’s short lived for ~1s per epoch until nextShuffling is computed
  • state-transition
    • new processSlotsSingleEpochTransition(state: CachedBeaconState) returning PartialCachedBeaconState in order not to have to compute nextShuffling
    • new processSlotsNoEpochTransition(state: BeaconStateAllForks): BeaconStateAllForks
  • regen: new api getPartialCachedBeaconState(headRoot, prepareSlot) returning PartialCachedBeaconState to prepare for next slot
    • new function processToCheckpoint which is similar to processSlotsToNeareshCheckpoint but specify an end epoch, here it returns a CachedBeaconState
    • getPartialCachedBeaconState(blockRoot, slot):
      • processToCheckpoint(slot % 32 - 32): to process multiple epoch transitions except the last one
      • processSlotsSingleEpochTransition(): to process the last epoch transition, this returns PartialCachedBeaconState, no shuffling computation involved
      • processSlotsNoEpochTransition: to update the state until a slot
      • once it’s done asynchronously compute nextShuffling, then create the full CachedBeaconState and add to the cache
  • prepareNextSlot
    • consume new api getPartialBeaconState(headRoot, prepareSlot)
    • all downstream apis need to use the BeaconStateAllForks. If not use the PartialCachedBeaconState (hopefully its not the case)

@twoeths
Copy link
Contributor Author

twoeths commented Apr 4, 2024

another idea is to model ShufflingSource for next epoch. We always need previous/current shuffling, not the next shuffling so it could be lazily computed at beacon-node side after an epoch transition

export type ShufflingSource = {
  epoch: Epoch;
  activeIndices: Uint32Array;

  // TODO: consider if we need this or not, in that case make it ShufflingSummary
  // committeesPerSlot: number;
  // committeeLens: number[][];
};

export type Shuffling = {
  shuffling: Uint32Array;
  committees: Uint32Array[][];
  committeesPerSlot: number;
};

export type EpochShuffling = ShufflingSource & Shuffling;

export function isEpochShuffling(shufflingOrIndices: ShufflingSource): shufflingOrIndices is EpochShuffling {
  return (shufflingOrIndices as EpochShuffling).shuffling !== undefined;
}

then type nextShuffling as EpochSource

  previousShuffling: EpochShuffling;
  currentShuffling: EpochShuffling;
  nextShuffling: ShufflingSource;

then in afterProcessEpoch we don't need to compute next shuffling

afterProcessEpoch(
    state: BeaconStateAllForks,
    epochTransitionCache: {
      nextEpochShufflingActiveValidatorIndices: ValidatorIndex[];
      nextEpochTotalActiveBalanceByIncrement: number;
    }
  ): void {
    this.previousShuffling = this.currentShuffling;
    this.currentShuffling = isEpochShuffling(this.nextShuffling)
      ? this.nextShuffling
      // should rarely/never happens
      // TODO: track this is a metric
      : computeEpochShuffling(state, this.nextShuffling.activeIndices, this.nextShuffling.epoch);
    const currEpoch = this.currentShuffling.epoch;
    const nextEpoch = currEpoch + 1;
    this.nextShuffling = {
      epoch: nextEpoch,
      activeIndices: new Uint32Array(epochTransitionCache.nextEpochShufflingActiveValidatorIndices),
    };

right after an epoch transition, beacon-node should compute nextShuffling and update CachedBeaconState, update ShufflingCache and state caches as well

@twoeths
Copy link
Contributor Author

twoeths commented Sep 24, 2024

after #6938 was merged

  • Epoch transition is reduced because we move the code to the next event loop
Screenshot 2024-09-24 at 09 12 14
  • But Prepare Next Epoch Duration is the same
Screenshot 2024-09-24 at 09 13 13

need to implement

callInNextEventLoop(() => {

@wemeetagain
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-feature-request Issues to track feature requests. prio-high Resolve issues as soon as possible. scope-performance Performance issue and ideas to improve performance.
Projects
None yet
Development

No branches or pull requests

5 participants