Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Next slot state caching #8357

Merged
merged 16 commits into from
Jan 29, 2021
Merged

Next slot state caching #8357

merged 16 commits into from
Jan 29, 2021

Conversation

terencechain
Copy link
Member

@terencechain terencechain commented Jan 29, 2021

What type of PR is this?

Feature

What does this PR do? Why is it needed?

Credit to @paulhauner for the idea and the motivation

This PR adds the next slot state cache to enable state slot advancement during trialing slot. This is to optimize process epoch bottlenecks + late block arrivals that we have seen in the wild.

Accompany design doc: https://hackmd.io/BFHjQn4VTjO3JhlSSyDTWw?view

Which issues(s) does this PR fix?

N/A

Other notes for review

  • Tested on pyrmont
  • Tested on local interop set up
  • The testing methodology was to use metrics. With proposal, it should always be 3 hits per slot. Otherwise, 1 hit per slot.
  • Cache belongs in state pkg than cache pkg to live next to implementation and avoid import cycle. It calls ProcessSlots

@terencechain terencechain self-assigned this Jan 29, 2021
@codecov
Copy link

codecov bot commented Jan 29, 2021

Codecov Report

Merging #8357 (6d89f3b) into develop (e592cd7) will decrease coverage by 0.02%.
The diff coverage is 43.90%.

@@             Coverage Diff             @@
##           develop    #8357      +/-   ##
===========================================
- Coverage    57.89%   57.87%   -0.03%     
===========================================
  Files          457      458       +1     
  Lines        32076    32113      +37     
===========================================
+ Hits         18570    18584      +14     
- Misses       10674    10696      +22     
- Partials      2832     2833       +1     

// Updating next slot state cache can happen in the background. It shouldn't block rest of the process.
go func() {
if err := state.UpdateNextSlotCache(ctx, blockRoot[:], postState); err != nil {
log.WithError(err).Warn("could not update next slot state cache")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider debug here, warn is not actionable to users

// GetNextSlotState returns the saved state if the input root matches the root in `nextSlotCache`. Returns nil otherwise.
// This is useful to check before processing slots. With a cache hit, it will return last processed state with slot plus
// one advancement.
func GetNextSlotState(ctx context.Context, root []byte) (*state.BeaconState, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the naming convention of Get? I guess yes because that's what we use for caches typically, right

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Removed

// This is useful to check before processing slots. With a cache hit, it will return last processed state with slot plus
// one advancement.
func GetNextSlotState(ctx context.Context, root []byte) (*state.BeaconState, error) {
nsc.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rlock here

nsc.RLock()
// Returning early if state exists in cache.
if bytes.Equal(root, nsc.root) {
defer nsc.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer nsc.RUnlock()
nsc.RUnlock()

// Execute per slots transition.
state, err = ProcessSlots(ctx, state, signed.Block.Slot)
// Check whether the parent state has been advanced by 1 slot in next slot cache.
tsState, err := GetNextSlotState(ctx, signed.Block.ParentRoot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is ts as a prefix? Perhaps we can use a more informative name

@terencechain terencechain marked this pull request as ready for review January 29, 2021 16:28
@terencechain terencechain requested a review from a team as a code owner January 29, 2021 16:28
// This is useful to call after successfully processing a block.
func UpdateNextSlotCache(ctx context.Context, root []byte, state *state.BeaconState) error {
// Advancing one slot by using a copied state.
copied := state.Copy()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need to copy here, but im still doing it to be safe. Any preference on removing it and adding an explicit comment? @nisdas

I wonder if the performance trade off is worth it

@prylabs-bulldozer prylabs-bulldozer bot merged commit 3aaa98d into develop Jan 29, 2021
@delete-merged-branch delete-merged-branch bot deleted the trailing-slot-processing branch January 29, 2021 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants