
Switch to lazy state balance cache #9822

Merged: 34 commits merged into develop, Nov 19, 2021

Conversation

@kasey (Contributor) commented Oct 27, 2021

What type of PR is this?

Other

(enhancement / code health, eventually part of WSS)

What does this PR do? Why is it needed?
This PR adds a modified state balance cache implementation. Instead of requiring every code path that modifies the copy of the justified root recorded by blockchain.Service to also update the state balance cache, the cache now updates itself on read, based on the requested block root.

This reduces the need for code to explicitly update the cache when updating the copy of the justified state. This simplifies several areas of the code, consolidates responsibility for the cache into the cache type, and decouples the cache from the service somewhat, enabling further refactoring of service initialization in support of weak subjectivity sync.
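For intuition, here is a minimal sketch of the lazy read-through pattern described above; the names (balanceCache, stateLookup) and the simplified shape are illustrative assumptions, not the actual Prysm types.

```go
package lazycache

import (
	"context"
	"sync"
)

// stateLookup stands in for the stategen lookup: given a block root,
// produce the validator balances for that state.
type stateLookup func(ctx context.Context, root [32]byte) ([]uint64, error)

type balanceCache struct {
	sync.Mutex
	root     [32]byte
	balances []uint64
	lookup   stateLookup
}

// get serves cached balances when the requested root matches the cached
// root, and otherwise lazily refreshes the cache via the lookup. Callers
// never update the cache explicitly; reads keep it current.
func (c *balanceCache) get(ctx context.Context, root [32]byte) ([]uint64, error) {
	c.Lock()
	defer c.Unlock()
	if root == c.root && c.balances != nil {
		return c.balances, nil
	}
	balances, err := c.lookup(ctx, root)
	if err != nil {
		return nil, err
	}
	c.root = root
	c.balances = balances
	return balances, nil
}
```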

Other notes for review

Previously, the cache update method also triggered a sync of the init-sync blocks: s.cfg.BeaconDB.SaveBlocks(ctx, s.getInitSyncBlocks()). This has been removed based on the assumption that justified states are always saved to the database, because they lie on epoch boundaries (our minimum state persistence interval). To strengthen this invariant, we also changed the order of operations in updateJustifiedInitSync, ensuring that SaveJustifiedCheckpoint completes successfully before changing Service.justifiedCheckpoint. SaveJustifiedCheckpoint will fail if the corresponding state hasn't been saved to the db, so doing the swap after the save makes the update of the db and the runtime variable closer to atomic.
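A minimal sketch of that save-then-swap ordering, assuming simplified stand-in types (checkpoint, service); the real updateJustifiedInitSync does more than this.

```go
package justified

import "context"

type checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

type service struct {
	justifiedCheckpt checkpoint
	// saveJustifiedCheckpoint stands in for BeaconDB.SaveJustifiedCheckpoint.
	saveJustifiedCheckpoint func(context.Context, checkpoint) error
}

// updateJustifiedInitSync persists first: the save fails if the checkpoint's
// state isn't in the db, so a failure leaves the runtime value untouched.
// Only after the db write succeeds is the runtime variable swapped, keeping
// the db and in-memory views of the justified checkpoint consistent.
func (s *service) updateJustifiedInitSync(ctx context.Context, cp checkpoint) error {
	if err := s.saveJustifiedCheckpoint(ctx, cp); err != nil {
		return err
	}
	s.justifiedCheckpt = cp
	return nil
}
```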

Marking this as WIP for now because I may want to add some additional test coverage to this type; currently it is only tested transitively through service tests. (Edit: added some test coverage directly on the cache.) On the topic of tests, this PR also contains updates to every test that calls NewService, because NewService has been modified to initialize the cache. Previously many tests set the .cfg field on the Service after calling NewService, but since the cache is initialized inside NewService, stategen needs to be set first. This was accomplished by updating all these call sites to use the new functional opts (the []Options param to NewService), as sketched below. Some helpers were added to cover common test initialization patterns, and some care was taken not to add test db initialization where it isn't necessary.
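For readers unfamiliar with the pattern, a compressed sketch of functional options as described; WithStateGen and the stand-in types are hypothetical names, and the real NewService constructs the balance cache where this sketch only shows the guard.

```go
package blockchain

import "errors"

type stateGen struct{} // stand-in for *stategen.State

type config struct {
	StateGen *stateGen
}

type Service struct {
	cfg *config
}

// Option mutates a Service during construction; tests pass these to
// NewService instead of poking .cfg after the fact.
type Option func(*Service) error

func WithStateGen(sg *stateGen) Option {
	return func(s *Service) error {
		s.cfg.StateGen = sg
		return nil
	}
}

func NewService(opts ...Option) (*Service, error) {
	srv := &Service{cfg: &config{}}
	for _, opt := range opts {
		if err := opt(srv); err != nil {
			return nil, err
		}
	}
	// The balance cache is initialized inside NewService, so stategen must
	// already have been set by an option at this point.
	if srv.cfg.StateGen == nil {
		return nil, errors.New("stategen is required to initialize the balance cache")
	}
	return srv, nil
}
```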

@kasey requested a review from a team as a code owner October 27, 2021 02:28
beacon-chain/blockchain/process_block.go (review thread, outdated, resolved)

var justifiedState state.BeaconState
var err error
if justifiedRoot == s.genesisRoot {
Member:

Is this condition necessary in the new scheme?

Contributor Author (kasey):

When we have a cache miss and go back to stategen for the lookup, it will attempt to read the state from the db. If the root is equal to genesisRoot, I believe we can assume it will be present in the database, correct? In other words, I believe that getting the state by root using the genesis root is equivalent to calling db.GenesisState.

Member:

My instinct says this exists because for some reason at genesis both justifiedRoot and s.genesisRoot can be 0x000....

To be safe, maybe we should apply ensureRootNotZeros to c.stateGen.StateByRoot(ctx, justifiedRoot) in get?

Contributor Author (kasey):

I forgot this detail -- stategen.StateByRoot internally checks for the zero hash and in that case loads from the DB:
https://github.com/prysmaticlabs/prysm/blob/develop/beacon-chain/state/stategen/getter.go#L45
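A hedged paraphrase of the guard being referenced, with stand-in types; the zero-root branch is shown calling GenesisState to match the equivalence kasey asserts, though the actual getter.go at the time differed in detail (see the commit note later in this thread about a follow-up PR making that switch).

```go
package stategen

import "context"

type BeaconState interface{} // stand-in for the real state type

type beaconDB interface {
	GenesisState(ctx context.Context) (BeaconState, error)
}

type State struct {
	db beaconDB
}

var zeroHash [32]byte // the all-zero root

// StateByRoot treats the zero root as a request for the genesis state and
// reads it from the db, rather than attempting a lookup by root.
func (s *State) StateByRoot(ctx context.Context, blockRoot [32]byte) (BeaconState, error) {
	if blockRoot == zeroHash {
		return s.db.GenesisState(ctx)
	}
	return s.loadStateByRoot(ctx, blockRoot)
}

func (s *State) loadStateByRoot(ctx context.Context, blockRoot [32]byte) (BeaconState, error) {
	// hot-cache, epoch-boundary cache, and db lookups elided
	return nil, nil
}
```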

Member:

Ah perfect. Thanks!

@@ -359,9 +359,6 @@ func (s *Service) finalizedImpliesNewJustified(ctx context.Context, state state.
	}
	if !bytes.Equal(anc, s.finalizedCheckpt.Root) {
		s.justifiedCheckpt = state.CurrentJustifiedCheckpoint()
		if err := s.cacheJustifiedStateBalances(ctx, bytesutil.ToBytes32(s.justifiedCheckpt.Root)); err != nil {
Member:

Are we sure it's ok to remove this? Some of these are in place to defend against fork choice bouncing attacks?

Contributor Author (kasey):

I'm not 100% confident. Would appreciate your help formulating a test case to increase our confidence that this is not an issue.

Member:

Looking into this today

Member:

All these deletions are fine because we still update s.justifiedCheckpt.

kasey and others added 2 commits October 27, 2021 08:10

beacon-chain/blockchain/state_balance_cache.go (review thread, outdated, resolved)
@kasey changed the title from WIP: Switch to lazy state balance cache to Switch to lazy state balance cache on Nov 2, 2021
kasey added 9 commits November 3, 2021 21:37
service_test brings in a ton of dependencies that make bazel rules
for blockchain complex, so just sticking these mocks in their own
file simplifies things.
this test established that the zero root can't be used to look up the
state, resulting in a change in another PR to update stategen to use the
GenesisState db method instead when the zero root is detected.
@@ -130,6 +130,14 @@ var (
		Name: "sync_head_state_hit",
		Help: "The number of sync head state requests that are present in the cache.",
	})
	stateBalanceCacheHit = promauto.NewCounter(prometheus.CounterOpts{
		Name: "balance_cache_hit",
Member:

These names are too similar to other metrics, like total_effective_balance_cache_hit.

Member:

Why don't we define these metrics closer to their usage, inside state_balance_cache.go?

Contributor Author (@kasey) Nov 17, 2021:

Re: "why don't we define these metrics closer to their usage"

I'm following the conventions of the blockchain package here; all the prom metric values are defined in metrics.go.

Re: "these names are too similar to the other metrics"

How about state_balance_cache_(hit|miss)?

Member:

For some reason I thought the cache resided in the cache pkg; it makes sense that the metric is here. I don't want to get into another debate about where the cache should live, ha!

state_balance_cache is fine; justified_state_balance_cache is also fine.
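Following that convention, the renamed counters might look like this in metrics.go; the Help strings are illustrative, not taken from the PR.

```go
package blockchain

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// stateBalanceCacheHit counts reads served from the cached justified state.
	stateBalanceCacheHit = promauto.NewCounter(prometheus.CounterOpts{
		Name: "state_balance_cache_hit",
		Help: "Count of balance cache reads served from the cached justified state.",
	})
	// stateBalanceCacheMiss counts reads that required a stategen lookup.
	stateBalanceCacheMiss = promauto.NewCounter(prometheus.CounterOpts{
		Name: "state_balance_cache_miss",
		Help: "Count of balance cache reads that required a stategen lookup.",
	})
)
```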

		cfg: &config{},
	}
	for _, opt := range opts {
		if err := opt(srv); err != nil {
			return nil, err
		}
	}
	if srv.justifiedBalances == nil {
		if srv.cfg.StateGen == nil {
Member:

Any reason not to put the nil check inside newStateBalanceCache and change the signature to:

newStateBalanceCache(sg *stategen.State) (*stateBalanceCache, error)

Contributor Author (kasey):

I like that idea. The whole point of the constructor is to hint that the zero value for the cache doesn't work, so it makes sense to make it more defensive.
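A sketch of the constructor built from the agreed signature; the struct fields and error message are assumptions about the eventual implementation, and stategenState is a stand-in for the real *stategen.State.

```go
package blockchain

import "errors"

// stategenState is a stand-in for *stategen.State from the conversation.
type stategenState struct{}

type stateBalanceCache struct {
	stateGen *stategenState
	// cached root/balances fields elided
}

// newStateBalanceCache is the more defensive constructor discussed above:
// the zero value of stateBalanceCache is unusable because every cache miss
// needs stategen to load the justified state, so the nil check lives here.
func newStateBalanceCache(sg *stategenState) (*stateBalanceCache, error) {
	if sg == nil {
		return nil, errors.New("can't initialize state balance cache without stategen")
	}
	return &stateBalanceCache{stateGen: sg}, nil
}
```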

// read path can connect to the upstream cancellation/timeout chain.
func (c *stateBalanceCache) get(ctx context.Context, justifiedRoot [32]byte) ([]uint64, error) {
	c.Lock()
	defer c.Unlock()
Member:

The locks could be further optimized by taking a read lock around this condition and only taking the write lock on a miss.

It's a trade-off with readability; I'm not sure it's worth it, just wanted to point out the option.

Contributor Author (kasey):

I agree that would be fine, and I actually wrote it that way at first. The problem was that when the same thread holds a read lock and then needs the write lock, it has to briefly release the read lock before acquiring the write lock, and in that window another thread could get a stale read and try to concurrently update. I like the single mutex because it removes that edge case and makes this simpler to reason about.
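To illustrate the edge case, a hypothetical RWMutex variant of get (not code from the PR) showing where the window opens:

```go
package blockchain

import (
	"context"
	"sync"
)

type rwBalanceCache struct {
	sync.RWMutex
	root     [32]byte
	balances []uint64
}

// getRacy shows why the read-lock-then-write-lock version was rejected.
func (c *rwBalanceCache) getRacy(ctx context.Context, root [32]byte) ([]uint64, error) {
	c.RLock()
	if root == c.root && c.balances != nil {
		defer c.RUnlock()
		return c.balances, nil
	}
	c.RUnlock() // the read lock must be released before taking the write lock...

	c.Lock()
	defer c.Unlock()
	// ...and in that window another goroutine may also have missed, so both
	// redo the stategen lookup. The single sync.Mutex in the merged version
	// closes this window at the cost of serializing cache hits.
	return c.refresh(ctx, root)
}

func (c *rwBalanceCache) refresh(ctx context.Context, root [32]byte) ([]uint64, error) {
	// stategen lookup elided; a real implementation would fetch balances here.
	c.root = root
	return c.balances, nil
}
```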

@prylabs-bulldozer (bot) merged commit 39c33b8 into develop on Nov 19, 2021.
@delete-merged-branch (bot) deleted the lazy-cache-poc branch on November 19, 2021 at 15:59.