statediff: Use a worker pool #43

roysc · 2020-11-19T12:35:39Z

Changes statediff service to spawn a pool of workers each running an independent WriteLoop.

Number of workers is configured with a new statediff.workers CLI flag.
Expands the lastBlock cache to use a map of hashes to blocks, with a max size also limited to the number of workers.
Geth uses an internal cache for receipts and other data; this causes a data race to be detected inside receipts.DeriveFields() when used during indexing, though I didn't debug this deeply enough to find out why the same receipts data would ever be touched by goroutines which should only be working on separate blocks.
There is still a data race issue in the Trie data access, ~~probably due to the state.Database hitting its cache limit.~~(cache miss shouldn't cause this)

Resolves #41

roysc · 2020-11-20T10:51:53Z

Running this on a fresh full chain, I'm not able to reproduce the trie node errors I was seeing before, so they may not be directly related - removing the WIP tag.

i-norden

Looks great! A few comments but only one or two changes. Running locally everything seem to be working (ran with 4 and 8 workers, around block 80,000). I do see some errors on shutdown that I think are new but also probably inconsequential:
WARN [11-24|02:40:55.972] Error from chain event subscription error=nil WARN [11-24|02:40:56.031] Error from chain event subscription error=nil WARN [11-24|02:40:56.032] Error from chain event subscription error=nil

Are you still seeing a Trie data race issue on your end?

I haven't benchmarked, it would be good to get a rough estimate on performance impact.

i-norden · 2020-11-24T07:48:17Z

statediff/helpers.go

@@ -96,3 +99,39 @@ func CheckKeyType(elements []interface{}) (sdtypes.NodeType, error) {
 		return sdtypes.Unknown, fmt.Errorf("unknown hex prefix")
 	}
 }
+
+// Deep-copy a receipt
+func CopyReceipt(dst, src *types.Receipt) {


I know you can't do it with rlp encoding/decoding since it only encodes the consensus fields but I wonder if you could copy by marshaling the receipt to JSON bytes and then unmarshalling into a new types.Receipt.

i-norden · 2020-11-24T07:49:22Z

statediff/service.go

@@ -89,6 +89,15 @@ type IService interface {
 	WriteLoop(chainEventCh chan core.ChainEvent)
 }

+// Wraps consructor parameters
+type ServiceParams struct {


I like the params getting passed together like this!

i-norden · 2020-11-24T07:54:21Z

statediff/service.go

@@ -172,41 +198,63 @@ func (sds *Service) APIs() []rpc.API {
 	}
 }

-func (lbc *lastBlockCache) replace(currentBlock *types.Block, bc blockChain) *types.Block {


I'm wondering if the cache is still beneficial with the concurrent approach given the additional complexity.

I checked and the hit/access ratio is very close to 1, so it does still seem to be helping, I can verify further with benchmarks.

Oh that's awesome, thanks for checking that!

i-norden · 2020-11-24T07:55:17Z

statediff/service.go

+		parentBlock = block
+		if len(lbc.blocks) > int(lbc.maxSize) {
+			delete(lbc.blocks, parentHash)
+		}
 	} else {
 		parentBlock = bc.GetBlockByHash(parentHash)


I may be missing it but I think we still need to cache the parentBlock returned by bc.GetBlockByHash?

My reasoning was that two blocks shouldn't be processed with the same parent block, so we just cache the current one so it's available when its child block gets processed. I did some logging to verify this - blocks are accessed no more than once. Also, the number of misses is so low that hardly any parents would get added to the cache (since we would only add them on a miss).

Ah yeah that makes sense, and is generally true. The only time this wouldn't be true is when there are reorgs while syncing at the head of chain (will never see it happen during chain import e.g. during any of our local tests). I think in that case it is likely still better to need to reach past the cache to retrieve the block rather than to waste cache space on the off chance.

i-norden · 2020-11-24T08:00:46Z

statediff/service.go

 	lbc.Unlock()
 	return parentBlock
 }

+type workerParams struct {
+	chainEventCh <-chan core.ChainEvent


When integrating with the metrics we might want to have an intermediate select loop that logs the metric off the primary core.ChainEvent channel before passing the event along to a secondary worker queue channel here, if we want to avoid logging metrics from within worker goroutines.

Done, after rebasing on this branch

i-norden · 2020-11-24T08:04:58Z

statediff/service.go

@@ -410,12 +458,16 @@ func (sds *Service) Unsubscribe(id rpc.ID) error {
 func (sds *Service) Start() error {
 	log.Info("Starting statediff service")

-	chainEventCh := make(chan core.ChainEvent, chainEventChanSize)
-	go sds.Loop(chainEventCh)
+	{


Brackets and comment can go I think

oh yeah, meant to clean it up

i-norden · 2020-11-24T08:37:45Z

statediff/service.go

+		// To avoid a data race caused by Geth's internal caching, deep-copy the accessed receipts
+		for _, rct := range sds.BlockChain.GetReceiptsByHash(block.Hash()) {
+			var newrct types.Receipt
+			CopyReceipt(&newrct, rct)


This is simpler and appears to be working when I run locally:

newRct := new(types.Receipt) *newRct = *rct receipts = append(receipts, newRct)

It's odd - I was getting a data race detected at various points in this loop, but now I can't reproduce it even on that commit where it was happening. That was the reason for the deep copy before, but now it seems even the shallow copy is not necessary, so I can revert those changes.

i-norden

LGTM! Feel free to merge when you are ready.

roysc · 2020-11-25T10:29:17Z

The chain event subscription error seems to be from the workers receiving the close of the error channel before the quit channel. Like you said, doesn't seem to indicate a real problem, though.

roysc requested a review from i-norden November 19, 2020 12:48

roysc changed the title ~~[WIP] statediff: Use a worker pool~~ statediff: Use a worker pool Nov 20, 2020

metrics for statediff stats

c1c41ef

i-norden requested changes Nov 24, 2020

View reviewed changes

roysc added 2 commits November 24, 2020 18:19

metrics namespace/subsystem = statediff/{indexer,service}

72a4772

statediff: use a worker pool (for direct writes)

02c7e78

roysc force-pushed the 41-statediff-workerpool branch from 09c7f65 to 141311c Compare November 24, 2020 16:06

roysc added 6 commits November 25, 2020 00:06

fix test

bf02717

fix chain event subscription

5c35c86

log tweaks

dd6f9cc

func name

ae2f32f

unused import

8c9d8cb

intermediate chain event channel for metrics

ab841a9

roysc force-pushed the 41-statediff-workerpool branch from 141311c to ab841a9 Compare November 24, 2020 16:07

i-norden approved these changes Nov 25, 2020

View reviewed changes

cleanup

83c3583

roysc merged commit 7c8fb48 into v1.9.24-statediff Nov 25, 2020

i-norden mentioned this pull request Nov 25, 2020

Worker pool for diffing off of the chainEvent channel #41

Closed

i-norden deleted the 41-statediff-workerpool branch January 22, 2021 22:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

statediff: Use a worker pool #43

statediff: Use a worker pool #43

roysc commented Nov 19, 2020 •

edited

Loading

roysc commented Nov 20, 2020

i-norden left a comment

i-norden Nov 24, 2020

i-norden Nov 24, 2020

i-norden Nov 24, 2020 •

edited

Loading

roysc Nov 24, 2020

i-norden Nov 25, 2020

i-norden Nov 24, 2020

roysc Nov 24, 2020

i-norden Nov 25, 2020 •

edited

Loading

i-norden Nov 24, 2020

roysc Nov 24, 2020

i-norden Nov 24, 2020

roysc Nov 24, 2020

i-norden Nov 24, 2020

roysc Nov 24, 2020 •

edited

Loading

i-norden left a comment •

edited

Loading

roysc commented Nov 25, 2020

statediff: Use a worker pool #43

statediff: Use a worker pool #43

Conversation

roysc commented Nov 19, 2020 • edited Loading

roysc commented Nov 20, 2020

i-norden left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

i-norden Nov 24, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

i-norden Nov 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

roysc Nov 24, 2020 • edited Loading

Choose a reason for hiding this comment

i-norden left a comment • edited Loading

Choose a reason for hiding this comment

roysc commented Nov 25, 2020

roysc commented Nov 19, 2020 •

edited

Loading

i-norden Nov 24, 2020 •

edited

Loading

i-norden Nov 25, 2020 •

edited

Loading

roysc Nov 24, 2020 •

edited

Loading

i-norden left a comment •

edited

Loading