
[EN Performance] Optimize checkpoint serialization for -37GB operational RAM, -2.7 minutes duration, -19.6 million allocs (50% fewer allocs) #3050

Merged
merged 7 commits into master from fxamacker/reduce-checkpoint-serialization-memory on Aug 23, 2022

Conversation

fxamacker (Member) commented Aug 22, 2022

The primary goal is to reduce operational RAM used by checkpoint v5. Secondary goals include speeding up checkpointing and redesigning the code to simplify adding concurrency in the next PR.

UPDATE: 🚀 Full checkpointing v5 finishes in 12-13 minutes on EN4 and reduced peak memory use more than expected. This PR was merged on Aug 23 and deployed to EN4.mainnet19 on Oct 7, 2022.

This PR replaces the largest data structure used for checkpoint serialization. During serialization, subtries are processed instead of an entire trie at once. The changes also focus on preallocation to increase memory savings.

Serializing data in parallel is made easier (because this PR splits the mtrie into multiple subtries), but adding parallelism is outside the scope of this PR. Issue #3075 should be used to determine whether parallelism is worthwhile (at this time) before implementing it, because parallelism has tradeoffs such as consuming more RAM.
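
To make the subtrie idea concrete, here is a minimal sketch of the general approach (hypothetical `Node` type and helpers, not the actual flow-go implementation): each level-4 subtrie is serialized with its own small, preallocated node→index map, instead of one huge map holding every unique node of every trie.

```go
// Hypothetical sketch of serializing a trie in subtrie-sized chunks;
// types and helpers are illustrative, not the flow-go API.
package sketch

type Node struct {
	Left, Right *Node
	// payload fields omitted
}

// collectSubtrieRoots returns the nodes at the given level (root is at level 0).
func collectSubtrieRoots(root *Node, level int) []*Node {
	if root == nil {
		return nil
	}
	if level == 0 {
		return []*Node{root}
	}
	return append(
		collectSubtrieRoots(root.Left, level-1),
		collectSubtrieRoots(root.Right, level-1)...)
}

// serializeBySubtrie walks each level-4 subtrie with its own small,
// preallocated node→index map instead of one map holding every node.
func serializeBySubtrie(root *Node, estimatedSubtrieSize int, write func(*Node, uint64)) {
	var nextIndex uint64 = 1 // 0 is reserved for "no child"
	subtrieRootIndex := make(map[*Node]uint64, 16)

	for _, subtrieRoot := range collectSubtrieRoots(root, 4) {
		if _, done := subtrieRootIndex[subtrieRoot]; done {
			continue // shared subtrie already serialized
		}
		// Small per-subtrie map; preallocating avoids repeated map growth.
		indexOf := make(map[*Node]uint64, estimatedSubtrieSize)
		var visit func(n *Node)
		visit = func(n *Node) {
			if n == nil {
				return
			}
			if _, ok := indexOf[n]; ok {
				return
			}
			visit(n.Left) // descendants are written before their parent
			visit(n.Right)
			indexOf[n] = nextIndex
			write(n, nextIndex)
			nextIndex++
		}
		visit(subtrieRoot)
		subtrieRootIndex[subtrieRoot] = indexOf[subtrieRoot]
	}
	// The nodes above level 4 (the top of the trie) would be serialized
	// last, referencing subtrie roots via subtrieRootIndex; omitted here.
}
```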

Closes #2964
Updates #1744
Updates #3075

Preliminary Results Using Level 4 (16 Subtries)

Using August 12 mainnet checkpoint file:

  • -37GB peak RAM (top command), -23GB RAM (go bench B/op)
  • -19.6 million (-50%) allocs/op in serialization phase
  • -2.7 minutes duration
Before:    625746 ms    88320868048 B/op    39291999 allocs/op
After:     461937 ms    64978613264 B/op    19671410 allocs/op

Root is at Level 0.
Benchmark used Go 1.18.5 on benchnet-dev-004.
No benchstat comparisons yet (n=5+) due to duration and memory required.

Tests

This PR passed unit tests and round-trip tests before it was merged to master on August 23, 2022:

  • On Sunday, August 21, 2022, I confirmed it passed round-trip tests using a 150GB checkpoint file (August 12 checkpoint file from mainnet). The final 150GB output exactly matched expected results (b2sum of 150GB files matched).
  • On Wednesday, August 31, 2022, another person mentioned in the standup meeting that a different test (comparing file size) also produced expected results.

NOTE: As of Sept 13, 2022 this PR has not been merged to mainnet.

EDIT: Added more details after reading PR review questions.

  • Clarified that the root is at level 0 and that we're using level 4 (16 subtries).
  • Mentioned tests, including the round-trip tests on Aug 21 that passed before this PR was merged to master on Aug 23.
  • Mentioned issue #3075 to replace "issue will be opened" about adding parallelism, which is made easier by this PR.
  • Made it clearer that this PR is not yet deployed to mainnet.

NodeIterator is modified to receive *node.Node instead of *trie.MTrie.
Replace the very large Go map holding all unique nodes with a smaller
map for each subtrie to:
- reduce operational memory by 37GB
- reduce allocs by 19.6 million (50% of serialization allocs)
- reduce duration by 2.7 minutes
@fxamacker fxamacker added Performance Execution Cadence Execution Team labels Aug 22, 2022
@fxamacker fxamacker self-assigned this Aug 22, 2022
@fxamacker fxamacker requested a review from zhangchiqing August 22, 2022 17:27
@@ -115,7 +114,7 @@ func (i *NodeIterator) Next() bool {
// initial call to Next() for a non-empty trie
i.dig(i.unprocessedRoot)
i.unprocessedRoot = nil
return true
return len(i.stack) > 0
Contributor:

Why did we change this? It seems like an important change, but I don't see why we would

Member Author:

This is a bug fix for a problem that hadn't surfaced yet because of the way the node iterator was used.

The bug: the unique node iterator's Next() returns true when i.unprocessedRoot has already been visited and i.stack is empty.

This bug doesn't happen when we iterate the nodes of an entire trie (root nodes are always unique). In this PR, we iterate the nodes of subtries, and subtries can be shared and already visited. So instead of always returning true, assuming there's at least one unique node when digging i.unprocessedRoot, we only return true when there are unique nodes in the internal stack after calling dig(i.unprocessedRoot).
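
To make the failure mode concrete, here is a self-contained sketch (hypothetical types and condensed logic, not the actual flow-go NodeIterator) showing why the unconditional `return true` was wrong once a subtrie root can already be in visitedNodes:

```go
// Minimal illustration of the Next() fix; not the flow-go NodeIterator.
package sketch

type node struct{ left, right *node }

type iterator struct {
	unprocessedRoot *node
	stack           []*node
	visited         map[*node]struct{} // shared across subtrie iterations
}

// dig pushes n and its unvisited descendants onto the stack.
// If n (e.g. a shared subtrie root) was already visited, nothing is pushed.
func (i *iterator) dig(n *node) {
	if n == nil {
		return
	}
	if _, ok := i.visited[n]; ok {
		return
	}
	i.visited[n] = struct{}{}
	i.dig(n.left)
	i.dig(n.right)
	i.stack = append(i.stack, n)
}

func (i *iterator) Next() bool {
	if i.unprocessedRoot != nil {
		i.dig(i.unprocessedRoot)
		i.unprocessedRoot = nil
		// Old behavior: return true. If the root was already visited,
		// dig() pushed nothing, so there is no node to return.
		return len(i.stack) > 0
	}
	// popping the next node from i.stack is elided in this sketch
	return len(i.stack) > 0
}
```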

Contributor:

Thanks for the explanation! It does make sense.
Would it be possible to maybe add a test catching this particular bug, and showing how the fix helps?

Member Author:

Would it be possible to maybe add a test catching this particular bug, and showing how the fix helps?

A test for this bug is already in iterator_test.go#L269-L396.

The test iterates 3 left subtries and 3 right subtries (some subtries are shared). The test verifies that:

  • the order of iterated nodes is descendants first
  • shared subtries/nodes are not iterated twice
  • a non-nil node is returned (meaning as long as Next() returns true, Value() returns a non-nil node)
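
For illustration only, the dedup and non-nil checks could look roughly like this (hypothetical types and a stubbed iterate helper; the real assertions live in iterator_test.go):

```go
// Hypothetical sketch of the dedup property checked by the real test;
// the iterate helper is a stub, not the flow-go API.
package sketch

import "testing"

type testNode struct{ left, right *testNode }

// iterate stands in for draining a NodeIterator over several subtries
// that share nodes; it returns every node the iterator yielded.
func iterate(roots []*testNode) []*testNode {
	// ... would wrap NewNodeIterator(root) for each subtrie root ...
	return nil
}

func TestNoNodeIteratedTwice(t *testing.T) {
	shared := &testNode{}
	left := &testNode{left: shared}
	right := &testNode{right: shared}

	seen := make(map[*testNode]struct{})
	for _, n := range iterate([]*testNode{left, right}) {
		if n == nil {
			t.Fatal("iterator yielded a nil node")
		}
		if _, dup := seen[n]; dup {
			t.Fatalf("node %p iterated twice", n)
		}
		seen[n] = struct{}{}
	}
}
```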

@@ -30,7 +30,7 @@ func TestPopulatedTrie(t *testing.T) {
emptyTrie := trie.NewEmptyMTrie()

// key: 0000...
p1 := utils.PathByUint8(1)
p1 := utils.PathByUint8(0)
Contributor:

It's the only place we change it in a test - is this value irrelevant, or has the internal working changed somehow?

Member Author:

is this value irrelevant, or has the internal working changed somehow?

Yes, this value is irrelevant, and the internal working hasn't changed.

The intent is to use p1 as a left leaf node and p2 as a right leaf node of the same parent.

Given that p2's path is 0100 0000, created using utils.PathByUint8(64), p1's path can be either of these two paths:

  • 0000 0000 created using utils.PathByUint8(0)
  • 0000 0001 created using utils.PathByUint8(1)

I changed the p1 path to utils.PathByUint8(0) to be consistent with its comment // key: 0000..., which doesn't change the intention of the test.
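
To make the bit patterns concrete, here is a tiny standalone snippet (assuming utils.PathByUint8(n) simply sets the first byte of the path to n, as the comments above suggest):

```go
// Standalone illustration of the relevant first-byte bit patterns
// (assumes PathByUint8(n) sets the first byte of the path to n).
package main

import "fmt"

func main() {
	for _, b := range []uint8{0, 1, 64} {
		fmt.Printf("PathByUint8(%d) first byte: %08b\n", b, b)
	}
	// Prints:
	//   PathByUint8(0) first byte: 00000000
	//   PathByUint8(1) first byte: 00000001
	//   PathByUint8(64) first byte: 01000000
	// The second bit separates p2 (01...) from p1 (00...), so p1 ends up
	// as the left leaf and p2 as the right leaf under the same parent.
}
```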

m4ksio (Contributor) commented Aug 22, 2022

Looks good overall and a really smart idea!
But one thing I don't get, or maybe my understanding of these changes isn't right - where does such a large amount of RAM savings come from?
This significantly reduces the size of allNodes, but all nodes are still serialized. Since allNodes is just a map from pointer to uint64, the memory reduction should be rather small in this case.

@@ -76,13 +75,13 @@ type NodeIterator struct {
// as for each node, the children have been previously encountered.
// NodeIterator created by NewNodeIterator is safe for concurrent use
// because visitedNodes is always nil in this case.
func NewNodeIterator(mTrie *trie.MTrie) *NodeIterator {
func NewNodeIterator(n *node.Node) *NodeIterator {
Contributor:

thanks for fixing this node iterator to be a proper node iterator.

fxamacker (Member Author) commented Aug 22, 2022

Looks good overall and a really smart idea! But one thing I don't get, or maybe my understanding of these changes isn't right - where does such a large amount of RAM savings come from? This significantly reduces the size of allNodes, but all nodes are still serialized. Since allNodes is just a map from pointer to uint64, the memory reduction should be rather small in this case.

@m4ksio yeah, that was my thought initially too, but other aspects, like the memory savings from preallocation, are huge (for very large maps).

Preallocation saves a lot of memory even with the same map size of 1 million elements:

BenchmarkMap1000000-4               	      91114871 ns/op	 7299274 B/op	    3198 allocs/op
BenchmarkPreallocatedMap1000000-4   	      81377072 ns/op	 2874023 B/op	       1 allocs/op
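
The benchmark code itself isn't shown in this thread; a minimal sketch of that kind of comparison (hypothetical benchmark, not the exact code behind the numbers above) would be:

```go
// Hypothetical benchmark comparing map growth with and without
// preallocation; not the exact code behind the numbers above.
package sketch

import "testing"

const size = 1_000_000

func BenchmarkMap1000000(b *testing.B) {
	for i := 0; i < b.N; i++ {
		m := make(map[int]uint64) // grows (and reallocates buckets) many times
		for k := 0; k < size; k++ {
			m[k] = uint64(k)
		}
	}
}

func BenchmarkPreallocatedMap1000000(b *testing.B) {
	for i := 0; i < b.N; i++ {
		m := make(map[int]uint64, size) // buckets allocated up front
		for k := 0; k < size; k++ {
			m[k] = uint64(k)
		}
	}
}
```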

ramtinms (Contributor) left a comment:

Looks good to me.

codecov-commenter commented Aug 23, 2022

Codecov Report

Merging #3050 (20f3022) into master (395422b) will increase coverage by 0.04%.
The diff coverage is 87.27%.

@@            Coverage Diff             @@
##           master    #3050      +/-   ##
==========================================
+ Coverage   54.43%   54.47%   +0.04%     
==========================================
  Files         722      722              
  Lines       66839    66910      +71     
==========================================
+ Hits        36383    36449      +66     
- Misses      27401    27405       +4     
- Partials     3055     3056       +1     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 54.47% <87.27%> (+0.04%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ledger/complete/wal/checkpointer.go | 63.20% <86.66%> (+3.33%) ⬆️ |
| ledger/complete/mtrie/flattener/iterator.go | 100.00% <100.00%> (ø) |
| insecure/corruptible/network.go | 57.04% <0.00%> (-0.71%) ⬇️ |
| admin/command_runner.go | 79.88% <0.00%> (ø) |
| fvm/handler/contract.go | 88.59% <0.00%> (ø) |
| engine/collection/synchronization/engine.go | 68.97% <0.00%> (ø) |
| module/mempool/epochs/transactions.go | 100.00% <0.00%> (+9.67%) ⬆️ |


@fxamacker fxamacker merged commit 78a3caf into master Aug 23, 2022
@fxamacker fxamacker deleted the fxamacker/reduce-checkpoint-serialization-memory branch August 23, 2022 20:39
fxamacker (Member Author):

Updated text to mention testing because another test for this PR was conducted and mentioned today (August 31, 2022).
