[EN Performance] Optimize checkpoint serialization for -37GB operational RAM, -2.7 minutes duration, -19.6 million allocs (50% fewer allocs) #2964

fxamacker · 2022-08-12T14:32:01Z

Problem

Although PR #2792 reduces peak memory used by checkpointing by reusing ledger state, we can reduce peak memory during checkpoint serialization by a further 35GB or more.

Updates #1744

Proposed Solution

Replace the largest data structure used for checkpoint serialization and process subtries instead of the entire trie. Also preallocate when feasible.

Optionally, add a flag to specify the number of trie levels to use. Specifying 4 levels yields 2^4 = 16 subtries, which is a reasonable default for meaningful memory savings and faster serialization.
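As a rough illustration of the idea, here is a minimal sketch of cutting a binary trie at a given level and collecting the subtrie roots, with the result slice preallocated. The `node` type and the `subtrieRoots`, `countNodes`, and `buildFull` helpers are hypothetical names made up for this example; they are not the types or functions used in flow-go or in the eventual PR.

```go
package main

import "fmt"

// node is a hypothetical binary trie node, not flow-go's actual type.
type node struct {
	left, right *node
}

// subtrieRoots returns the nodes found exactly `levels` below root,
// in left-to-right order. A full trie yields 1<<levels roots
// (levels=4 gives 16). Nodes above the cut are not returned; a real
// serializer would have to handle them separately.
func subtrieRoots(root *node, levels int) []*node {
	// Preallocate for the maximum possible number of subtries so that
	// append never has to grow the slice.
	roots := make([]*node, 0, 1<<levels)

	var walk func(n *node, depth int)
	walk = func(n *node, depth int) {
		if n == nil {
			return
		}
		if depth == levels {
			roots = append(roots, n)
			return
		}
		walk(n.left, depth+1)
		walk(n.right, depth+1)
	}
	walk(root, 0)
	return roots
}

// countNodes stands in for "process one subtrie". Any per-subtrie
// bookkeeping can be dropped before moving on to the next subtrie,
// instead of being held for the entire trie at once.
func countNodes(n *node) int {
	if n == nil {
		return 0
	}
	return 1 + countNodes(n.left) + countNodes(n.right)
}

// buildFull builds a full binary trie of the given height (demo helper).
func buildFull(height int) *node {
	n := &node{}
	if height > 0 {
		n.left = buildFull(height - 1)
		n.right = buildFull(height - 1)
	}
	return n
}

func main() {
	root := buildFull(6)
	roots := subtrieRoots(root, 4)
	fmt.Println(len(roots)) // prints 16
}
```

The point of the sketch is only the shape of the traversal: work is split into at most 2^levels independent units, which is also what would make later parallel serialization easier.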

Serializing data in parallel is made easier by this proposed change, but that is outside the scope of this issue.

Preliminary Results Using Levels=4 (16 Subtries)

Using the August 12 mainnet checkpoint file with Go 1.18.5:

-37GB peak RAM (reported by the top command), -23GB RAM (Go benchmark B/op)
-19.6 million (-50%) allocs/op in serialization phase
-2.7 minutes duration

Before:    625746 ms    88320868048 B/op    39291999 allocs/op
After:     461937 ms    64978613264 B/op    19671410 allocs/op
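The summary figures can be checked directly against the Before/After rows. The snippet below is just that arithmetic; the `result`, `delta`, and `diff` names are invented for this example, and GB is taken as 10^9 bytes.

```go
package main

import "fmt"

// result holds one benchmark row: duration in ms, B/op, allocs/op.
type result struct {
	ms     int64
	bytes  int64
	allocs int64
}

// delta holds the savings between two rows.
type delta struct {
	minutes   float64
	gigabytes float64 // decimal GB (1e9 bytes)
	allocs    int64
	allocsPct float64
}

// diff computes before minus after for each metric.
func diff(before, after result) delta {
	saved := before.allocs - after.allocs
	return delta{
		minutes:   float64(before.ms-after.ms) / 60000,
		gigabytes: float64(before.bytes-after.bytes) / 1e9,
		allocs:    saved,
		allocsPct: 100 * float64(saved) / float64(before.allocs),
	}
}

func main() {
	before := result{ms: 625746, bytes: 88320868048, allocs: 39291999}
	after := result{ms: 461937, bytes: 64978613264, allocs: 19671410}
	d := diff(before, after)
	fmt.Printf("-%.1f min, -%.1f GB, -%d allocs (-%.1f%%)\n",
		d.minutes, d.gigabytes, d.allocs, d.allocsPct)
	// prints: -2.7 min, -23.3 GB, -19620589 allocs (-49.9%)
}
```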

No benchstat comparisons (n=5+) yet because each run is long and memory-hungry (it requires the big benchnet-dev-004 server).

EDIT: added more details after reading PR #3050 review comments.

fxamacker added Performance Execution Cadence Execution Team labels Aug 12, 2022

fxamacker self-assigned this Aug 12, 2022

fxamacker changed the title ~~[EN Performance] Further reduce peak memory used by checkpointing~~ [EN Performance] Reduce peak memory used by checkpointing by about 20-30GB Aug 15, 2022

fxamacker changed the title ~~[EN Performance] Reduce peak memory used by checkpointing by about 20-30GB~~ [EN Performance] Optimize checkpointing for -37GB operational RAM, -2.7 minutes duration, -19.6 million allocs (50% fewer allocs) Aug 22, 2022

fxamacker mentioned this issue Aug 22, 2022

[EN Performance] Optimize checkpoint serialization for -37GB operational RAM, -2.7 minutes duration, -19.6 million allocs (50% fewer allocs) #3050

Merged

fxamacker closed this as completed in #3050 Aug 23, 2022
