
Research reducing memory footprint #2885

Closed · dapplion opened this issue Jul 26, 2021 · 7 comments · Fixed by #3046
Labels
scope-memory Issues to reduce and improve memory usage.

Comments

@dapplion (Contributor)

Tracking issue for various research efforts into reducing Lodestar beacon node memory footprint.

@dapplion dapplion added the scope-memory Issues to reduce and improve memory usage. label Jul 26, 2021
@dapplion (Contributor, Author)

A small analysis of state sizes. I'm using the performance states, which are maxed-out states with 250_000 validators:

start                rss  0 B             heapTotal  0 B             heapUsed +4.8 KB          external +40 B            arrayBuffers  0 B            
getPubkeys()         rss +73.39 MB        heapTotal +38.91 MB        heapUsed +41.32 MB        external +11.87 MB        arrayBuffers +11.87 MB       
.defaultValue()      rss +11.83 MB        heapTotal +11.5 MB         heapUsed +12.61 MB        external -520.44 KB       arrayBuffers -520.44 KB      
build raw state      rss +100.47 MB       heapTotal +100.82 MB       heapUsed +100.43 MB       external  0 B             arrayBuffers  0 B            
addPendingAtt        rss +123.2 MB        heapTotal +128.25 MB       heapUsed +128.3 MB        external  0 B             arrayBuffers  0 B            
toTreeBacked         rss +684.8 MB        heapTotal +675.74 MB       heapUsed +675.85 MB       external -13.3 KB         arrayBuffers -12.69 KB       
CachedBeaconState    rss +624.05 MB       heapTotal +537.52 MB       heapUsed +529.14 MB       external +7.65 MB         arrayBuffers +7.65 MB

Source code:

async function analyzeStateMemory(): Promise<void> {
  await init("blst-native");

  const tracker = new MemoryTracker();
  tracker.logDiff("start");

  const pubkeys = getPubkeys().pubkeys;
  tracker.logDiff("getPubkeys()");

  const defaultState = ssz.phase0.BeaconState.defaultValue();
  tracker.logDiff(".defaultValue()");

  const state = buildPerformanceStateAllForks(defaultState, pubkeys);
  tracker.logDiff("build raw state");

  addPendingAttestations(state as phase0.BeaconState);
  tracker.logDiff("addPendingAtt");

  const stateTB = ssz.phase0.BeaconState.createTreeBackedFromStruct(state as phase0.BeaconState);
  tracker.logDiff("toTreeBacked");

  const cached = allForks.createCachedBeaconState(config, stateTB);
  tracker.logDiff("CachedBeaconState");
}

class MemoryTracker {
  prev = process.memoryUsage();

  logDiff(id: string): void {
    const curr = process.memoryUsage();
    const parts: string[] = [];
    for (const key of Object.keys(this.prev) as (keyof NodeJS.MemoryUsage)[]) {
      const prevVal = this.prev[key];
      const currVal = curr[key];
      const bytesDiff = currVal - prevVal;
      const sign = bytesDiff < 0 ? "-" : bytesDiff > 0 ? "+" : " ";
      parts.push(`${key} ${sign}${formatBytes(Math.abs(bytesDiff)).padEnd(15)}`);
    }
    this.prev = curr;
    console.log(id.padEnd(20), parts.join(" "));
  }
}
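The `MemoryTracker` above relies on a `formatBytes` helper that isn't shown. A minimal sketch that produces values in the shape of the logs above (the exact Lodestar helper may differ):

```typescript
// Hypothetical formatBytes helper, assumed by MemoryTracker above.
// Formats a byte count with 1024-based units, as in logs like "73.39 MB".
function formatBytes(bytes: number): string {
  if (bytes === 0) return "0 B";
  const units = ["B", "KB", "MB", "GB"];
  // Each unit step is 2^10, so the unit index is log2(bytes) / 10
  const i = Math.min(Math.floor(Math.log2(bytes) / 10), units.length - 1);
  const value = bytes / Math.pow(1024, i);
  // parseFloat drops trailing zeros: "1.00" -> "1"
  return `${parseFloat(value.toFixed(2))} ${units[i]}`;
}
```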

Originally posted in #2846 (comment)

@dapplion (Contributor, Author) commented Jul 26, 2021

@protolambda very kindly tested the memory usage of tree-backed structures in the Go and Python implementations, thanks! 🙏


ALL                        4448778  207.526 MB
tree.PairNode              2351329  143.514 MB
tree.Root                  2097425   64.008 MB
view.ContainerTypeDef            9    3.460 KB
view.ComplexListTypeDef          5       281 B
view.ComplexVectorTypeDef        3       171 B
view.BasicVectorTypeDef          2       128 B
view.BitListTypeDef              2        80 B
view.BasicListTypeDef            1        64 B
view.ContainerView               1        56 B
view.BitVectorTypeDef            1        40 B

Memory usage of deserialized state, after running hash-tree-root (i.e. filled cached hashes), in Go

ALL                            4250067  198.368 MB
tree.PairNode                  2250030  137.331 MB
tree.Root                      2000032   61.036 MB
view.ContainerTypeDef                1       519 B
view.BasicVectorTypeDef              1        64 B
view.ComplexListTypeDef              1        56 B
view.ComplexListView                 1        56 B
phase0.ValidatorsRegistryView        1         8 B

And just the validators registry

(tooling: https://github.com/fjl/memsize)

I just checked the python version, that one is 672 MB;

asizeof((BeaconState(Container)
    genesis_ti....c3f685007d2842f01615a0768870961ccc17,), limit=1000, stats=3.0) ...
 672767672 bytes or 641.6 MiB
         8 byte aligned
         8 byte sizeof(void*)
         1 object given
   9405514 objects sized
  21161972 objects seen
        54 deepest recursion

         7 profiles:  total (% of grand total), average, and largest flat size:  largest object
   2608206 class dict objects:  271253424 or 258.7 MiB (40%), 104, 104:  {'_backing': H(H(H(H(H(0xd85b295f00000....4c4bc15ff4cd105ab33c)), '_hook': None} alen 0
   2351329 class remerkleable.tree.PairNode objects:  169295688 or 161.5 MiB (25%), 72, 72:  H(H(H(H(H(0xd85b295f000000000000000000....a353aaa542ed63e44c4bc15ff4cd105ab33c))
   2351330 class remerkleable.tree.RootNode objects:  131674480 or 125.6 MiB (20%), 56, 56:  0xd85b295f00000000000000000000000000000000000000000000000000000000
   2094642 class asizeof._Slots objects:  100543896 or 95.9 MiB (15%), 48, 72:  ('left', 'right', '_root') alen 2
   
   (after init the hash-tree-root in python)
   
   2351329 class bytes objects:  169295688 or 161.5 MiB
  • Pair node: 143 MB (Go) vs 161.5 MiB = 168 MB (Python): not a significant difference, really
  • Root node: 64 MB (Go) vs 125.6 MiB = 135 MB (Python): roughly double somehow, trying to figure that out
  • class asizeof._Slots objects: Python takes space to store attributes in slots per object. Will look into optimizing this, but 15% of the total is not that bad
  • class dict objects: will look into the difference between view and backing; it might have just accidentally scanned the view attributes recursively, even though they don't use extra memory

(tooling https://gist.github.com/protolambda/4a918e48f835cd08e1c5a562ab730cfe)


Found a way to get rid of those empty class dicts (32% of the total) by forcing remerkleable Node parent classes to have no class dicts, via empty `__slots__ = ()` annotations. Python total size is down to 376.7 MiB now, cutting 259.6 MiB.

It also cut down the sizes of the other types, now it's:

   2351329 class bytes objects:  169295688 or 161.5 MiB (43%), 72, 72:  b'_\xdb\xc3\xd3-US\xd0h\xd4\x99\x138\x....\x90\x88yu\x92(K\xe2\x15\xf2\xe4"\xf8' alen 32
   2351329 class remerkleable.tree.PairNode objects:  131674424 or 125.6 MiB (33%), 56, 56:  H(H(H(H(H(0xd85b295f000000000000000000....a353aaa542ed63e44c4bc15ff4cd105ab33c))
   2351330 class remerkleable.tree.RootNode objects:  94053200 or 89.7 MiB (24%), 40, 40:  0xd85b295f00000000000000000000000000000000000000000000000000000000
       127 class asizeof._Slots objects:  6888 or 6.7 KiB (0%), 54, 72:  ('left', 'right', '_root') alen 2

Conclusion: the current SSZ tree representation is very heavy compared with low-level languages, and significantly heavier than Python's. Look for similar tricks or different strategies to represent these data structures more memory-efficiently.

@twoeths (Contributor) commented Jul 26, 2021

Currently, storing the hash information in a persistent-merkle-tree Node takes 208 bytes.
(screenshot from Jul 26, 2021: heap profile showing the per-Node hash cost)

This is confirmed by a V8 developer here: https://stackoverflow.com/questions/45803829/memory-overhead-of-typed-arrays-vs-strings. He confirmed that having a large number of small TypedArrays (in our case Uint8Array) is not memory efficient.

I wonder if it's possible to store an ArrayBuffer as our hash instead of a Uint8Array? That would reduce it from 208 bytes to 72 bytes.
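To make these per-instance figures concrete, here is a rough way to measure the average RSS cost of many small Uint8Arrays, in the spirit of the numbers quoted above. This is a sketch, not Lodestar code; results vary by V8 version and platform:

```typescript
// Allocate `count` 32-byte Uint8Arrays and report the average RSS growth
// per instance. Run Node with --expose-gc for more stable numbers.
function rssPerUint8Array(count: number): number {
  (globalThis as {gc?: () => void}).gc?.(); // collect first, if gc is exposed
  const before = process.memoryUsage().rss;
  const keep: Uint8Array[] = [];
  for (let i = 0; i < count; i++) keep.push(new Uint8Array(32));
  const after = process.memoryUsage().rss;
  // Dividing by keep.length also keeps the allocations reachable
  return (after - before) / keep.length;
}
```

Values well above the 32 raw payload bytes per instance are what motivate the 208-byte figure above.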

@dapplion (Contributor, Author)

This script measures the approximate memory footprint of different JavaScript objects experimentally: https://gist.github.com/dapplion/94dff8bbf92d45a75c10181e1a95100f. All numbers below refer to the resident set size (RSS) increase.

  • Uint8Array with 32 bytes: 223 bytes / instance
  • FixedObject (from proto): 105 bytes / instance
  • native bigint with 32 bytes: 72 bytes / instance

Other options that perform as badly as Uint8Array: DataView, ArrayBuffer, native bindings, strings.

I investigated using one big Uint8Array and managing memory manually, but it seems like a worse option than using BigInts:

  • Adding items to a FinalizationRegistry has significant overhead, ~130 bytes
  • It would need a new object for each 32-byte area in use, so the area can be garbage collected and notify the FinalizationRegistry. An object with one property takes ~50 bytes

So it seems that bigint could be useful to store hashes.
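As a sketch of the bigint idea, a 32-byte hash can round-trip through a native bigint. The helper names here are illustrative, not from Lodestar or bigint-buffer:

```typescript
// Pack a byte array into a single native bigint (big-endian).
function bytesToBigint(bytes: Uint8Array): bigint {
  let out = 0n;
  for (const b of bytes) out = (out << 8n) | BigInt(b);
  return out;
}

// Unpack a bigint back into a fixed-length big-endian byte array.
function bigintToBytes(value: bigint, length = 32): Uint8Array {
  const out = new Uint8Array(length);
  for (let i = length - 1; i >= 0; i--) {
    out[i] = Number(value & 0xffn); // lowest byte goes in the last slot
    value >>= 8n;
  }
  return out;
}
```

The bigint itself carries no per-element object headers, which is where the 72 bytes/instance figure above comes from.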


About the overhead of bigint <-> buffer conversions (on my laptop), with the native bindings from https://github.com/no2chem/bigint-buffer:

  • bigint -> buffer: 0.6us/op
  • buffer -> bigint: 0.6us/op
  • hash 2 x 32 bytes with our WASM impl (includes concat): 2.2us/op

But there exists a library from the same author, https://github.com/no2chem/bigint-hash, that returns a bigint and is faster:

  • hash 2 x 32 bytes with bigint-hash (pre-concated): 1.3us/op

That would be 2 × 0.6us (bigint conversions) + 0.4us (concat) + 1.3us = 2.9us, vs 2.2us currently: roughly 30% slower.
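Checking that estimate, using the measured figures above:

```typescript
// Per-hash cost of the bigint-hash path vs the current WASM path, in us/op.
const bigintPath = 2 * 0.6 + 0.4 + 1.3; // two conversions + concat + hash
const currentWasmPath = 2.2;
const slowdownPct = (bigintPath / currentWasmPath - 1) * 100; // ~32%, i.e. roughly 30% slower
```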

Having a custom wrapper at the bindings level that takes two bigints and does the conversion and concatenation efficiently while hashing could be a big win.

@dapplion (Contributor, Author) commented Jul 31, 2021

I did some analysis of the Prater state at slot 936600 to understand the size impact of each field / type. The sizes are approximate and may not account for offsets:

| field | length | ser size per item | serialized size | tree size per item | tree leaf size |
| --- | --- | --- | --- | --- | --- |
| blockRoots | 8192 | 32 | 262144 | 32 | 262144 |
| stateRoots | 8192 | 32 | 262144 | 32 | 262144 |
| historicalRoots | 114 | 23 | 2622 | 32 | 3648 |
| eth1DataVotes | 611 | 72 | 43992 | 96 | 58656 |
| validators | 215612 | 121 | 26089052 | 320 | 68995840 |
| balances | 215612 | 8 | 1724896 | 8 | 1724896 |
| randaoMixes | 65536 | 32 | 2097152 | 32 | 2097152 |
| slashings | 8192 | 8 | 65536 | 8 | 65536 |
| previousEpochAttestations | 3431 | 165 | 566115 | 400 | 1372400 |
| currentEpochAttestations | 2494 | 165 | 411510 | 400 | 997600 |
| **Total MB** | | | 31.525163 | | 75.840016 |



For validators, the Uint8Array inefficiency accounts for 6 × 68995840 B ≈ 413 MB, where the factor of 6 comes from 220/32 − 1 ≈ 6. So it's worth exploring alternative ways of representing the leaves of the validators tree.
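A back-of-envelope check of that estimate, using the ~223 bytes per 32-byte Uint8Array measured in an earlier comment (the factor in the text uses 220):

```typescript
// Overhead factor: bytes spent per instance beyond the 32 raw hash bytes.
const perInstanceBytes = 223;                                 // measured RSS per Uint8Array
const overheadFactor = Math.round(perInstanceBytes / 32 - 1); // ~6
const validatorsLeafBytes = 68995840;                         // tree leaf size from the table
const wastedMB = (overheadFactor * validatorsLeafBytes) / 1e6; // ~414 MB of pure overhead
```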

@dapplion (Contributor, Author) commented Sep 2, 2021

After #3046

getPubkeys()         rss +96.79 MB        heapTotal +33.91 MB        heapUsed +40.16 MB        external +11.44 MB        arrayBuffers +11.44 MB       
.defaultValue()      rss +17.75 MB        heapTotal +17.25 MB        heapUsed +17.07 MB        external  0 B             arrayBuffers  0 B            
build raw state      rss +91.24 MB        heapTotal +93.82 MB        heapUsed +94.21 MB        external  0 B             arrayBuffers  0 B            
addPendingAtt        rss +127.2 MB        heapTotal +134.5 MB        heapUsed +127.45 MB       external  0 B             arrayBuffers  0 B            
toTreeBacked         rss +111.71 MB       heapTotal +98.75 MB        heapUsed +106.14 MB       external  0 B             arrayBuffers  0 B            
CachedBeaconState    rss +493.89 MB       heapTotal +433.62 MB       heapUsed +430.64 MB       external  0 B             arrayBuffers  0 B 

Adding some granularity to CachedBeaconState

pubkey2index              rss +75 MB
index2pubkey              rss +385 MB
effectiveBalances cache   rss +15 MB
shufflings                rss +11.3 MB
rest of epochCtx          rss +0 MB
participation cache       rss +7.11 MB

@dapplion (Contributor, Author) commented Sep 3, 2021

After redoing the tests more times, both in my local environment and in @tuyennhv's, the results differ:

pubkey2index + index2pubkey    rss +90.32 MB        heapTotal +61.23 MB        heapUsed +59.41 MB        external  0 B             arrayBuffers  0 B
