
Research reducing memory footprint #2885

Closed · dapplion opened this issue Jul 26, 2021 · 7 comments · Fixed by #3046
Labels
scope-memory Issues to reduce and improve memory usage.

Comments

@dapplion (Contributor)

Tracking issue for various research efforts into reducing Lodestar beacon node memory footprint.

@dapplion dapplion added the scope-memory Issues to reduce and improve memory usage. label Jul 26, 2021
@dapplion (Contributor, Author)

A small analysis of state sizes. I'm using the performance states, which are maxed-out states with 250_000 validators:

start                rss  0 B             heapTotal  0 B             heapUsed +4.8 KB          external +40 B            arrayBuffers  0 B            
getPubkeys()         rss +73.39 MB        heapTotal +38.91 MB        heapUsed +41.32 MB        external +11.87 MB        arrayBuffers +11.87 MB       
.defaultValue()      rss +11.83 MB        heapTotal +11.5 MB         heapUsed +12.61 MB        external -520.44 KB       arrayBuffers -520.44 KB      
build raw state      rss +100.47 MB       heapTotal +100.82 MB       heapUsed +100.43 MB       external  0 B             arrayBuffers  0 B            
addPendingAtt        rss +123.2 MB        heapTotal +128.25 MB       heapUsed +128.3 MB        external  0 B             arrayBuffers  0 B            
toTreeBacked         rss +684.8 MB        heapTotal +675.74 MB       heapUsed +675.85 MB       external -13.3 KB         arrayBuffers -12.69 KB       
CachedBeaconState    rss +624.05 MB       heapTotal +537.52 MB       heapUsed +529.14 MB       external +7.65 MB         arrayBuffers +7.65 MB

Source code:

async function analyzeStateMemory(): Promise<void> {
  await init("blst-native");

  const tracker = new MemoryTracker();
  tracker.logDiff("start");

  const pubkeys = getPubkeys().pubkeys;
  tracker.logDiff("getPubkeys()");

  const defaultState = ssz.phase0.BeaconState.defaultValue();
  tracker.logDiff(".defaultValue()");

  const state = buildPerformanceStateAllForks(defaultState, pubkeys);
  tracker.logDiff("build raw state");

  addPendingAttestations(state as phase0.BeaconState);
  tracker.logDiff("addPendingAtt");

  const stateTB = ssz.phase0.BeaconState.createTreeBackedFromStruct(state as phase0.BeaconState);
  tracker.logDiff("toTreeBacked");

  const cached = allForks.createCachedBeaconState(config, stateTB);
  tracker.logDiff("CachedBeaconState");
}

class MemoryTracker {
  prev = process.memoryUsage();

  logDiff(id: string): void {
    const curr = process.memoryUsage();
    const parts: string[] = [];
    for (const key of Object.keys(this.prev) as (keyof NodeJS.MemoryUsage)[]) {
      const prevVal = this.prev[key];
      const currVal = curr[key];
      const bytesDiff = currVal - prevVal;
      const sign = bytesDiff < 0 ? "-" : bytesDiff > 0 ? "+" : " ";
      parts.push(`${key} ${sign}${formatBytes(Math.abs(bytesDiff)).padEnd(15)}`);
    }
    this.prev = curr;
    console.log(id.padEnd(20), parts.join(" "));
  }
}
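The `MemoryTracker` above relies on a `formatBytes` helper that isn't shown. A minimal sketch that produces values in the shape of the logs above (the exact Lodestar helper may differ):

```typescript
// Hypothetical formatBytes helper, assumed by MemoryTracker above.
// Formats a byte count with 1024-based units, as in logs like "73.39 MB".
function formatBytes(bytes: number): string {
  if (bytes === 0) return "0 B";
  const units = ["B", "KB", "MB", "GB"];
  // Each unit step is 2^10, so the unit index is log2(bytes) / 10
  const i = Math.min(Math.floor(Math.log2(bytes) / 10), units.length - 1);
  const value = bytes / Math.pow(1024, i);
  // parseFloat drops trailing zeros: "1.00" -> "1"
  return `${parseFloat(value.toFixed(2))} ${units[i]}`;
}
```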

Originally posted in #2846 (comment)

@dapplion (Contributor, Author) commented Jul 26, 2021

@protolambda very kindly tested the memory usage of tree-backed structures in the Go and Python implementations, thanks! 🙏


ALL                        4448778  207.526 MB
tree.PairNode              2351329  143.514 MB
tree.Root                  2097425   64.008 MB
view.ContainerTypeDef            9    3.460 KB
view.ComplexListTypeDef          5       281 B
view.ComplexVectorTypeDef        3       171 B
view.BasicVectorTypeDef          2       128 B
view.BitListTypeDef              2        80 B
view.BasicListTypeDef            1        64 B
view.ContainerView               1        56 B
view.BitVectorTypeDef            1        40 B

Memory usage of deserialized state, after running hash-tree-root (i.e. filled cached hashes), in Go

ALL                            4250067  198.368 MB
tree.PairNode                  2250030  137.331 MB
tree.Root                      2000032   61.036 MB
view.ContainerTypeDef                1       519 B
view.BasicVectorTypeDef              1        64 B
view.ComplexListTypeDef              1        56 B
view.ComplexListView                 1        56 B
phase0.ValidatorsRegistryView        1         8 B

And just the validators registry

(tooling: https://github.com/fjl/memsize)

I just checked the python version, that one is 672 MB;

asizeof((BeaconState(Container)
    genesis_ti....c3f685007d2842f01615a0768870961ccc17,), limit=1000, stats=3.0) ...
 672767672 bytes or 641.6 MiB
         8 byte aligned
         8 byte sizeof(void*)
         1 object given
   9405514 objects sized
  21161972 objects seen
        54 deepest recursion

         7 profiles:  total (% of grand total), average, and largest flat size:  largest object
   2608206 class dict objects:  271253424 or 258.7 MiB (40%), 104, 104:  {'_backing': H(H(H(H(H(0xd85b295f00000....4c4bc15ff4cd105ab33c)), '_hook': None} alen 0
   2351329 class remerkleable.tree.PairNode objects:  169295688 or 161.5 MiB (25%), 72, 72:  H(H(H(H(H(0xd85b295f000000000000000000....a353aaa542ed63e44c4bc15ff4cd105ab33c))
   2351330 class remerkleable.tree.RootNode objects:  131674480 or 125.6 MiB (20%), 56, 56:  0xd85b295f00000000000000000000000000000000000000000000000000000000
   2094642 class asizeof._Slots objects:  100543896 or 95.9 MiB (15%), 48, 72:  ('left', 'right', '_root') alen 2
   
   (after init the hash-tree-root in python)
   
   2351329 class bytes objects:  169295688 or 161.5 MiB
  • Pair node: 143 MB (Go) vs 161.5 MiB = 168 MB (Python): not a significant difference, really
  • Root node: 64 MB (Go) vs 125.6 MiB = 135 MB (Python): roughly double somehow, trying to figure that out
  • class asizeof._Slots objects: Python takes space to store attributes in slots per object. Will look into optimizing this, but 15% of the total is not that bad
  • class dict objects: will look into the difference between view and backing; it might have just accidentally scanned the view attributes recursively, even though they don't use extra memory

(tooling https://gist.github.com/protolambda/4a918e48f835cd08e1c5a562ab730cfe)


Found a way to get rid of those empty class dicts (32% of the total) by forcing remerkleable Node parent classes to have no class dicts, via empty `__slots__ = ()` annotations. Python total size is down to 376.7 MiB now, cutting 259.6 MiB.

It also cut down the sizes of the other types, now it's:

   2351329 class bytes objects:  169295688 or 161.5 MiB (43%), 72, 72:  b'_\xdb\xc3\xd3-US\xd0h\xd4\x99\x138\x....\x90\x88yu\x92(K\xe2\x15\xf2\xe4"\xf8' alen 32
   2351329 class remerkleable.tree.PairNode objects:  131674424 or 125.6 MiB (33%), 56, 56:  H(H(H(H(H(0xd85b295f000000000000000000....a353aaa542ed63e44c4bc15ff4cd105ab33c))
   2351330 class remerkleable.tree.RootNode objects:  94053200 or 89.7 MiB (24%), 40, 40:  0xd85b295f00000000000000000000000000000000000000000000000000000000
       127 class asizeof._Slots objects:  6888 or 6.7 KiB (0%), 54, 72:  ('left', 'right', '_root') alen 2

Conclusion: the current SSZ tree representation is very heavy compared with low-level languages, and significantly heavier than Python's. Look for similar tricks or different strategies to represent these data structures more memory-efficiently.

@twoeths (Contributor) commented Jul 26, 2021

Currently, storing the hash information in a persistent-merkle-tree Node takes 208 bytes.
(screenshot from Jul 26, 2021: heap profile showing the per-Node hash cost)

This is confirmed by a V8 developer here: https://stackoverflow.com/questions/45803829/memory-overhead-of-typed-arrays-vs-strings. He confirmed that having a large number of small TypedArrays (in our case Uint8Array) is not memory efficient.

I wonder if it's possible to store an ArrayBuffer as our hash instead of a Uint8Array? That would reduce it from 208 bytes to 72 bytes.
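To make these per-instance figures concrete, here is a rough way to measure the average RSS cost of many small Uint8Arrays, in the spirit of the numbers quoted above. This is a sketch, not Lodestar code; results vary by V8 version and platform:

```typescript
// Allocate `count` 32-byte Uint8Arrays and report the average RSS growth
// per instance. Run Node with --expose-gc for more stable numbers.
function rssPerUint8Array(count: number): number {
  (globalThis as {gc?: () => void}).gc?.(); // collect first, if gc is exposed
  const before = process.memoryUsage().rss;
  const keep: Uint8Array[] = [];
  for (let i = 0; i < count; i++) keep.push(new Uint8Array(32));
  const after = process.memoryUsage().rss;
  // Dividing by keep.length also keeps the allocations reachable
  return (after - before) / keep.length;
}
```

Values well above the 32 raw payload bytes per instance are what motivate the 208-byte figure above.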

@dapplion (Contributor, Author)

This script measures the approximate memory footprint of different JavaScript objects experimentally: https://gist.github.com/dapplion/94dff8bbf92d45a75c10181e1a95100f. All numbers below refer to the resident set size (RSS) increase.

  • Uint8Array with 32 bytes: 223 bytes / instance
  • FixedObject (from proto): 105 bytes / instance
  • native bigint with 32 bytes: 72 bytes / instance

Other options that perform as badly as Uint8Array: DataView, ArrayBuffer, native bindings, strings.

I investigated using one big Uint8Array and managing memory manually, but it seems like a worse option than using BigInts:

  • Adding items to a FinalizationRegistry has significant overhead, ~130 bytes
  • It would need a new object for each 32-byte area in use, so the area can be garbage collected and notify the FinalizationRegistry. An object with one property takes ~50 bytes

So it seems that bigint could be useful to store hashes.
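As a sketch of the bigint idea, a 32-byte hash can round-trip through a native bigint. The helper names here are illustrative, not from Lodestar or bigint-buffer:

```typescript
// Pack a byte array into a single native bigint (big-endian).
function bytesToBigint(bytes: Uint8Array): bigint {
  let out = 0n;
  for (const b of bytes) out = (out << 8n) | BigInt(b);
  return out;
}

// Unpack a bigint back into a fixed-length big-endian byte array.
function bigintToBytes(value: bigint, length = 32): Uint8Array {
  const out = new Uint8Array(length);
  for (let i = length - 1; i >= 0; i--) {
    out[i] = Number(value & 0xffn); // lowest byte goes in the last slot
    value >>= 8n;
  }
  return out;
}
```

The bigint itself carries no per-element object headers, which is where the 72 bytes/instance figure above comes from.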


About the overhead of bigint <-> buffer conversions (on my laptop), with the native bindings from https://github.com/no2chem/bigint-buffer:

  • bigint -> buffer: 0.6us/op
  • buffer -> bigint: 0.6us/op
  • hash 2 x 32 bytes with our WASM impl (includes concat): 2.2us/op

But there exists a library from the same author, https://github.com/no2chem/bigint-hash, that returns a bigint and is faster:

  • hash 2 x 32 bytes with bigint-hash (pre-concated): 1.3us/op

That would be 2 × 0.6us (bigint conversions) + 0.4us (concat) + 1.3us = 2.9us, vs 2.2us currently: roughly 30% slower.
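Checking that estimate, using the measured figures above:

```typescript
// Per-hash cost of the bigint-hash path vs the current WASM path, in us/op.
const bigintPath = 2 * 0.6 + 0.4 + 1.3; // two conversions + concat + hash
const currentWasmPath = 2.2;
const slowdownPct = (bigintPath / currentWasmPath - 1) * 100; // ~32%, i.e. roughly 30% slower
```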

Having a custom wrapper at the bindings level that takes two bigints and does the conversion and concatenation efficiently while hashing could be a big win.

@dapplion (Contributor, Author) commented Jul 31, 2021

I did some analysis of the Prater state at slot 936600 to understand the size impact of each field / type. The sizes are approximate and may not account for offsets:

| field | length | ser size per item | serialized size | tree size per item | tree leaf size |
| --- | --- | --- | --- | --- | --- |
| blockRoots | 8192 | 32 | 262144 | 32 | 262144 |
| stateRoots | 8192 | 32 | 262144 | 32 | 262144 |
| historicalRoots | 114 | 23 | 2622 | 32 | 3648 |
| eth1DataVotes | 611 | 72 | 43992 | 96 | 58656 |
| validators | 215612 | 121 | 26089052 | 320 | 68995840 |
| balances | 215612 | 8 | 1724896 | 8 | 1724896 |
| randaoMixes | 65536 | 32 | 2097152 | 32 | 2097152 |
| slashings | 8192 | 8 | 65536 | 8 | 65536 |
| previousEpochAttestations | 3431 | 165 | 566115 | 400 | 1372400 |
| currentEpochAttestations | 2494 | 165 | 411510 | 400 | 997600 |
| **Total MB** | | | 31.525163 | | 75.840016 |



For validators, the Uint8Array inefficiency accounts for 6 × 68995840 B ≈ 413 MB, where the factor of 6 comes from 220/32 − 1 ≈ 6. So it's worth exploring alternative ways of representing the leaves of the validators tree.
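A back-of-envelope check of that estimate, using the ~223 bytes per 32-byte Uint8Array measured in an earlier comment (the factor in the text uses 220):

```typescript
// Overhead factor: bytes spent per instance beyond the 32 raw hash bytes.
const perInstanceBytes = 223;                                 // measured RSS per Uint8Array
const overheadFactor = Math.round(perInstanceBytes / 32 - 1); // ~6
const validatorsLeafBytes = 68995840;                         // tree leaf size from the table
const wastedMB = (overheadFactor * validatorsLeafBytes) / 1e6; // ~414 MB of pure overhead
```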

@dapplion (Contributor, Author) commented Sep 2, 2021

After #3046

getPubkeys()         rss +96.79 MB        heapTotal +33.91 MB        heapUsed +40.16 MB        external +11.44 MB        arrayBuffers +11.44 MB       
.defaultValue()      rss +17.75 MB        heapTotal +17.25 MB        heapUsed +17.07 MB        external  0 B             arrayBuffers  0 B            
build raw state      rss +91.24 MB        heapTotal +93.82 MB        heapUsed +94.21 MB        external  0 B             arrayBuffers  0 B            
addPendingAtt        rss +127.2 MB        heapTotal +134.5 MB        heapUsed +127.45 MB       external  0 B             arrayBuffers  0 B            
toTreeBacked         rss +111.71 MB       heapTotal +98.75 MB        heapUsed +106.14 MB       external  0 B             arrayBuffers  0 B            
CachedBeaconState    rss +493.89 MB       heapTotal +433.62 MB       heapUsed +430.64 MB       external  0 B             arrayBuffers  0 B 

Adding some granularity to CachedBeaconState

pubkey2index              rss +75 MB
index2pubkey              rss +385 MB
effectiveBalances cache   rss +15 MB
shufflings                rss +11.3 MB
rest of epochCtx          rss +0 MB
participation cache       rss +7.11 MB

@dapplion (Contributor, Author) commented Sep 3, 2021

After redoing the tests more times, both in my local environment and in @tuyennhv's, the results differ:

pubkey2index + index2pubkey    rss +90.32 MB        heapTotal +61.23 MB        heapUsed +59.41 MB        external  0 B             arrayBuffers  0 B
