Research reducing memory footprint #2885
A small analysis of state sizes. I'm using the performance states, which are maxed-out states with 250_000 validators.
Source code:

```ts
// Assumed context: init comes from @chainsafe/bls; ssz, phase0, allForks,
// config and the perf-state helpers (getPubkeys, buildPerformanceStateAllForks,
// addPendingAttestations) are Lodestar utilities not shown here.
async function analyzeStateMemory(): Promise<void> {
  await init("blst-native");

  const tracker = new MemoryTracker();
  tracker.logDiff("start");

  const pubkeys = getPubkeys().pubkeys;
  tracker.logDiff("getPubkeys()");

  const defaultState = ssz.phase0.BeaconState.defaultValue();
  tracker.logDiff(".defaultValue()");

  const state = buildPerformanceStateAllForks(defaultState, pubkeys);
  tracker.logDiff("build raw state");

  addPendingAttestations(state as phase0.BeaconState);
  tracker.logDiff("addPendingAtt");

  const stateTB = ssz.phase0.BeaconState.createTreeBackedFromStruct(state as phase0.BeaconState);
  tracker.logDiff("toTreeBacked");

  const cached = allForks.createCachedBeaconState(config, stateTB);
  tracker.logDiff("CachedBeaconState");
}
```
```ts
// Logs the delta of each process.memoryUsage() field since the previous call.
class MemoryTracker {
  prev = process.memoryUsage();

  logDiff(id: string): void {
    const curr = process.memoryUsage();

    const parts: string[] = [];
    for (const key of Object.keys(this.prev) as (keyof NodeJS.MemoryUsage)[]) {
      const prevVal = this.prev[key];
      const currVal = curr[key];
      const bytesDiff = currVal - prevVal;
      const sign = bytesDiff < 0 ? "-" : bytesDiff > 0 ? "+" : " ";
      parts.push(`${key} ${sign}${formatBytes(Math.abs(bytesDiff)).padEnd(15)}`);
    }

    this.prev = curr;
    console.log(id.padEnd(20), parts.join(" "));
  }
}
```

Originally posted in #2846 (comment)
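The snippet assumes a `formatBytes` helper that isn't shown above; a minimal sketch of such a helper (hypothetical, not necessarily Lodestar's actual util) could be:

```ts
// Hypothetical sketch of the formatBytes helper assumed above: renders a
// byte count as a human-readable string, e.g. 1536 -> "1.50 KB".
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit++;
  }
  return `${value.toFixed(2)} ${units[unit]}`;
}
```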
@protolambda very kindly tested memory usage of tree-backed structures in the Go and Python implementations, thanks! 🙏
Memory usage of the deserialized state, after running hash-tree-root (i.e. with cached hashes filled), in Go, and of just the validators registry (tooling: https://github.com/fjl/memsize).

I just checked the Python version; that one is 672 MB (tooling: https://gist.github.com/protolambda/4a918e48f835cd08e1c5a562ab730cfe).

Found a way to get rid of those empty class dicts (about 32% of the total) by forcing remerkleable Node parent classes to have no class dicts, using empty `__slots__ = ()` annotations. The Python total size is down to 376.7 MiB now, cutting 259.6 MiB. It also cut down the sizes of the other types.
Conclusion: the current SSZ representation of the tree is very heavy compared with low-level languages, and significantly heavier than Python. Look for similar tricks or different strategies to represent those data structures that may be more memory efficient.
Currently, storing the hash in a persistent-merkle-tree Node as a Uint8Array takes 208 bytes; this is confirmed by a V8 developer here: https://stackoverflow.com/questions/45803829/memory-overhead-of-typed-arrays-vs-strings. I wonder if it's possible to store an ArrayBuffer as our hash instead of a Uint8Array? This would reduce it from 208 bytes to 72 bytes.
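A sketch of that idea (hypothetical, not persistent-merkle-tree's actual API): keep only the ArrayBuffer alive on the node and create Uint8Array views transiently on access.

```ts
// Sketch only: the node retains a bare ArrayBuffer; Uint8Array views are
// created on demand and discarded, so the long-lived allocation is the
// cheaper one.
class HashNode {
  private hash: ArrayBuffer | null = null;

  setRoot(root: Uint8Array): void {
    // Copy the 32 hash bytes into a standalone ArrayBuffer.
    const buf = new ArrayBuffer(32);
    new Uint8Array(buf).set(root);
    this.hash = buf;
  }

  get root(): Uint8Array {
    if (this.hash === null) throw new Error("hash not computed");
    return new Uint8Array(this.hash); // transient view over the stored buffer
  }
}
```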
This script measures the approximate memory footprint of different JavaScript objects experimentally: https://gist.github.com/dapplion/94dff8bbf92d45a75c10181e1a95100f. All numbers below refer to the resident set size increase.
Other options that turn out as bad as Uint8Array: DataView, ArrayBuffer, native bindings, strings. I investigated using one big Uint8Array and manually managing memory, but it seems like a worse option than using BigInts.
So it seems that bigint could be useful to store hashes. About the overhead of bigint <-> buffer conversions: on my laptop each conversion costs roughly 0.6µs (see the estimate below).
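For reference, a dependency-free sketch of those conversions, assuming 32-byte big-endian hashes (libraries such as bigint-buffer do the same in native code, faster):

```ts
// Pack a 32-byte hash into a bigint and back (big-endian). Pure-JS baseline.
function bytesToBigint(bytes: Uint8Array): bigint {
  let x = 0n;
  for (const b of bytes) {
    x = (x << 8n) | BigInt(b);
  }
  return x;
}

function bigintToBytes(x: bigint, length = 32): Uint8Array {
  const out = new Uint8Array(length);
  for (let i = length - 1; i >= 0; i--) {
    out[i] = Number(x & 0xffn);
    x >>= 8n;
  }
  return out;
}
```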
But there is a library from the same author, https://github.com/no2chem/bigint-hash, that returns a bigint directly and is faster.
That would be 2×0.6µs (bigint conversions) + 0.4µs (concat) + 1.3µs (hash) ≈ 2.9µs, vs 2.2µs currently: about 30% slower. Having a custom wrapper at the bindings level that takes two bigints and does the conversions and concatenation efficiently while hashing could be a big win.
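Such a wrapper doesn't exist yet; as an interface sketch (hypothetical names), it would look something like:

```ts
// Hypothetical bindings-level API per the idea above: the native side would
// convert both bigints to bytes, concat, hash, and return the root as a
// bigint, so no Uint8Array ever has to live on the JS heap.
interface NativeHasher {
  digest64(left: bigint, right: bigint): bigint;
}

// e.g. const parentRoot = hasher.digest64(leftChildRoot, rightChildRoot);
```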
I did some analysis of the Prater state at slot 936600 to understand the size impact of each field / type. The sizes are approximate and may not account for offsets.
For validators, the Uint8Array inefficiency accounts for …
After #3046
Adding some granularity to CachedBeaconState
After re-running the tests more times in my local environment, and in @tuyennhv's, the results are different:
Tracking issue for various research efforts into reducing Lodestar beacon node memory footprint.