[Execution state] - Checkpointer and Flattener speed and memory improvements #1750
Comments
@ramtinms Currently, checkpoints contain cleartext ledger payloads, which seems inefficient to me, primarily in terms of encoding/decoding speed and, to a lesser extent, storage space usage. Using a zstandard compression dictionary, for example, could both improve the speed of processing checkpoints (by having less data to read and write) and reduce their size. This could of course be done without breaking changes, by still supporting the uncompressed format when reading old checkpoints. WDYT?
Payload compression could be further extended with a custom zstd dictionary for each Cadence type (I think the limit there is 2^32 dictionaries), decompressing on first access (or better, with an LRU cache).
**Ideas for Improving Speed and Memory**

@ramtinms These are just some ideas that came up while getting familiar with the mtrie code in flow-go. Thoughts?

**Use stream encoding**

The current implementation builds all the nodes in memory before serializing. The checkpoint file format would need to be modified to enable stream mode. For example, instead of encoding the number of nodes in the header, we can encode a marker after all nodes are serialized. Same idea for stream decoding.

**Optimize**
**Ideas for Reducing Checkpoint File Size**

@ramtinms Maybe we can save over 500MB by not repeating the version number in the checkpoint. Thoughts?
WDYM by that? AFAIK, since the payloads are currently encoded in flat nodes, their size is not always the same, which means we cannot directly guess the position of a node in the checkpoint's encoded data unless we have already read the whole checkpoint. But maybe I misunderstood what you meant.

Also, I'm not sure if it's on the table or not, but using extension nodes in some cases could also reduce the checkpoint size, as it would reduce the number of interim nodes that are needed. Last time we talked about it, though, @ramtinms argued that it wouldn't save much: entries in the trie have a random path distribution, so there is very little overlap between paths for extensions to exploit. When we tried it on our end we still got some improvements, even though they are indeed not as big as initially expected (the initial test was skewed because it generated paths with the method used in unit tests, which uses right-side padding and results in a lot of savings by getting rid of most of the branches).
Hi Brendan. Unless I'm mistaken, the interim nodes don't vary in size. What I meant is: instead of repeating the version number in every node, encode it once per checkpoint file. For example:
Thank you for the update on extension nodes. I haven't had a chance to take a closer look at it yet (I got sick in early Dec and am starting to feel better now). I'll take a look at your code this week.
**More Ideas for Reducing Checkpoint File Size**

If the checkpointer encodes leaf nodes and interim nodes differently, as suggested by Ramtin in "separation of node types" #1745, further reduction is possible.
Together with the 510MB size reduction from the previous comment, the combined savings are around 3.4GB.
Hi @Ullaakut,
I read some of the DPS trie code and I can see how extension nodes are beneficial in some cases. Some concerns about adding extension nodes to Flow's mtrie include:
Given these concerns, I don't know whether the benefits of adding extension nodes to Flow's mtrie would be enough to offset the added complexity.
Currently, the checkpointer and trie flattener reconstruct the nodes as their storable versions, which duplicates memory usage during checkpointing. By reusing the same structure, the speed and memory usage of the checkpointer can be improved significantly.