Currently, payloads are kept in memory as part of the leaf nodes. If we store them in a disk-based data store, we can separate the index (the trie) from the actual data, which should drastically reduce memory usage and also improve garbage collection.
This should be done in a way that doesn't increase the time spent on operations like read and update, and it should be as parallelizable as possible.
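A minimal sketch of the separation described above, with hypothetical names (`PayloadStore`, `LeafNode` are illustrative, not from the codebase), where a leaf node holds only a fixed-size reference into a payload store rather than the payload itself. A plain dict stands in for the disk-based store:

```python
import hashlib


class PayloadStore:
    """Hypothetical content-addressed store for payloads.
    A dict stands in here for the on-disk data store."""

    def __init__(self):
        self._disk = {}

    def put(self, payload: bytes) -> bytes:
        # Content-addressing gives a fixed-size key to embed in the trie.
        key = hashlib.sha256(payload).digest()
        self._disk[key] = payload
        return key

    def get(self, key: bytes) -> bytes:
        return self._disk[key]


class LeafNode:
    """The leaf keeps only the 32-byte payload key, not the payload,
    so the in-memory trie shrinks and the GC scans fewer live bytes."""

    __slots__ = ("path", "payload_key")

    def __init__(self, path: bytes, payload_key: bytes):
        self.path = path
        self.payload_key = payload_key


store = PayloadStore()
leaf = LeafNode(path=b"\x01\x02", payload_key=store.put(b"large payload bytes"))
assert store.get(leaf.payload_key) == b"large payload bytes"
```

Reads then become a two-step lookup (trie walk, then payload fetch), which is where the latency concern raised below comes in.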
> [...] if we store them on a disk-based data store [...]
>
> This should be done in a way that doesn't impact the time spent on operations like read and update and be parallelizable as much as possible.
Do you have suggestions on how to achieve this? It seems to me that the only way to do this effectively is to have both a persistent store on the filesystem and a cache in memory, but then the impact on those operations would not be consistently zero. There would be no impact only in the ideal scenario where the cache (an LRU would be best) already contains every value we need to read, so that values never have to be retrieved from the filesystem. If we ever need to fetch a value from the filesystem to satisfy a read call, how could that have no impact? We would have to block the read call until the value is fetched from disk, an operation that is probably a few orders of magnitude more costly than a read from memory.
The best approach I've found is to have the LRU cache write entries to disk as they are evicted, and to proactively evict (and therefore persist) its oldest entries on a regular basis. The goal is to never reach a full cache (which would mean blocking on disk writes) and to let the disk writes happen concurrently, without interfering with new read/write operations. That solution is limited by how much memory is devoted to the cache: the bigger the cache and the more often its oldest entries are flushed, the better the expected performance. Even then, a read call can still arrive for an old key that now lives only on disk, and serving it requires a disk read, which is inevitably slow.
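The eviction scheme above can be sketched as follows. This is an illustrative, synchronous version (`WriteBehindLRU` and `backing` are assumed names, not from the project); a real implementation would flush evicted entries to disk on a background thread rather than inline:

```python
from collections import OrderedDict


class WriteBehindLRU:
    """LRU cache that persists entries to a backing store on eviction.
    `backing` is any dict-like object standing in for the disk store."""

    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.cache = OrderedDict()  # insertion order == recency order
        self.backing = backing

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)  # mark as most recently used
        while len(self.cache) > self.capacity:
            # Evict the least recently used entry and persist it.
            old_key, old_value = self.cache.popitem(last=False)
            self.backing[old_key] = old_value

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: the slow path discussed above -- we must block
        # on the backing store, then re-admit the value to the cache.
        value = self.backing[key]
        self.put(key, value)
        return value


disk = {}
lru = WriteBehindLRU(capacity=2, backing=disk)
lru.put("a", 1)
lru.put("b", 2)
lru.put("c", 3)           # evicts "a", persisting it to `disk`
assert "a" in disk
assert lru.get("a") == 1  # miss: served from `disk`, "b" evicted in turn
assert "b" in disk
```

The proactive flushing mentioned above would amount to calling the eviction loop on a timer, even when the cache is below capacity, so writes rarely happen on the critical path of a `put`.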
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.