Currently, payloads are kept in memory as part of the leaf nodes. If we store them in a disk-based data store, we can separate the index (the trie) from the actual data, which should drastically reduce memory usage and also improve garbage collection.
This should be done in a way that doesn't increase the time spent on operations like read and update, and it should be as parallelizable as possible.
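A minimal sketch of the separation described above, with hypothetical names (`PayloadStore`, `LeafNode` are illustrative, not from the codebase), where a leaf node holds only a fixed-size reference into a payload store rather than the payload itself. A plain dict stands in for the disk-based store:

```python
import hashlib


class PayloadStore:
    """Hypothetical content-addressed store for payloads.
    A dict stands in here for the on-disk data store."""

    def __init__(self):
        self._disk = {}

    def put(self, payload: bytes) -> bytes:
        # Content-addressing gives a fixed-size key to embed in the trie.
        key = hashlib.sha256(payload).digest()
        self._disk[key] = payload
        return key

    def get(self, key: bytes) -> bytes:
        return self._disk[key]


class LeafNode:
    """The leaf keeps only the 32-byte payload key, not the payload,
    so the in-memory trie shrinks and the GC scans fewer live bytes."""

    __slots__ = ("path", "payload_key")

    def __init__(self, path: bytes, payload_key: bytes):
        self.path = path
        self.payload_key = payload_key


store = PayloadStore()
leaf = LeafNode(path=b"\x01\x02", payload_key=store.put(b"large payload bytes"))
assert store.get(leaf.payload_key) == b"large payload bytes"
```

Reads then become a two-step lookup (trie walk, then payload fetch), which is where the latency concern raised below comes in.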
> [...] if we store them on a disk-based data store [...]
>
> This should be done in a way that doesn't impact the time spent on operations like read and update and be parallelizable as much as possible.
Do you have suggestions on how to achieve this? It seems to me that the only way to do this effectively is to have both a persistent store on the filesystem and a cache in memory, but then the impact on those operations would not be consistently zero. There would be no impact only in the ideal scenario where the cache (an LRU would be best) already contains every value we need to read, so that values never have to be retrieved from the filesystem. If we ever need to fetch a value from the filesystem to satisfy a read call, how could that have no impact? We would have to block the read call until the value is fetched from disk, an operation that is probably a few orders of magnitude more costly than a read from memory.
The best approach I've found is to have the LRU cache write entries to disk as they are evicted, and to proactively evict (and therefore persist) its oldest entries on a regular basis. The goal is to never reach a full cache (which would mean blocking on disk writes) and to let the disk writes happen concurrently, without interfering with new read/write operations. That solution is limited by how much memory is devoted to the cache: the bigger the cache and the more often its oldest entries are flushed, the better the expected performance. Even then, a read call can still arrive for an old key that now lives only on disk, and serving it requires a disk read, which is inevitably slow.
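The eviction scheme above can be sketched as follows. This is an illustrative, synchronous version (`WriteBehindLRU` and `backing` are assumed names, not from the project); a real implementation would flush evicted entries to disk on a background thread rather than inline:

```python
from collections import OrderedDict


class WriteBehindLRU:
    """LRU cache that persists entries to a backing store on eviction.
    `backing` is any dict-like object standing in for the disk store."""

    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.cache = OrderedDict()  # insertion order == recency order
        self.backing = backing

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)  # mark as most recently used
        while len(self.cache) > self.capacity:
            # Evict the least recently used entry and persist it.
            old_key, old_value = self.cache.popitem(last=False)
            self.backing[old_key] = old_value

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: the slow path discussed above -- we must block
        # on the backing store, then re-admit the value to the cache.
        value = self.backing[key]
        self.put(key, value)
        return value


disk = {}
lru = WriteBehindLRU(capacity=2, backing=disk)
lru.put("a", 1)
lru.put("b", 2)
lru.put("c", 3)           # evicts "a", persisting it to `disk`
assert "a" in disk
assert lru.get("a") == 1  # miss: served from `disk`, "b" evicted in turn
assert "b" in disk
```

The proactive flushing mentioned above would amount to calling the eviction loop on a timer, even when the cache is below capacity, so writes rarely happen on the critical path of a `put`.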
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.