-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improved shard cache #7429
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything obviously wrong now! This is fiddly, so a second pair of eyes would help (and @mm-near has an outstanding NO review).
I see there are nayduck failures in the run. Are those fixed at the latest version of the code?
// Convert trie changes to database ops for trie nodes. | ||
// Create separate store update for deletions, because we want to update cache and don't want to remove nodes | ||
// from the store. | ||
let mut deletions_store_update = self.store().store_update(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another quite questionable bit of code, we should think about how to make this clearer in a follow-up
They have been failing for a while, so it is not caused by this PR. I raised the question here https://near.zulipchat.com/#narrow/stream/295558-pagoda.2Fcore/topic/chunk_management.20test.20failure/near/294949074 |
Co-authored-by: firatNEAR <firat@near.org> Co-authored-by: firatNEAR <102993450+firatNEAR@users.noreply.github.com>
Co-authored-by: firatNEAR <firat@near.org> Co-authored-by: firatNEAR <102993450+firatNEAR@users.noreply.github.com>
Co-authored-by: firatNEAR <firat@near.org> Co-authored-by: firatNEAR <102993450+firatNEAR@users.noreply.github.com>
Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in near#7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache.
* perf: avoid expensive lookup of storage metrics Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in #7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache. * apply reviewer suggestions - group metrics together in two new structs - update comments
* perf: avoid expensive lookup of storage metrics Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in #7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache. * apply reviewer suggestions - group metrics together in two new structs - update comments
* perf: avoid expensive lookup of storage metrics Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in #7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache. * apply reviewer suggestions - group metrics together in two new structs - update comments
* perf: avoid expensive lookup of storage metrics Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in #7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache. * apply reviewer suggestions - group metrics together in two new structs - update comments
Improve shard cache to use RAM more effectively.
Three changes are introduced:
If we put new value to LRU cache and total size of existing values exceeds
total_sizes_capacity
, we evict values from it until that is no longer the case. So the actual total size should never exceedtotal_size_limit
+TRIE_LIMIT_CACHED_VALUE_SIZE
.We add this because value sizes generally vary from 1 B to 500 B and we want to count cache size precisely. The current value size limit is 1000 B, so for average size of 100 B we use shard cache 10x more effectively.
When we save trie changes, we previously just applied insertions to the shard cache - which means that we added newly created nodes to it. Deletions were applied only during GC of the old block. Now we apply deletions and call
pop
for shard cache during saving trie changes of a new block as well.This helps to use shard cache space more effectively. Previously nodes from the old state could occupy a lot of space which led to eviction of nodes from the fresh state.
If shard cache
pop
is called, item is not deleted but put to thedeletions
queue withdeletions_queue_capacity
first. If popped item doesn't fit in the queue, the last item is removed from the queue and LRU cache, and newly popped item is inserted to the queue.It is needed to delay removals when we have forks. In simple case, two blocks may share a parent P. When we process the first block, we call
pop
for some nodes from P, but when we process the second block, we may need to read some nodes from P as well. Now we delay removal by 100_000, which helps to keep all nodes from 3 completely full last blocks.Next steps:
We want to get the whole update merged by next Wednesday, and cherry-pick it to 1.28 and 1.29 releases. This is not a protocol change, so it doesn't require a separate release or protocol version.
Testing
BoundedQueue
which keep the queue of trie deletionsTrieCache
which logic is less trivial now