perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort #10486
Conversation
I think the edits are passed to the root store and saved in DB, so we can't remove sorting.
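For context, a minimal sketch (an illustration, not the SDK's actual Write implementation) of why the keys are collected and sorted before flushing to the parent store: Go map iteration order is randomized, so without the sort the parent would see the same writes in a different order on each run/node.

```go
package main

import (
	"fmt"
	"sort"
)

// flushDeterministic is a hypothetical helper, not the cachekv code: dirty
// entries live in a Go map, whose iteration order is randomized, so we collect
// and sort the keys before handing them to the parent store to keep the write
// order deterministic.
func flushDeterministic(dirty map[string][]byte, write func(key string, value []byte)) {
	keys := make([]string, 0, len(dirty))
	for k := range dirty {
		keys = append(keys, k)
	}
	sort.Strings(keys) // the step that cannot be dropped

	for _, k := range keys {
		write(k, dirty[k])
	}
}

func main() {
	dirty := map[string][]byte{"b": []byte("2"), "a": []byte("1"), "c": []byte("3")}
	flushDeterministic(dirty, func(k string, v []byte) {
		fmt.Printf("Set(%q, %q)\n", k, v)
	})
}
```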
Reverted sort.Strings(keys); please take a look again.
Kindly pinging y'all for a review/merge.
ACK, can you add a changelog entry?
We can shave off some milliseconds, and also cut down some megabytes of RAM, by only requesting from the cache when needed and by using the map clearing idiom, which the compiler recognizes and turns into fast code. Noticed in profiles from Tharsis' Ethermint, per evmos/ethermint#710

- Before

* Memory profiles

```shell
19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles

```go
    .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
    .          .    119:	// at least happen atomically.
150ms      150ms    120:	for _, key := range keys {
220ms      3.64s    121:		cacheValue := store.cache[key]
    .          .    122:
    .          .    123:		switch {
    .      250ms    124:		case store.isDeleted(key):
    .          .    125:			store.parent.Delete([]byte(key))
210ms      210ms    126:		case cacheValue.value == nil:
    .          .    127:			// Skip, it already doesn't exist in parent.
    .          .    128:		default:
240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
    .          .    130:		}
    .          .    131:	}
...
 10ms       60ms    134:	store.cache = make(map[string]*cValue)
    .       40ms    135:	store.deleted = make(map[string]struct{})
    .       50ms    136:	store.unsortedCache = make(map[string]struct{})
    .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After

* Memory profiles

```shell
    .          .    130:	// Clear the cache using the map clearing idiom
    .          .    131:	// and not allocating fresh objects.
    .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
    .          .    133:	for key := range store.cache {
    .          .    134:		delete(store.cache, key)
    .          .    135:	}
    .          .    136:	for key := range store.deleted {
    .          .    137:		delete(store.deleted, key)
    .          .    138:	}
    .          .    139:	for key := range store.unsortedCache {
    .          .    140:		delete(store.unsortedCache, key)
    .          .    141:	}
```

* CPU profiles

```shell
    .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
    .          .    112:	// at least happen atomically.
110ms      110ms    113:	for _, key := range keys {
    .      210ms    114:		if store.isDeleted(key) {
    .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
    .          .    116:			// be sure if the underlying store might do a save with the byteslice or
    .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
    .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
    .          .    119:			store.parent.Delete([]byte(key))
    .          .    120:			continue
    .          .    121:		}
    .          .    122:
 50ms      2.45s    123:		cacheValue := store.cache[key]
910ms      920ms    124:		if cacheValue.value != nil {
    .          .    125:			// It already exists in the parent, hence delete it.
120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
    .          .    127:		}
    .          .    128:	}
    .          .    129:
    .          .    130:	// Clear the cache using the map clearing idiom
    .          .    131:	// and not allocating fresh objects.
    .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
    .      210ms    133:	for key := range store.cache {
    .          .    134:		delete(store.cache, key)
    .          .    135:	}
    .       10ms    136:	for key := range store.deleted {
    .          .    137:		delete(store.deleted, key)
    .          .    138:	}
    .      170ms    139:	for key := range store.unsortedCache {
    .          .    140:		delete(store.unsortedCache, key)
    .          .    141:	}
    .      260ms    142:	store.sortedCache = dbm.NewMemDB()
    .       10ms    143:}
```

Fixes #10487
Updates evmos/ethermint#710
Done, thank you @fedekunze!
```go
// Clear the cache using the map clearing idiom
// and not allocating fresh objects.
// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
for key := range store.cache {
	delete(store.cache, key)
}
for key := range store.deleted {
	delete(store.deleted, key)
}
for key := range store.unsortedCache {
	delete(store.unsortedCache, key)
}
```
I think this is incorrect in the general case. I've had to actually undo this in the CacheKV store myself in the Osmosis branch in a couple places.
This preserves the old capacity of the map, which makes iterating over map entries / sorting / etc. take much longer. (hashmap iteration time is proportional to its capacity, not its number of entries)
So in the Osmosis case, or in a chain-upgrade case, if you have one very large block followed by many short blocks, every short block will spend about as much store time on iteration as the large block did.
What we did in the Osmosis branch was to use the map clearing idiom only when the map is sufficiently small; otherwise we just allocate a fresh map.
(This made the difference between O(N^2) and O(N) behavior in our codebase, due to iterator behavior. It's potentially less crucial here, but I'd still caution against this as a premature optimization.)
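A minimal sketch of the approach described above (names and the threshold are illustrative, not the actual Osmosis code): reuse the map only while it is small, and drop it entirely after a large block so later iteration is not proportional to a stale, inflated capacity.

```go
package main

import "fmt"

// clearThreshold is an illustrative cutoff; a real value would be tuned from profiles.
const clearThreshold = 2048

// resetCache is a hypothetical sketch: for small maps, reuse the allocation via
// the delete loop the compiler turns into a fast map clear; for large maps,
// allocate a fresh one so the inflated capacity can be garbage collected.
func resetCache(m map[string]struct{}) map[string]struct{} {
	if len(m) <= clearThreshold {
		for k := range m {
			delete(m, k) // compiled to an internal fast map clear
		}
		return m
	}
	return make(map[string]struct{})
}

func main() {
	cache := make(map[string]struct{})
	cache["key"] = struct{}{}
	cache = resetCache(cache)
	fmt.Println(len(cache)) // 0
}
```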
According to https://go-review.googlesource.com/c/go/+/110055/, the map's internal structures are reinitialized and its memory is cleared, so the optimization is that we save on heap allocation. However, if one tx consumes a lot of memory (e.g., a huge iterator), then we may have a problem.
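One way to weigh the tradeoff empirically would be a micro-benchmark along these lines (hypothetical package placement; the delete loop below is the exact form the compiler rewrites into a fast map clear, per the CL above). The relative numbers will depend heavily on the map size and on how long the retained capacity is kept around between rounds.

```go
package cachekv_test // hypothetical placement for this sketch

import (
	"strconv"
	"testing"
)

const numKeys = 1 << 16

func makeKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = strconv.Itoa(i)
	}
	return keys
}

// Reuse the map across rounds: the delete loop is recognized by the compiler
// as a map clear, and the retained capacity means re-inserts don't re-grow it.
func BenchmarkClearWithDelete(b *testing.B) {
	keys := makeKeys(numKeys)
	m := make(map[string]struct{}, numKeys)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for _, k := range keys {
			m[k] = struct{}{}
		}
		for k := range m {
			delete(m, k)
		}
	}
}

// Allocate a fresh map each round: each round re-grows the map from scratch,
// but no large capacity stays pinned after a big round.
func BenchmarkReallocate(b *testing.B) {
	keys := makeKeys(numKeys)
	m := make(map[string]struct{}, numKeys)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for _, k := range keys {
			m[k] = struct{}{}
		}
		m = make(map[string]struct{})
	}
}
```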
@odeke-em can you help answer ^^
@Mergifyio backport release/v0.45.x
✅ Backports have been created