refactor!: abstractions for snapshot and pruning; snapshot intervals eventually pruned; unit tests #11496
Codecov Report
@@            Coverage Diff             @@
##           master   #11496      +/-   ##
==========================================
+ Coverage   65.91%   66.14%   +0.23%
==========================================
  Files         667      678      +11
  Lines       70466    72010    +1544
==========================================
+ Hits        46448    47632    +1184
- Misses      21323    21598     +275
- Partials     2695     2780      +85
@p0mvn is this ready for review?
@alexanderbez not yet, I need to fix one more test. I will ping you once it's ready.
The test I referred to above is broken on master, so this is ready. @alexanderbez I have some questions about
pruning/README.md
Outdated
The strategies are configured in `app.toml`:
pruning = "< strategy >" # where the options are:
- `default`: only the last 100,000 states(approximately 1 week worth of state) are kept; pruning at 100 block intervals
We changed the default value some time ago. Did you change it back to 100k in this PR?
We have 100k in Osmosis. I missed this difference when cherry-picking. Changed back to the updated default value of 362880.
What was the reason for this change?
This is a pretty large PR to review in detail TBH. But since it was merged in Osmosis, I see no reason it can't be merged here.
74d245d to b57d056
.github/workflows/lint.yml
Outdated
@@ -1,31 +1,17 @@
name: Lint
# Lint runs golangci-lint over the entire cosmos-sdk repository
Can we revert the changes to this file? It's set up not to run when no Go code has changed, so that it doesn't clog and slow down CI for docs PRs.
This was an accidental change, removed now. Thanks for catching that
519c590 to b57d056
Thank you for this change @p0mvn. I've added a review, please take a look.
baseapp/baseapp.go
Outdated
if !ok {
	return errors.New("rootmulti store is required")
}
if err := rms.GetPruning().Validate(); err != nil {
We can simply do: return rms.GetPruning().Validate() and that will remove the need for the 3+ lines below.
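A minimal sketch of the suggestion above, assuming the surrounding function does nothing with the error except pass it on. The `pruningOptions` type and its validation rule are hypothetical stand-ins, not the SDK's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// pruningOptions is a hypothetical stand-in for the pruning options type.
type pruningOptions struct {
	KeepRecent uint64
	Interval   uint64
}

// Validate rejects a combination that this sketch (not the SDK) treats as invalid.
func (po pruningOptions) Validate() error {
	if po.Interval == 0 && po.KeepRecent > 0 {
		return errors.New("interval must be set when keep-recent is non-zero")
	}
	return nil
}

// validateVerbose has the shape under review: check, return the error, then return nil.
func validateVerbose(po pruningOptions) error {
	if err := po.Validate(); err != nil {
		return err
	}
	return nil
}

// validateDirect is the suggested form: the callee's result is already the answer.
func validateDirect(po pruningOptions) error {
	return po.Validate()
}

func main() {
	bad := pruningOptions{KeepRecent: 10}
	fmt.Println(validateVerbose(bad) != nil, validateDirect(bad) != nil)
}
```

The two forms are equivalent only while the error is returned unchanged; if the caller wraps or annotates the error, the explicit check is still needed.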
pruning/manager.go
Outdated
func (m *Manager) flushPruningSnapshotHeights(batch dbm.Batch) {
	m.mx.Lock()
	defer m.mx.Unlock()
	bz := make([]byte, 0)
We can preallocate this buffer by doing:
bz := make([]byte, 0, m.pruneSnapshotHeights.Len()*8)
pruning/manager.go
Outdated
}

func (m *Manager) flushPruningHeights(batch dbm.Batch) {
	bz := make([]byte, 0)
bz := make([]byte, 0, len(m.pruneHeights)*8)
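To illustrate the preallocation suggested in the two comments above: each height occupies 8 bytes, so the final size is known before the loop starts. This is a sketch only; the function name `encodeHeights` and the big-endian byte order are assumptions for illustration, not taken from the PR:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeHeights serializes heights as consecutive 8-byte values.
// Capacity is reserved up front, so append never has to grow the buffer.
// Big-endian order is an assumption made for this sketch.
func encodeHeights(heights []int64) []byte {
	bz := make([]byte, 0, len(heights)*8)
	var buf [8]byte
	for _, h := range heights {
		binary.BigEndian.PutUint64(buf[:], uint64(h))
		bz = append(bz, buf[:]...)
	}
	return bz
}

func main() {
	bz := encodeHeights([]int64{1, 2, 3})
	fmt.Println(len(bz), cap(bz))
}
```

With `make([]byte, 0)` the same loop still works, but the runtime may reallocate and copy the buffer several times as it grows.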
pruning/manager.go
Outdated
	offset += 8
}

if pruneSnapshotHeights.Len() > 0 {
This condition will never fail: offset starts at 0 and is ALWAYS less than len(bz), which you tested right above and which is the loop invariant, so you will always add at least one element to the list.
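The reasoning above can be sketched with a decoding loop of the same shape. The function name `decodeHeights`, the length check, and the big-endian order are assumptions for illustration; the point is that once the empty case has returned early, the loop body runs at least once:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeHeights parses consecutive 8-byte big-endian values (assumed layout).
func decodeHeights(bz []byte) ([]int64, error) {
	if len(bz) == 0 {
		return nil, nil
	}
	if len(bz)%8 != 0 {
		return nil, errors.New("height buffer length must be a multiple of 8")
	}
	heights := make([]int64, 0, len(bz)/8)
	for offset := 0; offset < len(bz); offset += 8 {
		heights = append(heights, int64(binary.BigEndian.Uint64(bz[offset:offset+8])))
	}
	// len(bz) > 0 was established above, so at least one iteration ran and
	// heights is non-empty here; a further "len(heights) > 0" check is redundant.
	return heights, nil
}

func main() {
	bz := make([]byte, 16)
	binary.BigEndian.PutUint64(bz[0:8], 5)
	binary.BigEndian.PutUint64(bz[8:16], 9)
	heights, err := decodeHeights(bz)
	fmt.Println(heights, err)
}
```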
store/rootmulti/store.go
Outdated
for key, store := range rs.stores {
	// If the store is wrapped with an inter-block cache, we must first unwrap
	// it to get the underlying IAVL store.
	if store.GetStoreType() == types.StoreTypeIAVL {
You can reverse this conditional and avoid the nesting, like so:
if store... != types.StoreTypeIAVL {
	continue
}
...
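A self-contained sketch of the guard-clause shape suggested above. The `store` and `storeType` definitions are hypothetical stand-ins for the SDK types, used only so the example compiles on its own:

```go
package main

import "fmt"

// storeType and store are hypothetical stand-ins for the SDK's store types.
type storeType int

const (
	storeTypeIAVL storeType = iota
	storeTypeTransient
)

type store struct {
	name string
	typ  storeType
}

// prunableStores returns the names of IAVL stores. Non-IAVL stores are skipped
// with an early continue, so the main logic stays at one indentation level.
func prunableStores(stores []store) []string {
	var names []string
	for _, s := range stores {
		if s.typ != storeTypeIAVL {
			continue
		}
		names = append(names, s.name)
	}
	return names
}

func main() {
	stores := []store{
		{name: "bank", typ: storeTypeIAVL},
		{name: "tmp", typ: storeTypeTransient},
		{name: "staking", typ: storeTypeIAVL},
	}
	fmt.Println(prunableStores(stores))
}
```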
store/rootmulti/store.go
Outdated
store = rs.GetCommitKVStore(key)

if err := store.(*iavl.Store).DeleteVersions(pruningHeights...); err != nil {
Ditto here: you could reverse the conditional as above, if you'd like.
store/rootmulti/store.go
Outdated
// GetStoreByName performs a lookup of a StoreKey given a store name typically
func (rs *Store) handlePruning(version int64) error {
	rs.pruningManager.HandleHeight(version - 1) // we should never prune the current version.
	if rs.pruningManager.ShouldPruneAtHeight(version) {
Please reverse this conditional to return early if !rs...
store/rootmulti/store.go
Outdated
@@ -26,11 +21,18 @@ import (
	"github.com/cosmos/cosmos-sdk/store/transient"
	"github.com/cosmos/cosmos-sdk/store/types"
	sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"

	iavltree "github.com/cosmos/iavl"
Please revert this block of imports; they were idiomatic before.
There is a "github.com/cosmos/cosmos-sdk/store/iavl" import on top, so we need a rename to avoid the clash.
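The clash described above arises whenever two imported packages share the same package name. As a standard-library analogy (not the SDK packages themselves), `text/template` and `html/template` are both named `template`, so one of them needs an alias:

```go
package main

import (
	"bytes"
	"fmt"
	htmltemplate "html/template" // aliased: both packages are named "template"
	"text/template"
)

// renderText executes a trivial text/template, which does no escaping.
func renderText(s string) string {
	var buf bytes.Buffer
	t := template.Must(template.New("t").Parse("{{.}}"))
	if err := t.Execute(&buf, s); err != nil {
		panic(err)
	}
	return buf.String()
}

// renderHTML executes the same template via html/template, which escapes HTML.
func renderHTML(s string) string {
	var buf bytes.Buffer
	t := htmltemplate.Must(htmltemplate.New("t").Parse("{{.}}"))
	if err := t.Execute(&buf, s); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Println(renderText("<b>"), renderHTML("<b>"))
}
```

Without the alias the file would not compile, because the name `template` would be declared twice in the same file scope.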
@odeke-em provided excellent feedback here. TY!
b57d056 to 0b15651
@odeke-em Thank you for your review. All comments are now addressed. Please take a look when you have time.
Thank you for addressing my feedback @p0mvn! Nice work and LGTM!
server/start.go
Outdated
@@ -25,13 +25,14 @@ import (
	"github.com/cosmos/cosmos-sdk/client"
	"github.com/cosmos/cosmos-sdk/client/flags"
	"github.com/cosmos/cosmos-sdk/codec"
|
No need for this newline though as this is the same block of imports :-)
@@ -559,7 +544,47 @@ func (rs *Store) GetKVStore(key types.StoreKey) types.KVStore {
	return store
}

// GetStoreByName performs a lookup of a StoreKey given a store name typically
func (rs *Store) handlePruning(version int64) error {
	rs.pruningManager.HandleHeight(version - 1) // we should never prune the current version.
Could you please check that this "version" is always greater than 1?
version can be 1. However, that is not a problem because we always require keep-recent > 2, so a 0 height never gets pruned. Just in case this ever gets changed, I added a guard against previousHeight == 0 in HandleHeight.
What happens if version becomes negative? Checking for == 0 only guards against that single value
That is not possible. The only valid values are 0 and greater
It's always a good idea to practice defensive programming. As version is of type int64, it's possible this method is called with a value of 0 or -1 or -123456. If HandleHeight has specific constraints on the values it accepts, then this method should enforce those constraints before calling the method.
Added a check to cover all possible inputs, table tests (TestHandleHeight_Inputs) to test all possible behaviors, and updated the godoc.
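A rough sketch of what such an input guard and table of cases might look like. The function `pruneHeightFor`, its signature, and its return convention (0 meaning "nothing to prune") are hypothetical simplifications for illustration, not the actual `HandleHeight` implementation or the contents of `TestHandleHeight_Inputs`:

```go
package main

import "fmt"

// pruneHeightFor is a hypothetical simplification of a HandleHeight-style check.
// It returns the height that becomes eligible for pruning, or 0 when the input
// is out of range or nothing is old enough to prune yet.
func pruneHeightFor(previousHeight, keepRecent int64) int64 {
	if previousHeight <= 0 || keepRecent < 0 {
		return 0 // covers 0 and every negative value, not just == 0
	}
	pruneHeight := previousHeight - keepRecent
	if pruneHeight <= 0 {
		return 0
	}
	return pruneHeight
}

func main() {
	cases := []struct {
		name           string
		previousHeight int64
		keepRecent     int64
		want           int64
	}{
		{"large negative", -123456, 2, 0},
		{"minus one", -1, 2, 0},
		{"zero", 0, 2, 0},
		{"within keep-recent", 1, 2, 0},
		{"past keep-recent", 10, 2, 8},
	}
	for _, tc := range cases {
		got := pruneHeightFor(tc.previousHeight, tc.keepRecent)
		fmt.Printf("%s: got %d, want %d\n", tc.name, got, tc.want)
	}
}
```

Checking `<= 0` rather than `== 0` is what addresses the concern raised earlier in this thread about negative values.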
store/rootmulti/store.go
Outdated
err := rs.pruneStores()
if err != nil {
Minor nit, you can make a compound one-liner
if err := rs.pruneStores(); err != nil {
panic(err)
}
062847f to d43a785
I tested all these changes on Osmosis v7.x by cherry-picking all relevant commits onto the desired branch. Works exactly as expected. The summary of the tests can be seen here: osmosis-labs#184 (comment) @alexanderbez Please let me know how we should proceed. I'm ready to merge this if you agree that the tests are acceptable.
LGTM. We need to undo the changelog changes. Then we can proceed with merging.
Also, tbh, I haven't fully reviewed this PR in complete detail -- it's just way too large. So I'm mainly giving an ACK as it's been seen to work reliably for Osmosis.
a65917f to 073b664
Will merge once CI passes 👍
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
…eventually pruned; unit tests (cosmos#11496)
Description
Upstreaming: osmosis-labs#140
We've run into issues where attempts to prune a height that was being snapshotted became too frequent. As a temporary mitigation, we had to require node operators to set a large pruning-keep-recent.
This PR fixes the problem by not pruning snapshot heights until after the snapshot is complete.
In addition, an abstraction for pruning (pruning/manager.go) was added, and the change is unit tested rigorously from config to base app.
Marked as API breaking because the following were changed:
type Committer interface
type Snapshotter interface
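The core idea in the description (hold back heights that coincide with a snapshot until that snapshot has finished) can be modeled roughly as follows. This is a deliberately simplified, hypothetical sketch: the `manager` type, its fields, and the `handleHeight` and `snapshotDone` methods are invented for illustration and do not reproduce pruning/manager.go:

```go
package main

import "fmt"

// manager is a hypothetical, simplified model of snapshot-aware pruning.
type manager struct {
	snapshotInterval int64
	pruneHeights     []int64 // heights safe to delete at the next pruning pass
	snapshotHeights  []int64 // snapshot heights held back until the snapshot completes
}

// handleHeight queues a height for pruning, holding back snapshot heights.
func (m *manager) handleHeight(height int64) {
	if height <= 0 {
		return
	}
	if m.snapshotInterval > 0 && height%m.snapshotInterval == 0 {
		m.snapshotHeights = append(m.snapshotHeights, height)
		return
	}
	m.pruneHeights = append(m.pruneHeights, height)
}

// snapshotDone releases a held-back height once its snapshot has been written.
func (m *manager) snapshotDone(height int64) {
	for i, h := range m.snapshotHeights {
		if h == height {
			m.snapshotHeights = append(m.snapshotHeights[:i], m.snapshotHeights[i+1:]...)
			m.pruneHeights = append(m.pruneHeights, height)
			return
		}
	}
}

func main() {
	m := &manager{snapshotInterval: 10}
	for h := int64(8); h <= 11; h++ {
		m.handleHeight(h)
	}
	fmt.Println(m.pruneHeights, m.snapshotHeights)
	m.snapshotDone(10)
	fmt.Println(m.pruneHeights, m.snapshotHeights)
}
```

In this model a snapshot height is never deleted while the snapshot may still be reading it, which is why a large pruning-keep-recent is no longer needed as a workaround.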
Closes: #XXXX