-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Wait shutdown-grace-seconds to flush async db writes to disk #17094
Conversation
You can probably implement the app.Close method, which is called in graceful shutdown. |
Thank you for the comment. Will try it. |
Do you know where the database handle to |
0c754e7
to
864ef99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the server
package's signal capturing mechanism executes a callback function that performs all cleanup, instead the app should pass what it wants cleaned up to that call IMO
Are you referring to cleanupFn from A callback to add items to clean up makes sense, but since start.go/startApp opens db, should start.go clean itself up rather than relying on the chain app.go to ask? cosmos-sdk/baseapp/baseapp.go creates the snapshotmanager, so is it better that cosmos-sdk clean that up rather than rely on a callback supplied by the application chain? if app/app.go opens a new database, then cleanupFn should clean up that new database that cosmos-sdk does not manage |
@alexanderbez Confirming whether you agree that:
|
I would say this makes sense to me, yes! |
Following guidance, cometbft closes databases it opens OnStop: |
github dependency test does not see the new snapshot manager Close() method local
|
This is because store it is own go.mod. You could split the PR and scope one on store changes the the other one for integrating the changes (using then a pseudo version from main). That's better than adding replaces imho. |
store/snapshots/manager.go
Outdated
@@ -550,3 +550,9 @@ func (m *Manager) snapshot(height int64) { | |||
m.logger.Debug("pruned state snapshots", "pruned", pruned) | |||
} | |||
} | |||
|
|||
// close snapshot database | |||
func (m *Manager) Close() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a changelog under store/CHANGELOG.md for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #17294
snapshot manager Close() split into #17294 |
a83f171
to
593f240
Compare
6a91d0f
to
398b357
Compare
Yes |
Let's wait for an update from @yihuang to see how soon a root cause solution is available. If a solution isn't available soon, it might be good to use the grace period now and reduce data loss on program exit (this is a current problem). We can remove the grace feature later. |
sorry for the delay, our chain binary still uses sdk 0.46, which is known to be not good at graceful shutdown. |
@yihuang Do you think it's better to remove the new grace-seconds and only keep the DB close calls in baseapp/baseapp.go Close() ? Is there any way to backport the Close()/Cleanup so that previous SDK versions can shutdown properly? Or must all chains upgrade to 0.49/0.50 to shutdown properly? |
I think so, the issue probably don't reproduce in main branch, but I guess we can add this in 0.46 backport if is reproduce there?
I did an try before, but there are too many API breaking changes to be accepted: #16203 |
How about we move the db close calls in baseapp.go to a new PR to close the databases on exit. This PR can be just for the shutdown-grace backport for versions that don't shutdown gracefully like 0.45/0.46/0.47(juno) |
Yeah, sounds good to me. |
Moved database close calls to #17667 |
This PR can be just for the shutdown-grace backport for versions that don't shutdown gracefully like 0.45/0.46/0.47(juno) |
@Mergifyio backport release/v0.50.x |
✅ Backports have been created
|
Description
server/start.go by default exits the main() routine before databases have a chance to flush async writes to disk (such as pebbledb)
Since there is no master registry of all open databases that can be used for orderly flushing of all open databases, databases should be allowed to catch interrupt signals (like the main daemon) and be given time to flush to disk to avoid data loss.
Add Close() for snapshot manager
Add application.db and snapshots/metadata.db to BaseApp.Close()
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
make lint
andmake test
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking change