Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Wait shutdown-grace-seconds to flush async db writes to disk #17094

Merged
merged 21 commits into from
Sep 15, 2023
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,7 @@ Ref: https://keepachangelog.com/en/1.0.0/
* (cli) [#16209](https://github.com/cosmos/cosmos-sdk/pull/16209) Make `StartCmd` more customizable.
* (types) [#16257](https://github.com/cosmos/cosmos-sdk/pull/16257) Allow setting the base denom in the denom registry.
* (genutil) [#16046](https://github.com/cosmos/cosmos-sdk/pull/16046) Add "module-name" flag to genutil `add-genesis-account` to enable intializing module accounts at genesis.
* (baseapp) [#17094](https://github.com/cosmos/cosmos-sdk/pull/17094) Add duration `shutdown-grace` for resource clean up (closing database handles) before exit

### Improvements

Expand Down
24 changes: 24 additions & 0 deletions baseapp/baseapp.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package baseapp

import (
"context"
goerrors "errors"
chillyvee marked this conversation as resolved.
Show resolved Hide resolved
"fmt"
"sort"
"strconv"
Expand Down Expand Up @@ -1057,5 +1058,28 @@ func (app *BaseApp) ProcessProposalVerifyTx(txBz []byte) (sdk.Tx, error) {

// Close is called in start cmd to gracefully cleanup resources.
func (app *BaseApp) Close() error {
var errs []error
chillyvee marked this conversation as resolved.
Show resolved Hide resolved

// Close app.db (opened by cosmos-sdk/server/start.go call to openDB)
app.logger.Info("Closing application.db")
if err := app.db.Close(); err != nil {
chillyvee marked this conversation as resolved.
Show resolved Hide resolved
errs = append(errs, err)
}

// Close app.snapshotManager
// - opened when app chains use cosmos-sdk/server/util.go/DefaultBaseappOptions (boilerplate)
// - which calls cosmos-sdk/server/util.go/GetSnapshotStore
// - which is passed to baseapp/options.go/SetSnapshot
// - to set app.snapshotManager = snapshots.NewManager
if app.snapshotManager != nil {
app.logger.Info("Closing snapshots/metadata.db")
if err := app.snapshotManager.Close(); err != nil {
errs = append(errs, err)
}
}

if len(errs) > 0 {
chillyvee marked this conversation as resolved.
Show resolved Hide resolved
return goerrors.Join(errs...)
}
return nil
}
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ require (
cosmossdk.io/errors v1.0.0
cosmossdk.io/log v1.2.0
cosmossdk.io/math v1.0.1
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7
kocubinski marked this conversation as resolved.
Show resolved Hide resolved
cosmossdk.io/x/tx v0.9.1
github.com/99designs/keyring v1.2.1
github.com/bgentry/speakeasy v0.1.1-0.20220910012023-760eaf8b6816
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -49,8 +49,8 @@ cosmossdk.io/log v1.2.0 h1:BbykkDsutXPSy8RojFB3KZEWyvMsToLy0ykb/ZhsLqQ=
cosmossdk.io/log v1.2.0/go.mod h1:GNSCc/6+DhFIj1aLn/j7Id7PaO8DzNylUZoOYBL9+I4=
cosmossdk.io/math v1.0.1 h1:Qx3ifyOPaMLNH/89WeZFH268yCvU4xEcnPLu3sJqPPg=
cosmossdk.io/math v1.0.1/go.mod h1:Ygz4wBHrgc7g0N+8+MrnTfS9LLn9aaTGa9hKopuym5k=
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982 h1:61YFeW2AhwwPfoJWzNJWvVubCj32sm5jZkJfraS9pDQ=
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982/go.mod h1:QAF9zeRa/9ghuv7E8NS9SzWqRbgVNwH/dZwGhYDHUjI=
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7 h1:defJ42NPi/x3i8i6lPCm168TDvoeudlnrPI9kE5bQII=
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7/go.mod h1:ah/SpIc7aroX8Cg9jRkaxNhtdUO6UW2fpVPk/XMBmmw=
cosmossdk.io/x/tx v0.9.1 h1:9pmmXA9Vs4qdouOFnzhsdsff2mif0f0kylMq5xTGhRI=
cosmossdk.io/x/tx v0.9.1/go.mod h1:/YFGTXG6+kyihd8YbfuJiXHV4R/mIMm2uvVzo80CIhA=
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
Expand Down
17 changes: 16 additions & 1 deletion server/start.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (
"net"
"os"
"runtime/pprof"
"time"

"github.com/cometbft/cometbft/abci/server"
cmtcmd "github.com/cometbft/cometbft/cmd/cometbft/commands"
Expand Down Expand Up @@ -63,6 +64,7 @@ const (
FlagMinRetainBlocks = "min-retain-blocks"
FlagIAVLCacheSize = "iavl-cache-size"
FlagDisableIAVLFastNode = "iavl-disable-fastnode"
FlagShutdownGrace = "shutdown-grace"

// state sync-related flags
FlagStateSyncSnapshotInterval = "state-sync.snapshot-interval"
Expand Down Expand Up @@ -167,9 +169,21 @@ is performed. Note, when enabled, gRPC will also be automatically enabled.
serverCtx.Logger.Info("starting ABCI without CometBFT")
}

return wrapCPUProfile(serverCtx, func() error {
err = wrapCPUProfile(serverCtx, func() error {
return start(serverCtx, clientCtx, appCreator, withCMT, opts)
})

serverCtx.Logger.Debug("received quit signal")

julienrbrt marked this conversation as resolved.
Show resolved Hide resolved
// wait for app.Close() and tmNode.Stop() to flush async db writes to disk
julienrbrt marked this conversation as resolved.
Show resolved Hide resolved
graceDuration, _ := cmd.Flags().GetDuration(FlagShutdownGrace)
if graceDuration > 0 {
serverCtx.Logger.Info("graceful shutdown start", FlagShutdownGrace, graceDuration)
<-time.After(graceDuration)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this, I think the defer functions will run when received signals

Copy link
Contributor Author

@chillyvee chillyvee Aug 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, defers run in all simple sample programs, however in practice, cosmos-sdk baseapp exits before defers can finish.

Here is an example of junod with pebbledb and goleveldb with and without shutdown-grace

# junod exits without any database closes
# shutdown-grace = default 0

## pebbledb
^CWaitForQuitSignals() interrupt
do not wait for database to flush module=server
baseApp defer exit module=server
--exit without db close--

## goleveldb
^CWaitForQuitSignals() interrupt
baseApp defer exit module=server
WaitForQuitSignals() interrupt
do not wait for database to flush module=server
--exit without db close--




# junod closes db before exit with shutdown-grace
# shutdown-grace = "4.5s"

## pebbledb

commit synced commit=436F6D6D697449447B5B31373520383020323337203131312033203820343820313838203437203232372031313820323434203130372031333720353820373920333320313137203838203635203138203233382031333920323532203220373920343420343220333020313131203231362038365D3A3242423236327D module=server
committed state app_hash=AF50ED6F030830BC2FE376F46B893A4F2175584112EE8BFC024F2C2A1E6FD856 height=2863714 module=state num_txs=0
indexed block exents height=2863714 module=txindex
^CWaitForQuitSignals() interrupt
wait for database to flush graceDuration=4500 module=server
WaitForQuitSignals() interrupt
baseApp defer exit module=server
service stop impl={"Logger":{"Logger":{}}} module=server msg={}
Stopping Node module=server
Closing blockstore module=server
Closing statestore module=server
Closing evidencestore module=server
Closing application.db module=server
Closing snapshots/metadata.db module=server
exiting... module=server
finished waiting for database to flush module=server


## goleveldb
commit synced commit=436F6D6D697449447B5B31313820323531203439203139312031323120323131203136312031303220323131203234392034342031363520313435203130302031203436203137342038203820313634203233322032323520383320313739203131302031363620392038203130382031332034342036335D3A3242423331397D module=server
committed state app_hash=76FB31BF79D3A166D3F92CA59164012EAE0808A4E8E153B36EA609086C0D2C3F height=2863897 module=state num_txs=0
indexed block exents height=2863897 module=txindex
baseApp defer exit module=server
WaitForQuitSignals() interrupt
wait for database to flush graceDuration=4500 module=server
Closing blockstore module=server
Closing statestore module=server
Closing evidencestore module=server
Closing application.db module=server
Closing snapshots/metadata.db module=server
exiting... module=server
finished waiting for database to flush module=server

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we diagnose the root cause first, I've diagnosed some similar issues before, I'll have another look at what happens with our chain binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer to merge this PR without the grace period and only close the open databases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yihuang, any news for the root cause diagnosis?

serverCtx.Logger.Info("graceful shutdown complete")
}

return err
},
}

Expand Down Expand Up @@ -205,6 +219,7 @@ is performed. Note, when enabled, gRPC will also be automatically enabled.
cmd.Flags().Uint32(FlagStateSyncSnapshotKeepRecent, 2, "State sync snapshot to keep")
cmd.Flags().Bool(FlagDisableIAVLFastNode, false, "Disable fast node for IAVL tree")
cmd.Flags().Int(FlagMempoolMaxTxs, mempool.DefaultMaxTx, "Sets MaxTx value for the app-side mempool")
cmd.Flags().Duration(FlagShutdownGrace, 0*time.Second, "On Shutdown, duration to wait for resource clean up")

// support old flags name for backwards compatibility
cmd.Flags().SetNormalizeFunc(func(f *pflag.FlagSet, name string) pflag.NormalizedName {
Expand Down
2 changes: 1 addition & 1 deletion simapp/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ require (
cosmossdk.io/depinject v1.0.0-alpha.3
cosmossdk.io/log v1.2.0
cosmossdk.io/math v1.0.1
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7
cosmossdk.io/tools/confix v0.0.0-20230613133644-0a778132a60f
cosmossdk.io/x/circuit v0.0.0-20230613133644-0a778132a60f
cosmossdk.io/x/evidence v0.0.0-20230613133644-0a778132a60f
Expand Down
4 changes: 2 additions & 2 deletions simapp/go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -201,8 +201,8 @@ cosmossdk.io/log v1.2.0 h1:BbykkDsutXPSy8RojFB3KZEWyvMsToLy0ykb/ZhsLqQ=
cosmossdk.io/log v1.2.0/go.mod h1:GNSCc/6+DhFIj1aLn/j7Id7PaO8DzNylUZoOYBL9+I4=
cosmossdk.io/math v1.0.1 h1:Qx3ifyOPaMLNH/89WeZFH268yCvU4xEcnPLu3sJqPPg=
cosmossdk.io/math v1.0.1/go.mod h1:Ygz4wBHrgc7g0N+8+MrnTfS9LLn9aaTGa9hKopuym5k=
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982 h1:61YFeW2AhwwPfoJWzNJWvVubCj32sm5jZkJfraS9pDQ=
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982/go.mod h1:QAF9zeRa/9ghuv7E8NS9SzWqRbgVNwH/dZwGhYDHUjI=
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7 h1:defJ42NPi/x3i8i6lPCm168TDvoeudlnrPI9kE5bQII=
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7/go.mod h1:ah/SpIc7aroX8Cg9jRkaxNhtdUO6UW2fpVPk/XMBmmw=
cosmossdk.io/x/tx v0.9.1 h1:9pmmXA9Vs4qdouOFnzhsdsff2mif0f0kylMq5xTGhRI=
cosmossdk.io/x/tx v0.9.1/go.mod h1:/YFGTXG6+kyihd8YbfuJiXHV4R/mIMm2uvVzo80CIhA=
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
Expand Down
2 changes: 1 addition & 1 deletion tests/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ require (
cosmossdk.io/log v1.2.0
cosmossdk.io/math v1.0.1
cosmossdk.io/simapp v0.0.0-20230309163709-87da587416ba
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7
cosmossdk.io/x/evidence v0.0.0-20230613133644-0a778132a60f
cosmossdk.io/x/feegrant v0.0.0-20230613133644-0a778132a60f
cosmossdk.io/x/nft v0.0.0-20230613133644-0a778132a60f // indirect
Expand Down
4 changes: 2 additions & 2 deletions tests/go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -201,8 +201,8 @@ cosmossdk.io/log v1.2.0 h1:BbykkDsutXPSy8RojFB3KZEWyvMsToLy0ykb/ZhsLqQ=
cosmossdk.io/log v1.2.0/go.mod h1:GNSCc/6+DhFIj1aLn/j7Id7PaO8DzNylUZoOYBL9+I4=
cosmossdk.io/math v1.0.1 h1:Qx3ifyOPaMLNH/89WeZFH268yCvU4xEcnPLu3sJqPPg=
cosmossdk.io/math v1.0.1/go.mod h1:Ygz4wBHrgc7g0N+8+MrnTfS9LLn9aaTGa9hKopuym5k=
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982 h1:61YFeW2AhwwPfoJWzNJWvVubCj32sm5jZkJfraS9pDQ=
cosmossdk.io/store v1.0.0-alpha.1.0.20230728080422-54ed7dab3982/go.mod h1:QAF9zeRa/9ghuv7E8NS9SzWqRbgVNwH/dZwGhYDHUjI=
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7 h1:defJ42NPi/x3i8i6lPCm168TDvoeudlnrPI9kE5bQII=
cosmossdk.io/store v1.0.0-alpha.1.0.20230807110739-60198f077af7/go.mod h1:ah/SpIc7aroX8Cg9jRkaxNhtdUO6UW2fpVPk/XMBmmw=
cosmossdk.io/x/tx v0.9.1 h1:9pmmXA9Vs4qdouOFnzhsdsff2mif0f0kylMq5xTGhRI=
cosmossdk.io/x/tx v0.9.1/go.mod h1:/YFGTXG6+kyihd8YbfuJiXHV4R/mIMm2uvVzo80CIhA=
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
Expand Down