-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POS-86: Graceful shutdown of Heimdall deamon, REST and Bridge #753
Conversation
momoshell
commented
Dec 2, 2021
- Graceful shutdown of Heimdall, REST, and Bridge are all initiated by the same context signaled by the OS on interrupt and program termination
- Necessary code duplications are due to the private functions and types from Cosmos SDK that needed changes to accommodate shutdowns
Codecov Report
@@ Coverage Diff @@
## v2 #753 +/- ##
=======================================
Coverage 72.88% 72.88%
=======================================
Files 49 49
Lines 3507 3507
=======================================
Hits 2556 2556
Misses 723 723
Partials 228 228 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git history seems to have garbage, can you rebase with master?
We'll most definitely have to squash the commits... |
@vcastellm not sure about master though, since this was done through V2 fork, hence all the noise. @sh3ll3x3c agree, definitely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, it turns out that this "small" change include a lot of code changes with respect other PRs. I would like to propose another way to handle the graceful shutdown.
Now, what you do is to have some "start" functions that setup the service (i.e. bridge, rest, daemon) and wait for it to finish. Then, you need to pass around the ctx
so that they know when to stop.
Instead, something I usually do is for these "start" functions to only start (no wait to finish) and return some structs that export a Close
function. Then, it is up to the functions creating and running the services (in this case, the cli commands) to start everything, wait for interrupt signal and Close every service. This Close
function can be blocking and wait for it to finish everything.
@@ -28,6 +30,105 @@ const ( | |||
waitDuration = 1 * time.Minute | |||
) | |||
|
|||
// StartBridgeWithCtx starts bridge service and is able to shutdow gracefully | |||
// returns service errors, if any | |||
func StartBridgeWithCtx(shutdownCtx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this doing the same thing as StartBridge
does? I see a lot of repeated code. If so, please merge both functions together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature is changed, StartBridgeWithCtx takes context instead of bool flag, and returns an error so it could be used with error groups properly. And the logic was changed so we would be provided with better overall control.
StartBridge was left there since it's being used with the package init function, wrapped inside of a GetStartCmd. I think this is a support for older commands.
Also, could you please explain what do we gain from returned structs? Are we really sure we need structs here? Could easily return Close function itself, but that's not the point here.
Passing context enables us to initiate shutdown across multiple running requests or commands, simultaneously.
Anyway, as docs say: the same Context may be passed to functions running in different goroutines; Contexts are safe for simultaneous use by multiple goroutines.
This is the exact thing we need here for a graceful shutdown.
Keep in mind that our services, that were affected by these changes, also block and wait for interrupt signals.
cmd/heimdalld/main.go
Outdated
@@ -98,12 +102,14 @@ func GetSignerInfo(pub crypto.PubKey, priv []byte, cdc *codec.Codec) ValidatorAc | |||
|
|||
func main() { | |||
cdc := app.MakeCodec() | |||
ctx := server.NewDefaultContext() | |||
serverCtx := server.NewDefaultContext() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable change is creating a lot of new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is passed to only two functions from the main itself, not sure I get this comment.
Also, we could merge server context and standard context with context.WithValue, where server context would be nested.
} | ||
|
||
// ServeCommands will generate a long-running rest server | ||
// (aka Light Client Daemon) that exposes functionality similar | ||
// to the cli, but over rest | ||
func ServeCommands(cdc *codec.Codec, registerRoutesFn func(*lcd.RestServer)) *cobra.Command { | ||
func ServeCommands(shutdownCtx ctx.Context, cdc *codec.Codec, registerRoutesFn func(ctx client.CLIContext, mux *mux.Router)) *cobra.Command { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we should pass this shutdownCtx
to each command because the shutdown signal should be something specific to each execution of a command, this is, something to be created inside the Run
command functions.
Could we do:
RunE: func(cmd *cobra.Command, args []string) error {
shutdownCtx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()
err := StartRestServer(shutdownCtx, cdc, registerRoutesFn, restCh)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, each command would have created a context in its scope, where needed of course?
I could be wrong, but when we run our binary, each cmd should execute with the same context, since our main function is responsible for each execution of our defined commands?
Threading context down is a standard mechanism, not sure what I'm getting wrong here.
registerRoutesFn(rs) | ||
const shutdownTimeout = 10 * time.Second | ||
|
||
func StartRestServer(mainCtx ctx.Context, cdc *codec.Codec, registerRoutesFn func(ctx client.CLIContext, mux *mux.Router), restCh chan struct{}) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to call it mainCtx
and not just ctx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, done, fairly simple to change
@@ -66,11 +219,11 @@ func ServeCommands(cdc *codec.Codec, registerRoutesFn func(*lcd.RestServer)) *co | |||
} | |||
|
|||
// RegisterRoutes register routes of all modules | |||
func RegisterRoutes(rs *lcd.RestServer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That RestServer struct is imported from Cosmos SDK. For us, to be able to shut down the REST server properly, we needed access to the listener field, which is private in this case.
Not copying the RestServer struct from the package gave us control, and as an added bonus we got that logger that was also private.
Now, since the RestServer was not used in our logic for starting the server itself, the signature of RegisterRoutes was changed accordingly. And there is no need for a whole struct to be passed, since RegisterRoutes needs only two params.
@ferranbt we've cleaned up what was discussed, should be good to go now... |
Hey @ferranbt we spotted a bug during heimdall initialization so will make that fix in this PR. |
@ferranbt we made fix to heimdall initialization so it should be ready for review again. |