Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POS-86: Graceful shutdown of Heimdall deamon, REST and Bridge #753

Merged
merged 19 commits into from
Dec 8, 2021
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 102 additions & 1 deletion bridge/cmd/start.go
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package cmd

import (
"context"
"fmt"
"os"
"os/signal"
Expand All @@ -19,6 +20,7 @@ import (
"github.com/spf13/cobra"
"github.com/tendermint/tendermint/libs/common"
httpClient "github.com/tendermint/tendermint/rpc/client"
"golang.org/x/sync/errgroup"

"github.com/maticnetwork/heimdall/helper"
"github.com/spf13/viper"
Expand All @@ -28,6 +30,105 @@ const (
waitDuration = 1 * time.Minute
)

// StartBridgeWithCtx starts bridge service and is able to shutdow gracefully
// returns service errors, if any
func StartBridgeWithCtx(shutdownCtx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doing the same thing as StartBridge does? I see a lot of repeated code. If so, please merge both functions together.

Copy link
Contributor Author

@momoshell momoshell Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature is changed, StartBridgeWithCtx takes context instead of bool flag, and returns an error so it could be used with error groups properly. And the logic was changed so we would be provided with better overall control.

StartBridge was left there since it's being used with the package init function, wrapped inside of a GetStartCmd. I think this is a support for older commands.

Also, could you please explain what do we gain from returned structs? Are we really sure we need structs here? Could easily return Close function itself, but that's not the point here.
Passing context enables us to initiate shutdown across multiple running requests or commands, simultaneously.

Anyway, as docs say: the same Context may be passed to functions running in different goroutines; Contexts are safe for simultaneous use by multiple goroutines.
This is the exact thing we need here for a graceful shutdown.

Keep in mind that our services, that were affected by these changes, also block and wait for interrupt signals.

var logger = helper.Logger.With("module", "bridge/cmd/")

// create codec
cdc := app.MakeCodec()
// queue connector & http client
_queueConnector := queue.NewQueueConnector(helper.GetConfig().AmqpURL)
_queueConnector.StartWorker()

_txBroadcaster := broadcaster.NewTxBroadcaster(cdc)
_httpClient := httpClient.NewHTTP(helper.GetConfig().TendermintRPCUrl, "/websocket")

// selected services to start
services := []common.Service{}
services = append(services,
listener.NewListenerService(cdc, _queueConnector, _httpClient),
processor.NewProcessorService(cdc, _queueConnector, _httpClient, _txBroadcaster),
)

// Start http client
err := _httpClient.Start()
if err != nil {
logger.Error("Error connecting to server: %v", err)
return err
}

// cli context
cliCtx := cliContext.NewCLIContext().WithCodec(cdc)
cliCtx.BroadcastMode = client.BroadcastAsync
cliCtx.TrustNode = true

// start bridge services only when node fully synced
loop := true
for loop {
select {
case <-shutdownCtx.Done():
return nil
case <-time.After(waitDuration):
if !util.IsCatchingUp(cliCtx) {
logger.Info("Node up to date, starting bridge services")
loop = false
} else {
logger.Info("Waiting for heimdall to be synced")
}
}
}

// start services
g := new(errgroup.Group)
for _, service := range services {
// loop variable must be captured
srv := service
g.Go(func() error {
if err := srv.Start(); err != nil {
logger.Error("GetStartCmd | serv.Start", "Error", err)
return err
}
<-srv.Quit()
return nil
})
}

// shutdown phase
g.Go(func() error {
// wait for interrupt and start shut down
<-shutdownCtx.Done()

logger.Info("Received stop signal - Stopping all services")
for _, service := range services {
srv := service
if srv.IsRunning() {
if err := srv.Stop(); err != nil {
logger.Error("GetStartCmd | service.Stop", "Error", err)
return err
}
}
}
// stop http client
if err := _httpClient.Stop(); err != nil {
logger.Error("GetStartCmd | _httpClient.Stop", "Error", err)
return err
}
// stop db instance
util.CloseBridgeDBInstance()

return nil
})

// wait for all routines to finish and log error
if err := g.Wait(); err != nil {
logger.Error("Bridge stopped", "err", err)
return err
}

return nil
}

// StartBridge starts bridge service, isStandAlone prevents os.Exit if the bridge started as side service
func StartBridge(isStandAlone bool) {
var logger = helper.Logger.With("module", "bridge/cmd/")
Expand Down Expand Up @@ -102,7 +203,7 @@ func StartBridge(isStandAlone bool) {
time.Sleep(waitDuration)
}

// strt all processes
// start all processes
for _, service := range services {
go func(serv common.Service) {
defer wg.Done()
Expand Down
Loading