node: Flow cancel enhancements and bug fixes (#4016)
* node: Fix issue where transfers that were loaded from the DB did not add
a flow-cancel transfer on the TargetChain

Flow-canceling is done in the `ProcessMsgForTime` loop when a new
message arrives. However, it was not done when a node restarted and
reloaded transfers from the past 24 hours. As a result, the node could
calculate that the outgoing transfers for an emitter chain exceeded the
daily limit. That total is accurate, but it was only reachable because
incoming flow had freed up capacity. Without the incoming flow, the
total appeared to violate an invariant, so the node did not start
properly.
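
A minimal sketch of the reload behavior described above, under simplifying assumptions: the `transfer` and `chainEntry` types and the `reloadTransfer`, `usage`, and `reloadAll` helpers are hypothetical stand-ins, not the Governor's actual code. It assumes a daily limit of 800 on chain 1, which sent 1000 only because 400 of incoming flow had freed capacity.

```go
package main

import "fmt"

// transfer is a simplified, hypothetical stand-in for a stored Governor transfer.
type transfer struct {
	emitterChain uint16
	targetChain  uint16
	value        int64 // notional value in USD
	flowCancels  bool  // true if the token is on the flow-cancel list
}

// chainEntry records the signed values that count toward a chain's 24h usage.
type chainEntry struct {
	values []int64
}

// reloadTransfer re-adds a transfer loaded from the database. The value counts
// against the emitter chain. If flow cancel is enabled and the token is
// eligible, the inverse value is also added to the target chain.
func reloadTransfer(chains map[uint16]*chainEntry, t transfer, flowCancelEnabled bool) {
	if src, ok := chains[t.emitterChain]; ok {
		src.values = append(src.values, t.value)
	}
	if !flowCancelEnabled || !t.flowCancels {
		return
	}
	if dst, ok := chains[t.targetChain]; ok {
		dst.values = append(dst.values, -t.value)
	}
}

// usage sums the signed values recorded for a chain.
func usage(c *chainEntry) int64 {
	var sum int64
	for _, v := range c.values {
		sum += v
	}
	return sum
}

// reloadAll rebuilds chain state from stored transfers and returns chain 1's usage.
func reloadAll(stored []transfer, flowCancelEnabled bool) int64 {
	chains := map[uint16]*chainEntry{1: {}, 2: {}}
	for _, t := range stored {
		reloadTransfer(chains, t, flowCancelEnabled)
	}
	return usage(chains[1])
}

func main() {
	stored := []transfer{
		{emitterChain: 2, targetChain: 1, value: 400, flowCancels: true},
		{emitterChain: 1, targetChain: 2, value: 1000, flowCancels: false},
	}
	fmt.Println("chain 1 usage without inverse transfers:", reloadAll(stored, false))
	fmt.Println("chain 1 usage with inverse transfers:", reloadAll(stored, true))
}
```

Without the inverse transfer, chain 1 reloads with a usage of 1000, above the assumed limit of 800. With it, the usage is 600, which matches what the node saw before the restart.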

node: Add unit tests when reloading flow cancel transactions from the
database

node: fix lint errors in governor_test.go

* node: Add a command-line flag to enable or disable flow-canceling on restart

Added a command-line flag to enable or disable flow-canceling when
starting the node. This should allow Guardians to disable flow canceling
in the case of future bugs or during a security incident, and should
remove the need to roll back to earlier Guardian versions. (@mdulin2)

* node: Use deterministic iteration order over chains when changing Governor state

- Add a field to the Governor that stores a sorted slice of chain IDs.
- Use this field to iterate in a deterministic order when performing
  actions that change the state of the Governor.
- This should help Guardians reach a more similar view of the Governor
  in scenarios where iteration order might impact whether a transfer is
  queued. (This is especially relevant in the case of Flow Canceling.)
- Cases where only a single VAA is being modified were not changed.
  Iteration order should not matter here, and deterministic order may
  be worse for performance when searching for a particular element.
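
The approach in the list above can be sketched as follows. Go randomizes map iteration order, so a sorted slice of keys is kept alongside the map and used for state-changing loops. The `governor` type, its fields, and the `newGovernor` and `visitOrder` helpers are hypothetical simplifications, not the actual Governor code.

```go
package main

import (
	"fmt"
	"sort"
)

type chainID uint16

// governor is a simplified, hypothetical stand-in for the Governor's state.
type governor struct {
	chains   map[chainID]int64 // chain ID -> placeholder for the per-chain entry
	chainIDs []chainID         // keys of chains, sorted ascending
}

// newGovernor builds the sorted key slice once, at initialization.
func newGovernor(chains map[chainID]int64) *governor {
	ids := make([]chainID, 0, len(chains))
	for id := range chains {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return &governor{chains: chains, chainIDs: ids}
}

// visitOrder returns the order in which state-changing loops would visit
// chains. Ranging over the map directly could differ from run to run.
func (g *governor) visitOrder() []chainID {
	order := make([]chainID, 0, len(g.chainIDs))
	for _, id := range g.chainIDs {
		if _, ok := g.chains[id]; ok {
			order = append(order, id)
		}
	}
	return order
}

func main() {
	gov := newGovernor(map[chainID]int64{30: 0, 2: 0, 1: 0, 5: 0})
	fmt.Println(gov.visitOrder())
}
```

Lookups of a single known chain can still go through the map directly, which is why the single-VAA cases were left unchanged.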

* node: Fix tokenEntry when checking flow cancel for pending transfers

(Squash and merge bug fix from PR #4001)
Similar to a previous issue in the function `ProcessMsgForTime`, the
tokenEntry was not being generated properly.
This should result in queued "small transfers" being able to flow cancel
when they are released from the queue.
Also adds a comment on the CheckedInt64 function to indicate what its
error states mean and when they occur.

Add comments and change variable names for governor_monitoring
- Add function comments to explain what the functions do and what their
  error states mean
- Add governor logging to error cases
- Change variable names in publishStatus function. `value` was used
  first to indicate the "governor usage" and then reused to indicate the
  remaining available notional value for a chain. This refactor tries to
  make it clear that these are different concepts

Add unit test for flow cancelling when a pending transfer is
released

- Add a unit test to ensure that, when a pending transfer is released,
  it also does flow-cancelling on the TargetChain (previously we had a
  bug here)
- Add documentation for CheckPendingForTime to clarify that it has
  side-effects

* node: Modify error handling for CheckPending method in the Governor

Previous rollouts of the Flow Cancel feature contained issues when
calculating the Governor usage when usage was near the daily limit. This
caused an invariant to be violated. The resulting error was propagated
to the processor code, which restarted the entire process. Instead, the
Governor should simply fail closed and report that there is no remaining
capacity, causing further VAAs to be queued until the usage diminishes
over time.
The circumstances leading to the invariant violations are not addressed
in this commit. Instead, this commit reworks the way errors are handled
by CheckPending, making careful choices about when the process should
or should not be killed.

- Change "invariant" error handling: instead of causing the process to
  die, log an error and skip further processing for that chain while
  allowing processing for other chains to continue
- Remove 'invariant error' in TrimAndSumValueForChain as it can occur
  somewhat regularly with the addition of the flow cancel feature
- Return dailyLimit in error condition rather than 0 so that future
  transfers will be queued
- Do not cap the sum returned from TrimAndSumValueForChain: instead
  allow it to exceed the daily limit.
- Modify unit tests to reflect this
- Add unit tests for overflow/underflow scenarios in the TrimAndSumValue
  functions
- Change other less severe error cases to log warnings instead of
  returning errors.
- Generally prevent flow-cancel related issues from affecting normal
  Governor operations. Instead the flow cancel transfers should simply
  not be populated and thus result in "GovernorV1" behavior.
- Add documentation to CheckPendingForTime to explain the dangers of
  returning an error
- Reword error messages to be more precise and include more relevant
  fields. Add documentation explaining when the process should and
  should not die
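
A minimal sketch of the fail-closed behavior listed above. All names (`sumValues`, `usageForChain`, `shouldQueue`, `errOverflow`) are hypothetical, and flooring a negative sum at zero is an assumption rather than something this commit message states. On an overflow or underflow, the usage is reported as the daily limit so that no capacity remains. The sum is otherwise left uncapped.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errOverflow = errors.New("sum of transfer values overflows int64")

// sumValues adds signed transfer values and reports overflow or underflow
// instead of silently wrapping around.
func sumValues(values []int64) (int64, error) {
	var sum int64
	for _, v := range values {
		if (v > 0 && sum > math.MaxInt64-v) || (v < 0 && sum < math.MinInt64-v) {
			return 0, errOverflow
		}
		sum += v
	}
	return sum, nil
}

// usageForChain fails closed: on error it returns dailyLimit rather than 0,
// so the chain appears full. A real implementation would also log the error.
// The sum is not capped and may exceed dailyLimit.
func usageForChain(values []int64, dailyLimit uint64) uint64 {
	sum, err := sumValues(values)
	if err != nil {
		return dailyLimit
	}
	if sum < 0 {
		return 0 // assumption: incoming flow cannot push usage below zero
	}
	return uint64(sum)
}

// shouldQueue reports whether a transfer of the given value must be queued.
func shouldQueue(usage, dailyLimit, value uint64) bool {
	return usage >= dailyLimit || value > dailyLimit-usage
}

func main() {
	const dailyLimit = 1000
	healthy := usageForChain([]int64{700, -200}, dailyLimit)
	broken := usageForChain([]int64{math.MaxInt64, 1}, dailyLimit)
	fmt.Println(healthy, shouldQueue(healthy, dailyLimit, 100))
	fmt.Println(broken, shouldQueue(broken, dailyLimit, 100))
}
```

Returning 0 on error would have the opposite effect: the chain would look empty and transfers would pass through while the Governor's view of its own state was unreliable.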

* node: Add additional metrics for Governor status

Modify the monitoring code and protobuf files to make the status of the
Governor more legible when flow-canceling is enabled. This can be
consumed by Wormhole Dashboard to better reflect the effects of flow
cancelling.

On the level of the Governor:
- whether the Guardian has enabled flow cancel or not

On the level of the Governor's emitters, reports 24h metrics for:
- net value that has moved across the chain
- total outgoing amount
- total incoming flow cancel amount

Currently big transfers are not accounted for as they do not affect the
Governor's capacity. (They are always queued.)
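
One way the three per-chain figures could be derived, as a hedged sketch. It assumes outgoing transfers are stored as positive values and incoming flow-cancel transfers as negative values, and that the net value is their signed sum. The `chainStats` type, its field names, and `summarize` are hypothetical and do not reflect the actual protobuf fields.

```go
package main

import "fmt"

// chainStats mirrors the three 24h figures listed above.
// The field names are hypothetical.
type chainStats struct {
	totalOutgoing      uint64
	incomingFlowCancel uint64
	netValue           int64
}

// summarize splits signed transfer values into outgoing and incoming totals.
// Assumption: positive values are outgoing transfers and negative values are
// incoming flow-cancel transfers. Overflow checks are omitted for brevity.
func summarize(values []int64) chainStats {
	var s chainStats
	for _, v := range values {
		if v >= 0 {
			s.totalOutgoing += uint64(v)
		} else {
			s.incomingFlowCancel += uint64(-v)
		}
		s.netValue += v
	}
	return s
}

func main() {
	s := summarize([]int64{500, 300, -200})
	fmt.Printf("%+v\n", s)
}
```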

* node: Add new flow cancel parameter to Governor in tests

* node: goimports formatting

* node: Bug fix in changes to governor monitoring

- Fix issue where stats weren't being populated unless flow cancel was
  enabled
- Fix wrong return value used in unit test
- Fix typo in proto variable name
- Move sorting outside of a for loop for efficiency
- Restore unit test that was deleted in the process of rebasing

* node: address prealloc lint error in governor code

* node: Fix "generated proto differs from committed proto"

* node: Fix bug in chainIds allocation

- Resolve a mistake in allocating the chainIds slice during Governor
  initialization that caused nil entries in the slice.
- Add unit tests to ensure that the chainIds slice matches the chains
  map
- Add unit test to ensure that TrimAndSumValueForChain checks for a nil
  pointer to avoid panics
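
The commit message does not show the faulty allocation. As an illustration only, a common Go mistake that produces unintended empty entries is calling `make` with a non-zero length and then using `append`. The sketch below contrasts that with a zero-length, pre-sized allocation and adds a check that the slice matches the map. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// keysWithZeroEntries shows the mistake: make with a non-zero length already
// holds len(chains) zero values, and append adds the real keys after them.
func keysWithZeroEntries(chains map[uint16]string) []uint16 {
	ids := make([]uint16, len(chains))
	for id := range chains {
		ids = append(ids, id)
	}
	return ids
}

// sortedKeys allocates with length 0 and capacity len(chains), so the slice
// holds only real keys.
func sortedKeys(chains map[uint16]string) []uint16 {
	ids := make([]uint16, 0, len(chains))
	for id := range chains {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// matchesMap reports whether ids holds each key of chains exactly once.
func matchesMap(ids []uint16, chains map[uint16]string) bool {
	if len(ids) != len(chains) {
		return false
	}
	seen := make(map[uint16]bool, len(ids))
	for _, id := range ids {
		if _, ok := chains[id]; !ok || seen[id] {
			return false
		}
		seen[id] = true
	}
	return true
}

func main() {
	chains := map[uint16]string{1: "solana", 2: "ethereum", 4: "bsc"}
	bad := keysWithZeroEntries(chains)
	good := sortedKeys(chains)
	fmt.Println(len(bad), matchesMap(bad, chains))
	fmt.Println(good, matchesMap(good, chains))
}
```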

* node: Fix returning nil on err in governor_test.go

* node: Cleanup comments in governor code

* node: fix governor comment

* node: enable flow cancel in governor_monitoring tests

* node: Add flow cancel information to p2p heartbeat features

* node: Remove outdated comment from governor

* node: Upgrade logs to Error from Warn when reloading transfers from
database

* node: Enable flow cancel in check_query test function

* node: Cleanup comments and redundant code in governor

* node: Refactor how the flow cancel token list gets populated

- Only populate the flow cancel tokens list once
- Change default behavior to use an empty flow cancel assets list, rather
  than first populating the list and then clearing it
- Refactor the logic around enabling the flow cancel token field for
  governed assets. Now it only executes if flow cancel is enabled,
  rather than operating over an empty slice when flow cancel is disabled
- Modify devnet/testnet configs so that they are responsible for
  returning the correct list of flow cancelling assets

* node: Add unit test for flow cancel feature flag

* node: Move new Governor status proto fields from Emitter to Chain

* node: lint governor_monitoring

---------

Co-authored-by: Maxwell Dulin <strikeout@maxwells-mbp.lan>
johnsaigle and Maxwell Dulin authored Jul 30, 2024
1 parent 038d76b commit 5042ff1
Showing 16 changed files with 817 additions and 172 deletions.
11 changes: 9 additions & 2 deletions node/cmd/guardiand/node.go
@@ -231,7 +231,8 @@ var (
 	// Prometheus remote write URL
 	promRemoteURL *string

-	chainGovernorEnabled *bool
+	chainGovernorEnabled      *bool
+	governorFlowCancelEnabled *bool

 	ccqEnabled *bool
 	ccqAllowedRequesters *string
@@ -435,6 +436,7 @@ func init() {
 	promRemoteURL = NodeCmd.Flags().String("promRemoteURL", "", "Prometheus remote write URL (Grafana)")

 	chainGovernorEnabled = NodeCmd.Flags().Bool("chainGovernorEnabled", false, "Run the chain governor")
+	governorFlowCancelEnabled = NodeCmd.Flags().Bool("governorFlowCancelEnabled", false, "Enable flow cancel on the governor")

 	ccqEnabled = NodeCmd.Flags().Bool("ccqEnabled", false, "Enable cross chain query support")
 	ccqAllowedRequesters = NodeCmd.Flags().String("ccqAllowedRequesters", "", "Comma separated list of signers allowed to submit cross chain queries")
@@ -541,6 +543,11 @@ func runNode(cmd *cobra.Command, args []string) {
 		os.Exit(1)
 	}

+	if !(*chainGovernorEnabled) && *governorFlowCancelEnabled {
+		fmt.Println("Flow cancel can only be enabled when the governor is enabled")
+		os.Exit(1)
+	}
+
 	logger := zap.New(zapcore.NewCore(
 		consoleEncoder{zapcore.NewConsoleEncoder(
 			zap.NewDevelopmentEncoderConfig())},
@@ -1575,7 +1582,7 @@ func runNode(cmd *cobra.Command, args []string) {
 		node.GuardianOptionDatabase(db),
 		node.GuardianOptionWatchers(watcherConfigs, ibcWatcherConfig),
 		node.GuardianOptionAccountant(*accountantWS, *accountantContract, *accountantCheckEnabled, accountantWormchainConn, *accountantNttContract, accountantNttWormchainConn),
-		node.GuardianOptionGovernor(*chainGovernorEnabled),
+		node.GuardianOptionGovernor(*chainGovernorEnabled, *governorFlowCancelEnabled),
 		node.GuardianOptionGatewayRelayer(*gatewayRelayerContract, gatewayRelayerWormchainConn),
 		node.GuardianOptionQueryHandler(*ccqEnabled, *ccqAllowedRequesters),
 		node.GuardianOptionAdminService(*adminSocketPath, ethRPC, ethContract, rpcMap),
2 changes: 1 addition & 1 deletion node/pkg/adminrpc/adminserver_test.go
@@ -322,7 +322,7 @@ func Test_adminCommands(t *testing.T) {
 }

 func newNodePrivilegedServiceForGovernorTests() *nodePrivilegedService {
-	gov := governor.NewChainGovernor(zap.NewNop(), &db.MockGovernorDB{}, wh_common.GoTest)
+	gov := governor.NewChainGovernor(zap.NewNop(), &db.MockGovernorDB{}, wh_common.GoTest, false)

 	return &nodePrivilegedService{
 		db: nil,
7 changes: 5 additions & 2 deletions node/pkg/governor/devnet_config.go
@@ -17,8 +17,11 @@ func (gov *ChainGovernor) initDevnetConfig() ([]tokenConfigEntry, []tokenConfigE
 		{chain: 2, addr: "000000000000000000000000DDb64fE46a91D46ee29420539FC25FD07c5FEa3E", symbol: "WETH", coinGeckoId: "weth", decimals: 8, price: 1174},
 	}

-	flowCancelTokens := []tokenConfigEntry{
-		{chain: 1, addr: "3b442cb3912157f13a933d0134282d032b5ffecd01a2dbf1b7790608df002ea7", symbol: "USDC", coinGeckoId: "usdc", decimals: 6, price: 1}, // Addr: 4zMMC9srt5Ri5X14GAgXhaHii3GnPAEERYPJgZJDncDU, Notional: 1
+	flowCancelTokens := []tokenConfigEntry{}
+	if gov.flowCancelEnabled {
+		flowCancelTokens = []tokenConfigEntry{
+			{chain: 1, addr: "3b442cb3912157f13a933d0134282d032b5ffecd01a2dbf1b7790608df002ea7", symbol: "USDC", coinGeckoId: "usdc", decimals: 6, price: 1}, // Addr: 4zMMC9srt5Ri5X14GAgXhaHii3GnPAEERYPJgZJDncDU, Notional: 1
+		}
 	}

 	chains := []chainConfigEntry{