Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add e2e testing framework based on Osmosis #247

Merged
merged 6 commits into from
Dec 19, 2022
Merged

Add e2e testing framework based on Osmosis #247

merged 6 commits into from
Dec 19, 2022

Conversation

KonradStaniec
Copy link
Contributor

@KonradStaniec KonradStaniec commented Dec 14, 2022

Adds e2e test with ibc based on osmosis e2e test (https://github.com/osmosis-labs/osmosis/tree/main/tests/e2e)

As currently the only thing it does is:

  1. start two chains babylon chains with different ID
  2. start hermes relayer
  3. connect both chains through ibc using zoneconcierge port.

Two run tests:

  1. first run create babylon docker image through make localnet-build-env
  2. run make test-e2e

Not adding to CI yet as, it fails on current babylon dev docker image but works on main docker image.

On current dev it is hermes releyer that fails with:

        2022-12-14T15:59:41.899385Z ERROR ThreadId(01) [bbn-test-b -> bbn-test-a:07-tendermint-0]  failed CreateClient: error raised while creating client for chain bbn-test-b: failed when building client state: gRPC call failed with status: status: Internal, message: "UnmarshalJSON cannot decode empty bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "0"} }

@SebastianElvis any leads what could have changed between current main and dev that could lead to that ?


func (bc *baseConfigurer) connectIBCChains(chainA *chain.Config, chainB *chain.Config) error {
bc.t.Logf("connecting %s and %s chains via IBC", chainA.ChainMeta.Id, chainB.ChainMeta.Id)
cmd := []string{"hermes", "create", "channel", chainA.ChainMeta.Id, chainB.ChainMeta.Id, "--port-a=zoneconcierge", "--port-b=zoneconcierge"}
Copy link
Contributor Author

@KonradStaniec KonradStaniec Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SebastianElvis another ibc question to which maybe you now answer, and spare me some debuggin 😅

If we change port-a and port-b here to transfer the the test will fail, which expected. What is less expected is that two nodes from chain bbn-test-a will be killed with consensus failure looking like:

4:12PM ERR CONSENSUS FAILURE!!! err="module does not own channel capability: sourcePort: transfer, sourceChannel: channel-0: channel capability not found" module=consensus stack="goroutine 175 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:724 +0x4c\npanic({0x1f274a0, 0xc0034559e0})\n\truntime/panic.go:838 +0x207\ngithub.com/babylonchain/babylon/x/zoneconcierge.EndBlocker({{0x2b84e00, 0xc00019e000}, {0x2b93140, 0xc0032d6c40}, {{0xb, 0x0}, {0xc0015ac830, 0xa}, 0x1b, {0x466d8e7, ...}, ...}, ...}, ...)\n\tgithub.com/babylonchain/babylon/x/zoneconcierge/abci.go:27 +0x299\ngithub.com/babylonchain/babylon/x/zoneconcierge.AppModule.EndBlock(...)\n\tgithub.com/babylonchain/babylon/x/zoneconcierge/module.go:161\ngithub.com/cosmos/cosmos-sdk/types/module.(*Manager).EndBlock(_, {{0x2b84e00, 0xc00019e000}, {0x2b93140, 0xc0032d6c40}, {{0xb, 0x0}, {0xc0015ac830, 0xa}, 0x1b, ...}, ...}, ...)\n\tgithub.com/cosmos/cosmos-sdk@v0.46.6/types/module/module.go:505 +0x482\ngithub.com/babylonchain/babylon/app.(*BabylonApp).EndBlocker(_, {{0x2b84e00, 0xc00019e000}, {0x2b93140, 0xc0032d6c40}, {{0xb, 0x0}, {0xc0015ac830, 0xa}, 0x1b, ...}, ...}, ...)\n\tgithub.com/babylonchain/babylon/app/app.go:724 +0x78\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).EndBlock(0xc000d768c0, {0x0?})\n\tgithub.com/cosmos/cosmos-sdk@v0.46.6/baseapp/abci.go:206 +0x165\ngithub.com/tendermint/tendermint/abci/client.(*localClient).EndBlockSync(0xc000e1b800, {0xc000e1b800?})\n\tgithub.com/tendermint/tendermint@v0.34.24/abci/client/local_client.go:288 +0xdf\ngithub.com/tendermint/tendermint/proxy.(*appConnConsensus).EndBlockSync(0xc003384d40?, {0xc000f0d200?})\n\tgithub.com/tendermint/tendermint@v0.34.24/proxy/app_conn.go:89 +0x24\ngithub.com/tendermint/tendermint/state.execBlockOnProxyApp({0x2b85fb8?, 0xc0011f4300}, {0x2b8be10, 0xc000e8a750}, 0xc0000012c0, {0x2b92930, 0xc00022c540}, 0x1a?)\n\tgithub.com/tendermint/tendermint@v0.34.24/state/execution.go:327 +0x54e\ngithub.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc0011e11c0, 0x7}}, {0xc0011e11d0, 0xa}, 0x1, 0x1a, {{0xc00501be80, ...}, ...}, ...}, ...)\n\tgithub.com/tendermint/tendermint@v0.34.24/state/execution.go:140 +0x171\ngithub.com/tendermint/tendermint/consensus.(*State).finalizeCommit(0xc000c84700, 0x1b)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1654 +0xafd\ngithub.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit(0xc000c84700, 0x1b)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1563 +0x2ff\ngithub.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1498 +0x87\ngithub.com/tendermint/tendermint/consensus.(*State).enterCommit(0xc000c84700, 0x1b, 0x0)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1536 +0xcb7\ngithub.com/tendermint/tendermint/consensus.(*State).addVote(0xc000c84700, 0xc0013f5c20, {0xc000059530, 0x28})\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:2150 +0xb7c\ngithub.com/tendermint/tendermint/consensus.(*State).tryAddVote(0xc000c84700, 0xc0013f5c20, {0xc000059530?, 0xc00157e300?})\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1948 +0x2c\ngithub.com/tendermint/tendermint/consensus.(*State).handleMsg(0xc000c84700, {{0x2b67400?, 0xc004b2a6f8?}, {0xc000059530?, 0x0?}})\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:853 +0x44b\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine(0xc000c84700, 0x0)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:760 +0x419\ncreated by github.com/tendermint/tendermint/consensus.(*State).OnStart\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:379 +0x12d\n"

This imo sounds critical as it means anybody could try to open ibc channel on transfer port and kill our whole network ? But maybe I am missing something so thats why I am asking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are using main, given the path github.com/babylonchain/babylon/x/zoneconcierge/abci.go:27 in error trace (this file only has 24 lines in dev). In main, this line is sending IBC heartbeat packet. It has been removed in dev.

Also, how did you create a channel between two Babylon chains with port transfer rather than zoneconcierge? Such channel request will be rejected by verification rules here and here. If we somehow worked around these rules, then this panic would happen upon sending IBC packets since the ZoneConcierge module only has capability for zoneconcierge port but not other ones.

@SebastianElvis
Copy link
Member

SebastianElvis commented Dec 15, 2022

On current dev it is hermes releyer that fails with:

        2022-12-14T15:59:41.899385Z ERROR ThreadId(01) [bbn-test-b -> bbn-test-a:07-tendermint-0]  failed CreateClient: error raised while creating client for chain bbn-test-b: failed when building client state: gRPC call failed with status: status: Internal, message: "UnmarshalJSON cannot decode empty bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "0"} }

@SebastianElvis any leads what could have changed between current main and dev that could lead to that ?

I used to try Hermes relayer for integrating Injective, and encountered this issue as well (even in the main branch that was used in demo). The root cause seems to be related to GRPC and Hermes configurations. We discontinued to use Hermes and ended up using an unstable version of IBC-Go that supports Injective.

I will spend some time for debugging this later today.

@SebastianElvis
Copy link
Member

Right, so the two problems seem to share the same root cause: the genesis does not include BLS PKs.

When executing make test-e2e, all chains fail with the following error. Cosmos SDK has a weird behaviour that after the chain has a consensus failure, it keeps running but returns empty stuff upon GRPC queries. This makes Hermes relayer to give us the error UnmarshalJSON cannot decode empty bytes.

6:14AM ERR CONSENSUS FAILURE!!! err="failed to store validator BLS set at epoch 1: failed to get BLS public key of address [83 67 172 142 105 206 253 198 16 180 227 54 154 123 31 148 224 78 11 78]: BLS public key does not exist with address bbnvaloper12dp6ernfem7uvy95uvmf57cljnsyuz6wy8d7jp: BLS public key does not exist [cosmossdk.io/errors@v1.0.0-beta.7/errors.go:153]" module=consensus stack="goroutine 175 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:724 +0x4c\npanic({0x1e5dd20, 0xc00322b860})\n\truntime/panic.go:838 +0x207\ngithub.com/babylonchain/babylon/x/checkpointing.BeginBlocker({{0x2ba13c0, 0xc00012a000}, {0x2baf760, 0xc000cddc00}, {{0xb, 0x0}, {0xc001641d80, 0xa}, 0x1, {0x27c01620, ...}, ...}, ...}, ...)\n\tgithub.com/babylonchain/babylon/x/checkpointing/abci.go:29 +0x879\ngithub.com/babylonchain/babylon/x/checkpointing.AppModule.BeginBlock(...)\n\tgithub.com/babylonchain/babylon/x/checkpointing/module.go:172\ngithub.com/cosmos/cosmos-sdk/types/module.(*Manager).BeginBlock(_, {{0x2ba13c0, 0xc00012a000}, {0x2baf760, 0xc000cddc00}, {{0xb, 0x0}, {0xc001641d80, 0xa}, 0x1, ...}, ...}, ...)\n\tgithub.com/cosmos/cosmos-sdk@v0.46.6/types/module/module.go:484 +0x398\ngithub.com/babylonchain/babylon/app.(*BabylonApp).BeginBlocker(_, {{0x2ba13c0, 0xc00012a000}, {0x2baf760, 0xc000cddc00}, {{0xb, 0x0}, {0xc001641d80, 0xa}, 0x1, ...}, ...}, ...)\n\tgithub.com/babylonchain/babylon/app/app.go:734 +0x7b\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(_, {{0xc003244b80, 0x20, 0x20}, {{0xb, 0x0}, {0xc001641d80, 0xa}, 0x1, {0x27c01620, ...}, ...}, ...})\n\tgithub.com/cosmos/cosmos-sdk@v0.46.6/baseapp/abci.go:183 +0x843\ngithub.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync(_, {{0xc003244b80, 0x20, 0x20}, {{0xb, 0x0}, {0xc001641d80, 0xa}, 0x1, {0x27c01620, ...}, ...}, ...})\n\tgithub.com/tendermint/tendermint@v0.34.24/abci/client/local_client.go:280 +0x118\ngithub.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync(_, {{0xc003244b80, 0x20, 0x20}, {{0xb, 0x0}, {0xc001641d80, 0xa}, 0x1, {0x27c01620, ...}, ...}, ...})\n\tgithub.com/tendermint/tendermint@v0.34.24/proxy/app_conn.go:81 +0x55\ngithub.com/tendermint/tendermint/state.execBlockOnProxyApp({0x2ba2578?, 0xc0010360c0}, {0x2ba8380, 0xc0015d53f0}, 0xc0012643c0, {0x2baef58, 0xc000542318}, 0x0?)\n\tgithub.com/tendermint/tendermint@v0.34.24/state/execution.go:307 +0x3dd\ngithub.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc0015ed610, 0x7}}, {0xc0015ed620, 0xa}, 0x1, 0x0, {{0x0, ...}, ...}, ...}, ...)\n\tgithub.com/tendermint/tendermint@v0.34.24/state/execution.go:140 +0x171\ngithub.com/tendermint/tendermint/consensus.(*State).finalizeCommit(0xc000cab180, 0x1)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1654 +0xafd\ngithub.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit(0xc000cab180, 0x1)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1563 +0x2ff\ngithub.com/tendermint/tendermint/consensus.(*State).enterCommit.func1()\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1498 +0x87\ngithub.com/tendermint/tendermint/consensus.(*State).enterCommit(0xc000cab180, 0x1, 0x0)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1536 +0xcb7\ngithub.com/tendermint/tendermint/consensus.(*State).addVote(0xc000cab180, 0xc001266780, {0x0, 0x0})\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:2150 +0xb7c\ngithub.com/tendermint/tendermint/consensus.(*State).tryAddVote(0xc000cab180, 0xc001266780, {0x0?, 0x47e6e6?})\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:1948 +0x2c\ngithub.com/tendermint/tendermint/consensus.(*State).handleMsg(0xc000cab180, {{0x2b837a0?, 0xc000010b08?}, {0x0?, 0x0?}})\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:853 +0x44b\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine(0xc000cab180, 0x0)\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:780 +0x512\ncreated by github.com/tendermint/tendermint/consensus.(*State).OnStart\n\tgithub.com/tendermint/tendermint@v0.34.24/consensus/state.go:379 +0x12d\n"

The above error means that the BLS PK is not specified when initialising genesis. Without the BLS PK, the checkpointing keeper will panic when initialising the BLS validator set for a new epoch. To fix this, we will need to manually insert BLS PKs to the validator when initialising genesis, possibly here. An example is https://github.com/babylonchain/babylon/pull/242/files#diff-860afb36e1ac0fcfc5e7680274639a6e3861bf4c9c09856a147a38b65aa4e6eeR47-R55.

Copy link
Contributor

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I added some comments regarding BLS key generation and we should insert BLS keys into the genesis state of the checkpointing module.

mnemonic string
keyInfo *keyring.Record
privateKey cryptotypes.PrivKey
consensusKey privval.FilePVKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have Babylon private validator type called privval.WrappedFilePVKey which includes BLS keys. Maybe we should use that

return err
}

filePV := privval.LoadOrGenFilePV(pvKeyFile, pvStateFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use privval.LoadOrGenWrappedFilePV to generate validator key with BLS keys

return nil
}

func initGenesis(chain *internalChain, votingPeriod, expeditedVotingPeriod time.Duration, forkHeight int) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure that every genesis validator has a BLS key, we should also insert BLS keys into genesis state of the checkpointing module. Please see an example here

@KonradStaniec
Copy link
Contributor Author

@SebastianElvis thanks for debugging this.

When executing make test-e2e, all chains fail with the following error. Cosmos SDK has a weird behaviour that after the chain has a consensus failure, it keeps running but returns empty stuff upon GRPC queries. This makes Hermes relayer to give us the error UnmarshalJSON cannot decode empty bytes.

Thats a bit weird as when i have babylon image built form main branch, the test passes. Imo it would be great if node would not start if validators do not have public keys specified in genesis. Nevertheless,I will try to add those keys (this should be pretty similar to the staff we currently do in testnet command), and see what happens.

@SebastianElvis
Copy link
Member

@SebastianElvis thanks for debugging this.

When executing make test-e2e, all chains fail with the following error. Cosmos SDK has a weird behaviour that after the chain has a consensus failure, it keeps running but returns empty stuff upon GRPC queries. This makes Hermes relayer to give us the error UnmarshalJSON cannot decode empty bytes.

Thats a bit weird as when i have babylon image built form main branch, the test passes. Imo it would be great if node would not start if validators do not have public keys specified in genesis. Nevertheless,I will try to add those keys (this should be pretty similar to the staff we currently do in testnet command), and see what happens.

Ah sorry I forgot to mention that, this change was introduced in 80c406a, which is indeed not in the main branch.

@KonradStaniec
Copy link
Contributor Author

@SebastianElvis @gitferry I have added bls keys to validators and genesis. I have also needed to add non validator nodes to serve as relayer nodes, as when using validators there were constant failures due to race conditions with bls transactions.

Now everything should work smoothly. I would add this setup to CI in separate pr though.

Copy link
Member

@SebastianElvis SebastianElvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of work, thanks!


var (
// Last nodes are non validator nodes to serve as the ones using relayer. Out
// validator are constantly sending bls transactions which make relayer operatrions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// validator are constantly sending bls transactions which make relayer operatrions
// validators are constantly sending bls transactions which make relayer operatrions

// validator are constantly sending bls transactions which make relayer operatrions
// fail constantly

// each started validator containers corresponds to one of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// each started validator containers corresponds to one of
// each started validator container corresponds to one of


// TODO currently using image hosted by osmolab we should probably use our own
// Hermes repo/version for relayer
relayerRepository = "osmolabs/hermes"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just asking, why does Osmosis use its own fork of Hermes, rather than using the official version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I did not investigate it that much, just assumed it is somehow necessary for the starting script to work. It is on my follow up task list to either have our own fork or official hermes docker image used in here .

valPubKey, err := cryptocodec.FromTmPubKeyInterface(node.consensusKey.PubKey)

if err != nil {
panic("It should be possible retrieve validator public key")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
panic("It should be possible retrieve validator public key")
panic("It should be possible to retrieve validator public key")

da, err := sdk.AccAddressFromBech32(node.consensusKey.DelegatorAddress)

if err != nil {
panic("is should be possible to get validator address from delegator address")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
panic("is should be possible to get validator address from delegator address")
panic("It should be possible to get validator address from delegator address")

@KonradStaniec KonradStaniec merged commit d8f9a1b into dev Dec 19, 2022
@KonradStaniec KonradStaniec deleted the ibc-test branch December 19, 2022 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants