New Node syncing error after transferring the validator private key files #531

Closed
Zheng-Shilin opened this issue Jul 5, 2021 · 17 comments
Labels: kind/bug, status/wontfix

Comments

@Zheng-Shilin

Bug description

I was migrating my validator node. I synced the new full node with version 0.16.0 from height 1 and it caught up fine. Once the new node was fully synced, I stopped my old node and transferred the validator private key JSON file and the validator state JSON file. When I restarted the new node, I got an error. The log said: panic: unknown field "moniker_params" in types.Params.

Zheng-Shilin added the kind/bug label on Jul 5, 2021
@leobragaz
Contributor

@Zheng-Shilin, after completing the sync, did you update your node to v0.16.3?

@Zheng-Shilin
Author

@bragaz Yes, by using git checkout on the tag. Is that the right way?

I then synced another node using state sync with version v0.16.3. The new node caught up fine, but after I transferred the validator files the same issue showed up. Here is the log:
[image: screenshot of the log]

@Zheng-Shilin
Author

@kwunyeung here is the log

-- Logs begin at Mon 2021-07-05 05:20:32 UTC. --
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]:         github.com/spf13/cobra@v1.1.3/command.go:897
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]:         github.com/spf13/cobra@v1.1.3/command.go:890
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]: github.com/cosmos/cosmos-sdk/server/cmd.Execute(0xc000e57900, 0xc000e51410, 0xd, 0x217c638, 0xc000e3bb80)
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]:         github.com/cosmos/cosmos-sdk@v0.42.4/server/cmd/execute.go:36 +0x265
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]: main.main()
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]:         github.com/desmos-labs/desmos/app/desmos/main.go:17 +0x45
Jul 06 03:58:52 desmos-ddp3 cosmovisor[57046]: exit status 2
Jul 06 03:58:52 desmos-ddp3 systemd[1]: desmosd.service: Main process exited, code=exited, status=1/FAILURE
Jul 06 03:58:52 desmos-ddp3 systemd[1]: desmosd.service: Failed with result 'exit-code'.
Jul 06 03:58:55 desmos-ddp3 systemd[1]: desmosd.service: Scheduled restart job, restart counter is at 280.
Jul 06 03:58:55 desmos-ddp3 systemd[1]: Stopped Desmos Full Node.
Jul 06 03:58:55 desmos-ddp3 systemd[1]: Started Desmos Full Node.
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF starting ABCI with Tendermint
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting multiAppConn service impl=multiAppConn module=proxy
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting localClient service connection=query impl=localClient module=abci-client
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting EventBus service impl=EventBus module=events
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting PubSub service impl=PubSub module=pubsub
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF Starting IndexerService service impl=IndexerService module=txindex
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF created new capability module=ibc name=ports/transfer
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF port binded module=x/ibc/port port=transfer
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: 3:58AM INF claimed capability capability=1 module=transfer name=ports/transfer
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: panic: unknown field "moniker_params" in types.Params
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: goroutine 1 [running]:
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: github.com/cosmos/cosmos-sdk/codec.(*ProtoCodec).MustUnmarshalJSON(0xc0011d6310, 0xc00010e800, 0xff, 0x100, 0x2155b70, 0xc000024200)
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]:         github.com/cosmos/cosmos-sdk@v0.42.4/codec/proto_codec.go:160 +0x98
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: github.com/desmos-labs/desmos/x/profiles.AppModule.InitGenesis(0x217c638, 0xc0011d6310, 0x2146868, 0xc0011d7e60, 0x2176f58, 0xc0011d6310, 0x2176f58, 0xc0011d6310, 0xc0005b60d0, 0x2146868, ...)
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]:         github.com/desmos-labs/desmos/x/profiles/module.go:153 +0xa4
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: github.com/cosmos/cosmos-sdk/types/module.(*Manager).InitGenesis(0xc000fdd1f0, 0x21615c8, 0xc00011c010, 0x2176970, 0xc000589b40, 0x0, 0x0, 0xc00122e108, 0x11, 0x0, ...)
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]:         github.com/cosmos/cosmos-sdk@v0.42.4/types/module/module.go:304 +0x2b5
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: github.com/desmos-labs/desmos/app.(*DesmosApp).InitChainer(0xc001256400, 0x21615c8, 0xc00011c010, 0x2176970, 0xc000589b40, 0x0, 0x0, 0xc00122e108, 0x11, 0x0, ...)
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]:         github.com/desmos-labs/desmos/app/app.go:550 +0x170
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).InitChain(0xc0010ab040, 0x0, 0xed81a01d0, 0x0, 0xc00122e108, 0x11, 0xc000b90a60, 0x2e353a0, 0x0, 0x0, ...)
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]:         github.com/cosmos/cosmos-sdk@v0.42.4/baseapp/abci.go:62 +0x2f5
Jul 06 03:58:55 desmos-ddp3 cosmovisor[57064]: github.com/tendermint/tendermint/abci/client.(*localClient).InitChainSync(0xc000d816e0, 0x0, 0xed81a01d0, 0x0, 0xc00122e108, 0x11, 0xc000b90a60, 0x2e353a0, 0x0, 0x0, ...)

@kwunyeung
Contributor

@Zheng-Shilin what happens if you use a non-validator priv_validator_key.json file again?

@Zheng-Shilin
Author

@kwunyeung do you mean removing the current priv_validator_key.json file and running unsafe-reset-all?

@RiccardoM
Contributor

This error occurs because Desmos reads the genesis file again when it is stopped and restarted.

During our first on-chain upgrade we changed the x/profiles parameter names from moniker_params to nickname_params. As a result, a newer Desmos version (as in this case, where Desmos is restarted with v0.16.3) cannot parse the original genesis file and fails with that error.
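
To illustrate the failure mode described above: a strict decoder rejects any key it does not know, so a genesis file that still says moniker_params fails against a schema that only knows nickname_params. The sketch below is a plain-Python imitation of that behaviour, not the Cosmos SDK code. `KNOWN_PARAM_FIELDS` and `decode_params` are hypothetical names, and the field set is an assumption limited to the one field named in this thread.

```python
# Hypothetical illustration of strict decoding; NOT the Cosmos SDK implementation.
# Assumption: the schema knows only the renamed field mentioned in this thread.
KNOWN_PARAM_FIELDS = {"nickname_params"}


def decode_params(raw: dict) -> dict:
    """Return raw unchanged if every key is known, else raise like a strict codec."""
    for key in raw:
        if key not in KNOWN_PARAM_FIELDS:
            # Mirrors the wording of the panic message in the log above.
            raise ValueError(f'unknown field "{key}" in types.Params')
    return raw
```

Calling `decode_params({"moniker_params": {}})` raises, while the same payload under the key `nickname_params` is accepted.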

Ideally, this error can be resolved in one of two ways:

  1. Using the state sync method, which should allow the node to skip reading the genesis file.
  2. Manually editing the genesis.json file, updating the parameter names to match the ones used now.

Honestly, I'm not sure about option (2), as it might change the genesis hash and I don't know whether that hash is used anywhere.
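
For reference, option (2) could be scripted roughly as follows. This is a minimal sketch, assuming the x/profiles parameters sit under `app_state.profiles.params` in genesis.json (that path is an assumption, not confirmed in this thread). `rename_profile_param` is a hypothetical helper, and the hash concern raised above still applies to any edited file.

```python
import copy

# Assumed location of the x/profiles params inside genesis.json.
PARAMS_PATH = ("app_state", "profiles", "params")


def rename_profile_param(genesis: dict,
                         old: str = "moniker_params",
                         new: str = "nickname_params") -> dict:
    """Return a copy of genesis with params[old] moved to params[new]."""
    updated = copy.deepcopy(genesis)  # leave the caller's data untouched
    params = updated
    for key in PARAMS_PATH:
        params = params[key]  # raises KeyError if the assumed path is wrong
    if old in params:
        params[new] = params.pop(old)
    return updated
```

In practice the dict would come from `json.load` on a backup copy of genesis.json, and the result would be written back with `json.dump`.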

@kwunyeung
Contributor

@RiccardoM this is what confuses me. I expected it not to run InitChain, since @Zheng-Shilin recovered this node with state sync.

@RiccardoM
Contributor

> @RiccardoM this is what confuses me. I expected it not to run InitChain, since @Zheng-Shilin recovered this node with state sync.

From what I can read, she didn't. She used fast sync and synced from block 1:

> I synced the new full node with version 0.16.0 from height 1 and it caught up fine

@Zheng-Shilin
Author

@RiccardoM the node whose log I pasted was synced using state sync with version v0.16.3.

@Zheng-Shilin
Author

> > @RiccardoM this is what confuses me. I expected it not to run InitChain, since @Zheng-Shilin recovered this node with state sync.
>
> From what I can read, she didn't. She used fast sync and synced from block 1:
>
> > I synced the new full node with version 0.16.0 from height 1 and it caught up fine

That was the node I synced last week, the one I asked about in Discord. Then yesterday I synced another node using state sync with v0.16.3 and still ran into the same issue. The log I pasted here is from the newest node, which used state sync.

@RiccardoM
Contributor

RiccardoM commented Jul 6, 2021

Ok, then I think this might be a Cosmos bug. I've opened cosmos/cosmos-sdk#9637 to track this.

@ryuash
Contributor

ryuash commented Jul 6, 2021

@Zheng-Shilin
I've just successfully migrated my node using state sync and desmos v0.16.3

@Zheng-Shilin
Author

Zheng-Shilin commented Jul 6, 2021

It seems to work fine if I start the node first after transferring the validator files and then restore the validator key.

Oops, I take that back. After restoring my key, it worked fine for a couple of minutes. I even voted for prop 9. Then my node crashed completely. So I ran unsafe-reset-all and restarted, and the same error showed up 😂

@Zheng-Shilin
Author

Would that be because of the unsafe-reset-all?
[image]

@dadamu
Contributor

dadamu commented Jul 9, 2021

@Zheng-Shilin The error happens because the field name in the genesis.json file is from the old version.

Maybe we should have a custom migrate-genesis command, like Band Protocol does, to solve this issue?
Besides, it seems a similar issue would occur if we change the message proto files. I think we should also handle the versioning problem.

What do you think? @RiccardoM @bragaz
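
As a rough idea of what such a versioned migration could look like, here is a minimal sketch in plain Python rather than an actual Desmos or Band Protocol command. The version labels `schema-1` and `schema-2`, the `MIGRATIONS` table, the `migrate_genesis` function and the `app_state.profiles.params` path are all hypothetical placeholders.

```python
import copy
from typing import Callable, Dict, Tuple

Genesis = dict
Migration = Callable[[Genesis], Genesis]


def _rename_moniker_params(genesis: Genesis) -> Genesis:
    """Move moniker_params to nickname_params on a copy of the genesis."""
    updated = copy.deepcopy(genesis)
    params = updated["app_state"]["profiles"]["params"]  # assumed path
    if "moniker_params" in params:
        params["nickname_params"] = params.pop("moniker_params")
    return updated


# Hypothetical table: source version -> (target version, migration step).
MIGRATIONS: Dict[str, Tuple[str, Migration]] = {
    "schema-1": ("schema-2", _rename_moniker_params),
}


def migrate_genesis(genesis: Genesis, from_version: str, to_version: str) -> Genesis:
    """Apply the registered steps one by one until to_version is reached."""
    version = from_version
    while version != to_version:
        if version not in MIGRATIONS:
            raise ValueError(f"no migration path from {version} to {to_version}")
        version, step = MIGRATIONS[version]
        genesis = step(genesis)
    return genesis
```

Keeping one small step per version means a later rename (for example in the message proto files) would only add a new table entry instead of changing existing steps.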

@leobragaz
Contributor

@dadamu Yes indeed it's something we're going to take care of for sure!

@RiccardoM
Contributor

@dadamu @bragaz This was not caused by anything wrong in the Desmos code or in the Cosmos SDK code. It was just an error caused by poorly written documentation, which @ryuash has already taken care of with #549.

As written here, she was able to migrate the same node without any error.

RiccardoM added the status/wontfix label on Jul 12, 2021