Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More checks as part of config semantic validation #1388

Closed
5 tasks
Tracked by #1401 ...
ancazamfir opened this issue Sep 24, 2021 · 2 comments · Fixed by #1863
Closed
5 tasks
Tracked by #1401 ...

More checks as part of config semantic validation #1388

ancazamfir opened this issue Sep 24, 2021 · 2 comments · Fixed by #1863
Assignees
Labels
A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews I: dependencies Internal: related to dependencies I: logic Internal: related to the relaying logic I: rpc Internal: related to (g)RPC
Milestone

Comments

@ancazamfir
Copy link
Collaborator

ancazamfir commented Sep 24, 2021

Crate

relayer, relayer-cli

Summary

Warn users if full nodes/ chains are not properly configured in the following cases:

  1. staking module's historical_entries == 0
  2. full node does not have tx indexing enabled or aggressive pruning is configured (??)

Problem Definition

  1. To establish IBC connections between two SDK chains, the staking module must be configured with historical_entries > 0. This is to make sure that local header information is stored in the IBC state and therefore client proofs that are part of the connection handshake messages can be verified.
    Currently, if this is not the case, connection handshake fails with the following error:
deliver_tx .. reports error: code=Err(22), log=Log(\"failed to execute message; message index: 1: connection handshake open ack failed: self consensus state not found\")

The error is ambiguous and makes debugging hard as the same error can occur if the last client update on the source chain happened more than historical_entries before.

  1. If the full node does not have tx indexing enabled, hermes cannot clear packets or other datagrams as it needs to search Tx-es for IBC events.
    Currently, if indexing is not enabled hermes will rightly not relay for old events but debugging for the root cause is difficult

Proposal

Add the following to hermes health-check

  1. Add check for historical_entries > 0. As an example, there is a gRPC query for unbonding_period() that already gets the staking parameters here:
    https://github.com/informalsystems/ibc-rs/blob/4eb28ffcdd4d2217ff95635e0774899b1a459913/relayer/src/chain/cosmos.rs#L159
    Just extract the historical_entries as part of a new query.
  2. Add check that tx indexing is enabled. I believe the status() RPC endpoint returns this (buried 2 level deep in node_info.other.tx_index). Not sure if there is a way to retrieve the pruning parameters and check for aggressive pruning (like pruning = "everything" or small pruning-keep-recent).

Acceptance Criteria

Warn user on hermes start or hermes health-check in the cases above.
For 1. the relayer should not even attempt relaying for the chain.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere
Copy link
Member

adizere commented Sep 27, 2021

Regarding point 2 tx indexing. I recall one of the original problems that motivated us to add the health checkup mechanism was to detect and warn if tx indexing is disabled. The problem in issue #697 was actually because tx indexing was not enabled. The testing instructions in PR #1145 should cover this case, see "Internal error: transaction indexing is disabled" in that PR.

The way we currently check for tx indexing is to call into the tx_search endpoint.

https://github.com/informalsystems/ibc-rs/blob/4eb28ffcdd4d2217ff95635e0774899b1a459913/relayer/src/chain/cosmos.rs#L1979-L1987

If tx indexing is disabled, calling into this endpoint will result into an error "Internal error: transaction indexing is disabled (code: -32603)" and the overall health check fails.
Should we complement this check with additionally calling into status and checking node_info.other.tx_index?

@adizere adizere added this to the 10.2021 milestone Sep 27, 2021
@ancazamfir
Copy link
Collaborator Author

Ah, thanks for the reminder. I'm not sure we need both. Just pick one, my preferrence is status as I think it's faster.

@adizere adizere added I: dependencies Internal: related to dependencies I: logic Internal: related to the relaying logic I: rpc Internal: related to (g)RPC labels Sep 30, 2021
@adizere adizere modified the milestones: 10.2021, 01.2022 Nov 2, 2021
@romac romac added the A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews label Feb 8, 2022
@romac romac self-assigned this Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews I: dependencies Internal: related to dependencies I: logic Internal: related to the relaying logic I: rpc Internal: related to (g)RPC
Projects
No open projects
Status: Needs PR
Development

Successfully merging a pull request may close this issue.

3 participants