Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of health check and only perform it on hermes start #1361

Merged
merged 13 commits into from
Sep 19, 2021

Conversation

adizere
Copy link
Member

@adizere adizere commented Sep 16, 2021

Closes: #1336

Description

This PR also adds a health-check command:

USAGE:
    hermes health-check <OPTIONS>

DESCRIPTION:
    Performs a health check of all chains in the the config
$ hermes health-check
Sep 16 21:54:03.360  INFO using default configuration from '/Users/coromac/.hermes/config.toml'
Sep 16 21:54:03.372  INFO [ibc-0] performing health check...
Sep 16 21:54:03.436  INFO [ibc-0] chain is healthy
Sep 16 21:54:03.437  INFO [ibc-1] performing health check...
Sep 16 21:54:03.443  INFO [ibc-1] chain is healthy
Success: performed health check for all chains in the config

For contributor use:

  • Added a changelog entry, using unclog.
  • If applicable: Unit tests written, added test to CI.
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Updated relevant documentation (docs/) and code comments.
  • Re-reviewed Files changed in the Github PR explorer.

@romac romac changed the title Initial implementation logic for the health-check command Improve performance of health check and only perform it on hermes start Sep 16, 2021
@romac romac requested a review from andynog September 16, 2021 19:47
@andynog andynog closed this Sep 17, 2021
@andynog andynog reopened this Sep 17, 2021
@andynog
Copy link
Contributor

andynog commented Sep 17, 2021

@romac the code lgtm. Couple of questions:

I've tested it with a config that has to chains (cosmoshub-4 and osmosis-1). In the cosmoshub chain config I deliberately set an invalid value for the max_tx_size but for osmosis is fine. So when running the command I see that the health-check errors out for one of them but not the other. Should this be a condition to fail to start the relayer or it's just for logging and an operator will have to check the log to identify an issue with the config?

hermes -c config.toml health-check
Sep 17 10:56:56.468  INFO [cosmoshub-4] performing health check...
Sep 17 10:56:56.829  WARN Hermes might be misconfigured for chain 'cosmoshub-4'
Sep 17 10:56:56.829  WARN     Reason: 
   0: semantic config validation failed for option `max_tx_size` chain 'cosmoshub-4', reason: `max_tx_size` = 2097152 is greater than 90% of the genesis block param `max_size` = 200000

Location:
   /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/flex-error-0.4.2/src/tracer_impl/eyre.rs:10

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Sep 17 10:56:56.829  WARN     Some Hermes features may not work in this mode!
Sep 17 10:56:56.830 ERROR [cosmoshub-4] chain is unhealthy: 
   0: semantic config validation failed for option `max_tx_size` chain 'cosmoshub-4', reason: `max_tx_size` = 2097152 is greater than 90% of the genesis block param `max_size` = 200000

Location:
   /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/flex-error-0.4.2/src/tracer_impl/eyre.rs:10

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Sep 17 10:56:56.833  INFO [osmosis-1] performing health check...
Sep 17 10:56:57.261  INFO [osmosis-1] chain is healthy
Success: performed health check for all chains in the config

Another question. Should the updated guide with the new command health-check be part of the PR ?

@romac
Copy link
Member

romac commented Sep 17, 2021

It's just for logging but I agree that we should clean up the log and not display the backtrace, just the error message.

@romac
Copy link
Member

romac commented Sep 17, 2021

Should the updated guide with the new command health-check be part of the PR ?

No, we currently do the guide updates in the release PR, because otherwise they show up before the feature is released. Hopefully once we have versioning we'll be able to do the guide updates in the PR directly.

Copy link
Contributor

@andynog andynog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The output of the logs can be improved but I think they are not blockers.

@romac romac merged commit 116f7d6 into master Sep 19, 2021
@romac romac deleted the andy/health-check branch September 19, 2021 19:47
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this pull request Sep 13, 2022
…art` (informalsystems#1361)

* Initial implementation logic for the health-check command (informalsystems#1336)

* Move health check code in `Chain::health_check`

* Perform health check on supervisor start

* Improve output of `health-check` command

* Use the `/consensus_params` endpoint to get the max block size

* Update comment

* Display non-debug chain id in health check command

* Use `latest_consensus_params` endpoint

* Revert to using tendermint-rs master

* Add .changelog entry

* Remove stacktrace from health check logs

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Romain Ruetschi <romain@informal.systems>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for disabling health check
3 participants