Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: register sdk metrics on startInProcess #13997

Closed
wants to merge 4 commits into from
Closed

Conversation

facs95
Copy link

@facs95 facs95 commented Nov 24, 2022

Description

closes #13982


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@facs95 facs95 requested a review from a team as a code owner November 24, 2022 01:31
@julienrbrt
Copy link
Member

Maybe I am doing something wrong, but if I try the current main vs your fix, I see no difference.
Both are getting metrics. What difference do you see between using simd from main and from your branch?

@facs95
Copy link
Author

facs95 commented Nov 24, 2022

@julienrbrt you will always see Tendermint metrics. The issue is with metrics extended by applications.

Use the following config on both tries

[telemetry]

# Prefixed with keys to separate services.
service-name = ""

# Enabled enables the application telemetry functionality. When enabled,
# an in-memory sink is also enabled by default. Operators may also enabled
# other sinks such as Prometheus.
enabled = true

# Enable prefixing gauge values with hostname.
enable-hostname = false

# Enable adding hostname to labels.
enable-hostname-label = false

# Enable adding service to labels.
enable-service-label = false

# PrometheusRetentionTime, when positive, enables a Prometheus metrics sink.
prometheus-retention-time = 99999999999

Try the next stteps locally with both version to reproduce:

  1. Clone Ethermint
  2. Use local SDK package
    go mod edit --replace github.com/cosmos/cosmos-sdk=/PATH_TO_LOCAL_SDK
  3. Run make install
  4. Init local node cd tests/solidity & ./init-test-node.sh
  5. Check metrics at localhost:26660/metrics - Usual Tendermint metrics will be displayed on both versions
  6. Submit an ETH transaction.
  • you can clone https://github.com/facs95/rpc-scripts
  • run npm i
  • run npx ts-node sendTransaction.ts
  1. Refresh localhost:26660/metrics - Search for tx_msg_ethereum_tx_gas_used_total - In main branch you will not be able to see it, but you will in this branch.

@alexanderbez alexanderbez added backport/0.46.x PR scheduled for inclusion in the v0.46's next stable release backport/v0.47.x PR scheduled for inclusion in the v0.47's next stable release labels Nov 24, 2022
Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why moving the SetTelemetry call within Server#Start fixes the bug?

Also, if we're removing the mutex from SetTelemetry then I don't feel comfortable about having it be public. The caller has no idea wether if/when it's safe to call it.

Also, changelog :p

@facs95
Copy link
Author

facs95 commented Nov 24, 2022

I am honestly not entirely sure why this fixes the bug. I went through the changes on #12448 and from first observation, it didnt seem to have anything wrong. The only thing I knew was that before this PR things were working correctly (as I had tested locally using the steps I mentioned earlier).

I tried different combinations and realized that the only way it works as expected is by setting Telemetry inside of the Start method. My guess it that there is something weird with go routines. The Server#Start method is called on a separate go routine and maybe there is something weird happening there when calling methods using pointer semantics (as SetTelemetry and Start does). But I dont know for sure the exact root of the issue, would have to read more about this. I do know that this fixes the issue though.

@facs95
Copy link
Author

facs95 commented Nov 24, 2022

Can you explain why moving the SetTelemetry call within Server#Start fixes the bug?

Also, if we're removing the mutex from SetTelemetry then I don't feel comfortable about having it be public. The caller has no idea wether if/when it's safe to call it.

Also, changelog :p

Updated Changelog and made SetTelemetry private 👌

@julienrbrt
Copy link
Member

Let's wait, if we are merging, after the investigation of OP (#13982 (comment))

@facs95
Copy link
Author

facs95 commented Nov 25, 2022

Closing this as it is an issue on Evmos as explained here! - #13982 (comment).

@facs95 facs95 closed this Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/v0.47.x PR scheduled for inclusion in the v0.47's next stable release backport/0.46.x PR scheduled for inclusion in the v0.46's next stable release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

telemetry doesnt work on v0.46.x
3 participants