Expose database query / application metrics on internal /metrics endpoint #620

Closed
brahman81 opened this issue Aug 28, 2018 · 10 comments

@brahman81
Contributor

To assist during debugging and capacity planning, it would prove useful to expose database metrics on the /metrics Horizon endpoint.

  • Time spent performing the various database calls (offers, transactions, assets, accounts, etc.)
  • Total requests per second to the core database
  • Total requests per second to the horizon database

Ideally we would namespace the metrics to distinguish Horizon vs Core database queries.

@MonsieurNicolas
Contributor

this should probably not be internet facing

@brahman81
Contributor Author

Thanks @MonsieurNicolas, it makes sense to somehow restrict access to these extra database metrics.

Perhaps a user could restrict access to the /metrics endpoint first and then enable these new DB stats via a config option? These are nice metrics to graph, especially when debugging or doing capacity planning.

@bartekn bartekn added this to the Horizon v0.15.0 milestone Sep 5, 2018
@bartekn bartekn added the horizon label Sep 5, 2018
@bartekn bartekn modified the milestones: Horizon v0.15.0, Horizon v0.16.0 Oct 30, 2018
@bartekn bartekn modified the milestones: Horizon v0.16.0, Horizon next minor release Jan 22, 2019
@brahman81
Contributor Author

Having a second HTTP listener started on an alternate port (8001) would be ideal imo: most users can easily restrict access to it, and it would be a big ops win to be able to extract these types of metrics from Horizon.

I have dreams of Horizon metrics being in Prometheus, Grafana, etc.

@bartekn bartekn removed this from the Horizon next minor release milestone Jun 13, 2019
@ire-and-curses ire-and-curses added the Hacktoberfest https://hacktoberfest.digitalocean.com/details label Sep 30, 2019
@brahman81
Contributor Author

Is instrumenting the application with https://github.com/prometheus/client_golang an option? It would avoid the need for an external exporter and allow Prometheus to scrape Horizon directly.

@bartekn bartekn changed the title expose database query metrics on /metrics endpoint Expose database query / application metrics on internal /metrics endpoint Nov 12, 2019
@bartekn bartekn added this to the Horizon 0.24.0 milestone Nov 12, 2019
@bartekn bartekn modified the milestones: Horizon 0.24.0, Horizon 0.25.0 Dec 3, 2019
@abuiles abuiles removed the Hacktoberfest https://hacktoberfest.digitalocean.com/details label Jan 7, 2020
@bartekn
Contributor

bartekn commented Feb 11, 2020

Just added a couple PRs connected to this:

When all are merged I'll deploy it to the staging server and we can try to integrate it with our Prometheus server.

@bartekn
Contributor

bartekn commented Feb 12, 2020

All PRs above are merged. The DB metrics require a small refactor of the support/db package, so I'm moving this to 1.1.0. cc @ire-and-curses.

@ire-and-curses ire-and-curses removed this from the Horizon 1.0.1 milestone Mar 17, 2020
@bartekn bartekn self-assigned this May 13, 2020
@bartekn bartekn added this to the Horizon 1.5.0 milestone Jun 4, 2020
@bartekn bartekn removed this from the Horizon 1.5.0 milestone Jul 1, 2020
@bartekn
Contributor

bartekn commented Jul 24, 2020

Is instrumenting the application with https://github.com/prometheus/client_golang an option?

It's done in #2846. It should help with adding more metrics soon.

@stellar/horizon-committers if you have ideas regarding new metrics please add them as a comment here. Here's my list:

  • Duration of the processing time for each ingestion processor. Per change/transaction breakdown.
  • Counter for each tx/op error type returned by txsub.
  • Duration of the order book graph state update per ledger.
  • LedgerEntryChangeCache compression ratio stats.

When it comes to SQL query stats, I'm wondering whether we should do it at all. First, the majority of endpoints send a single SQL query to get results, so we can easily track this using HTTP stats. Second, we often modify the SQL query string for the same query type; the obvious example is the batch insert builders. We'd need to name each query and probably add a second param recording the number of rows being added.
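
To illustrate the second point, here is a small sketch (standard library only) of why the raw SQL text makes a poor metric key for batch inserts. The `batchInsertSQL` helper, the table and column names, the `?` placeholder style, and the `queryObservation` type are all hypothetical and are not taken from Horizon's batch insert builder.

```go
package main

import (
	"fmt"
	"strings"
)

// batchInsertSQL builds a multi-row INSERT statement. The statement text
// grows with the number of rows, so it differs from batch to batch.
func batchInsertSQL(table string, cols []string, rows int) string {
	// One "(?,?,...)" group with a placeholder per column.
	group := "(" + strings.TrimSuffix(strings.Repeat("?,", len(cols)), ",") + ")"
	// One group per row, comma separated.
	values := strings.TrimSuffix(strings.Repeat(group+",", rows), ",")
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s", table, strings.Join(cols, ","), values)
}

// queryObservation is what a metrics layer could record instead of the
// SQL text: a stable, hand-assigned name plus the batch size.
type queryObservation struct {
	name string
	rows int
}

func main() {
	cols := []string{"id", "amount"}
	for _, n := range []int{1, 3} {
		sql := batchInsertSQL("example_table", cols, n)
		obs := queryObservation{name: "insert_example_batch", rows: n}
		fmt.Printf("%+v <- %s\n", obs, sql)
	}
}
```

Both statements belong to the same query type, but their text differs, so grouping by SQL string would split one logical query across many series.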

@2opremio
Contributor

2opremio commented Jul 24, 2020 via email

@bartekn
Contributor

bartekn commented Jul 24, 2020

If it's not already there, how about ingestion throughput (ledgers/time) and captive core stats (CPU and memory consumption of captive core)? Also, the reingestion status (how many workers, what ledger ranges are being reingested, what's the progress in each of them).

I think you're talking about reingestion, right?

We already have a summary for processed ledgers (that includes a counter) but throughput in the online mode will be stable at 1 ledger per 5 seconds on average. I don't think we have Captive Core CPU and memory stats available via Go so it should be done at OS level. For reingestion stats (# of workers, throughput - makes sense here, progress per worker, etc.) 👍.

@bartekn bartekn modified the milestones: Horizon 1.7.0, Horizon 1.8.0 Aug 11, 2020
@bartekn
Contributor

bartekn commented Aug 17, 2020

Added one more metric here: #2921. Closing this; let's open a separate issue for each metric when it's really needed.

@bartekn bartekn closed this as completed Aug 17, 2020